Karl-Johan Alm | 0ece2f3006 | add layer GPU offloading for hidden/target states | 2024-05-22 15:09:57 +09:00 |
Karl-Johan Alm | b428e239ed | optimization: put rfn_sum on cuda and do .item() call out of for loop | 2024-05-22 12:43:21 +09:00 |
turboderp | 83baa98ed9 | Add machine-parseable output to convert script | 2024-05-20 01:49:34 +02:00 |
turboderp | 750c85e2c7 | Fixes to allow quantizing Granite | 2024-05-09 02:31:21 +02:00 |
turboderp | b68c0bd89b | Fix checkpoint interval | 2024-04-18 23:18:58 +02:00 |
turboderp | 672c7355a3 | Quant: Change snapshot to time instead of layer interval | 2024-04-05 21:49:10 +02:00 |
turboderp | 5d9732165e | Quant: Swap some state to CPU and attempt to keep more VRAM available in places | 2024-04-05 21:48:08 +02:00 |
turboderp | 4845b1e89d | Quant: Save some memory when preparing quantizers for experts | 2024-03-29 18:14:48 +01:00 |
turboderp | 7baf3d4198 | Adjust warning threshold for uncalibrated experts | 2024-03-29 18:12:34 +01:00 |
turboderp | f3ed1dfed4 | Quantize: Memory optimizations | 2024-03-19 18:22:23 +01:00 |
turboderp | 9c47269913 | Add parallel decoder block | 2024-03-19 18:20:44 +01:00 |
turboderp | 0b05686e76 | Refactor, clean up and consolidate architecture logic | 2024-03-06 02:46:47 +01:00 |
turboderp | dce84866e1 | Support for StarCoder2, initial | 2024-03-05 21:20:29 +01:00 |
turboderp | cedeb616ce | Support Qwen2 | 2024-02-15 20:50:24 +01:00 |
turboderp | 0e9d9c1010 | Prevent tensors passed to save_file from sharing memory | 2024-02-01 10:14:36 +01:00 |
turboderp | 8a0cb9e01d | Add last saved checkpoint to status box | 2024-02-01 04:56:33 +01:00 |
turboderp | 4c93ce852f | Fix remaining time estimate | 2024-02-01 04:56:00 +01:00 |
turboderp | 735807e800 | Use os.replace to swap checkpoint states in measure.py as well | 2024-02-01 04:39:34 +01:00 |
turboderp | 1e70113de3 | Don't print avg accuracy, clarify "completed" -> "measured" | 2024-02-01 04:24:10 +01:00 |
Ben Gorlick | 56a0d6d995 | Adding graceful exit signal handling and status box for estimating time remaining in quantization process | 2024-01-30 17:33:54 -08:00 |
turboderp | 7a9d12ae4c | Add non-RMS layernorm, support for Orion | 2024-01-22 17:21:01 +01:00 |
turboderp | 48b3211d9c | Fix for #281 | 2024-01-17 06:38:52 +01:00 |
turboderp | 6e214f59c7 | Optimize conversion kernels | 2024-01-08 03:40:40 +01:00 |
turboderp | 41b15dd1c3 | Refactor to consolidate attn params | 2024-01-04 04:52:49 +01:00 |
turboderp | 37a1322096 | Fix mistake in MLP measure | 2023-12-16 20:30:25 +01:00 |
turboderp | d2753a29b8 | Mixtral EXL2 support, initial | 2023-12-16 16:50:50 +01:00 |
turboderp | 0d63d6479c | Rework quantization and optimization | 2023-12-13 01:00:11 +01:00 |