turboderp
5d5d57083e
Increase quant tolerance slightly (for small Qwen2 models esp.)
2024-06-13 20:44:51 +02:00
turboderp
60eedf4622
Add exit status code for quant error
2024-06-13 20:43:49 +02:00
Karl-Johan Alm
b428e239ed
optimization: put rfn_sum on cuda and do .item() call out of for loop
2024-05-22 12:43:21 +09:00
turboderp
83baa98ed9
Add machine-parseable output to convert script
2024-05-20 01:49:34 +02:00
turboderp
750c85e2c7
Fixes to allow quantizing Granite
2024-05-09 02:31:21 +02:00
turboderp
b68c0bd89b
Fix checkpoint interval
2024-04-18 23:18:58 +02:00
turboderp
5c1fcb693e
Quant: Ignore OoM error during second sanity check
2024-04-05 21:49:42 +02:00
turboderp
672c7355a3
Quant: Change snapshot to time instead of layer interval
2024-04-05 21:49:10 +02:00
turboderp
5d9732165e
Quant: Swap some state to CPU and attempt to keep more VRAM available in places
2024-04-05 21:48:08 +02:00
turboderp
2a5533de3f
Quant: Option to load linear layer without allocating scratch space
2024-04-05 21:45:47 +02:00
turboderp
88843a5633
Quant: Offload quanting of very large layers to second GPU
2024-04-05 21:44:21 +02:00
turboderp
ff2ff0a407
Fix typo
2024-03-29 19:44:11 +01:00
turboderp
4845b1e89d
Quant: Save some memory when preparing quantizers for experts
2024-03-29 18:14:48 +01:00
turboderp
7baf3d4198
Adjust warning threshold for uncalibrated experts
2024-03-29 18:12:34 +01:00
turboderp
d8871e9ba1
Quantize: Use RTN mode for tensors > 1e9 elements
2024-03-19 18:25:41 +01:00
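The RTN (round-to-nearest) fallback mentioned in the commit above can be sketched as a minimal symmetric quantizer. This is an illustration of the general technique only, not exllamav2's actual implementation; the function names and the plain-list representation are hypothetical:

```python
def rtn_quantize(weights, bits=4):
    # Symmetric round-to-nearest: derive one scale from the largest
    # magnitude, then round each weight onto the integer grid
    # [-qmax, qmax]. Assumes at least one nonzero weight.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def rtn_dequantize(q, scale):
    # Reconstruction error per weight is bounded by scale / 2.
    return [v * scale for v in q]
```

RTN needs no calibration data or Hessian statistics, which is why it is a practical fallback for tensors too large to quantize with the more expensive error-minimizing procedure.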
turboderp
fe7be9ecef
MoE: Adjust calibration warning threshold
2024-03-19 18:24:11 +01:00
turboderp
a724caf978
Quantize: bit of cleanup
2024-03-19 18:23:33 +01:00
turboderp
f3ed1dfed4
Quantize: Memory optimizations
2024-03-19 18:22:23 +01:00
turboderp
9c47269913
Add parallel decoder block
2024-03-19 18:20:44 +01:00
turboderp
0b05686e76
Refactor, clean up and consolidate architecture logic
2024-03-06 02:46:47 +01:00
turboderp
dce84866e1
Support for StarCoder2, initial
2024-03-05 21:20:29 +01:00
turboderp
7af6494afa
Drop device tensors for head layer during conversion
2024-02-16 17:31:19 +01:00
turboderp
cedeb616ce
Support Qwen2
2024-02-15 20:50:24 +01:00
turboderp
702dd9740a
VRAM optimizations during quant
2024-02-15 20:03:47 +01:00
Ben Gorlick
6c49870ec0
Micro-optimization in file handling when saving checkpoints in quantize.py by using os.replace for atomic operations
2024-01-31 03:22:08 -08:00
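The atomic-save pattern referenced in the commit above can be sketched as follows. The function name and JSON payload are illustrative, not the actual quantize.py code; only the use of `os.replace` reflects the commit:

```python
import json
import os
import tempfile

def save_checkpoint_atomic(state, path):
    # Write to a temp file in the destination directory, then swap it
    # into place with os.replace(), which is atomic on both POSIX and
    # Windows. A crash mid-write never leaves a truncated checkpoint
    # at `path`; the old file survives until the new one is complete.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Creating the temp file in the same directory as the target matters: `os.replace` is only guaranteed atomic within a single filesystem.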
turboderp
9c3fd9df3a
Make quantizer sanity check slightly more forgiving
2024-01-30 20:24:40 +01:00
turboderp
7a9d12ae4c
Add non-RMS layernorm, support for Orion
2024-01-22 17:21:01 +01:00
turboderp
1f71d17b89
Use .union() for Python 3.8 compatibility
2024-01-20 06:22:14 +01:00
turboderp
41b15dd1c3
Refactor to consolidate attn params
2024-01-04 04:52:49 +01:00
turboderp
f4fe920a50
Reset snapshot interval
2023-12-27 17:23:58 +01:00
turboderp
02ce583318
Optimize VRAM usage a bit for quantizer
2023-12-26 00:00:37 +01:00
turboderp
b121ee418f
Fix typo
2023-12-17 10:40:11 +01:00
turboderp
d2753a29b8
Mixtral EXL2 support, initial
2023-12-16 16:50:50 +01:00
turboderp
104c367451
Quantizer experiments
2023-12-14 01:29:12 +01:00
turboderp
0d63d6479c
Rework quantization and optimization
2023-12-13 01:00:11 +01:00
turboderp
303d90b65e
Fix regression
2023-12-10 19:51:59 +01:00
turboderp
3c43bad57f
Revert some changes, calibrate to q state again (fixes 70B low bitrate)
2023-12-10 17:34:18 +01:00
turboderp
a89d85a803
Faster and more stable quant
2023-12-10 03:29:06 +01:00
turboderp
2e91239571
New quant optimization procedure
2023-12-08 20:19:57 +01:00
turboderp
644805adba
Reduce VRAM usage when quantizing
2023-12-02 17:18:53 +01:00
turboderp
f0c01a328b
Skip stats for head layer to save system RAM
2023-11-30 19:40:45 +01:00
kingbri
6bfcefe940
Tree: Force utf8 when opening files
The default encoding on Linux is UTF-8, but Windows uses cp1252, which
isn't compatible with some Unicode characters.
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 19:21:29 -05:00
turboderp
24f00214c9
Revert last commit
2023-11-27 05:29:04 +01:00
turboderp
4ff25ec6ef
Ensure inference_mode while quanting
2023-11-26 19:22:40 +01:00
turboderp
714a19ca8f
Allow irregular group sizes
2023-11-26 16:53:29 +01:00
turboderp
385965ed61
Fix padding for extended vocab models
2023-09-22 00:21:49 +02:00
turboderp
c0dd3412d5
Compute full error for lm_head
2023-09-20 10:25:58 +02:00
turboderp
6fd006b9d0
More options for converter to facilitate scripting
2023-09-18 18:25:30 +02:00
turboderp
5c247f93aa
More memory tweaks, made swapping states to CPU the default to accommodate quanting 70B on 24GB GPUs
2023-09-16 19:15:29 +02:00
turboderp
9e55e44bcb
Optimize memory usage when quantizing, increase damping factor
2023-09-16 15:08:55 +02:00