Commit Graph

30 Commits

Author SHA1 Message Date
turboderp  b1e092af10  Update TODO items  2024-05-18 06:43:27 +02:00
turboderp  0d8bac53ee  Cleanup  2024-04-26 23:24:45 +02:00
turboderp  e85404fbfd  Quant: Slight VRAM optimization, don't scale H needlessly  2024-04-24 19:11:53 +02:00
turboderp  893e73c360  Fix scale rounding during quant  2024-04-18 22:35:57 +02:00
turboderp  b112b210aa  Quant: Add a little more damping  2024-04-18 09:19:46 +02:00
turboderp  b8c267e224  Quant: Perform H perm on CPU when H is very large  2024-04-05 21:46:20 +02:00
turboderp  88843a5633  Quant: Offload quanting of very large layers to second GPU  2024-04-05 21:44:21 +02:00
turboderp  c92ffcfc6a  Quant: Swap hessians and weights to system RAM  2024-03-29 18:15:54 +01:00
turboderp  762d1e4f25  Fix typehints  2024-03-29 18:11:55 +01:00
turboderp  46c59d0d42  Quantize: RTN mode for very large head layers  2024-03-19 17:45:29 +01:00
turboderp  6a0c5a5aa7  Quantize: Perform very large act-order permutations on CPU  2024-03-19 17:42:39 +01:00
turboderp  cedeb616ce  Support Qwen2  2024-02-15 20:50:24 +01:00
turboderp  702dd9740a  VRAM optimizations during quant  2024-02-15 20:03:47 +01:00
turboderp  305982de43  Expand range for quantized parameter search  2024-01-30 20:22:44 +01:00
turboderp  7d37b50d90  Fix typos  2024-01-09 07:12:38 +01:00
turboderp  e089313afd  Reset norm  2024-01-09 05:30:15 +01:00
turboderp  6e214f59c7  Optimize conversion kernels  2024-01-08 03:40:40 +01:00
turboderp  02ce583318  Optimize VRAM usage a bit for quantizer  2023-12-26 00:00:37 +01:00
turboderp  0d63d6479c  Rework quantization and optimization  2023-12-13 01:00:11 +01:00
turboderp  644805adba  Reduce VRAM usage when quantizing  2023-12-02 17:18:53 +01:00
turboderp  714a19ca8f  Allow irregular group sizes  2023-11-26 16:53:29 +01:00
turboderp  02b4e65ba1  Cleanup TODO items  2023-10-22 17:57:37 +02:00
turboderp  4375e6b535  Catch edge case where torch.cholesky_inverse returns NaN tensor instead of throwing; increase no. of attempts at damping before failing; remove enforcement of symmetry (seems never to be relevant)  2023-09-20 10:01:38 +02:00
turboderp  5c247f93aa  More memory tweaks; made swapping states to CPU the default to accommodate quanting 70B on 24GB GPUs  2023-09-16 19:15:29 +02:00
turboderp  9e55e44bcb  Optimize memory usage when quantizing, increase damping factor  2023-09-16 15:08:55 +02:00
turboderp  aee7a28170  Set default damping to .01 in line with GPTQ, increase damping if Hessian is not PD, disable removal of "dead" weights  2023-09-16 05:59:42 +02:00
turboderp  c5c90a8b4b  Clean up imports  2023-09-11 07:31:43 +02:00
turboderp  5dc32f0f8c  Fix padding for head layer when vocab is extended  2023-09-10 20:12:15 +02:00
turboderp  5d798a178a  Cleaning up converter  2023-09-09 14:54:23 +02:00
turboderp  bb83469574  Initial commit  2023-08-30 11:05:23 +02:00