Commit Graph

19 Commits

Author  SHA1  Message  Date
turboderp  cedeb616ce  Support Qwen2  2024-02-15 20:50:24 +01:00
turboderp  702dd9740a  VRAM optimizations during quant  2024-02-15 20:03:47 +01:00
turboderp  305982de43  Expand range for quantized parameter search  2024-01-30 20:22:44 +01:00
turboderp  7d37b50d90  Fix typos  2024-01-09 07:12:38 +01:00
turboderp  e089313afd  Reset norm  2024-01-09 05:30:15 +01:00
turboderp  6e214f59c7  Optimize conversion kernels  2024-01-08 03:40:40 +01:00
turboderp  02ce583318  Optimize VRAM usage a bit for quantizer  2023-12-26 00:00:37 +01:00
turboderp  0d63d6479c  Rework quantization and optimization  2023-12-13 01:00:11 +01:00
turboderp  644805adba  Reduce VRAM usage when quantizing  2023-12-02 17:18:53 +01:00
turboderp  714a19ca8f  Allow irregular group sizes  2023-11-26 16:53:29 +01:00
turboderp  02b4e65ba1  Clean up TODO items  2023-10-22 17:57:37 +02:00
turboderp  4375e6b535  Catch edge case where torch.cholesky_inverse returns a NaN tensor instead of throwing; increase number of attempts at damping before failing; remove enforcement of symmetry (seems never to be relevant)  2023-09-20 10:01:38 +02:00
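
The guard this commit describes is worth sketching: in a GPTQ-style quantizer the inverse Hessian comes from a Cholesky factorization, and `torch.cholesky_inverse` can silently return NaNs rather than raising. Below is a minimal sketch of such a retry loop; `damped_hessian_inverse` and its `damp`/`growth`/`max_attempts` parameters are hypothetical names, not taken from the repository.

```python
import torch

def damped_hessian_inverse(hessian: torch.Tensor, damp: float = 0.01,
                           growth: float = 10.0, max_attempts: int = 5):
    """Invert a Hessian via Cholesky, retrying with stronger damping."""
    idx = torch.arange(hessian.shape[0], device=hessian.device)
    mean_diag = hessian.diagonal().mean()
    for _ in range(max_attempts):
        damped = hessian.clone()
        damped[idx, idx] += damp * mean_diag  # GPTQ-style diagonal damping
        try:
            factor = torch.linalg.cholesky(damped)
            inverse = torch.cholesky_inverse(factor)
            # Edge case from the commit: cholesky_inverse can return NaNs
            # instead of raising, so validate the result explicitly.
            if not torch.isnan(inverse).any():
                return inverse
        except torch.linalg.LinAlgError:
            pass  # not positive definite at this damping level
        damp *= growth  # increase damping and retry
    raise RuntimeError("Hessian inversion failed after all damping attempts")
```
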
turboderp  5c247f93aa  More memory tweaks; made swapping states to CPU the default to accommodate quantizing 70B models on 24 GB GPUs  2023-09-16 19:15:29 +02:00
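
A rough sketch of the offloading pattern this commit makes the default, assuming a layer-by-layer calibration pass; `calibrate_layers` and its arguments are hypothetical names:

```python
import torch

def calibrate_layers(layers, states, device="cuda"):
    # Keep hidden states on the CPU and stage them onto the GPU one at a
    # time, so only the current layer plus one state occupy VRAM.
    for layer in layers:
        layer.to(device)
        for i, state in enumerate(states):
            out = layer(state.to(device))  # forward one state on the GPU
            states[i] = out.to("cpu")      # swap the result back to CPU
        layer.to("cpu")                    # release the layer's VRAM
    return states
```

Only the active layer and a single hidden state reside in VRAM at any moment, which is what makes quantizing a 70B model feasible on a 24 GB card.
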
turboderp  9e55e44bcb  Optimize memory usage when quantizing, increase damping factor  2023-09-16 15:08:55 +02:00
turboderp  aee7a28170  Set default damping to 0.01 in line with GPTQ, increase damping if Hessian is not PD, disable removal of "dead" weights  2023-09-16 05:59:42 +02:00
turboderp  c5c90a8b4b  Clean up imports  2023-09-11 07:31:43 +02:00
turboderp  5dc32f0f8c  Fix padding for head layer when vocab is extended  2023-09-10 20:12:15 +02:00
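
A minimal sketch of the kind of padding fix this commit refers to, assuming the head projection must grow with zero rows to match an extended vocabulary; `pad_head_weight` is a hypothetical name:

```python
import torch

def pad_head_weight(weight: torch.Tensor, padded_vocab: int) -> torch.Tensor:
    # Extend the head (output) projection with zero rows so its output
    # dimension matches the extended vocabulary size.
    rows, cols = weight.shape
    if rows >= padded_vocab:
        return weight
    pad = torch.zeros(padded_vocab - rows, cols,
                      dtype=weight.dtype, device=weight.device)
    return torch.cat([weight, pad], dim=0)
```
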
turboderp  5d798a178a  Cleaning up converter  2023-09-09 14:54:23 +02:00
turboderp  bb83469574  Initial commit  2023-08-30 11:05:23 +02:00