b1e092af10  2024-05-18 06:43:27 +02:00  turboderp  Update TODO items
0d8bac53ee  2024-04-26 23:24:45 +02:00  turboderp  Cleanup
e85404fbfd  2024-04-24 19:11:53 +02:00  turboderp  Quant: Slight VRAM optimization, don't scale H needlessly
893e73c360  2024-04-18 22:35:57 +02:00  turboderp  Fix scale rounding during quant
b112b210aa  2024-04-18 09:19:46 +02:00  turboderp  Quant: Add a little more damping
b8c267e224  2024-04-05 21:46:20 +02:00  turboderp  Quant: Perform H perm on CPU when H is very large
88843a5633  2024-04-05 21:44:21 +02:00  turboderp  Quant: Offload quanting of very large layers to second GPU
c92ffcfc6a  2024-03-29 18:15:54 +01:00  turboderp  Quant: Swap hessians and weights to system RAM
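The "swap hessians and weights to system RAM" commit suggests keeping only the buffers for the layer currently being quantized on the GPU and parking everything else in host memory. A minimal, dependency-free sketch of that pattern, assuming an LRU eviction policy (the class name, `budget` parameter, and structure are invented for illustration; the real code moves torch tensors between devices with `Tensor.to()`):

```python
# Hypothetical sketch: cap the number of buffers resident "on the GPU" and
# evict the least-recently-used ones back to system RAM. Pure Python stand-in
# for moving torch tensors between "cuda" and "cpu".

class SwapCache:
    """Keep at most `budget` buffers on-device; evict LRU buffers to host."""

    def __init__(self, budget):
        self.budget = budget
        self.device = []   # keys currently on-device, LRU order (oldest first)
        self.store = {}    # key -> (location, payload)

    def put(self, key, payload):
        # New or replaced buffers start out in system RAM.
        if key in self.device:
            self.device.remove(key)
        self.store[key] = ("cpu", payload)

    def fetch(self, key):
        loc, payload = self.store[key]
        if loc == "cpu":
            # Make room on the device, then bring the buffer in.
            while len(self.device) >= self.budget:
                victim = self.device.pop(0)
                self.store[victim] = ("cpu", self.store[victim][1])
            self.store[key] = ("cuda", payload)
            self.device.append(key)
        else:
            # Already resident: refresh its LRU position.
            self.device.remove(key)
            self.device.append(key)
        return payload
```

With a scheme like this, the quantizer can hold per-layer Hessians for a whole model while only a fixed number of them occupy VRAM at any time.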
762d1e4f25  2024-03-29 18:11:55 +01:00  turboderp  Fix typehints
46c59d0d42  2024-03-19 17:45:29 +01:00  turboderp  Quantize: RTN mode for very large head layers
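RTN (round-to-nearest) quantization scales, rounds, and clamps each weight independently, with no Hessian and no error feedback, which is why it is cheap enough to fall back on for very large head layers. A minimal sketch of the symmetric per-tensor case (function names and the per-tensor granularity are illustrative assumptions, not the repo's actual code):

```python
# Minimal round-to-nearest (RTN) quantization sketch, symmetric per-tensor.
# Each weight is quantized in isolation: scale, round, clamp.

def rtn_quantize(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit symmetric
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def rtn_dequantize(q, scale):
    return [v * scale for v in q]
```

The contrast with the error-feedback path elsewhere in the log is the point: RTN needs one pass and no Hessian storage, at the cost of higher quantization error.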
6a0c5a5aa7  2024-03-19 17:42:39 +01:00  turboderp  Quantize: Perform very large act-order permutations on CPU
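Act-order quantization processes weight columns in descending order of the Hessian diagonal, so the highest-energy columns are quantized first while the error budget is still fresh; for a very large H, doing the sort and gather on the CPU avoids a large temporary allocation on the GPU. A pure-Python sketch of the permutation itself (a stand-in for an `argsort` over `diag(H)` plus a column gather):

```python
# Sketch of the act-order permutation: order columns by descending Hessian
# diagonal, then gather the weight matrix's columns in that order.

def act_order_perm(h_diag):
    """Return column indices sorted by descending Hessian diagonal."""
    return sorted(range(len(h_diag)), key=lambda i: -h_diag[i])

def permute_columns(rows, perm):
    """Apply the permutation to every row of a weight matrix."""
    return [[row[j] for j in perm] for row in rows]
```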
cedeb616ce  2024-02-15 20:50:24 +01:00  turboderp  Support Qwen2
702dd9740a  2024-02-15 20:03:47 +01:00  turboderp  VRAM optimizations during quant
305982de43  2024-01-30 20:22:44 +01:00  turboderp  Expand range for quantized parameter search
7d37b50d90  2024-01-09 07:12:38 +01:00  turboderp  Fix typos
e089313afd  2024-01-09 05:30:15 +01:00  turboderp  Reset norm
6e214f59c7  2024-01-08 03:40:40 +01:00  turboderp  Optimize conversion kernels
02ce583318  2023-12-26 00:00:37 +01:00  turboderp  Optimize VRAM usage a bit for quantizer
0d63d6479c  2023-12-13 01:00:11 +01:00  turboderp  Rework quantization and optimization
644805adba  2023-12-02 17:18:53 +01:00  turboderp  Reduce VRAM usage when quantizing
714a19ca8f  2023-11-26 16:53:29 +01:00  turboderp  Allow irregular group sizes
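"Allow irregular group sizes" plausibly means that group-wise quantization no longer requires the row count to be an exact multiple of the group size, leaving a shorter final group instead. That reading is an assumption; under it, the bookkeeping is just:

```python
# Hypothetical sketch: split n rows into quantization groups of size
# group_size, allowing a shorter final group when n % group_size != 0.

def group_bounds(n, group_size):
    """Return (start, end) index pairs covering 0..n in group_size chunks."""
    return [(i, min(i + group_size, n)) for i in range(0, n, group_size)]
```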
02b4e65ba1  2023-10-22 17:57:37 +02:00  turboderp  Cleanup TODO items
4375e6b535  2023-09-20 10:01:38 +02:00  turboderp  Catch edge case where torch.cholesky_inverse returns a NaN tensor instead of throwing; increase the number of damping attempts before failing; remove enforcement of symmetry (seems never to be relevant)
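Commit 4375e6b535 describes a retry pattern: when inverting the Hessian fails (non-positive-definite input, surfacing either as an exception or as a silent NaN tensor from `torch.cholesky_inverse`), add more damping to the diagonal and try again, up to a bounded number of attempts. A dependency-free sketch of that loop, with a toy pure-Python Cholesky standing in for the torch routines (the escalation schedule and function names are assumptions, not the repo's code):

```python
# Sketch of the damping-retry pattern: add a multiple of the mean Hessian
# diagonal, attempt a Cholesky factorization, and escalate the damping if
# the matrix still is not positive definite.

def cholesky(a):
    """Toy Cholesky factorization; raises ValueError if a is not PD."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("not positive definite")
                l[i][j] = d ** 0.5
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def damped_factor(h, damp=0.01, attempts=10):
    n = len(h)
    mean_diag = sum(h[i][i] for i in range(n)) / n
    for k in range(attempts):
        d = damp * (2 ** k) * mean_diag         # escalate damping each retry
        a = [row[:] for row in h]
        for i in range(n):
            a[i][i] += d
        try:
            return cholesky(a), d
        except ValueError:
            continue
    raise RuntimeError("Hessian not positive definite after damping retries")
```

In the real code the failure check would also have to inspect the result for NaNs, since per the commit message `torch.cholesky_inverse` can return a NaN tensor instead of throwing.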
5c247f93aa  2023-09-16 19:15:29 +02:00  turboderp  More memory tweaks; made swapping states to CPU the default to accommodate quanting 70B on 24 GB GPUs
9e55e44bcb  2023-09-16 15:08:55 +02:00  turboderp  Optimize memory usage when quantizing, increase damping factor
aee7a28170  2023-09-16 05:59:42 +02:00  turboderp  Set default damping to 0.01 in line with GPTQ; increase damping if the Hessian is not PD; disable removal of "dead" weights
c5c90a8b4b  2023-09-11 07:31:43 +02:00  turboderp  Clean up imports
5dc32f0f8c  2023-09-10 20:12:15 +02:00  turboderp  Fix padding for head layer when vocab is extended
5d798a178a  2023-09-09 14:54:23 +02:00  turboderp  Cleaning up converter
bb83469574  2023-08-30 11:05:23 +02:00  turboderp  Initial commit