Commit Graph

14 Commits

Author     SHA1        Date                        Message
turboderp  385965ed61  2023-09-22 00:21:49 +02:00  Fix padding for extended vocab models
turboderp  c0dd3412d5  2023-09-20 10:25:58 +02:00  Compute full error for lm_head
turboderp  6fd006b9d0  2023-09-18 18:25:30 +02:00  More options for converter to facilitate scripting
turboderp  5c247f93aa  2023-09-16 19:15:29 +02:00  More memory tweaks, made swapping states to CPU the default to accommodate quanting 70B on 24GB GPUs
turboderp  9e55e44bcb  2023-09-16 15:08:55 +02:00  Optimize memory usage when quantizing, increase damping factor
turboderp  63f50d72de  2023-09-15 06:08:40 +02:00  Stop quantization if sanity check fails
19h        67b8515899  2023-09-14 14:06:44 +02:00  Conversion: release CUDA cache after VRAM intensive quant blocks
turboderp  e35e24346a  2023-09-13 18:47:22 +02:00  Fix padding when lm_head width is not multiple of 32
turboderp  52563ca347  2023-09-12 13:52:28 +02:00  Fix regression
turboderp  c5c90a8b4b  2023-09-11 07:31:43 +02:00  Clean up imports
turboderp  5dc32f0f8c  2023-09-10 20:12:15 +02:00  Fix padding for head layer when vocab is extended
turboderp  5d798a178a  2023-09-09 14:54:23 +02:00  Cleaning up converter
turboderp  4b98d98a5c  2023-09-06 08:47:09 +02:00  Fix bug in 6-bit matrix preproc
turboderp  bb83469574  2023-08-30 11:05:23 +02:00  Initial commit