385965ed61  turboderp  2023-09-22 00:21:49 +02:00  Fix padding for extended vocab models
c0dd3412d5  turboderp  2023-09-20 10:25:58 +02:00  Compute full error for lm_head
6fd006b9d0  turboderp  2023-09-18 18:25:30 +02:00  More options for converter to facilitate scripting
5c247f93aa  turboderp  2023-09-16 19:15:29 +02:00  More memory tweaks, made swapping states to CPU the default to accommodate quanting 70B on 24GB GPUs
9e55e44bcb  turboderp  2023-09-16 15:08:55 +02:00  Optimize memory usage when quantizing, increase damping factor
63f50d72de  turboderp  2023-09-15 06:08:40 +02:00  Stop quantization if sanity check fails
67b8515899  19h        2023-09-14 14:06:44 +02:00  Conversion: release CUDA cache after VRAM-intensive quant blocks
e35e24346a  turboderp  2023-09-13 18:47:22 +02:00  Fix padding when lm_head width is not a multiple of 32
52563ca347  turboderp  2023-09-12 13:52:28 +02:00  Fix regression
c5c90a8b4b  turboderp  2023-09-11 07:31:43 +02:00  Clean up imports
5dc32f0f8c  turboderp  2023-09-10 20:12:15 +02:00  Fix padding for head layer when vocab is extended
5d798a178a  turboderp  2023-09-09 14:54:23 +02:00  Clean up converter
4b98d98a5c  turboderp  2023-09-06 08:47:09 +02:00  Fix bug in 6-bit matrix preproc
bb83469574  turboderp  2023-08-30 11:05:23 +02:00  Initial commit