28 Commits

Author SHA1 Message Date
turboderp
a0ea2b0db7 Move conversion script into exllamav2 package 2024-06-21 23:58:39 +02:00
turboderp
6030517a6f Option to resume conversion job with no other args 2024-06-08 22:15:41 +02:00
Karl-Johan Alm
0ece2f3006 add layer GPU offloading for hidden/target states 2024-05-22 15:09:57 +09:00
turboderp
83baa98ed9 Add machine-parseable output to convert script 2024-05-20 01:49:34 +02:00
turboderp
a847f48720 Allow quantizing models with max_seq_len < 2048 2024-05-09 17:25:28 +02:00
turboderp
893e73c360 Fix scale rounding during quant 2024-04-18 22:35:57 +02:00
turboderp
f52612ce95 Convert: Limit max_output_len to save VRAM while measuring 2024-04-05 21:37:18 +02:00
Ben Gorlick
56a0d6d995 Adding graceful exit signal handling and status box for estimating time remaining in quantization process 2024-01-30 17:33:54 -08:00
turboderp
970af13551 Fix rope_scale display in convert.py 2023-12-29 00:40:47 +01:00
turboderp
47df040fce Read RoPE linear scale from model config 2023-12-28 23:47:56 +01:00
turboderp
5044e38b32 Oopsie 2023-12-14 02:07:32 +01:00
turboderp
0d63d6479c Rework quantization and optimization 2023-12-13 01:00:11 +01:00
turboderp
1f36c4a3e9 Fix quant resume 2023-12-10 20:27:55 +01:00
turboderp
3c43bad57f Revert some changes, calibrate to q state again (fixes 70B low bitrate) 2023-12-10 17:34:18 +01:00
turboderp
a89d85a803 Faster and more stable quant 2023-12-10 03:29:06 +01:00
turboderp
2e91239571 New quant optimization procedure 2023-12-08 20:19:57 +01:00
kingbri
6bfcefe940 Tree: Force utf8 when opening files
The default encoding on Linux is utf8, but Windows uses cp1252, which
isn't compatible with some Unicode characters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 19:21:29 -05:00
turboderp
09b981fa57 Add RoPE arguments to quantizer script 2023-11-21 05:13:37 +01:00
turboderp
d4e9020f30 Add QKV embeddings 2023-10-05 21:40:11 +02:00
turboderp
21c44b79ce Bump to 0.0.3 2023-09-21 21:51:23 +02:00
turboderp
09227a7bde Make sure temp buffers are allocated for length of calibration data 2023-09-20 10:05:07 +02:00
turboderp
6fd006b9d0 More options for converter to facilitate scripting 2023-09-18 18:25:30 +02:00
turboderp
fae6fb296c Fix arg type for shard_size 2023-09-17 19:16:58 +02:00
turboderp
5c247f93aa More memory tweaks, made swapping states to CPU the default to accommodate quanting 70B on 24GB GPUs 2023-09-16 19:15:29 +02:00
turboderp
2f72437fcb Add more quant options 2023-09-16 15:04:12 +02:00
turboderp
af1398ff16 Conversion: ability to save sharded models (addresses OoM when compiling output file) 2023-09-16 11:44:07 +02:00
turboderp
952c67c4ff Update defaults for convert script 2023-09-09 14:53:52 +02:00
turboderp
bb83469574 Initial commit 2023-08-30 11:05:23 +02:00