exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-04-20 06:19:00 +00:00

Author	SHA1	Message	Date
turboderp	a0ea2b0db7	Move conversion script into exllamav2 package	2024-06-21 23:58:39 +02:00
turboderp	6030517a6f	Option to resume conversion job with no other args	2024-06-08 22:15:41 +02:00
Karl-Johan Alm	0ece2f3006	add layer GPU offloading for hidden/target states	2024-05-22 15:09:57 +09:00
turboderp	83baa98ed9	Add machine-parseable output to convert script	2024-05-20 01:49:34 +02:00
turboderp	a847f48720	Allow quantizing models with max_seq_len < 2048	2024-05-09 17:25:28 +02:00
turboderp	893e73c360	Fix scale rounding during quant	2024-04-18 22:35:57 +02:00
turboderp	f52612ce95	Convert: Limit max_output_len to save VRAM while measuring	2024-04-05 21:37:18 +02:00
Ben Gorlick	56a0d6d995	Adding graceful exit signal handling and status box for estimating time remaining in quantization process	2024-01-30 17:33:54 -08:00
turboderp	970af13551	Fix rope_scale display in convert.py	2023-12-29 00:40:47 +01:00
turboderp	47df040fce	Read RoPE linear scale from model config	2023-12-28 23:47:56 +01:00
turboderp	5044e38b32	Oopsie	2023-12-14 02:07:32 +01:00
turboderp	0d63d6479c	Rework quantization and optimization	2023-12-13 01:00:11 +01:00
turboderp	1f36c4a3e9	Fix quant resume	2023-12-10 20:27:55 +01:00
turboderp	3c43bad57f	Revert some changes, calibrate to q state again (fixes 70B low bitrate)	2023-12-10 17:34:18 +01:00
turboderp	a89d85a803	Faster and more stable quant	2023-12-10 03:29:06 +01:00
turboderp	2e91239571	New quant optimization procedure	2023-12-08 20:19:57 +01:00
kingbri	6bfcefe940	Tree: Force utf8 when opening files. The default encoding on Linux is utf8, but Windows uses cp1252, which isn't compatible with some Unicode characters. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 19:21:29 -05:00
turboderp	09b981fa57	Add RoPE arguments to quantizer script	2023-11-21 05:13:37 +01:00
turboderp	d4e9020f30	Add QKV embeddings	2023-10-05 21:40:11 +02:00
turboderp	21c44b79ce	Bump to 0.0.3	2023-09-21 21:51:23 +02:00
turboderp	09227a7bde	Make sure temp buffers are allocated for length of calibration data	2023-09-20 10:05:07 +02:00
turboderp	6fd006b9d0	More options for converter to facilitate scripting	2023-09-18 18:25:30 +02:00
turboderp	fae6fb296c	Fix arg type for shard_size	2023-09-17 19:16:58 +02:00
turboderp	5c247f93aa	More memory tweaks, made swapping states to CPU the default to accommodate quanting 70B on 24GB GPUs	2023-09-16 19:15:29 +02:00
turboderp	2f72437fcb	Add more quant options	2023-09-16 15:04:12 +02:00
turboderp	af1398ff16	Conversion: ability to save sharded models (addresses OoM when compiling output file)	2023-09-16 11:44:07 +02:00
turboderp	952c67c4ff	Update defaults for convert script	2023-09-09 14:53:52 +02:00
turboderp	bb83469574	Initial commit	2023-08-30 11:05:23 +02:00

28 Commits