Commit Graph

19 Commits

Author  SHA1  Message  Date
turboderp  cedeb616ce  Support Qwen2  2024-02-15 20:50:24 +01:00
turboderp  702dd9740a  VRAM optimizations during quant  2024-02-15 20:03:47 +01:00
turboderp  305982de43  Expand range for quantized parameter search  2024-01-30 20:22:44 +01:00
turboderp  7d37b50d90  Fix typos  2024-01-09 07:12:38 +01:00
turboderp  e089313afd  Reset norm  2024-01-09 05:30:15 +01:00
turboderp  6e214f59c7  Optimize conversion kernels  2024-01-08 03:40:40 +01:00
turboderp  02ce583318  Optimize VRAM usage a bit for quantizer  2023-12-26 00:00:37 +01:00
turboderp  0d63d6479c  Rework quantization and optimization  2023-12-13 01:00:11 +01:00
turboderp  644805adba  Reduce VRAM usage when quantizing  2023-12-02 17:18:53 +01:00
turboderp  714a19ca8f  Allow irregular group sizes  2023-11-26 16:53:29 +01:00
turboderp  02b4e65ba1  Clean up TODO items  2023-10-22 17:57:37 +02:00
turboderp  4375e6b535  Catch edge case where torch.cholesky_inverse returns a NaN tensor instead of throwing; increase number of attempts at damping before failing; remove enforcement of symmetry (seems never to be relevant)  2023-09-20 10:01:38 +02:00
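
The guard this commit describes is worth sketching: in a GPTQ-style quantizer the inverse Hessian comes from a Cholesky factorization, and `torch.cholesky_inverse` can silently return NaNs rather than raising. Below is a minimal sketch of such a retry loop; `damped_hessian_inverse` and its `damp`/`growth`/`max_attempts` parameters are hypothetical names, not taken from the repository.

```python
import torch

def damped_hessian_inverse(hessian: torch.Tensor, damp: float = 0.01,
                           growth: float = 10.0, max_attempts: int = 5):
    """Invert a Hessian via Cholesky, retrying with stronger damping."""
    idx = torch.arange(hessian.shape[0], device=hessian.device)
    mean_diag = hessian.diagonal().mean()
    for _ in range(max_attempts):
        damped = hessian.clone()
        damped[idx, idx] += damp * mean_diag  # GPTQ-style diagonal damping
        try:
            factor = torch.linalg.cholesky(damped)
            inverse = torch.cholesky_inverse(factor)
            # Edge case from the commit: cholesky_inverse can return NaNs
            # instead of raising, so validate the result explicitly.
            if not torch.isnan(inverse).any():
                return inverse
        except torch.linalg.LinAlgError:
            pass  # not positive definite at this damping level
        damp *= growth  # increase damping and retry
    raise RuntimeError("Hessian inversion failed after all damping attempts")
```
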
turboderp  5c247f93aa  More memory tweaks; made swapping states to CPU the default to accommodate quantizing 70B models on 24 GB GPUs  2023-09-16 19:15:29 +02:00
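
A rough sketch of the offloading pattern this commit makes the default, assuming a layer-by-layer calibration pass; `calibrate_layers` and its arguments are hypothetical names:

```python
import torch

def calibrate_layers(layers, states, device="cuda"):
    # Keep hidden states on the CPU and stage them onto the GPU one at a
    # time, so only the current layer plus one state occupy VRAM.
    for layer in layers:
        layer.to(device)
        for i, state in enumerate(states):
            out = layer(state.to(device))  # forward one state on the GPU
            states[i] = out.to("cpu")      # swap the result back to CPU
        layer.to("cpu")                    # release the layer's VRAM
    return states
```

Only the active layer and a single hidden state reside in VRAM at any moment, which is what makes quantizing a 70B model feasible on a 24 GB card.
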
turboderp  9e55e44bcb  Optimize memory usage when quantizing, increase damping factor  2023-09-16 15:08:55 +02:00
turboderp  aee7a28170  Set default damping to 0.01 in line with GPTQ, increase damping if Hessian is not PD, disable removal of "dead" weights  2023-09-16 05:59:42 +02:00
turboderp  c5c90a8b4b  Clean up imports  2023-09-11 07:31:43 +02:00
turboderp  5dc32f0f8c  Fix padding for head layer when vocab is extended  2023-09-10 20:12:15 +02:00
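
A minimal sketch of the kind of padding fix this commit refers to, assuming the head projection must grow with zero rows to match an extended vocabulary; `pad_head_weight` is a hypothetical name:

```python
import torch

def pad_head_weight(weight: torch.Tensor, padded_vocab: int) -> torch.Tensor:
    # Extend the head (output) projection with zero rows so its output
    # dimension matches the extended vocabulary size.
    rows, cols = weight.shape
    if rows >= padded_vocab:
        return weight
    pad = torch.zeros(padded_vocab - rows, cols,
                      dtype=weight.dtype, device=weight.device)
    return torch.cat([weight, pad], dim=0)
```
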
turboderp  5d798a178a  Cleaning up converter  2023-09-09 14:54:23 +02:00
turboderp  bb83469574  Initial commit  2023-08-30 11:05:23 +02:00