Commit Graph

14 Commits

Author     SHA1        Date                        Message
turboderp  385965ed61  2023-09-22 00:21:49 +02:00  Fix padding for extended vocab models
turboderp  c0dd3412d5  2023-09-20 10:25:58 +02:00  Compute full error for lm_head
turboderp  6fd006b9d0  2023-09-18 18:25:30 +02:00  More options for converter to facilitate scripting
turboderp  5c247f93aa  2023-09-16 19:15:29 +02:00  More memory tweaks, made swapping states to CPU the default to accommodate quanting 70B on 24GB GPUs
turboderp  9e55e44bcb  2023-09-16 15:08:55 +02:00  Optimize memory usage when quantizing, increase damping factor
turboderp  63f50d72de  2023-09-15 06:08:40 +02:00  Stop quantization if sanity check fails
19h        67b8515899  2023-09-14 14:06:44 +02:00  Conversion: release CUDA cache after VRAM intensive quant blocks
turboderp  e35e24346a  2023-09-13 18:47:22 +02:00  Fix padding when lm_head width is not multiple of 32
turboderp  52563ca347  2023-09-12 13:52:28 +02:00  Fix regression
turboderp  c5c90a8b4b  2023-09-11 07:31:43 +02:00  Clean up imports
turboderp  5dc32f0f8c  2023-09-10 20:12:15 +02:00  Fix padding for head layer when vocab is extended
turboderp  5d798a178a  2023-09-09 14:54:23 +02:00  Cleaning up converter
turboderp  4b98d98a5c  2023-09-06 08:47:09 +02:00  Fix bug in 6-bit matrix preproc
turboderp  bb83469574  2023-08-30 11:05:23 +02:00  Initial commit