exllamav2

Mirror of https://github.com/turboderp-org/exllamav2.git, synced 2026-05-12 00:41:44 +00:00

Author	SHA1	Message	Date
turboderp	19e164eea2	CodeLlama system prompt	2023-09-09 14:53:02 +02:00
turboderp	f79e16c5d0	Optimization, wider loads in EXL2 kernel (int4)	2023-09-07 10:56:43 +02:00
turboderp	f259fafda9	Optimization, wider loads in GPTQ kernel (int2)	2023-09-07 03:03:02 +02:00
turboderp	4b98d98a5c	Fix bug in 6-bit matrix preproc	2023-09-06 08:47:09 +02:00
turboderp	7964c73241	Add sampling settings as cmdline options to chat example	2023-09-05 14:32:02 +02:00
turboderp	e7b50fedcb	Fix chat example Llama mode (EOS was appended twice)	2023-09-05 14:24:53 +02:00
turboderp	fb0825207f	Fix chat example Llama mode (EOS was appended twice)	2023-09-05 14:22:34 +02:00
turboderp	3c80d41234	Add 4-bit GPTQ support	2023-09-05 14:03:51 +02:00
turboderp	6d576b3e56	Reworking attention, allow for batched inference with independent cache per sequence	2023-09-03 15:56:38 +02:00
turboderp	4570f6ee17	Tidying up	2023-09-02 16:40:57 +02:00
turboderp	bb83469574	Initial commit	2023-08-30 11:05:23 +02:00