Commit Graph

1072 Commits

Author     SHA1        Message                                                                      Date
turboderp  d00c03ea69  Optimization: write K/V directly into cache when possible                    2023-09-08 07:53:01 +02:00
turboderp  f9dc978e01  Fix and test fallback matmul mode                                            2023-09-07 18:15:42 +02:00
turboderp  f79e16c5d0  Optimization, wider loads in EXL2 kernel (int4)                              2023-09-07 10:56:43 +02:00
turboderp  1075b7514f  Optimization, wider loads in GPTQ kernel (int4)                              2023-09-07 04:26:45 +02:00
turboderp  c2f62e1f1f  Optimization, wider loads in GPTQ kernel (int2) working                      2023-09-07 04:07:13 +02:00
turboderp  f259fafda9  Optimization, wider loads in GPTQ kernel (int2)                              2023-09-07 03:03:02 +02:00
turboderp  a0cb4355c3  Fix regression in EXL2 convert                                               2023-09-06 08:47:38 +02:00
turboderp  4b98d98a5c  Fix bug in 6-bit matrix preproc                                              2023-09-06 08:47:09 +02:00
turboderp  7964c73241  Add sampling settings as cmdline options to chat example                     2023-09-05 14:32:02 +02:00
turboderp  e7b50fedcb  Fix chat example Llama mode (EOS was appended twice)                         2023-09-05 14:24:53 +02:00
turboderp  fb0825207f  Fix chat example Llama mode (EOS was appended twice)                         2023-09-05 14:22:34 +02:00
turboderp  3c80d41234  Add 4-bit GPTQ support                                                       2023-09-05 14:03:51 +02:00
turboderp  6d576b3e56  Reworking attention, allow for batched inference with independent cache per sequence  2023-09-03 15:56:38 +02:00
turboderp  4570f6ee17  Tidying up                                                                   2023-09-02 16:40:57 +02:00
turboderp  2a2cc16119  More kernel optimizin                                                        2023-09-02 13:29:43 +02:00
turboderp  92ce76dec1  Kernel optimizations WIP                                                     2023-09-02 05:37:00 +02:00
turboderp  c5cf3956dc  Add speed test                                                               2023-09-01 12:03:00 +02:00
turboderp  a386102ac6  Improve prediction of VRAM usage when loading model                          2023-09-01 10:47:29 +02:00
turboderp  176dbc43ad  CodeLlama rope_theta_support                                                 2023-09-01 09:26:00 +02:00
turboderp  bb83469574  Initial commit                                                               2023-08-30 11:05:23 +02:00
turboderp  9cc802c11a  Test commit                                                                  2023-08-30 11:03:34 +02:00
turboderp  03fb9db2e0  Initial commit                                                               2023-08-30 10:54:23 +02:00