exllamav2

Mirror of https://github.com/turboderp-org/exllamav2.git, synced 2026-04-20 06:19:00 +00:00

Author	SHA1	Message	Date
turboderp	157fcfb5b9	Add license	2023-09-12 06:52:09 +02:00
turboderp	8e3cd01889	Update README.md	2023-09-12 06:52:04 +02:00
turboderp	546c91482e	Add FUNDING.yml	2023-09-12 06:42:55 +02:00
turboderp	c240eb0b70	Update README.md	2023-09-12 06:41:36 +02:00
turboderp	7704a6877b	Fix VRAM usage estimate for linear layer spanning multiple shards	2023-09-11 19:20:24 +02:00
turboderp	c5c90a8b4b	Clean up imports	2023-09-11 07:31:43 +02:00
turboderp	27bccf02d3	Clean up Torch extension script	2023-09-11 07:21:55 +02:00
turboderp	6e14f8802b	Unsharding utility	2023-09-11 04:16:42 +02:00
turboderp	49c8d9e51d	Fix load quant tensors that span multiple shards	2023-09-11 04:16:17 +02:00
turboderp	5dc32f0f8c	Fix padding for head layer when vocab is extended	2023-09-10 20:12:15 +02:00
turboderp	8ca9a1896d	Add sharding utility	2023-09-10 18:59:05 +02:00
turboderp	8f9617dd5d	Fix image link	2023-09-10 18:58:54 +02:00
turboderp	f3ef397656	Add README.md	2023-09-10 15:42:38 +02:00
turboderp	ddaf503e98	Add README.md	2023-09-10 14:16:53 +02:00
turboderp	b4afc666dd	Clean up examples	2023-09-10 14:16:42 +02:00
turboderp	c0ade31bfe	Fix typo	2023-09-10 10:00:52 +02:00
turboderp	b389b474eb	Add ninja requirement	2023-09-10 10:00:27 +02:00
turboderp	2617b6c012	Setuptools script	2023-09-10 09:02:05 +02:00
turboderp	0ec776f53e	Add requirements.txt	2023-09-10 08:05:45 +02:00
turboderp	10899838ea	Add speculative generator and example	2023-09-10 06:22:27 +02:00
turboderp	48f0db78b2	Improved VRAM predictions	2023-09-10 06:17:44 +02:00
turboderp	918368b295	34B testing	2023-09-10 06:15:33 +02:00
turboderp	6046dcf39a	Util functions for mem debugging	2023-09-10 06:14:51 +02:00
turboderp	5d798a178a	Cleaning up converter	2023-09-09 14:54:23 +02:00
turboderp	952c67c4ff	Update defaults for convert script	2023-09-09 14:53:52 +02:00
turboderp	19e164eea2	CodeLlama system prompt	2023-09-09 14:53:02 +02:00
turboderp	18fe5d5a5a	Forward pass chunking adapted from V1	2023-09-08 08:15:14 +02:00
turboderp	0af5c8a413	Forward pass chunking adapted from V1	2023-09-08 08:11:40 +02:00
turboderp	d00c03ea69	Optimization: write K/V directly into cache when possible	2023-09-08 07:53:01 +02:00
turboderp	f9dc978e01	Fix and test fallback matmul mode	2023-09-07 18:15:42 +02:00
turboderp	f79e16c5d0	Optimization, wider loads in EXL2 kernel (int4)	2023-09-07 10:56:43 +02:00
turboderp	1075b7514f	Optimization, wider loads in GPTQ kernel (int4)	2023-09-07 04:26:45 +02:00
turboderp	c2f62e1f1f	Optimization, wider loads in GPTQ kernel (int2) working	2023-09-07 04:07:13 +02:00
turboderp	f259fafda9	Optimization, wider loads in GPTQ kernel (int2)	2023-09-07 03:03:02 +02:00
turboderp	a0cb4355c3	Fix regression in EXL2 convert	2023-09-06 08:47:38 +02:00
turboderp	4b98d98a5c	Fix bug in 6-bit matrix preproc	2023-09-06 08:47:09 +02:00
turboderp	7964c73241	Add sampling settings as cmdline options to chat example	2023-09-05 14:32:02 +02:00
turboderp	e7b50fedcb	Fix chat example Llama mode (EOS was appended twice)	2023-09-05 14:24:53 +02:00
turboderp	fb0825207f	Fix chat example Llama mode (EOS was appended twice)	2023-09-05 14:22:34 +02:00
turboderp	3c80d41234	Add 4-bit GPTQ support	2023-09-05 14:03:51 +02:00
turboderp	6d576b3e56	Reworking attention, allow for batched inference with independent cache per sequence	2023-09-03 15:56:38 +02:00
turboderp	4570f6ee17	Tidying up	2023-09-02 16:40:57 +02:00
turboderp	2a2cc16119	More kernel optimizin	2023-09-02 13:29:43 +02:00
turboderp	92ce76dec1	Kernel optimizations WIP	2023-09-02 05:37:00 +02:00
turboderp	c5cf3956dc	Add speed test	2023-09-01 12:03:00 +02:00
turboderp	a386102ac6	Improve prediction of VRAM usage when loading model	2023-09-01 10:47:29 +02:00
turboderp	176dbc43ad	CodeLlama rope_theta_support	2023-09-01 09:26:00 +02:00
turboderp	bb83469574	Initial commit	2023-08-30 11:05:23 +02:00
turboderp	9cc802c11a	Test commit	2023-08-30 11:03:34 +02:00
turboderp	03fb9db2e0	Initial commit	2023-08-30 10:54:23 +02:00

50 Commits