exllamav2

Mirror of https://github.com/turboderp-org/exllamav2.git, synced 2026-05-04 13:11:18 +00:00

Author	SHA1	Message	Date
turboderp	99b19ec5f1	Cleanup examples a bit	2024-01-20 10:57:16 +01:00
turboderp	41b15dd1c3	Refactor to consolidate attn params	2024-01-04 04:52:49 +01:00
AlpinDale	5131099b5f	add top_a in a few more places	2023-12-21 15:28:34 +00:00
turboderp	5c974259bd	More sensible defaults sampling parameters	2023-12-03 22:09:41 +01:00
turboderp	dfd0bcf888	Revert example	2023-11-22 07:23:43 +01:00
turboderp	5886047b15	Don't update setuptools	2023-11-22 07:07:48 +01:00
turboderp	7a783b3824	Update examples (auto GPU split)	2023-10-22 19:32:26 +02:00
turboderp	c136b2284c	Add token healing	2023-09-29 22:33:51 +02:00
Jeff Kerr	c221ec3630	add comment on model.load() usage	2023-09-13 11:25:49 -04:00
turboderp	b4afc666dd	Clean up examples	2023-09-10 14:16:42 +02:00
turboderp	f79e16c5d0	Optimization, wider loads in EXL2 kernel (int4)	2023-09-07 10:56:43 +02:00
turboderp	f259fafda9	Optimization, wider loads in GPTQ kernel (int2)	2023-09-07 03:03:02 +02:00
turboderp	6d576b3e56	Reworking attention, allow for batched inference with independent cache per sequence	2023-09-03 15:56:38 +02:00
turboderp	bb83469574	Initial commit	2023-08-30 11:05:23 +02:00