Commit Graph

25 Commits

Author          SHA1        Message                                                                                   Date
turboderp       5dec977006  Refactor chat example, split out prompt formats, add working option for TinyLlama-chat    2023-10-04 23:18:45 +02:00
turboderp       d09a3fa000  Add Orca prompt format to chat example                                                    2023-10-04 01:44:57 +02:00
turboderp       d3217f0e4c  Refactor code formatting, integrate in chatbot example                                   2023-10-01 12:51:20 +02:00
turboderp       51a0104bba  WebSocket server (WIP)                                                                    2023-09-30 23:52:11 +02:00
turboderp       0961876eb2  Merge pull request #71 from SinanAkkoyun/code-chat: Code highlighting in chat CLI        2023-09-29 23:31:40 +02:00
turboderp       c136b2284c  Add token healing                                                                         2023-09-29 22:33:51 +02:00
Sinan Akkoyun   fa23466f68  Really fixed the codeblock lang problem lol                                               2023-09-29 16:25:38 +02:00
Sinan Akkoyun   4f6f37c4a4  Removed lang after ``` in output                                                          2023-09-29 16:17:16 +02:00
Sinan Akkoyun   2a43d3069d  Added codeblock highlighting to chatcode.py                                               2023-09-29 15:57:28 +02:00
turboderp       ba5f6191c8  Add typical setting to chat example.                                                      2023-09-26 19:50:44 +02:00
Jeff Kerr       c221ec3630  add comment on model.load() usage                                                         2023-09-13 11:25:49 -04:00
turboderp       c5c90a8b4b  Clean up imports                                                                          2023-09-11 07:31:43 +02:00
turboderp       b4afc666dd  Clean up examples                                                                         2023-09-10 14:16:42 +02:00
turboderp       10899838ea  Add speculative generator and example                                                     2023-09-10 06:22:27 +02:00
turboderp       19e164eea2  CodeLlama system prompt                                                                   2023-09-09 14:53:02 +02:00
turboderp       f79e16c5d0  Optimization, wider loads in EXL2 kernel (int4)                                           2023-09-07 10:56:43 +02:00
turboderp       f259fafda9  Optimization, wider loads in GPTQ kernel (int2)                                           2023-09-07 03:03:02 +02:00
turboderp       4b98d98a5c  Fix bug in 6-bit matrix preproc                                                           2023-09-06 08:47:09 +02:00
turboderp       7964c73241  Add sampling settings as cmdline options to chat example                                  2023-09-05 14:32:02 +02:00
turboderp       e7b50fedcb  Fix chat example Llama mode (EOS was appended twice)                                      2023-09-05 14:24:53 +02:00
turboderp       fb0825207f  Fix chat example Llama mode (EOS was appended twice)                                      2023-09-05 14:22:34 +02:00
turboderp       3c80d41234  Add 4-bit GPTQ support                                                                    2023-09-05 14:03:51 +02:00
turboderp       6d576b3e56  Reworking attention, allow for batched inference with independent cache per sequence     2023-09-03 15:56:38 +02:00
turboderp       4570f6ee17  Tidying up                                                                                2023-09-02 16:40:57 +02:00
turboderp       bb83469574  Initial commit                                                                            2023-08-30 11:05:23 +02:00