exllamav2

mirror of https://github.com/turboderp-org/exllamav2.git synced 2026-04-20 14:29:28 +00:00

Author	SHA1	Message	Date
turboderp	9244003a40	Add support for Mistral 3.1 VLM	2025-04-18 22:47:47 +02:00
turboderp	de19cbcc59	Add GLM4 architecture	2025-04-15 18:57:29 +02:00
turboderp	17762c177f	Merge remote-tracking branch 'origin/dev' into dev	2025-03-15 01:37:43 +01:00
turboderp	6f7623ff0e	Update examples	2025-03-15 01:30:52 +01:00
Thomas	eaf8ad1041	Update chat.py, include multi-line input support and context clearing through input (#738) * Update chat.py, include multi-line input support and context clearing - Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows) - Allow context clearing outside of amnesia by inputting "clear" * Adding qwq chat mode, adding the ability to forget thinking context	2025-03-10 15:28:33 +01:00
turboderp	96b2f9df77	Add Qwen2.5 mode to grounding demo	2025-01-29 22:41:36 +01:00
turboderp	ae241a9af5	Fix video example	2024-12-30 02:24:49 +01:00
turboderp	ab4d9e15eb	Chat example Granite3 template	2024-12-27 18:32:46 +01:00
turboderp	cf7fcd18d2	Fix chat example system prompt	2024-12-18 07:52:09 +01:00
turboderp	f76bc8537a	Read number of vision tower layers from config for Pixtral (fix Pixtral-Large)	2024-12-18 01:29:20 +01:00
turboderp	4061c24373	Qwen2-VL: Basic video support	2024-12-15 23:32:41 +01:00
turboderp	c78d9027aa	Fix ChatML template in multimodal example	2024-12-15 21:38:40 +01:00
turboderp	9934f06442	Refactoring	2024-12-15 21:37:29 +01:00
turboderp	fa7e89c197	Update example	2024-12-01 14:20:33 +01:00
turboderp	48e6306193	Update chat example, prompt formats	2024-11-30 13:31:35 +01:00
turboderp	1f685bd8d3	Update grounding demo	2024-11-23 14:46:51 +01:00
turboderp	638cf3015f	Add Qwen2-VL grounding demo	2024-11-23 12:19:02 +01:00
turboderp	c16aa9b3eb	Update multimodal example	2024-11-18 07:51:41 +01:00
turboderp	c81603c441	Update multimodal example	2024-11-18 07:16:48 +01:00
turboderp	2a888dbd47	Pixtral example	2024-11-12 03:46:29 +01:00
turboderp	d37cf7e764	Fix regressions	2024-11-10 13:38:21 +01:00
turboderp	a40c07a333	Update Formatron example (supports conlist since 0.4.6)	2024-10-15 19:28:00 +02:00
turboderp	7bacab2a55	Rename JSON example	2024-10-02 23:59:53 +02:00
turboderp	b651f4abab	Add XTC and DRY options to chatbot example.	2024-10-02 00:01:49 +02:00
turboderp	2616fd74d0	Add Formatron example	2024-09-30 00:41:51 +02:00
Sinan	7c7b1993b4	Added draft token count as parameter to chat.py (#635)	2024-09-24 11:16:30 +02:00
turboderp	10a8842b25	Fix JSON inference example	2024-09-14 21:35:02 +02:00
turboderp	555c360798	Update TP example	2024-08-22 13:43:19 +02:00
turboderp	4117daa546	Cleanup	2024-08-22 12:49:37 +02:00
turboderp	9917403229	Bulk example: Compute immediate output tokens/second	2024-08-22 12:46:58 +02:00
turboderp	8477da8f8c	Chatbot: Load draft model first	2024-08-15 18:42:37 +02:00
turboderp	b30f796690	TP mode for attn layer, non-paged	2024-08-14 23:41:10 +02:00
turboderp	a8d4c6c42d	Add TP example (draft)	2024-08-07 15:09:16 +02:00
turboderp	587b7410be	Update examples	2024-07-09 07:33:19 +02:00
turboderp	c8e5cedfb3	Example Gemma template	2024-07-04 05:25:12 +02:00
turboderp	e56cfe2219	Chatbot: fix chatml template	2024-07-03 22:34:46 +02:00
turboderp	95e093a2b2	Chatbot: Ignore undefined special tokens	2024-07-03 22:34:34 +02:00
turboderp	8c2132453c	More debug output	2024-07-03 22:04:22 +02:00
turboderp	f01e0d0736	Update example	2024-06-17 01:10:50 +02:00
turboderp	a7a751d966	Add bulk inference example	2024-06-14 00:45:46 +02:00
turboderp	f3596fc0d9	Add Q6 cache mode	2024-06-09 01:23:50 +02:00
turboderp	f6abbba183	Add Q8 cache option to example chatbot	2024-06-08 22:40:12 +02:00
turboderp	127d4c70e5	Allow multiple valid prefixes in ExLlamaV2PrefixFilter	2024-06-03 19:16:59 +02:00
turboderp	e2e3535a9c	Fix deprecated example	2024-06-02 12:53:50 +02:00
turboderp	475c5b5e89	Add granite prompt format to example utils	2024-05-30 20:09:04 +02:00
turboderp	fceb4fd13e	Merge branch 'fork/xformer' into dev # Conflicts: # exllamav2/attn.py # exllamav2/model.py	2024-05-27 00:01:46 +02:00
turboderp	5ef9b13d88	Revert example	2024-05-26 14:11:49 +02:00
turboderp	e6f230bf06	Update README.md	2024-05-25 22:50:36 +02:00
turboderp	93d652ad3c	Add close method to dynamic gen async wrapper	2024-05-25 16:43:22 +02:00
turboderp	4587220485	Dynamic gen: Fix partial page reuse for draft cache	2024-05-25 16:32:34 +02:00

183 Commits