183 Commits

Author SHA1 Message Date
turboderp
9244003a40 Add support for Mistral 3.1 VLM 2025-04-18 22:47:47 +02:00
turboderp
de19cbcc59 Add GLM4 architecture 2025-04-15 18:57:29 +02:00
turboderp
17762c177f Merge remote-tracking branch 'origin/dev' into dev 2025-03-15 01:37:43 +01:00
turboderp
6f7623ff0e Update examples 2025-03-15 01:30:52 +01:00
Thomas
eaf8ad1041 Update chat.py, include multi-line input support and context clearing through input (#738)
* Update chat.py, include multi-line input support and context clearing

- Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows)
- Allow context clearing outside of amnesia by inputting "clear"

* Adding qwq chat mode, adding the ability to forget thinking context
2025-03-10 15:28:33 +01:00
turboderp
96b2f9df77 Add Qwen2.5 mode to grounding demo 2025-01-29 22:41:36 +01:00
turboderp
ae241a9af5 Fix video example 2024-12-30 02:24:49 +01:00
turboderp
ab4d9e15eb Chat example Granite3 template 2024-12-27 18:32:46 +01:00
turboderp
cf7fcd18d2 Fix chat example system prompt 2024-12-18 07:52:09 +01:00
turboderp
f76bc8537a Read number of vision tower layers from config for Pixtral (fix Pixtral-Large) 2024-12-18 01:29:20 +01:00
turboderp
4061c24373 Qwen2-VL: Basic video support 2024-12-15 23:32:41 +01:00
turboderp
c78d9027aa Fix ChatML template in multimodal example 2024-12-15 21:38:40 +01:00
turboderp
9934f06442 Refactoring 2024-12-15 21:37:29 +01:00
turboderp
fa7e89c197 Update example 2024-12-01 14:20:33 +01:00
turboderp
48e6306193 Update chat example, prompt formats 2024-11-30 13:31:35 +01:00
turboderp
1f685bd8d3 Update grounding demo 2024-11-23 14:46:51 +01:00
turboderp
638cf3015f Add Qwen2-VL grounding demo 2024-11-23 12:19:02 +01:00
turboderp
c16aa9b3eb Update multimodal example 2024-11-18 07:51:41 +01:00
turboderp
c81603c441 Update multimodal example 2024-11-18 07:16:48 +01:00
turboderp
2a888dbd47 Pixtral example 2024-11-12 03:46:29 +01:00
turboderp
d37cf7e764 Fix regressions 2024-11-10 13:38:21 +01:00
turboderp
a40c07a333 Update Formatron example (supports conlist since 0.4.6) 2024-10-15 19:28:00 +02:00
turboderp
7bacab2a55 Rename JSON example 2024-10-02 23:59:53 +02:00
turboderp
b651f4abab Add XTC and DRY options to chatbot example. 2024-10-02 00:01:49 +02:00
turboderp
2616fd74d0 Add Formatron example 2024-09-30 00:41:51 +02:00
Sinan
7c7b1993b4 Added draft token count as parameter to chat.py (#635) 2024-09-24 11:16:30 +02:00
turboderp
10a8842b25 Fix JSON inference example 2024-09-14 21:35:02 +02:00
turboderp
555c360798 Update TP example 2024-08-22 13:43:19 +02:00
turboderp
4117daa546 Cleanup 2024-08-22 12:49:37 +02:00
turboderp
9917403229 Bulk example: Compute immediate output tokens/second 2024-08-22 12:46:58 +02:00
turboderp
8477da8f8c Chatbot: Load draft model first 2024-08-15 18:42:37 +02:00
turboderp
b30f796690 TP mode for attn layer, non-paged 2024-08-14 23:41:10 +02:00
turboderp
a8d4c6c42d Add TP example (draft) 2024-08-07 15:09:16 +02:00
turboderp
587b7410be Update examples 2024-07-09 07:33:19 +02:00
turboderp
c8e5cedfb3 Example Gemma template 2024-07-04 05:25:12 +02:00
turboderp
e56cfe2219 Chatbot: fix chatml template 2024-07-03 22:34:46 +02:00
turboderp
95e093a2b2 Chatbot: Ignore undefined special tokens 2024-07-03 22:34:34 +02:00
turboderp
8c2132453c More debug output 2024-07-03 22:04:22 +02:00
turboderp
f01e0d0736 Update example 2024-06-17 01:10:50 +02:00
turboderp
a7a751d966 Add bulk inference example 2024-06-14 00:45:46 +02:00
turboderp
f3596fc0d9 Add Q6 cache mode 2024-06-09 01:23:50 +02:00
turboderp
f6abbba183 Add Q8 cache option to example chatbot 2024-06-08 22:40:12 +02:00
turboderp
127d4c70e5 Allow multiple valid prefixes in ExLlamaV2PrefixFilter 2024-06-03 19:16:59 +02:00
turboderp
e2e3535a9c Fix deprecated example 2024-06-02 12:53:50 +02:00
turboderp
475c5b5e89 Add granite prompt format to example utils 2024-05-30 20:09:04 +02:00
turboderp
fceb4fd13e Merge branch 'fork/xformer' into dev
# Conflicts:
#	exllamav2/attn.py
#	exllamav2/model.py
2024-05-27 00:01:46 +02:00
turboderp
5ef9b13d88 Revert example 2024-05-26 14:11:49 +02:00
turboderp
e6f230bf06 Update README.md 2024-05-25 22:50:36 +02:00
turboderp
93d652ad3c Add close method to dynamic gen async wrapper 2024-05-25 16:43:22 +02:00
turboderp
4587220485 Dynamic gen: Fix partial page reuse for draft cache 2024-05-25 16:32:34 +02:00