turboderp
9244003a40
Add support for Mistral 3.1 VLM
2025-04-18 22:47:47 +02:00
turboderp
de19cbcc59
Add GLM4 architecture
2025-04-15 18:57:29 +02:00
turboderp
17762c177f
Merge remote-tracking branch 'origin/dev' into dev
2025-03-15 01:37:43 +01:00
turboderp
6f7623ff0e
Update examples
2025-03-15 01:30:52 +01:00
Thomas
eaf8ad1041
Update chat.py, include multi-line input support and context clearing through input (#738)
* Update chat.py, include multi-line input support and context clearing
- Enable multi-line input (mli) support through the -mli argument. When using mli, end input with the EOF char (return/Ctrl+D on Unix, return/Ctrl+Z/return on Windows)
- Allow context clearing outside of amnesia by inputting "clear"
* Adding qwq chat mode, adding the ability to forget thinking context
2025-03-10 15:28:33 +01:00
turboderp
96b2f9df77
Add Qwen2.5 mode to grounding demo
2025-01-29 22:41:36 +01:00
turboderp
ae241a9af5
Fix video example
2024-12-30 02:24:49 +01:00
turboderp
ab4d9e15eb
Chat example Granite3 template
2024-12-27 18:32:46 +01:00
turboderp
cf7fcd18d2
Fix chat example system prompt
2024-12-18 07:52:09 +01:00
turboderp
f76bc8537a
Read number of vision tower layers from config for Pixtral (fix Pixtral-Large)
2024-12-18 01:29:20 +01:00
turboderp
4061c24373
Qwen2-VL: Basic video support
2024-12-15 23:32:41 +01:00
turboderp
c78d9027aa
Fix ChatML template in multimodal example
2024-12-15 21:38:40 +01:00
turboderp
9934f06442
Refactoring
2024-12-15 21:37:29 +01:00
turboderp
fa7e89c197
Update example
2024-12-01 14:20:33 +01:00
turboderp
48e6306193
Update chat example, prompt formats
2024-11-30 13:31:35 +01:00
turboderp
1f685bd8d3
Update grounding demo
2024-11-23 14:46:51 +01:00
turboderp
638cf3015f
Add Qwen2-VL grounding demo
2024-11-23 12:19:02 +01:00
turboderp
c16aa9b3eb
Update multimodal example
2024-11-18 07:51:41 +01:00
turboderp
c81603c441
Update multimodal example
2024-11-18 07:16:48 +01:00
turboderp
2a888dbd47
Pixtral example
2024-11-12 03:46:29 +01:00
turboderp
d37cf7e764
Fix regressions
2024-11-10 13:38:21 +01:00
turboderp
a40c07a333
Update Formatron example (supports conlist since 0.4.6)
2024-10-15 19:28:00 +02:00
turboderp
7bacab2a55
Rename JSON example
2024-10-02 23:59:53 +02:00
turboderp
b651f4abab
Add XTC and DRY options to chatbot example.
2024-10-02 00:01:49 +02:00
turboderp
2616fd74d0
Add Formatron example
2024-09-30 00:41:51 +02:00
Sinan
7c7b1993b4
Added draft token count as parameter to chat.py (#635)
2024-09-24 11:16:30 +02:00
turboderp
10a8842b25
Fix JSON inference example
2024-09-14 21:35:02 +02:00
turboderp
555c360798
Update TP example
2024-08-22 13:43:19 +02:00
turboderp
4117daa546
Cleanup
2024-08-22 12:49:37 +02:00
turboderp
9917403229
Bulk example: Compute immediate output tokens/second
2024-08-22 12:46:58 +02:00
turboderp
8477da8f8c
Chatbot: Load draft model first
2024-08-15 18:42:37 +02:00
turboderp
b30f796690
TP mode for attn layer, non-paged
2024-08-14 23:41:10 +02:00
turboderp
a8d4c6c42d
Add TP example (draft)
2024-08-07 15:09:16 +02:00
turboderp
587b7410be
Update examples
2024-07-09 07:33:19 +02:00
turboderp
c8e5cedfb3
Example Gemma template
2024-07-04 05:25:12 +02:00
turboderp
e56cfe2219
Chatbot: fix chatml template
2024-07-03 22:34:46 +02:00
turboderp
95e093a2b2
Chatbot: Ignore undefined special tokens
2024-07-03 22:34:34 +02:00
turboderp
8c2132453c
More debug output
2024-07-03 22:04:22 +02:00
turboderp
f01e0d0736
Update example
2024-06-17 01:10:50 +02:00
turboderp
a7a751d966
Add bulk inference example
2024-06-14 00:45:46 +02:00
turboderp
f3596fc0d9
Add Q6 cache mode
2024-06-09 01:23:50 +02:00
turboderp
f6abbba183
Add Q8 cache option to example chatbot
2024-06-08 22:40:12 +02:00
turboderp
127d4c70e5
Allow multiple valid prefixes in ExLlamaV2PrefixFilter
2024-06-03 19:16:59 +02:00
turboderp
e2e3535a9c
Fix deprecated example
2024-06-02 12:53:50 +02:00
turboderp
475c5b5e89
Add granite prompt format to example utils
2024-05-30 20:09:04 +02:00
turboderp
fceb4fd13e
Merge branch 'fork/xformer' into dev
# Conflicts:
# exllamav2/attn.py
# exllamav2/model.py
2024-05-27 00:01:46 +02:00
turboderp
5ef9b13d88
Revert example
2024-05-26 14:11:49 +02:00
turboderp
e6f230bf06
Update README.md
2024-05-25 22:50:36 +02:00
turboderp
93d652ad3c
Add close method to dynamic gen async wrapper
2024-05-25 16:43:22 +02:00
turboderp
4587220485
Dynamic gen: Fix partial page reuse for draft cache
2024-05-25 16:32:34 +02:00