149 Commits

Author SHA1 Message Date
turboderp
c8e5cedfb3 Example Gemma template 2024-07-04 05:25:12 +02:00
turboderp
e56cfe2219 Chatbot: fix chatml template 2024-07-03 22:34:46 +02:00
turboderp
95e093a2b2 Chatbot: Ignore undefined special tokens 2024-07-03 22:34:34 +02:00
turboderp
8c2132453c More debug output 2024-07-03 22:04:22 +02:00
turboderp
f01e0d0736 Update example 2024-06-17 01:10:50 +02:00
turboderp
a7a751d966 Add bulk inference example 2024-06-14 00:45:46 +02:00
turboderp
f3596fc0d9 Add Q6 cache mode 2024-06-09 01:23:50 +02:00
turboderp
f6abbba183 Add Q8 cache option to example chatbot 2024-06-08 22:40:12 +02:00
turboderp
127d4c70e5 Allow multiple valid prefixes in ExLlamaV2PrefixFilter 2024-06-03 19:16:59 +02:00
turboderp
e2e3535a9c Fix deprecated example 2024-06-02 12:53:50 +02:00
turboderp
475c5b5e89 Add granite prompt format to example utils 2024-05-30 20:09:04 +02:00
turboderp
fceb4fd13e Merge branch 'fork/xformer' into dev
# Conflicts:
#	exllamav2/attn.py
#	exllamav2/model.py
2024-05-27 00:01:46 +02:00
turboderp
5ef9b13d88 Revert example 2024-05-26 14:11:49 +02:00
turboderp
e6f230bf06 Update README.md 2024-05-25 22:50:36 +02:00
turboderp
93d652ad3c Add close method to dynamic gen async wrapper 2024-05-25 16:43:22 +02:00
turboderp
4587220485 Dynamic gen: Fix partial page reuse for draft cache 2024-05-25 16:32:34 +02:00
turboderp
89b4af6a60 Dynamic gen: Fix page caching logic 2024-05-25 16:32:34 +02:00
turboderp
742c7e228d Add banned strings option to dynamic gen demo 2024-05-25 15:42:56 +02:00
turboderp
b62007b90f Add dedup example 2024-05-25 03:08:40 +02:00
turboderp
fdf111b8ac Update examples 2024-05-24 20:35:32 +02:00
turboderp
2ba9dbc737 Update async example 2024-05-23 20:30:54 +02:00
turboderp
bf0557bf16 Dynamic gen: Async wrapper and example 2024-05-23 19:34:14 +02:00
turboderp
a3d18564ff Update examples 2024-05-22 22:22:06 +02:00
turboderp
28743d3f3c Dynamic gen: CFG API and example 2024-05-20 14:19:58 +02:00
turboderp
5c326be913 Dynamic gen: Partial page caching 2024-05-19 18:31:31 +02:00
turboderp
9b3ee824c1 Update LoRA example 2024-05-19 12:33:19 +02:00
turboderp
8d6b6efb71 Update JSON example 2024-05-19 10:58:38 +02:00
turboderp
3ef5161021 Update inference example 2024-05-19 00:09:40 +02:00
turboderp
b504cba5cd Update dynamic gen example 2024-05-18 15:58:55 +02:00
turboderp
b1973ee508 Dynamic gen: Update example 2024-05-18 15:44:15 +02:00
turboderp
cd5865ec07 Add dynamic batching generator (unfinished) 2024-05-18 06:39:55 +02:00
turboderp
d0c31a509f A little cleanup 2024-05-18 06:39:26 +02:00
turboderp
d87e81a0c7 Remove state from ExLlamaV2Sampler.Settings, pass filter list to generator instead 2024-05-15 22:43:33 +02:00
turboderp
c771d6362c Tidy up attn forward, remove multiple caches mode and example, prep for paged attn 2024-05-14 22:40:14 +02:00
laoda513
309d0e902e if no flash_attn using xformer when available 2024-05-14 12:08:57 +08:00
turboderp
eaf542672a Skip first forward pass when rewinding after banned string 2024-05-11 14:31:45 +02:00
turboderp
3fe6ca8010 Add C++ function for partial string matching in generator 2024-05-11 01:43:44 +02:00
turboderp
e134469b18 Add banned strings example 2024-05-10 02:58:30 +02:00
turboderp
1c2ed229f1 Add Granite template to chatbot example 2024-05-08 21:46:34 +02:00
turboderp
114dadb379 Add Phi3 template to chatbot example 2024-04-24 22:26:00 +02:00
turboderp
a97716ad7b Add Llama3 instruct template 2024-04-18 19:26:34 +02:00
turboderp
2ab145a1a6 Add Cohere format to example chatbot 2024-04-06 08:00:38 +02:00
turboderp
55f1de45c8 Some cleanup, commenting, type hinting and refactoring 2024-03-23 16:16:32 +01:00
turboderp
48925b46ff fixup example 2024-03-20 07:59:00 +01:00
turboderp
7b721bfb8a Reapply "bump to 0.0.16"
This reverts commit 36cce99304.
2024-03-20 07:55:57 +01:00
turboderp
36cce99304 Revert "bump to 0.0.16"
This reverts commit 5a3fcfa7ab.
2024-03-20 07:55:43 +01:00
turboderp
5a3fcfa7ab bump to 0.0.16 2024-03-20 07:54:50 +01:00
Thanasis Galianis
6d7d39b155 Update chat.py 2024-03-18 22:38:20 +02:00
Thanasis Galianis
4ca5ca35a6 Added --gpu_split explanation on examples/chat.py 2024-03-18 21:35:01 +02:00
turboderp
65ed844060 Option to limit scratch space for output layer 2024-03-16 13:35:21 +01:00