| turboderp | c8e5cedfb3 | Example Gemma template | 2024-07-04 05:25:12 +02:00 |
| turboderp | e56cfe2219 | Chatbot: fix chatml template | 2024-07-03 22:34:46 +02:00 |
| turboderp | 95e093a2b2 | Chatbot: Ignore undefined special tokens | 2024-07-03 22:34:34 +02:00 |
| turboderp | 8c2132453c | More debug output | 2024-07-03 22:04:22 +02:00 |
| turboderp | f01e0d0736 | Update example | 2024-06-17 01:10:50 +02:00 |
| turboderp | a7a751d966 | Add bulk inference example | 2024-06-14 00:45:46 +02:00 |
| turboderp | f3596fc0d9 | Add Q6 cache mode | 2024-06-09 01:23:50 +02:00 |
| turboderp | f6abbba183 | Add Q8 cache option to example chatbot | 2024-06-08 22:40:12 +02:00 |
| turboderp | 127d4c70e5 | Allow multiple valid prefixes in ExLlamaV2PrefixFilter | 2024-06-03 19:16:59 +02:00 |
| turboderp | e2e3535a9c | Fix deprecated example | 2024-06-02 12:53:50 +02:00 |
| turboderp | 475c5b5e89 | Add granite prompt format to example utils | 2024-05-30 20:09:04 +02:00 |
| turboderp | fceb4fd13e | Merge branch 'fork/xformer' into dev. Conflicts: exllamav2/attn.py, exllamav2/model.py | 2024-05-27 00:01:46 +02:00 |
| turboderp | 5ef9b13d88 | Revert example | 2024-05-26 14:11:49 +02:00 |
| turboderp | e6f230bf06 | Update README.md | 2024-05-25 22:50:36 +02:00 |
| turboderp | 93d652ad3c | Add close method to dynamic gen async wrapper | 2024-05-25 16:43:22 +02:00 |
| turboderp | 4587220485 | Dynamic gen: Fix partial page reuse for draft cache | 2024-05-25 16:32:34 +02:00 |
| turboderp | 89b4af6a60 | Dynamic gen: Fix page caching logic | 2024-05-25 16:32:34 +02:00 |
| turboderp | 742c7e228d | Add banned strings option to dynamic gen demo | 2024-05-25 15:42:56 +02:00 |
| turboderp | b62007b90f | Add dedup example | 2024-05-25 03:08:40 +02:00 |
| turboderp | fdf111b8ac | Update examples | 2024-05-24 20:35:32 +02:00 |
| turboderp | 2ba9dbc737 | Update async example | 2024-05-23 20:30:54 +02:00 |
| turboderp | bf0557bf16 | Dynamic gen: Async wrapper and example | 2024-05-23 19:34:14 +02:00 |
| turboderp | a3d18564ff | Update examples | 2024-05-22 22:22:06 +02:00 |
| turboderp | 28743d3f3c | Dynamic gen: CFG API and example | 2024-05-20 14:19:58 +02:00 |
| turboderp | 5c326be913 | Dynamic gen: Partial page caching | 2024-05-19 18:31:31 +02:00 |
| turboderp | 9b3ee824c1 | Update LoRA example | 2024-05-19 12:33:19 +02:00 |
| turboderp | 8d6b6efb71 | Update JSON example | 2024-05-19 10:58:38 +02:00 |
| turboderp | 3ef5161021 | Update inference example | 2024-05-19 00:09:40 +02:00 |
| turboderp | b504cba5cd | Update dynamic gen example | 2024-05-18 15:58:55 +02:00 |
| turboderp | b1973ee508 | Dynamic gen: Update example | 2024-05-18 15:44:15 +02:00 |
| turboderp | cd5865ec07 | Add dynamic batching generator (unfinished) | 2024-05-18 06:39:55 +02:00 |
| turboderp | d0c31a509f | A little cleanup | 2024-05-18 06:39:26 +02:00 |
| turboderp | d87e81a0c7 | Remove state from ExLlamaV2Sampler.Settings, pass filter list to generator instead | 2024-05-15 22:43:33 +02:00 |
| turboderp | c771d6362c | Tidy up attn forward, remove multiple caches mode and example, prep for paged attn | 2024-05-14 22:40:14 +02:00 |
| laoda513 | 309d0e902e | if no flash_attn using xformer when available | 2024-05-14 12:08:57 +08:00 |
| turboderp | eaf542672a | Skip first forward pass when rewinding after banned string | 2024-05-11 14:31:45 +02:00 |
| turboderp | 3fe6ca8010 | Add C++ function for partial string matching in generator | 2024-05-11 01:43:44 +02:00 |
| turboderp | e134469b18 | Add banned strings example | 2024-05-10 02:58:30 +02:00 |
| turboderp | 1c2ed229f1 | Add Granite template to chatbot example | 2024-05-08 21:46:34 +02:00 |
| turboderp | 114dadb379 | Add Phi3 template to chatbot example | 2024-04-24 22:26:00 +02:00 |
| turboderp | a97716ad7b | Add Llama3 instruct template | 2024-04-18 19:26:34 +02:00 |
| turboderp | 2ab145a1a6 | Add Cohere format to example chatbot | 2024-04-06 08:00:38 +02:00 |
| turboderp | 55f1de45c8 | Some cleanup, commenting, type hinting and refactoring | 2024-03-23 16:16:32 +01:00 |
| turboderp | 48925b46ff | fixup example | 2024-03-20 07:59:00 +01:00 |
| turboderp | 7b721bfb8a | Reapply "bump to 0.0.16". This reverts commit 36cce99304. | 2024-03-20 07:55:57 +01:00 |
| turboderp | 36cce99304 | Revert "bump to 0.0.16". This reverts commit 5a3fcfa7ab. | 2024-03-20 07:55:43 +01:00 |
| turboderp | 5a3fcfa7ab | bump to 0.0.16 | 2024-03-20 07:54:50 +01:00 |
| Thanasis Galianis | 6d7d39b155 | Update chat.py | 2024-03-18 22:38:20 +02:00 |
| Thanasis Galianis | 4ca5ca35a6 | Added --gpu_split explanation on examples/chat.py | 2024-03-18 21:35:01 +02:00 |
| turboderp | 65ed844060 | Option to limit scratch space for output layer | 2024-03-16 13:35:21 +01:00 |