Commit Graph

1459 Commits

Author  SHA1  Message  Date

Llama Enjoyer  e960dfd68d  Fix the temperature argument to accept values passed via the cmd line.  2024-09-24 17:18:08 +02:00
Sinan  7c7b1993b4  Added draft token count as parameter to chat.py (#635)  2024-09-24 11:16:30 +02:00
turboderp  8361f3f4a0  Add missing cp310+cu118 torch 2.4 windows wheel  2024-09-23 17:38:53 +02:00
turboderp  15e54046ba  More stream gymnastics  2024-09-23 17:28:55 +02:00
turboderp  a5132d072e  Add XTC sampler  2024-09-22 23:09:19 +02:00
turboderp  6d7b2e8e7a  Revert snapshot interval  2024-09-22 19:11:32 +02:00
turboderp  43a0be35df  Make measurement less sensitive to very sparse inf values in reference fwd pass  2024-09-22 19:01:29 +02:00
turboderp  a17f6665cb  Fix streams in quantizer  2024-09-22 18:15:05 +02:00
turboderp  9946f45f1c  Force tensor loading onto priority stream  2024-09-20 22:05:20 +02:00
turboderp  e155e0a5b0  Fix loading in new thread  2024-09-18 19:46:56 +02:00
turboderp  c4a03e09f5  Tokenizer: Give priority to tokenizer.json instead of tokenizer.model  2024-09-18 00:41:43 +02:00
turboderp  12bceb9f4b  Cleanup  2024-09-18 00:32:00 +02:00
turboderp  0695f3a854  Fix potential bug in filter evaluation  2024-09-17 00:34:28 +02:00
turboderp  8a25e0f2b3  Merge branch 'refs/heads/master' into dev  2024-09-17 00:33:36 +02:00
turboderp  b25210778c  Remove fasttensors, add platform-agnostic multithreaded ST loader  2024-09-17 00:33:16 +02:00
turboderp  144c576bdb  Fix bottlenecks in quantized tensor loading  2024-09-17 00:00:27 +02:00
turboderp  10a8842b25  Fix JSON inference example  2024-09-14 21:35:02 +02:00
turboderp  b2c7cf280c  Add cp310 cu121 torch2.4 Windows wheel  v0.2.2  2024-09-14 21:17:52 +02:00
turboderp  46eff43403  Merge branch 'refs/heads/dev'  2024-09-14 21:13:46 +02:00
turboderp  228ba34cec  Bump to 0.2.2  2024-09-14 21:13:22 +02:00
turboderp  a372fe1241  Skip superfluous set creation when possible, even if multiple filters used  2024-09-14 19:42:33 +02:00
turboderp  aadc454183  Fix sampling using multiple filters  2024-09-14 19:36:53 +02:00
turboderp  5ee983593b  Fix potential race condition with multithreaded sampling and lazy tokenizer initialization  2024-09-14 19:28:02 +02:00
turboderp  1df7b04821  Allow non-causal attn with SDPA  2024-09-13 02:20:24 +02:00
turboderp  1e18e803da  Merge branch 'refs/heads/dev'  v0.2.1  2024-09-08 19:23:19 +02:00
turboderp  0d9adf96e8  Bump to 0.2.1  2024-09-08 19:22:43 +02:00
turboderp  f0dca9a862  Bit of cleanup  2024-09-08 16:31:47 +02:00
turboderp  f2c53efd34  Remove (experimental) Q-cache calibration feature  2024-09-08 15:13:00 +02:00
turboderp  a029bcd76e  Bit of cleanup  2024-09-08 15:11:34 +02:00
AlpinDale  361d2119b3  fix: support for Rank Stabilized LoRA (RSLoRA) (#619)  2024-09-08 01:00:26 +02:00
    If an adapter is trained with RSLoRA, the alpha value is multiplied by `sqrt(rank)`.
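The RSLoRA fix above (#619) amounts to a one-line change in how the adapter's scaling factor is computed. A minimal sketch of the idea, with a hypothetical `lora_scaling` helper (not the repository's actual function):

```python
import math

def lora_scaling(alpha: float, rank: int, rslora: bool = False) -> float:
    """Effective scaling applied to a LoRA adapter's delta weights.

    Standard LoRA scales by alpha / rank. For an adapter trained with
    Rank-Stabilized LoRA (RSLoRA), alpha is first multiplied by
    sqrt(rank), giving an effective scale of alpha / sqrt(rank).
    """
    if rslora:
        alpha *= math.sqrt(rank)
    return alpha / rank

# With alpha=16, rank=64: plain LoRA scales by 0.25, RSLoRA by 2.0.
```

Loading an RSLoRA-trained adapter with the plain formula silently shrinks its effect by a factor of sqrt(rank), which is why detecting the flag matters.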
turboderp  c1fed2ed19  Add DRY range parameter  2024-09-07 16:06:46 +02:00
turboderp  3e8e181717  Add DRY (still needs testing)  2024-09-07 02:16:16 +02:00
turboderp  affdc0d16c  Ensure streams are always set during the forward pass for the active thread  2024-09-05 20:28:09 +02:00
turboderp  5c455c18a0  Filter base: Fix instance init  2024-09-05 20:25:51 +02:00
turboderp  c9ce168ce0  Merge remote-tracking branch 'origin/dev' into dev  2024-09-04 23:23:39 +02:00
turboderp  1e462f1f7f  Force loading tensors on default stream  2024-09-04 23:04:22 +02:00
Brian Dashore  c18400fa29  Issues: Add issue templates (#615)  2024-09-04 14:58:58 +02:00
    Should help encourage well-formatted issues and make development easier.
    Signed-off-by: kingbri <bdashore3@proton.me>
turboderp  0d5c0bcc8d  Asynchronous filter evaluation  2024-08-31 16:31:51 +02:00
turboderp  12f08dbbd8  Accept Sequence return type from ExLlamaV2Filter.next()  2024-08-31 12:06:21 +02:00
turboderp  ea27954767  TP: Add (slower) SDPA fallback mode when flash-attn is unavailable  2024-08-29 22:39:45 +02:00
turboderp  40e37f4944  Merge branch 'refs/heads/dev'  v0.2.0  2024-08-28 22:56:08 +02:00
turboderp  c050aec764  Bump to 0.2.0  2024-08-28 22:56:01 +02:00
turboderp  1a82283df3  Merge branch 'refs/heads/dev'  2024-08-28 22:55:11 +02:00
turboderp  f1d8909809  Catch all exceptions for nvidia-smi and rocm-smi  2024-08-28 20:25:56 +02:00
turboderp  db14154fee  Don't use default stream for logit padding mask after all  2024-08-27 22:06:19 +02:00
turboderp  8d3d4c227e  Ensure logit padding happens on default stream  2024-08-27 21:48:30 +02:00
turboderp  d9f0ecc12c  TP: Fix vocab split for models with odd vocab sizes  2024-08-27 21:47:49 +02:00
turboderp  7319b6ea31  Fix graph update for MLP with post layernorm  2024-08-27 20:07:21 +02:00
turboderp  4230dab3c1  Fix model_init feedback  2024-08-27 19:16:23 +02:00
turboderp  69291d1333  Fix another possible sync issue with fasttensors (for Windows)  2024-08-25 22:01:53 +02:00