Commit Graph

1444 Commits

Author SHA1 Message Date
turboderp
144c576bdb Fix bottlenecks in quantized tensor loading 2024-09-17 00:00:27 +02:00
turboderp
10a8842b25 Fix JSON inference example 2024-09-14 21:35:02 +02:00
turboderp
b2c7cf280c Add cp310 cu121 torch2.4 Windows wheel (tag: v0.2.2) 2024-09-14 21:17:52 +02:00
turboderp
46eff43403 Merge branch 'refs/heads/dev' 2024-09-14 21:13:46 +02:00
turboderp
228ba34cec Bump to 0.2.2 2024-09-14 21:13:22 +02:00
turboderp
a372fe1241 Skip superfluous set creation when possible, even if multiple filters used 2024-09-14 19:42:33 +02:00
turboderp
aadc454183 Fix sampling using multiple filters 2024-09-14 19:36:53 +02:00
turboderp
5ee983593b Fix potential race condition with multithreaded sampling and lazy tokenizer initialization 2024-09-14 19:28:02 +02:00
turboderp
1df7b04821 Allow non-causal attn with SDPA 2024-09-13 02:20:24 +02:00
turboderp
1e18e803da Merge branch 'refs/heads/dev' (tag: v0.2.1) 2024-09-08 19:23:19 +02:00
turboderp
0d9adf96e8 Bump to 0.2.1 2024-09-08 19:22:43 +02:00
turboderp
f0dca9a862 Bit of cleanup 2024-09-08 16:31:47 +02:00
turboderp
f2c53efd34 Remove (experimental) Q-cache calibration feature 2024-09-08 15:13:00 +02:00
turboderp
a029bcd76e Bit of cleanup 2024-09-08 15:11:34 +02:00
AlpinDale
361d2119b3 fix: support for Rank Stabilized LoRA (RSLoRA) (#619)
If an adapter is trained with RSLoRA, the alpha value is multiplied by `sqrt(rank)`.
2024-09-08 01:00:26 +02:00
turboderp
c1fed2ed19 Add DRY range parameter 2024-09-07 16:06:46 +02:00
turboderp
3e8e181717 Add DRY (still needs testing) 2024-09-07 02:16:16 +02:00
turboderp
affdc0d16c Ensure streams are always set during the forward pass for the active thread 2024-09-05 20:28:09 +02:00
turboderp
5c455c18a0 Filter base: Fix instance init 2024-09-05 20:25:51 +02:00
turboderp
c9ce168ce0 Merge remote-tracking branch 'origin/dev' into dev 2024-09-04 23:23:39 +02:00
turboderp
1e462f1f7f Force loading tensors on default stream 2024-09-04 23:04:22 +02:00
Brian Dashore
c18400fa29 Issues: Add issue templates (#615)
Should help encourage well-formatted issues and make development easier.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 14:58:58 +02:00
turboderp
0d5c0bcc8d Asynchronous filter evaluation 2024-08-31 16:31:51 +02:00
turboderp
12f08dbbd8 Accept Sequence return type from ExLlamaV2Filter.next() 2024-08-31 12:06:21 +02:00
turboderp
ea27954767 TP: Add (slower) SDPA fallback mode when flash-attn is unavailable 2024-08-29 22:39:45 +02:00
turboderp
40e37f4944 Merge branch 'refs/heads/dev' (tag: v0.2.0) 2024-08-28 22:56:08 +02:00
turboderp
c050aec764 Bump to 0.2.0 2024-08-28 22:56:01 +02:00
turboderp
1a82283df3 Merge branch 'refs/heads/dev' 2024-08-28 22:55:11 +02:00
turboderp
f1d8909809 Catch all exceptions for nvidia-smi and rocm-smi 2024-08-28 20:25:56 +02:00
turboderp
db14154fee Don't use default stream for logit padding mask after all 2024-08-27 22:06:19 +02:00
turboderp
8d3d4c227e Ensure logit padding happens on default stream 2024-08-27 21:48:30 +02:00
turboderp
d9f0ecc12c TP: Fix vocab split for models with odd vocab sizes 2024-08-27 21:47:49 +02:00
turboderp
7319b6ea31 Fix graph update for MLP with post layernorm 2024-08-27 20:07:21 +02:00
turboderp
4230dab3c1 Fix model_init feedback 2024-08-27 19:16:23 +02:00
turboderp
69291d1333 Fix another possible sync issue with fasttensors (for Windows) 2024-08-25 22:01:53 +02:00
turboderp
e539f7cc28 Fix another possible sync issue with fasttensors 2024-08-25 21:56:02 +02:00
turboderp
d3fe9f25d2 Unmap tensors on CPU to reduce temp VRAM overhead while loading 2024-08-25 21:15:28 +02:00
turboderp
7e15947aa1 Fix possible sync issues with fasttensors 2024-08-25 14:51:46 +02:00
turboderp
57ee846672 Fix ROCm compile 2024-08-22 14:04:20 +02:00
turboderp
b926ed4c7e Merge remote-tracking branch 'origin/master' (tag: v0.1.9) 2024-08-22 13:50:47 +02:00
turboderp
42e1baeef0 Bump to 0.1.9 2024-08-22 13:48:36 +02:00
turboderp
d58d5ab723 Add some docstrings 2024-08-22 13:43:19 +02:00
turboderp
555c360798 Update TP example 2024-08-22 13:43:19 +02:00
turboderp
4117daa546 Cleanup 2024-08-22 12:49:37 +02:00
turboderp
9917403229 Bulk example: Compute immediate output tokens/second 2024-08-22 12:46:58 +02:00
turboderp
547135cc43 Skip sampling if only one token allowed by filters 2024-08-22 12:33:19 +02:00
turboderp
0978ba5f86 Optimize logit filtering in sampler 2024-08-22 11:17:45 +02:00
Michael Panchenko
f1d79c96d2 Fix in ext.py (index error on system without GPU) (#594)
* Update README.md

* Fix in ext.py (index error on system without GPU)

---------

Co-authored-by: turboderp <11859846+turboderp@users.noreply.github.com>
2024-08-20 17:18:50 +02:00
turboderp
e7053198ba Enable defrag for paged TP cache 2024-08-20 17:10:51 +02:00
turboderp
e89dc5b762 Fix for older Torch versions 2024-08-20 14:08:35 +02:00
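
Note on commit 361d2119b3 (RSLoRA): the commit body says that for an adapter trained with RSLoRA, the alpha value is multiplied by `sqrt(rank)`. The sketch below illustrates what that rule implies, assuming the conventional LoRA scaling of `alpha / rank` (an assumption, not stated in the log). The function name `lora_scale` and the `use_rslora` flag are hypothetical and are not taken from the project's code.

```python
import math


def lora_scale(alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Return the factor applied to a LoRA low-rank update.

    Hypothetical helper for illustration only. Assumes the usual
    LoRA convention scale = alpha / rank.
    """
    if use_rslora:
        # Rule from the commit message: alpha is multiplied by sqrt(rank).
        alpha = alpha * math.sqrt(rank)
    # Assumed standard LoRA scaling.
    return alpha / rank


if __name__ == "__main__":
    print(lora_scale(16, 64))                   # plain LoRA
    print(lora_scale(16, 64, use_rslora=True))  # rank-stabilized LoRA
```

Under that assumption, the adjustment is equivalent to scaling by `alpha / sqrt(rank)`, so the effective scale shrinks more slowly as the rank grows.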