turboderp
|
144c576bdb
|
Fix bottlenecks in quantized tensor loading
|
2024-09-17 00:00:27 +02:00 |
|
turboderp
|
10a8842b25
|
Fix JSON inference example
|
2024-09-14 21:35:02 +02:00 |
|
turboderp
|
b2c7cf280c
|
Add cp310 cu121 torch2.4 Windows wheel
v0.2.2
|
2024-09-14 21:17:52 +02:00 |
|
turboderp
|
46eff43403
|
Merge branch 'refs/heads/dev'
|
2024-09-14 21:13:46 +02:00 |
|
turboderp
|
228ba34cec
|
Bump to 0.2.2
|
2024-09-14 21:13:22 +02:00 |
|
turboderp
|
a372fe1241
|
Skip superfluous set creation when possible, even if multiple filters used
|
2024-09-14 19:42:33 +02:00 |
|
turboderp
|
aadc454183
|
Fix sampling using multiple filters
|
2024-09-14 19:36:53 +02:00 |
|
turboderp
|
5ee983593b
|
Fix potential race condition with multithreaded sampling and lazy tokenizer initialization
|
2024-09-14 19:28:02 +02:00 |
|
turboderp
|
1df7b04821
|
Allow non-causal attn with SDPA
|
2024-09-13 02:20:24 +02:00 |
|
turboderp
|
1e18e803da
|
Merge branch 'refs/heads/dev'
v0.2.1
|
2024-09-08 19:23:19 +02:00 |
|
turboderp
|
0d9adf96e8
|
Bump to 0.2.1
|
2024-09-08 19:22:43 +02:00 |
|
turboderp
|
f0dca9a862
|
Bit of cleanup
|
2024-09-08 16:31:47 +02:00 |
|
turboderp
|
f2c53efd34
|
Remove (experimental) Q-cache calibration feature
|
2024-09-08 15:13:00 +02:00 |
|
turboderp
|
a029bcd76e
|
Bit of cleanup
|
2024-09-08 15:11:34 +02:00 |
|
AlpinDale
|
361d2119b3
|
fix: support for Rank Stabilized LoRA (RSLoRA) (#619)
If an adapter is trained with RSLoRA, the alpha value is multiplied by `sqrt(rank)`.
|
2024-09-08 01:00:26 +02:00 |
|
turboderp
|
c1fed2ed19
|
Add DRY range parameter
|
2024-09-07 16:06:46 +02:00 |
|
turboderp
|
3e8e181717
|
Add DRY (still needs testing)
|
2024-09-07 02:16:16 +02:00 |
|
turboderp
|
affdc0d16c
|
Ensure streams are always set during the forward pass for the active thread
|
2024-09-05 20:28:09 +02:00 |
|
turboderp
|
5c455c18a0
|
Filter base: Fix instance init
|
2024-09-05 20:25:51 +02:00 |
|
turboderp
|
c9ce168ce0
|
Merge remote-tracking branch 'origin/dev' into dev
|
2024-09-04 23:23:39 +02:00 |
|
turboderp
|
1e462f1f7f
|
Force loading tensors on default stream
|
2024-09-04 23:04:22 +02:00 |
|
Brian Dashore
|
c18400fa29
|
Issues: Add issue templates (#615)
Should help encourage well-formatted issues and make development easier.
Signed-off-by: kingbri <bdashore3@proton.me>
|
2024-09-04 14:58:58 +02:00 |
|
turboderp
|
0d5c0bcc8d
|
Asynchronous filter evaluation
|
2024-08-31 16:31:51 +02:00 |
|
turboderp
|
12f08dbbd8
|
Accept Sequence return type from ExLlamaV2Filter.next()
|
2024-08-31 12:06:21 +02:00 |
|
turboderp
|
ea27954767
|
TP: Add (slower) SDPA fallback mode when flash-attn is unavailable
|
2024-08-29 22:39:45 +02:00 |
|
turboderp
|
40e37f4944
|
Merge branch 'refs/heads/dev'
v0.2.0
|
2024-08-28 22:56:08 +02:00 |
|
turboderp
|
c050aec764
|
Bump to 0.2.0
|
2024-08-28 22:56:01 +02:00 |
|
turboderp
|
1a82283df3
|
Merge branch 'refs/heads/dev'
|
2024-08-28 22:55:11 +02:00 |
|
turboderp
|
f1d8909809
|
Catch all exceptions for nvidia-smi and rocm-smi
|
2024-08-28 20:25:56 +02:00 |
|
turboderp
|
db14154fee
|
Don't use default stream for logit padding mask after all
|
2024-08-27 22:06:19 +02:00 |
|
turboderp
|
8d3d4c227e
|
Ensure logit padding happens on default stream
|
2024-08-27 21:48:30 +02:00 |
|
turboderp
|
d9f0ecc12c
|
TP: Fix vocab split for models with odd vocab sizes
|
2024-08-27 21:47:49 +02:00 |
|
turboderp
|
7319b6ea31
|
Fix graph update for MLP with post layernorm
|
2024-08-27 20:07:21 +02:00 |
|
turboderp
|
4230dab3c1
|
Fix model_init feedback
|
2024-08-27 19:16:23 +02:00 |
|
turboderp
|
69291d1333
|
Fix another possible sync issue with fasttensors (for Windows)
|
2024-08-25 22:01:53 +02:00 |
|
turboderp
|
e539f7cc28
|
Fix another possible sync issue with fasttensors
|
2024-08-25 21:56:02 +02:00 |
|
turboderp
|
d3fe9f25d2
|
Unmap tensors on CPU to reduce temp VRAM overhead while loading
|
2024-08-25 21:15:28 +02:00 |
|
turboderp
|
7e15947aa1
|
Fix possible sync issues with fasttensors
|
2024-08-25 14:51:46 +02:00 |
|
turboderp
|
57ee846672
|
Fix ROCm compile
|
2024-08-22 14:04:20 +02:00 |
|
turboderp
|
b926ed4c7e
|
Merge remote-tracking branch 'origin/master'
v0.1.9
|
2024-08-22 13:50:47 +02:00 |
|
turboderp
|
42e1baeef0
|
Bump to 0.1.9
|
2024-08-22 13:48:36 +02:00 |
|
turboderp
|
d58d5ab723
|
Add some docstrings
|
2024-08-22 13:43:19 +02:00 |
|
turboderp
|
555c360798
|
Update TP example
|
2024-08-22 13:43:19 +02:00 |
|
turboderp
|
4117daa546
|
Cleanup
|
2024-08-22 12:49:37 +02:00 |
|
turboderp
|
9917403229
|
Bulk example: Compute immediate output tokens/second
|
2024-08-22 12:46:58 +02:00 |
|
turboderp
|
547135cc43
|
Skip sampling if only one token allowed by filters
|
2024-08-22 12:33:19 +02:00 |
|
turboderp
|
0978ba5f86
|
Optimize logit filtering in sampler
|
2024-08-22 11:17:45 +02:00 |
|
Michael Panchenko
|
f1d79c96d2
|
Fix in ext.py (index error on system without GPU) (#594)
* Update README.md
* Fix in ext.py (index error on system without GPU)
---------
Co-authored-by: turboderp <11859846+turboderp@users.noreply.github.com>
|
2024-08-20 17:18:50 +02:00 |
|
turboderp
|
e7053198ba
|
Enable defrag for paged TP cache
|
2024-08-20 17:10:51 +02:00 |
|
turboderp
|
e89dc5b762
|
Fix for older Torch versions
|
2024-08-20 14:08:35 +02:00 |
|