Llama Enjoyer
|
e960dfd68d
|
Fix the temperature argument to accept values passed via the cmd line.
|
2024-09-24 17:18:08 +02:00 |
|
Sinan
|
7c7b1993b4
|
Added draft token count as parameter to chat.py (#635)
|
2024-09-24 11:16:30 +02:00 |
|
turboderp
|
8361f3f4a0
|
Add missing cp310+cu118 torch 2.4 windows wheel
|
2024-09-23 17:38:53 +02:00 |
|
turboderp
|
15e54046ba
|
More stream gymnastics
|
2024-09-23 17:28:55 +02:00 |
|
turboderp
|
a5132d072e
|
Add XTC sampler
|
2024-09-22 23:09:19 +02:00 |
|
turboderp
|
6d7b2e8e7a
|
Revert snapshot interval
|
2024-09-22 19:11:32 +02:00 |
|
turboderp
|
43a0be35df
|
Make measurement less sensitive to very sparse inf values in reference fwd pass
|
2024-09-22 19:01:29 +02:00 |
|
turboderp
|
a17f6665cb
|
Fix streams in quantizer
|
2024-09-22 18:15:05 +02:00 |
|
turboderp
|
9946f45f1c
|
Force tensor loading onto priority stream
|
2024-09-20 22:05:20 +02:00 |
|
turboderp
|
e155e0a5b0
|
Fix loading in new thread
|
2024-09-18 19:46:56 +02:00 |
|
turboderp
|
c4a03e09f5
|
Tokenizer: Give priority to tokenizer.json instead of tokenizer.model
|
2024-09-18 00:41:43 +02:00 |
|
turboderp
|
12bceb9f4b
|
Cleanup
|
2024-09-18 00:32:00 +02:00 |
|
turboderp
|
0695f3a854
|
Fix potential bug in filter evaluation
|
2024-09-17 00:34:28 +02:00 |
|
turboderp
|
8a25e0f2b3
|
Merge branch 'refs/heads/master' into dev
|
2024-09-17 00:33:36 +02:00 |
|
turboderp
|
b25210778c
|
Remove fasttensors, add platform-agnostic multithreaded ST loader
|
2024-09-17 00:33:16 +02:00 |
|
turboderp
|
144c576bdb
|
Fix bottlenecks in quantized tensor loading
|
2024-09-17 00:00:27 +02:00 |
|
turboderp
|
10a8842b25
|
Fix JSON inference example
|
2024-09-14 21:35:02 +02:00 |
|
turboderp
|
b2c7cf280c
|
Add cp310 cu121 torch2.4 Windows wheel
v0.2.2
|
2024-09-14 21:17:52 +02:00 |
|
turboderp
|
46eff43403
|
Merge branch 'refs/heads/dev'
|
2024-09-14 21:13:46 +02:00 |
|
turboderp
|
228ba34cec
|
Bump to 0.2.2
|
2024-09-14 21:13:22 +02:00 |
|
turboderp
|
a372fe1241
|
Skip superfluous set creation when possible, even if multiple filters used
|
2024-09-14 19:42:33 +02:00 |
|
turboderp
|
aadc454183
|
Fix sampling using multiple filters
|
2024-09-14 19:36:53 +02:00 |
|
turboderp
|
5ee983593b
|
Fix potential race condition with multithreaded sampling and lazy tokenizer initialization
|
2024-09-14 19:28:02 +02:00 |
|
turboderp
|
1df7b04821
|
Allow non-causal attn with SDPA
|
2024-09-13 02:20:24 +02:00 |
|
turboderp
|
1e18e803da
|
Merge branch 'refs/heads/dev'
v0.2.1
|
2024-09-08 19:23:19 +02:00 |
|
turboderp
|
0d9adf96e8
|
Bump to 0.2.1
|
2024-09-08 19:22:43 +02:00 |
|
turboderp
|
f0dca9a862
|
Bit of cleanup
|
2024-09-08 16:31:47 +02:00 |
|
turboderp
|
f2c53efd34
|
Remove (experimental) Q-cache calibration feature
|
2024-09-08 15:13:00 +02:00 |
|
turboderp
|
a029bcd76e
|
Bit of cleanup
|
2024-09-08 15:11:34 +02:00 |
|
AlpinDale
|
361d2119b3
|
fix: support for Rank Stabilized LoRA (RSLoRA) (#619)
If an adapter is trained with RSLoRA, the alpha value is multiplied by `sqrt(rank)`.
|
2024-09-08 01:00:26 +02:00 |
|
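Note on the RSLoRA entry above: a minimal sketch of the scaling it describes, assuming the usual LoRA convention that the low-rank update is scaled by `alpha / rank`. The function name `lora_scaling` and the `use_rslora` flag are hypothetical, chosen for illustration only; they are not taken from the commit or from the library's code.

```python
import math


def lora_scaling(alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Return the factor applied to the low-rank update (hypothetical helper)."""
    if use_rslora:
        # Per the commit message: alpha is multiplied by sqrt(rank).
        # Combined with the division below this gives alpha / sqrt(rank).
        alpha = alpha * math.sqrt(rank)
    # Assumed standard LoRA convention: scale = alpha / rank.
    return alpha / rank
```

Under these assumptions, an adapter with `alpha=16` and `rank=64` would be scaled by 0.25 with plain LoRA and by 2.0 with RSLoRA.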
turboderp
|
c1fed2ed19
|
Add DRY range parameter
|
2024-09-07 16:06:46 +02:00 |
|
turboderp
|
3e8e181717
|
Add DRY (still needs testing)
|
2024-09-07 02:16:16 +02:00 |
|
turboderp
|
affdc0d16c
|
Ensure streams are always set during the forward pass for the active thread
|
2024-09-05 20:28:09 +02:00 |
|
turboderp
|
5c455c18a0
|
Filter base: Fix instance init
|
2024-09-05 20:25:51 +02:00 |
|
turboderp
|
c9ce168ce0
|
Merge remote-tracking branch 'origin/dev' into dev
|
2024-09-04 23:23:39 +02:00 |
|
turboderp
|
1e462f1f7f
|
Force loading tensors on default stream
|
2024-09-04 23:04:22 +02:00 |
|
Brian Dashore
|
c18400fa29
|
Issues: Add issue templates (#615)
Should help encourage well-formatted issues and make development easier.
Signed-off-by: kingbri <bdashore3@proton.me>
|
2024-09-04 14:58:58 +02:00 |
|
turboderp
|
0d5c0bcc8d
|
Asynchronous filter evaluation
|
2024-08-31 16:31:51 +02:00 |
|
turboderp
|
12f08dbbd8
|
Accept Sequence return type from ExLlamaV2Filter.next()
|
2024-08-31 12:06:21 +02:00 |
|
turboderp
|
ea27954767
|
TP: Add (slower) SDPA fallback mode when flash-attn is unavailable
|
2024-08-29 22:39:45 +02:00 |
|
turboderp
|
40e37f4944
|
Merge branch 'refs/heads/dev'
v0.2.0
|
2024-08-28 22:56:08 +02:00 |
|
turboderp
|
c050aec764
|
Bump to 0.2.0
|
2024-08-28 22:56:01 +02:00 |
|
turboderp
|
1a82283df3
|
Merge branch 'refs/heads/dev'
|
2024-08-28 22:55:11 +02:00 |
|
turboderp
|
f1d8909809
|
Catch all exceptions for nvidia-smi and rocm-smi
|
2024-08-28 20:25:56 +02:00 |
|
turboderp
|
db14154fee
|
Don't use default stream for logit padding mask after all
|
2024-08-27 22:06:19 +02:00 |
|
turboderp
|
8d3d4c227e
|
Ensure logit padding happens on default stream
|
2024-08-27 21:48:30 +02:00 |
|
turboderp
|
d9f0ecc12c
|
TP: Fix vocab split for models with odd vocab sizes
|
2024-08-27 21:47:49 +02:00 |
|
turboderp
|
7319b6ea31
|
Fix graph update for MLP with post layernorm
|
2024-08-27 20:07:21 +02:00 |
|
turboderp
|
4230dab3c1
|
Fix model_init feedback
|
2024-08-27 19:16:23 +02:00 |
|
turboderp
|
69291d1333
|
Fix another possible sync issue with fasttensors (for Windows)
|
2024-08-25 22:01:53 +02:00 |
|