turboderp
|
c050aec764
|
Bump to 0.2.0
|
2024-08-28 22:56:01 +02:00 |
|
turboderp
|
1a82283df3
|
Merge branch 'refs/heads/dev'
|
2024-08-28 22:55:11 +02:00 |
|
turboderp
|
f1d8909809
|
Catch all exceptions for nvidia-smi and rocm-smi
|
2024-08-28 20:25:56 +02:00 |
|
turboderp
|
db14154fee
|
Don't use default stream for logit padding mask after all
|
2024-08-27 22:06:19 +02:00 |
|
turboderp
|
8d3d4c227e
|
Ensure logit padding happens on default stream
|
2024-08-27 21:48:30 +02:00 |
|
turboderp
|
d9f0ecc12c
|
TP: Fix vocab split for models with odd vocab sizes
|
2024-08-27 21:47:49 +02:00 |
|
turboderp
|
7319b6ea31
|
Fix graph update for MLP with post layernorm
|
2024-08-27 20:07:21 +02:00 |
|
turboderp
|
4230dab3c1
|
Fix model_init feedback
|
2024-08-27 19:16:23 +02:00 |
|
turboderp
|
69291d1333
|
Fix another possible sync issues with fasttensors (for Windows)
|
2024-08-25 22:01:53 +02:00 |
|
turboderp
|
e539f7cc28
|
Fix another possible sync issues with fasttensors
|
2024-08-25 21:56:02 +02:00 |
|
turboderp
|
d3fe9f25d2
|
Unmap tensors on CPU to reduce temp VRAM overhead while loading
|
2024-08-25 21:15:28 +02:00 |
|
turboderp
|
7e15947aa1
|
Fix possible sync issues with fasttensors
|
2024-08-25 14:51:46 +02:00 |
|
turboderp
|
57ee846672
|
Fix ROCm compile
|
2024-08-22 14:04:20 +02:00 |
|
turboderp
|
b926ed4c7e
|
Merge remote-tracking branch 'origin/master' (tag: v0.1.9)
|
2024-08-22 13:50:47 +02:00 |
|
turboderp
|
42e1baeef0
|
Bump to 0.1.9
|
2024-08-22 13:48:36 +02:00 |
|
turboderp
|
d58d5ab723
|
Add some docstrings
|
2024-08-22 13:43:19 +02:00 |
|
turboderp
|
555c360798
|
Update TP example
|
2024-08-22 13:43:19 +02:00 |
|
turboderp
|
4117daa546
|
Cleanup
|
2024-08-22 12:49:37 +02:00 |
|
turboderp
|
9917403229
|
Bulk example: Compute immediate output tokens/second
|
2024-08-22 12:46:58 +02:00 |
|
turboderp
|
547135cc43
|
Skip sampling if only one token allowed by filters
|
2024-08-22 12:33:19 +02:00 |
|
turboderp
|
0978ba5f86
|
Optimize logit filtering in sampler
|
2024-08-22 11:17:45 +02:00 |
|
Michael Panchenko
|
f1d79c96d2
|
Fix in ext.py (index error on system without GPU) (#594)
* Update README.md
* Fix in ext.py (index error on system without GPU)
---------
Co-authored-by: turboderp <11859846+turboderp@users.noreply.github.com>
|
2024-08-20 17:18:50 +02:00 |
|
turboderp
|
e7053198ba
|
Enable defrag for paged TP cache
|
2024-08-20 17:10:51 +02:00 |
|
turboderp
|
e89dc5b762
|
Fix for older Torch versions
|
2024-08-20 14:08:35 +02:00 |
|
turboderp
|
15d39896f4
|
Fix Q cache for TP mode
|
2024-08-20 11:52:23 +02:00 |
|
turboderp
|
4c6dc5812a
|
Disable unused code to fix compilation issues
|
2024-08-20 11:50:10 +02:00 |
|
turboderp
|
a72b73fc89
|
Update README.md (#593)
|
2024-08-20 01:00:49 +02:00 |
|
turboderp
|
f17feb8345
|
Row split + all_reduce for MLP (not faster, disabled)
|
2024-08-20 00:48:55 +02:00 |
|
turboderp
|
373bcc187e
|
Update benchmarks
|
2024-08-20 00:48:55 +02:00 |
|
turboderp
|
b120d89b6a
|
Permute output features of MLP up+gate with input map of MLP down
|
2024-08-20 00:48:54 +02:00 |
|
turboderp
|
507ce60d7a
|
Add cudaEventDisableTiming for faster sync
|
2024-08-17 10:05:01 +02:00 |
|
turboderp
|
8a98b435d6
|
Fix bug
|
2024-08-17 00:18:03 +02:00 |
|
turboderp
|
e90bf8bc6e
|
Update README.md
|
2024-08-16 21:00:20 +02:00 |
|
turboderp
|
61e30923e6
|
Update benchmarks after graphs
|
2024-08-15 22:59:10 +02:00 |
|
turboderp
|
96ebec1d8b
|
Revert to event sync (barrier kernel is unstable)
|
2024-08-15 22:47:59 +02:00 |
|
turboderp
|
bef04b9b30
|
Use non-optimized unpaged TP forward when no cache provided
|
2024-08-15 18:44:58 +02:00 |
|
turboderp
|
49bf149c8a
|
Fix device allocation for sin/cos tensors
|
2024-08-15 18:43:09 +02:00 |
|
turboderp
|
8477da8f8c
|
Chatbot: Load draft model first
|
2024-08-15 18:42:37 +02:00 |
|
turboderp
|
c185d3fc68
|
Extension function for TP attn (unpaged)
|
2024-08-15 16:20:45 +02:00 |
|
turboderp
|
aa5fd98ab6
|
Fix sync issue
|
2024-08-15 15:04:53 +02:00 |
|
turboderp
|
6a5448407a
|
Fix TP load bug
|
2024-08-15 13:33:31 +02:00 |
|
turboderp
|
c19e9f2002
|
Balance automatic tensor split more intelligently
|
2024-08-15 12:33:24 +02:00 |
|
turboderp
|
8a85865af5
|
Fix sync issues
|
2024-08-15 11:43:23 +02:00 |
|
turboderp
|
18157d7e3c
|
Sync to all active devices regardless of split
|
2024-08-14 23:58:04 +02:00 |
|
turboderp
|
b30f796690
|
TP mode for attn layer, non-paged
|
2024-08-14 23:41:10 +02:00 |
|
turboderp
|
65b9e17c4f
|
Add cross-device barrier kernel, remove event-based sync
|
2024-08-14 11:50:41 +02:00 |
|
turboderp
|
1c5f3de883
|
TP support for Qwen2
|
2024-08-14 11:20:14 +02:00 |
|
turboderp
|
b8f655dc02
|
Bugfix
|
2024-08-13 20:19:47 +02:00 |
|
turboderp
|
fced82e8bc
|
Re-enable barrier after broadcast
|
2024-08-12 20:11:38 +02:00 |
|
turboderp
|
c9a5ac6e99
|
Double-buffered broadcast/gather
|
2024-08-12 20:11:05 +02:00 |
|