turboderp
|
2d58a3cf35
|
convert.py: Fix parallel mode regression
|
2026-05-10 17:38:46 +02:00 |
|
turboderp
|
b8f0c93c37
|
Actions: Fix build tag for cu132 wheels
|
2026-05-10 16:03:06 +02:00 |
|
turboderp
|
e3410fa2bc
|
Actions: Add cu132 wheels (Python 3.12+, Torch 2.11)
|
2026-05-10 15:58:33 +02:00 |
|
turboderp
|
112612705c
|
Actions: cu132 wheel experiment, attempt #10
|
2026-05-10 14:51:55 +02:00 |
|
turboderp
|
4e7b6ed240
|
Actions: cu132 wheel experiment, attempt #9
|
2026-05-10 14:39:17 +02:00 |
|
turboderp
|
ca822f9400
|
Actions: cu132 wheel experiment, attempt #8
|
2026-05-10 14:27:15 +02:00 |
|
turboderp
|
efbcda2911
|
Actions: cu132 wheel experiment, attempt #6
|
2026-05-10 14:11:33 +02:00 |
|
turboderp
|
005ff30323
|
Actions: cu132 wheel experiment, attempt #5 (add cp312)
|
2026-05-10 13:59:24 +02:00 |
|
turboderp
|
287c3ddbd8
|
Actions: cu132 wheel experiment, attempt #5
|
2026-05-10 13:55:24 +02:00 |
|
turboderp
|
67bdf5633b
|
Actions: cu132 wheel experiment, attempt #4
|
2026-05-10 13:36:00 +02:00 |
|
turboderp
|
0313939d9b
|
Actions: cu132 wheel experiment, attempt #3
|
2026-05-10 13:08:42 +02:00 |
|
turboderp
|
5ddc4ea8dd
|
Actions: cu132 wheel experiment, attempt #2
|
2026-05-10 12:52:28 +02:00 |
|
turboderp
|
0a33b2fe3c
|
Actions: cu132 wheel experiment
|
2026-05-10 12:33:50 +02:00 |
|
turboderp
|
b0e7dd6c8a
|
Unpin pydantic dependency
|
2026-05-09 22:36:44 +02:00 |
|
turboderp
|
c9184a9801
|
Add .modules.attention to packages
|
2026-05-09 22:33:32 +02:00 |
|
turboderp
|
fa142797ce
|
Bump to v0.0.34
(tag: v0.0.34)
|
2026-05-09 21:08:45 +02:00 |
|
turboderp
|
1fe53b96e6
|
Merge remote-tracking branch 'origin/dev' into dev
|
2026-05-09 21:07:38 +02:00 |
|
turboderp
|
fae1a7a9de
|
Remove redundant reconstruct during prefill (fixes regression after RYS)
|
2026-05-09 21:07:27 +02:00 |
|
turboderp
|
817b35dad9
|
Merge pull request #203 from so-dimm/patch-1
README.md: fix typo and update Gemma 4 support info
|
2026-05-09 20:39:17 +02:00 |
|
turboderp
|
ec7d5102ad
|
Bump to v0.0.33
(tag: v0.0.33)
|
2026-05-09 19:52:30 +02:00 |
|
turboderp
|
9a3041c5b0
|
longctx eval: Allow draft model
|
2026-05-09 13:30:22 +02:00 |
|
turboderp
|
d48d859d37
|
Attn: Update fallback warning to suggest installing Triton
|
2026-05-09 12:47:50 +02:00 |
|
turboderp
|
befc9b1730
|
Tokenizer: Also scan generation_config.json for EOS tokens
|
2026-05-09 02:22:25 +02:00 |
|
turboderp
|
e9f5a09e91
|
Attn: Allow TP export when use_k_as_v==True
|
2026-05-09 02:05:39 +02:00 |
|
turboderp
|
35b66649fb
|
Attn: Refactor, new dispatch mechanism with multiple backends, remove explicit SDPA mode (keep as fallback), new paged Triton attn kernels for head_dim=512
|
2026-05-09 02:03:49 +02:00 |
|
turboderp
|
2631448e43
|
Attn: Increase bighead attn kernel threshold to 16 (for Gemma4 DFlash)
|
2026-05-07 23:13:14 +02:00 |
|
turboderp
|
11a2a849cc
|
Eval: Add -gp/--gen_prompt option to eval scripts
|
2026-05-07 20:34:02 +02:00 |
|
turboderp
|
972a2c514c
|
Bighead attn: Correct Q shape when bsz > 1 and no cache
|
2026-05-07 20:09:55 +02:00 |
|
turboderp
|
b943f85c36
|
DFlash: Allow quantization
|
2026-05-06 22:44:42 +02:00 |
|
turboderp
|
3ff37127f0
|
convert.py: Allow quantizing without calibration (DFlash)
|
2026-05-06 22:44:24 +02:00 |
|
turboderp
|
0dc28740ad
|
convert.py: Clear tensor temp directory before writing first set of quantized tensors
|
2026-05-06 22:42:37 +02:00 |
|
turboderp
|
770894a9cc
|
Autotune: Improve candidate selection
|
2026-05-06 19:58:28 +02:00 |
|
Optimal
|
10a45faa1f
|
README.md: fix typo and update Gemma 4 support info
- Simple typo fix in README.md
- Added a note that Gemma 4 E2B and E4B models are currently not supported.
|
2026-05-06 21:03:35 +09:00 |
|
turboderp
|
a960f4dbce
|
DFlash: Fix target layer indices
|
2026-05-06 09:53:45 +02:00 |
|
turboderp
|
72e479ddc4
|
Update README.md
|
2026-05-05 10:04:37 +02:00 |
|
turboderp
|
c6a54acbe1
|
SD benchmark: Reduce default cache size
|
2026-05-05 10:04:29 +02:00 |
|
turboderp
|
841897fe42
|
TP: Increase CPU reduce timeout to account for autotune
|
2026-05-03 18:12:48 +02:00 |
|
turboderp
|
e0d330d2ff
|
Update build actions
|
2026-05-03 01:51:56 +02:00 |
|
turboderp
|
d7d6d4240a
|
Autotune: Don't include full torch extension header to prevent build issues on Windows
|
2026-05-03 01:51:45 +02:00 |
|
turboderp
|
86253bdbed
|
Update build actions
|
2026-05-03 00:54:21 +02:00 |
|
turboderp
|
45d1c8572f
|
Suppress warnings in extension build
|
2026-05-03 00:50:37 +02:00 |
|
turboderp
|
b70b0476f5
|
Autotune: Fix typo breaking Windows build
|
2026-05-03 00:50:08 +02:00 |
|
turboderp
|
fae9d1d378
|
Update build actions: restore JIT wheel
|
2026-05-02 23:58:51 +02:00 |
|
turboderp
|
820e5063f3
|
Update build actions, try to pin xformers during wheel build to prevent Torch 2.11 dependency on all versions
(tag: v0.0.32)
|
2026-05-02 23:37:22 +02:00 |
|
turboderp
|
527ccc87eb
|
Bump to v0.0.32
|
2026-05-02 23:36:26 +02:00 |
|
turboderp
|
7bfa8e6a72
|
Autotune: Adjust tuning precision thresholds
|
2026-05-02 14:05:19 +02:00 |
|
turboderp
|
fd5b65b526
|
Generator: Advance filter when draft tokens are accepted
|
2026-05-02 13:57:05 +02:00 |
|
turboderp
|
61b211eaac
|
Generator: Guard against single stop condition not given as a list
|
2026-05-02 12:23:08 +02:00 |
|
turboderp
|
ca708d9064
|
TP: Increase reduce timeout to 45s to allow for autotune
|
2026-05-02 12:09:21 +02:00 |
|
turboderp
|
d7eb7c3f6b
|
TP: Fix cache layer regression after RYS
|
2026-05-02 12:08:02 +02:00 |
|