973 Commits

Author SHA1 Message Date
turboderp
2d58a3cf35 convert.py: Fix parallel mode regression 2026-05-10 17:38:46 +02:00
turboderp
b8f0c93c37 Actions: Fix build tag for cu132 wheels 2026-05-10 16:03:06 +02:00
turboderp
e3410fa2bc Actions: Add cu132 wheels (Python 3.12+, Torch 2.11) 2026-05-10 15:58:33 +02:00
turboderp
112612705c Actions: cu132 wheel experiment, attempt #10 2026-05-10 14:51:55 +02:00
turboderp
4e7b6ed240 Actions: cu132 wheel experiment, attempt #9 2026-05-10 14:39:17 +02:00
turboderp
ca822f9400 Actions: cu132 wheel experiment, attempt #8 2026-05-10 14:27:15 +02:00
turboderp
efbcda2911 Actions: cu132 wheel experiment, attempt #6 2026-05-10 14:11:33 +02:00
turboderp
005ff30323 Actions: cu132 wheel experiment, attempt #5 (add cp312) 2026-05-10 13:59:24 +02:00
turboderp
287c3ddbd8 Actions: cu132 wheel experiment, attempt #5 2026-05-10 13:55:24 +02:00
turboderp
67bdf5633b Actions: cu132 wheel experiment, attempt #4 2026-05-10 13:36:00 +02:00
turboderp
0313939d9b Actions: cu132 wheel experiment, attempt #3 2026-05-10 13:08:42 +02:00
turboderp
5ddc4ea8dd Actions: cu132 wheel experiment, attempt #2 2026-05-10 12:52:28 +02:00
turboderp
0a33b2fe3c Actions: cu132 wheel experiment 2026-05-10 12:33:50 +02:00
turboderp
b0e7dd6c8a Unpin pydantic dependency 2026-05-09 22:36:44 +02:00
turboderp
c9184a9801 Add .modules.attention to packages 2026-05-09 22:33:32 +02:00
turboderp
fa142797ce Bump to v0.0.34 (tag: v0.0.34) 2026-05-09 21:08:45 +02:00
turboderp
1fe53b96e6 Merge remote-tracking branch 'origin/dev' into dev 2026-05-09 21:07:38 +02:00
turboderp
fae1a7a9de Remove redundant reconstruct during prefill (fixes regression after RYS) 2026-05-09 21:07:27 +02:00
turboderp
817b35dad9 Merge pull request #203 from so-dimm/patch-1
README.md: fix typo and update Gemma 4 support info
2026-05-09 20:39:17 +02:00
turboderp
ec7d5102ad Bump to v0.0.33 (tag: v0.0.33) 2026-05-09 19:52:30 +02:00
turboderp
9a3041c5b0 longctx eval: Allow draft model 2026-05-09 13:30:22 +02:00
turboderp
d48d859d37 Attn: Update fallback warning to suggest installing Triton 2026-05-09 12:47:50 +02:00
turboderp
befc9b1730 Tokenizer: Also scan generation_config.json for EOS tokens 2026-05-09 02:22:25 +02:00
turboderp
e9f5a09e91 Attn: Allow TP export when use_k_as_v==True 2026-05-09 02:05:39 +02:00
turboderp
35b66649fb Attn: Refactor, new dispatch mechanism with multiple backends, remove explicit SDPA mode (keep as fallback), new paged Triton attn kernels for head_dim=512 2026-05-09 02:03:49 +02:00
turboderp
2631448e43 Attn: Increase bighead attn kernel threshold to 16 (for Gemma4 DFlash) 2026-05-07 23:13:14 +02:00
turboderp
11a2a849cc Eval: Add -gp/--gen_prompt option to eval scripts 2026-05-07 20:34:02 +02:00
turboderp
972a2c514c Bighead attn: Correct Q shape when bsz > 1 and no cache 2026-05-07 20:09:55 +02:00
turboderp
b943f85c36 DFlash: Allow quantization 2026-05-06 22:44:42 +02:00
turboderp
3ff37127f0 convert.py: Allow quantizing without calibration (DFlash) 2026-05-06 22:44:24 +02:00
turboderp
0dc28740ad convert.py: Clear tensor temp directory before writing first set of quantized tensors 2026-05-06 22:42:37 +02:00
turboderp
770894a9cc Autotune: Improve candidate selection 2026-05-06 19:58:28 +02:00
Optimal
10a45faa1f README.md: fix typo and update Gemma 4 support info
- Simple typo fix in README.md
- Added a note that Gemma 4 E2B and E4B models are currently not supported.
2026-05-06 21:03:35 +09:00
turboderp
a960f4dbce DFlash: Fix target layer indices 2026-05-06 09:53:45 +02:00
turboderp
72e479ddc4 Update README.md 2026-05-05 10:04:37 +02:00
turboderp
c6a54acbe1 SD benchmark: Reduce default cache size 2026-05-05 10:04:29 +02:00
turboderp
841897fe42 TP: Increase CPU reduce timeout to account for autotune 2026-05-03 18:12:48 +02:00
turboderp
e0d330d2ff Update build actions 2026-05-03 01:51:56 +02:00
turboderp
d7d6d4240a Autotune: Don't include full torch extension header to prevent build issues on Windows 2026-05-03 01:51:45 +02:00
turboderp
86253bdbed Update build actions 2026-05-03 00:54:21 +02:00
turboderp
45d1c8572f Suppress warnings in extension build 2026-05-03 00:50:37 +02:00
turboderp
b70b0476f5 Autotune: Fix typo breaking Windows build 2026-05-03 00:50:08 +02:00
turboderp
fae9d1d378 Update build actions: restore JIT wheel 2026-05-02 23:58:51 +02:00
turboderp
820e5063f3 Update build actions, try to pin xformers during wheel build to prevent Torch 2.11 dependency on all versions (tag: v0.0.32) 2026-05-02 23:37:22 +02:00
turboderp
527ccc87eb Bump to v0.0.32 2026-05-02 23:36:26 +02:00
turboderp
7bfa8e6a72 Autotune: Adjust tuning precision thresholds 2026-05-02 14:05:19 +02:00
turboderp
fd5b65b526 Generator: Advance filter when draft tokens are accepted 2026-05-02 13:57:05 +02:00
turboderp
61b211eaac Generator: Guard against single stop condition not given as a list 2026-05-02 12:23:08 +02:00
turboderp
ca708d9064 TP: Increase reduce timeout to 45s to allow for autotune 2026-05-02 12:09:21 +02:00
turboderp
d7eb7c3f6b TP: Fix cache layer regression after RYS 2026-05-02 12:08:02 +02:00