1404f7aa48  turboderp  2026-03-11 23:48:47 +01:00  Bump to v0.0.25  [tag: v0.0.25]
9db029ded5  turboderp  2026-03-11 21:43:45 +01:00  Separate transpose options for fused expert weights (account for differences between Qwen3Moe and Qwen3_5Moe)
e05f4636ee  turboderp  2026-03-11 21:00:23 +01:00  Add Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM
1b9e58c9b5  turboderp  2026-03-11 20:25:56 +01:00  BlockSparseMLP: Skip redundant gather
d52c49c17f  turboderp  2026-03-11 20:24:04 +01:00  GatedDeltaNet: Allow bfloat16 a_log
ad546f7937  turboderp  2026-03-08 20:39:35 +01:00  Bump to v0.0.24  [tag: v0.0.24]
63ba4d005c  turboderp  2026-03-07 23:32:42 +01:00  Generator: If model is recurrent, run last page of prompt in a separate forward pass to create checkpoint
                                                   (Ensures at most 255 tokens have to be reingested per request)
85237d5744  turboderp  2026-03-07 23:29:29 +01:00  chat.py: Debugging features
60afe8d983  turboderp  2026-03-07 20:48:24 +01:00  Update README.md
0e8dd89874  turboderp  2026-03-07 04:55:37 +01:00  BlockSparseMLP: Add single expert graph
7ad51c0422  turboderp  2026-03-07 04:54:02 +01:00  BlockSparseMLP: Keep buffers between experts when possible
168f21b0ec  turboderp  2026-03-07 04:53:40 +01:00  BlockSparseMLP: Improved batch routing
8e192e12f7  turboderp  2026-03-07 01:58:41 +01:00  RMSNorm: Only split 1D norm when spanning attn heads, fixes #165
766a28dc60  turboderp  2026-03-07 01:21:34 +01:00  BlockSparseMLP: Improved batch routing
86174510bd  turboderp  2026-03-07 01:18:33 +01:00  Merge branch 'master' into dev
85cb54c6f3  turboderp  2026-03-07 01:18:27 +01:00  perf.py: Make sure test context is nontrivial to force more expert diversity
a4fb7c2d56  turboderp  2026-03-05 18:25:27 +01:00  Temp build action
be12f3999c  turboderp  2026-03-05 16:49:30 +01:00  Bump to v0.0.23  [tag: v0.0.23]
144d826dda  turboderp  2026-03-05 00:29:59 +01:00  Conversion: Add fallback quant method for layers with all-zero H, and tolerate matrices with rows/columns of zeros
c21108341a  lesj0610   2026-03-05 00:24:21 +01:00  Qwen3.5: Include Deepstack layers
1647c653e4  turboderp  2026-03-03 23:15:07 +01:00  chat.py: Command help and think mode toggle
a6cf34574b  turboderp  2026-03-03 22:46:32 +01:00  chat.py: Limit frequency of markdown renders
eb1686a840  turboderp  2026-03-03 22:33:58 +01:00  GatedDeltaNet: Set chunked seqlen threshold to num_v_heads (prevents warning from FLA)
d3d76d38f8  turboderp  2026-03-03 06:10:06 +01:00  GatedDeltaNet: Add fused kernel for Qwen3.5 path
e5b522872b  turboderp  2026-03-03 06:08:07 +01:00  HGEMM: Cleanup
30178941f0  turboderp  2026-03-03 05:02:59 +01:00  LinearEXL3: Clean up comments
5738bc62e5  turboderp  2026-03-03 05:02:22 +01:00  GatedDeltaNet: Skip redundant split+cat and some casts (Qwen3.5)
2965eec919  turboderp  2026-03-03 05:01:16 +01:00  GatedDeltaNet: Skip redundant zeroing of buffers (Qwen3-Next)
410a43df22  turboderp  2026-03-03 04:55:55 +01:00  GatedDeltaNet: Increase max no. K/V heads
725a75386d  turboderp  2026-03-02 22:22:47 +01:00  RMSNorm/GatedRMSNorm: Tidy up launch logic with macros and add more dtypes
67785fc286  turboderp  2026-03-02 18:47:39 +01:00  compare_q.py: Paper over some dependency problems
5cb91c5505  turboderp  2026-03-02 16:35:26 +01:00  GatedDeltaNet: Fix output projection no. input features
e12e6bd759  turboderp  2026-03-02 15:51:58 +01:00  Update README.md
88062566f5  lesj0610   2026-03-02 15:49:29 +01:00  Qwen3.5: Smoke test
021b027728  turboderp  2026-03-02 05:27:33 +01:00  Qwen3.5: Enable MRoPE, update multimodal example
0ca0d6ac01  lesj0610   2026-03-02 05:22:08 +01:00  Add Qwen3_5ForConditionalGeneration and Qwen3_5MoeForConditionalGeneration
390624ab3c  lesj0610   2026-03-02 04:23:51 +01:00  convert.py: Better ETA calculation
88dcdf782d  turboderp  2026-03-02 03:49:28 +01:00  Update README.md
93695e9a7d  turboderp  2026-03-02 03:49:13 +01:00  RMSNorm/RoPE kernels: Allow BF16/FP32 norm weights
e2f4198406  turboderp  2026-03-02 00:53:23 +01:00  Formatting
08ca454ec0  turboderp  2026-03-01 21:32:59 +01:00  Step 3.5: Fix TP split
6386de7a9b  turboderp  2026-03-01 17:59:28 +01:00  Add Step3p5ForCausalLM
76937421ec  turboderp  2026-03-01 17:57:55 +01:00  convert.py: Make out_scales the default, with options for auto and disable
c8c2e6178c  turboderp  2026-03-01 17:57:55 +01:00  chat.py: Catbench shortcut
99f792dce0  turboderp  2026-03-01 17:57:55 +01:00  Add custom activation limits
b272ea3515  turboderp  2026-03-01 15:12:33 +01:00  Remove C-style conditionals
18b2a23d8a  turboderp  2026-03-01 15:10:22 +01:00  chat.py: Fix error message
b0cfe46702  turboderp  2026-03-01 03:16:32 +01:00  Config: Allow for interpreting config key with incorrect data type as missing key (for weirdly implemented layerwise RoPE settings in some models)
489b3aab12  turboderp  2026-03-01 03:13:56 +01:00  BlockSparseMLP: Allow loading combined experts tensors also when gate and up are not fused
4bdd22ea77  turboderp  2026-03-01 03:13:03 +01:00  BlockSparseMLP: Make sure bias is always applied during calibration