Commit Graph

756 Commits

Author SHA1 Message Date
turboderp
ebd2efb6bd chat.py: Random benchmark question feature 2026-03-13 04:31:19 +01:00
turboderp
aaf6337f12 Add OlmoHybridForCausalLM 2026-03-13 00:59:10 +01:00
turboderp
f674142ff3 Qwen3.5: Don't set mrope flag if no vision tower 2026-03-13 00:46:27 +01:00
turboderp
bdfb7929f4 GatedDeltaNet: Support split qkv and conv1d weights 2026-03-13 00:45:59 +01:00
turboderp
f4c56f8c6d GatedDeltaNet: Handle head sizes up to 256, divisible by down to 32, support beta scale (linear_allow_neg_eigval) 2026-03-13 00:44:37 +01:00
turboderp
f83a9ae242 GatedRMSNorm: Use single warp for head size up to 256 2026-03-13 00:40:05 +01:00
turboderp
42d0854c39 convert.py: Compactify display of module tree 2026-03-13 00:37:39 +01:00
turboderp
1404f7aa48 Bump to v0.0.25 (tag: v0.0.25) 2026-03-11 23:48:47 +01:00
turboderp
9db029ded5 Separate transpose options for fused expert weights (account for differences between Qwen3Moe and Qwen3_5Moe) 2026-03-11 21:43:45 +01:00
turboderp
e05f4636ee Add Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM 2026-03-11 21:00:23 +01:00
turboderp
1b9e58c9b5 BlockSparseMLP: Skip redundant gather 2026-03-11 20:25:56 +01:00
turboderp
d52c49c17f GatedDeltaNet: Allow bfloat16 a_log 2026-03-11 20:24:04 +01:00
turboderp
ad546f7937 Bump to v0.0.24 (tag: v0.0.24) 2026-03-08 20:39:35 +01:00
turboderp
63ba4d005c Generator: If model is recurrent, run last page of prompt in a separate forward pass to create checkpoint. Ensures at most 255 tokens have to be reingested per request 2026-03-07 23:32:42 +01:00
turboderp
85237d5744 chat.py: Debugging features 2026-03-07 23:29:29 +01:00
turboderp
60afe8d983 Update README.md 2026-03-07 20:48:24 +01:00
turboderp
0e8dd89874 BlockSparseMLP: Add single expert graph 2026-03-07 04:55:37 +01:00
turboderp
7ad51c0422 BlockSparseMLP: Keep buffers between experts when possible 2026-03-07 04:54:02 +01:00
turboderp
168f21b0ec BlockSparseMLP: Improved batch routing 2026-03-07 04:53:40 +01:00
turboderp
8e192e12f7 RMSNorm: Only split 1D norm when spanning attn heads, fixes #165 2026-03-07 01:58:41 +01:00
turboderp
766a28dc60 BlockSparseMLP: Improved batch routing 2026-03-07 01:21:34 +01:00
turboderp
86174510bd Merge branch 'master' into dev 2026-03-07 01:18:33 +01:00
turboderp
85cb54c6f3 perf.py: Make sure test context is nontrivial to force more expert diversity 2026-03-07 01:18:27 +01:00
turboderp
a4fb7c2d56 Temp build action 2026-03-05 18:25:27 +01:00
turboderp
be12f3999c Bump to v0.0.23 (tag: v0.0.23) 2026-03-05 16:49:30 +01:00
turboderp
144d826dda Conversion: Add fallback quant method for layers with all-zero H, and tolerate matrices with rows/columns of zeros 2026-03-05 00:29:59 +01:00
lesj0610
c21108341a Qwen3.5: Include Deepstack layers 2026-03-05 00:24:21 +01:00
turboderp
1647c653e4 chat.py: Command help and think mode toggle 2026-03-03 23:15:07 +01:00
turboderp
a6cf34574b chat.py: Limit frequency of markdown renders 2026-03-03 22:46:32 +01:00
turboderp
eb1686a840 GatedDeltaNet: Set chunked seqlen threshold to num_v_heads (prevents warning from FLA) 2026-03-03 22:33:58 +01:00
turboderp
d3d76d38f8 GatedDeltaNet: Add fused kernel for Qwen3.5 path 2026-03-03 06:10:06 +01:00
turboderp
e5b522872b HGEMM: Cleanup 2026-03-03 06:08:07 +01:00
turboderp
30178941f0 LinearEXL3: Clean up comments 2026-03-03 05:02:59 +01:00
turboderp
5738bc62e5 GatedDeltaNet: Skip redundant split+cat and some casts (Qwen3.5) 2026-03-03 05:02:22 +01:00
turboderp
2965eec919 GatedDeltaNet: Skip redundant zeroing of buffers (Qwen3-Next) 2026-03-03 05:01:16 +01:00
turboderp
410a43df22 GatedDeltaNet: Increase max no. K/V heads 2026-03-03 04:55:55 +01:00
turboderp
725a75386d RMSNorm/GatedRMSNorm: Tidy up launch logic with macros and add more dtypes 2026-03-02 22:22:47 +01:00
turboderp
67785fc286 compare_q.py: Paper over some dependency problems 2026-03-02 18:47:39 +01:00
turboderp
5cb91c5505 GatedDeltaNet: Fix output projection no. input features 2026-03-02 16:35:26 +01:00
turboderp
e12e6bd759 Update README.md 2026-03-02 15:51:58 +01:00
lesj0610
88062566f5 Qwen3.5: Smoke test 2026-03-02 15:49:29 +01:00
turboderp
021b027728 Qwen3.5: Enable MRoPE, update multimodal example 2026-03-02 05:27:33 +01:00
lesj0610
0ca0d6ac01 Add Qwen3_5ForConditionalGeneration and Qwen3_5MoeForConditionalGeneration 2026-03-02 05:22:08 +01:00
lesj0610
390624ab3c convert.py: Better ETA calculation 2026-03-02 04:23:51 +01:00
turboderp
88dcdf782d Update README.md 2026-03-02 03:49:28 +01:00
turboderp
93695e9a7d RMSNorm/RoPE kernels: Allow BF16/FP32 norm weights 2026-03-02 03:49:13 +01:00
turboderp
e2f4198406 Formatting 2026-03-02 00:53:23 +01:00
turboderp
08ca454ec0 Step 3.5: Fix TP split 2026-03-01 21:32:59 +01:00
turboderp
6386de7a9b Add Step3p5ForCausalLM 2026-03-01 17:59:28 +01:00
turboderp
76937421ec convert.py: Make out_scales the default, with options for auto and disable 2026-03-01 17:57:55 +01:00