Commit Graph

732 Commits

Author SHA1 Message Date
turboderp
be12f3999c Bump to v0.0.23 (tag: v0.0.23) 2026-03-05 16:49:30 +01:00
turboderp
144d826dda Conversion: Add fallback quant method for layers with all-zero H, and tolerate matrices with rows/columns of zeros 2026-03-05 00:29:59 +01:00
lesj0610
c21108341a Qwen3.5: Include Deepstack layers 2026-03-05 00:24:21 +01:00
turboderp
1647c653e4 chat.py: Command help and think mode toggle 2026-03-03 23:15:07 +01:00
turboderp
a6cf34574b chat.py: Limit frequency of markdown renders 2026-03-03 22:46:32 +01:00
turboderp
eb1686a840 GatedDeltaNet: Set chunked seqlen threshold to num_v_heads (prevents warning from FLA) 2026-03-03 22:33:58 +01:00
turboderp
d3d76d38f8 GatedDeltaNet: Add fused kernel for Qwen3.5 path 2026-03-03 06:10:06 +01:00
turboderp
e5b522872b HGEMM: Cleanup 2026-03-03 06:08:07 +01:00
turboderp
30178941f0 LinearEXL3: Clean up comments 2026-03-03 05:02:59 +01:00
turboderp
5738bc62e5 GatedDeltaNet: Skip redundant split+cat and some casts (Qwen3.5) 2026-03-03 05:02:22 +01:00
turboderp
2965eec919 GatedDeltaNet: Skip redundant zeroing of buffers (Qwen3-Next) 2026-03-03 05:01:16 +01:00
turboderp
410a43df22 GatedDeltaNet: Increase max no. K/V heads 2026-03-03 04:55:55 +01:00
turboderp
725a75386d RMSNorm/GatedRMSNorm: Tidy up launch logic with macros and add more dtypes 2026-03-02 22:22:47 +01:00
turboderp
67785fc286 compare_q.py: Paper over some dependency problems 2026-03-02 18:47:39 +01:00
turboderp
5cb91c5505 GatedDeltaNet: Fix output projection no. input features 2026-03-02 16:35:26 +01:00
turboderp
e12e6bd759 Update README.md 2026-03-02 15:51:58 +01:00
lesj0610
88062566f5 Qwen3.5: Smoke test 2026-03-02 15:49:29 +01:00
turboderp
021b027728 Qwen3.5: Enable MRoPE, update multimodal example 2026-03-02 05:27:33 +01:00
lesj0610
0ca0d6ac01 Add Qwen3_5ForConditionalGeneration and Qwen3_5MoeForConditionalGeneration 2026-03-02 05:22:08 +01:00
lesj0610
390624ab3c convert.py: Better ETA calculation 2026-03-02 04:23:51 +01:00
turboderp
88dcdf782d Update README.md 2026-03-02 03:49:28 +01:00
turboderp
93695e9a7d RMSNorm/RoPE kernels: Allow BF16/FP32 norm weights 2026-03-02 03:49:13 +01:00
turboderp
e2f4198406 Formatting 2026-03-02 00:53:23 +01:00
turboderp
08ca454ec0 Step 3.5: Fix TP split 2026-03-01 21:32:59 +01:00
turboderp
6386de7a9b Add Step3p5ForCausalLM 2026-03-01 17:59:28 +01:00
turboderp
76937421ec convert.py: Make out_scales the default, with options for auto and disable 2026-03-01 17:57:55 +01:00
turboderp
c8c2e6178c chat.py: Catbench shortcut 2026-03-01 17:57:55 +01:00
turboderp
99f792dce0 Add custom activation limits 2026-03-01 17:57:55 +01:00
turboderp
b272ea3515 Remove C-style conditionals 2026-03-01 15:12:33 +01:00
turboderp
18b2a23d8a chat.py: Fix error message 2026-03-01 15:10:22 +01:00
turboderp
b0cfe46702 Config: Allow for interpreting config key with incorrect data type as missing key (for weirdly implemented layerwise RoPE settings in some models) 2026-03-01 03:16:32 +01:00
turboderp
489b3aab12 BlockSparseMLP: Allow loading combined experts tensors also when gate and up are not fused 2026-03-01 03:13:56 +01:00
turboderp
4bdd22ea77 BlockSparseMLP: Make sure bias is always applied during calibration 2026-03-01 03:13:03 +01:00
turboderp
f7ccb524e7 Attn: Support headwise gate 2026-03-01 03:12:03 +01:00
turboderp
447c8bb522 Build actions: Add torch 2.10.0 wheels 2026-02-28 23:53:15 +01:00
turboderp
8ef7f4b5dd Linear: Allow fusing linear layers during unquantized model load 2026-02-22 22:43:34 +01:00
turboderp
c1b16d2fc9 Loader: Allow checking for lists of tensor groups 2026-02-22 22:42:30 +01:00
turboderp
ea1fe0ccea Cleanup 2026-02-22 15:14:57 +01:00
turboderp
ed5bad7235 Alias __nv_bfloat16 -> bfloat16 2026-02-17 21:24:41 +01:00
turboderp
b2b6f37e12 perf.py: Error out if test length > cache size 2026-02-17 20:04:13 +01:00
turboderp
3f9c053227 Merge pull request #141 2026-02-16 01:24:34 +01:00
    Add tensor parallel support for MiniMax M2 Q/K norms
turboderp
abb083ceb8 Merge pull request #103 from mratsim/patch-1 2026-02-15 17:58:50 +01:00
    Add size estimation script for model tensors size
turboderp
ae3645c455 Merge pull request #147 from lesj0610/feat/hf-chat-template-compat 2026-02-15 17:58:03 +01:00
    Tokenizer: robust HF chat template kwargs and output compatibility
turboderp
eca621af79 Merge remote-tracking branch 'origin/dev' into dev 2026-02-15 17:56:31 +01:00
turboderp
1744361cc2 Merge pull request #148 from lesj0610/fix/exaone4-swa-layer-types 2026-02-15 17:55:08 +01:00
    exaone4: use layer_types as source of truth for SWA layer mapping
turboderp
44f70da0f9 Merge pull request #149 from MikeRoz47/dev 2026-02-15 17:53:55 +01:00
    Add optional arg to compare_q.py for saving plot files
MikeRoz47
52c2f5794d Add optional arg to compare_q to allow it to save plots rather than show them 2026-02-15 16:41:18 +00:00
lesj0610
5c076e5f2a exaone4: prefer layer_types over pattern for SWA layer mapping 2026-02-12 01:48:52 +09:00
lesj0610
019d965eb6 tokenizer: harden HF chat template compatibility and kwargs passthrough 2026-02-12 01:25:30 +09:00
turboderp
701afb9294 Bump to v0.0.22 (tag: v0.0.22) 2026-02-10 17:48:24 +01:00