turboderp
93695e9a7d
RMSNorm/RoPE kernels: Allow BF16/FP32 norm weights
2026-03-02 03:49:13 +01:00
turboderp
e2f4198406
Formatting
2026-03-02 00:53:23 +01:00
turboderp
08ca454ec0
Step 3.5: Fix TP split
2026-03-01 21:32:59 +01:00
turboderp
6386de7a9b
Add Step3p5ForCausalLM
2026-03-01 17:59:28 +01:00
turboderp
76937421ec
convert.py: Make out_scales the default, with options for auto and disable
2026-03-01 17:57:55 +01:00
turboderp
c8c2e6178c
chat.py: Catbench shortcut
2026-03-01 17:57:55 +01:00
turboderp
99f792dce0
Add custom activation limits
2026-03-01 17:57:55 +01:00
turboderp
b272ea3515
Remove C-style conditionals
2026-03-01 15:12:33 +01:00
turboderp
18b2a23d8a
chat.py: Fix error message
2026-03-01 15:10:22 +01:00
turboderp
b0cfe46702
Config: Allow for interpreting config key with incorrect data type as missing key (for weirdly implemented layerwise RoPE settings in some models)
2026-03-01 03:16:32 +01:00
turboderp
489b3aab12
BlockSparseMLP: Allow loading combined experts tensors also when gate and up are not fused
2026-03-01 03:13:56 +01:00
turboderp
4bdd22ea77
BlockSparseMLP: Make sure bias is always applied during calibration
2026-03-01 03:13:03 +01:00
turboderp
f7ccb524e7
Attn: Support headwise gate
2026-03-01 03:12:03 +01:00
turboderp
447c8bb522
Build actions: Add torch 2.10.0 wheels
2026-02-28 23:53:15 +01:00
turboderp
8ef7f4b5dd
Linear: Allow fusing linear layers during unquantized model load
2026-02-22 22:43:34 +01:00
turboderp
c1b16d2fc9
Loader: Allow checking for lists of tensor groups
2026-02-22 22:42:30 +01:00
turboderp
ea1fe0ccea
Cleanup
2026-02-22 15:14:57 +01:00
turboderp
ed5bad7235
Alias __nv_bfloat16 -> bfloat16
2026-02-17 21:24:41 +01:00
turboderp
b2b6f37e12
perf.py: Error out if test length > cache size
2026-02-17 20:04:13 +01:00
turboderp
3f9c053227
Merge pull request #141
Add tensor parallel support for MiniMax M2 Q/K norms
2026-02-16 01:24:34 +01:00
turboderp
abb083ceb8
Merge pull request #103 from mratsim/patch-1
Add size estimation script for model tensors size
2026-02-15 17:58:50 +01:00
turboderp
ae3645c455
Merge pull request #147 from lesj0610/feat/hf-chat-template-compat
Tokenizer: robust HF chat template kwargs and output compatibility
2026-02-15 17:58:03 +01:00
turboderp
eca621af79
Merge remote-tracking branch 'origin/dev' into dev
2026-02-15 17:56:31 +01:00
turboderp
1744361cc2
Merge pull request #148 from lesj0610/fix/exaone4-swa-layer-types
exaone4: use layer_types as source of truth for SWA layer mapping
2026-02-15 17:55:08 +01:00
turboderp
44f70da0f9
Merge pull request #149 from MikeRoz47/dev
Add optional arg to compare_q.py for saving plot files
2026-02-15 17:53:55 +01:00
MikeRoz47
52c2f5794d
Add optional arg to compare_q to allow it to save plots rather than show them
2026-02-15 16:41:18 +00:00
lesj0610
5c076e5f2a
exaone4: prefer layer_types over pattern for SWA layer mapping
2026-02-12 01:48:52 +09:00
lesj0610
019d965eb6
tokenizer: harden HF chat template compatibility and kwargs passthrough
2026-02-12 01:25:30 +09:00
turboderp
701afb9294
Bump to v0.0.22
v0.0.22
2026-02-10 17:48:24 +01:00
turboderp
89b841dd8a
safetensors_alt: Allow writing bfloat16 tensors
2026-02-10 17:47:44 +01:00
turboderp
6e4202eade
Bump to v0.0.21
v0.0.21
2026-02-09 22:19:02 +01:00
turboderp
f9a7448366
Merge branch 'refs/heads/st_test' into dev
2026-02-09 04:35:00 +01:00
turboderp
d3e02500e0
Sigmoid+proj kernel: fix regression (Qwen3-Next)
2026-02-09 04:34:37 +01:00
turboderp
d85690204a
Replacement safetensors lib for quantization
2026-01-27 00:52:54 +01:00
turboderp
428a082276
Add performance test
2026-01-22 23:28:53 +01:00
turboderp
91a11853cd
Update README.md
2026-01-22 23:27:23 +01:00
turboderp
96ba966ad9
Bump to v0.0.20
v0.0.20
2026-01-19 23:21:59 +01:00
turboderp
0ecc37bf97
Fix ComboSampler init when initializing as greedy
2026-01-19 22:57:19 +01:00
turboderp
75ee2c78c3
Add Qwen2_5_VLForConditionalGeneration, refactor HCXVisionV2VisionModel as subclass of Qwen2_5VLVisionModel
2026-01-19 22:48:49 +01:00
turboderp
5a6975747f
Bump to v0.0.19
v0.0.19
2026-01-16 23:28:09 +01:00
turboderp
c39616a7b5
Merge pull request #125 from amanwalksdownthestreet/fix-arch-suffix-parsing
...
arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.)
2026-01-14 22:11:43 +01:00
turboderp
f21b92e978
Add Adaptive-P sampler
2026-01-14 21:58:34 +01:00
turboderp
0d09af403a
Diversity test: use greedy sampling for extraction
2026-01-14 21:40:31 +01:00
Jo-Philipp Wich
4845c8fa25
Add tensor parallel support for MiniMax M2 Q/K norms
MiniMax M2 uses Q/K RMSNorm with span_heads=True, which normalizes
across ALL heads at each sequence position. When using tensor
parallelism, heads are split across devices, so each device only
sees a subset of heads and computes incorrect local variance.
The fix follows vLLM's approach:
- Compute local sum of squares on each TP rank
- All-reduce the sum across ranks
- Divide by global dimension to get true global mean
- Apply normalization with corrected global variance
Key changes:
- attn.py: Add apply_qk_norms_tp() method with variance all-reduce
- attn.py: Modify tp_export/tp_import to handle span_heads norms
- rmsnorm.py: Preserve span_heads in tp_export, handle 1D tensors in split
- minimax_m2.py: Enable TP support (supports_tp: True)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 10:38:59 +00:00
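The TP fix described in the commit above (local sum of squares, all-reduce across ranks, divide by the global dimension, then normalize) can be sketched as follows. This is a minimal illustration, not the actual `apply_qk_norms_tp()` implementation; the function name, the `all_reduce` callable, and the sharding layout are assumptions for the example.

```python
import torch

def rmsnorm_tp_span_heads(x_local, weight_local, global_dim, eps=1e-6, all_reduce=None):
    """Sketch of span-heads RMSNorm under tensor parallelism.

    x_local      -- this rank's shard of the activations (last dim is the local slice)
    weight_local -- this rank's shard of the norm weights
    global_dim   -- full, unsharded size of the normalized dimension
    all_reduce   -- callable that sums a tensor across TP ranks
                    (e.g. a wrapper around torch.distributed.all_reduce);
                    None means single-rank operation
    """
    # 1) Local sum of squares on this rank only.
    sumsq = x_local.float().pow(2).sum(dim=-1, keepdim=True)
    # 2) All-reduce the sums so every rank holds the global total.
    if all_reduce is not None:
        sumsq = all_reduce(sumsq)
    # 3) Divide by the global dimension to get the true global mean square;
    #    using the local dimension here is exactly the bug the commit fixes.
    mean_sq = sumsq / global_dim
    # 4) Normalize with the corrected global variance, then apply the weights.
    return x_local * torch.rsqrt(mean_sq + eps) * weight_local
```

With two simulated ranks, each rank's output matches the corresponding slice of a single-device RMSNorm over the full tensor, which is the invariant the fix restores.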
turboderp
e839152802
Add diversity test
2026-01-11 19:12:04 +01:00
turboderp
3186dca9da
generator: Pad token mask when output layer is padded
2026-01-11 19:11:26 +01:00
turboderp
9043690801
generator: Free recurrent state after job completed (prevent memory leak with large job queue)
2026-01-11 17:38:15 +01:00
turboderp
e69d91b12b
model_init: Add sampling args default overrides
2026-01-11 16:38:33 +01:00
turboderp
6b31fc00f5
Add HF tokenizer helper, refactor example
2026-01-11 12:49:12 +01:00
turboderp
288a98f5e3
Refactor sampler args for examples
2026-01-11 12:33:27 +01:00