Commit Graph

732 Commits

Author SHA1 Message Date
turboderp
89b841dd8a safetensors_alt: Allow writing bfloat16 tensors 2026-02-10 17:47:44 +01:00
turboderp
6e4202eade Bump to v0.0.21 v0.0.21 2026-02-09 22:19:02 +01:00
turboderp
f9a7448366 Merge branch 'refs/heads/st_test' into dev 2026-02-09 04:35:00 +01:00
turboderp
d3e02500e0 Sigmoid+proj kernel: fix regression (Qwen3-Next) 2026-02-09 04:34:37 +01:00
turboderp
d85690204a Replacement safetensors lib for quantization 2026-01-27 00:52:54 +01:00
turboderp
428a082276 Add performance test 2026-01-22 23:28:53 +01:00
turboderp
91a11853cd Update README.md 2026-01-22 23:27:23 +01:00
turboderp
96ba966ad9 Bump to v0.0.20 v0.0.20 2026-01-19 23:21:59 +01:00
turboderp
0ecc37bf97 Fix ComboSampler init when initializing as greedy 2026-01-19 22:57:19 +01:00
turboderp
75ee2c78c3 Add Qwen2_5_VLForConditionalGeneration, refactor HCXVisionV2VisionModel as subclass of Qwen2_5VLVisionModel 2026-01-19 22:48:49 +01:00
turboderp
5a6975747f Bump to v0.0.19 v0.0.19 2026-01-16 23:28:09 +01:00
turboderp
c39616a7b5 Merge pull request #125 from amanwalksdownthestreet/fix-arch-suffix-parsing
arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.)
2026-01-14 22:11:43 +01:00
turboderp
f21b92e978 Add Adaptive-P sampler 2026-01-14 21:58:34 +01:00
turboderp
0d09af403a Diversity test: use greedy sampling for extraction 2026-01-14 21:40:31 +01:00
Jo-Philipp Wich
4845c8fa25 Add tensor parallel support for MiniMax M2 Q/K norms
MiniMax M2 uses Q/K RMSNorm with span_heads=True, which normalizes
across ALL heads at each sequence position. When using tensor
parallelism, heads are split across devices, so each device only
sees a subset of heads and computes an incorrect local variance.

The fix follows vLLM's approach:
- Compute local sum of squares on each TP rank
- All-reduce the sum across ranks
- Divide by global dimension to get true global mean
- Apply normalization with corrected global variance

Key changes:
- attn.py: Add apply_qk_norms_tp() method with variance all-reduce
- attn.py: Modify tp_export/tp_import to handle span_heads norms
- rmsnorm.py: Preserve span_heads in tp_export, handle 1D tensors in split
- minimax_m2.py: Enable TP support (supports_tp: True)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 10:38:59 +00:00
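The variance all-reduce described in the commit above can be sketched in pure Python. This is a minimal illustration, not the repository's actual code: ranks are simulated as list shards, the all-reduce is an ordinary sum, and all names (`rmsnorm_tp`, `x_shards`) are hypothetical.

```python
import math

def rmsnorm_tp(x_shards, weight_shards, eps=1e-6):
    """Span-heads RMSNorm under simulated tensor parallelism.

    Each entry of x_shards is the slice of the hidden vector held by
    one TP rank. The RMS statistic must cover the FULL dimension, so
    each rank contributes a local sum of squares that is all-reduced
    (here: a plain sum) before dividing by the global dimension.
    """
    global_dim = sum(len(s) for s in x_shards)
    # 1) local sum of squares on each "rank"
    local_sq = [sum(v * v for v in s) for s in x_shards]
    # 2) all-reduce (sum) across ranks
    global_sq = sum(local_sq)
    # 3) divide by the global dimension for the true mean of squares
    inv_rms = 1.0 / math.sqrt(global_sq / global_dim + eps)
    # 4) apply normalization locally with the corrected statistic
    return [[v * inv_rms * w for v, w in zip(s, ws)]
            for s, ws in zip(x_shards, weight_shards)]
```

Splitting the input across two simulated ranks yields the same result as running on a single rank, which is exactly the invariant the fix restores.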
turboderp
e839152802 Add diversity test 2026-01-11 19:12:04 +01:00
turboderp
3186dca9da generator: Pad token mask when output layer is padded 2026-01-11 19:11:26 +01:00
turboderp
9043690801 generator: Free recurrent state after job completed (prevent memory leak with large job queue) 2026-01-11 17:38:15 +01:00
turboderp
e69d91b12b model_init: Add sampling args default overrides 2026-01-11 16:38:33 +01:00
turboderp
6b31fc00f5 Add HF tokenizer helper, refactor example 2026-01-11 12:49:12 +01:00
turboderp
288a98f5e3 Refactor sampler args for examples 2026-01-11 12:33:27 +01:00
turboderp
27c68d4e65 Update README.md 2026-01-10 15:59:46 +01:00
turboderp
539410a2a3 Support NanoChatForCausalLM 2026-01-10 15:59:08 +01:00
turboderp
3ecb9f54fb Merge pull request #136 from mindkrypted/feature/support-solar-open-moe
Add support for SolarOpenMoE architecture
2026-01-10 10:55:36 +01:00
mindkrypted
fd8659a6c3 Add support for SolarOpenMoE architecture 2026-01-07 13:45:23 -05:00
turboderp
703b05ab52 Update README.md 2026-01-06 16:08:23 +01:00
turboderp
a17d1a4334 Add HCXVisionV2ForCausalLM architecture 2026-01-06 16:01:54 +01:00
turboderp
7de8641fce Attn: Add varlen mode 2026-01-06 16:01:54 +01:00
turboderp
a026b32df3 Support IQuestCoderForCausalLM 2026-01-04 12:31:58 +01:00
turboderp
6e75e7b151 chat.py: Fix for models with eos_token_id=null 2026-01-04 02:02:10 +01:00
turboderp
227621e49e Support HyperCLOVAXForCausalLM 2026-01-03 03:22:50 +01:00
turboderp
a92cf0a13a Attn: Support custom softmax scale in SDPA mode 2026-01-03 03:22:13 +01:00
turboderp
cff5fd542c Embedding: Support embedding multiplier 2026-01-03 03:21:55 +01:00
turboderp
452803e73d Olmo3: Use default RoPE type for SWA layers 2025-12-26 21:38:56 +01:00
turboderp
195d01657a RoPE: Allow RoPE type override 2025-12-26 21:38:34 +01:00
turboderp
e8b77bba4a chat.py: Fix prompt tokens/s display 2025-12-25 23:18:50 +01:00
turboderp
80907797a5 chat.py: Add debug mode 2025-12-25 23:18:25 +01:00
turboderp
f0ea2ca858 Linear: Support new FP8 scale format 2025-12-23 21:05:05 +01:00
turboderp
2698a83022 RoPE: Let arch override theta key name 2025-12-23 21:04:41 +01:00
amanwalksdownthestreet
65cfaf3c60 arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.) 2025-12-16 23:06:34 -07:00
turboderp
a32e2219af Allow -hb 16 while quantizing 2025-12-13 20:55:25 +01:00
turboderp
104268521c Support Olmo3ForCausalLM 2025-12-13 20:49:03 +01:00
turboderp
bd0f26cd0e Fix comments 2025-12-10 21:47:30 +01:00
turboderp
1b7009c5b8 Merge remote-tracking branch 'origin/master' v0.0.18 2025-12-10 10:43:17 +01:00
turboderp
f9d0e6038f Bump to v0.0.18 2025-12-10 10:42:41 +01:00
turboderp
d8be5d638f chat.py: Read all stop conditions from config.json 2025-12-10 00:53:45 +01:00
turboderp
9b75bc5f58 Support Ministral3ForCausalLM 2025-12-10 00:53:22 +01:00
turboderp
9663357c4f Convert: Print some more RoPE debug info 2025-12-10 00:52:49 +01:00
turboderp
24caf2c762 RoPE: Accept partial_rotary_factor in rope_parameters 2025-12-10 00:52:29 +01:00
kingbri
e49c02a3aa Actions: Add builds for torch 2.9
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-09 15:03:25 -05:00