turboderp
89b841dd8a
safetensors_alt: Allow writing bfloat16 tensors
2026-02-10 17:47:44 +01:00
turboderp
6e4202eade
Bump to v0.0.21
v0.0.21
2026-02-09 22:19:02 +01:00
turboderp
f9a7448366
Merge branch 'refs/heads/st_test' into dev
2026-02-09 04:35:00 +01:00
turboderp
d3e02500e0
Sigmoid+proj kernel: fix regression (Qwen3-Next)
2026-02-09 04:34:37 +01:00
turboderp
d85690204a
Replacement safetensors lib for quantization
2026-01-27 00:52:54 +01:00
turboderp
428a082276
Add performance test
2026-01-22 23:28:53 +01:00
turboderp
91a11853cd
Update README.md
2026-01-22 23:27:23 +01:00
turboderp
96ba966ad9
Bump to v0.0.20
v0.0.20
2026-01-19 23:21:59 +01:00
turboderp
0ecc37bf97
Fix ComboSampler init when initializing as greedy
2026-01-19 22:57:19 +01:00
turboderp
75ee2c78c3
Add Qwen2_5_VLForConditionalGeneration, refactor HCXVisionV2VisionModel as subclass of Qwen2_5VLVisionModel
2026-01-19 22:48:49 +01:00
turboderp
5a6975747f
Bump to v0.0.19
v0.0.19
2026-01-16 23:28:09 +01:00
turboderp
c39616a7b5
Merge pull request #125 from amanwalksdownthestreet/fix-arch-suffix-parsing
...
arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.)
2026-01-14 22:11:43 +01:00
turboderp
f21b92e978
Add Adaptive-P sampler
2026-01-14 21:58:34 +01:00
turboderp
0d09af403a
Diversity test: use greedy sampling for extraction
2026-01-14 21:40:31 +01:00
Jo-Philipp Wich
4845c8fa25
Add tensor parallel support for MiniMax M2 Q/K norms
...
MiniMax M2 uses Q/K RMSNorm with span_heads=True, which normalizes
across ALL heads at each sequence position. When using tensor
parallelism, heads are split across devices, so each device only
sees a subset of heads and computes incorrect local variance.
The fix follows vLLM's approach:
- Compute local sum of squares on each TP rank
- All-reduce the sum across ranks
- Divide by global dimension to get true global mean
- Apply normalization with corrected global variance
Key changes:
- attn.py: Add apply_qk_norms_tp() method with variance all-reduce
- attn.py: Modify tp_export/tp_import to handle span_heads norms
- rmsnorm.py: Preserve span_heads in tp_export, handle 1D tensors in split
- minimax_m2.py: Enable TP support (supports_tp: True)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 10:38:59 +00:00
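The four steps in the commit message above (local sum of squares, all-reduce, divide by the global dimension, normalize) can be sketched in plain Python. This is a hand-rolled simulation under stated assumptions, not the actual exllamav3 `attn.py`/`rmsnorm.py` code: the "ranks" are just Python lists, the all-reduce is an ordinary `sum()`, and the learned RMSNorm weight is omitted. It shows why dividing the reduced sum by the *global* dimension, rather than each shard's local size, recovers the same result as normalizing the unsharded tensor.

```python
import math

def rms_norm(x, eps=1e-6):
    # Reference RMSNorm over the full (unsharded) vector.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]

def rms_norm_tp(shards, eps=1e-6):
    # RMSNorm where the heads are split across TP ranks (one shard per
    # rank); returns the normalized shards.
    global_dim = sum(len(s) for s in shards)
    # 1) Each rank computes a local sum of squares over its shard.
    local_sq = [sum(v * v for v in s) for s in shards]
    # 2) All-reduce: after the reduction every rank holds the global sum.
    global_sq = sum(local_sq)
    # 3) Divide by the global dimension (not the shard size) to get the
    #    true global mean of squares.
    mean_sq = global_sq / global_dim
    # 4) Normalize each shard with the corrected global variance.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [[v * inv_rms for v in s] for s in shards]

x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5]
ref = rms_norm(x)
tp = rms_norm_tp([x[:3], x[3:]])  # two "ranks", half the heads each
flat = tp[0] + tp[1]
assert all(abs(a - b) < 1e-12 for a, b in zip(ref, flat))
```

Naively normalizing each shard in isolation (dividing by `len(s)`) computes a per-shard variance and diverges from the reference whenever the shards have different statistics, which is the regression the commit fixes for `span_heads=True` norms.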
turboderp
e839152802
Add diversity test
2026-01-11 19:12:04 +01:00
turboderp
3186dca9da
generator: Pad token mask when output layer is padded
2026-01-11 19:11:26 +01:00
turboderp
9043690801
generator: Free recurrent state after job completed (prevent memory leak with large job queue)
2026-01-11 17:38:15 +01:00
turboderp
e69d91b12b
model_init: Add sampling args default overrides
2026-01-11 16:38:33 +01:00
turboderp
6b31fc00f5
Add HF tokenizer helper, refactor example
2026-01-11 12:49:12 +01:00
turboderp
288a98f5e3
Refactor sampler args for examples
2026-01-11 12:33:27 +01:00
turboderp
27c68d4e65
Update README.md
2026-01-10 15:59:46 +01:00
turboderp
539410a2a3
Support NanoChatForCausalLM
2026-01-10 15:59:08 +01:00
turboderp
3ecb9f54fb
Merge pull request #136 from mindkrypted/feature/support-solar-open-moe
...
Add support for SolarOpenMoE architecture
2026-01-10 10:55:36 +01:00
mindkrypted
fd8659a6c3
Add support for SolarOpenMoE architecture
2026-01-07 13:45:23 -05:00
turboderp
703b05ab52
Update README.md
2026-01-06 16:08:23 +01:00
turboderp
a17d1a4334
Add HCXVisionV2ForCausalLM architecture
2026-01-06 16:01:54 +01:00
turboderp
7de8641fce
Attn: Add varlen mode
2026-01-06 16:01:54 +01:00
turboderp
a026b32df3
Support IQuestCoderForCausalLM
2026-01-04 12:31:58 +01:00
turboderp
6e75e7b151
chat.py: Fix for models with eos_token_id=null
2026-01-04 02:02:10 +01:00
turboderp
227621e49e
Support HyperCLOVAXForCausalLM
2026-01-03 03:22:50 +01:00
turboderp
a92cf0a13a
Attn: Support custom softmax scale in SDPA mode
2026-01-03 03:22:13 +01:00
turboderp
cff5fd542c
Embedding: Support embedding multiplier
2026-01-03 03:21:55 +01:00
turboderp
452803e73d
Olmo3: Use default RoPE type for SWA layers
2025-12-26 21:38:56 +01:00
turboderp
195d01657a
RoPE: Allow RoPE type override
2025-12-26 21:38:34 +01:00
turboderp
e8b77bba4a
chat.py: Fix prompt tokens/s display
2025-12-25 23:18:50 +01:00
turboderp
80907797a5
chat.py: Add debug mode
2025-12-25 23:18:25 +01:00
turboderp
f0ea2ca858
Linear: Support new FP8 scale format
2025-12-23 21:05:05 +01:00
turboderp
2698a83022
RoPE: Let arch override theta key name
2025-12-23 21:04:41 +01:00
amanwalksdownthestreet
65cfaf3c60
arch_list: Strip NVIDIA arch suffixes (sm_120a, sm_90a, etc.)
2025-12-16 23:06:34 -07:00
turboderp
a32e2219af
Allow -hb 16 while quantizing
2025-12-13 20:55:25 +01:00
turboderp
104268521c
Support Olmo3ForCausalLM
2025-12-13 20:49:03 +01:00
turboderp
bd0f26cd0e
Fix comments
2025-12-10 21:47:30 +01:00
turboderp
1b7009c5b8
Merge remote-tracking branch 'origin/master'
v0.0.18
2025-12-10 10:43:17 +01:00
turboderp
f9d0e6038f
Bump to v0.0.18
2025-12-10 10:42:41 +01:00
turboderp
d8be5d638f
chat.py: Read all stop conditions from config.json
2025-12-10 00:53:45 +01:00
turboderp
9b75bc5f58
Support Ministral3ForCausalLM
2025-12-10 00:53:22 +01:00
turboderp
9663357c4f
Convert: Print some more RoPE debug info
2025-12-10 00:52:49 +01:00
turboderp
24caf2c762
RoPE: Accept partial_rotary_factor in rope_parameters
2025-12-10 00:52:29 +01:00
kingbri
e49c02a3aa
Actions: Add builds for torch 2.9
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-12-09 15:03:25 -05:00