7261 Commits

Author SHA1 Message Date
Baizhou Zhang
a93065679b Revert "bugfix for weight loading for qwen3-next" (#21496) 2026-03-26 16:17:18 -07:00
SevenJ
2e65c27b29 Api add flush cache timeout (#21413)
Signed-off-by: root <wenjun7j@gmail.com>
2026-03-26 14:44:37 -07:00
Qiaolin Yu
8c3ccef2d9 Fix Kimi K2.5 dp attention+ spec decoding launch crash (#21391) 2026-03-26 14:40:26 -07:00
satyamk7054
be0cca5596 Use torch.addmm instead of separate mm and add_ calls for LoRA torch.native (#20562)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
2026-03-26 14:35:20 -07:00
satyamk7054
e59ea4f6e9 fix: torch-native LoRA for multi-adapter case (#20564)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
2026-03-26 14:34:16 -07:00
Liangsheng Yin
fb90c9d298 [Test] Consolidate eval accuracy test mixins into eval_accuracy_kit (#21047) 2026-03-26 14:26:46 -07:00
Liangsheng Yin
e5b7650353 Fix UnboundLocalError when DetokenizerManager constructor fails (#21471) 2026-03-26 13:00:16 -07:00
Ho-Ren (Jack) Chuang
4b5f63e1b8 FIX: (NSA) Compute topk_indices_offset when NSA prefill flashmla_sparse is used with FP8 KV cache (#20606)
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
2026-03-26 12:50:50 -07:00
jianzhao-xu
3867c6431a Fix bug in dbrx model (#21445)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
2026-03-26 11:23:30 -07:00
shuwenn
646573e4e8 fix: use get_rope_config() to support models without rope_parameters (#21135) 2026-03-26 11:22:12 -07:00
McZyWu
0906e45cec bugfix for weight loading for qwen3-next (#21313) 2026-03-26 21:21:00 +08:00
Mick
35720d9969 [diffusion] fix: fix qwen-image with nunchaku (#21415) 2026-03-26 16:31:44 +08:00
Anant Sharma
f289d173aa [Deps] Bump xgrammar to 0.1.32 (#21032) 2026-03-26 01:22:37 -07:00
Chen, Zhentao
fd535942ac [AMD]Integrate aiter's fused_topk for softmax scoring in topk function (#21421)
Co-authored-by: Chen, Todd <zhenchen@amd.com>
2026-03-26 00:57:56 -07:00
R0CKSTAR
a305964159 [MLX] Add native MLX execution backend for Apple Silicon Mac (#20342)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-03-26 00:09:17 -07:00
Xiaoyu Zhang
7ca015fe65 [Diffusion] Refactor diffusion JIT kernel test layout and narrow CI triggers (#21385) 2026-03-26 15:02:02 +08:00
Liangsheng Yin
79db3bec34 [CI] Add PID namespace and ps auxf diagnostics to killall.py (#21401) 2026-03-25 23:57:15 -07:00
MARATRIX
01ccdb91b1 [Fix] Add EPLB rebalance support for Kimi K2.5 (#21004)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
2026-03-25 21:01:40 -07:00
ori
f142608408 [MUSA] apply_vocab_mask support musa device (#21296) 2026-03-25 21:00:58 -07:00
MARATRIX
f420b9b4a5 [MUSA][Feature] Enable Piecewise CUDA Graph support for MUSA platform (#20758)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
2026-03-25 21:00:28 -07:00
R0CKSTAR
abf4f1a47a [MPS] Add StreamContext stub (#20782)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-03-25 20:59:51 -07:00
R0CKSTAR
02521420b3 [MPS] Support sglang.check_env (#20753)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-03-25 20:59:25 -07:00
gjsheu
d9e96153de [NPU] Support Hybrid KV Cache for Ascend backend (#18032)
Co-authored-by: gengjinsong <gengjinsong@huawei.com>
2026-03-26 11:27:36 +08:00
Simo Lin
b835309f0c Reland: compute M-RoPE positions for preprocessed VL inputs (#21244) 2026-03-25 20:12:43 -07:00
DarkSharpness
bb29893689 [Fix] Try to fix nvcc compilation error (#21246) 2026-03-26 10:59:36 +08:00
Aurick Qiao
a34e9ed64a Add adjusted_filter_batch (#21260) 2026-03-26 10:59:05 +08:00
Aurick Qiao
53c1d8e963 Fix customized_info offset truncation (#21262) 2026-03-26 10:57:51 +08:00
Sam Shleifer
1100b9865c Fix MxInt4 MoE returning wrong output variable (#21348) 2026-03-26 10:57:09 +08:00
Xiaoyu Zhang
6f2b51ade1 [Diffusion] Optimize diffusion Triton rotary embedding by processing multiple heads per token (#21387) 2026-03-26 08:59:25 +08:00
Hubert Lu
7c7b2a8c97 [Bugfix] Lazy-import CuteDSL KDA kernel to fix AMD/ROCm startup crash (#21428) 2026-03-25 16:37:26 -07:00
Liangsheng Yin
75682f1d2f Remove noisy streaming backlog warning log (#21432) 2026-03-25 16:25:16 -07:00
Liangsheng Yin
4dd4e06f1d [CI] Fix resource leak when setUpClass fails (#21338) 2026-03-25 16:22:44 -07:00
Xiaoyu Zhang
68f7f00174 [Diffusion] Speed up Qwen select01 Triton modulation kernels (#21318) 2026-03-25 20:48:39 +08:00
Mick
04eb72801f [diffusion] CI: add performance tracking job to nightly (#21091) 2026-03-25 19:01:33 +08:00
Xiaoyu Zhang
689e9ef05c [Diffusion] Add AKO4ALL kernel optimization skill (#21323) 2026-03-25 18:46:21 +08:00
Xiaoyu Zhang
e4ad10520b [diffusion] Skip automatic Wan/MOVA DiT layerwise offload on high-end GPUs (#21248) 2026-03-25 18:45:30 +08:00
DarkSharpness
3d2a61cbf6 [Chore] Clean up JIT compilation flags (#21022) 2026-03-25 18:08:40 +08:00
Liangsheng Yin
4480e6c237 [CI] Add retry loop to killall_sglang GPU cleanup verification (#21393) 2026-03-25 02:16:20 -07:00
YC Yen-Ching Tseng
c494e47843 [AMD] Fix stage-b-test-small-1-gpu-amd (test_tool_choice.py) (#19868) 2026-03-25 01:10:21 -07:00
Alison Shao
5297a3cb46 [CI] Rewrite killall_sglang as Python with CI/local dual mode (#21331)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-03-24 23:54:01 -07:00
Mick
6cc5717e8a [diffusion] doc: update quantization.md (#21356) 2026-03-25 14:48:38 +08:00
Alison Shao
17e41cfb21 Fix RDMA device mapping for non-zero GPU indices in disaggregation tests (#21303)
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-03-24 22:56:57 -07:00
Duyi-Wang
61a902ce88 [AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040) 2026-03-24 22:53:57 -07:00
kk
86e2622097 [AMD] Add mha fp8-kv support (#21253)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-24 22:38:02 -07:00
Baizhou Zhang
2b75fed0dd Workaround of DSA performance drop on B200 + DP (#21337) 2026-03-24 22:21:07 -07:00
Ke Bao
92492896a5 Fix disaggregation test bootstrap port conflict (#21271) 2026-03-24 21:14:41 -07:00
Ke Bao
c1d930c028 Increase flush cache timeout in hicache CI (#21305) 2026-03-24 19:00:59 -07:00
Yuan Luo
f273ba1ccc [KDA] Support CuTeDSL KDA decode kernel (#21203)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-03-25 09:47:09 +08:00
DarkSharpness
dfc15b78b0 [misc] clean up kernel API (#21325) 2026-03-25 09:10:23 +08:00
ykcai-daniel
281fe10b5e [diffusion] quant: support nvfp4 for Flux.2 (#20137)
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-25 08:28:25 +08:00