7855 Commits

Author SHA1 Message Date
Liangsheng Yin
e5b7650353 Fix UnboundLocalError when DetokenizerManager constructor fails (#21471) 2026-03-26 13:00:16 -07:00
Ho-Ren (Jack) Chuang
4b5f63e1b8 FIX: (NSA) Compute topk_indices_offset when NSA prefill flashmla_sparse is used with FP8 KV cache (#20606)
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
2026-03-26 12:50:50 -07:00
jianzhao-xu
3867c6431a Fix bug in dbrx model (#21445)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
2026-03-26 11:23:30 -07:00
shuwenn
646573e4e8 fix: use get_rope_config() to support models without rope_parameters (#21135) 2026-03-26 11:22:12 -07:00
McZyWu
0906e45cec bugfix for weight loading for qwen3-next (#21313) 2026-03-26 21:21:00 +08:00
Mick
35720d9969 [diffusion] fix: fix qwen-image with nunchaku (#21415) 2026-03-26 16:31:44 +08:00
Anant Sharma
f289d173aa [Deps] Bump xgrammar to 0.1.32 (#21032) 2026-03-26 01:22:37 -07:00
Chen, Zhentao
fd535942ac [AMD]Integrate aiter's fused_topk for softmax scoring in topk function (#21421)
Co-authored-by: Chen, Todd <zhenchen@amd.com>
2026-03-26 00:57:56 -07:00
R0CKSTAR
a305964159 [MLX] Add native MLX execution backend for Apple Silicon Mac (#20342)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-03-26 00:09:17 -07:00
Xiaoyu Zhang
7ca015fe65 [Diffusion] Refactor diffusion JIT kernel test layout and narrow CI triggers (#21385) 2026-03-26 15:02:02 +08:00
Liangsheng Yin
79db3bec34 [CI] Add PID namespace and ps auxf diagnostics to killall.py (#21401) 2026-03-25 23:57:15 -07:00
MARATRIX
01ccdb91b1 [Fix] Add EPLB rebalance support for Kimi K2.5 (#21004)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
2026-03-25 21:01:40 -07:00
ori
f142608408 [MUSA] apply_vocab_mask support musa device (#21296) 2026-03-25 21:00:58 -07:00
MARATRIX
f420b9b4a5 [MUSA][Feature] Enable Piecewise CUDA Graph support for MUSA platform (#20758)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
2026-03-25 21:00:28 -07:00
R0CKSTAR
abf4f1a47a [MPS] Add StreamContext stub (#20782)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-03-25 20:59:51 -07:00
R0CKSTAR
02521420b3 [MPS] Support sglang.check_env (#20753)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-03-25 20:59:25 -07:00
gjsheu
d9e96153de [NPU] Support Hybrid KV Cache for Ascend backend (#18032)
Co-authored-by: gengjinsong <gengjinsong@huawei.com>
2026-03-26 11:27:36 +08:00
Simo Lin
b835309f0c Reland: compute M-RoPE positions for preprocessed VL inputs (#21244) 2026-03-25 20:12:43 -07:00
DarkSharpness
bb29893689 [Fix] Try to fix nvcc compilation error (#21246) 2026-03-26 10:59:36 +08:00
Aurick Qiao
a34e9ed64a Add adjusted_filter_batch (#21260) 2026-03-26 10:59:05 +08:00
Aurick Qiao
53c1d8e963 Fix customized_info offset truncation (#21262) 2026-03-26 10:57:51 +08:00
Sam Shleifer
1100b9865c Fix MxInt4 MoE returning wrong output variable (#21348) 2026-03-26 10:57:09 +08:00
Xiaoyu Zhang
6f2b51ade1 [Diffusion] Optimize diffusion Triton rotary embedding by processing multiple heads per token (#21387) 2026-03-26 08:59:25 +08:00
Hubert Lu
7c7b2a8c97 [Bugfix] Lazy-import CuteDSL KDA kernel to fix AMD/ROCm startup crash (#21428) 2026-03-25 16:37:26 -07:00
Liangsheng Yin
75682f1d2f Remove noisy streaming backlog warning log (#21432) 2026-03-25 16:25:16 -07:00
Liangsheng Yin
4dd4e06f1d [CI] Fix resource leak when setUpClass fails (#21338) 2026-03-25 16:22:44 -07:00
Xiaoyu Zhang
68f7f00174 [Diffusion] Speed up Qwen select01 Triton modulation kernels (#21318) 2026-03-25 20:48:39 +08:00
Mick
04eb72801f [diffusion] CI: add performance tracking job to nightly (#21091) 2026-03-25 19:01:33 +08:00
Xiaoyu Zhang
689e9ef05c [Diffusion] Add AKO4ALL kernel optimization skill (#21323) 2026-03-25 18:46:21 +08:00
Xiaoyu Zhang
e4ad10520b [diffusion] Skip automatic Wan/MOVA DiT layerwise offload on high-end GPUs (#21248) 2026-03-25 18:45:30 +08:00
DarkSharpness
3d2a61cbf6 [Chore] Clean up JIT compilation flags (#21022) 2026-03-25 18:08:40 +08:00
Liangsheng Yin
4480e6c237 [CI] Add retry loop to killall_sglang GPU cleanup verification (#21393) 2026-03-25 02:16:20 -07:00
YC Yen-Ching Tseng
c494e47843 [AMD] Fix stage-b-test-small-1-gpu-amd (test_tool_choice.py) (#19868) 2026-03-25 01:10:21 -07:00
Alison Shao
5297a3cb46 [CI] Rewrite killall_sglang as Python with CI/local dual mode (#21331)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-03-24 23:54:01 -07:00
Mick
6cc5717e8a [diffusion] doc: update quantization.md (#21356) 2026-03-25 14:48:38 +08:00
Alison Shao
17e41cfb21 Fix RDMA device mapping for non-zero GPU indices in disaggregation tests (#21303)
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-03-24 22:56:57 -07:00
Duyi-Wang
61a902ce88 [AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040) 2026-03-24 22:53:57 -07:00
kk
86e2622097 [AMD] Add mha fp8-kv support (#21253)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-24 22:38:02 -07:00
Baizhou Zhang
2b75fed0dd Workaround of DSA performance drop on B200 + DP (#21337) 2026-03-24 22:21:07 -07:00
Ke Bao
92492896a5 Fix disaggregation test bootstrap port conflict (#21271) 2026-03-24 21:14:41 -07:00
Ke Bao
c1d930c028 Increase flush cache timeout in hicache CI (#21305) 2026-03-24 19:00:59 -07:00
Yuan Luo
f273ba1ccc [KDA] Support CuTeDSL KDA decode kernel (#21203)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-03-25 09:47:09 +08:00
DarkSharpness
dfc15b78b0 [misc] clean up kernel API (#21325) 2026-03-25 09:10:23 +08:00
ykcai-daniel
281fe10b5e [diffusion] quant: support nvfp4 for Flux.2 (#20137)
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-25 08:28:25 +08:00
Liangsheng Yin
37420dce0b [CI] Enable failfast (-f) by default in run_suite.py (#21330) 2026-03-24 17:04:42 -07:00
Baizhou Zhang
1046dbe038 [Fix] Fix trtllm fp4 moe kernel not found error (#21343) 2026-03-24 16:38:05 -07:00
Mohammad Miadh Angkad
bbe25b2412 Use FlashInfer tinygemm for GPT-OSS MoE router on SM90+ (#20755)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-03-24 15:00:18 -07:00
Jiaxin(Jackson) Deng
c4db64c16b Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
2026-03-24 13:48:26 -07:00
Jonah Bernard
a32e0d57e7 [LoRA][III] Add LoRA support for MoE layers and enable TP (#14105)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-24 13:14:14 -07:00
Zhang Yiyang (SII)
a3ed2e4d29 [diffusion][CI] Add CI for MOVA model inference (#20430)
Co-authored-by: Luo <139519292+0-693@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-24 21:28:16 +03:00