7183 Commits

Author SHA1 Message Date
McZyWu
4641e5a3d2 [NPU] enhance accuracy for model minimaxm2 from 16.5% to 95.5% (#17695) 2026-03-23 19:06:38 +08:00
XDaoHong
2d288ba8c9 [Bugfix] fix npu get kv_item_lens in PD separation when use ASCEND_US… (#15852)
Co-authored-by: ZhengdQin <zhengdqin@gmail.com>
2026-03-23 15:56:47 +08:00
kpham-sgl
59cb9a9da6 [Spec][Ngram] 3/N: Fix synchronization issues in Ngram.cpp (#21186)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:37:07 -07:00
Lianmin Zheng
814202704b ci: unify PR test suite naming (#21187) 2026-03-23 00:18:45 -07:00
yudian0504
3d312643b9 [BUGFIX] Fix CP residual size mismatch crash when tp_size == attn_cp_size (#21170) 2026-03-23 00:12:58 -07:00
Lianmin Zheng
7757a9ddd0 ci: remove IS_BLACKWELL env var; auto-detect Blackwell (#21118) 2026-03-22 23:44:48 -07:00
kpham-sgl
bc4aaab6a1 [Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:35:25 -07:00
Cheng Wan
d6b12c401c Revert "[bugfix] Fix PPMissingLayer AttributeError when Using PP" (#21189) 2026-03-22 23:28:36 -07:00
Zhiqiang Xie
13f4f010d8 HiSparse for Sparse Attention (#20343) 2026-03-22 23:09:31 -07:00
Lianmin Zheng
7050011dee Enable JIT clamp_position and resolve_future_token_ids on ROCm (#21116) 2026-03-22 22:33:54 -07:00
Yuhao Yang
32a85ef128 [diffusion] CI: auto-skip diffusion tests when required pipeline class is missing from diffusers (#21139) 2026-03-23 12:15:21 +08:00
Mohammad Miadh Angkad
d8a5b1dbaf [Bugfix] Work around FlashInfer unified transport issue on GB (#20039) 2026-03-22 21:10:25 -07:00
Xiaoyu Zhang
a94d67d44b [SKILL] fix(bench): Support model-specific DenoisingStage variants in… (#21137) 2026-03-23 12:08:00 +08:00
fanghao
2b47bd3a34 [Bug Fix] Fix non-streaming request abort failure when --enable-metrics is enabled (#20625) 2026-03-22 19:58:49 -07:00
yuumn
889e8489e9 [diffusion] model: support FireRed-Image-Edit (#20862)
Co-authored-by: yuumn <1010797597@qq.com>
2026-03-23 10:27:07 +08:00
Cishoon
999bad5aba Fix VRAM leak in overlap scheduling with structured output (#20640) (#20697) 2026-03-22 17:07:39 -07:00
Yilong Zhao
343998865a perf: pad max-num-requests in decode cuda graph for higher coverage (#20978) 2026-03-22 17:06:16 -07:00
Ziang Li
ce0541404f [FlashInfer v0.6.6][RL] Support fp8-last-n-bf16 RL for flashinfer_trtllm_routed moe backend (#20214) 2026-03-22 11:17:01 -07:00
Xiaoyu Zhang
c1fe5de69c [Diffusion] Clean up diffusion Triton kernels and modernize custom op registration (#21122)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-22 22:38:57 +08:00
Xiaoyu Zhang
766d225fcc Add SGLang CUDA crash API logging inspired by FlashInfer (#20910) 2026-03-22 16:39:40 +08:00
Shunkangz
bb737d7a82 Support Qwen3 MoE context parallel (#18233)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
2026-03-22 01:27:20 -07:00
kpham-sgl
6d160b42bb [Spec][Ngram] 1/N: Reference based Speculative Decoding refactor (#20393) 2026-03-22 00:55:10 -07:00
Xiaoyu Zhang
1b65c0d259 [Diffusion] Fix torch.compile RMSNorm fallback for Z-Image (#20962)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-22 15:38:22 +08:00
Bowen Li
3bc595acbc [FlashAttn] Add fused triton kernel for normal_decode_set_metadata (#20778)
Co-authored-by: kinza99 <dh18324568312@163.com>
2026-03-22 15:12:29 +08:00
Mick
f7fc2c8592 [diffusion] fix: fix accuracy for some image models (#20679) 2026-03-22 15:11:57 +08:00
shuwenn
2fba2bdad1 refactor: Remove dead code from utils/common.py (#20668) 2026-03-21 21:54:17 -07:00
Lianmin Zheng
76e4a8662c Replace clamp_position with JIT kernel + platform dispatch (#20999)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:26:26 -07:00
Changyi Yang
c1794e2944 [diffusion] fix: fix Sana corrupted output by removing spurious QK norm layers (#20656)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-22 12:06:49 +08:00
Yuhao Yang
c32e35a2a5 [diffusion] CI: fix picklingerror for diffusion models using diffusers backend (#20854) 2026-03-22 11:51:03 +08:00
Mick
6dfa8a40bc [diffusion] CI: make auxiliary coverage explicit and simplify testcases (#20983) 2026-03-21 20:18:23 +08:00
KnightLTC
a0862f00c2 dbrx instruct npu support (#17121)
Co-authored-by: McZyWu <zhuoyun.wu.23@ucl.ac.uk>
2026-03-21 17:10:35 +08:00
Alison Shao
852e112ebf [Qwen3.5] Fix broken pipeline parallelism layer splitting (#21070)
Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>
2026-03-21 01:02:51 -07:00
Lianmin Zheng
dba6fb3d30 Fix streaming logprobs corruption caused by shared mutable list reference (#21030) 2026-03-21 00:18:48 -07:00
kk
3f0ba021fc [AMD] Improve openai/gpt-oss performance (#21020)
Co-authored-by: root <root@smci355-ccs-aus-m15-21.cs-aus.dcgpu>
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
2026-03-20 23:16:47 -07:00
Baizhou Zhang
67cad3e69e Revert "Support CuteDSL mm_fp4 backend" (#21077) 2026-03-20 22:47:47 -07:00
Xiaoyu Zhang
c076968c52 [CI] Remove obsolete AOT-only jit-kernel benchmarks after sgl-kernel 4.0 (#21075) 2026-03-21 13:40:42 +08:00
Baizhou Zhang
5f3393c04c Fix deepseek-v32-fp4 b200 ci (#21072) 2026-03-20 22:28:40 -07:00
Alison Shao
048d90e165 Revert "[AMD] Add MoE weights and scales padding" (#21067) 2026-03-20 20:26:17 -07:00
shuwenn
6c91590e1b [HiCache] refactor: hicache normalization flow and compatibility checks (#19669) 2026-03-20 18:38:44 -07:00
mqhc2020
9419453713 [AMD] Add MoE weights and scales padding (#18684) 2026-03-20 14:55:09 -07:00
YC Yen-Ching Tseng
f97c09dac1 [AMD] Enable aiter unified attention for non-SWA models (Qwen3-VL) (#20897)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-20 12:07:41 -07:00
fzyzcjy
146700db68 Add e2e demo test in dump comparator (#21031) 2026-03-20 22:41:01 +08:00
fzyzcjy
6703cc4484 Enhance output formatting in dump comparator (#21029) 2026-03-20 22:04:50 +08:00
fzyzcjy
fdbcb8156e Refactor dp_utils to use ParallelAxis enum in dump comparator (#21028) 2026-03-20 22:04:20 +08:00
fzyzcjy
154395ab7d Support s≡t dimension name equivalence in dump comparator (#21027) 2026-03-20 22:03:34 +08:00
fzyzcjy
cc22601d28 Validate replicated axes orthogonality in dump comparator (#21026) 2026-03-20 22:02:40 +08:00
fzyzcjy
2f01950a0e Support jointly-determined axes inference in dump comparator (#21025) 2026-03-20 22:01:26 +08:00
fzyzcjy
ecd7e40d20 Support dependent axis auto-resolution in dump comparator (#21024) 2026-03-20 21:56:39 +08:00
Lianmin Zheng
104b10f70a refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009) 2026-03-20 05:55:36 -07:00
Артем Савкин
9fbe6800aa [NPU] [Diffusion] Update CI performance baseline for Wan2.2-T2V-A14B-Diffusers-w8a8 (#20997) 2026-03-20 15:54:12 +03:00