7855 Commits

Author SHA1 Message Date
赵晨阳
62480ebb1b [SGLang-Diffusion] Fix custom op fake impl missing eps default for torch.compile (#19725)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-03-03 15:24:36 +08:00
1StepForever
3c01b44700 [Fix] NPU deepep hccl buffer and fix IPC safe check (#17804) 2026-03-03 14:56:06 +08:00
Xinyuan Tong
dbf1247fe0 Add KimiK2Detector with tool interruption support (#19696)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-03-03 14:04:49 +08:00
Yuzhen Zhou
63003a39cf [BUG] Support tuple hidden_states from fused MXFP4/FP8 quantization (#19643) 2026-03-02 20:39:06 -08:00
Alison Shao
fe9d85d93c Fix CompressedTensorsMxInt4MoE abstract method and relax GPQA baseline (#19726)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
2026-03-02 19:03:21 -08:00
huangtingwei
8dfb6e1684 [HiCache] fix compatibility bugs with eagle and HiCacheStorage (#19570)
Co-authored-by: maoyuhan <susanmao1997@outlook.com>
2026-03-02 18:29:51 -08:00
KnightLTC
1041f240c0 [NPU]grok2 model support (#17119)
Co-authored-by: cy <chenyang08056032@163.com>
2026-03-03 10:24:10 +08:00
Ratish P
e6e02ec938 [diffusion]: Add model detectors and warning for quantized diffusion models (#18041) 2026-03-03 09:46:25 +08:00
Xiaoyu Zhang
145ae518ac [Diffusion] Revert 18619 (#19510) 2026-03-03 08:15:15 +08:00
Mohammad Miadh Angkad
6822941514 [FlashInfer] Bump FlashInfer version from 0.6.3 to 0.6.4 (#19005) 2026-03-02 16:12:09 -08:00
Mohammad Miadh Angkad
3f9fc8b848 [Qwen3.5] Fix missing quant_config in Qwen3VL (#19291) 2026-03-02 14:07:51 -08:00
Glen Liu
cc860a2198 [TestFix] change LoRA tests to use NVIDIA adapter instead of Nutanix (#19642)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-02 12:55:41 -08:00
Xiaoyu Zhang
51ee17ce44 [diffusion] move skills dir (#19697) 2026-03-03 02:51:29 +08:00
Makcum888e
05950853bc [Diffusion] [NPU] Add CI tests for FLUX (#19001)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-03-02 20:40:22 +03:00
Yuwei An
c64274c746 Piecewise Cuda Graph set default (#16331) 2026-03-02 23:18:07 +08:00
hlu1
468e3dc56b [Qwen3.5] Set full attn_backend to trtllm_mha on SM100 by default when possible (#19030) 2026-03-02 23:14:53 +08:00
0xNullPath
2d183c4e6d [Feat] add PP Support for minimax-m2 series (#19577) 2026-03-02 23:13:59 +08:00
Ruihang Li
5833ea684d [diffusion] fix: make input/output file save paths configurable and disableable (#19580)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-02 23:02:33 +08:00
Xiaoyu Zhang
53de53fb53 [jit_kernel] Tiny unify jit_kernel tests style (#19694) 2026-03-02 21:33:59 +08:00
Hexq0210
714c53d609 [NPU] support PD disaggregation on ascend when using PP (#14908)
Co-authored-by: iridiumine <42236072+iridiumine@users.noreply.github.com>
2026-03-02 21:33:16 +08:00
Bi Xue
eaf18ebe8d [sgl]add pin_mem to avoid cpu->gpu copy sync point (#19590) 2026-03-02 21:08:31 +08:00
JiaruiChang5268
b3718982a1 [Feature] add feature mla_ag_after_qlora for dsv3.2 (#19428)
Co-authored-by: JiaruiChang5268 <changjiarui1@huawei.com>
2026-03-02 20:00:31 +08:00
Shangming Cai
3f36f27eae [Bugfix] Fix nixl and mori backend for missing decode tp size in PD module (#19690)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-03-02 19:55:19 +08:00
AichenF
8df9b8dce9 [diffusion] fix: skip USP for cross-attention with replicated KV for wan (#19419)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-02 19:52:08 +08:00
Leoyzen
da2a0240f7 Add GLM45 tool interruption support (#17714)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-02 19:34:12 +08:00
fzyzcjy
7579ab3f33 Enhance error resilience in dump comparator (#19685) 2026-03-02 19:08:35 +08:00
fzyzcjy
e5ef845cad Support multiple verbosity in dump comparator (#19684) 2026-03-02 18:47:30 +08:00
fzyzcjy
3dd4649b42 Beautify text output in dump comparator (#19683) 2026-03-02 18:47:01 +08:00
fzyzcjy
5bf3deb4bc Trace execution information in dump comparator (#19682) 2026-03-02 18:46:27 +08:00
fzyzcjy
abdc0ee71f Support directory detection in dump comparator (#19680) 2026-03-02 18:45:35 +08:00
fzyzcjy
6980416149 Support non orthogonal parallel axes and explicit replication annotation in dump comparator (#19679) 2026-03-02 18:44:33 +08:00
fzyzcjy
a70dd11011 Support flattened dims in dump comparator (#19678) 2026-03-02 18:43:01 +08:00
fzyzcjy
15e83eea61 Enhance replication check, matching pattern, logging in dump comparator (#19677) 2026-03-02 18:42:27 +08:00
fzyzcjy
ec44bc82ab Support presets and arbitrary skipping keys in dump comparator (#19676) 2026-03-02 18:41:49 +08:00
Mick
2e15c015c0 [diffusion] feat: Add --model-id for config resolution; deprecate model_detectors (#19607) 2026-03-02 16:39:53 +08:00
kk
15af26d1e8 Add aiter attention support in prefill-attention-backend of gpt-oss (#18282)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-01 23:39:24 -08:00
ishandhanani
f7da379b61 feat: TTL-based prefix pinning with refresh-on-hit for HiRadixCache (#18941)
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-01 23:27:22 -08:00
Leon Gao
07ef5f7be1 Remove sync points in mamba cache + prefill cudagraph plumbing for DP (#19639) 2026-03-02 15:03:42 +08:00
Baidu-AIAK
922aad2faa Cleanup disagg decode prebuilt flow and add cross-stream sync in merge_batch (#19568)
Co-authored-by: vincent <vincent@vincentdeMacBook-Pro.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-03-01 21:52:27 -08:00
Prozac614
57c5c343d7 [diffusion] model: support Hunyuan3D-2 (#18170)
Co-authored-by: yingluosanqian <yingluosanqian@gmail.com>
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-02 12:28:05 +08:00
Yuan Luo
f6ee6dc8c3 [JIT-kernel] Add unit test for nsa indexer fused_store_k_cache (#19389)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-03-02 12:18:11 +08:00
Shangming Cai
0a6678bf3a [PD] Remove unused server args for disaggregation (#19618)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-03-02 11:38:50 +08:00
Henry
e5edf222cd [WIP]enable mxfp8 on nvidia sm120 (#19112)
Co-authored-by: Your Name <you@example.com>
2026-03-01 19:06:43 -08:00
SoluMilken
20282f5664 [fix typo] expert_indicies -> expert_indices (#19627)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
2026-03-01 17:37:34 -08:00
zwang86
f51ddba131 feat: add FA4 SM90 paged KV decode support & update attention docs (#18442)
Co-authored-by: Zeyu Wang <zeyu.wang@yahooinc.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2026-03-02 09:12:19 +08:00
Kangyan-Zhou
98224de29b [Bugfix] Add missing auto_create_handle_loop to communicator methods (#19610)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:00:05 -08:00
SoluMilken
0b3ddbcf10 [fix typo] seperated_timestep -> separated_timestep (#19622) 2026-03-01 14:09:51 -08:00
Kangyan-Zhou
dc02e5bea7 [HiCache] Re-land spec v2 + decode KV cache offloading compatibility (#19615)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 13:58:31 -08:00
Ziang Li
0e86977811 [RL] Support per-layer mixed FP8/BF16 serving for FP8 checkpoints (#18742) 2026-03-01 21:59:22 +08:00
Mick
a75840b373 [diffusion] CI: create and refactor UT (#19619) 2026-03-01 19:38:20 +08:00