赵晨阳
|
62480ebb1b
|
[SGLang-Diffusion] Fix custom op fake impl missing eps default for torch.compile (#19725)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-03-03 15:24:36 +08:00 |
|
1StepForever
|
3c01b44700
|
[Fix] NPU deepep hccl buffer and fix IPC safe check (#17804)
|
2026-03-03 14:56:06 +08:00 |
|
Xinyuan Tong
|
dbf1247fe0
|
Add KimiK2Detector with tool interruption support (#19696)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-03-03 14:04:49 +08:00 |
|
Yuzhen Zhou
|
63003a39cf
|
[BUG] Support tuple hidden_states from fused MXFP4/FP8 quantization (#19643)
|
2026-03-02 20:39:06 -08:00 |
|
Alison Shao
|
fe9d85d93c
|
Fix CompressedTensorsMxInt4MoE abstract method and relax GPQA baseline (#19726)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
|
2026-03-02 19:03:21 -08:00 |
|
huangtingwei
|
8dfb6e1684
|
[HiCache] fix compatibility bugs with eagle and HiCacheStorage (#19570)
Co-authored-by: maoyuhan <susanmao1997@outlook.com>
|
2026-03-02 18:29:51 -08:00 |
|
KnightLTC
|
1041f240c0
|
[NPU]grok2 model support (#17119)
Co-authored-by: cy <chenyang08056032@163.com>
|
2026-03-03 10:24:10 +08:00 |
|
Ratish P
|
e6e02ec938
|
[diffusion]: Add model detectors and warning for quantized diffusion models (#18041)
|
2026-03-03 09:46:25 +08:00 |
|
Xiaoyu Zhang
|
145ae518ac
|
[Diffusion] Revert 18619 (#19510)
|
2026-03-03 08:15:15 +08:00 |
|
Mohammad Miadh Angkad
|
6822941514
|
[FlashInfer] Bump FlashInfer version from 0.6.3 to 0.6.4 (#19005)
|
2026-03-02 16:12:09 -08:00 |
|
Mohammad Miadh Angkad
|
3f9fc8b848
|
[Qwen3.5] Fix missing quant_config in Qwen3VL (#19291)
|
2026-03-02 14:07:51 -08:00 |
|
Glen Liu
|
cc860a2198
|
[TestFix] change LoRA tests to use NVIDIA adapter instead of Nutanix (#19642)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-02 12:55:41 -08:00 |
|
Xiaoyu Zhang
|
51ee17ce44
|
[diffusion] move skills dir (#19697)
|
2026-03-03 02:51:29 +08:00 |
|
Makcum888e
|
05950853bc
|
[Diffusion] [NPU] Add CI tests for FLUX (#19001)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-02 20:40:22 +03:00 |
|
Yuwei An
|
c64274c746
|
Piecewise Cuda Graph set default (#16331)
|
2026-03-02 23:18:07 +08:00 |
|
hlu1
|
468e3dc56b
|
[Qwen3.5] Set full attn_backend to trtllm_mha on SM100 by default when possible (#19030)
|
2026-03-02 23:14:53 +08:00 |
|
0xNullPath
|
2d183c4e6d
|
[Feat] add PP Support for minimax-m2 series (#19577)
|
2026-03-02 23:13:59 +08:00 |
|
Ruihang Li
|
5833ea684d
|
[diffusion] fix: make input/output file save paths configurable and disableable (#19580)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-02 23:02:33 +08:00 |
|
Xiaoyu Zhang
|
53de53fb53
|
[jit_kernel] Tiny unify jit_kernel tests style (#19694)
|
2026-03-02 21:33:59 +08:00 |
|
Hexq0210
|
714c53d609
|
[NPU] support PD disaggregation on ascend when using PP (#14908)
Co-authored-by: iridiumine <42236072+iridiumine@users.noreply.github.com>
|
2026-03-02 21:33:16 +08:00 |
|
Bi Xue
|
eaf18ebe8d
|
[sgl]add pin_mem to avoid cpu->gpu copy sync point (#19590)
|
2026-03-02 21:08:31 +08:00 |
|
JiaruiChang5268
|
b3718982a1
|
[Feature] add feature mla_ag_after_qlora for dsv3.2 (#19428)
Co-authored-by: JiaruiChang5268 <changjiarui1@huawei.com>
|
2026-03-02 20:00:31 +08:00 |
|
Shangming Cai
|
3f36f27eae
|
[Bugfix] Fix nixl and mori backend for missing decode tp size in PD module (#19690)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-02 19:55:19 +08:00 |
|
AichenF
|
8df9b8dce9
|
[diffusion] fix: skip USP for cross-attention with replicated KV for wan (#19419)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-02 19:52:08 +08:00 |
|
Leoyzen
|
da2a0240f7
|
Add GLM45 tool interruption support (#17714)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
|
2026-03-02 19:34:12 +08:00 |
|
fzyzcjy
|
7579ab3f33
|
Enhance error resilience in dump comparator (#19685)
|
2026-03-02 19:08:35 +08:00 |
|
fzyzcjy
|
e5ef845cad
|
Support multiple verbosity in dump comparator (#19684)
|
2026-03-02 18:47:30 +08:00 |
|
fzyzcjy
|
3dd4649b42
|
Beautify text output in dump comparator (#19683)
|
2026-03-02 18:47:01 +08:00 |
|
fzyzcjy
|
5bf3deb4bc
|
Trace execution information in dump comparator (#19682)
|
2026-03-02 18:46:27 +08:00 |
|
fzyzcjy
|
abdc0ee71f
|
Support directory detection in dump comparator (#19680)
|
2026-03-02 18:45:35 +08:00 |
|
fzyzcjy
|
6980416149
|
Support non orthogonal parallel axes and explicit replication annotation in dump comparator (#19679)
|
2026-03-02 18:44:33 +08:00 |
|
fzyzcjy
|
a70dd11011
|
Support flattened dims in dump comparator (#19678)
|
2026-03-02 18:43:01 +08:00 |
|
fzyzcjy
|
15e83eea61
|
Enhance replication check, matching pattern, logging in dump comparator (#19677)
|
2026-03-02 18:42:27 +08:00 |
|
fzyzcjy
|
ec44bc82ab
|
Support presets and arbitrary skipping keys in dump comparator (#19676)
|
2026-03-02 18:41:49 +08:00 |
|
Mick
|
2e15c015c0
|
[diffusion] feat: Add --model-id for config resolution; deprecate model_detectors (#19607)
|
2026-03-02 16:39:53 +08:00 |
|
kk
|
15af26d1e8
|
Add aiter attention support in prefill-attention-backend of gpt-oss (#18282)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2026-03-01 23:39:24 -08:00 |
|
ishandhanani
|
f7da379b61
|
feat: TTL-based prefix pinning with refresh-on-hit for HiRadixCache (#18941)
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-03-01 23:27:22 -08:00 |
|
Leon Gao
|
07ef5f7be1
|
Remove sync points in mamba cache + prefill cudagraph plumbing for DP (#19639)
|
2026-03-02 15:03:42 +08:00 |
|
Baidu-AIAK
|
922aad2faa
|
Cleanup disagg decode prebuilt flow and add cross-stream sync in merge_batch (#19568)
Co-authored-by: vincent <vincent@vincentdeMacBook-Pro.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-03-01 21:52:27 -08:00 |
|
Prozac614
|
57c5c343d7
|
[diffusion] model: support Hunyuan3D-2 (#18170)
Co-authored-by: yingluosanqian <yingluosanqian@gmail.com>
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-02 12:28:05 +08:00 |
|
Yuan Luo
|
f6ee6dc8c3
|
[JIT-kernel] Add unit test for nsa indexer fused_store_k_cache (#19389)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-02 12:18:11 +08:00 |
|
Shangming Cai
|
0a6678bf3a
|
[PD] Remove unused server args for disaggregation (#19618)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-02 11:38:50 +08:00 |
|
Henry
|
e5edf222cd
|
[WIP]enable mxfp8 on nvidia sm120 (#19112)
Co-authored-by: Your Name <you@example.com>
|
2026-03-01 19:06:43 -08:00 |
|
SoluMilken
|
20282f5664
|
[fix typo] expert_indicies -> expert_indices (#19627)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
|
2026-03-01 17:37:34 -08:00 |
|
zwang86
|
f51ddba131
|
feat: add FA4 SM90 paged KV decode support & update attention docs (#18442)
Co-authored-by: Zeyu Wang <zeyu.wang@yahooinc.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2026-03-02 09:12:19 +08:00 |
|
Kangyan-Zhou
|
98224de29b
|
[Bugfix] Add missing auto_create_handle_loop to communicator methods (#19610)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-01 15:00:05 -08:00 |
|
SoluMilken
|
0b3ddbcf10
|
[fix typo] seperated_timestep -> separated_timestep (#19622)
|
2026-03-01 14:09:51 -08:00 |
|
Kangyan-Zhou
|
dc02e5bea7
|
[HiCache] Re-land spec v2 + decode KV cache offloading compatibility (#19615)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-01 13:58:31 -08:00 |
|
Ziang Li
|
0e86977811
|
[RL] Support per-layer mixed FP8/BF16 serving for FP8 checkpoints (#18742)
|
2026-03-01 21:59:22 +08:00 |
|
Mick
|
a75840b373
|
[diffusion] CI: create and refactor UT (#19619)
|
2026-03-01 19:38:20 +08:00 |
|