7140 Commits

Author SHA1 Message Date
fzyzcjy
fdbcb8156e Refactor dp_utils to use ParallelAxis enum in dump comparator (#21028) 2026-03-20 22:04:20 +08:00
fzyzcjy
154395ab7d Support s≡t dimension name equivalence in dump comparator (#21027) 2026-03-20 22:03:34 +08:00
fzyzcjy
cc22601d28 Validate replicated axes orthogonality in dump comparator (#21026) 2026-03-20 22:02:40 +08:00
fzyzcjy
2f01950a0e Support jointly-determined axes inference in dump comparator (#21025) 2026-03-20 22:01:26 +08:00
fzyzcjy
ecd7e40d20 Support dependent axis auto-resolution in dump comparator (#21024) 2026-03-20 21:56:39 +08:00
Lianmin Zheng
104b10f70a refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009) 2026-03-20 05:55:36 -07:00
Артем Савкин
9fbe6800aa [NPU] [Diffusion] Update CI performance baseline for Wan2.2-T2V-A14B-Diffusers-w8a8 (#20997) 2026-03-20 15:54:12 +03:00
xingsy97
f41832795e Add compile-time 256-bit vector guard for pre-Blackwell (#19794) 2026-03-20 18:25:12 +08:00
DarkSharpness
2dd9196079 [JIT Kernel][Feature] Support JIT custom all reduce (rewrite as v2) (#19880)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-03-20 18:24:07 +08:00
Muqi Li
2099943a49 Fix scale_step_k computation in the fp8_kernel (#20819)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-03-20 18:09:31 +08:00
Jia Guo
ec01ef9092 Fix torch.compile/dynamo crash with Qwen3 QK-norm in piecewise CUDA g… (#19818)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 18:05:09 +08:00
Prozac614
fa89d152c0 [diffusion] CI: fix hunyuan3d JIT cache (#20773)
Co-authored-by: daiweitao <dwti614707404@163.com>
2026-03-20 17:51:55 +08:00
Lianmin Zheng
a0a4dae67f Revert "Fix DeepSeek V32 FP4 test" (#21003) 2026-03-20 02:19:28 -07:00
Lianmin Zheng
112b628227 Replace _resolve_future_token_ids with JIT kernel + platform dispatch (#20976)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:47:03 -07:00
Baizhou Zhang
c82d20d48e Fix DeepSeek V32 FP4 test (#20984) 2026-03-20 01:04:32 -07:00
Yilong Zhao
26f709e97d misc: make prefill-delayer compatible with multiple types of mem pool (#20979) 2026-03-20 00:05:53 -07:00
Yilong Zhao
95327458ee misc: add BatchTokenizerReq hook into dp controller (#20981) 2026-03-19 23:59:53 -07:00
Lianmin Zheng
712a48c5d2 ci: move metrics scripts under scripts/ci/utils (#20986) 2026-03-19 23:47:57 -07:00
lviy
46a76af97b [Bugifx] qwen3 rope parameter compatibility (#20931)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-03-19 22:22:01 -07:00
Jia Guo
87549f8f0b perf(mamba): use Triton conv1d for non-contiguous input to avoid .contiguous() copy (#20469) 2026-03-19 19:38:46 -07:00
Vedant V Jhaveri
db995fba47 perf(kimi_linear): replace einops rearrange with native torch ops in Kimi-Linear KDA path (#20396) 2026-03-20 10:38:12 +08:00
ehuaa
fa0d8f6629 perf: avoid unnecessary gpu-cpu sync in eagle_info (#20266)
Co-authored-by: root <qianhao@zhejianglab.org>
2026-03-19 19:37:29 -07:00
Mohammad Miadh Angkad
3d749c49ca [JIT Kernel] Fix NVFP4 multi-arch compilation failure (#20874) 2026-03-20 10:30:04 +08:00
cs-cat
22e378af86 Fix result writer in tuning_block_wise_kernel.py, and add FP8 kernel config for L40 (#20368)
Signed-off-by: cs-cat <118669451+cs-cat@users.noreply.github.com>
2026-03-20 09:28:54 +08:00
Yuan Luo
d9794ef9f7 [Qwen3-Next] Fuse Qwen3-Next GDN's qkvz_proj and ba_proj (#19321)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-03-20 09:25:29 +08:00
Baizhou Zhang
42f4b7276c Revert "feat(mm)(grpc): compute M-RoPE positions for preprocessed VL inputs" (#20956) 2026-03-19 18:03:04 -07:00
Liangsheng Yin
2b53e660de Simplify streaming session logprob handling (#20955) 2026-03-19 17:09:40 -07:00
Leon Gao
63c38aba5e Fix token leak with logprob_start_len=0 in streaming sessions (#20557) 2026-03-19 15:37:27 -07:00
Brayden Zhong
b42b9f6e1a Support CuteDSL mm_fp4 backend (#18801) 2026-03-19 14:20:01 -07:00
Yuwei An
d8ece7fb22 [Tiny Fix] Filter lru related warning with pcg (#20940)
Signed-off-by: yuweia <ayw.sirius19@gmail.com>
2026-03-19 13:20:49 -07:00
Lianmin Zheng
0949b138af Simplify server startup output (#20885)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:11:37 -07:00
Xinyuan Tong
a02cff7f2b [Fix] Patch is_flash_attn_2_available for flash-attn-4 in VLM input format test (#20946) 2026-03-19 13:00:51 -07:00
AlfredYong
c562e0d13b [feat] Enhance Kimi-K2/K2.5 function call and reasoning detection (#19552)
Co-authored-by: alfredyyang <alfredyyang@tencent.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2026-03-19 12:57:57 -07:00
Mohammad Miadh Angkad
29ced9c162 [UX] Suppress noisy httpx/httpcore INFO logs (#20944) 2026-03-19 10:58:41 -07:00
Xinyu Zhang
319bb4974c [Fix] RayEngine multi-node: co-locate rank0 scheduler with Engine and fix CUDA device setting (#20722) 2026-03-19 10:27:16 -07:00
Cao E
274581fb77 Add support for more batch sizes in cpu_graph_runner (#13881) 2026-03-19 09:50:56 -07:00
kk
c8f0122acf Fix gpu-fault issue when run deepseek-r1 and enable dp (#20841)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-19 02:36:12 -07:00
khalilzhk
574572b21b [BugFix] bug fix for DeepSeek eagle3 in Attn-DP mode (#20492) 2026-03-19 14:48:46 +08:00
Shangming Cai
fd05532da1 Add logging for BootstrapServer for CI diagnosis (#20844)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-03-19 14:42:12 +08:00
blzheng
a98b456c70 [CPU] Add frontend support for Gemma (#12590) 2026-03-18 23:02:26 -07:00
jianan-gu
8d4fcf2f7b [CPU] Fix MoE layer support for DeepSeek-OCR models (#12555) 2026-03-18 22:57:55 -07:00
Matti Varjokallio
85fe8c6793 [AMD] Use aiter_dsv3_router_gemm kernel if number of experts <= 256. (#18451) 2026-03-18 22:40:48 -07:00
kk
126cd5cfae gpt-oss decode performance optimization (#20392)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-18 22:30:03 -07:00
blzheng
cd22aa27a9 [CPU] Add FP8 Bmm support (#9744)
Co-authored-by: Fan Yin <1106310035@qq.com>
2026-03-18 22:19:48 -07:00
Zaili Wang
2f4babe32b [CPU] support LayerNorm with 3D shape (#15075)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-03-18 22:15:24 -07:00
blzheng
dc6aa26ce9 [CPU] Add mrope kernel for Qwen3-vl (#12531)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-03-18 22:12:48 -07:00
Juan Muneton
4052b53227 fix scheduler for non-cuda devices and disable piecewise cuda graph f… (#19992)
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
2026-03-18 21:54:19 -07:00
Ling Zhang
f85455ab24 [Bugfix] fix qwen3vl hang when --mm-enable-dp-encoder is enable (#20759) 2026-03-18 21:51:39 -07:00
Ethan (Yusheng) Su
7f6f1a3ab1 [LoRA][II] Add fused MOE LoRA Triton kernel and tests (#19711) 2026-03-18 19:58:14 -07:00
R0CKSTAR
7553b7dcb0 chore: extract diffusion_common in python/pyproject_other.toml (#20803)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-03-19 10:39:16 +08:00