Commit Graph

7221 Commits

Author SHA1 Message Date
Mick
6cc5717e8a [diffusion] doc: update quantization.md (#21356) 2026-03-25 14:48:38 +08:00
Alison Shao
17e41cfb21 Fix RDMA device mapping for non-zero GPU indices in disaggregation tests (#21303)
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-03-24 22:56:57 -07:00
Duyi-Wang
61a902ce88 [AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040) 2026-03-24 22:53:57 -07:00
kk
86e2622097 [AMD] Add mha fp8-kv support (#21253)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-24 22:38:02 -07:00
Baizhou Zhang
2b75fed0dd Workaround of DSA performance drop on B200 + DP (#21337) 2026-03-24 22:21:07 -07:00
Ke Bao
92492896a5 Fix disaggregation test bootstrap port conflict (#21271) 2026-03-24 21:14:41 -07:00
Ke Bao
c1d930c028 Increase flush cache timeout in hicache CI (#21305) 2026-03-24 19:00:59 -07:00
Yuan Luo
f273ba1ccc [KDA] Support CuTeDSL KDA decode kernel (#21203)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-03-25 09:47:09 +08:00
DarkSharpness
dfc15b78b0 [misc] clean up kernel API (#21325) 2026-03-25 09:10:23 +08:00
ykcai-daniel
281fe10b5e [diffusion] quant: support nvfp4 for Flux.2 (#20137)
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-25 08:28:25 +08:00
Liangsheng Yin
37420dce0b [CI] Enable failfast (-f) by default in run_suite.py (#21330) 2026-03-24 17:04:42 -07:00
Baizhou Zhang
1046dbe038 [Fix] Fix trtllm fp4 moe kernel not found error (#21343) 2026-03-24 16:38:05 -07:00
Mohammad Miadh Angkad
bbe25b2412 Use FlashInfer tinygemm for GPT-OSS MoE router on SM90+ (#20755)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-03-24 15:00:18 -07:00
Jiaxin(Jackson) Deng
c4db64c16b Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
2026-03-24 13:48:26 -07:00
Jonah Bernard
a32e0d57e7 [LoRA][III] Add LoRA support for MoE layers and enable TP (#14105)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-24 13:14:14 -07:00
Zhang Yiyang (SII)
a3ed2e4d29 [diffusion][CI] Add CI for MOVA model inference (#20430)
Co-authored-by: Luo <139519292+0-693@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-24 21:28:16 +03:00
YC Yen-Ching Tseng
71f5ae3f9a [AMD] Fix AMD Nightly Test - Transformers 5.3.0 incompatibility and gemma2-27b kv issue (#21193)
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
2026-03-24 10:41:44 -07:00
Elizaveta Martirosian
9f4d8ac99f [Diffusion][NPU] Add support for Hunyuan3D (#20352)
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
2026-03-24 16:18:49 +03:00
shadowxz109
1b4933d45d [NPU][ModelSlim] adapt w2 quant layer for Minimax2.5 (#20905) 2026-03-24 20:57:18 +08:00
Aleksi Vesanto
eefb504f84 [diffusion] model: Fix FLUX.1 output correctness (#21041)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-24 15:17:33 +03:00
Mohammad Miadh Angkad
4fbb311234 [Fix][Eval] Keep --dataset-path scoped to longbench_v2 (#21156) 2026-03-24 02:25:11 -07:00
Thomas Wang
855d15adf6 [AMD] Tilelang sparse fwd for dsv32 mi355/mi300 (#19945) 2026-03-24 02:01:39 -07:00
Shunkangz
dac148167c Enable the qwen3 test (#21195)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-03-23 23:40:59 -07:00
Xiaoyu Zhang
69f02e36e8 [diffusion] Fix torch.zeros typo in causal wan (#21250) 2026-03-24 14:39:16 +08:00
Xiaoyu Zhang
d9f97b2115 Refine diffusion skills and align JIT kernel docs with the new CI flow (#21283) 2026-03-24 14:38:36 +08:00
Cheng Wan
c01ee848b0 Revert "fix: use consistent time denominator for throughput metrics in bench_one_batch_server" (#21276) 2026-03-23 22:14:54 -07:00
Baidu-AIAK
6491728797 [Perf] Overlap NSA-CP key all-gather with query computation for DeepSeek-V3.2 (#20438)
Co-authored-by: Shurui Jia <18817781975@163.com>
Co-authored-by: Baidu-AIAK <baiduaiak~123>
2026-03-23 21:31:48 -07:00
Lianmin Zheng
260abe1fb1 Refactor JIT kernel CI to use run_suite.py registration system (#21239) 2026-03-23 21:17:27 -07:00
hzh0425
0986bed8e2 [HiCache][HybridModel]: Support mamba state offloading & HybridCacheController (#20457)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2026-03-23 20:02:50 -07:00
Ratish P
2b1d3c935e [diffusion] fix Z-Image SP sharding for portrait and padded resolutions (#21042)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-03-24 10:15:33 +08:00
Yuxuan Zhang
fcaad42b00 [Bug Fix] GLM-V / GLM-OCR: field detection for transformers 5.x and MTP omission fix (#21134) 2026-03-23 13:19:48 -07:00
Baizhou Zhang
ed316a26ef Fix CP in-seq-split method for DeepSeek V32 and update related tests (#21192) 2026-03-23 12:34:10 -07:00
Lianmin Zheng
27ac831a84 docs: improve CI and testing documentation (#21202)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:48:50 -07:00
jacky.cheng
b4d3fb001d [AMD] Add fused GemmaRMSNorm forward_hip to use aiter/vllm kernels for qwen3.5 (#21188) 2026-03-23 10:21:36 -07:00
Johnsonms
777edb6ef7 Fix(jit): support rmsnorm for hidden_size in {64, 128, 256} (#20661) 2026-03-23 23:17:44 +08:00
Yuan Luo
5bdc07d974 [Qwen3.5] Fuse split/reshape/cat ops in GDN projection with Triton kernel (#21019)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-03-23 23:17:01 +08:00
McZyWu
8662ba7db4 [NPU] bugfix for import sgl-kernel error (#21200) 2026-03-23 19:52:36 +08:00
strgrb
80d4a0753a fix fused_set_kv_buffer for rope with Ling-v2 (#20316) 2026-03-23 19:20:40 +08:00
McZyWu
4641e5a3d2 [NPU] enhance accuracy for model minimaxm2 from 16.5% to 95.5% (#17695) 2026-03-23 19:06:38 +08:00
XDaoHong
2d288ba8c9 [Bugfix] fix npu get kv_item_lens in PD separation when use ASCEND_US… (#15852)
Co-authored-by: ZhengdQin <zhengdqin@gmail.com>
2026-03-23 15:56:47 +08:00
kpham-sgl
59cb9a9da6 [Spec][Ngram] 3/N: Fix synchronization issues in Ngram.cpp (#21186)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:37:07 -07:00
Lianmin Zheng
814202704b ci: unify PR test suite naming (#21187) 2026-03-23 00:18:45 -07:00
yudian0504
3d312643b9 [BUGFIX] Fix CP residual size mismatch crash when tp_size == attn_cp_size (#21170) 2026-03-23 00:12:58 -07:00
Lianmin Zheng
7757a9ddd0 ci: remove IS_BLACKWELL env var; auto-detect Blackwell (#21118) 2026-03-22 23:44:48 -07:00
kpham-sgl
bc4aaab6a1 [Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:35:25 -07:00
Cheng Wan
d6b12c401c Revert "[bugfix] Fix PPMissingLayer AttributeError when Using PP" (#21189) 2026-03-22 23:28:36 -07:00
Zhiqiang Xie
13f4f010d8 HiSparse for Sparse Attention (#20343) 2026-03-22 23:09:31 -07:00
Lianmin Zheng
7050011dee Enable JIT clamp_position and resolve_future_token_ids on ROCm (#21116) 2026-03-22 22:33:54 -07:00
Yuhao Yang
32a85ef128 [diffusion] CI: auto-skip diffusion tests when required pipeline class is missing from diffusers (#21139) 2026-03-23 12:15:21 +08:00
Mohammad Miadh Angkad
d8a5b1dbaf [Bugfix] Work around FlashInfer unified transport issue on GB (#20039) 2026-03-22 21:10:25 -07:00