Zhang Yiyang (SII)
|
a3ed2e4d29
|
[diffusion][CI] Add CI for MOVA model inference (#20430)
Co-authored-by: Luo <139519292+0-693@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-24 21:28:16 +03:00 |
|
YC Yen-Ching Tseng
|
71f5ae3f9a
|
[AMD] Fix AMD Nightly Test - Transformers 5.3.0 incompatibility and gemma2-27b kv issue (#21193)
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
|
2026-03-24 10:41:44 -07:00 |
|
Elizaveta Martirosian
|
9f4d8ac99f
|
[Diffusion][NPU] Add support for Hunyuan3D (#20352)
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
|
2026-03-24 16:18:49 +03:00 |
|
shadowxz109
|
1b4933d45d
|
[NPU][ModelSlim] adapt w2 quant layer for Minimax2.5 (#20905)
|
2026-03-24 20:57:18 +08:00 |
|
Aleksi Vesanto
|
eefb504f84
|
[diffusion] model: Fix FLUX.1 output correctness (#21041)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-24 15:17:33 +03:00 |
|
Mohammad Miadh Angkad
|
4fbb311234
|
[Fix][Eval] Keep --dataset-path scoped to longbench_v2 (#21156)
|
2026-03-24 02:25:11 -07:00 |
|
Thomas Wang
|
855d15adf6
|
[AMD] Tilelang sparse fwd for dsv32 mi355/mi300 (#19945)
|
2026-03-24 02:01:39 -07:00 |
|
Shunkangz
|
dac148167c
|
Enable the qwen3 test (#21195)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-03-23 23:40:59 -07:00 |
|
Xiaoyu Zhang
|
69f02e36e8
|
[diffusion] Fix torch.zeros typo in causal wan (#21250)
|
2026-03-24 14:39:16 +08:00 |
|
Xiaoyu Zhang
|
d9f97b2115
|
Refine diffusion skills and align JIT kernel docs with the new CI flow (#21283)
|
2026-03-24 14:38:36 +08:00 |
|
Cheng Wan
|
c01ee848b0
|
Revert "fix: use consistent time denominator for throughput metrics in bench_one_batch_server" (#21276)
|
2026-03-23 22:14:54 -07:00 |
|
Baidu-AIAK
|
6491728797
|
[Perf] Overlap NSA-CP key all-gather with query computation for DeepSeek-V3.2 (#20438)
Co-authored-by: Shurui Jia <18817781975@163.com>
Co-authored-by: Baidu-AIAK <baiduaiak~123>
|
2026-03-23 21:31:48 -07:00 |
|
Lianmin Zheng
|
260abe1fb1
|
Refactor JIT kernel CI to use run_suite.py registration system (#21239)
|
2026-03-23 21:17:27 -07:00 |
|
hzh0425
|
0986bed8e2
|
[HiCache][HybridModel]: Support mamba state offloading & HybridCacheController (#20457)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
|
2026-03-23 20:02:50 -07:00 |
|
Ratish P
|
2b1d3c935e
|
[diffusion] fix Z-Image SP sharding for portrait and padded resolutions (#21042)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-03-24 10:15:33 +08:00 |
|
Yuxuan Zhang
|
fcaad42b00
|
[Bug Fix] GLM-V / GLM-OCR: field detection for transformers 5.x and MTP omission fix (#21134)
|
2026-03-23 13:19:48 -07:00 |
|
Baizhou Zhang
|
ed316a26ef
|
Fix CP in-seq-split method for DeepSeek V32 and update related tests (#21192)
|
2026-03-23 12:34:10 -07:00 |
|
Lianmin Zheng
|
27ac831a84
|
docs: improve CI and testing documentation (#21202)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-23 10:48:50 -07:00 |
|
jacky.cheng
|
b4d3fb001d
|
[AMD] Add fused GemmaRMSNorm forward_hip to use aiter/vllm kernels for qwen3.5 (#21188)
|
2026-03-23 10:21:36 -07:00 |
|
Johnsonms
|
777edb6ef7
|
Fix(jit): support rmsnorm for hidden_size in {64, 128, 256} (#20661)
|
2026-03-23 23:17:44 +08:00 |
|
Yuan Luo
|
5bdc07d974
|
[Qwen3.5] Fuse split/reshape/cat ops in GDN projection with Triton kernel (#21019)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-23 23:17:01 +08:00 |
|
McZyWu
|
8662ba7db4
|
[NPU] bugfix for import sgl-kernel error (#21200)
|
2026-03-23 19:52:36 +08:00 |
|
strgrb
|
80d4a0753a
|
fix fused_set_kv_buffer for rope with Ling-v2 (#20316)
|
2026-03-23 19:20:40 +08:00 |
|
McZyWu
|
4641e5a3d2
|
[NPU] enhance accuracy for model minimaxm2 from 16.5% to 95.5% (#17695)
|
2026-03-23 19:06:38 +08:00 |
|
XDaoHong
|
2d288ba8c9
|
[Bugfix] fix npu get kv_item_lens in PD separation when use ASCEND_US… (#15852)
Co-authored-by: ZhengdQin <zhengdqin@gmail.com>
|
2026-03-23 15:56:47 +08:00 |
|
kpham-sgl
|
59cb9a9da6
|
[Spec][Ngram] 3/N: Fix synchronization issues in Ngram.cpp (#21186)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-23 00:37:07 -07:00 |
|
Lianmin Zheng
|
814202704b
|
ci: unify PR test suite naming (#21187)
|
2026-03-23 00:18:45 -07:00 |
|
yudian0504
|
3d312643b9
|
[BUGFIX] Fix CP residual size mismatch crash when tp_size == attn_cp_size (#21170)
|
2026-03-23 00:12:58 -07:00 |
|
Lianmin Zheng
|
7757a9ddd0
|
ci: remove IS_BLACKWELL env var; auto-detect Blackwell (#21118)
|
2026-03-22 23:44:48 -07:00 |
|
kpham-sgl
|
bc4aaab6a1
|
[Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-22 23:35:25 -07:00 |
|
Cheng Wan
|
d6b12c401c
|
Revert "[bugfix] Fix PPMissingLayer AttributeError when Using PP" (#21189)
|
2026-03-22 23:28:36 -07:00 |
|
Zhiqiang Xie
|
13f4f010d8
|
HiSparse for Sparse Attention (#20343)
|
2026-03-22 23:09:31 -07:00 |
|
Lianmin Zheng
|
7050011dee
|
Enable JIT clamp_position and resolve_future_token_ids on ROCm (#21116)
|
2026-03-22 22:33:54 -07:00 |
|
Yuhao Yang
|
32a85ef128
|
[diffusion] CI: auto-skip diffusion tests when required pipeline class is missing from diffusers (#21139)
|
2026-03-23 12:15:21 +08:00 |
|
Mohammad Miadh Angkad
|
d8a5b1dbaf
|
[Bugfix] Work around FlashInfer unified transport issue on GB (#20039)
|
2026-03-22 21:10:25 -07:00 |
|
Xiaoyu Zhang
|
a94d67d44b
|
[SKILL] fix(bench): Support model-specific DenoisingStage variants in… (#21137)
|
2026-03-23 12:08:00 +08:00 |
|
fanghao
|
2b47bd3a34
|
[Bug Fix] Fix non-streaming request abort failure when --enable-metrics is enabled (#20625)
|
2026-03-22 19:58:49 -07:00 |
|
yuumn
|
889e8489e9
|
[diffusion] model: support FireRed-Image-Edit (#20862)
Co-authored-by: yuumn <1010797597@qq.com>
|
2026-03-23 10:27:07 +08:00 |
|
Cishoon
|
999bad5aba
|
Fix VRAM leak in overlap scheduling with structured output (#20640) (#20697)
|
2026-03-22 17:07:39 -07:00 |
|
Yilong Zhao
|
343998865a
|
perf: pad max-num-requests in decode cuda graph for higher coverage (#20978)
|
2026-03-22 17:06:16 -07:00 |
|
Ziang Li
|
ce0541404f
|
[FlashInfer v0.6.6][RL] Support fp8-last-n-bf16 RL for flashinfer_trtllm_routed moe backend (#20214)
|
2026-03-22 11:17:01 -07:00 |
|
Xiaoyu Zhang
|
c1fe5de69c
|
[Diffusion] Clean up diffusion Triton kernels and modernize custom op registration (#21122)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-22 22:38:57 +08:00 |
|
Xiaoyu Zhang
|
766d225fcc
|
Add SGLang CUDA crash API logging inspired by FlashInfer (#20910)
|
2026-03-22 16:39:40 +08:00 |
|
Shunkangz
|
bb737d7a82
|
Support Qwen3 MoE context parallel (#18233)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
|
2026-03-22 01:27:20 -07:00 |
|
kpham-sgl
|
6d160b42bb
|
[Spec][Ngram] 1/N: Reference based Speculative Decoding refactor (#20393)
|
2026-03-22 00:55:10 -07:00 |
|
Xiaoyu Zhang
|
1b65c0d259
|
[Diffusion] Fix torch.compile RMSNorm fallback for Z-Image (#20962)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-22 15:38:22 +08:00 |
|
Bowen Li
|
3bc595acbc
|
[FlashAttn] Add fused triton kernel for normal_decode_set_metadata (#20778)
Co-authored-by: kinza99 <dh18324568312@163.com>
|
2026-03-22 15:12:29 +08:00 |
|
Mick
|
f7fc2c8592
|
[diffusion] fix: fix accuracy for some image models (#20679)
|
2026-03-22 15:11:57 +08:00 |
|
shuwenn
|
2fba2bdad1
|
refactor: Remove dead code from utils/common.py (#20668)
|
2026-03-21 21:54:17 -07:00 |
|
Lianmin Zheng
|
76e4a8662c
|
Replace clamp_position with JIT kernel + platform dispatch (#20999)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-21 21:26:26 -07:00 |
|