Ratish P
|
ddfe147377
|
[diffusion]: Improve layerwise offload buffer reuse and shared-storage handling (#18611)
|
2026-02-15 22:17:51 +08:00 |
|
Mick
|
3feb48139e
|
[diffusion] quant: add support for svdquant and nunchaku (#18549)
Co-authored-by: AichenF <aichenf@nvidia.com>
Co-authored-by: jianyingzhu <53300651@qq.com>
|
2026-02-15 20:43:00 +08:00 |
|
Michael
|
88010e9601
|
[AMD] Fix nightly 1-GPU test failures and bench_serving regression (#18761)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
|
2026-02-15 20:36:47 +08:00 |
|
fzyzcjy
|
4c7f986c6b
|
Extract dumper and prefill delayer tests common utils (#18857)
|
2026-02-15 18:33:23 +08:00 |
|
haowen-han
|
b992828ad2
|
fix: fix bug on kimi2.5 with dp2 and tp4 (#18604)
Co-authored-by: hanhaowen <hanhaowen@baidu.com>
|
2026-02-15 16:32:13 +08:00 |
|
Ratish P
|
274bf6607a
|
[diffusion] fix: enable torch.compile for UlyssesAttention (#18840)
|
2026-02-15 15:54:27 +08:00 |
|
zhangxiaolei123456
|
ad1bdb93df
|
perf: add minimax-2.5 fused_moe tuning config for h20 (#18833)
|
2026-02-15 15:46:56 +08:00 |
|
jackey hua
|
922fbc21e2
|
[Perf] Tune MiniMax M2 fused moe kernel on H100 GPU (#18851)
|
2026-02-15 15:30:52 +08:00 |
|
andyluo7
|
944a9f6fcf
|
Fix/qwen3 5 amd rope cutedsl fallback (#18753)
Co-authored-by: seungrokj <seungrok.jung@amd.com>
|
2026-02-14 22:09:44 -08:00 |
|
muse-coder
|
91230dcca8
|
[FIX] Correct JIT kernel compilation on newer GPUs with outdated driver metadata. (#18496)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-02-15 12:14:39 +08:00 |
|
Bhavneek Singh
|
1ce3420784
|
Model: Support IBM Granite (Dense/Mamba + MoE) (#18040)
|
2026-02-15 11:24:41 +08:00 |
|
Lianmin Zheng
|
b33769786f
|
[Auto Sync] Update grpc_request_manager.py, tokenizer_manag... (20260214) (#18838)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
2026-02-14 18:12:32 -08:00 |
|
Guangda Liu
|
190fa8246f
|
Fix model loading for DeepSeek-V3.2-AWQ (#16907)
Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>
|
2026-02-15 09:39:53 +08:00 |
|
Lianmin Zheng
|
8b2020584c
|
[Auto Sync] Update test_deterministic.py (20260214) (#18839)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>
|
2026-02-14 17:19:30 -08:00 |
|
Xiaoyu Zhang
|
4067d9487d
|
[diffusion] feat: opt vae decode with channels_last_3d (#18540)
|
2026-02-14 23:19:45 +08:00 |
|
Xiaoyu Zhang
|
c29394e3c8
|
[kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475)
|
2026-02-14 23:06:21 +08:00 |
|
Kangyan-Zhou
|
ae95869292
|
Enable SGLANG_ENABLE_SPEC_V2 for nightly speculative decoding tests (#18719)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-14 23:00:33 +08:00 |
|
Raayan Dhar
|
92cdd398cd
|
feat: Support mrope_section with rope_type: "yarn" (#13313)
Signed-off-by: Raayan Dhar raayan.dhar@gmail.com <raayan.dhar@gmail.com>
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
|
2026-02-14 22:51:44 +08:00 |
|
Ke Bao
|
f51e9d9ca1
|
Add ci test for ring model (#18829)
|
2026-02-14 22:20:23 +08:00 |
|
ybyang
|
c8aa2a6534
|
Fix dsv32 encode_messages (#18126)
|
2026-02-14 16:44:13 +08:00 |
|
Johnsonms
|
34132d6da5
|
Kernel: optimize decoding metadata in NSA multi-spec backend with fused kernels (#17554)
|
2026-02-14 16:40:15 +08:00 |
|
Yuan Luo
|
fa0ef6e4f7
|
[VLM][LLM] Optimize fused_moe triton kernel tma (#18782)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-14 14:35:26 +08:00 |
|
JD
|
f6c18c3a85
|
Fix/partial gen from waiting queue miss metadata (#17610)
|
2026-02-13 19:04:08 -08:00 |
|
R0CKSTAR
|
45a4697d45
|
[diffusion][MUSA] fix: MUSA platform breakage caused by PR #13662 (#18456)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-02-14 11:00:39 +08:00 |
|
qmzznbxhl
|
066b0b70d9
|
Handle abort for retracted requests in disagg decode prealloc queue (#18705)
Co-authored-by: sunhailiang <sunhailiang@baidu.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2026-02-13 18:39:39 -08:00 |
|
shuwenn
|
bd39de7d5e
|
[Env] centralize hicache vars in environ.py (#17204)
|
2026-02-13 18:02:31 -08:00 |
|
Liangsheng Yin
|
dcea74d63f
|
Add timeout abort kits for normal / eagle. (#18815)
|
2026-02-13 17:57:30 -08:00 |
|
Liangsheng Yin
|
4474fb98b4
|
[PD-Disagg] Fix double free when prebuilt batch is aborted. (#18822)
|
2026-02-13 17:46:35 -08:00 |
|
Leon Gao
|
ab0fb248fd
|
feat: add SGLANG_DISTRIBUTED_INIT_METHOD_OVERRIDE env var (#18743)
|
2026-02-14 09:37:33 +08:00 |
|
Minglei Zhu
|
8be18c655d
|
[Perf] refactor piecewise cuda graph support of Qwen3-Next (#17613)
|
2026-02-14 09:30:50 +08:00 |
|
shuwenn
|
3299c4f9c1
|
[CI] feat: add early exit to wait_for_server when process dies (#18602)
|
2026-02-13 16:46:09 -08:00 |
|
Mohammad Miadh Angkad
|
1be41e9036
|
[FlashInfer] Bump FlashInfer version from 0.6.2 to 0.6.3 (#18448)
|
2026-02-14 07:43:33 +08:00 |
|
JD
|
191d354f53
|
fix double-free kv cache for requests that have already finished and been freed during preemption (#18694)
|
2026-02-13 13:17:44 -08:00 |
|
Lianmin Zheng
|
008ea46af1
|
[Auto Sync] Update loader.py, weight_utils.py (20260213) (#18779)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Xiuyu Li <xiuyu@x.ai>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2026-02-13 12:22:50 -08:00 |
|
Qi Jia
|
4c6afbeeaa
|
[bugfix] fix mamba slot leak when scheduling fails with radix cache (#15840) (#16067)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2026-02-13 23:43:57 +08:00 |
|
dongjiyingdjy
|
8b4c364960
|
refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-02-13 23:18:17 +08:00 |
|
Linyu Wu
|
0012d6a4eb
|
[Kernel Slimming] Migrate GPTQ-Marlin repack kernel to JIT (#18543)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-02-13 22:29:22 +08:00 |
|
Mick
|
37273408eb
|
[diffusion] chore: use batched P2P ops in VAE parallel decoding (#18728)
|
2026-02-13 22:11:20 +08:00 |
|
triple-mu
|
acc940d302
|
[diffusion] fix typo (#18790)
|
2026-02-13 21:59:39 +08:00 |
|
R0CKSTAR
|
07633349c9
|
[diffusion] fix: webui task_type check (#18462)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-13 21:19:16 +08:00 |
|
Mick
|
efdd676d56
|
[diffusion] refactor: merge redundant default_dtype and param_dtype parameters in FSDP loader (#18789)
|
2026-02-13 21:18:02 +08:00 |
|
Kaixi
|
98ad284ebf
|
Added cuda availability guard (#18480)
|
2026-02-13 20:18:34 +08:00 |
|
Ke Bao
|
a0ebaa6498
|
Cleanup debug log for Ring model (#18793)
|
2026-02-13 18:36:20 +08:00 |
|
Ke Bao
|
eacab2868a
|
Adjust mamba cache allocation (#18786)
|
2026-02-13 18:06:23 +08:00 |
|
Yinghai Lu
|
e4b2b57620
|
[schedule] Fix streaming return of customized_info (#18654)
|
2026-02-13 17:19:16 +08:00 |
|
Xinwei Qiang
|
356e338607
|
[diffusion] feat: support SparseVideoGen2 attention backend (#17507)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-13 16:20:46 +08:00 |
|
ant-yy
|
d97eb111a3
|
Support LingV2_5 model (#18598)
Co-authored-by: zhangkaihong.zkh <zhangkaihong.zkh@antgroup.com>
Co-authored-by: 有禾 <zhangdonghao.zdh@antgroup.com>
Co-authored-by: yudian0504 <138860534+yudian0504@users.noreply.github.com>
Co-authored-by: 悠扬 <youyang.zmy@antgroup.com>
Co-authored-by: xinxingyang <xinxing.yangxx@antgroup.com>
Co-authored-by: zmy460290 <zmy460290@antgroup.com>
|
2026-02-13 16:09:15 +08:00 |
|
Xiaoyu Zhang
|
013a199bc6
|
[CI] Skip cutedsl gdn performance test in jit_kernel ci (#18783)
|
2026-02-13 15:49:30 +08:00 |
|
Shangming Cai
|
1f39bf6523
|
[Bugfix] Add warnings when NSA indexer cache indice mismatch in PD module (#18727)
|
2026-02-13 15:20:05 +08:00 |
|
Liangsheng Yin
|
e6f7a372ef
|
Rename request timeout env vars for waiting/running stages (#18766)
|
2026-02-12 22:58:40 -08:00 |
|