Commit Graph

6437 Commits

Author SHA1 Message Date
muse-coder
91230dcca8 [FIX] Correct JIT kernel compilation on newer GPUs with outdated driver metadata. (#18496)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-02-15 12:14:39 +08:00
Bhavneek Singh
1ce3420784 Model: Support IBM Granite (Dense/Mamba + MoE) (#18040) 2026-02-15 11:24:41 +08:00
Lianmin Zheng
b33769786f [Auto Sync] Update grpc_request_manager.py, tokenizer_manag... (20260214) (#18838)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-02-14 18:12:32 -08:00
Guangda Liu
190fa8246f Fix model loading for DeepSeek-V3.2-AWQ (#16907)
Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>
2026-02-15 09:39:53 +08:00
Lianmin Zheng
8b2020584c [Auto Sync] Update test_deterministic.py (20260214) (#18839)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>
2026-02-14 17:19:30 -08:00
Xiaoyu Zhang
4067d9487d [diffusion] feat: opt vae decode with channels_last_3d (#18540) 2026-02-14 23:19:45 +08:00
Xiaoyu Zhang
c29394e3c8 [kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475) 2026-02-14 23:06:21 +08:00
Kangyan-Zhou
ae95869292 Enable SGLANG_ENABLE_SPEC_V2 for nightly speculative decoding tests (#18719)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 23:00:33 +08:00
Raayan Dhar
92cdd398cd feat: Support mrope_section with rope_type: "yarn" (#13313)
Signed-off-by: Raayan Dhar raayan.dhar@gmail.com <raayan.dhar@gmail.com>
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
2026-02-14 22:51:44 +08:00
Ke Bao
f51e9d9ca1 Add ci test for ring model (#18829) 2026-02-14 22:20:23 +08:00
ybyang
c8aa2a6534 Fix dsv32 encode_messages (#18126) 2026-02-14 16:44:13 +08:00
Johnsonms
34132d6da5 Kernel: optimize decoding metadata in NSA multi-spec backend with fused kernels (#17554) 2026-02-14 16:40:15 +08:00
Yuan Luo
fa0ef6e4f7 [VLM][LLM] Optimize fused_moe triton kernel tma (#18782)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-02-14 14:35:26 +08:00
JD
f6c18c3a85 Fix/partial gen from waiting queue miss metadata (#17610) 2026-02-13 19:04:08 -08:00
R0CKSTAR
45a4697d45 [diffusion][MUSA] fix: MUSA platform breakage caused by PR #13662 (#18456)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-02-14 11:00:39 +08:00
qmzznbxhl
066b0b70d9 Handle abort for retracted requests in disagg decode prealloc queue (#18705)
Co-authored-by: sunhailiang <sunhailiang@baidu.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2026-02-13 18:39:39 -08:00
shuwenn
bd39de7d5e [Env] centralize hicache vars in environ.py (#17204) 2026-02-13 18:02:31 -08:00
Liangsheng Yin
dcea74d63f Add timeout abort kits for normal / eagle. (#18815) 2026-02-13 17:57:30 -08:00
Liangsheng Yin
4474fb98b4 [PD-Disagg] Fix double free when prebuilt batch is aborted. (#18822) 2026-02-13 17:46:35 -08:00
Leon Gao
ab0fb248fd feat: add SGLANG_DISTRIBUTED_INIT_METHOD_OVERRIDE env var (#18743) 2026-02-14 09:37:33 +08:00
Minglei Zhu
8be18c655d [Perf] refactor piecewise cuda graph support of Qwen3-Next (#17613) 2026-02-14 09:30:50 +08:00
shuwenn
3299c4f9c1 [CI] feat: add early exit to wait_for_server when process dies (#18602) 2026-02-13 16:46:09 -08:00
Mohammad Miadh Angkad
1be41e9036 [FlashInfer] Bump FlashInfer version from 0.6.2 to 0.6.3 (#18448) 2026-02-14 07:43:33 +08:00
JD
191d354f53 fix double-free kv cache for requests that have already finished and been freed during preemption (#18694) 2026-02-13 13:17:44 -08:00
Lianmin Zheng
008ea46af1 [Auto Sync] Update loader.py, weight_utils.py (20260213) (#18779)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Xiuyu Li <xiuyu@x.ai>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2026-02-13 12:22:50 -08:00
Qi Jia
4c6afbeeaa [bugfix] fix mamba slot leak when scheduling fails with radix cache (#15840) (#16067)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2026-02-13 23:43:57 +08:00
dongjiyingdjy
8b4c364960 refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-02-13 23:18:17 +08:00
Linyu Wu
0012d6a4eb [Kernel Slimming] Migrate GPTQ-Marlin repack kernel to JIT (#18543)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-02-13 22:29:22 +08:00
Mick
37273408eb [diffusion] chore: use batched P2P ops in VAE parallel decoding (#18728) 2026-02-13 22:11:20 +08:00
triple-mu
acc940d302 [diffusion] fix typo (#18790) 2026-02-13 21:59:39 +08:00
R0CKSTAR
07633349c9 [diffusion] fix: webui task_type check (#18462)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-13 21:19:16 +08:00
Mick
efdd676d56 [diffusion] refactor: merge redundant default_dtype and param_dtype parameters in FSDP loader (#18789) 2026-02-13 21:18:02 +08:00
Kaixi
98ad284ebf Added cuda availability guard (#18480) 2026-02-13 20:18:34 +08:00
Ke Bao
a0ebaa6498 Cleanup debug log for Ring model (#18793) 2026-02-13 18:36:20 +08:00
Ke Bao
eacab2868a Adjust mamba cache allocation (#18786) 2026-02-13 18:06:23 +08:00
Yinghai Lu
e4b2b57620 [schedule] Fix streaming return of customized_info (#18654) 2026-02-13 17:19:16 +08:00
Xinwei Qiang
356e338607 [diffusion] feat: support SparseVideoGen2 attention backend (#17507)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-13 16:20:46 +08:00
ant-yy
d97eb111a3 Support LingV2_5 model (#18598)
Co-authored-by: zhangkaihong.zkh <zhangkaihong.zkh@antgroup.com>
Co-authored-by: 有禾 <zhangdonghao.zdh@antgroup.com>
Co-authored-by: yudian0504 <138860534+yudian0504@users.noreply.github.com>
Co-authored-by: 悠扬 <youyang.zmy@antgroup.com>
Co-authored-by: xinxingyang <xinxing.yangxx@antgroup.com>
Co-authored-by: zmy460290 <zmy460290@antgroup.com>
2026-02-13 16:09:15 +08:00
Xiaoyu Zhang
013a199bc6 [CI] Skip cutedsl gdn performance test in jit_kernel ci (#18783) 2026-02-13 15:49:30 +08:00
Shangming Cai
1f39bf6523 [Bugfix] Add warnings when NSA indexer cache indice mismatch in PD module (#18727) 2026-02-13 15:20:05 +08:00
Liangsheng Yin
e6f7a372ef Rename request timeout env vars for waiting/running stages (#18766) 2026-02-12 22:58:40 -08:00
xiaoye
5700b19cbf [diffusion] feat: support tp for qwen-image-edit-2511 (#18619)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-13 13:04:29 +08:00
Liangsheng Yin
d29e331491 [Spec] Move forward timeout before verify to fix Eagle v1 filter mismatch (#18760) 2026-02-12 20:58:34 -08:00
pansicheng
7d4ae057ec [Kernel] Add JIT rotary_embedding_kernel (#17934)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: root <root@zhikuan-A10x2.ea134>
2026-02-13 12:41:25 +08:00
Bhavneek Singh
32e0286829 [diffusion] fix: fix local model loading issue in bench_serving (#18687)
Co-authored-by: Bhavneek Singh <blazingbhavneek@Bhavneeks-MacBook-Air.local>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-13 11:57:52 +08:00
HuangJi
f4d80f9d42 [diffusion] feat: allows quality adjustment of generated images/videos (#17937) 2026-02-13 11:56:20 +08:00
Bingxu Chen
6555b2a71d [diffusion] fix: fix ci on amd (#18716)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-13 11:51:24 +08:00
Lianmin Zheng
c56a5efbaa [Auto Sync] Update grok.py (20260213) (#18765)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2026-02-12 18:41:41 -08:00
Lianmin Zheng
d5f66fec15 Revert changes to weight_utils.py (#18759) 2026-02-12 17:15:16 -08:00
Alison Shao
dd77bd4651 Fix invalid import paths in glm_image.py (#18757) 2026-02-12 16:44:34 -08:00