sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-04 06:17:17 +00:00

Author	SHA1	Message	Date
Ratish P	ddfe147377	[diffusion]: Improve layerwise offload buffer reuse and shared-storage handling (#18611 )	2026-02-15 22:17:51 +08:00
Mick	3feb48139e	[diffusion] quant: add support for svdquant and nunchaku (#18549 ) Co-authored-by: AichenF <aichenf@nvidia.com> Co-authored-by: jianyingzhu <53300651@qq.com>	2026-02-15 20:43:00 +08:00
Michael	88010e9601	[AMD] Fix nightly 1-GPU test failures and bench_serving regression (#18761 ) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>	2026-02-15 20:36:47 +08:00
fzyzcjy	4c7f986c6b	Extract dumper and prefill delayer tests common utils (#18857 )	2026-02-15 18:33:23 +08:00
haowen-han	b992828ad2	fix: fix bug on kimi2.5 with dp2 and tp4 (#18604 ) Co-authored-by: hanhaowen <hanhaowen@baidu.com>	2026-02-15 16:32:13 +08:00
Ratish P	274bf6607a	[diffusion] fix: enable torch.compile for UlyssesAttention (#18840 )	2026-02-15 15:54:27 +08:00
zhangxiaolei123456	ad1bdb93df	perf: add minimax-2.5 fused_moe tuning config for h20 (#18833 )	2026-02-15 15:46:56 +08:00
jackey hua	922fbc21e2	[Perf] Tune MiniMax M2 fused moe kernel on H100 GPU (#18851 )	2026-02-15 15:30:52 +08:00
andyluo7	944a9f6fcf	Fix/qwen3 5 amd rope cutedsl fallback (#18753 ) Co-authored-by: seungrokj <seungrok.jung@amd.com>	2026-02-14 22:09:44 -08:00
muse-coder	91230dcca8	[FIX] Correct JIT kernel compilation on newer GPUs with outdated driver metadata. (#18496 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-02-15 12:14:39 +08:00
Bhavneek Singh	1ce3420784	Model: Support IBM Granite (Dense/Mamba + MoE) (#18040 )	2026-02-15 11:24:41 +08:00
Lianmin Zheng	b33769786f	[Auto Sync] Update grpc_request_manager.py, tokenizer_manag... (20260214) (#18838 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-02-14 18:12:32 -08:00
Guangda Liu	190fa8246f	Fix model loading for DeepSeek-V3.2-AWQ (#16907 ) Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>	2026-02-15 09:39:53 +08:00
Lianmin Zheng	8b2020584c	[Auto Sync] Update test_deterministic.py (20260214) (#18839 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>	2026-02-14 17:19:30 -08:00
Xiaoyu Zhang	4067d9487d	[diffusion] feat: opt vae decode with `channels_last_3d` (#18540 )	2026-02-14 23:19:45 +08:00
Xiaoyu Zhang	c29394e3c8	[kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475 )	2026-02-14 23:06:21 +08:00
Kangyan-Zhou	ae95869292	Enable SGLANG_ENABLE_SPEC_V2 for nightly speculative decoding tests (#18719 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 23:00:33 +08:00
Raayan Dhar	92cdd398cd	feat: Support `mrope_section` with `rope_type: "yarn"` (#13313 ) Signed-off-by: Raayan Dhar raayan.dhar@gmail.com <raayan.dhar@gmail.com> Signed-off-by: raayandhar <raayan.dhar@gmail.com>	2026-02-14 22:51:44 +08:00
Ke Bao	f51e9d9ca1	Add ci test for ring model (#18829 )	2026-02-14 22:20:23 +08:00
ybyang	c8aa2a6534	Fix dsv32 encode_messages (#18126 )	2026-02-14 16:44:13 +08:00
Johnsonms	34132d6da5	Kernel: optimize decoding metadata in NSA multi-spec backend with fused kernels (#17554 )	2026-02-14 16:40:15 +08:00
Yuan Luo	fa0ef6e4f7	[VLM][LLM] Optimize fused_moe triton kernel tma (#18782 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-14 14:35:26 +08:00
JD	f6c18c3a85	Fix/partial gen from waiting queue miss metadata (#17610 )	2026-02-13 19:04:08 -08:00
R0CKSTAR	45a4697d45	[diffusion][MUSA] fix: MUSA platform breakage caused by PR #13662 (#18456 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-02-14 11:00:39 +08:00
qmzznbxhl	066b0b70d9	Handle abort for retracted requests in disagg decode prealloc queue (#18705 ) Co-authored-by: sunhailiang <sunhailiang@baidu.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2026-02-13 18:39:39 -08:00
shuwenn	bd39de7d5e	[Env] centralize hicache vars in environ.py (#17204 )	2026-02-13 18:02:31 -08:00
Liangsheng Yin	dcea74d63f	Add timeout abort kits for normal / eagle. (#18815 )	2026-02-13 17:57:30 -08:00
Liangsheng Yin	4474fb98b4	[PD-Disagg] Fix double free when prebuilt batch is aborted. (#18822 )	2026-02-13 17:46:35 -08:00
Leon Gao	ab0fb248fd	feat: add SGLANG_DISTRIBUTED_INIT_METHOD_OVERRIDE env var (#18743 )	2026-02-14 09:37:33 +08:00
Minglei Zhu	8be18c655d	[Perf] refactor piecewise cuda graph support of Qwen3-Next (#17613 )	2026-02-14 09:30:50 +08:00
shuwenn	3299c4f9c1	[CI] feat: add early exit to wait_for_server when process dies (#18602 )	2026-02-13 16:46:09 -08:00
Mohammad Miadh Angkad	1be41e9036	[FlashInfer] Bump FlashInfer version from 0.6.2 to 0.6.3 (#18448 )	2026-02-14 07:43:33 +08:00
JD	191d354f53	fix double-free kv cache for requests that have already finished and been freed during preemption (#18694 )	2026-02-13 13:17:44 -08:00
Lianmin Zheng	008ea46af1	[Auto Sync] Update loader.py, weight_utils.py (20260213) (#18779 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Xiuyu Li <xiuyu@x.ai> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2026-02-13 12:22:50 -08:00
Qi Jia	4c6afbeeaa	[bugfix] fix mamba slot leak when scheduling fails with radix cache (#15840 ) (#16067 ) Co-authored-by: yizhang2077 <1109276519@qq.com>	2026-02-13 23:43:57 +08:00
dongjiyingdjy	8b4c364960	refactor context parallel state (#17213 ) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2026-02-13 23:18:17 +08:00
Linyu Wu	0012d6a4eb	[Kernel Slimming] Migrate GPTQ-Marlin repack kernel to JIT (#18543 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-02-13 22:29:22 +08:00
Mick	37273408eb	[diffusion] chore: use batched P2P ops in VAE parallel decoding (#18728 )	2026-02-13 22:11:20 +08:00
triple-mu	acc940d302	[diffusion] fix typo (#18790 )	2026-02-13 21:59:39 +08:00
R0CKSTAR	07633349c9	[diffusion] fix: webui task_type check (#18462 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-13 21:19:16 +08:00
Mick	efdd676d56	[diffusion] refactor: merge redundant default_dtype and param_dtype parameters in FSDP loader (#18789 )	2026-02-13 21:18:02 +08:00
Kaixi	98ad284ebf	Added cuda availability guard (#18480 )	2026-02-13 20:18:34 +08:00
Ke Bao	a0ebaa6498	Cleanup debug log for Ring model (#18793 )	2026-02-13 18:36:20 +08:00
Ke Bao	eacab2868a	Adjust mamba cache allocation (#18786 )	2026-02-13 18:06:23 +08:00
Yinghai Lu	e4b2b57620	[schedule] Fix streaming return of customized_info (#18654 )	2026-02-13 17:19:16 +08:00
Xinwei Qiang	356e338607	[diffusion] feat: support SparseVideoGen2 attention backend (#17507 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-13 16:20:46 +08:00
ant-yy	d97eb111a3	Support LingV2_5 model (#18598 ) Co-authored-by: zhangkaihong.zkh <zhangkaihong.zkh@antgroup.com> Co-authored-by: 有禾 <zhangdonghao.zdh@antgroup.com> Co-authored-by: yudian0504 <138860534+yudian0504@users.noreply.github.com> Co-authored-by: 悠扬 <youyang.zmy@antgroup.com> Co-authored-by: xinxingyang <xinxing.yangxx@antgroup.com> Co-authored-by: zmy460290 <zmy460290@antgroup.com>	2026-02-13 16:09:15 +08:00
Xiaoyu Zhang	013a199bc6	[CI] Skip cutedsl gdn performance test in jit_kernel ci (#18783 )	2026-02-13 15:49:30 +08:00
Shangming Cai	1f39bf6523	[Bugfix] Add warnings when NSA indexer cache indice mismatch in PD module (#18727 )	2026-02-13 15:20:05 +08:00
Liangsheng Yin	e6f7a372ef	Rename request timeout env vars for waiting/running stages (#18766 )	2026-02-12 22:58:40 -08:00

6296 Commits