sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

Author	SHA1	Message	Date
Linyu Wu	beabaa8d37	[Kernel Slimming] Migrate marlin moe kernel to JIT (#19181 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-02-26 09:05:13 +08:00
Daniel Cámpora	350190487b	Flashinfer MOE FP8 support for Mistral Large 3. (#15422 ) Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-02-25 15:00:37 -08:00
Liangsheng Yin	c60dcc40bb	[Logging] Guard `log_prefill_stats` against idle batches in disagg prefill (#19361 )	2026-02-25 13:31:52 -08:00
YAMY	08957c88ea	[Logging] Fix prefill side logging in pd disagg (#19350 )	2026-02-25 12:42:18 -08:00
Kangyan-Zhou	306c552639	Revert "Fix HybridAttnBackend forward for linear attention" (#19356 )	2026-02-25 11:49:50 -08:00
jacky.cheng	b2c46fc60b	[AMD] Support Qwen3-Coder-Next on AMD platform (#18355 ) Co-authored-by: yichiche@amd.com <jacky.cheng>	2026-02-25 11:06:22 -08:00
Makcum888e	0217e82a08	[diffusion] Clean code (#19325 )	2026-02-25 21:16:03 +03:00
Even Zhou	2fb239450e	Revert "bugfix: prioritize init_npu_backend to fix various initialization bugs" (#19343 )	2026-02-25 23:04:30 +08:00
Yuhao Yang	c7c4a1cbbd	refactor linear attention backend (#18622 ) Co-authored-by: yizhang2077 <1109276519@qq.com>	2026-02-25 23:02:44 +08:00
Mick	471acd98b9	[diffusion] logging: improve logging (#19312 )	2026-02-25 23:00:35 +08:00
Qingfu Wen	59b9d1e86d	[diffusion] improve: improve fuse_scale_shift_kernel with non-blocking op (#18710 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-25 21:04:20 +08:00
akhilg-nv	c144e55462	Fix HybridAttnBackend forward for linear attention (#19006 )	2026-02-25 21:02:37 +08:00
Zheng Li	d38c0e537d	fix(dense): fix Qwen3.5 dense model precision bug in TP_SIZE>1 (#19070 )	2026-02-25 20:54:42 +08:00
Even Zhou	cdc411160b	[NPU] Fix a corner case where FusedMoE.top_k is not explicitly declared (#19287 )	2026-02-25 20:49:59 +08:00
Mick	9840cd3f68	[diffusion] chore: enable sequence shard for wan by default (#19311 )	2026-02-25 18:21:44 +08:00
billishyahao	60eeef7370	[AMD][with CI Fix] support two batch overlapping for mori ep (#19216 ) Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: kkHuang-amd <wunhuang@amd.com> Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com> Co-authored-by: HAI <hixiao@gmail.com>	2026-02-25 02:14:08 -08:00
GMI Xiao Jin	c4ef33862b	[diffusion] fix: fix bugs to let LTX-2 pipeline support latest Sglang Args pipelines (#19295 )	2026-02-25 17:30:36 +08:00
Mohammad Miadh Angkad	671b595570	Fix `trtllm_mha` fp8 SWA KV index translation (#19107 )	2026-02-25 17:02:17 +08:00
Julian Huang	a55f658835	[Misc] Normalize `--host` parameter to use plain hostname without scheme (#19309 ) Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2026-02-25 00:37:24 -08:00
YAMY	f75abb4521	[Fix][Qwen3.5] Fix KV cache slice transfer for GQA models with replicated KV heads (#19086 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-02-25 16:26:44 +08:00
huangtingwei	d40cb2f725	[HiCache] Support heterogeneous tp for hicache storage (#18541 ) Co-authored-by: hzh0425 <hzh0425@apache.org>	2026-02-25 00:13:57 -08:00
Wang, Yi	3d879c69e9	refactor: extract device-to-backend mapping into get_default_distributed_backend (#19202 ) Signed-off-by: Wang, Yi <yi.a.wang@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2026-02-24 23:50:26 -08:00
Hexq0210	d0bb140034	[NPU] bugfix for model Qwen3-Coder-Next at weight shape transpose for npu. (#18700 ) Co-authored-by: McZyWu <zhuoyun.wu.23@ucl.ac.uk>	2026-02-25 15:46:20 +08:00
xutizhou	a1b39c1c26	Perf/fuse mamba state scatter mtp verify (#18088 )	2026-02-25 15:40:55 +08:00
lw9527	4a3a787f1e	[Fix] Kimi K2.5 support pp (#18434 ) Co-authored-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com> Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-02-25 15:22:11 +08:00
Shangming Cai	8d9ee6669e	Fix comment for tp_rank calculation in dp_attention (#19306 )	2026-02-25 15:19:10 +08:00
Hubert Lu	17b0affbdf	[AMD] Support --enable-aiter-allreduce-fusion on AMD GPUs (#13747 ) Co-authored-by: yctseng0211 <yctseng@amd.com>	2026-02-24 23:11:55 -08:00
YAMY	73fe389dd1	[Qwen3.5] Raise Exception when radix_cache and extra_buffer are enabled at the same time (#19169 )	2026-02-25 15:04:37 +08:00
Liangsheng Yin	76d5410e01	[Disagg] Fix decode querying unregistered `dp_rank` when prefill `dp_size` is 1 (#19305 ) Co-authored-by: Yangmin Li <yangminl@nvidia.com>	2026-02-24 22:52:39 -08:00
Hubert Lu	8bd644765f	[AMD] Enable ROCm kvcache JIT path and add AMD CI coverage. (#18992 ) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-25 14:15:05 +08:00
Michael	ca09d71cf0	Fix nightly grok failure on rotary embedding import (#19192 ) Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>	2026-02-25 13:25:16 +08:00
jacky.cheng	e138f7960a	[AMD] Fix accuracy while using --enable-dp-attention (#19247 ) Co-authored-by: yichiche@amd.com <jacky.cheng>	2026-02-24 20:50:28 -08:00
Liangsheng Yin	ab0f608788	[PD-Disagg] Fix bootstrap server race condition when prefill workers not yet registered (#19288 ) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-24 20:22:16 -08:00
Liangsheng Yin	539f772f54	[PD-Disagg] Fully support external DP dispatch w/ PD-disaggregation mode. (#19268 ) Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>	2026-02-24 19:58:01 -08:00
Mick	241ee90164	[diffusion] chore: tiny fix pyproject.toml (#19256 )	2026-02-25 11:57:53 +08:00
Shangming Cai	0fac2796b6	[PD-Disagg] Improve KVManager init across all backends (#19240 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-02-25 10:37:09 +08:00
siyu	c0fdfd4b92	Delete mm.feature after decode phase (#17324 )	2026-02-24 18:13:03 -08:00
Xiaoyu Zhang	9dff933164	[Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241 )	2026-02-25 10:08:22 +08:00
Feng Su	3b89302277	Refactor: observability code cleanup (#17862 ) Signed-off-by: Feng Su <sufeng@linux.alibaba.com>	2026-02-24 18:07:29 -08:00
siyu	245430eaac	Encoder Global Cache Manager (#16137 ) Co-authored-by: Zheng Wengang <zwg0606@gmail.com> Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>	2026-02-24 18:05:43 -08:00
fzyzcjy	b7af58b9af	Support replication axis in dump comparator (#19282 )	2026-02-25 09:48:43 +08:00
fzyzcjy	2e2b18e870	Support context parallel zigzag reordering in dump comparator (#19281 )	2026-02-25 09:46:17 +08:00
fzyzcjy	0de1f4b07b	Support multi axis unsharding in dump comparator (#19280 )	2026-02-25 09:44:07 +08:00
fzyzcjy	4bb678f28a	Full test coverage in dump comparator (#19279 )	2026-02-25 09:43:29 +08:00
fzyzcjy	94ca2ac5d7	Support TP unification and enhance tests in dump comparator (#19278 )	2026-02-25 09:42:56 +08:00
fzyzcjy	39ba9b5ab5	Support simple unsharding in dumper comparator (#19277 )	2026-02-25 09:42:21 +08:00
fzyzcjy	02ca107b2c	Support dims annotation and enhance dump loader in dumper (#19276 )	2026-02-25 09:41:48 +08:00
fzyzcjy	8b1ab4aaf9	Support agent-friendly output formats in dump comparator (#19275 )	2026-02-25 09:40:33 +08:00
fzyzcjy	d7578ce279	Implement simplest dump comparator v2 (#19274 )	2026-02-25 09:37:21 +08:00
wxy	9cec98b445	[diffusion] fix: shard timestep_proj in sequence-sharded ti2v (#19237 )	2026-02-25 09:33:45 +08:00

6538 Commits