sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

Author	SHA1	Message	Date
JD	20d07c4384	Fix remote weight info nnode>1 and dp>1 (#17389 )	2026-03-31 21:17:18 +08:00
Shangming Cai	ca2b2130ba	[PD] Tiny cleanup after KVReceiver refactor (#21760 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-31 21:07:57 +08:00
Yuan Luo	c7adca9992	Fix kimi-linear launch server error (#21752 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-31 21:07:08 +08:00
Ke Bao	dbc97456ad	Enable evict swa with piecewise cuda graph (#21754 )	2026-03-31 20:07:16 +08:00
weireweire	4455d17619	[PD] Refactor Disagg Conn and Fix Hang with total_request/total_tokens Balancing (#21299 ) Co-authored-by: Weiliangl User <weiliangl@login-node.hosted.internal>	2026-03-31 18:01:50 +08:00
R0CKSTAR	6c03ae6fe2	[diffusion] fix: fix typo (#21746 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-03-31 17:51:46 +08:00
xiaoqi	a6a8b9b376	bugfix(model):fix deepstack index out of range error (#21727 ) Co-authored-by: xiaoqi.31 <xiaoqi.31@jd.com>	2026-03-31 02:41:47 -07:00
Thomas Wang	5628e908ae	[AMD] Use tgemm.mm for MoEGate router gemm in deepseek_v2.py (#21657 )	2026-03-31 00:55:40 -07:00
xiazhahe	b4cb31f698	[NPU] fix conflict between empty_cache and use_mem_pool (#21507 )	2026-03-31 15:37:33 +08:00
Mohammad Miadh Angkad	dd9c9c1b8e	Add explicit disable flag for FlashInfer allreduce fusion (#21446 )	2026-03-31 00:15:44 -07:00
Yuhao Yang	68a4573627	[diffusion] fix: fix Flux.2 with tp(#21664 )	2026-03-31 14:14:59 +08:00
jacky.cheng	8ba992411d	[AMD] Fix CI multimodal-gen-test-1-gpu-amd for gen model (#21621 )	2026-03-30 23:02:20 -07:00
Jincong Chen	03e4f2858d	[Perf]Remove H2D for Qwen3.5 SpecV2 (#20864 )	2026-03-31 11:54:58 +08:00
Lewis	33e725b052	[Fix] Update supported custom_mem_pool types for mooncake (#21728 ) Co-authored-by: 百麒 <yaozhong.lyz@alibaba-inc.com>	2026-03-31 11:18:30 +08:00
Xiaoyu Zhang	505eb312ec	Revert "DeepSeek-R1-0528-w4a8: DeepEP Low Latency Dispatch Adopts FP8 Communication" (#21719 )	2026-03-31 10:22:01 +08:00
DarkSharpness	4e480982fa	[misc] multiprocess compilation to speed up test (#21483 )	2026-03-31 08:56:37 +08:00
kk	67c295b5f5	[AMD] fix performance regression issue when run gpt-oss with "--context-length 13824" (#21691 )	2026-03-30 16:30:16 -07:00
Zhai Feiyue	daf697afda	[AMD] Add SGLANG_DISAGGREGATION_NUM_PRE_ALLOCATE_REQS env var for configurable KV transfer overlap (#20410 ) Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-03-30 14:37:16 -07:00
Aditya Sharma	d6029de6ad	[Bugfix][NPU] Skip FRACTAL_NZ format for MoE weights with unaligned dimensions (#21209 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-03-30 23:22:17 +03:00
Vedant V Jhaveri	4a9ffc3ab6	fix nemotron capture for non attention layers (#21436 )	2026-03-30 12:50:49 -07:00
Yuxuan Zhang	ad064c2f4e	[GLM-V and GLM-4.7] Cast to FP32 before gate projection for GLM model. (#21660 )	2026-03-30 12:25:27 -07:00
Makcum888e	f4b0e9c64a	[diffusion] [NPU] support ring attention on NPU with FA (#21383 )	2026-03-30 20:10:55 +03:00
GXIN	752d260c77	[NPU][diffusion]: support parallel decoding of qwen-image (#20757 ) Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>	2026-03-30 20:03:24 +03:00
cen121212	ba6d54d0f0	[NPU] GLM-5 optimize with fused kernels (#18617 )	2026-03-30 22:48:15 +08:00
xieminghe1	7119d59747	DeepSeek-R1-0528-w4a8: DeepEP Low Latency Dispatch Adopts FP8 Communication (#14162 ) Co-authored-by: undefined <zhouchen.arrebol@jd.com>	2026-03-30 22:27:28 +08:00
heziiop	673ffb3116	[NPU] fix eagle3 accept rate (#21255 )	2026-03-30 21:58:25 +08:00
GXIN	c5c58c3349	[NPU][Diffusion] fix sp modulate for qwen-image-edit (#20974 ) Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>	2026-03-30 16:18:48 +03:00
Mick	0a1fb42869	[diffusion] CI: relax pr-test threshold (#21682 )	2026-03-30 20:23:46 +08:00
Mick	b76730701b	[diffusion] feat: enhance overlay mechanism (#21648 )	2026-03-30 19:45:34 +08:00
LiYomi	1d6424d5ad	fix: Mistral Small 4 fails to start due to config/weight format mismatch (#21620 ) Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 01:57:35 -07:00
strgrb	b246269444	fix mamba cache leak when adder fails to add a matched req. (#21404 )	2026-03-30 16:45:49 +08:00
Baizhou Zhang	62a63eeff7	[Fix] Fix weight_loader property assignment for qwen3-next FP8 models (#21662 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 01:35:59 -07:00
Hubert Lu	e6071e60c0	[AMD] Support AMD MXFP4 Qwen3.5-397B-A17B model (#21234 )	2026-03-30 01:14:18 -07:00
kk	b9a68c304e	[AMD] Fused rope kv store (#21315 ) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-30 00:05:41 -07:00
blzheng	ed01e1d5d6	[CPU] add kernel apply_rotary_pos_emb_cpu for Qwen3-VL and Qwen3-Omni (#13121 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-29 23:43:46 -07:00
Aishwarya Ramasethu	c32ee48886	MFU metrics in Prometheus (#19395 )	2026-03-29 23:40:06 -07:00
Polisetty V R K Jyothendra Varma	f0303fd07e	[Intel GPU] Enable DeepSeek R1 inference on XPU (#18461 ) Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>	2026-03-29 22:35:59 -07:00
Feng Su	9b4dd27478	[Fix] Fix Qwen3.5 MoE model loading and Mamba cache sharding in PP mode (#21448 ) Co-authored-by: zhangxiaolei123456 <zhangxiaolei.666@bytedance.com>	2026-03-30 11:57:26 +08:00
Liangsheng Yin	c06ca1526c	Fix circular reference in CustomTestCase.__init_subclass__ (#21650 ) Co-authored-by: wan4ch <wan4ch@gmail.com>	2026-03-29 20:38:12 -07:00
Lianmin Zheng	9f7792415a	Clean up TokenizerManager: remove dead code and improve rid validation (#21639 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 15:12:49 -07:00
Lianmin Zheng	f3970b17ef	[Cleanup] Remove unused BatchMultimodalOutput and BatchMultimodalDecodeReq (#21640 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 14:54:25 -07:00
Lianmin Zheng	1d9c8e8c9e	Simplify routed experts test and move base64 encoding to tokenizer manager (#21634 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 12:44:01 -07:00
Mohammad Miadh Angkad	2acdda1d85	[Fix] Remove redundant allreduce fusion block and skip TP=1 (#20621 )	2026-03-29 12:30:40 -07:00
wili	bda94fc779	[Fix] SGLANG_USE_CUDA_IPC_TRANSPORT=1 and SGLANG_ENABLE_MM_SPLITTING=1 do not work at the same time. (#19915 )	2026-03-30 01:15:26 +08:00
saatwiknagpal	d2440dcf58	[VLM] perf: optimize CUDA IPC for multimodal transfer by caching IPC pool handles (#21418 )	2026-03-30 00:20:38 +08:00
wili	5bb9ca0e63	[Feature] Optimizations for JPEG input on NVIDIA GPU (#19749 )	2026-03-30 00:06:14 +08:00
Bi Xue	42c46e6334	[sgl] disable piecewise cuda graph when a model doesn't have layers (#21565 )	2026-03-29 23:04:20 +08:00
Hanlin Bi	aa9177152e	fix cuda graph capturing error in sm120 mxfp8 triton path (#19835 )	2026-03-29 01:59:24 -07:00
Liangsheng Yin	fec9961a1f	Clean up _wait_for_scheduler_ready implementation (#21626 )	2026-03-29 01:02:33 -07:00
psaab	d2fa8d67ba	Wrap IPv6 addresses in gRPC, bench_serving, and log messages (#21236 ) Co-authored-by: hnyls2002 <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2026-03-29 00:36:31 -07:00


7855 Commits