sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
Qiaolin Yu	d8db3077ca	Fix draft extend cuda graph when spec_step=1 (#21709 )	2026-03-31 18:29:56 -07:00
Liangsheng Yin	e4c565f2f2	[Misc] Tiny: Add test network timeouts and dynamic max-parallel for 5090/2-gpu runners (#21800 )	2026-03-31 18:27:39 -07:00
Chang Su	1389962f06	[gRPC] Preserve original ImportError in grpc_server.py (#21801 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-03-31 18:22:29 -07:00
Brayden Zhong	6a9b09847c	CUTLASS NVFP4 GEMM improvement of SM120 (#21314 )	2026-04-01 09:04:34 +08:00
Johnsonms	5bbf347bb3	[jit_kernel] Optimize fused_qknorm_rope: deduplicate sincosf for interleave RoPE (#21654 )	2026-04-01 09:04:13 +08:00
Xiaoyu Zhang	cdd7d6a227	Remove obsolete sgl-kernel legacy paths (#21528 )	2026-04-01 09:00:20 +08:00
Liangsheng Yin	a8759dd9af	Fix killall.py crash when sglang is not yet installed (#21797 )	2026-03-31 17:40:58 -07:00
Liangsheng Yin	7581d814ae	Add CompletionSampler for non-chat eval in run_eval (#21785 )	2026-03-31 16:33:07 -07:00
Yilong Zhao	1f7cee81da	[moe] add customized option to moe-a2a-backend (#21786 )	2026-03-31 16:32:47 -07:00
Baizhou Zhang	f60f2ccc10	[Fix] Fall back to triton MOE for GPT-OSS on Blackwell with driver >= 595 (#21780 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 15:52:10 -07:00
weireweire	9191b02eda	Fix cuda graph max bs capture upper bound (#21005 )	2026-03-31 15:20:56 -07:00
Ethan (Yusheng) Su	3c91ebdf55	[2/n] lora - Shared outer experts and support qwen3_30b_a3b_instruct (#21466 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-31 14:06:23 -07:00
Liangsheng Yin	f4505e2ee3	Fix ineffective is_base_mistral CI patch for HF API rate limiting (#21729 )	2026-03-31 12:54:34 -07:00
Trevor Morris	b91f78d255	[bugfix] Fix rope theta config for MiniMax after transformers v5 update (#21241 )	2026-03-31 11:37:03 -07:00
Michael	8d919bbd44	[AMD] Fix Handle missing rope_theta in get_rope_config for Grok-1 (#21518 )	2026-03-31 10:58:12 -07:00
Zhangheng	91048b2a8e	[HiMambaTree]: Optimize mamba host lock mechanism (#21750 )	2026-03-31 21:52:24 +08:00
R0CKSTAR	e67dbf257a	[diffusion] fix: fix Wan2.2-I2V-A14B video max size issue(#21390 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-31 21:49:53 +08:00
Mick	7790645b82	[diffusion] UX: replace deprecated ORJSONResponse with orjson_response (#21755 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 21:41:33 +08:00
JD	20d07c4384	Fix remote weight info nnode>1 and dp>1 (#17389 )	2026-03-31 21:17:18 +08:00
Shangming Cai	ca2b2130ba	[PD] Tiny cleanup after KVReceiver refactor (#21760 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-31 21:07:57 +08:00
Yuan Luo	c7adca9992	Fix kimi-linear launch server error (#21752 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-31 21:07:08 +08:00
Ke Bao	dbc97456ad	Enable evict swa with piecewise cuda graph (#21754 )	2026-03-31 20:07:16 +08:00
weireweire	4455d17619	[PD] Refactor Disagg Conn and Fix Hang with total_request/total_tokens Balancing (#21299 ) Co-authored-by: Weiliangl User <weiliangl@login-node.hosted.internal>	2026-03-31 18:01:50 +08:00
R0CKSTAR	6c03ae6fe2	[diffusion] fix: fix typo (#21746 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-03-31 17:51:46 +08:00
xiaoqi	a6a8b9b376	bugfix(model):fix deepstack index out of range error (#21727 ) Co-authored-by: xiaoqi.31 <xiaoqi.31@jd.com>	2026-03-31 02:41:47 -07:00
Thomas Wang	5628e908ae	[AMD] Use tgemm.mm for MoEGate router gemm in deepseek_v2.py (#21657 )	2026-03-31 00:55:40 -07:00
xiazhahe	b4cb31f698	[NPU] fix conflict between empty_cache and use_mem_pool (#21507 )	2026-03-31 15:37:33 +08:00
Mohammad Miadh Angkad	dd9c9c1b8e	Add explicit disable flag for FlashInfer allreduce fusion (#21446 )	2026-03-31 00:15:44 -07:00
Yuhao Yang	68a4573627	[diffusion] fix: fix Flux.2 with tp(#21664 )	2026-03-31 14:14:59 +08:00
jacky.cheng	8ba992411d	[AMD] Fix CI multimodal-gen-test-1-gpu-amd for gen model (#21621 )	2026-03-30 23:02:20 -07:00
Jincong Chen	03e4f2858d	[Perf]Remove H2D for Qwen3.5 SpecV2 (#20864 )	2026-03-31 11:54:58 +08:00
Lewis	33e725b052	[Fix] Update supported custom_mem_pool types for mooncake (#21728 ) Co-authored-by: 百麒 <yaozhong.lyz@alibaba-inc.com>	2026-03-31 11:18:30 +08:00
Xiaoyu Zhang	505eb312ec	Revert "DeepSeek-R1-0528-w4a8: DeepEP Low Latency Dispatch Adopts FP8 Communication" (#21719 )	2026-03-31 10:22:01 +08:00
DarkSharpness	4e480982fa	[misc] multiprocess compilation to speed up test (#21483 )	2026-03-31 08:56:37 +08:00
kk	67c295b5f5	[AMD] fix performance regression issue when run gpt-oss with "--context-length 13824" (#21691 )	2026-03-30 16:30:16 -07:00
Zhai Feiyue	daf697afda	[AMD] Add SGLANG_DISAGGREGATION_NUM_PRE_ALLOCATE_REQS env var for configurable KV transfer overlap (#20410 ) Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-03-30 14:37:16 -07:00
Aditya Sharma	d6029de6ad	[Bugfix][NPU] Skip FRACTAL_NZ format for MoE weights with unaligned dimensions (#21209 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-03-30 23:22:17 +03:00
Vedant V Jhaveri	4a9ffc3ab6	fix nemotron capture for non attention layers (#21436 )	2026-03-30 12:50:49 -07:00
Yuxuan Zhang	ad064c2f4e	[GLM-V and GLM-4.7] Cast to FP32 before gate projection for GLM model. (#21660 )	2026-03-30 12:25:27 -07:00
Makcum888e	f4b0e9c64a	[diffusion] [NPU] support ring attention on NPU with FA (#21383 )	2026-03-30 20:10:55 +03:00
GXIN	752d260c77	[NPU][diffusion]: support parallel decoding of qwen-image (#20757 ) Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>	2026-03-30 20:03:24 +03:00
cen121212	ba6d54d0f0	[NPU] GLM-5 optimize with fused kernels (#18617 )	2026-03-30 22:48:15 +08:00
xieminghe1	7119d59747	DeepSeek-R1-0528-w4a8: DeepEP Low Latency Dispatch Adopts FP8 Communication (#14162 ) Co-authored-by: undefined <zhouchen.arrebol@jd.com>	2026-03-30 22:27:28 +08:00
heziiop	673ffb3116	[NPU] fix eagle3 accept rate (#21255 )	2026-03-30 21:58:25 +08:00
GXIN	c5c58c3349	[NPU][Diffusion] fix sp modulate for qwen-image-edit (#20974 ) Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>	2026-03-30 16:18:48 +03:00
Mick	0a1fb42869	[diffusion] CI: relax pr-test threshold (#21682 )	2026-03-30 20:23:46 +08:00
Mick	b76730701b	[diffusion] feat: enhance overlay mechanism (#21648 )	2026-03-30 19:45:34 +08:00
LiYomi	1d6424d5ad	fix: Mistral Small 4 fails to start due to config/weight format mismatch (#21620 ) Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 01:57:35 -07:00
strgrb	b246269444	fix mamba cache leak when adder fails to add a matched req. (#21404 )	2026-03-30 16:45:49 +08:00
Baizhou Zhang	62a63eeff7	[Fix] Fix weight_loader property assignment for qwen3-next FP8 models (#21662 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 01:35:59 -07:00

7373 Commits