sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
Lianmin Zheng	fb2e816e83	Fix server args for gpt oss so users can override the moe runner backend (#12696)	2025-11-05 11:36:59 -08:00
bigmoyan	508d2f7aa2	add Kimi k2 reasoning parser (#12702) Signed-off-by: wangzhengtao <wangzhengtao@msh.team>	2025-11-06 00:37:54 +08:00
Yuxuan Zhang	a889c85459	[Grammar Fix] GLM-4-MOE self.first_k_dense_replace is undefined. (#12455)	2025-11-06 00:03:45 +08:00
Yuhong Guo	4d84f886e7	Refactor `--debug-tensor-dump-layers` to list (#12691)	2025-11-05 03:30:01 -08:00
yinghui	dc4f541823	fix trtllm_mla attention backend when disabling cuda graph. (#12687)	2025-11-05 01:35:02 -08:00
zejunchen-zejun	0648eb482d	[Profiler] Add SGLANG_PROFILE_RECORD_SHAPES for recording shapes when profiling (#11641) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com> Co-authored-by: HAI <hixiao@gmail.com>	2025-11-04 23:41:46 -08:00
yinghui	b88fab3111	fix: add `seed` bench_serving to cache key, remove redundant function definition. (#12680)	2025-11-04 23:39:11 -08:00
Glen Liu	cbf23dbbfa	[Feature] add --lora-request-distribution arg to bench_serving.py and support skewed and distinct workloads (#12175)	2025-11-04 21:41:40 -08:00
ai-easy-cpu	48641435d6	fix typo of args description in sglang.profiler (#12486) Co-authored-by: AI-bot-easy <litchys0123@outlook.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-04 20:15:13 -08:00
Liangsheng Yin	44b1b394a4	[PD-Disagg] Check finish after pop tranferred (#12638)	2025-11-05 11:18:09 +08:00
Kaixi Hou	0711d1509b	[NVIDIA] Fix cutedsl backend of MoE (#12353)	2025-11-04 18:54:55 -08:00
sglang-bot	09938e1f82	chore: bump SGLang version to 0.5.4.post3 (#12639)	2025-11-04 18:32:11 -08:00
Nicolas Castet	2340798353	Register allgather/reducescatter buffers with symm memory (#12572)	2025-11-04 17:11:36 -08:00
soaringk	44da737770	[fix] Handle escaped characters in GLM tool call parser to prevent double serialization (#12456)	2025-11-04 16:48:14 -08:00
Baizhou Zhang	d22d044734	Revert "Enable memory saver for hybrid model" (#12648)	2025-11-04 16:22:06 -08:00
Kaixi Hou	34f7564df0	[NVIDIA] Fix wrong symmetric sizes for fp4 cases (#12640)	2025-11-04 14:19:37 -08:00
Johnsonms	1cfbbc42d8	[Bug] Fix NSA Backend KV-Buffer Shape Mismatch in DeepSeek-V3.2 (#12645)	2025-11-04 13:57:32 -08:00
Lianmin Zheng	55dfb539cf	[Auto Sync] Update scheduler_metrics_mixin.py, collector.py (20251104) (#12647) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-11-04 13:56:14 -08:00
Baizhou Zhang	42889acbd0	[hotfix] Fix deepep w4a8 bug (#12642)	2025-11-04 13:55:59 -08:00
Trevor Morris	211f4070e5	fix: Lazy import mooncake-ep to fix extra gpu contexts being created (#12641)	2025-11-04 12:28:36 -08:00
Liangsheng Yin	befa41a152	Fix `output_ids` inconsistency (#12628)	2025-11-05 01:43:08 +08:00
Liangsheng Yin	30b26ee9d0	Add io struct naming check back (#12634)	2025-11-05 01:15:01 +08:00
Liangsheng Yin	aa797d013d	[Test] Merge all constrained decoding tests. (#12633)	2025-11-05 00:43:06 +08:00
Ke Bao	7cee07a067	Fix skip layer in get_quant_method (#12632)	2025-11-04 23:27:46 +08:00
Yuan Luo	bb517fe393	[HotFix] Disable torch dynamo for mrope_triton kernel (#12593) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-11-04 23:26:56 +08:00
fzyzcjy	ff0b64e1e6	Ensure GPU work is finished when release memory occupation call is finished (#12592)	2025-11-04 18:01:27 +08:00
Liangsheng Yin	0678beaaee	[sepc-v2] Fix imcompatibility with constrained decoding (#12615)	2025-11-04 17:27:31 +08:00
Minglei Zhu	c14cc47e39	[Deterministic] Optimize bmm_batch_invariant op (#12522)	2025-11-04 00:33:31 -08:00
Trevor Morris	dbcf85b7f0	Add --speculative-moe-runner-backend server arg (#10183)	2025-11-04 00:20:56 -08:00
Zhao Chen	d5fa019c36	feat: limit peak memory usage when computing logprobs (#6318) Signed-off-by: Zhao Chen <zhaochen.zju@gmail.com> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2025-11-03 23:53:20 -08:00
Junrong Lin	173e0f704f	Enable memory saver for hybrid model (#11974)	2025-11-04 14:55:26 +08:00
Lianmin Zheng	f600866a44	Improve the metrics for PD (#12580) Co-authored-by: Kan Wu <wukanustc@gmail.com> Co-authored-by: cctry <shiyang@x.ai>	2025-11-03 22:10:57 -08:00
ishandhanani	93be7e863e	fix: respect `--ignore-eos` in PD case for benchmarking (#12597)	2025-11-03 21:44:14 -08:00
fzyzcjy	60b0754cc9	Tiny fix ExpertDistributionReq error (#11760)	2025-11-04 13:39:25 +08:00
Zhao Chen	0b24af4d79	test: support return logprobs in bench_offline_throughput test (#12462) Signed-off-by: Zhao Chen <zhaochen.zju@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-03 21:38:48 -08:00
Jonah Bernard	a209fb05c1	[Qwen3 VL] Add LoRA support for Qwen 3 VL (#12165)	2025-11-03 20:32:54 -08:00
Hanming Lu	48d6bea1ea	[GDN/SWA] mamba and swa radix cache edge case fix (#12111) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-11-04 11:03:37 +08:00
akhilg-nv	e607850fcf	Enable mixed type LayerNorm kernel for NSA indexer (#12044)	2025-11-03 16:50:41 -08:00
Lianmin Zheng	243c064df2	Remove the dependency of nccl.h in symmetric memory (#12571)	2025-11-03 16:11:00 -08:00
b8zhong	d31d48b341	update usage of `trtllm_fp8_per_tensor_scale_moe` (#12569)	2025-11-03 14:25:32 -08:00
fzyzcjy	8834260739	Super tiny dump server info such as args in bench for post analysis (#12550)	2025-11-03 14:24:08 -08:00
fzyzcjy	fd7a72d62d	Super tiny allow profile activities in bench_serving (#12549)	2025-11-03 14:23:18 -08:00
Yi Zhang	21a8fa16ea	tiny optimize for bench serving (#12553)	2025-11-03 14:13:18 -08:00
Lianmin Zheng	7a21d8b276	Reduce the overhead of nccl symmetric memory (#12524) Co-authored-by: Nicolas Castet <ncastet@nvidia.com>	2025-11-03 11:56:27 -08:00
Jonah Bernard	6ef23b9833	[Test] Add parameters to SRTRunner (#12227)	2025-11-03 11:20:56 -08:00
fzyzcjy	385599cb04	Fix error when calling quantization (#12548)	2025-11-03 10:17:43 -08:00
Yueyang Pan	952fbe47cb	fix: fix the bug which leads qwen2_5_vl to crash with mixed_chunk (#11330) Signed-off-by: PanJason <pyyjason@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>	2025-11-03 09:26:03 -08:00
Liangsheng Yin	edb2569356	[hot-fix] Fix broken CI (#12564)	2025-11-04 00:03:25 +08:00
Liangsheng Yin	3529c061bb	[spec v2] Fix output repetition by speculative sampling error (#12561)	2025-11-03 23:00:17 +08:00
harrisonlimh	ffb32a8548	Conditionally recapture cuda graph after model weight update from disk (#12060)	2025-11-03 05:51:27 -08:00


4365 Commits