sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-02 04:37:14 +00:00

Author	SHA1	Message	Date
Yi Zhang	21a8fa16ea	tiny optimize for bench serving (#12553 )	2025-11-03 14:13:18 -08:00
Lianmin Zheng	7a21d8b276	Reduce the overhead of nccl symmetric memory (#12524 ) Co-authored-by: Nicolas Castet <ncastet@nvidia.com>	2025-11-03 11:56:27 -08:00
Jonah Bernard	6ef23b9833	[Test] Add parameters to SRTRunner (#12227 )	2025-11-03 11:20:56 -08:00
fzyzcjy	385599cb04	Fix error when calling quantization (#12548 )	2025-11-03 10:17:43 -08:00
Yueyang Pan	952fbe47cb	fix: fix the bug which leads qwen2_5_vl to crash with mixed_chunk (#11330 ) Signed-off-by: PanJason <pyyjason@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>	2025-11-03 09:26:03 -08:00
Liangsheng Yin	edb2569356	[hot-fix] Fix broken CI (#12564 )	2025-11-04 00:03:25 +08:00
Liangsheng Yin	3529c061bb	[spec v2] Fix output repetition by speculative sampling error (#12561 )	2025-11-03 23:00:17 +08:00
harrisonlimh	ffb32a8548	Conditionally recapture cuda graph after model weight update from disk (#12060 )	2025-11-03 05:51:27 -08:00
Atream	14d8064803	fix: Fix KTransformers hybrid inference with int8 quantization and format (#12536 )	2025-11-03 04:59:39 -08:00
yinghui	de0b10cf5c	fix: move dummy format loader check before quantization checks (#12532 )	2025-11-02 23:41:30 -08:00
Baizhou Zhang	6e29446e45	[hotfix] Remove flashinfer-jit-cache from pyproject (#12530 )	2025-11-02 22:11:05 -08:00
Yineng Zhang	0c3543d7d5	chore: upgrade flashinfer 0.5.0 (#12523 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-11-02 20:54:12 -08:00
Haian Huang(深度眸)	65f1d065c5	[Bug] Fix Intern-S1 model accuracy and support /generate interface with input_ids (#12367 )	2025-11-02 20:22:33 -08:00
Johnsonms	9434a0e50f	[Refact] Remove hardcoded KV cache dimension in MLATokenToKVPool (#12502 )	2025-11-02 19:49:53 -08:00
Lianmin Zheng	20315697f4	move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521 )	2025-11-02 13:15:05 -08:00
fzyzcjy	c9db79117f	Super tiny fix naming in bench serving scripts (#12515 )	2025-11-02 12:43:10 -08:00
Hanming Lu	66fb9b1307	[ServerArgs] allow --mamba-ssm-dtype extend (#12481 )	2025-11-02 11:50:04 -08:00
Yuan Luo	819fc59123	Add prefix for torch symm mem (#12506 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-11-02 11:23:05 -08:00
kousakawang	7efd8b3d1f	[FEAT] Shared mem pool based cuda ipc for multi-modal data transport (#11917 ) Co-authored-by: kousakawang <wanghanpei@bytedance.com> Co-authored-by: Yuan Luo <4908075+yuan-luo@users.noreply.github.com>	2025-11-02 16:46:37 +08:00
Ho-Ren (Jack) Chuang	76196b3cbf	feat: Add FP4 (E2M1) KV Cache Support with Quantization Utilities for MLA (#10078 ) Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com> Co-authored-by: Yichen Wang <yichen.wang@bytedance.com>	2025-11-01 22:24:58 -07:00
Binyao Jiang	3451fc3280	[Feature] Qwen3-Next & FLA: Support MTP topk>1; Up to 6% faster (#11133 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-11-01 19:47:56 -07:00
Zhihao Lyu	c550ab9125	[Ascend] Add Ascend NPU support for sglang.check_env & rework proposal (#11052 ) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2025-11-01 19:26:45 -07:00
Xun Sun	0afd68321b	Update Mooncake EP's a2a interface (#12391 )	2025-11-01 18:48:47 -07:00
Johnsonms	6f858930c8	[Bug] test_flashattn_mla_backend errors in Hopper #12487 (#12488 )	2025-11-01 18:28:06 -07:00
hzh0425	6b634493c3	[HICache / PD]: Support offloading incremental KV cache in decode side. (#11966 )	2025-11-01 14:59:37 -07:00
Xinyuan Tong	d2a8f71c2f	[feat] Add SGLANG_TOOL_STRICT_LEVEL for tool-call behavior control (#12423 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-11-01 13:15:02 -07:00
Ke Bao	69193f7122	Filter tokenizer warning for kimi models (#12485 )	2025-11-01 16:27:31 +08:00
yinghui	d5b6e50fe8	perf: trtllm mla performance minor improvements (#12435 )	2025-10-31 22:48:02 -07:00
Liangsheng Yin	9632e48f5d	[hot fix] Remove `from python.sglang.xxx` (#12483 )	2025-11-01 11:00:05 +08:00
Qiaolin Yu	59cce5941a	Use sgl fp4 quant kernel by default (#12482 )	2025-10-31 19:51:28 -07:00
Surya-Gunukula	795e98f8a6	Forward unknown tool calls instead of dropping (#12226 )	2025-11-01 02:10:35 +00:00
Shangming Cai	358ae3563d	Tiny fix eos handling for PD disaggregation (#12334 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-31 17:57:10 -07:00
sglang-bot	41c10e67fc	chore: bump SGLang version to 0.5.4.post2 (#12439 )	2025-10-31 17:38:50 -07:00
Xinyuan Tong	0bfe1d145c	fa3 & trtllm_mha spec overlap (#11874 )	2025-10-31 17:38:13 -07:00
Ke Bao	a4bf5c6ad2	Support Kimi Linear (#12469 ) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-10-31 14:03:35 -07:00
fzyzcjy	30ad107028	Try to allow NCCL cumem for multi node nvlink case (#11987 )	2025-10-31 12:48:25 -07:00
Ke Bao	f7f9e41b36	Fix run benchmark (#12473 )	2025-11-01 02:39:48 +08:00
ishandhanani	263eab9f5d	fix: dummy health check server not accessible on non-zero rank nodes (#12297 )	2025-10-31 11:34:57 -07:00
fzyzcjy	25257d8e00	Tiny assert no running requests when releasing memory to avoid IMA (#12341 )	2025-11-01 01:28:53 +08:00
daniel, chen	cf0c24150a	add served model name in bench serving (#12428 )	2025-11-01 01:28:11 +08:00
huangtingwei	5538e05cb1	fix default env var for mooncake store (#12429 )	2025-11-01 01:25:33 +08:00
Yuan Luo	c30ebb9300	[VLM] Optimize async mm data process mechanism (#12066 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-11-01 01:24:53 +08:00
ykcombat	41efcaeb45	[Feature] PD-Multiplexing Context and Scheduler, lazy import spatial. (#12275 )	2025-11-01 00:40:01 +08:00
0xNullPath	70562969b9	[Bug] OOM (Out-of-Memory) errors for extreme testing scenarios (min_tokens=2) (#11757 ) Signed-off-by: Yan Lu <luyan@nvidia.com>	2025-11-01 00:28:41 +08:00
Ke Bao	0095e01874	Fix lint in deepseek-ocr (#12470 )	2025-11-01 00:08:19 +08:00
Xinyuan Tong	684864814b	Feat: deepseek-ocr logits processor (#12415 ) Co-authored-by: xinyuant <xinyuant@usc.edu>	2025-10-31 23:35:22 +08:00
sjtu_shenhai	410225b719	[Bug fix] Fix severe memory waste issue with torch.empty pin_memory (#12266 )	2025-10-31 21:30:37 +08:00
Liangsheng Yin	2c9aebea70	Simplify watchdog (#12463 )	2025-10-31 21:17:38 +08:00
Kindyaa	bc741073a3	fix:watchdog thread exception (#12328 )	2025-10-31 20:54:50 +08:00
Yuhong Guo	2f6af1a3de	Enable bailing_moe to support TP=16 (#12369 )	2025-10-31 19:32:49 +08:00

4323 Commits