sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 12:17:09 +00:00

Author	SHA1	Message	Date
Binyao Jiang	cf0478d602	[Glm46v] Bug fix for accuracy drop and unable to launch server (#14585 ) Co-authored-by: yhyang201 <yhyang201@gmail.com> Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>	2025-12-07 23:45:02 -08:00
Tiwei Bie	36361adcbf	[DLLM] Add initial cuda graph support (#14203 )	2025-12-08 14:12:35 +08:00
Qiaolin Yu	661e9775d0	[2/2] Add rope kernel in sgl-kernel (#14452 )	2025-12-07 21:37:29 -08:00
Nicholas	f57d4fe78e	[feat] use cachebuffer to store mm feature to speedup hash (#14386 )	2025-12-08 10:35:20 +08:00
wentx	b7b7524e95	[Tool Call] Fix DeepSeekV32Detector skipping functions with no params in streaming mode (#14573 )	2025-12-07 18:32:10 -08:00
Xiaoyu Zhang	03b835e7d1	Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-FP8 (#14141 )	2025-12-08 09:24:58 +08:00
b8zhong	3b47973af8	[CI] Tiny speed up VLM CI (#14517 ) Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>	2025-12-07 13:30:41 -08:00
Hudson Xing	84efe54bc4	Fix FP8 KV Triton type issue and add regression test (#14553 )	2025-12-07 10:51:46 -08:00
khalilzhk	948b6acee8	[BugFix] fix prefixcache performance and accuracy on ascend (#13573 )	2025-12-08 02:16:20 +08:00
Vladimir Serov	f124539a01	[NPU]LoRA: Adding Torch Native backend (#14132 )	2025-12-08 02:16:07 +08:00
AichenF	c8683ae305	[diffusion] cli: profiling utilities support (#14185 ) Co-authored-by: jianyingzhu <53300651@qq.com> Co-authored-by: Jianying <53503712+jianyingzhu@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-08 00:59:45 +08:00
Liangsheng Yin	125e17efd5	Add small model test for spec v2 + dp + trtllm_mla (#14576 )	2025-12-07 23:55:00 +08:00
Liangsheng Yin	88c459c6c8	Tiny remove wrong import from `python.sglang` (#14577 )	2025-12-07 22:07:49 +08:00
Xiaoyu Zhang	ae6a6630e4	Add Expert Parallelism (EP) support for kimi-k2-thinking (#13725 )	2025-12-07 20:28:57 +08:00
Yuan Luo	26d95008b6	[apply][2/2] Fused qk_norm_rope for Qwen3-MoE (#13998 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-12-07 20:25:18 +08:00
Tiwei Bie	9abcab3ffa	[DLLM] feat: Add threshold based parallel decoding support (#14412 ) Co-authored-by: Jinwei Yao <jinweiy@illinois.edu> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2025-12-07 18:25:33 +08:00
Alison Shao	41d61faa99	[FLA] Add explicit kernel arguments to kda.py for Kimi Linear support (#14561 )	2025-12-06 22:45:41 -08:00
Chen1022	3c7886ec4c	Fix attention backend logic for Qwen3-Next on SM100 (#14560 )	2025-12-06 22:03:34 -08:00
b8zhong	6d5d76ad97	remove unecessary dual stream token threshold from the rest of models (qwen moe, kimi linear, etc.) (#14337 )	2025-12-06 19:57:26 -08:00
Rain H	32a32cf7d2	Enhance prefill PP node robustness (#14494 )	2025-12-06 18:00:54 -08:00
Minglei Zhu	be4a3ec376	support piecewise cuda graph for Olmo models (#14476 )	2025-12-06 17:57:44 -08:00
almaslof	ff6e3ea934	[docs] Add missing word in argument description (#14205 )	2025-12-06 17:56:54 -08:00
sglang-bot	d2b42477c7	chore: bump sgl-kernel version to 0.3.18.post3 (#14518 )	2025-12-06 13:15:16 -08:00
Baizhou Zhang	9dfa01a435	[Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (#14538 )	2025-12-06 12:29:16 -08:00
Hanming Lu	e592ee6545	[Qwen3-next] remove heuristics and add radix cache kl test (#14520 )	2025-12-06 12:11:40 -08:00
Baizhou Zhang	bc388471d2	[1/n] Fix hanging during DeepGemm Warmup (#14493 )	2025-12-06 10:44:02 -08:00
gongwei-130	3e40c63674	fix "GrammarMatcher has terminated after accepting the stop token, but is trying to find the next token mask" when both reasoning and spec are enabled (#14464 )	2025-12-06 06:15:22 -08:00
WenhaoZhang	80122e4f4c	[diffusion] lora: fix LoRA dtype handling and weight attribute access for z-image model (#14543 ) Co-authored-by: niehen6174 <nihen6174@gmail.com>	2025-12-06 22:14:44 +08:00
Xiaoyu Zhang	6d41791823	[diffusion] perf: add QKV fusion optimization for Flux models (#14505 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-06 20:44:16 +08:00
Mick	35a9a07370	[diffusion] refactor: simplify sampling params' override logic (#14539 )	2025-12-06 20:23:49 +08:00
Rain Jiang	ea177372bd	support mtp with deepseek r1 nvfp4 model (#13115 ) Co-authored-by: Trevor Morris <tmorris@nvidia.com>	2025-12-06 00:45:54 -08:00
Baizhou Zhang	42fcf5438f	Revert "tiny remove deprecated endpoint call" (#14533 )	2025-12-05 23:48:54 -08:00
Mick	d881f31488	[diffusion] chore: temporarily upgrade diffusers to make Z-image compatible with Cache-DiT (#14530 )	2025-12-06 12:39:37 +08:00
Vincent Zhong	2ac5b98395	fix: fix rmsnorm -> layernorm in qwen3 omni (#11791 ) Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>	2025-12-06 12:12:57 +08:00
Alison Shao	b988c18eae	Fix safetensors validation to catch corruption after download (#14465 )	2025-12-05 16:04:00 -08:00
fzyzcjy	3d1b591aa1	Tiny use trtllm_mha as default when possible (#14291 )	2025-12-05 14:26:03 -08:00
b8zhong	ec7b2c16d9	tiny remove deprecated endpoint call (#13607 )	2025-12-05 09:54:49 -08:00
Hudson Xing	38daa29466	Add fused FP8 KV cache write kernel for TRTLLM MHA backend (#14093 ) Co-authored-by: Qiaolin Yu <liin1211@outlook.com>	2025-12-06 00:53:55 +08:00
blahblah	66984a8b3d	[diffusion] feat: support cache-dit integration (#14234 ) Co-authored-by: shuxiguo <shuxiguo@meituan.com> Co-authored-by: DefTruth <qiustudent_r@163.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-06 00:52:22 +08:00
roikoren755	889b46ea50	[Spec] Mamba2 support in target models (#13434 )	2025-12-06 00:50:46 +08:00
Mick	a89045603b	[diffusion] chore: set allowing overriding protected fields of sampling params as default behavior (#14471 )	2025-12-06 00:22:42 +08:00
Alison Shao	662809874c	Add Mistral Large 3 to nightly CI tests (#14459 )	2025-12-05 23:16:27 +08:00
elvischenv	205f041e96	Add Mistral Large 3 Eagle Support (#14466 ) Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-12-05 23:11:41 +08:00
Simo Lin	7235a7fbe9	[misc] add model arch and type to server info and use it for harmony (#14456 )	2025-12-05 06:51:00 -08:00
Yuxuan Zhang	8fce9e7b2a	support GLM-V vision model dp (#14097 )	2025-12-05 21:03:54 +08:00
Xiaoyu Zhang	5347732219	[diffusion] fix: Fix profiler trace missing Python stack in diffusion pipeline (#14499 )	2025-12-05 12:12:35 +00:00
roikoren755	2ce121a1c3	Enable RadixCache for Mamba2 models (#13584 )	2025-12-05 18:23:58 +08:00
WenhaoZhang	35ba6fe19e	[diffusion] fix: fix CLIP text encoder attention mask not used (#14364 ) Co-authored-by: niehen6174 <niehen.6174@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-05 16:30:10 +08:00
GMI Xiao Jin	7c744d137d	[diffusion] cli: add argument --adjust-frames and --override-protected-fields (#13996 ) Co-authored-by: dev <devnull@example.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-12-05 15:32:06 +08:00
zyksir	46b05ef58f	[diffusion] fix: fix bug about pin memory when offloading (#14472 )	2025-12-05 15:26:30 +08:00

4977 Commits