sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
Xinyuan Tong	a02cff7f2b	[Fix] Patch is_flash_attn_2_available for flash-attn-4 in VLM input format test (#20946)	2026-03-19 13:00:51 -07:00
AlfredYong	c562e0d13b	[feat] Enhance Kimi-K2/K2.5 function call and reasoning detection (#19552) Co-authored-by: alfredyyang <alfredyyang@tencent.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-03-19 12:57:57 -07:00
Mohammad Miadh Angkad	29ced9c162	[UX] Suppress noisy `httpx`/`httpcore` INFO logs (#20944)	2026-03-19 10:58:41 -07:00
Xinyu Zhang	319bb4974c	[Fix] RayEngine multi-node: co-locate rank0 scheduler with Engine and fix CUDA device setting (#20722)	2026-03-19 10:27:16 -07:00
Cao E	274581fb77	Add support for more batch sizes in cpu_graph_runner (#13881)	2026-03-19 09:50:56 -07:00
kk	c8f0122acf	Fix gpu-fault issue when run deepseek-r1 and enable dp (#20841) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-19 02:36:12 -07:00
khalilzhk	574572b21b	[BugFix] bug fix for DeepSeek eagle3 in Attn-DP mode (#20492)	2026-03-19 14:48:46 +08:00
Shangming Cai	fd05532da1	Add logging for BootstrapServer for CI diagnosis (#20844) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-19 14:42:12 +08:00
blzheng	a98b456c70	[CPU] Add frontend support for Gemma (#12590)	2026-03-18 23:02:26 -07:00
jianan-gu	8d4fcf2f7b	[CPU] Fix MoE layer support for DeepSeek-OCR models (#12555)	2026-03-18 22:57:55 -07:00
Matti Varjokallio	85fe8c6793	[AMD] Use aiter_dsv3_router_gemm kernel if number of experts <= 256. (#18451)	2026-03-18 22:40:48 -07:00
kk	126cd5cfae	gpt-oss decode performance optimization (#20392) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-18 22:30:03 -07:00
blzheng	cd22aa27a9	[CPU] Add FP8 Bmm support (#9744) Co-authored-by: Fan Yin <1106310035@qq.com>	2026-03-18 22:19:48 -07:00
Zaili Wang	2f4babe32b	[CPU] support LayerNorm with 3D shape (#15075) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-18 22:15:24 -07:00
blzheng	dc6aa26ce9	[CPU] Add mrope kernel for Qwen3-vl (#12531) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-18 22:12:48 -07:00
Juan Muneton	4052b53227	fix scheduler for non-cuda devices and disable piecewise cuda graph f… (#19992) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2026-03-18 21:54:19 -07:00
Ling Zhang	f85455ab24	[Bugfix] fix qwen3vl hang when --mm-enable-dp-encoder is enable (#20759)	2026-03-18 21:51:39 -07:00
Ethan (Yusheng) Su	7f6f1a3ab1	[LoRA][II] Add fused MOE LoRA Triton kernel and tests (#19711)	2026-03-18 19:58:14 -07:00
R0CKSTAR	7553b7dcb0	chore: extract diffusion_common in python/pyproject_other.toml (#20803) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2026-03-19 10:39:16 +08:00
Qiaolin Yu	eea9e19c13	fix lint introduced in #20708 (#20886)	2026-03-18 15:38:52 -07:00
Chang Su	0d23a461a0	feat(mm)(grpc): compute M-RoPE positions for preprocessed VL inputs (#19973) Signed-off-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Chang Su <chang.s.su@oracle.com>	2026-03-18 15:34:50 -07:00
Liangsheng Yin	8b9482e665	fix(dp-attn): consistent overlap disable decision across DP ranks (#20853)	2026-03-18 15:16:39 -07:00
maocheng23	4e8829e4cd	Replace topk_ids with curr_topk_ids in fused_moe.py (#20302)	2026-03-18 21:57:05 +00:00
Chad Voegele	a3196d08b8	[MiniMax M2] Fix KV cache scale loading (#20870) Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 14:54:43 -07:00
Xinyuan Tong	6b8a6545b2	Add Mistral Small 4 (Pixtral) support (#20708) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Alex Nails <alexnails@radixark.ai> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: dbari <dbari@users.noreply.github.com>	2026-03-18 14:15:32 -07:00
Trevor Morris	df1d046de2	Add packed_modules_mapping for MiniMax-M2 (#19995)	2026-03-18 14:10:01 -07:00
Xinyuan Tong	d1e95af282	Upgrade transformers==5.3.0 (#17784) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> Co-authored-by: Alison Shao <alisonshao@mac.lan> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-18 13:50:43 -07:00
Bruce Wu	e5750a572c	Support TP for lora lm_head layer (#18511) Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>	2026-03-18 13:48:03 -07:00
ishandhanani	8f0f36c64b	[1/2] Add ModelExpress coordination for remote instance weight loading - matching TP (#19920) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ishan Dhanani <ishan@dhanani.dev>	2026-03-18 13:38:32 -07:00
Yaochen Han	c7a71740a5	[NPU][diffusion] npu support enable_torch_compile for torchair backend on diffusion models (#20687)	2026-03-18 22:40:35 +03:00
Vladislav Nosivskoy	b9dba851a0	Fix streaming token ids data loss under load (#19977) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2026-03-18 12:23:45 -07:00
Gabriel Wu	70876ae93b	fix: guard configure_deep_gemm_num_sms when JIT disabled (#20868) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 11:15:20 -07:00
Jackie	a6c7bb54eb	[Perf]Optimize waiting queue update with set usage (#20503)	2026-03-18 09:56:24 -07:00
jianan-gu	21c4fc6334	[DP encoder] Fix `pos_emb` layer TP issue when DP encoder enabled for Qwen3 VL (#20788)	2026-03-18 17:14:47 +08:00
Thomas Wang	c0a4408f78	[AMD] Fix dpsk-v32 accuracy issue on mi355 (#20840)	2026-03-18 02:06:15 -07:00
billishyahao	f0d7a3f427	[AMD][TBO] Fix mori ep dual stream accuracy (#19888)	2026-03-18 02:00:55 -07:00
Shangming Cai	8b46f1f4ec	[PD] Add retry interval in ensure_prefill_info (#20832) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-18 16:02:20 +08:00
Chuan (Richard) Li	93422f27d6	[AMD][AITER] Guard _use_mla_ps_kernel with self.use_mla in draft_extend_v2 paths (#20409)	2026-03-18 00:45:22 -07:00
R0CKSTAR	ead9d7aa43	[diffusion] fix: fix vae model offload on mps(#20607) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-18 15:44:59 +08:00
chenxu214	532470bcca	[NPU] add new fusion operator DispatchFFNCombine (#20245)	2026-03-18 15:22:04 +08:00
jinke	ae15fca192	[Bugfix] fix hicache mooncake backend extra config loading (#16808) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: jinke15 <jinke15@jd.com>	2026-03-18 15:07:39 +08:00
xingsy97	d20e9a20fa	[JIT] Inject target architecture flag into JIT compilation (#20103)	2026-03-17 23:16:49 -07:00
xingsy97	f78d5c3b3c	[JIT Kernel] Add hadamard kernel test and benchmark (#20030)	2026-03-17 23:16:35 -07:00
Артем Савкин	c64681f162	[Bugfix] [diffusion] Fix cache-dit with sp-degree only (#19965) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-03-18 14:05:12 +08:00
Kangyan-Zhou	b6055e59cd	[HiCache] Reduce per-request backup log noise (#20813) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 22:47:14 -07:00
Viacheslav	30a35ecd90	Add gigachat3.1 parser (#19886) Signed-off-by: Viacheslav Barinov <vvadbarinov@sberbank.ru> Signed-off-by: Viacheslav Bv <viacheslav.teh@gmail.com> Co-authored-by: Viacheslav Barinov <vvadbarinov@sberbank.ru>	2026-03-17 22:45:01 -07:00
Evgueni Petrov	2e860233ca	rocm: fix oom when loading fp8 weights close to size of available vram (#19941)	2026-03-17 22:44:19 -07:00
shiyu7	0acc1d3c9a	fix: change qwen 3.5 linear attention a_log to fp32 (#19961) Co-authored-by: sunqi.7 <sunqi.7@bytedance.com>	2026-03-17 22:42:06 -07:00
Brayden Zhong	88c40ec16d	Use Flashinfer for target_verify in GDN model for SM120 (#20604)	2026-03-17 22:40:56 -07:00
Brayden Zhong	97d5386a21	Use TRTLLM allreduce fusion for Qwen 3.5 (#19889)	2026-03-17 22:40:22 -07:00

7109 Commits