sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

Author	SHA1	Message	Date
Lianmin Zheng	44e67c6835	Remove deprecated double sparsity feature (#23009)	2026-04-17 13:33:12 -07:00
andyluo7	9df6107dca	[AMD] Enable DFLASH speculative decoding on ROCm (#22342) Signed-off-by: Andy Luo <andyluo7@users.noreply.github.com> Co-authored-by: Andy Luo <andyluo7@users.noreply.github.com>	2026-04-17 13:10:14 -07:00
shuwenn	90c76d665e	[HiCache] fix: HiCacheFile component key suffixing (#22891) Co-authored-by: Zhangheng <hzh0425@apache.org>	2026-04-17 13:06:28 -07:00
YC Yen-Ching Tseng	5d4e899477	[AMD] Fix AMD Multimodal Test - skip nvfp4 tests (#23045)	2026-04-17 09:02:39 -07:00
Jincong Chen	2bac219d0c	[Perf] Precompute gemma_weight to avoid redundant add on every forward (#22673)	2026-04-17 23:37:41 +08:00
Xiaoyu Zhang	83c5119d01	[diffusion] CI: fix ModelOpt B200 CI artifact coverage (#22955)	2026-04-17 23:33:42 +08:00
Mick	5de89ea942	[diffusion] CI: fix auto-partition (#23076)	2026-04-17 22:37:24 +08:00
Opher Lieber	6e3bbef568	expose num_embeddings in VocabParallelEmbeddingWithLoRA (#22547)	2026-04-17 02:35:13 -07:00
Jonah Bernard	0d031335ed	[Pipeline Parallelism][Bug] Fix scheduler hang in pipeline parallelism setup (#23006)	2026-04-17 14:50:47 +08:00
Duyi-Wang	8c190f6b91	[AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952)	2026-04-16 23:40:15 -07:00
RichardoMu	7390eddf28	feat(observability): add OpenTelemetry tracing for speculative decoding (#19545) Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>	2026-04-17 14:01:58 +08:00
narutolhy	5fa0c6a52e	Allow piecewise CUDA graph with speculative decoding (#22128) Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 13:39:30 +08:00
Xiaoyu Zhang	91679d935d	[codex] Update diffusion skills (#23028)	2026-04-17 13:29:26 +08:00
blzheng	0dcfae5553	[CPU] Add gemma4_rmsnorm_cpu kernel (#22842) Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-17 13:03:16 +08:00
YC Yen-Ching Tseng	f0f0148167	Revert "feat: Support MXFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#19143)" (#23031)	2026-04-16 21:53:25 -07:00
Zhangheng	7d47f40a96	[UnifiedRadixTree]: Add HiCache hook interface for TreeComponent (#22924)	2026-04-17 12:09:41 +08:00
Byron Hsu	cf9845f8e3	[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>	2026-04-17 11:10:31 +08:00
Jan Bernlöhr	04a53955b9	feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)	2026-04-16 20:08:19 -07:00
Yuhao Yang	a77abbe005	[VLM] Reduce GPU memory footprint of CUDA IPC MM feature transport (#22662)	2026-04-17 10:38:36 +08:00
Yuxuan Zhang	16d11c2a10	Fix for the low-probability garbled output issue in the GLM-5 series models. (#22811)	2026-04-17 09:52:13 +08:00
Makcum888e	e353630b57	[Diffusion] [NPU] Fix multimodal gen CI (#22879)	2026-04-17 04:09:44 +03:00
Egor Filimonov	ba850d3a9d	[Bugfix] [NPU] Fix check_env on Ascend for CANN 8.5 (#22888)	2026-04-17 04:05:20 +03:00
Mick	3d2d57c6cc	[diffusion] refactor: extract LTX2 image encoding from denoising stage (#22976)	2026-04-17 08:35:15 +08:00
Daifeng Li	2cc52d8326	feat: Support MXFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#19143)	2026-04-16 16:51:32 -07:00
pdasgup	f639425ff0	add check for none status code in FinishAbort (#22535) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-04-16 16:21:07 -07:00
Tarushii Goel	2211b4d9c6	[sgl] improve accuracy of additional page requirement during spec decode (#22406)	2026-04-16 15:50:51 -07:00
Liangsheng Yin	db7a751d48	refactor: extract FanOutCommunicator and use declarative spec table (#22967)	2026-04-16 15:37:19 -07:00
mqhc2020	52f0b86f5d	[AMD] Qwen3.5 MXFP4 breaks after shared expert fusion is enabled (#22948) Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>	2026-04-16 15:25:33 -07:00
Liangsheng Yin	c83ef4fdb6	use envs in server_args (#22994)	2026-04-16 15:01:33 -07:00
Xinyu Zhang	c0172aef6e	[Ray] Bind scheduler actors to GPU-local NUMA node (#22989) Co-authored-by: xyuzh <xyuzh@users.noreply.github.com>	2026-04-16 14:52:15 -07:00
Xinyu Zhang	d430034bde	[Ray] Support multi-replica serving by making scheduler actor names unique (#22917)	2026-04-16 14:51:01 -07:00
Qiaolin Yu	a87806a65f	[misc] refine outdated comments for chain-style multi-layer MTP (#22996)	2026-04-16 14:49:43 -07:00
ybyang	41258f874d	[PD]feat(bench): add --fake-prefill flag for decode-only stress testing (#22973)	2026-04-16 13:57:55 -07:00
Yuhao Yang	9da998a882	[diffusion] feat: disaggregated diffusion (#21701)	2026-04-16 23:51:32 +08:00
Liangsheng Yin	62309f09db	fix(loads): preserve include filtering after watching mode switch (#22959)	2026-04-16 03:04:53 -07:00
ybyang	03fef357a6	fix(loads): switch get_loads_communicator to watching mode (#22919)	2026-04-16 02:12:22 -07:00
ybyang	fbd6dc3565	fix: normalize tool message content for GLM5.1 chat template (#22595)	2026-04-16 16:48:38 +08:00
Aleksi Vesanto	aaa682346e	[diffusion] model: Properly validate device for Mistral 3 attention (#22690)	2026-04-16 00:29:23 -07:00
Lianmin Zheng	35da90cb76	[misc] Configure logging before ServerArgs.__post_init__ (#22926)	2026-04-15 23:53:15 -07:00
yuefeng Wu	65bc839a5f	[Fix] eagle/eagle3 speculative decoding conflicts with xgrammar in NPU (#20989) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-15 23:34:23 -07:00
Bi Xue	c43716a357	[sgl] provide an option to send control req to all dp ranks rank0 (#22758)	2026-04-16 14:24:26 +08:00
Byron Hsu	3600465e81	[Bug Fix] Remove follow_bootstrap_room fast path in PD disaggregation DP rank resolution (#22901)	2026-04-15 22:53:29 -07:00
LHXuuu	e7ad7c587a	[EPD][VLM] Support Kimi VL EPD (#22490) Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>	2026-04-16 12:40:02 +08:00
CYYYC0310	58c6b871b2	Remove compatibility restriction between Pipeline Parallelism and Mixed Chunked Prefill (#22920) Co-authored-by: cyy <cy02433585@alibaba-inc.com>	2026-04-16 11:25:31 +08:00
Xinyuan Tong	34fef07a15	Upgrade transformers to 5.5.3 and refactor hf_transformers_utils into subpackage (#21569)	2026-04-15 20:03:44 -07:00
JINZ	14e122cdee	[BugFix][RadixTree]:Fix stale eviction assertion in HiMambaRadixCache host eviction path (#22592) Co-authored-by: Zhangheng <hzh0425@apache.org>	2026-04-16 10:49:30 +08:00
Yuhao Yang	b8794baa6d	[Step3p5] Optimize allreduce in MoE layers (#22773)	2026-04-16 09:33:12 +08:00
Liangsheng Yin	a4cf2ea128	streaming session: spec v2 bonus accounting + comprehensive test matrix (#22651)	2026-04-15 17:12:41 -07:00
Xinyu Zhang	e8c6e5466c	[Ray] Auto-create placement group in RayEngine when none is detected (#22898)	2026-04-15 15:17:52 -07:00
Qiaolin Yu	0b1b07db72	[misc] fix ray folder lint (#22905)	2026-04-15 15:08:18 -07:00

7855 Commits