sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
Mohammad Miadh Angkad	d8a5b1dbaf	[Bugfix] Work around FlashInfer unified transport issue on GB (#20039 )	2026-03-22 21:10:25 -07:00
Xiaoyu Zhang	a94d67d44b	[SKILL] fix(bench): Support model-specific DenoisingStage variants in… (#21137 )	2026-03-23 12:08:00 +08:00
fanghao	2b47bd3a34	[Bug Fix] Fix non-streaming request abort failure when --enable-metrics is enabled (#20625 )	2026-03-22 19:58:49 -07:00
yuumn	889e8489e9	[diffusion] model: support FireRed-Image-Edit (#20862 ) Co-authored-by: yuumn <1010797597@qq.com>	2026-03-23 10:27:07 +08:00
Cishoon	999bad5aba	Fix VRAM leak in overlap scheduling with structured output (#20640 ) (#20697 )	2026-03-22 17:07:39 -07:00
Yilong Zhao	343998865a	perf: pad max-num-requests in decode cuda graph for higher coverage (#20978 )	2026-03-22 17:06:16 -07:00
Ziang Li	ce0541404f	[FlashInfer v0.6.6][RL] Support fp8-last-n-bf16 RL for `flashinfer_trtllm_routed` moe backend (#20214 )	2026-03-22 11:17:01 -07:00
Xiaoyu Zhang	c1fe5de69c	[Diffusion] Clean up diffusion Triton kernels and modernize custom op registration (#21122 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-22 22:38:57 +08:00
Xiaoyu Zhang	766d225fcc	Add SGLang CUDA crash API logging inspired by FlashInfer (#20910 )	2026-03-22 16:39:40 +08:00
Shunkangz	bb737d7a82	Support Qwen3 MoE context parallel (#18233 ) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>	2026-03-22 01:27:20 -07:00
kpham-sgl	6d160b42bb	[Spec][Ngram] 1/N: Reference based Speculative Decoding refactor (#20393 )	2026-03-22 00:55:10 -07:00
Xiaoyu Zhang	1b65c0d259	[Diffusion] Fix torch.compile RMSNorm fallback for Z-Image (#20962 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-22 15:38:22 +08:00
Bowen Li	3bc595acbc	[FlashAttn] Add fused triton kernel for normal_decode_set_metadata (#20778 ) Co-authored-by: kinza99 <dh18324568312@163.com>	2026-03-22 15:12:29 +08:00
Mick	f7fc2c8592	[diffusion] fix: fix accuracy for some image models (#20679 )	2026-03-22 15:11:57 +08:00
shuwenn	2fba2bdad1	refactor: Remove dead code from utils/common.py (#20668 )	2026-03-21 21:54:17 -07:00
Lianmin Zheng	76e4a8662c	Replace clamp_position with JIT kernel + platform dispatch (#20999 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 21:26:26 -07:00
Changyi Yang	c1794e2944	[diffusion] fix: fix Sana corrupted output by removing spurious QK norm layers (#20656 ) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-22 12:06:49 +08:00
Yuhao Yang	c32e35a2a5	[diffusion] CI: fix picklingerror for diffusion models using diffusers backend (#20854 )	2026-03-22 11:51:03 +08:00
Mick	6dfa8a40bc	[diffusion] CI: make auxiliary coverage explicit and simplify testcases (#20983 )	2026-03-21 20:18:23 +08:00
KnightLTC	a0862f00c2	dbrx instruct npu support (#17121 ) Co-authored-by: McZyWu <zhuoyun.wu.23@ucl.ac.uk>	2026-03-21 17:10:35 +08:00
Alison Shao	852e112ebf	[Qwen3.5] Fix broken pipeline parallelism layer splitting (#21070 ) Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>	2026-03-21 01:02:51 -07:00
Lianmin Zheng	dba6fb3d30	Fix streaming logprobs corruption caused by shared mutable list reference (#21030 )	2026-03-21 00:18:48 -07:00
kk	3f0ba021fc	[AMD] Improve openai/gpt-oss performance (#21020 ) Co-authored-by: root <root@smci355-ccs-aus-m15-21.cs-aus.dcgpu> Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-03-20 23:16:47 -07:00
Baizhou Zhang	67cad3e69e	Revert "Support CuteDSL `mm_fp4` backend" (#21077 )	2026-03-20 22:47:47 -07:00
Xiaoyu Zhang	c076968c52	[CI] Remove obsolete AOT-only jit-kernel benchmarks after sgl-kernel 4.0 (#21075 )	2026-03-21 13:40:42 +08:00
Baizhou Zhang	5f3393c04c	Fix deepseek-v32-fp4 b200 ci (#21072 )	2026-03-20 22:28:40 -07:00
Alison Shao	048d90e165	Revert "[AMD] Add MoE weights and scales padding" (#21067 )	2026-03-20 20:26:17 -07:00
shuwenn	6c91590e1b	[HiCache] refactor: hicache normalization flow and compatibility checks (#19669 )	2026-03-20 18:38:44 -07:00
mqhc2020	9419453713	[AMD] Add MoE weights and scales padding (#18684 )	2026-03-20 14:55:09 -07:00
YC Yen-Ching Tseng	f97c09dac1	[AMD] Enable aiter unified attention for non-SWA models (Qwen3-VL) (#20897 ) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-20 12:07:41 -07:00
fzyzcjy	146700db68	Add e2e demo test in dump comparator (#21031 )	2026-03-20 22:41:01 +08:00
fzyzcjy	6703cc4484	Enhance output formatting in dump comparator (#21029 )	2026-03-20 22:04:50 +08:00
fzyzcjy	fdbcb8156e	Refactor dp_utils to use ParallelAxis enum in dump comparator (#21028 )	2026-03-20 22:04:20 +08:00
fzyzcjy	154395ab7d	Support s≡t dimension name equivalence in dump comparator (#21027 )	2026-03-20 22:03:34 +08:00
fzyzcjy	cc22601d28	Validate replicated axes orthogonality in dump comparator (#21026 )	2026-03-20 22:02:40 +08:00
fzyzcjy	2f01950a0e	Support jointly-determined axes inference in dump comparator (#21025 )	2026-03-20 22:01:26 +08:00
fzyzcjy	ecd7e40d20	Support dependent axis auto-resolution in dump comparator (#21024 )	2026-03-20 21:56:39 +08:00
Lianmin Zheng	104b10f70a	refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009 )	2026-03-20 05:55:36 -07:00
Артем Савкин	9fbe6800aa	[NPU] [Diffusion] Update CI performance baseline for Wan2.2-T2V-A14B-Diffusers-w8a8 (#20997 )	2026-03-20 15:54:12 +03:00
xingsy97	f41832795e	Add compile-time 256-bit vector guard for pre-Blackwell (#19794 )	2026-03-20 18:25:12 +08:00
DarkSharpness	2dd9196079	[JIT Kernel][Feature] Support JIT custom all reduce (rewrite as v2) (#19880 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-20 18:24:07 +08:00
Muqi Li	2099943a49	Fix scale_step_k computation in the fp8_kernel (#20819 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-20 18:09:31 +08:00
Jia Guo	ec01ef9092	Fix torch.compile/dynamo crash with Qwen3 QK-norm in piecewise CUDA g… (#19818 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 18:05:09 +08:00
Prozac614	fa89d152c0	[diffusion] CI: fix hunyuan3d JIT cache (#20773 ) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-03-20 17:51:55 +08:00
Lianmin Zheng	a0a4dae67f	Revert "Fix DeepSeek V32 FP4 test" (#21003 )	2026-03-20 02:19:28 -07:00
Lianmin Zheng	112b628227	Replace _resolve_future_token_ids with JIT kernel + platform dispatch (#20976 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 01:47:03 -07:00
Baizhou Zhang	c82d20d48e	Fix DeepSeek V32 FP4 test (#20984 )	2026-03-20 01:04:32 -07:00
Yilong Zhao	26f709e97d	misc: make prefill-delayer compatible with multiple types of mem pool (#20979 )	2026-03-20 00:05:53 -07:00
Yilong Zhao	95327458ee	misc: add BatchTokenizerReq hook into dp controller (#20981 )	2026-03-19 23:59:53 -07:00
Lianmin Zheng	712a48c5d2	ci: move metrics scripts under scripts/ci/utils (#20986 )	2026-03-19 23:47:57 -07:00

7172 Commits