sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 12:17:09 +00:00

Author	SHA1	Message	Date
Zhang Yiyang (SII)	a3ed2e4d29	[diffusion][CI] Add CI for MOVA model inference (#20430 ) Co-authored-by: Luo <139519292+0-693@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-24 21:28:16 +03:00
YC Yen-Ching Tseng	71f5ae3f9a	[AMD] Fix AMD Nightly Test - Transformers 5.3.0 incompatibility and gemma2-27b kv issue (#21193 ) Co-authored-by: bingxche <Bingxu.Chen@amd.com>	2026-03-24 10:41:44 -07:00
Elizaveta Martirosian	9f4d8ac99f	[Diffusion][NPU] Add support for Hunyuan3D (#20352 ) Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>	2026-03-24 16:18:49 +03:00
shadowxz109	1b4933d45d	[NPU][ModelSlim] adapt w2 quant layer for Minimax2.5 (#20905 )	2026-03-24 20:57:18 +08:00
Aleksi Vesanto	eefb504f84	[diffusion] model: Fix FLUX.1 output correctness (#21041 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-24 15:17:33 +03:00
Mohammad Miadh Angkad	4fbb311234	[Fix][Eval] Keep `--dataset-path` scoped to `longbench_v2` (#21156 )	2026-03-24 02:25:11 -07:00
Thomas Wang	855d15adf6	[AMD] Tilelang sparse fwd for dsv32 mi355/mi300 (#19945 )	2026-03-24 02:01:39 -07:00
Shunkangz	dac148167c	Enable the qwen3 test (#21195 ) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2026-03-23 23:40:59 -07:00
Xiaoyu Zhang	69f02e36e8	[diffusion] Fix torch.zeros typo in causal wan (#21250 )	2026-03-24 14:39:16 +08:00
Xiaoyu Zhang	d9f97b2115	Refine diffusion skills and align JIT kernel docs with the new CI flow (#21283 )	2026-03-24 14:38:36 +08:00
Cheng Wan	c01ee848b0	Revert "fix: use consistent time denominator for throughput metrics in bench_one_batch_server" (#21276 )	2026-03-23 22:14:54 -07:00
Baidu-AIAK	6491728797	[Perf] Overlap NSA-CP key all-gather with query computation for DeepSeek-V3.2 (#20438 ) Co-authored-by: Shurui Jia <18817781975@163.com> Co-authored-by: Baidu-AIAK <baiduaiak~123>	2026-03-23 21:31:48 -07:00
Lianmin Zheng	260abe1fb1	Refactor JIT kernel CI to use run_suite.py registration system (#21239 )	2026-03-23 21:17:27 -07:00
hzh0425	0986bed8e2	[HiCache][HybridModel]: Support mamba state offloading & HybridCacheController (#20457 ) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>	2026-03-23 20:02:50 -07:00
Ratish P	2b1d3c935e	[diffusion] fix Z-Image SP sharding for portrait and padded resolutions (#21042 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-24 10:15:33 +08:00
Yuxuan Zhang	fcaad42b00	[Bug Fix] GLM-V / GLM-OCR: field detection for transformers 5.x and MTP omission fix (#21134 )	2026-03-23 13:19:48 -07:00
Baizhou Zhang	ed316a26ef	Fix CP in-seq-split method for DeepSeek V32 and update related tests (#21192 )	2026-03-23 12:34:10 -07:00
Lianmin Zheng	27ac831a84	docs: improve CI and testing documentation (#21202 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 10:48:50 -07:00
jacky.cheng	b4d3fb001d	[AMD] Add fused GemmaRMSNorm forward_hip to use aiter/vllm kernels for qwen3.5 (#21188 )	2026-03-23 10:21:36 -07:00
Johnsonms	777edb6ef7	Fix(jit): support rmsnorm for hidden_size in {64, 128, 256} (#20661 )	2026-03-23 23:17:44 +08:00
Yuan Luo	5bdc07d974	[Qwen3.5] Fuse split/reshape/cat ops in GDN projection with Triton kernel (#21019 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-23 23:17:01 +08:00
McZyWu	8662ba7db4	[NPU] bugfix for import sgl-kernel error (#21200 )	2026-03-23 19:52:36 +08:00
strgrb	80d4a0753a	fix fused_set_kv_buffer for rope with Ling-v2 (#20316 )	2026-03-23 19:20:40 +08:00
McZyWu	4641e5a3d2	[NPU] enhance accuracy for model minimaxm2 from 16.5% to 95.5% (#17695 )	2026-03-23 19:06:38 +08:00
XDaoHong	2d288ba8c9	[Bugfix] fix npu get kv_item_lens in PD separation when use ASCEND_US… (#15852 ) Co-authored-by: ZhengdQin <zhengdqin@gmail.com>	2026-03-23 15:56:47 +08:00
kpham-sgl	59cb9a9da6	[Spec][Ngram] 3/N: Fix synchronization issues in `Ngram.cpp` (#21186 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 00:37:07 -07:00
Lianmin Zheng	814202704b	ci: unify PR test suite naming (#21187 )	2026-03-23 00:18:45 -07:00
yudian0504	3d312643b9	[BUGFIX] Fix CP residual size mismatch crash when tp_size == attn_cp_size (#21170 )	2026-03-23 00:12:58 -07:00
Lianmin Zheng	7757a9ddd0	ci: remove IS_BLACKWELL env var; auto-detect Blackwell (#21118 )	2026-03-22 23:44:48 -07:00
kpham-sgl	bc4aaab6a1	[Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 23:35:25 -07:00
Cheng Wan	d6b12c401c	Revert "[bugfix] Fix PPMissingLayer AttributeError when Using PP" (#21189 )	2026-03-22 23:28:36 -07:00
Zhiqiang Xie	13f4f010d8	HiSparse for Sparse Attention (#20343 )	2026-03-22 23:09:31 -07:00
Lianmin Zheng	7050011dee	Enable JIT clamp_position and resolve_future_token_ids on ROCm (#21116 )	2026-03-22 22:33:54 -07:00
Yuhao Yang	32a85ef128	[diffusion] CI: auto-skip diffusion tests when required pipeline class is missing from diffusers (#21139 )	2026-03-23 12:15:21 +08:00
Mohammad Miadh Angkad	d8a5b1dbaf	[Bugfix] Work around FlashInfer unified transport issue on GB (#20039 )	2026-03-22 21:10:25 -07:00
Xiaoyu Zhang	a94d67d44b	[SKILL] fix(bench): Support model-specific DenoisingStage variants in… (#21137 )	2026-03-23 12:08:00 +08:00
fanghao	2b47bd3a34	[Bug Fix] Fix non-streaming request abort failure when --enable-metrics is enabled (#20625 )	2026-03-22 19:58:49 -07:00
yuumn	889e8489e9	[diffusion] model: support FireRed-Image-Edit (#20862 ) Co-authored-by: yuumn <1010797597@qq.com>	2026-03-23 10:27:07 +08:00
Cishoon	999bad5aba	Fix VRAM leak in overlap scheduling with structured output (#20640 ) (#20697 )	2026-03-22 17:07:39 -07:00
Yilong Zhao	343998865a	perf: pad max-num-requests in decode cuda graph for higher coverage (#20978 )	2026-03-22 17:06:16 -07:00
Ziang Li	ce0541404f	[FlashInfer v0.6.6][RL] Support fp8-last-n-bf16 RL for `flashinfer_trtllm_routed` moe backend (#20214 )	2026-03-22 11:17:01 -07:00
Xiaoyu Zhang	c1fe5de69c	[Diffusion] Clean up diffusion Triton kernels and modernize custom op registration (#21122 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-22 22:38:57 +08:00
Xiaoyu Zhang	766d225fcc	Add SGLang CUDA crash API logging inspired by FlashInfer (#20910 )	2026-03-22 16:39:40 +08:00
Shunkangz	bb737d7a82	Support Qwen3 MoE context parallel (#18233 ) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>	2026-03-22 01:27:20 -07:00
kpham-sgl	6d160b42bb	[Spec][Ngram] 1/N: Reference based Speculative Decoding refactor (#20393 )	2026-03-22 00:55:10 -07:00
Xiaoyu Zhang	1b65c0d259	[Diffusion] Fix torch.compile RMSNorm fallback for Z-Image (#20962 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-22 15:38:22 +08:00
Bowen Li	3bc595acbc	[FlashAttn] Add fused triton kernel for normal_decode_set_metadata (#20778 ) Co-authored-by: kinza99 <dh18324568312@163.com>	2026-03-22 15:12:29 +08:00
Mick	f7fc2c8592	[diffusion] fix: fix accuracy for some image models (#20679 )	2026-03-22 15:11:57 +08:00
shuwenn	2fba2bdad1	refactor: Remove dead code from utils/common.py (#20668 )	2026-03-21 21:54:17 -07:00
Lianmin Zheng	76e4a8662c	Replace clamp_position with JIT kernel + platform dispatch (#20999 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 21:26:26 -07:00

7206 Commits