sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
fzyzcjy	fdbcb8156e	Refactor dp_utils to use ParallelAxis enum in dump comparator (#21028)	2026-03-20 22:04:20 +08:00
fzyzcjy	154395ab7d	Support s≡t dimension name equivalence in dump comparator (#21027)	2026-03-20 22:03:34 +08:00
fzyzcjy	cc22601d28	Validate replicated axes orthogonality in dump comparator (#21026)	2026-03-20 22:02:40 +08:00
fzyzcjy	2f01950a0e	Support jointly-determined axes inference in dump comparator (#21025)	2026-03-20 22:01:26 +08:00
fzyzcjy	ecd7e40d20	Support dependent axis auto-resolution in dump comparator (#21024)	2026-03-20 21:56:39 +08:00
Lianmin Zheng	104b10f70a	refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009)	2026-03-20 05:55:36 -07:00
Артем Савкин	9fbe6800aa	[NPU] [Diffusion] Update CI performance baseline for Wan2.2-T2V-A14B-Diffusers-w8a8 (#20997)	2026-03-20 15:54:12 +03:00
xingsy97	f41832795e	Add compile-time 256-bit vector guard for pre-Blackwell (#19794)	2026-03-20 18:25:12 +08:00
DarkSharpness	2dd9196079	[JIT Kernel][Feature] Support JIT custom all reduce (rewrite as v2) (#19880) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-20 18:24:07 +08:00
Muqi Li	2099943a49	Fix scale_step_k computation in the fp8_kernel (#20819) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-03-20 18:09:31 +08:00
Jia Guo	ec01ef9092	Fix torch.compile/dynamo crash with Qwen3 QK-norm in piecewise CUDA g… (#19818) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 18:05:09 +08:00
Prozac614	fa89d152c0	[diffusion] CI: fix hunyuan3d JIT cache (#20773) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-03-20 17:51:55 +08:00
Lianmin Zheng	a0a4dae67f	Revert "Fix DeepSeek V32 FP4 test" (#21003)	2026-03-20 02:19:28 -07:00
Lianmin Zheng	112b628227	Replace _resolve_future_token_ids with JIT kernel + platform dispatch (#20976) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 01:47:03 -07:00
Baizhou Zhang	c82d20d48e	Fix DeepSeek V32 FP4 test (#20984)	2026-03-20 01:04:32 -07:00
Yilong Zhao	26f709e97d	misc: make prefill-delayer compatible with multiple types of mem pool (#20979)	2026-03-20 00:05:53 -07:00
Yilong Zhao	95327458ee	misc: add BatchTokenizerReq hook into dp controller (#20981)	2026-03-19 23:59:53 -07:00
Lianmin Zheng	712a48c5d2	ci: move metrics scripts under scripts/ci/utils (#20986)	2026-03-19 23:47:57 -07:00
lviy	46a76af97b	[Bugifx] qwen3 rope parameter compatibility (#20931) Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-03-19 22:22:01 -07:00
Jia Guo	87549f8f0b	perf(mamba): use Triton conv1d for non-contiguous input to avoid .contiguous() copy (#20469)	2026-03-19 19:38:46 -07:00
Vedant V Jhaveri	db995fba47	perf(kimi_linear): replace einops rearrange with native torch ops in Kimi-Linear KDA path (#20396)	2026-03-20 10:38:12 +08:00
ehuaa	fa0d8f6629	perf: avoid unnecessary gpu-cpu sync in eagle_info (#20266) Co-authored-by: root <qianhao@zhejianglab.org>	2026-03-19 19:37:29 -07:00
Mohammad Miadh Angkad	3d749c49ca	[JIT Kernel] Fix NVFP4 multi-arch compilation failure (#20874)	2026-03-20 10:30:04 +08:00
cs-cat	22e378af86	Fix result writer in tuning_block_wise_kernel.py, and add FP8 kernel config for L40 (#20368) Signed-off-by: cs-cat <118669451+cs-cat@users.noreply.github.com>	2026-03-20 09:28:54 +08:00
Yuan Luo	d9794ef9f7	[Qwen3-Next] Fuse Qwen3-Next GDN's qkvz_proj and ba_proj (#19321) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-20 09:25:29 +08:00
Baizhou Zhang	42f4b7276c	Revert "feat(mm)(grpc): compute M-RoPE positions for preprocessed VL inputs" (#20956)	2026-03-19 18:03:04 -07:00
Liangsheng Yin	2b53e660de	Simplify streaming session logprob handling (#20955)	2026-03-19 17:09:40 -07:00
Leon Gao	63c38aba5e	Fix token leak with logprob_start_len=0 in streaming sessions (#20557)	2026-03-19 15:37:27 -07:00
Brayden Zhong	b42b9f6e1a	Support CuteDSL `mm_fp4` backend (#18801)	2026-03-19 14:20:01 -07:00
Yuwei An	d8ece7fb22	[Tiny Fix] Filter lru related warning with pcg (#20940) Signed-off-by: yuweia <ayw.sirius19@gmail.com>	2026-03-19 13:20:49 -07:00
Lianmin Zheng	0949b138af	Simplify server startup output (#20885) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 13:11:37 -07:00
Xinyuan Tong	a02cff7f2b	[Fix] Patch is_flash_attn_2_available for flash-attn-4 in VLM input format test (#20946)	2026-03-19 13:00:51 -07:00
AlfredYong	c562e0d13b	[feat] Enhance Kimi-K2/K2.5 function call and reasoning detection (#19552) Co-authored-by: alfredyyang <alfredyyang@tencent.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-03-19 12:57:57 -07:00
Mohammad Miadh Angkad	29ced9c162	[UX] Suppress noisy `httpx`/`httpcore` INFO logs (#20944)	2026-03-19 10:58:41 -07:00
Xinyu Zhang	319bb4974c	[Fix] RayEngine multi-node: co-locate rank0 scheduler with Engine and fix CUDA device setting (#20722)	2026-03-19 10:27:16 -07:00
Cao E	274581fb77	Add support for more batch sizes in cpu_graph_runner (#13881)	2026-03-19 09:50:56 -07:00
kk	c8f0122acf	Fix gpu-fault issue when run deepseek-r1 and enable dp (#20841) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-19 02:36:12 -07:00
khalilzhk	574572b21b	[BugFix] bug fix for DeepSeek eagle3 in Attn-DP mode (#20492)	2026-03-19 14:48:46 +08:00
Shangming Cai	fd05532da1	Add logging for BootstrapServer for CI diagnosis (#20844) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-19 14:42:12 +08:00
blzheng	a98b456c70	[CPU] Add frontend support for Gemma (#12590)	2026-03-18 23:02:26 -07:00
jianan-gu	8d4fcf2f7b	[CPU] Fix MoE layer support for DeepSeek-OCR models (#12555)	2026-03-18 22:57:55 -07:00
Matti Varjokallio	85fe8c6793	[AMD] Use aiter_dsv3_router_gemm kernel if number of experts <= 256. (#18451)	2026-03-18 22:40:48 -07:00
kk	126cd5cfae	gpt-oss decode performance optimization (#20392) Co-authored-by: wunhuang <wunhuang@amd.com>	2026-03-18 22:30:03 -07:00
blzheng	cd22aa27a9	[CPU] Add FP8 Bmm support (#9744) Co-authored-by: Fan Yin <1106310035@qq.com>	2026-03-18 22:19:48 -07:00
Zaili Wang	2f4babe32b	[CPU] support LayerNorm with 3D shape (#15075) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-18 22:15:24 -07:00
blzheng	dc6aa26ce9	[CPU] Add mrope kernel for Qwen3-vl (#12531) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-18 22:12:48 -07:00
Juan Muneton	4052b53227	fix scheduler for non-cuda devices and disable piecewise cuda graph f… (#19992) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2026-03-18 21:54:19 -07:00
Ling Zhang	f85455ab24	[Bugfix] fix qwen3vl hang when --mm-enable-dp-encoder is enable (#20759)	2026-03-18 21:51:39 -07:00
Ethan (Yusheng) Su	7f6f1a3ab1	[LoRA][II] Add fused MOE LoRA Triton kernel and tests (#19711)	2026-03-18 19:58:14 -07:00
R0CKSTAR	7553b7dcb0	chore: extract diffusion_common in python/pyproject_other.toml (#20803) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2026-03-19 10:39:16 +08:00

7140 Commits