sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 11:48:01 +00:00

Author	SHA1	Message	Date
Yuan Luo	e12c78aab6	[sgl-kernel][1/2] Fused qk_norm_rope for Qwen3-MoE (#14036) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-11-28 12:25:15 +08:00
sglang-bot	391a863b3f	chore: bump sgl-kernel version to 0.3.18.post1 (#13942) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-11-25 15:31:53 -08:00
Xiaoyu Zhang	ecefc7904f	[sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819)	2025-11-24 18:26:25 +08:00
sglang-bot	a22104a676	chore: bump sgl-kernel version to 0.3.18 (#13816)	2025-11-23 17:24:54 -08:00
Xiaoyu Zhang	fb04d43428	[kimi k2 thinking] Avoid useless torch.zeros_ (#13596) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-21 13:15:27 +08:00
Roger Young	e72cf13693	Support moe topk sigmoid kernel (#13049) Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-11-20 00:24:37 +08:00
sglang-bot	b638abbae6	chore: bump sgl-kernel version to 0.3.17.post2 (#13542)	2025-11-18 22:08:23 -08:00
sglang-bot	4a56fa5cf2	chore: bump sgl-kernel version to 0.3.17.post1 (#13325)	2025-11-15 23:36:32 +08:00
Xiaoyu Zhang	1d3d42bda0	[opt kimi k2 1 / n] Add kimi k2 moe fused gate (#13287)	2025-11-15 17:14:19 +08:00
Ke Bao	8e9f05ece1	Update marlin moe kernel interface (#13322)	2025-11-15 17:10:39 +08:00
Ke Bao	2a96e302cb	Revert moe sum reduce for marlin moe (#13314)	2025-11-15 15:57:41 +08:00
Fan Yin	2966367a31	[sgl-kernel] support custom fp8 flashmla kernel (#13087)	2025-11-13 12:45:21 -08:00
Xiaoyu Zhang	547de8c774	[1 / 2] register weak_ref_tensor in sgl-kernel (#12999)	2025-11-10 22:12:59 +08:00
sglang-bot	a30f190762	chore: bump sgl-kernel version to 0.3.17 (#12931)	2025-11-10 14:54:02 +08:00
sglang-bot	b2b26d4324	chore: bump sgl-kernel version to 0.3.16.post6 (#12889)	2025-11-09 09:16:28 +08:00
Ke Bao	44f594d832	Apply moe_reduce_sum kernel for fused_marlin_moe (#12888)	2025-11-09 01:31:05 +08:00
hlu1	b8ddc296f4	[sgl-kernel][Deepseek V3.2] Add row_starts to topk kernel (#12582) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-11-07 18:33:27 -08:00
fzyzcjy	193fbb0bce	Super tiny add UT for copy_to_gpu_no_ce (#12270)	2025-11-04 09:40:51 +08:00
Lianmin Zheng	20315697f4	move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521)	2025-11-02 13:15:05 -08:00
sglang-bot	a920b9dace	chore: bump sgl-kernel version to 0.3.16.post5 (#12511)	2025-11-02 14:08:11 +08:00
yinghui	a80bcb5a68	Add env var to disable FA4 warmup (#12430)	2025-10-31 12:25:00 -07:00
Lianmin Zheng	2d5605e89b	Fix ci install to allow prerelease (#12449)	2025-10-31 02:22:15 -07:00
Lianmin Zheng	c0652d907b	Clean up sgl kernel (#12413) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2025-10-31 01:13:34 -07:00
zejunchen-zejun	8a6838212a	[Fix] fix type issue of env flag value MODELOPT_MAX_TOKENS_PER_EXPERT (#11709) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>	2025-10-29 09:44:05 -07:00
huangtingwei	3e6281d0aa	[HiCache]Page head layout IO kernel (#11615)	2025-10-26 15:53:50 +08:00
Kai-Hsun Chen	6371f7af27	[quantization] AWQ Marlin doesn't work when dtype is bfloat16 (#11494) Signed-off-by: Kai-Hsun Chen <khchen@x.ai> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-10-26 15:49:45 +08:00
DarkSharpness	e8b71445c0	[Misc] Improve the error message of failed import (#12119)	2025-10-25 12:09:05 -07:00
sglang-bot	a04212f128	chore: bump sgl-kernel version to 0.3.16.post4 (#12103) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-24 22:04:14 -07:00
Fan Yin	23afdfd1c2	[sgl-kernel] support flashmla libtorch (#11717)	2025-10-21 21:17:50 -07:00
sglang-bot	283c8ba031	chore: bump sgl-kernel version to 0.3.16.post3 (#11733)	2025-10-19 21:44:15 -05:00
hlu1	3b80232d06	[DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-10-19 17:13:39 -07:00
Johnny	252dc4e112	[NVIDIA] FA3/FA4 Fix (#11606) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-19 17:10:10 -07:00
fzyzcjy	a27825ae01	Support not officially supported high sgl-kernel version with low srt version (#11786)	2025-10-19 16:11:59 +08:00
Fan Yin	3289da5b41	[sgl-kernel] support hadamard (#11663)	2025-10-15 19:00:44 -07:00
Qi Yuhang	6c01844f45	[sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674)	2025-10-15 13:39:31 -07:00
fzyzcjy	32803fb279	Super tiny improve FA3 import error message (#11590)	2025-10-14 22:06:31 -07:00
sglang-bot	98923880bc	chore: bump sgl-kernel version to 0.3.16.post2 (#11583) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 20:52:38 -07:00
Yineng Zhang	f792e3c561	Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582)	2025-10-13 20:51:45 -07:00
sglang-bot	60b0503227	chore: bump sgl-kernel version to 0.3.16.post1 (#11573) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 16:26:18 -07:00
Qi Yuhang	dc48c4c0e3	[sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534)	2025-10-13 16:24:48 -07:00
Johnny	b8c430f1ce	[NVIDIA] BUMP FA3 (#11444) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2025-10-13 09:30:57 -07:00
Qi Yuhang	9a30914e94	[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-12 20:19:21 -07:00
sglang-bot	2db2cddd12	chore: bump sgl-kernel version to 0.3.16 (#11476)	2025-10-11 22:04:49 -07:00
PGFLMG	8fdcd98efe	[7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019)	2025-10-11 14:04:57 -07:00
fzyzcjy	21337b22b9	Reland [1/2] Optimizations and refactors about quant kernel (#10312) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-11 15:59:03 +08:00
sglang-bot	8c9670375f	chore: bump sgl-kernel version to 0.3.15 (#11281)	2025-10-06 18:17:51 -07:00
Lifu Huang	748f86f3de	[Bug] Fix incorrect assertion in FA4 and add UT. (#11182)	2025-10-06 14:58:39 -07:00
PGFLMG	1a599509cc	chore: bump sgl-kernel v0.3.14.post1 (#11137)	2025-10-05 13:46:43 -07:00
DarkSharpness	e0b2d3eebe	[Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-05 10:19:03 -07:00
PGFLMG	580051c5a8	chore: bump sgl-kernel v0.3.14 (#11067)	2025-09-30 02:53:24 -07:00

280 Commits