sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 19:57:52 +00:00

Author	SHA1	Message	Date
Xiaoyu Zhang	cdd7d6a227	Remove obsolete sgl-kernel legacy paths (#21528)	2026-04-01 09:00:20 +08:00
Minglei Zhu	a12fea21ed	perf(sgl-kernel): expose get_scheduler_metadata for FA3 decode optimization (#21103)	2026-03-25 13:17:27 -07:00
Xiaoyu Zhang	25e38216b6	[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277)	2026-03-14 16:45:54 +08:00
Mohammad Miadh Angkad	f88acf8780	[JIT Kernel] Reland NVFP4 kernels to JIT (#20012)	2026-03-07 10:31:08 +08:00
Baizhou Zhang	51e5dc845a	Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005)	2026-03-05 19:40:00 -08:00
Mohammad Miadh Angkad	2bdd89a6cd	[Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437)	2026-03-05 15:22:28 +08:00
Xiaoyu Zhang	054bd71086	[sgl-kernel slimming] remove sgl-kernel moe-wna16-marlin (#19379)	2026-02-27 09:17:46 +08:00
pansicheng	2ad475b4ed	use flashinfer.sampling (#18696)	2026-02-26 10:02:38 +08:00
Xiaoyu Zhang	9dff933164	[Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241)	2026-02-25 10:08:22 +08:00
Xiaoyu Zhang	c29394e3c8	[kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475)	2026-02-14 23:06:21 +08:00
Xiaoyu Zhang	fb74e43707	[Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278)	2026-01-28 14:18:53 +08:00
Michael	53609e5e5b	Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257)	2026-01-17 21:29:22 +08:00
Xiaoyu Zhang	2cdd4370bc	[Diffusion] Move diffusion time embedding to jit kernel (#16879)	2026-01-17 12:21:22 +08:00
Hubert Lu	8716589826	[AMD][Diffusion] support timestep embedding kernel for AMD GPUs (#16766)	2026-01-12 22:17:07 -08:00
66RING	46be74b4b4	[diffusion] kernel: timestep embedding kernel implementation (#12995) Co-authored-by: 戚余航 <qiyuhang@bytedance.com> Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>	2025-12-19 20:59:50 +08:00
Kevin_Xiong	4792d1f452	[sgl-kernel][1/2] Fused qk_norm_rope for GLM4.6 (#15141)	2025-12-18 17:07:04 +08:00
Qiaolin Yu	cb8df87fc1	[1/2] Add rope kernel in sgl-kernel (#14334)	2025-12-04 16:45:44 +08:00
Qi Yuhang	16ff892c18	[sgl-kernel][Feat][B200][1/N] Support MXFP8 Grouped GEMM in Blackwell (#13731) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-12-04 10:09:09 +08:00
Xiaoyu Zhang	3de09aadbc	Add new moe wna16 marlin gemm (#14122)	2025-12-01 23:07:53 +08:00
Yuan Luo	e12c78aab6	[sgl-kernel][1/2] Fused qk_norm_rope for Qwen3-MoE (#14036) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-11-28 12:25:15 +08:00
Xiaoyu Zhang	ecefc7904f	[sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819)	2025-11-24 18:26:25 +08:00
Roger Young	e72cf13693	Support moe topk sigmoid kernel (#13049) Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-11-20 00:24:37 +08:00
Xiaoyu Zhang	1d3d42bda0	[opt kimi k2 1 / n] Add kimi k2 moe fused gate (#13287)	2025-11-15 17:14:19 +08:00
Fan Yin	2966367a31	[sgl-kernel] support custom fp8 flashmla kernel (#13087)	2025-11-13 12:45:21 -08:00
hlu1	b8ddc296f4	[sgl-kernel][Deepseek V3.2] Add row_starts to topk kernel (#12582) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-11-07 18:33:27 -08:00
Lianmin Zheng	20315697f4	move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521)	2025-11-02 13:15:05 -08:00
Xiaoyu Zhang	95191ebdca	Migrate weak_ref_tensor to sgl-kernel (#12505)	2025-11-02 10:55:39 +08:00
Lianmin Zheng	c0652d907b	Clean up sgl kernel (#12413) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2025-10-31 01:13:34 -07:00
huangtingwei	3e6281d0aa	[HiCache]Page head layout IO kernel (#11615)	2025-10-26 15:53:50 +08:00
Fan Yin	23afdfd1c2	[sgl-kernel] support flashmla libtorch (#11717)	2025-10-21 21:17:50 -07:00
hlu1	3b80232d06	[DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-10-19 17:13:39 -07:00
Johnny	252dc4e112	[NVIDIA] FA3/FA4 Fix (#11606) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-19 17:10:10 -07:00
Fan Yin	3289da5b41	[sgl-kernel] support hadamard (#11663)	2025-10-15 19:00:44 -07:00
Qi Yuhang	6c01844f45	[sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674)	2025-10-15 13:39:31 -07:00
Yineng Zhang	f792e3c561	Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582)	2025-10-13 20:51:45 -07:00
Johnny	b8c430f1ce	[NVIDIA] BUMP FA3 (#11444) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2025-10-13 09:30:57 -07:00
Qi Yuhang	9a30914e94	[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-12 20:19:21 -07:00
PGFLMG	8fdcd98efe	[7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019)	2025-10-11 14:04:57 -07:00
fzyzcjy	21337b22b9	Reland [1/2] Optimizations and refactors about quant kernel (#10312) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-11 15:59:03 +08:00
DarkSharpness	e0b2d3eebe	[Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-05 10:19:03 -07:00
Yuan Luo	616a3e20df	[sgl-kernel] Support moe_sum_reduce cuda kernel (#10321) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-09-19 14:12:09 +08:00
Zhihao Zhang	e7bc600304	[Feature] Speculative decoding support lookahead (#9873) Co-authored-by: a4zhangfei <a4zhangfei@qq.com> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-09-18 16:42:41 -07:00
fzyzcjy	3b25dc127a	[1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473)	2025-09-15 11:53:21 -07:00
Lianmin Zheng	c9ec4cae5b	Fix the style of sgl kernel (#10398)	2025-09-12 22:20:21 -07:00
Yineng Zhang	6d55f60e77	Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292)	2025-09-10 18:24:23 -07:00
huangtingwei	5be8c2f7f7	Page first direct IO kernel (#10060) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-09-10 13:35:34 +08:00
Yi Zhang	8cbe1538ef	Add mamba kernel (#10234)	2025-09-09 12:58:43 -07:00
fzyzcjy	0096798ed6	[1/2] Speed up prefill mla attention (#10156)	2025-09-08 09:00:33 -07:00
hlu1	5f1eb20484	[chore] Remove unused ep_moe cuda kernels (#9956)	2025-09-06 01:35:50 -07:00
fzyzcjy	bd7f882142	Support copying tensor from cpu to gpu without using copy engines (#10007)	2025-09-05 20:07:19 +08:00

124 Commits