Commit Graph

86 Commits

Author SHA1 Message Date
Xiaoyu Zhang
cdd7d6a227 Remove obsolete sgl-kernel legacy paths (#21528) 2026-04-01 09:00:20 +08:00
Xiaoyu Zhang
766d225fcc Add SGLang CUDA crash API logging inspired by FlashInfer (#20910) 2026-03-22 16:39:40 +08:00
Xiaoyu Zhang
25e38216b6 [kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277) 2026-03-14 16:45:54 +08:00
Mohammad Miadh Angkad
f88acf8780 [JIT Kernel] Reland NVFP4 kernels to JIT (#20012) 2026-03-07 10:31:08 +08:00
Baizhou Zhang
51e5dc845a Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005) 2026-03-05 19:40:00 -08:00
Mohammad Miadh Angkad
2bdd89a6cd [Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437) 2026-03-05 15:22:28 +08:00
Xiaoyu Zhang
054bd71086 [sgl-kernel slimming] remove sgl-kernel moe-wna16-marlin (#19379) 2026-02-27 09:17:46 +08:00
pansicheng
2ad475b4ed use flashinfer.sampling (#18696) 2026-02-26 10:02:38 +08:00
Xiaoyu Zhang
9dff933164 [Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241) 2026-02-25 10:08:22 +08:00
Xiaoyu Zhang
c29394e3c8 [kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475) 2026-02-14 23:06:21 +08:00
jianan-gu
336dc4579e [CPU] Optimize Qwen3-next model on CPU (#12525)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
2026-01-29 22:03:58 -08:00
Xiaoyu Zhang
fb74e43707 [Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278) 2026-01-28 14:18:53 +08:00
Michael
53609e5e5b Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257) 2026-01-17 21:29:22 +08:00
Xiaoyu Zhang
2cdd4370bc [Diffusion] Move diffusion time embedding to jit kernel (#16879) 2026-01-17 12:21:22 +08:00
66RING
46be74b4b4 [diffusion] kernel: timestep embedding kernel implementation (#12995)
Co-authored-by: 戚余航 <qiyuhang@bytedance.com>
Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>
2025-12-19 20:59:50 +08:00
Qiaolin Yu
cb8df87fc1 [1/2] Add rope kernel in sgl-kernel (#14334) 2025-12-04 16:45:44 +08:00
Qi Yuhang
16ff892c18 [sgl-kernel][Feat][B200][1/N] Support MXFP8 Grouped GEMM in Blackwell (#13731)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-12-04 10:09:09 +08:00
Yuan Luo
e12c78aab6 [sgl-kernel][1/2] Fused qk_norm_rope for Qwen3-MoE (#14036)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-11-28 12:25:15 +08:00
Xiaoyu Zhang
ecefc7904f [sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819) 2025-11-24 18:26:25 +08:00
Xiaoyu Zhang
fb04d43428 [kimi k2 thinking] Avoid useless torch.zeros_ (#13596)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-21 13:15:27 +08:00
Roger Young
e72cf13693 Support moe topk sigmoid kernel (#13049)
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-11-20 00:24:37 +08:00
Xiaoyu Zhang
1d3d42bda0 [opt kimi k2 1 / n] Add kimi k2 moe fused gate (#13287) 2025-11-15 17:14:19 +08:00
Ke Bao
8e9f05ece1 Update marlin moe kernel interface (#13322) 2025-11-15 17:10:39 +08:00
Xiaoyu Zhang
547de8c774 [1 / 2] register weak_ref_tensor in sgl-kernel (#12999) 2025-11-10 22:12:59 +08:00
Lianmin Zheng
c0652d907b Clean up sgl kernel (#12413)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2025-10-31 01:13:34 -07:00
DarkSharpness
e8b71445c0 [Misc] Improve the error message of failed import (#12119) 2025-10-25 12:09:05 -07:00
hlu1
3b80232d06 [DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-10-19 17:13:39 -07:00
fzyzcjy
a27825ae01 Support not officially supported high sgl-kernel version with low srt version (#11786) 2025-10-19 16:11:59 +08:00
Fan Yin
3289da5b41 [sgl-kernel] support hadamard (#11663) 2025-10-15 19:00:44 -07:00
Qi Yuhang
dc48c4c0e3 [sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534) 2025-10-13 16:24:48 -07:00
Qi Yuhang
9a30914e94 [sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-10-12 20:19:21 -07:00
PGFLMG
8fdcd98efe [7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019) 2025-10-11 14:04:57 -07:00
fzyzcjy
21337b22b9 Reland [1/2] Optimizations and refactors about quant kernel (#10312)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-10-11 15:59:03 +08:00
DarkSharpness
e0b2d3eebe [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-05 10:19:03 -07:00
Kangyan-Zhou
0c9174108a Unify SGL Kernel Releases (#10701) 2025-09-28 19:48:28 -07:00
Yuan Luo
616a3e20df [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-09-19 14:12:09 +08:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
Zaili Wang
6fd4816d9f Fix sgl_kernel import failure on devices other than CUDA (#10610) 2025-09-18 11:38:02 -07:00
EduardDurech
a77564e0fb CUDA Arch Independent (#8813) 2025-09-16 23:01:45 -07:00
fzyzcjy
3b25dc127a [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) 2025-09-15 11:53:21 -07:00
Lianmin Zheng
c9ec4cae5b Fix the style of sgl kernel (#10398) 2025-09-12 22:20:21 -07:00
Yineng Zhang
6d55f60e77 Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292) 2025-09-10 18:24:23 -07:00
Yi Zhang
8cbe1538ef Add mamba kernel (#10234) 2025-09-09 12:58:43 -07:00
fzyzcjy
0096798ed6 [1/2] Speed up prefill mla attention (#10156) 2025-09-08 09:00:33 -07:00
hlu1
5f1eb20484 [chore] Remove unused ep_moe cuda kernels (#9956) 2025-09-06 01:35:50 -07:00
fzyzcjy
bd7f882142 Support copying tensor from cpu to gpu without using copy engines (#10007) 2025-09-05 20:07:19 +08:00
fzyzcjy
339f8eef09 [1/2] Optimizations and refactors about quant kernel (#9534) 2025-09-05 18:45:08 +08:00
Kaixi Hou
e5638573c1 [NVIDA] [1/N] Nvfp4 Masked Gemm: Add quant op for the flashinfer grouped gemm (#9200) 2025-08-22 12:19:45 -07:00
Lianmin Zheng
ecc9f3e47a [Minor] Fix the style of sgl-kernel (#9332) 2025-08-18 23:45:00 -07:00
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00