Commit Graph

110 Commits

Author SHA1 Message Date
Xiaoyu Zhang
cdd7d6a227 Remove obsolete sgl-kernel legacy paths (#21528) 2026-04-01 09:00:20 +08:00
Xiaoyu Zhang
25e38216b6 [kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277) 2026-03-14 16:45:54 +08:00
Mohammad Miadh Angkad
f88acf8780 [JIT Kernel] Reland NVFP4 kernels to JIT (#20012) 2026-03-07 10:31:08 +08:00
Baizhou Zhang
51e5dc845a Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005) 2026-03-05 19:40:00 -08:00
Mohammad Miadh Angkad
2bdd89a6cd [Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437) 2026-03-05 15:22:28 +08:00
Xiaoyu Zhang
054bd71086 [sgl-kernel slimming] remove sgl-kernel moe-wna16-marlin (#19379) 2026-02-27 09:17:46 +08:00
pansicheng
2ad475b4ed use flashinfer.sampling (#18696) 2026-02-26 10:02:38 +08:00
Xiaoyu Zhang
9dff933164 [Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241) 2026-02-25 10:08:22 +08:00
Xiaoyu Zhang
c29394e3c8 [kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475) 2026-02-14 23:06:21 +08:00
Xiaoyu Zhang
fb74e43707 [Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278) 2026-01-28 14:18:53 +08:00
Michael
53609e5e5b Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257) 2026-01-17 21:29:22 +08:00
Xiaoyu Zhang
2cdd4370bc [Diffusion] Move diffusion time embedding to jit kernel (#16879) 2026-01-17 12:21:22 +08:00
66RING
46be74b4b4 [diffusion] kernel: timestep embedding kernel implementation (#12995)
Co-authored-by: 戚余航 <qiyuhang@bytedance.com>
Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>
2025-12-19 20:59:50 +08:00
Kevin_Xiong
4792d1f452 [sgl-kernel][1/2] Fused qk_norm_rope for GLM4.6 (#15141) 2025-12-18 17:07:04 +08:00
Qiaolin Yu
cb8df87fc1 [1/2] Add rope kernel in sgl-kernel (#14334) 2025-12-04 16:45:44 +08:00
Qi Yuhang
16ff892c18 [sgl-kernel][Feat][B200][1/N] Support MXFP8 Grouped GEMM in Blackwell (#13731)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-12-04 10:09:09 +08:00
Xiaoyu Zhang
3de09aadbc Add new moe wna16 marlin gemm (#14122) 2025-12-01 23:07:53 +08:00
Yuan Luo
e12c78aab6 [sgl-kernel][1/2] Fused qk_norm_rope for Qwen3-MoE (#14036)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-11-28 12:25:15 +08:00
Xiaoyu Zhang
ecefc7904f [sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819) 2025-11-24 18:26:25 +08:00
Roger Young
e72cf13693 Support moe topk sigmoid kernel (#13049)
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-11-20 00:24:37 +08:00
Xiaoyu Zhang
1d3d42bda0 [opt kimi k2 1 / n] Add kimi k2 moe fused gate (#13287) 2025-11-15 17:14:19 +08:00
Fan Yin
2966367a31 [sgl-kernel] support custom fp8 flashmla kernel (#13087) 2025-11-13 12:45:21 -08:00
hlu1
b8ddc296f4 [sgl-kernel][Deepseek V3.2] Add row_starts to topk kernel (#12582)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-11-07 18:33:27 -08:00
Lianmin Zheng
20315697f4 move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521) 2025-11-02 13:15:05 -08:00
Xiaoyu Zhang
95191ebdca Migrate weak_ref_tensor to sgl-kernel (#12505) 2025-11-02 10:55:39 +08:00
Lianmin Zheng
c0652d907b Clean up sgl kernel (#12413)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2025-10-31 01:13:34 -07:00
huangtingwei
3e6281d0aa [HiCache]Page head layout IO kernel (#11615) 2025-10-26 15:53:50 +08:00
Fan Yin
23afdfd1c2 [sgl-kernel] support flashmla libtorch (#11717) 2025-10-21 21:17:50 -07:00
hlu1
3b80232d06 [DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-10-19 17:13:39 -07:00
Fan Yin
3289da5b41 [sgl-kernel] support hadamard (#11663) 2025-10-15 19:00:44 -07:00
Qi Yuhang
6c01844f45 [sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674) 2025-10-15 13:39:31 -07:00
Qi Yuhang
9a30914e94 [sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-10-12 20:19:21 -07:00
PGFLMG
8fdcd98efe [7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019) 2025-10-11 14:04:57 -07:00
fzyzcjy
21337b22b9 Reland [1/2] Optimizations and refactors about quant kernel (#10312)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-10-11 15:59:03 +08:00
DarkSharpness
e0b2d3eebe [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-05 10:19:03 -07:00
Yuan Luo
616a3e20df [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-09-19 14:12:09 +08:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
fzyzcjy
3b25dc127a [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) 2025-09-15 11:53:21 -07:00
Lianmin Zheng
c9ec4cae5b Fix the style of sgl kernel (#10398) 2025-09-12 22:20:21 -07:00
Yineng Zhang
6d55f60e77 Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292) 2025-09-10 18:24:23 -07:00
huangtingwei
5be8c2f7f7 Page first direct IO kernel (#10060)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-10 13:35:34 +08:00
Yi Zhang
8cbe1538ef Add mamba kernel (#10234) 2025-09-09 12:58:43 -07:00
fzyzcjy
0096798ed6 [1/2] Speed up prefill mla attention (#10156) 2025-09-08 09:00:33 -07:00
hlu1
5f1eb20484 [chore] Remove unused ep_moe cuda kernels (#9956) 2025-09-06 01:35:50 -07:00
fzyzcjy
bd7f882142 Support copying tensor from cpu to gpu without using copy engines (#10007) 2025-09-05 20:07:19 +08:00
fzyzcjy
339f8eef09 [1/2] Optimizations and refactors about quant kernel (#9534) 2025-09-05 18:45:08 +08:00
Kaixi Hou
5c34b4f1c7 [NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf (#9556) 2025-08-29 17:17:03 -07:00
Kaixi Hou
e5638573c1 [NVIDA] [1/N] Nvfp4 Masked Gemm: Add quant op for the flashinfer grouped gemm (#9200) 2025-08-22 12:19:45 -07:00
fzyzcjy
42c8704560 Add PDL support for quant kernel and rope kernel (#9106) 2025-08-20 01:56:29 -07:00
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00