Xiaoyu Zhang | cdd7d6a227 | Remove obsolete sgl-kernel legacy paths (#21528) | 2026-04-01 09:00:20 +08:00
Xiaoyu Zhang | 766d225fcc | Add SGLang CUDA crash API logging inspired by FlashInfer (#20910) | 2026-03-22 16:39:40 +08:00
Xiaoyu Zhang | 25e38216b6 | [kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277) | 2026-03-14 16:45:54 +08:00
Mohammad Miadh Angkad | f88acf8780 | [JIT Kernel] Reland NVFP4 kernels to JIT (#20012) | 2026-03-07 10:31:08 +08:00
Baizhou Zhang | 51e5dc845a | Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005) | 2026-03-05 19:40:00 -08:00
Mohammad Miadh Angkad | 2bdd89a6cd | [Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437) | 2026-03-05 15:22:28 +08:00
Xiaoyu Zhang | 054bd71086 | [sgl-kernel slimming] remove sgl-kernel moe-wna16-marlin (#19379) | 2026-02-27 09:17:46 +08:00
pansicheng | 2ad475b4ed | use flashinfer.sampling (#18696) | 2026-02-26 10:02:38 +08:00
Xiaoyu Zhang | 9dff933164 | [Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241) | 2026-02-25 10:08:22 +08:00
Xiaoyu Zhang | c29394e3c8 | [kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475) | 2026-02-14 23:06:21 +08:00
jianan-gu | 336dc4579e | [CPU] Optimize Qwen3-next model on CPU (#12525); Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>; Co-authored-by: Fan Yin <1106310035@qq.com> | 2026-01-29 22:03:58 -08:00
Xiaoyu Zhang | fb74e43707 | [Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278) | 2026-01-28 14:18:53 +08:00
Michael | 53609e5e5b | Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257) | 2026-01-17 21:29:22 +08:00
Xiaoyu Zhang | 2cdd4370bc | [Diffusion] Move diffusion time embedding to jit kernel (#16879) | 2026-01-17 12:21:22 +08:00
66RING | 46be74b4b4 | [diffusion] kernel: timestep embedding kernel implementation (#12995); Co-authored-by: 戚余航 <qiyuhang@bytedance.com>; Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com> | 2025-12-19 20:59:50 +08:00
Qiaolin Yu | cb8df87fc1 | [1/2] Add rope kernel in sgl-kernel (#14334) | 2025-12-04 16:45:44 +08:00
Qi Yuhang | 16ff892c18 | [sgl-kernel][Feat][B200][1/N] Support MXFP8 Grouped GEMM in Blackwell (#13731); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-12-04 10:09:09 +08:00
Yuan Luo | e12c78aab6 | [sgl-kernel][1/2] Fused qk_norm_rope for Qwen3-MoE (#14036); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-11-28 12:25:15 +08:00
Xiaoyu Zhang | ecefc7904f | [sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819) | 2025-11-24 18:26:25 +08:00
Xiaoyu Zhang | fb04d43428 | [kimi k2 thinking] Avoid useless torch.zeros_ (#13596); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>; Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> | 2025-11-21 13:15:27 +08:00
Roger Young | e72cf13693 | Support moe topk sigmoid kernel (#13049); Co-authored-by: xuebi <xuebi@minimaxi.com> | 2025-11-20 00:24:37 +08:00
Xiaoyu Zhang | 1d3d42bda0 | [opt kimi k2 1 / n] Add kimi k2 moe fused gate (#13287) | 2025-11-15 17:14:19 +08:00
Ke Bao | 8e9f05ece1 | Update marlin moe kernel interface (#13322) | 2025-11-15 17:10:39 +08:00
Xiaoyu Zhang | 547de8c774 | [1 / 2] register weak_ref_tensor in sgl-kernel (#12999) | 2025-11-10 22:12:59 +08:00
Lianmin Zheng | c0652d907b | Clean up sgl kernel (#12413); Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2025-10-31 01:13:34 -07:00
DarkSharpness | e8b71445c0 | [Misc] Improve the error message of failed import (#12119) | 2025-10-25 12:09:05 -07:00
hlu1 | 3b80232d06 | [DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815); Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> | 2025-10-19 17:13:39 -07:00
fzyzcjy | a27825ae01 | Support not officially supported high sgl-kernel version with low srt version (#11786) | 2025-10-19 16:11:59 +08:00
Fan Yin | 3289da5b41 | [sgl-kernel] support hadamard (#11663) | 2025-10-15 19:00:44 -07:00
Qi Yuhang | dc48c4c0e3 | [sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534) | 2025-10-13 16:24:48 -07:00
Qi Yuhang | 9a30914e94 | [sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>; Co-authored-by: PGFLMG <1106310035@qq.com>; Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> | 2025-10-12 20:19:21 -07:00
PGFLMG | 8fdcd98efe | [7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019) | 2025-10-11 14:04:57 -07:00
fzyzcjy | 21337b22b9 | Reland [1/2] Optimizations and refactors about quant kernel (#10312); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-10-11 15:59:03 +08:00
DarkSharpness | e0b2d3eebe | [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-10-05 10:19:03 -07:00
Kangyan-Zhou | 0c9174108a | Unify SGL Kernel Releases (#10701) | 2025-09-28 19:48:28 -07:00
Yuan Luo | 616a3e20df | [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>; Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> | 2025-09-19 14:12:09 +08:00
Zhihao Zhang | e7bc600304 | [Feature] Speculative decoding support lookahead (#9873); Co-authored-by: a4zhangfei <a4zhangfei@qq.com>; Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> | 2025-09-18 16:42:41 -07:00
Zaili Wang | 6fd4816d9f | Fix sgl_kernel import failure on devices other than CUDA (#10610) | 2025-09-18 11:38:02 -07:00
EduardDurech | a77564e0fb | CUDA Arch Independent (#8813) | 2025-09-16 23:01:45 -07:00
fzyzcjy | 3b25dc127a | [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) | 2025-09-15 11:53:21 -07:00
Lianmin Zheng | c9ec4cae5b | Fix the style of sgl kernel (#10398) | 2025-09-12 22:20:21 -07:00
Yineng Zhang | 6d55f60e77 | Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292) | 2025-09-10 18:24:23 -07:00
Yi Zhang | 8cbe1538ef | Add mamba kernel (#10234) | 2025-09-09 12:58:43 -07:00
fzyzcjy | 0096798ed6 | [1/2] Speed up prefill mla attention (#10156) | 2025-09-08 09:00:33 -07:00
hlu1 | 5f1eb20484 | [chore] Remove unused ep_moe cuda kernels (#9956) | 2025-09-06 01:35:50 -07:00
fzyzcjy | bd7f882142 | Support copying tensor from cpu to gpu without using copy engines (#10007) | 2025-09-05 20:07:19 +08:00
fzyzcjy | 339f8eef09 | [1/2] Optimizations and refactors about quant kernel (#9534) | 2025-09-05 18:45:08 +08:00
Kaixi Hou | e5638573c1 | [NVIDA] [1/N] Nvfp4 Masked Gemm: Add quant op for the flashinfer grouped gemm (#9200) | 2025-08-22 12:19:45 -07:00
Lianmin Zheng | ecc9f3e47a | [Minor] Fix the style of sgl-kernel (#9332) | 2025-08-18 23:45:00 -07:00
Lianmin Zheng | c480a3f6ea | Minor style fixes for sgl-kernel (#9289) | 2025-08-18 09:38:35 -07:00