280 Commits

Author SHA1 Message Date
Joey
15e6572f21 [MUSA][18/N] Add MUSA-optimized kernel implementations for hot ops (#23255)
Signed-off-by: Joey-gvwal <joey_gvwal@yeah.net>
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
2026-05-07 20:38:33 -07:00
sglang-bot
6764155914 chore: bump sgl-kernel version to 0.4.2.post1 (#24457)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-05-05 16:19:58 -07:00
Baizhou Zhang
200944b415 Update kernel installation instructions after shifting default cuda to 13 (#24181) 2026-05-02 16:07:20 -07:00
sglang-bot
2e027b1afe chore: bump sgl-kernel version to 0.4.2 (#24170)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-30 15:02:07 -07:00
sglang-bot
714173555c chore: bump sgl-kernel version to 0.4.1.post1 (#23720)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Kangyan Zhou <kangyan.zhou@radixark.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 17:13:02 -07:00
Jia Guo
587fd15bd2 perf: eliminate attention DtoD copy by passing pre-allocated output to FA (#21985)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 12:05:16 -07:00
Lianmin Zheng
9c47bbad13 Clean up bench_one_batch warning and simplify norm dispatch (#23110) 2026-04-17 17:42:20 -07:00
Lianmin Zheng
222eda1598 [Misc] Use cache_once for is_arch_support_pdl in sgl-kernel (#22725) 2026-04-14 15:22:10 -07:00
Baizhou Zhang
d14d368191 [Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467) 2026-04-11 01:59:57 -07:00
sglang-bot
2c4fb88929 chore: bump sgl-kernel version to 0.4.1 (#21447)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-02 22:31:59 -07:00
Baizhou Zhang
c7d03a6215 Revert "Rollback flashmla to older version [1/2]" (#21922) 2026-04-02 00:27:02 -07:00
Xiaoyu Zhang
cdd7d6a227 Remove obsolete sgl-kernel legacy paths (#21528) 2026-04-01 09:00:20 +08:00
Baizhou Zhang
dbe871efdd Rollback flashmla to older version [1/2] (#21430) 2026-03-25 17:49:54 -07:00
Minglei Zhu
a12fea21ed perf(sgl-kernel): expose get_scheduler_metadata for FA3 decode optimization (#21103) 2026-03-25 13:17:27 -07:00
Xiaoyu Zhang
766d225fcc Add SGLang CUDA crash API logging inspired by FlashInfer (#20910) 2026-03-22 16:39:40 +08:00
Xiaoyu Zhang
15097c5c3b Release sglang kernel 0.4.0 (#20440)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-16 20:34:58 +08:00
Xiaoyu Zhang
25e38216b6 [kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277) 2026-03-14 16:45:54 +08:00
Johnsonms
7cf0551014 Migrate norm kernels to FlashInfer JIT implementation (#18871) 2026-03-10 14:56:07 +08:00
Baizhou Zhang
a6ae89fe3c Revert "chore: bump sgl-kernel version to 0.3.21.post1" (#20229) 2026-03-09 20:32:19 -07:00
sglang-bot
0f0c8b2f18 chore: bump sgl-kernel version to 0.3.21.post1 (#20087)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-03-08 03:03:58 -07:00
Fan Yin
43d6a32045 [sgl-kernel] rebase FlashMLA 0217 (#18902)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-07 00:30:52 -08:00
Mohammad Miadh Angkad
f88acf8780 [JIT Kernel] Reland NVFP4 kernels to JIT (#20012) 2026-03-07 10:31:08 +08:00
Johnsonms
2d266c73ea Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854) 2026-03-06 22:53:28 +08:00
Baizhou Zhang
51e5dc845a Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005) 2026-03-05 19:40:00 -08:00
Rain Jiang
472eef4071 fa4 cleanup (#19727) 2026-03-05 17:54:25 +08:00
Mohammad Miadh Angkad
2bdd89a6cd [Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437) 2026-03-05 15:22:28 +08:00
Xiaoyu Zhang
054bd71086 [sgl-kernel slimming] remove sgl-kernel moe-wna16-marlin (#19379) 2026-02-27 09:17:46 +08:00
pansicheng
2ad475b4ed use flashinfer.sampling (#18696) 2026-02-26 10:02:38 +08:00
Xiaoyu Zhang
9dff933164 [Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241) 2026-02-25 10:08:22 +08:00
blake-snc
5fc328465a fix(sgl-kernel): support CUDA 13 runtime preloading for DGX Spark (#18747)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 00:43:04 +08:00
SoluMilken
07a24f1a38 update pre-commit config (#18860) 2026-02-16 00:18:31 +08:00
Xiaoyu Zhang
c29394e3c8 [kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475) 2026-02-14 23:06:21 +08:00
jianan-gu
336dc4579e [CPU] Optimize Qwen3-next model on CPU (#12525)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
2026-01-29 22:03:58 -08:00
Xiaoyu Zhang
fb74e43707 [Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278) 2026-01-28 14:18:53 +08:00
Michael
53609e5e5b Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257) 2026-01-17 21:29:22 +08:00
Xiaoyu Zhang
2cdd4370bc [Diffusion] Move diffusion time embedding to jit kernel (#16879) 2026-01-17 12:21:22 +08:00
sglang-bot
c86ca12875 chore: bump sgl-kernel version to 0.3.21 (#16888)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-01-14 13:27:49 +08:00
Johnny
b5493f65be [NVIDIA] upstream FA4 (#15182)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-01-11 15:31:28 +08:00
sglang-bot
a39126672a chore: bump sgl-kernel version to 0.3.20 (#15564) 2025-12-21 13:15:23 -08:00
66RING
46be74b4b4 [diffusion] kernel: timestep embedding kernel implementation (#12995)
Co-authored-by: 戚余航 <qiyuhang@bytedance.com>
Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>
2025-12-19 20:59:50 +08:00
sunxxuns
f2d64e6782 [amd] Add deterministic all-reduce kernel for AMD (ROCm) (#15340)
Co-authored-by: Thomas Wang <1am9trash@gmail.com>
2025-12-18 23:36:03 -08:00
Kevin_Xiong
4792d1f452 [sgl-kernel][1/2] Fused qk_norm_rope for GLM4.6 (#15141) 2025-12-18 17:07:04 +08:00
sglang-bot
4a62a0e3cd chore: bump sgl-kernel version to 0.3.19 (#14632) 2025-12-08 19:02:24 +08:00
Yuhao Yang
f72a77038f modify the sgl-kernel to be compatible with transformers 5.x. (#14625) 2025-12-08 00:39:00 -08:00
sglang-bot
e11f795f63 chore: bump sgl-kernel version to 0.3.18.post3 (#14427)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-12-05 14:09:02 -08:00
Qiaolin Yu
cb8df87fc1 [1/2] Add rope kernel in sgl-kernel (#14334) 2025-12-04 16:45:44 +08:00
Qi Yuhang
16ff892c18 [sgl-kernel][Feat][B200][1/N] Support MXFP8 Grouped GEMM in Blackwell (#13731)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-12-04 10:09:09 +08:00
Antonin Vidon
df1f31241b [sgl-kernel] fix runtime error while preloading CUDA runtime (#13089) 2025-12-02 23:03:51 +08:00
sglang-bot
34035d8cd9 chore: bump sgl-kernel version to 0.3.18.post2 (#14229) 2025-12-02 00:50:40 +08:00
Xiaoyu Zhang
3de09aadbc Add new moe wna16 marlin gemm (#14122) 2025-12-01 23:07:53 +08:00