Joey
|
15e6572f21
|
[MUSA][18/N] Add MUSA-optimized kernel implementations for hot ops (#23255)
Signed-off-by: Joey-gvwal <joey_gvwal@yeah.net>
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
|
2026-05-07 20:38:33 -07:00 |
|
sglang-bot
|
6764155914
|
chore: bump sgl-kernel version to 0.4.2.post1 (#24457)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-05-05 16:19:58 -07:00 |
|
Baizhou Zhang
|
200944b415
|
Update kernel installation instructions after shifting default cuda to 13 (#24181)
|
2026-05-02 16:07:20 -07:00 |
|
sglang-bot
|
2e027b1afe
|
chore: bump sgl-kernel version to 0.4.2 (#24170)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-04-30 15:02:07 -07:00 |
|
sglang-bot
|
714173555c
|
chore: bump sgl-kernel version to 0.4.1.post1 (#23720)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Kangyan Zhou <kangyan.zhou@radixark.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-25 17:13:02 -07:00 |
|
Jia Guo
|
587fd15bd2
|
perf: eliminate attention DtoD copy by passing pre-allocated output to FA (#21985)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-24 12:05:16 -07:00 |
|
Lianmin Zheng
|
9c47bbad13
|
Clean up bench_one_batch warning and simplify norm dispatch (#23110)
|
2026-04-17 17:42:20 -07:00 |
|
Lianmin Zheng
|
222eda1598
|
[Misc] Use cache_once for is_arch_support_pdl in sgl-kernel (#22725)
|
2026-04-14 15:22:10 -07:00 |
|
Baizhou Zhang
|
d14d368191
|
[Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467)
|
2026-04-11 01:59:57 -07:00 |
|
sglang-bot
|
2c4fb88929
|
chore: bump sgl-kernel version to 0.4.1 (#21447)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-04-02 22:31:59 -07:00 |
|
Baizhou Zhang
|
c7d03a6215
|
Revert "Rollback flashmla to older version [1/2]" (#21922)
|
2026-04-02 00:27:02 -07:00 |
|
Xiaoyu Zhang
|
cdd7d6a227
|
Remove obsolete sgl-kernel legacy paths (#21528)
|
2026-04-01 09:00:20 +08:00 |
|
Baizhou Zhang
|
dbe871efdd
|
Rollback flashmla to older version [1/2] (#21430)
|
2026-03-25 17:49:54 -07:00 |
|
Minglei Zhu
|
a12fea21ed
|
perf(sgl-kernel): expose get_scheduler_metadata for FA3 decode optimization (#21103)
|
2026-03-25 13:17:27 -07:00 |
|
Xiaoyu Zhang
|
766d225fcc
|
Add SGLang CUDA crash API logging inspired by FlashInfer (#20910)
|
2026-03-22 16:39:40 +08:00 |
|
Xiaoyu Zhang
|
15097c5c3b
|
Release sglang kernel 0.4.0 (#20440)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-16 20:34:58 +08:00 |
|
Xiaoyu Zhang
|
25e38216b6
|
[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277)
|
2026-03-14 16:45:54 +08:00 |
|
Johnsonms
|
7cf0551014
|
Migrate norm kernels to FlashInfer JIT implementation (#18871)
|
2026-03-10 14:56:07 +08:00 |
|
Baizhou Zhang
|
a6ae89fe3c
|
Revert "chore: bump sgl-kernel version to 0.3.21.post1" (#20229)
|
2026-03-09 20:32:19 -07:00 |
|
sglang-bot
|
0f0c8b2f18
|
chore: bump sgl-kernel version to 0.3.21.post1 (#20087)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-03-08 03:03:58 -07:00 |
|
Fan Yin
|
43d6a32045
|
[sgl-kernel] rebase FlashMLA 0217 (#18902)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-07 00:30:52 -08:00 |
|
Mohammad Miadh Angkad
|
f88acf8780
|
[JIT Kernel] Reland NVFP4 kernels to JIT (#20012)
|
2026-03-07 10:31:08 +08:00 |
|
Johnsonms
|
2d266c73ea
|
Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854)
|
2026-03-06 22:53:28 +08:00 |
|
Baizhou Zhang
|
51e5dc845a
|
Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005)
|
2026-03-05 19:40:00 -08:00 |
|
Rain Jiang
|
472eef4071
|
fa4 cleanup (#19727)
|
2026-03-05 17:54:25 +08:00 |
|
Mohammad Miadh Angkad
|
2bdd89a6cd
|
[Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437)
|
2026-03-05 15:22:28 +08:00 |
|
Xiaoyu Zhang
|
054bd71086
|
[sgl-kernel slimming] remove sgl-kernel moe-wna16-marlin (#19379)
|
2026-02-27 09:17:46 +08:00 |
|
pansicheng
|
2ad475b4ed
|
use flashinfer.sampling (#18696)
|
2026-02-26 10:02:38 +08:00 |
|
Xiaoyu Zhang
|
9dff933164
|
[Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241)
|
2026-02-25 10:08:22 +08:00 |
|
blake-snc
|
5fc328465a
|
fix(sgl-kernel): support CUDA 13 runtime preloading for DGX Spark (#18747)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-16 00:43:04 +08:00 |
|
SoluMilken
|
07a24f1a38
|
update pre-commit config (#18860)
|
2026-02-16 00:18:31 +08:00 |
|
Xiaoyu Zhang
|
c29394e3c8
|
[kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475)
|
2026-02-14 23:06:21 +08:00 |
|
jianan-gu
|
336dc4579e
|
[CPU] Optimize Qwen3-next model on CPU (#12525)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
|
2026-01-29 22:03:58 -08:00 |
|
Xiaoyu Zhang
|
fb74e43707
|
[Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278)
|
2026-01-28 14:18:53 +08:00 |
|
Michael
|
53609e5e5b
|
Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257)
|
2026-01-17 21:29:22 +08:00 |
|
Xiaoyu Zhang
|
2cdd4370bc
|
[Diffusion] Move diffusion time embedding to jit kernel (#16879)
|
2026-01-17 12:21:22 +08:00 |
|
sglang-bot
|
c86ca12875
|
chore: bump sgl-kernel version to 0.3.21 (#16888)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-01-14 13:27:49 +08:00 |
|
Johnny
|
b5493f65be
|
[NVIDIA] upstream FA4 (#15182)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-01-11 15:31:28 +08:00 |
|
sglang-bot
|
a39126672a
|
chore: bump sgl-kernel version to 0.3.20 (#15564)
|
2025-12-21 13:15:23 -08:00 |
|
66RING
|
46be74b4b4
|
[diffusion] kernel: timestep embedding kernel implementation (#12995)
Co-authored-by: 戚余航 <qiyuhang@bytedance.com>
Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>
|
2025-12-19 20:59:50 +08:00 |
|
sunxxuns
|
f2d64e6782
|
[amd] Add deterministic all-reduce kernel for AMD (ROCm) (#15340)
Co-authored-by: Thomas Wang <1am9trash@gmail.com>
|
2025-12-18 23:36:03 -08:00 |
|
Kevin_Xiong
|
4792d1f452
|
[sgl-kernel][1/2] Fused qk_norm_rope for GLM4.6 (#15141)
|
2025-12-18 17:07:04 +08:00 |
|
sglang-bot
|
4a62a0e3cd
|
chore: bump sgl-kernel version to 0.3.19 (#14632)
|
2025-12-08 19:02:24 +08:00 |
|
Yuhao Yang
|
f72a77038f
|
modify the sgl-kernel to be compatible with transformers 5.x. (#14625)
|
2025-12-08 00:39:00 -08:00 |
|
sglang-bot
|
e11f795f63
|
chore: bump sgl-kernel version to 0.3.18.post3 (#14427)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-12-05 14:09:02 -08:00 |
|
Qiaolin Yu
|
cb8df87fc1
|
[1/2] Add rope kernel in sgl-kernel (#14334)
|
2025-12-04 16:45:44 +08:00 |
|
Qi Yuhang
|
16ff892c18
|
[sgl-kernel][Feat][B200][1/N] Support MXFP8 Grouped GEMM in Blackwell (#13731)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-12-04 10:09:09 +08:00 |
|
Antonin Vidon
|
df1f31241b
|
[sgl-kernel] fix runtime error while preloading CUDA runtime (#13089)
|
2025-12-02 23:03:51 +08:00 |
|
sglang-bot
|
34035d8cd9
|
chore: bump sgl-kernel version to 0.3.18.post2 (#14229)
|
2025-12-02 00:50:40 +08:00 |
|
Xiaoyu Zhang
|
3de09aadbc
|
Add new moe wna16 marlin gemm (#14122)
|
2025-12-01 23:07:53 +08:00 |
|