sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 11:48:01 +00:00

Author	SHA1	Message	Date
Joey	15e6572f21	[MUSA][18/N] Add MUSA-optimized kernel implementations for hot ops (#23255 ) Signed-off-by: Joey-gvwal <joey_gvwal@yeah.net> Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>	2026-05-07 20:38:33 -07:00
sglang-bot	6764155914	chore: bump sgl-kernel version to 0.4.2.post1 (#24457 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-05-05 16:19:58 -07:00
Baizhou Zhang	200944b415	Update kernel installation instructions after shifting default cuda to 13 (#24181 )	2026-05-02 16:07:20 -07:00
sglang-bot	2e027b1afe	chore: bump sgl-kernel version to 0.4.2 (#24170 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-30 15:02:07 -07:00
sglang-bot	714173555c	chore: bump sgl-kernel version to 0.4.1.post1 (#23720 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> Co-authored-by: Kangyan Zhou <kangyan.zhou@radixark.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 17:13:02 -07:00
Jia Guo	587fd15bd2	perf: eliminate attention DtoD copy by passing pre-allocated output to FA (#21985 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-24 12:05:16 -07:00
Lianmin Zheng	9c47bbad13	Clean up bench_one_batch warning and simplify norm dispatch (#23110 )	2026-04-17 17:42:20 -07:00
Lianmin Zheng	222eda1598	[Misc] Use cache_once for is_arch_support_pdl in sgl-kernel (#22725 )	2026-04-14 15:22:10 -07:00
Baizhou Zhang	d14d368191	[Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467 )	2026-04-11 01:59:57 -07:00
sglang-bot	2c4fb88929	chore: bump sgl-kernel version to 0.4.1 (#21447 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-02 22:31:59 -07:00
Baizhou Zhang	c7d03a6215	Revert "Rollback flashmla to older version [1/2]" (#21922 )	2026-04-02 00:27:02 -07:00
Xiaoyu Zhang	cdd7d6a227	Remove obsolete sgl-kernel legacy paths (#21528 )	2026-04-01 09:00:20 +08:00
Baizhou Zhang	dbe871efdd	Rollback flashmla to older version [1/2] (#21430 )	2026-03-25 17:49:54 -07:00
Minglei Zhu	a12fea21ed	perf(sgl-kernel): expose get_scheduler_metadata for FA3 decode optimization (#21103 )	2026-03-25 13:17:27 -07:00
Xiaoyu Zhang	766d225fcc	Add SGLang CUDA crash API logging inspired by FlashInfer (#20910 )	2026-03-22 16:39:40 +08:00
Xiaoyu Zhang	15097c5c3b	Release sglang kernel 0.4.0 (#20440 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-16 20:34:58 +08:00
Xiaoyu Zhang	25e38216b6	[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277 )	2026-03-14 16:45:54 +08:00
Johnsonms	7cf0551014	Migrate norm kernels to FlashInfer JIT implementation (#18871 )	2026-03-10 14:56:07 +08:00
Baizhou Zhang	a6ae89fe3c	Revert "chore: bump sgl-kernel version to 0.3.21.post1" (#20229 )	2026-03-09 20:32:19 -07:00
sglang-bot	0f0c8b2f18	chore: bump sgl-kernel version to 0.3.21.post1 (#20087 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-03-08 03:03:58 -07:00
Fan Yin	43d6a32045	[sgl-kernel] rebase FlashMLA 0217 (#18902 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-07 00:30:52 -08:00
Mohammad Miadh Angkad	f88acf8780	[JIT Kernel] Reland NVFP4 kernels to JIT (#20012 )	2026-03-07 10:31:08 +08:00
Johnsonms	2d266c73ea	Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854 )	2026-03-06 22:53:28 +08:00
Baizhou Zhang	51e5dc845a	Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005 )	2026-03-05 19:40:00 -08:00
Rain Jiang	472eef4071	fa4 cleanup (#19727 )	2026-03-05 17:54:25 +08:00
Mohammad Miadh Angkad	2bdd89a6cd	[Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437 )	2026-03-05 15:22:28 +08:00
Xiaoyu Zhang	054bd71086	[sgl-kernel slimming] remove sgl-kernel moe-wna16-marlin (#19379 )	2026-02-27 09:17:46 +08:00
pansicheng	2ad475b4ed	use flashinfer.sampling (#18696 )	2026-02-26 10:02:38 +08:00
Xiaoyu Zhang	9dff933164	[Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241 )	2026-02-25 10:08:22 +08:00
blake-snc	5fc328465a	fix(sgl-kernel): support CUDA 13 runtime preloading for DGX Spark (#18747 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 00:43:04 +08:00
SoluMilken	07a24f1a38	update pre-commit config (#18860 )	2026-02-16 00:18:31 +08:00
Xiaoyu Zhang	c29394e3c8	[kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475 )	2026-02-14 23:06:21 +08:00
jianan-gu	336dc4579e	[CPU] Optimize Qwen3-next model on CPU (#12525 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: Fan Yin <1106310035@qq.com>	2026-01-29 22:03:58 -08:00
Xiaoyu Zhang	fb74e43707	[Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278 )	2026-01-28 14:18:53 +08:00
Michael	53609e5e5b	Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257 )	2026-01-17 21:29:22 +08:00
Xiaoyu Zhang	2cdd4370bc	[Diffusion] Move diffusion time embedding to jit kernel (#16879 )	2026-01-17 12:21:22 +08:00
sglang-bot	c86ca12875	chore: bump sgl-kernel version to 0.3.21 (#16888 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-01-14 13:27:49 +08:00
Johnny	b5493f65be	[NVIDIA] upstream FA4 (#15182 ) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-01-11 15:31:28 +08:00
sglang-bot	a39126672a	chore: bump sgl-kernel version to 0.3.20 (#15564 )	2025-12-21 13:15:23 -08:00
66RING	46be74b4b4	[diffusion] kernel: timestep embedding kernel implementation (#12995 ) Co-authored-by: 戚余航 <qiyuhang@bytedance.com> Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>	2025-12-19 20:59:50 +08:00
sunxxuns	f2d64e6782	[amd] Add deterministic all-reduce kernel for AMD (ROCm) (#15340 ) Co-authored-by: Thomas Wang <1am9trash@gmail.com>	2025-12-18 23:36:03 -08:00
Kevin_Xiong	4792d1f452	[sgl-kernel][1/2] Fused qk_norm_rope for GLM4.6 (#15141 )	2025-12-18 17:07:04 +08:00
sglang-bot	4a62a0e3cd	chore: bump sgl-kernel version to 0.3.19 (#14632 )	2025-12-08 19:02:24 +08:00
Yuhao Yang	f72a77038f	modify the sgl-kernel to be compatible with transformers 5.x. (#14625 )	2025-12-08 00:39:00 -08:00
sglang-bot	e11f795f63	chore: bump sgl-kernel version to 0.3.18.post3 (#14427 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-12-05 14:09:02 -08:00
Qiaolin Yu	cb8df87fc1	[1/2] Add rope kernel in sgl-kernel (#14334 )	2025-12-04 16:45:44 +08:00
Qi Yuhang	16ff892c18	[sgl-kernel][Feat][B200][1/N] Support MXFP8 Grouped GEMM in Blackwell (#13731 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-12-04 10:09:09 +08:00
Antonin Vidon	df1f31241b	[sgl-kernel] fix runtime error while preloading CUDA runtime (#13089 )	2025-12-02 23:03:51 +08:00
sglang-bot	34035d8cd9	chore: bump sgl-kernel version to 0.3.18.post2 (#14229 )	2025-12-02 00:50:40 +08:00
Xiaoyu Zhang	3de09aadbc	Add new moe wna16 marlin gemm (#14122 )	2025-12-01 23:07:53 +08:00

280 Commits