sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 22:07:12 +00:00

Author	SHA1	Message	Date
Qiaolin Yu	90d5e27f79	Enable fa3 PDL by compiling it with corresponding flags (#18756)	2026-02-18 17:12:05 +08:00
blake-snc	0d30896015	fix(sgl-kernel): use >= 120 for SM12x CUDA kernel dispatch (#18750) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 00:44:47 +08:00
blake-snc	5fc328465a	fix(sgl-kernel): support CUDA 13 runtime preloading for DGX Spark (#18747) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 00:43:04 +08:00
SoluMilken	07a24f1a38	update pre-commit config (#18860)	2026-02-16 00:18:31 +08:00
Xiaoyu Zhang	c29394e3c8	[kernel slimming] Move fast_hadamard_transform to jit_kernel (#18475)	2026-02-14 23:06:21 +08:00
Xiaoyu Zhang	9e9e949261	speed up sgl-kernel build (#18586)	2026-02-12 23:43:22 +08:00
Baizhou Zhang	2d38b8aca0	Revert "[sgl-kernel] upgrade deepgemm" (#18562)	2026-02-11 01:17:40 +08:00
Xiaoyu Zhang	bec7fe9e65	[sgl-kernel] upgrade deepgemm (#18362)	2026-02-10 21:31:30 +08:00
Lianmin Zheng	75997ebe8d	Update author information in pyproject.toml (#18453)	2026-02-08 12:22:55 -08:00
Baizhou Zhang	9fbec79906	Revert "[Build] Enable full kernel in aarch64 wheel" (#18385)	2026-02-07 09:19:07 +08:00
zhangxin81	e3021b65fe	support smem in per_token_quant_fp8 kernel (#16725) Co-authored-by: zhangxin81 <969206500@qq.com>	2026-02-02 17:18:50 +08:00
Yuan Luo	afebb7ab78	Optimize custom-all-reduce (#17674) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-01 18:59:31 +08:00
Zaili Wang	97593c9f41	[CPU] toml file update (#17861)	2026-01-31 13:16:06 -08:00
R0CKSTAR	46095f0551	[MUSA] Update 3rd party dir to build/_deps (#18035) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-01-31 12:02:39 -08:00
Yifan Cui	45fe51a28e	Reduce topk kernel shared memory from 128KB to 32KB for better occupancy (#17747) Co-authored-by: Claude <noreply@anthropic.com>	2026-01-30 21:42:21 -08:00
jianan-gu	c35aa0238c	[CPU][INT4] Add INT4 kernels for CPU (#8226) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 22:30:13 -08:00
Ma Mingfei	88f7759402	[CPU] optimize flash_attn_varlen_func (#15708)	2026-01-29 22:07:05 -08:00
jianan-gu	336dc4579e	[CPU] Optimize Qwen3-next model on CPU (#12525) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: Fan Yin <1106310035@qq.com>	2026-01-29 22:03:58 -08:00
Xiaoyu Zhang	fb74e43707	[Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278)	2026-01-28 14:18:53 +08:00
Xiaoyu Zhang	67fb492c9a	[CI] Fix test_moe_fused_gate error (#17844)	2026-01-28 12:03:17 +08:00
Yi Zhong	8acd4d7d7e	Make flashMLA work on: Cu13, B300 (#17600) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2026-01-28 00:12:47 +08:00
R0CKSTAR	628ab5d57b	[MUSA][2/N] sgl-kernel build (#17053) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-01-23 14:41:47 -08:00
Bingxu Chen	50a2e4345a	[AMD CI] Add 2-GPU sgl-kernel Tests (#17555) Co-authored-by: YC Tseng <yctseng@amd.com>	2026-01-22 21:48:52 -08:00
Zaili Wang	672eb37534	[CPU][Fix CI] Solidate torch version for sgl-kernel-cpu and fix device orientation error (#17460)	2026-01-22 14:04:50 +08:00
Serge Panev	e95668abc7	[NVIDIA] Fix CUDA arch requirement in nvfp4 cast (#12581) Signed-off-by: Serge Panev <spanev@nvidia.com> Co-authored-by: Fan Yin <1106310035@qq.com>	2026-01-21 20:21:11 -08:00
Binyao Jiang	38c233fd04	[Piecewise] Support PCG weak_ref_tensor cuda kernel on AMD (#17291)	2026-01-20 14:05:32 -08:00
Michael	53609e5e5b	Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257)	2026-01-17 21:29:22 +08:00
Xiaoyu Zhang	2cdd4370bc	[Diffusion] Move diffusion time embedding to jit kernel (#16879)	2026-01-17 12:21:22 +08:00
sglang-bot	c86ca12875	chore: bump sgl-kernel version to 0.3.21 (#16888) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-01-14 13:27:49 +08:00
Lianmin Zheng	a4825ed588	Fix kernel type annotations for fp8 quant and logging (#16994)	2026-01-13 18:14:32 -08:00
Xiaoyu Zhang	2ab3ed3e9e	Fix sgl-kernel per_token_quant fp8 kernel scale shared_memory bug (#16886)	2026-01-13 23:22:05 +08:00
Hubert Lu	8716589826	[AMD][Diffusion] support timestep embedding kernel for AMD GPUs (#16766)	2026-01-12 22:17:07 -08:00
Baizhou Zhang	f9fc50acd6	[Tiny] Rename test_sparse_flash_attn.py to fix CI (#16895)	2026-01-11 18:18:29 +08:00
Johnny	b5493f65be	[NVIDIA] upstream FA4 (#15182) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-01-11 15:31:28 +08:00
MarcoDWei	1c09cbe3ed	[Build] Enable full kernel in aarch64 wheel (#16155)	2026-01-07 19:40:03 -08:00
hlu1	12a0292bfd	Revert "[sgl-kernel] Update flashmla to include fp8 sparse_mla optimizations" (#16678)	2026-01-08 10:23:06 +08:00
Yingchun Lai	828cd8936f	Introduce sgl-kernel Dockerfile (#14066)	2026-01-04 11:19:08 -08:00
Yineng Zhang	5595ae142c	docs: fix markdown preview (#16236)	2025-12-31 12:43:57 -08:00
shuwenn	c0fc7a89e7	[sgl-kernel] fix: make sgl-kernel build respect MAX_JOBS (#15575)	2025-12-31 10:44:45 +08:00
Xiaoyu Zhang	de2f2880b5	[JIT sgl-kernel] Jit support per tensor quant (#15709)	2025-12-25 16:24:37 +08:00
sglang-bot	a39126672a	chore: bump sgl-kernel version to 0.3.20 (#15564)	2025-12-21 13:15:23 -08:00
Xiaoyu Zhang	7fa4906f4f	[sgl-kernel] Streamline kernel size report (Top 20 only) and clean up (#15552)	2025-12-21 10:00:47 +08:00
Hubert Lu	51e2eaa458	[AMD] Support fast_topk kernels in sgl-kernel (#15172)	2025-12-19 22:19:09 -08:00
66RING	46be74b4b4	[diffusion] kernel: timestep embedding kernel implementation (#12995) Co-authored-by: 戚余航 <qiyuhang@bytedance.com> Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>	2025-12-19 20:59:50 +08:00
Fan Yin	65c098592d	[sgl-kernel] chore: update deepgemm version (#13402)	2025-12-19 00:20:24 -08:00
sunxxuns	f2d64e6782	[amd] Add deterministic all-reduce kernel for AMD (ROCm) (#15340) Co-authored-by: Thomas Wang <1am9trash@gmail.com>	2025-12-18 23:36:03 -08:00
Bruce-x-1997	793c96c3d2	[perf]optimize w4afp8 kernel on deepseek-v3-0324 (#12921) Signed-off-by: bruce.xu <bruce.x@gmicloud.ai>	2025-12-18 18:13:22 +08:00
Kevin_Xiong	4792d1f452	[sgl-kernel][1/2] Fused qk_norm_rope for GLM4.6 (#15141)	2025-12-18 17:07:04 +08:00
Xiaoyu Zhang	56d12b4aea	Fix warp illegal instruction in kimi k2 thinking PCG (#15306)	2025-12-18 16:58:23 +08:00
MarcoDWei	ef7c29acd7	Fix issue: ENABLE_BELOW_SM90 cannot be enabled on aarch64 CPU (#12967)	2025-12-18 13:26:42 +08:00

722 Commits