sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-02 21:37:11 +00:00

Author	SHA1	Message	Date
DarkSharpness	d1b7c3907d	[Parallel State Refactor 2/n] Unify code path of AMD deterministic all reduce (#20871)	2026-04-03 12:33:17 +08:00
Brayden Zhong	6a9b09847c	CUTLASS NVFP4 GEMM improvement of SM120 (#21314)	2026-04-01 09:04:34 +08:00
Baizhou Zhang	67cad3e69e	Revert "Support CuteDSL `mm_fp4` backend" (#21077)	2026-03-20 22:47:47 -07:00
Lianmin Zheng	104b10f70a	refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009)	2026-03-20 05:55:36 -07:00
Brayden Zhong	b42b9f6e1a	Support CuteDSL `mm_fp4` backend (#18801)	2026-03-19 14:20:01 -07:00
Qi Yuhang	cb8105fe28	[sgl-kernel][6/7]Support Expert Specialization Grouped GEMM (#15471)	2026-03-19 15:39:52 +08:00
Xiaoyu Zhang	25e38216b6	[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277)	2026-03-14 16:45:54 +08:00
pansicheng	2ad475b4ed	use flashinfer.sampling (#18696)	2026-02-26 10:02:38 +08:00
SoluMilken	07a24f1a38	update pre-commit config (#18860)	2026-02-16 00:18:31 +08:00
Xiaoyu Zhang	de2f2880b5	[JIT sgl-kernel] Jit support per tensor quant (#15709)	2025-12-25 16:24:37 +08:00
sunxxuns	f2d64e6782	[amd] Add deterministic all-reduce kernel for AMD (ROCm) (#15340) Co-authored-by: Thomas Wang <1am9trash@gmail.com>	2025-12-18 23:36:03 -08:00
b8zhong	4b8901ac0f	Update FP4 GEMM Benchmark (#14449)	2025-12-16 23:04:56 -08:00
Xiaoyu Zhang	c5947ecd85	Opt moe align block size kernel (#14133)	2025-12-02 19:13:55 +08:00
Xiaoyu Zhang	ecefc7904f	[sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819)	2025-11-24 18:26:25 +08:00
Roger Young	e72cf13693	Support moe topk sigmoid kernel (#13049) Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-11-20 00:24:37 +08:00
Xiaoyu Zhang	1d3d42bda0	[opt kimi k2 1 / n] Add kimi k2 moe fused gate (#13287)	2025-11-15 17:14:19 +08:00
Yuan Luo	271d3d0d50	Support mrope triton kernel and add unit test (#11722) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2025-10-20 11:51:07 +08:00
Qi Yuhang	6c01844f45	[sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674)	2025-10-15 13:39:31 -07:00
Qi Yuhang	9a30914e94	[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-12 20:19:21 -07:00
fzyzcjy	21337b22b9	Reland [1/2] Optimizations and refactors about quant kernel (#10312) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-11 15:59:03 +08:00
Lianmin Zheng	9b8ebb2798	move more files under srt/utils (#11285)	2025-10-09 16:46:15 -07:00
Yuan Luo	4f42c8cd3e	[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-07 14:31:11 +00:00
Xiaoyu Zhang	11965b0daf	Fix sgl-kernel benchmark dead code (#11022)	2025-09-29 15:06:40 +08:00
Xiaoyu Zhang	c4e314f986	Restruct sgl-kernel benchmark (#10861)	2025-09-25 07:45:25 +08:00
Yineng Zhang	6d55f60e77	Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292)	2025-09-10 18:24:23 -07:00
hlu1	5f1eb20484	[chore] Remove unused ep_moe cuda kernels (#9956)	2025-09-06 01:35:50 -07:00
fzyzcjy	339f8eef09	[1/2] Optimizations and refactors about quant kernel (#9534)	2025-09-05 18:45:08 +08:00
fzyzcjy	e85cb1ce9d	Fix quant kernel test errors and benchmark wrong output speeds (#7604)	2025-08-21 03:48:41 -07:00
Yuan Luo	53dcc750b6	[sgl-kernel] Support FlashInfer top_k_top_p_sampling_from_logits (#9060) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-08-14 10:56:36 -07:00
henryg	841810f227	[Perf] Tunings for SM100 FP8 CUTLASS kernel (#8818)	2025-08-13 21:59:22 -07:00
fzyzcjy	9aea255522	Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077)	2025-08-12 01:46:40 -07:00
Yuan Luo	1bd5316873	fix benchmark fp8 blockwise group gemm (#8815)	2025-08-06 21:02:21 +08:00
Stefan He	db7343c992	fix per token cuda kernel hidden dim cannot divide by 16 (#8543)	2025-08-01 09:27:18 -07:00
Peter Pan	6bdd27861b	[Kimi K2] dsv3_router_gemm supports NUM_EXPERTS == 384 (#8013)	2025-08-01 22:01:24 +08:00
Cheng Wan	a5f5ab4030	update sgl-kernel for EP: kernel part (#8514) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Ke Bao <ispobaoke@gmail.com>	2025-07-30 22:19:55 -07:00
Elfie Guo	5c9c275bc8	Use FlashInfer FP4 gemm. (#8241)	2025-07-27 01:05:22 -07:00
fzyzcjy	e34cf6ad75	Fix bench script making input data on L2 cache (#7739)	2025-07-27 00:30:24 -07:00
Qi Yuhang	426b74936a	Add nvfp4 scaled mm benchmark. (#8401)	2025-07-26 23:18:04 -07:00
Hubert Lu	af4b9bae95	[AMD] Add silu_and_mul, gelu_and_mul, gelu_tanh_and_mul, and gelu_quick kernels for AMD GPUs (#7135) Co-authored-by: yiakwy-xpu-ml-framework-team <961186938@qq.com> Co-authored-by: HAI <hixiao@gmail.com>	2025-07-24 23:44:28 -07:00
Peter Pan	0f8b538614	[fix] benchmark : routed_scaling_factor is None (#8059) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-07-22 08:55:35 -07:00
Baizhou Zhang	282eb59ff3	Add bf16 output option for dsv3_router_gemm kernel (#7999)	2025-07-20 09:49:37 +08:00
Yi Zhang	2998c4bdf4	[optimize] fuse renormalize into moe_topk_softmax (#7744) Co-authored-by: ispobock <ispobaoke@gmail.com>	2025-07-03 12:42:44 -07:00
ayrnb	2c4feaf308	Add CUTLASS FP8 Blockscale MoE kernel for Hopper architecture (#7278) Co-authored-by: HydraQYH <QYH820@Outlook.com> Co-authored-by: TianQiLin666666 <1834987979@qq.com>	2025-07-02 23:27:03 -07:00
Baizhou Zhang	7248272ccc	Add dsv3 router gemm kernel (#7627)	2025-06-29 23:31:55 -07:00
Ke Bao	04b35190e2	Add dsv3 fused a gemm to sgl-kernel (#7630)	2025-06-29 02:52:24 -07:00
Ke Bao	57ab776910	Fuse sorted_token_ids padding to moe_align_block_size kernel (#7437)	2025-06-24 17:44:27 -07:00
xutizhou	506c4928f5	feat: integrate deepgemm into EPMoE (#6821) Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com> Co-authored-by: TianQiLin666666 <1834987979@qq.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-06-23 01:38:58 -07:00
JieXin Liang	ab1a4fa5cb	[fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla (#7184)	2025-06-14 12:45:41 -07:00
fzyzcjy	aa46ed34d2	Remove 200us slow concat kernel (part 1: kernel) (#7145)	2025-06-13 01:58:29 -07:00
Yuan Luo	84727a5139	[sgl-kernel] Add cuda kernel for moe_ep_silu_and_mul (#6919) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-06-11 20:43:08 -07:00

87 Commits