sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 11:48:01 +00:00

Author	SHA1	Message	Date
Yuhao Yang	fe9b9b254b	Fix segfault in cudaMemcpyBatchAsync on CUDA 13.0 (#23136 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2026-04-20 12:20:22 -07:00
Baizhou Zhang	6ecd6f84db	[CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119 ) Co-authored-by: Kangyan Zhou <zky314343421@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alison Shao <a.shao@wustl.edu> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-19 05:32:36 -07:00
Lianmin Zheng	9c47bbad13	Clean up bench_one_batch warning and simplify norm dispatch (#23110 )	2026-04-17 17:42:20 -07:00
blzheng	0dcfae5553	[CPU] Add gemma4_rmsnorm_cpu kernel (#22842 ) Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-17 13:03:16 +08:00
Chunyuan WU	6c89214584	[CPU][sgl-kernel] `extend_attention_cpu` and `flash_attn_varlen_func`: fix `nan` for large seq (#22434 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-17 13:01:01 +08:00
Baizhou Zhang	113d654152	[Fix] Fix accuracy bug in Flashmla sparse MLA kernel (#22723 )	2026-04-15 13:40:04 -07:00
Lianmin Zheng	222eda1598	[Misc] Use cache_once for is_arch_support_pdl in sgl-kernel (#22725 )	2026-04-14 15:22:10 -07:00
Baizhou Zhang	d14d368191	[Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467 )	2026-04-11 01:59:57 -07:00
jianan-gu	2ab141547d	[CPU] Add apply_routed_scaling_factor_on_output support for biased_grouped_topk fusion (#22413 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-10 15:16:05 +08:00
Yibo Cai	4644d28213	[sgl-kernel/cpu] fix build error on non-x86 platform (#22245 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-10 09:58:07 +08:00
Brayden Zhong	6aafe756b9	Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (#22047 )	2026-04-03 13:12:30 -07:00
Baizhou Zhang	98ac40192b	[Workflow] Fix kernel release build failures for aarch64 and wheel renaming (#22018 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 03:23:03 -07:00
sglang-bot	2c4fb88929	chore: bump sgl-kernel version to 0.4.1 (#21447 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-02 22:31:59 -07:00
DarkSharpness	d1b7c3907d	[Parallel State Refactor 2/n] Unify code path of AMD deterministic all reduce (#20871 )	2026-04-03 12:33:17 +08:00
Mook	991f3aa5b3	[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652 )	2026-04-03 10:48:15 +08:00
Baizhou Zhang	c7d03a6215	Revert "Rollback flashmla to older version [1/2]" (#21922 )	2026-04-02 00:27:02 -07:00
R0CKSTAR	ca3286d2d5	[diffusion] hardware: support FA3 attention backend on MUSA (attn backend, 14/N) (#18648 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-01 10:49:34 -07:00
Brayden Zhong	6a9b09847c	CUTLASS NVFP4 GEMM improvement of SM120 (#21314 )	2026-04-01 09:04:34 +08:00
Xiaoyu Zhang	cdd7d6a227	Remove obsolete sgl-kernel legacy paths (#21528 )	2026-04-01 09:00:20 +08:00
Ma Mingfei	af62bd9486	[CPU] Implement MXFP4 Gemm kernels for intel AMX to support GPT OSS series. (#14385 )	2026-03-29 23:44:12 -07:00
blzheng	ed01e1d5d6	[CPU] add kernel apply_rotary_pos_emb_cpu for Qwen3-VL and Qwen3-Omni (#13121 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-29 23:43:46 -07:00
Ma Mingfei	6da8f5f69e	fix topk softmax performance issue (#14702 )	2026-03-29 23:43:16 -07:00
Johnsonms	8a56a7b04d	[jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103 )	2026-03-27 13:21:44 +08:00
Yanda Cheng	662635e7a7	fix(sgl-kernel): align wheel METADATA/WHEEL with +cu filename (#21437 )	2026-03-25 19:44:50 -07:00
Baizhou Zhang	dbe871efdd	Rollback flashmla to older version [1/2] (#21430 )	2026-03-25 17:49:54 -07:00
Minglei Zhu	a12fea21ed	perf(sgl-kernel): expose get_scheduler_metadata for FA3 decode optimization (#21103 )	2026-03-25 13:17:27 -07:00
Lianmin Zheng	27ac831a84	docs: improve CI and testing documentation (#21202 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 10:48:50 -07:00
Brayden Zhong	009eee85a0	CUTLASS FP8 Blockwise GEMM improvement of SM120 (#20887 )	2026-03-22 17:55:54 +08:00
Xiaoyu Zhang	766d225fcc	Add SGLang CUDA crash API logging inspired by FlashInfer (#20910 )	2026-03-22 16:39:40 +08:00
Baizhou Zhang	67cad3e69e	Revert "Support CuteDSL `mm_fp4` backend" (#21077 )	2026-03-20 22:47:47 -07:00
Lianmin Zheng	104b10f70a	refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009 )	2026-03-20 05:55:36 -07:00
Brayden Zhong	b42b9f6e1a	Support CuteDSL `mm_fp4` backend (#18801 )	2026-03-19 14:20:01 -07:00
Cao E	274581fb77	Add support for more batch sizes in cpu_graph_runner (#13881 )	2026-03-19 09:50:56 -07:00
Qi Yuhang	cb8105fe28	[sgl-kernel][6/7]Support Expert Specialization Grouped GEMM (#15471 )	2026-03-19 15:39:52 +08:00
blzheng	cd22aa27a9	[CPU] Add FP8 Bmm support (#9744 ) Co-authored-by: Fan Yin <1106310035@qq.com>	2026-03-18 22:19:48 -07:00
blzheng	c2b01bd2fc	[CPU] fix bug in AVX512 implementation of flash_attn_softmax (#20220 ) Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>	2026-03-18 22:18:47 -07:00
Ma Mingfei	687d9eb66f	[CPU] Optimize image preprocessor performance for Qwen2VLImageProcessorFast (#15168 )	2026-03-18 22:18:15 -07:00
Ma Mingfei	62d7454976	optimize conv3d used in patch embedding (#16040 )	2026-03-18 22:17:53 -07:00
blzheng	cbea9f6909	[CPU] improve numa memory binding (#19666 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-18 22:15:50 -07:00
Zaili Wang	2f4babe32b	[CPU] support LayerNorm with 3D shape (#15075 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-18 22:15:24 -07:00
blzheng	dc6aa26ce9	[CPU] Add mrope kernel for Qwen3-vl (#12531 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-03-18 22:12:48 -07:00
Xiaoyu Zhang	15097c5c3b	Release sglang kernel 0.4.0 (#20440 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-16 20:34:58 +08:00
Xiaoyu Zhang	25e38216b6	[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277 )	2026-03-14 16:45:54 +08:00
Johnsonms	7cf0551014	Migrate norm kernels to FlashInfer JIT implementation (#18871 )	2026-03-10 14:56:07 +08:00
Baizhou Zhang	a6ae89fe3c	Revert "chore: bump sgl-kernel version to 0.3.21.post1" (#20229 )	2026-03-09 20:32:19 -07:00
sglang-bot	0f0c8b2f18	chore: bump sgl-kernel version to 0.3.21.post1 (#20087 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-03-08 03:03:58 -07:00
Fan Yin	43d6a32045	[sgl-kernel] rebase FlashMLA 0217 (#18902 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-07 00:30:52 -08:00
Mohammad Miadh Angkad	f88acf8780	[JIT Kernel] Reland NVFP4 kernels to JIT (#20012 )	2026-03-07 10:31:08 +08:00
Johnsonms	2d266c73ea	Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854 )	2026-03-06 22:53:28 +08:00
Baizhou Zhang	51e5dc845a	Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005 )	2026-03-05 19:40:00 -08:00

782 Commits