782 Commits

Author SHA1 Message Date
Yuhao Yang
fe9b9b254b Fix segfault in cudaMemcpyBatchAsync on CUDA 13.0 (#23136)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
2026-04-20 12:20:22 -07:00
Baizhou Zhang
6ecd6f84db [CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 05:32:36 -07:00
Lianmin Zheng
9c47bbad13 Clean up bench_one_batch warning and simplify norm dispatch (#23110) 2026-04-17 17:42:20 -07:00
blzheng
0dcfae5553 [CPU] Add gemma4_rmsnorm_cpu kernel (#22842)
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-17 13:03:16 +08:00
Chunyuan WU
6c89214584 [CPU][sgl-kernel] extend_attention_cpu and flash_attn_varlen_func: fix nan for large seq (#22434)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-17 13:01:01 +08:00
Baizhou Zhang
113d654152 [Fix] Fix accuracy bug in Flashmla sparse MLA kernel (#22723) 2026-04-15 13:40:04 -07:00
Lianmin Zheng
222eda1598 [Misc] Use cache_once for is_arch_support_pdl in sgl-kernel (#22725) 2026-04-14 15:22:10 -07:00
Baizhou Zhang
d14d368191 [Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467) 2026-04-11 01:59:57 -07:00
jianan-gu
2ab141547d [CPU] Add apply_routed_scaling_factor_on_output support for biased_grouped_topk fusion (#22413)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-10 15:16:05 +08:00
Yibo Cai
4644d28213 [sgl-kernel/cpu] fix build error on non-x86 platform (#22245)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-10 09:58:07 +08:00
Brayden Zhong
6aafe756b9 Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (#22047) 2026-04-03 13:12:30 -07:00
Baizhou Zhang
98ac40192b [Workflow] Fix kernel release build failures for aarch64 and wheel renaming (#22018)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 03:23:03 -07:00
sglang-bot
2c4fb88929 chore: bump sgl-kernel version to 0.4.1 (#21447)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-02 22:31:59 -07:00
DarkSharpness
d1b7c3907d [Parallel State Refactor 2/n] Unify code path of AMD deterministic all reduce (#20871) 2026-04-03 12:33:17 +08:00
Mook
991f3aa5b3 [Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652) 2026-04-03 10:48:15 +08:00
Baizhou Zhang
c7d03a6215 Revert "Rollback flashmla to older version [1/2]" (#21922) 2026-04-02 00:27:02 -07:00
R0CKSTAR
ca3286d2d5 [diffusion] hardware: support FA3 attention backend on MUSA (attn backend, 14/N) (#18648)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-01 10:49:34 -07:00
Brayden Zhong
6a9b09847c CUTLASS NVFP4 GEMM improvement of SM120 (#21314) 2026-04-01 09:04:34 +08:00
Xiaoyu Zhang
cdd7d6a227 Remove obsolete sgl-kernel legacy paths (#21528) 2026-04-01 09:00:20 +08:00
Ma Mingfei
af62bd9486 [CPU] Implement MXFP4 Gemm kernels for intel AMX to support GPT OSS series. (#14385) 2026-03-29 23:44:12 -07:00
blzheng
ed01e1d5d6 [CPU] add kernel apply_rotary_pos_emb_cpu for Qwen3-VL and Qwen3-Omni (#13121)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-03-29 23:43:46 -07:00
Ma Mingfei
6da8f5f69e fix topk softmax performance issue (#14702) 2026-03-29 23:43:16 -07:00
Johnsonms
8a56a7b04d [jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103) 2026-03-27 13:21:44 +08:00
Yanda Cheng
662635e7a7 fix(sgl-kernel): align wheel METADATA/WHEEL with +cu filename (#21437) 2026-03-25 19:44:50 -07:00
Baizhou Zhang
dbe871efdd Rollback flashmla to older version [1/2] (#21430) 2026-03-25 17:49:54 -07:00
Minglei Zhu
a12fea21ed perf(sgl-kernel): expose get_scheduler_metadata for FA3 decode optimization (#21103) 2026-03-25 13:17:27 -07:00
Lianmin Zheng
27ac831a84 docs: improve CI and testing documentation (#21202)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:48:50 -07:00
Brayden Zhong
009eee85a0 CUTLASS FP8 Blockwise GEMM improvement of SM120 (#20887) 2026-03-22 17:55:54 +08:00
Xiaoyu Zhang
766d225fcc Add SGLang CUDA crash API logging inspired by FlashInfer (#20910) 2026-03-22 16:39:40 +08:00
Baizhou Zhang
67cad3e69e Revert "Support CuteDSL mm_fp4 backend" (#21077) 2026-03-20 22:47:47 -07:00
Lianmin Zheng
104b10f70a refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009) 2026-03-20 05:55:36 -07:00
Brayden Zhong
b42b9f6e1a Support CuteDSL mm_fp4 backend (#18801) 2026-03-19 14:20:01 -07:00
Cao E
274581fb77 Add support for more batch sizes in cpu_graph_runner (#13881) 2026-03-19 09:50:56 -07:00
Qi Yuhang
cb8105fe28 [sgl-kernel][6/7]Support Expert Specialization Grouped GEMM (#15471) 2026-03-19 15:39:52 +08:00
blzheng
cd22aa27a9 [CPU] Add FP8 Bmm support (#9744)
Co-authored-by: Fan Yin <1106310035@qq.com>
2026-03-18 22:19:48 -07:00
blzheng
c2b01bd2fc [CPU] fix bug in AVX512 implementation of flash_attn_softmax (#20220)
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
2026-03-18 22:18:47 -07:00
Ma Mingfei
687d9eb66f [CPU] Optimize image preprocessor performance for Qwen2VLImageProcessorFast (#15168) 2026-03-18 22:18:15 -07:00
Ma Mingfei
62d7454976 optimize conv3d used in patch embedding (#16040) 2026-03-18 22:17:53 -07:00
blzheng
cbea9f6909 [CPU] improve numa memory binding (#19666)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-18 22:15:50 -07:00
Zaili Wang
2f4babe32b [CPU] support LayerNorm with 3D shape (#15075)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-03-18 22:15:24 -07:00
blzheng
dc6aa26ce9 [CPU] Add mrope kernel for Qwen3-vl (#12531)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-03-18 22:12:48 -07:00
Xiaoyu Zhang
15097c5c3b Release sglang kernel 0.4.0 (#20440)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-16 20:34:58 +08:00
Xiaoyu Zhang
25e38216b6 [kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277) 2026-03-14 16:45:54 +08:00
Johnsonms
7cf0551014 Migrate norm kernels to FlashInfer JIT implementation (#18871) 2026-03-10 14:56:07 +08:00
Baizhou Zhang
a6ae89fe3c Revert "chore: bump sgl-kernel version to 0.3.21.post1" (#20229) 2026-03-09 20:32:19 -07:00
sglang-bot
0f0c8b2f18 chore: bump sgl-kernel version to 0.3.21.post1 (#20087)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-03-08 03:03:58 -07:00
Fan Yin
43d6a32045 [sgl-kernel] rebase FlashMLA 0217 (#18902)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-03-07 00:30:52 -08:00
Mohammad Miadh Angkad
f88acf8780 [JIT Kernel] Reland NVFP4 kernels to JIT (#20012) 2026-03-07 10:31:08 +08:00
Johnsonms
2d266c73ea Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854) 2026-03-06 22:53:28 +08:00
Baizhou Zhang
51e5dc845a Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005) 2026-03-05 19:40:00 -08:00