Yuhao Yang
|
fe9b9b254b
|
Fix segfault in cudaMemcpyBatchAsync on CUDA 13.0 (#23136)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2026-04-20 12:20:22 -07:00 |
|
Baizhou Zhang
|
6ecd6f84db
|
[CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 05:32:36 -07:00 |
|
Lianmin Zheng
|
9c47bbad13
|
Clean up bench_one_batch warning and simplify norm dispatch (#23110)
|
2026-04-17 17:42:20 -07:00 |
|
blzheng
|
0dcfae5553
|
[CPU] Add gemma4_rmsnorm_cpu kernel (#22842)
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-17 13:03:16 +08:00 |
|
Chunyuan WU
|
6c89214584
|
[CPU][sgl-kernel] extend_attention_cpu and flash_attn_varlen_func: fix nan for large seq (#22434)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-17 13:01:01 +08:00 |
|
Baizhou Zhang
|
113d654152
|
[Fix] Fix accuracy bug in Flashmla sparse MLA kernel (#22723)
|
2026-04-15 13:40:04 -07:00 |
|
Lianmin Zheng
|
222eda1598
|
[Misc] Use cache_once for is_arch_support_pdl in sgl-kernel (#22725)
|
2026-04-14 15:22:10 -07:00 |
|
Baizhou Zhang
|
d14d368191
|
[Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467)
|
2026-04-11 01:59:57 -07:00 |
|
jianan-gu
|
2ab141547d
|
[CPU] Add apply_routed_scaling_factor_on_output support for biased_grouped_topk fusion (#22413)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-10 15:16:05 +08:00 |
|
Yibo Cai
|
4644d28213
|
[sgl-kernel/cpu] fix build error on non-x86 platform (#22245)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-10 09:58:07 +08:00 |
|
Brayden Zhong
|
6aafe756b9
|
Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (#22047)
|
2026-04-03 13:12:30 -07:00 |
|
Baizhou Zhang
|
98ac40192b
|
[Workflow] Fix kernel release build failures for aarch64 and wheel renaming (#22018)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-03 03:23:03 -07:00 |
|
sglang-bot
|
2c4fb88929
|
chore: bump sgl-kernel version to 0.4.1 (#21447)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-04-02 22:31:59 -07:00 |
|
DarkSharpness
|
d1b7c3907d
|
[Parallel State Refactor 2/n] Unify code path of AMD deterministic all reduce (#20871)
|
2026-04-03 12:33:17 +08:00 |
|
Mook
|
991f3aa5b3
|
[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652)
|
2026-04-03 10:48:15 +08:00 |
|
Baizhou Zhang
|
c7d03a6215
|
Revert "Rollback flashmla to older version [1/2]" (#21922)
|
2026-04-02 00:27:02 -07:00 |
|
R0CKSTAR
|
ca3286d2d5
|
[diffusion] hardware: support FA3 attention backend on MUSA (attn backend, 14/N) (#18648)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-01 10:49:34 -07:00 |
|
Brayden Zhong
|
6a9b09847c
|
CUTLASS NVFP4 GEMM improvement of SM120 (#21314)
|
2026-04-01 09:04:34 +08:00 |
|
Xiaoyu Zhang
|
cdd7d6a227
|
Remove obsolete sgl-kernel legacy paths (#21528)
|
2026-04-01 09:00:20 +08:00 |
|
Ma Mingfei
|
af62bd9486
|
[CPU] Implement MXFP4 Gemm kernels for intel AMX to support GPT OSS series. (#14385)
|
2026-03-29 23:44:12 -07:00 |
|
blzheng
|
ed01e1d5d6
|
[CPU] add kernel apply_rotary_pos_emb_cpu for Qwen3-VL and Qwen3-Omni (#13121)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-03-29 23:43:46 -07:00 |
|
Ma Mingfei
|
6da8f5f69e
|
fix topk softmax performance issue (#14702)
|
2026-03-29 23:43:16 -07:00 |
|
Johnsonms
|
8a56a7b04d
|
[jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103)
|
2026-03-27 13:21:44 +08:00 |
|
Yanda Cheng
|
662635e7a7
|
fix(sgl-kernel): align wheel METADATA/WHEEL with +cu filename (#21437)
|
2026-03-25 19:44:50 -07:00 |
|
Baizhou Zhang
|
dbe871efdd
|
Rollback flashmla to older version [1/2] (#21430)
|
2026-03-25 17:49:54 -07:00 |
|
Minglei Zhu
|
a12fea21ed
|
perf(sgl-kernel): expose get_scheduler_metadata for FA3 decode optimization (#21103)
|
2026-03-25 13:17:27 -07:00 |
|
Lianmin Zheng
|
27ac831a84
|
docs: improve CI and testing documentation (#21202)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-23 10:48:50 -07:00 |
|
Brayden Zhong
|
009eee85a0
|
CUTLASS FP8 Blockwise GEMM improvement of SM120 (#20887)
|
2026-03-22 17:55:54 +08:00 |
|
Xiaoyu Zhang
|
766d225fcc
|
Add SGLang CUDA crash API logging inspired by FlashInfer (#20910)
|
2026-03-22 16:39:40 +08:00 |
|
Baizhou Zhang
|
67cad3e69e
|
Revert "Support CuteDSL mm_fp4 backend" (#21077)
|
2026-03-20 22:47:47 -07:00 |
|
Lianmin Zheng
|
104b10f70a
|
refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009)
|
2026-03-20 05:55:36 -07:00 |
|
Brayden Zhong
|
b42b9f6e1a
|
Support CuteDSL mm_fp4 backend (#18801)
|
2026-03-19 14:20:01 -07:00 |
|
Cao E
|
274581fb77
|
Add support for more batch sizes in cpu_graph_runner (#13881)
|
2026-03-19 09:50:56 -07:00 |
|
Qi Yuhang
|
cb8105fe28
|
[sgl-kernel][6/7]Support Expert Specialization Grouped GEMM (#15471)
|
2026-03-19 15:39:52 +08:00 |
|
blzheng
|
cd22aa27a9
|
[CPU] Add FP8 Bmm support (#9744)
Co-authored-by: Fan Yin <1106310035@qq.com>
|
2026-03-18 22:19:48 -07:00 |
|
blzheng
|
c2b01bd2fc
|
[CPU] fix bug in AVX512 implementation of flash_attn_softmax (#20220)
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
|
2026-03-18 22:18:47 -07:00 |
|
Ma Mingfei
|
687d9eb66f
|
[CPU] Optimize image preprocessor performance for Qwen2VLImageProcessorFast (#15168)
|
2026-03-18 22:18:15 -07:00 |
|
Ma Mingfei
|
62d7454976
|
optimize conv3d used in patch embedding (#16040)
|
2026-03-18 22:17:53 -07:00 |
|
blzheng
|
cbea9f6909
|
[CPU] improve numa memory binding (#19666)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-18 22:15:50 -07:00 |
|
Zaili Wang
|
2f4babe32b
|
[CPU] support LayerNorm with 3D shape (#15075)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-03-18 22:15:24 -07:00 |
|
blzheng
|
dc6aa26ce9
|
[CPU] Add mrope kernel for Qwen3-vl (#12531)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-03-18 22:12:48 -07:00 |
|
Xiaoyu Zhang
|
15097c5c3b
|
Release sglang kernel 0.4.0 (#20440)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-16 20:34:58 +08:00 |
|
Xiaoyu Zhang
|
25e38216b6
|
[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277)
|
2026-03-14 16:45:54 +08:00 |
|
Johnsonms
|
7cf0551014
|
Migrate norm kernels to FlashInfer JIT implementation (#18871)
|
2026-03-10 14:56:07 +08:00 |
|
Baizhou Zhang
|
a6ae89fe3c
|
Revert "chore: bump sgl-kernel version to 0.3.21.post1" (#20229)
|
2026-03-09 20:32:19 -07:00 |
|
sglang-bot
|
0f0c8b2f18
|
chore: bump sgl-kernel version to 0.3.21.post1 (#20087)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-03-08 03:03:58 -07:00 |
|
Fan Yin
|
43d6a32045
|
[sgl-kernel] rebase FlashMLA 0217 (#18902)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-07 00:30:52 -08:00 |
|
Mohammad Miadh Angkad
|
f88acf8780
|
[JIT Kernel] Reland NVFP4 kernels to JIT (#20012)
|
2026-03-07 10:31:08 +08:00 |
|
Johnsonms
|
2d266c73ea
|
Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854)
|
2026-03-06 22:53:28 +08:00 |
|
Baizhou Zhang
|
51e5dc845a
|
Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005)
|
2026-03-05 19:40:00 -08:00 |
|