Author | Commit | Message | Date
Yuan Luo | e12c78aab6 | [sgl-kernel][1/2] Fused qk_norm_rope for Qwen3-MoE (#14036); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-11-28 12:25:15 +08:00
sglang-bot | 391a863b3f | chore: bump sgl-kernel version to 0.3.18.post1 (#13942); Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> | 2025-11-25 15:31:53 -08:00
Xiaoyu Zhang | ecefc7904f | [sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819) | 2025-11-24 18:26:25 +08:00
sglang-bot | a22104a676 | chore: bump sgl-kernel version to 0.3.18 (#13816) | 2025-11-23 17:24:54 -08:00
Xiaoyu Zhang | fb04d43428 | [kimi k2 thinking] Avoid useless torch.zeros_ (#13596); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>; Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> | 2025-11-21 13:15:27 +08:00
Roger Young | e72cf13693 | Support moe topk sigmoid kernel (#13049); Co-authored-by: xuebi <xuebi@minimaxi.com> | 2025-11-20 00:24:37 +08:00
sglang-bot | b638abbae6 | chore: bump sgl-kernel version to 0.3.17.post2 (#13542) | 2025-11-18 22:08:23 -08:00
sglang-bot | 4a56fa5cf2 | chore: bump sgl-kernel version to 0.3.17.post1 (#13325) | 2025-11-15 23:36:32 +08:00
Xiaoyu Zhang | 1d3d42bda0 | [opt kimi k2 1 / n] Add kimi k2 moe fused gate (#13287) | 2025-11-15 17:14:19 +08:00
Ke Bao | 8e9f05ece1 | Update marlin moe kernel interface (#13322) | 2025-11-15 17:10:39 +08:00
Ke Bao | 2a96e302cb | Revert moe sum reduce for marlin moe (#13314) | 2025-11-15 15:57:41 +08:00
Fan Yin | 2966367a31 | [sgl-kernel] support custom fp8 flashmla kernel (#13087) | 2025-11-13 12:45:21 -08:00
Xiaoyu Zhang | 547de8c774 | [1 / 2] register weak_ref_tensor in sgl-kernel (#12999) | 2025-11-10 22:12:59 +08:00
sglang-bot | a30f190762 | chore: bump sgl-kernel version to 0.3.17 (#12931) | 2025-11-10 14:54:02 +08:00
sglang-bot | b2b26d4324 | chore: bump sgl-kernel version to 0.3.16.post6 (#12889) | 2025-11-09 09:16:28 +08:00
Ke Bao | 44f594d832 | Apply moe_reduce_sum kernel for fused_marlin_moe (#12888) | 2025-11-09 01:31:05 +08:00
hlu1 | b8ddc296f4 | [sgl-kernel][Deepseek V3.2] Add row_starts to topk kernel (#12582); Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> | 2025-11-07 18:33:27 -08:00
fzyzcjy | 193fbb0bce | Super tiny add UT for copy_to_gpu_no_ce (#12270) | 2025-11-04 09:40:51 +08:00
Lianmin Zheng | 20315697f4 | move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521) | 2025-11-02 13:15:05 -08:00
sglang-bot | a920b9dace | chore: bump sgl-kernel version to 0.3.16.post5 (#12511) | 2025-11-02 14:08:11 +08:00
yinghui | a80bcb5a68 | Add env var to disable FA4 warmup (#12430) | 2025-10-31 12:25:00 -07:00
Lianmin Zheng | 2d5605e89b | Fix ci install to allow prerelease (#12449) | 2025-10-31 02:22:15 -07:00
Lianmin Zheng | c0652d907b | Clean up sgl kernel (#12413); Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2025-10-31 01:13:34 -07:00
zejunchen-zejun | 8a6838212a | [Fix] fix type issue of env flag value MODELOPT_MAX_TOKENS_PER_EXPERT (#11709); Signed-off-by: zejunchen-zejun <zejun.chen@amd.com> | 2025-10-29 09:44:05 -07:00
huangtingwei | 3e6281d0aa | [HiCache]Page head layout IO kernel (#11615) | 2025-10-26 15:53:50 +08:00
Kai-Hsun Chen | 6371f7af27 | [quantization] AWQ Marlin doesn't work when dtype is bfloat16 (#11494); Signed-off-by: Kai-Hsun Chen <khchen@x.ai>; Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> | 2025-10-26 15:49:45 +08:00
DarkSharpness | e8b71445c0 | [Misc] Improve the error message of failed import (#12119) | 2025-10-25 12:09:05 -07:00
sglang-bot | a04212f128 | chore: bump sgl-kernel version to 0.3.16.post4 (#12103); Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> | 2025-10-24 22:04:14 -07:00
Fan Yin | 23afdfd1c2 | [sgl-kernel] support flashmla libtorch (#11717) | 2025-10-21 21:17:50 -07:00
sglang-bot | 283c8ba031 | chore: bump sgl-kernel version to 0.3.16.post3 (#11733) | 2025-10-19 21:44:15 -05:00
hlu1 | 3b80232d06 | [DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815); Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> | 2025-10-19 17:13:39 -07:00
Johnny | 252dc4e112 | [NVIDIA] FA3/FA4 Fix (#11606); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-10-19 17:10:10 -07:00
fzyzcjy | a27825ae01 | Support not officially supported high sgl-kernel version with low srt version (#11786) | 2025-10-19 16:11:59 +08:00
Fan Yin | 3289da5b41 | [sgl-kernel] support hadamard (#11663) | 2025-10-15 19:00:44 -07:00
Qi Yuhang | 6c01844f45 | [sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674) | 2025-10-15 13:39:31 -07:00
fzyzcjy | 32803fb279 | Super tiny improve FA3 import error message (#11590) | 2025-10-14 22:06:31 -07:00
sglang-bot | 98923880bc | chore: bump sgl-kernel version to 0.3.16.post2 (#11583); Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> | 2025-10-13 20:52:38 -07:00
Yineng Zhang | f792e3c561 | Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582) | 2025-10-13 20:51:45 -07:00
sglang-bot | 60b0503227 | chore: bump sgl-kernel version to 0.3.16.post1 (#11573); Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> | 2025-10-13 16:26:18 -07:00
Qi Yuhang | dc48c4c0e3 | [sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534) | 2025-10-13 16:24:48 -07:00
Johnny | b8c430f1ce | [NVIDIA] BUMP FA3 (#11444); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>; Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com> | 2025-10-13 09:30:57 -07:00
Qi Yuhang | 9a30914e94 | [sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>; Co-authored-by: PGFLMG <1106310035@qq.com>; Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> | 2025-10-12 20:19:21 -07:00
sglang-bot | 2db2cddd12 | chore: bump sgl-kernel version to 0.3.16 (#11476) | 2025-10-11 22:04:49 -07:00
PGFLMG | 8fdcd98efe | [7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019) | 2025-10-11 14:04:57 -07:00
fzyzcjy | 21337b22b9 | Reland [1/2] Optimizations and refactors about quant kernel (#10312); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-10-11 15:59:03 +08:00
sglang-bot | 8c9670375f | chore: bump sgl-kernel version to 0.3.15 (#11281) | 2025-10-06 18:17:51 -07:00
Lifu Huang | 748f86f3de | [Bug] Fix incorrect assertion in FA4 and add UT. (#11182) | 2025-10-06 14:58:39 -07:00
PGFLMG | 1a599509cc | chore: bump sgl-kernel v0.3.14.post1 (#11137) | 2025-10-05 13:46:43 -07:00
DarkSharpness | e0b2d3eebe | [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-10-05 10:19:03 -07:00
PGFLMG | 580051c5a8 | chore: bump sgl-kernel v0.3.14 (#11067) | 2025-09-30 02:53:24 -07:00