Xiaoyu Zhang | cdd7d6a227 | Remove obsolete sgl-kernel legacy paths (#21528) | 2026-04-01 09:00:20 +08:00
Johnsonms | 8a56a7b04d | [jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103) | 2026-03-27 13:21:44 +08:00
Xiaoyu Zhang | 25e38216b6 | [kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277) | 2026-03-14 16:45:54 +08:00
Xiaoyu Zhang | 9e9e949261 | speed up sgl-kernel build (#18586) | 2026-02-12 23:43:22 +08:00
Baizhou Zhang | 2d38b8aca0 | Revert "[sgl-kernel] upgrade deepgemm" (#18562) | 2026-02-11 01:17:40 +08:00
Xiaoyu Zhang | bec7fe9e65 | [sgl-kernel] upgrade deepgemm (#18362) | 2026-02-10 21:31:30 +08:00
Yifan Cui | 45fe51a28e | Reduce topk kernel shared memory from 128KB to 32KB for better occupancy (#17747); Co-authored-by: Claude <noreply@anthropic.com> | 2026-01-30 21:42:21 -08:00
Hubert Lu | 51e2eaa458 | [AMD] Support fast_topk kernels in sgl-kernel (#15172) | 2025-12-19 22:19:09 -08:00
Qiaolin Yu | cb8df87fc1 | [1/2] Add rope kernel in sgl-kernel (#14334) | 2025-12-04 16:45:44 +08:00
iLeGend | 20e59f9510 | Add FP32 dtype support for RoPE - Part1 (#13181) | 2025-11-15 11:37:18 -08:00
hlu1 | b8ddc296f4 | [sgl-kernel][Deepseek V3.2] Add row_starts to topk kernel (#12582); Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> | 2025-11-07 18:33:27 -08:00
Lianmin Zheng | 20315697f4 | move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521) | 2025-11-02 13:15:05 -08:00
bingps | 15ed27d7d4 | [Fix] concat_mla_absorb_q_kernel fails for long inputs (#12453) | 2025-11-02 11:52:06 -08:00
hlu1 | 3b80232d06 | [DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815); Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> | 2025-10-19 17:13:39 -07:00
DarkSharpness | e0b2d3eebe | [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-10-05 10:19:03 -07:00
Yuan Luo | 42245551ef | [sgl-kernel] Optimize concat_mla_k kernel (#10543); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>; Co-authored-by: PGFLMG <1106310035@qq.com> | 2025-09-28 23:04:22 +08:00
fzyzcjy | 3b25dc127a | [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) | 2025-09-15 11:53:21 -07:00
fzyzcjy | 0096798ed6 | [1/2] Speed up prefill mla attention (#10156) | 2025-09-08 09:00:33 -07:00
fzyzcjy | bd7f882142 | Support copying tensor from cpu to gpu without using copy engines (#10007) | 2025-09-05 20:07:19 +08:00
fzyzcjy | 42c8704560 | Add PDL support for quant kernel and rope kernel (#9106) | 2025-08-20 01:56:29 -07:00
Hubert Lu | c6c379ab31 | [AMD] Reorganize hip-related header files in sgl-kernel (#9320) | 2025-08-18 16:53:44 -07:00
Lianmin Zheng | c480a3f6ea | Minor style fixes for sgl-kernel (#9289) | 2025-08-18 09:38:35 -07:00
fzyzcjy | 9aea255522 | Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077) | 2025-08-12 01:46:40 -07:00
Hubert Lu | af4b9bae95 | [AMD] Add silu_and_mul, gelu_and_mul, gelu_tanh_and_mul, and gelu_quick kernels for AMD GPUs (#7135); Co-authored-by: yiakwy-xpu-ml-framework-team <961186938@qq.com>; Co-authored-by: HAI <hixiao@gmail.com> | 2025-07-24 23:44:28 -07:00
PGFLMG | c08a717c77 | [Feat] Update sgl-kernel flashinfer to latest main version (#5500); Co-authored-by: zhyncs <me@zhyncs.com> | 2025-04-17 12:43:23 -07:00
Lianmin Zheng | cf0ccd406e | Optimize rope in sgl kernel (#4267) | 2025-03-10 10:07:45 -07:00
Lianmin Zheng | 7c0541b385 | Move activation.cu to sgl-kernel/elementwise (#4250) | 2025-03-09 22:41:13 -07:00
Lianmin Zheng | eb06dbcbf8 | Move rope and bmm into sgl-kernel (#4241) | 2025-03-09 18:38:15 -07:00
Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213); Co-authored-by: zhyncs <me@zhyncs.com> | 2025-03-08 22:54:51 -08:00