22 Commits

Author SHA1 Message Date
Xiaoyu Zhang
fb74e43707 [Diffusion] Delete sgl-kernel outdated time_embedding kernel (#17278) 2026-01-28 14:18:53 +08:00
Michael
53609e5e5b Revert "[Diffusion] Move diffusion time embedding to jit kernel" (#17257) 2026-01-17 21:29:22 +08:00
Xiaoyu Zhang
2cdd4370bc [Diffusion] Move diffusion time embedding to jit kernel (#16879) 2026-01-17 12:21:22 +08:00
66RING
46be74b4b4 [diffusion] kernel: timestep embedding kernel implementation (#12995) 2025-12-19 20:59:50 +08:00
Co-authored-by: 戚余航 <qiyuhang@bytedance.com>
Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>
Qiaolin Yu
cb8df87fc1 [1/2] Add rope kernel in sgl-kernel (#14334) 2025-12-04 16:45:44 +08:00
fzyzcjy
193fbb0bce Super tiny add UT for copy_to_gpu_no_ce (#12270) 2025-11-04 09:40:51 +08:00
Lianmin Zheng
20315697f4 move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521) 2025-11-02 13:15:05 -08:00
fzyzcjy
3b25dc127a [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) 2025-09-15 11:53:21 -07:00
fzyzcjy
0096798ed6 [1/2] Speed up prefill mla attention (#10156) 2025-09-08 09:00:33 -07:00
fzyzcjy
bd7f882142 Support copying tensor from cpu to gpu without using copy engines (#10007) 2025-09-05 20:07:19 +08:00
fzyzcjy
42c8704560 Add PDL support for quant kernel and rope kernel (#9106) 2025-08-20 01:56:29 -07:00
JieXin Liang
6cdcbcc674 [fix] fix enable_pdl for blackwell (#9011) 2025-08-19 01:16:08 +08:00
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00
fzyzcjy
9aea255522 Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077) 2025-08-12 01:46:40 -07:00
Hubert Lu
af4b9bae95 [AMD] Add silu_and_mul, gelu_and_mul, gelu_tanh_and_mul, and gelu_quick kernels for AMD GPUs (#7135) 2025-07-24 23:44:28 -07:00
Co-authored-by: yiakwy-xpu-ml-framework-team <961186938@qq.com>
Co-authored-by: HAI <hixiao@gmail.com>
Huapeng Zhou
2f7420bc84 [Feat] Enable PDL automatically on Hopper architecture (#5981) 2025-06-01 12:30:17 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179) 2025-05-11 12:55:00 +08:00
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
PGFLMG
c08a717c77 [Feat] Update sgl-kernel flashinfer to latest main version (#5500) 2025-04-17 12:43:23 -07:00
Co-authored-by: zhyncs <me@zhyncs.com>
yinfan98
0d7fe866f9 [Misc] Clean m.def and add Development Tips (#4890) 2025-03-29 23:06:18 -07:00
Yineng Zhang
31dfff7da7 use default for torch.ops (#4835) 2025-03-27 19:09:58 -07:00
Lianmin Zheng
cf0ccd406e Optimize rope in sgl kernel (#4267) 2025-03-10 10:07:45 -07:00
Lianmin Zheng
8abf74e3c9 Rename files in sgl kernel to avoid nested folder structure (#4213) 2025-03-08 22:54:51 -08:00
Co-authored-by: zhyncs <me@zhyncs.com>