Baizhou Zhang | d14d368191 | [Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467) | 2026-04-11 01:59:57 -07:00
Xiaoyu Zhang | 25e38216b6 | [kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277) | 2026-03-14 16:45:54 +08:00
Mohammad Miadh Angkad | f88acf8780 | [JIT Kernel] Reland NVFP4 kernels to JIT (#20012) | 2026-03-07 10:31:08 +08:00
Baizhou Zhang | 51e5dc845a | Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005) | 2026-03-05 19:40:00 -08:00
Mohammad Miadh Angkad | 2bdd89a6cd | [Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437) | 2026-03-05 15:22:28 +08:00
Xiaoyu Zhang | 9dff933164 | [Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241) | 2026-02-25 10:08:22 +08:00
SoluMilken | 07a24f1a38 | update pre-commit config (#18860) | 2026-02-16 00:18:31 +08:00
Lianmin Zheng | 20315697f4 | move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521) | 2025-11-02 13:15:05 -08:00
zejunchen-zejun | 8a6838212a | [Fix] fix type issue of env flag value MODELOPT_MAX_TOKENS_PER_EXPERT (#11709) | 2025-10-29 09:44:05 -07:00
    Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
fzyzcjy | a27825ae01 | Support not officially supported high sgl-kernel version with low srt version (#11786) | 2025-10-19 16:11:59 +08:00
fzyzcjy | 21337b22b9 | Reland [1/2] Optimizations and refactors about quant kernel (#10312) | 2025-10-11 15:59:03 +08:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Yineng Zhang | 6d55f60e77 | Revert "[1/2] Optimizations and refactors about quant kernel (#9534)" (#10292) | 2025-09-10 18:24:23 -07:00
fzyzcjy | 339f8eef09 | [1/2] Optimizations and refactors about quant kernel (#9534) | 2025-09-05 18:45:08 +08:00
Kaixi Hou | 5c34b4f1c7 | [NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf (#9556) | 2025-08-29 17:17:03 -07:00
Kaixi Hou | e5638573c1 | [NVIDA] [1/N] Nvfp4 Masked Gemm: Add quant op for the flashinfer grouped gemm (#9200) | 2025-08-22 12:19:45 -07:00
Azure | 70bb066ee4 | Fix FP4 inference corruption issue in glm4.5-air model (#9346) | 2025-08-20 22:13:47 -07:00
Peng Zhang | 5aa1ebd242 | [2/n]decouple quantization implementation from vLLM dependency (#8112) | 2025-08-14 03:19:03 -07:00
    Co-authored-by: walker-ai <yiyun.wyt@antgroup.com>
    Co-authored-by: leoneo <1320612015@qq.com>
Baizhou Zhang | 282eb59ff3 | Add bf16 output option for dsv3_router_gemm kernel (#7999) | 2025-07-20 09:49:37 +08:00
Baizhou Zhang | 7248272ccc | Add dsv3 router gemm kernel (#7627) | 2025-06-29 23:31:55 -07:00
Ke Bao | 04b35190e2 | Add dsv3 fused a gemm to sgl-kernel (#7630) | 2025-06-29 02:52:24 -07:00
fzyzcjy | 5c66c4424f | Support new DeepGEMM format in per token group quant (#7146) | 2025-06-13 02:00:22 -07:00
Pavani Majety | eb38c7d1ca | [1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093) | 2025-06-02 13:48:03 -07:00
    Signed-off-by: Pavani Majety <pmajety@nvidia.com>
HandH1998 | 4d643f6c7a | [1/2] Support Qserve (#6457) | 2025-05-21 19:48:59 -07:00
    Co-authored-by: yych0745 <1398089567@qq.com>
    Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Yineng Zhang | 136b8e6afb | fix: remove cublas_grouped_gemm (#5307) | 2025-04-11 16:22:37 -07:00
Yineng Zhang | 31dfff7da7 | use default for torch.ops (#4835) | 2025-03-27 19:09:58 -07:00
Trevor Morris | e9f8e42318 | Support FP4 gemm (1/2) (#3899) | 2025-03-24 19:50:23 -07:00
Chunan Zeng | 65c24c28f9 | [Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396) | 2025-03-23 23:44:17 -07:00
Yineng Zhang | 0a3960f21f | fix awq_dequantize (#4333) | 2025-03-12 01:04:38 -07:00
Rex | 07f944631e | Add awq dequantize kernel to sgl with 1x to 3x speedup (#4104) | 2025-03-12 00:10:02 -07:00
Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213) | 2025-03-08 22:54:51 -08:00
    Co-authored-by: zhyncs <me@zhyncs.com>