Xiaoyu Zhang | 3de09aadbc | Add new moe wna16 marlin gemm (#14122) | 2025-12-01 23:07:53 +08:00
Xiaoyu Zhang | fb04d43428 | [kimi k2 thinking] Avoid useless torch.zeros_ (#13596); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>; Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> | 2025-11-21 13:15:27 +08:00
Ke Bao | 8e9f05ece1 | Update marlin moe kernel interface (#13322) | 2025-11-15 17:10:39 +08:00
Ke Bao | 2a96e302cb | Revert moe sum reduce for marlin moe (#13314) | 2025-11-15 15:57:41 +08:00
Ke Bao | 44f594d832 | Apply moe_reduce_sum kernel for fused_marlin_moe (#12888) | 2025-11-09 01:31:05 +08:00
Lianmin Zheng | 2d5605e89b | Fix ci install to allow prerelease (#12449) | 2025-10-31 02:22:15 -07:00
Kai-Hsun Chen | 6371f7af27 | [quantization] AWQ Marlin doesn't work when dtype is bfloat16 (#11494); Signed-off-by: Kai-Hsun Chen <khchen@x.ai>; Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> | 2025-10-26 15:49:45 +08:00
Lianmin Zheng | c480a3f6ea | Minor style fixes for sgl-kernel (#9289) | 2025-08-18 09:38:35 -07:00
Peng Zhang | 5aa1ebd242 | [2/n]decouple quantization implementation from vLLM dependency (#8112); Co-authored-by: walker-ai <yiyun.wyt@antgroup.com>; Co-authored-by: leoneo <1320612015@qq.com> | 2025-08-14 03:19:03 -07:00
Hongbo Xu | 39fd178831 | refactor: Move scalar_types.py to sgl-kernel to avoid circular import (#8720) | 2025-08-07 19:22:16 -07:00
Peng Zhang | c28ad1990d | [1/n] chore: decouple quantization implementation from vLLM dependency (#7992) | 2025-07-16 15:56:26 -07:00
AniZpZ | 8e03b641ba | [1/n] apply wna16marlin kernel in moe weight only quantization (#7683); Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>; Co-authored-by: yych0745 <1398089567@qq.com>; Co-authored-by: HandH1998 <1335248067@qq.com>; Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>; Co-authored-by: walker-ai <2398833647@qq.com> | 2025-07-01 23:21:25 -07:00