12 Commits

Author SHA1 Message Date
Xiaoyu Zhang
3de09aadbc Add new moe wna16 marlin gemm (#14122) 2025-12-01 23:07:53 +08:00
Xiaoyu Zhang
fb04d43428 [kimi k2 thinking] Avoid useless torch.zeros_ (#13596)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-21 13:15:27 +08:00
Ke Bao
8e9f05ece1 Update marlin moe kernel interface (#13322) 2025-11-15 17:10:39 +08:00
Ke Bao
2a96e302cb Revert moe sum reduce for marlin moe (#13314) 2025-11-15 15:57:41 +08:00
Ke Bao
44f594d832 Apply moe_reduce_sum kernel for fused_marlin_moe (#12888) 2025-11-09 01:31:05 +08:00
Lianmin Zheng
2d5605e89b Fix ci install to allow prerelease (#12449) 2025-10-31 02:22:15 -07:00
Kai-Hsun Chen
6371f7af27 [quantization] AWQ Marlin doesn't work when dtype is bfloat16 (#11494)
Signed-off-by: Kai-Hsun Chen <khchen@x.ai>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-10-26 15:49:45 +08:00
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00
Peng Zhang
5aa1ebd242 [2/n]decouple quantization implementation from vLLM dependency (#8112)
Co-authored-by: walker-ai <yiyun.wyt@antgroup.com>
Co-authored-by: leoneo <1320612015@qq.com>
2025-08-14 03:19:03 -07:00
Hongbo Xu
39fd178831 refactor: Move scalar_types.py to sgl-kernel to avoid circular import (#8720) 2025-08-07 19:22:16 -07:00
Peng Zhang
c28ad1990d [1/n] chore: decouple quantization implementation from vLLM dependency (#7992) 2025-07-16 15:56:26 -07:00
AniZpZ
8e03b641ba [1/n] apply wna16marlin kernel in moe weight only quantization (#7683)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>
Co-authored-by: walker-ai <2398833647@qq.com>
2025-07-01 23:21:25 -07:00