sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-05 06:47:05 +00:00

Author	SHA1	Message	Date
Xiaoyu Zhang	3de09aadbc	Add new moe wna16 marlin gemm (#14122)	2025-12-01 23:07:53 +08:00
Xiaoyu Zhang	fb04d43428	[kimi k2 thinking] Avoid useless torch.zeros_ (#13596) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-21 13:15:27 +08:00
Ke Bao	8e9f05ece1	Update marlin moe kernel interface (#13322)	2025-11-15 17:10:39 +08:00
Ke Bao	2a96e302cb	Revert moe sum reduce for marlin moe (#13314)	2025-11-15 15:57:41 +08:00
Ke Bao	44f594d832	Apply moe_reduce_sum kernel for fused_marlin_moe (#12888)	2025-11-09 01:31:05 +08:00
Lianmin Zheng	2d5605e89b	Fix ci install to allow prerelease (#12449)	2025-10-31 02:22:15 -07:00
Kai-Hsun Chen	6371f7af27	[quantization] AWQ Marlin doesn't work when dtype is bfloat16 (#11494) Signed-off-by: Kai-Hsun Chen <khchen@x.ai> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-10-26 15:49:45 +08:00
Lianmin Zheng	c480a3f6ea	Minor style fixes for sgl-kernel (#9289)	2025-08-18 09:38:35 -07:00
Peng Zhang	5aa1ebd242	[2/n]decouple quantization implementation from vLLM dependency (#8112) Co-authored-by: walker-ai <yiyun.wyt@antgroup.com> Co-authored-by: leoneo <1320612015@qq.com>	2025-08-14 03:19:03 -07:00
Hongbo Xu	39fd178831	refactor: Move scalar_types.py to sgl-kernel to avoid circular import (#8720)	2025-08-07 19:22:16 -07:00
Peng Zhang	c28ad1990d	[1/n] chore: decouple quantization implementation from vLLM dependency (#7992)	2025-07-16 15:56:26 -07:00
AniZpZ	8e03b641ba	[1/n] apply wna16marlin kernel in moe weight only quantization (#7683) Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: 弋云 <yiyun.wyt@antgroup.com> Co-authored-by: walker-ai <2398833647@qq.com>	2025-07-01 23:21:25 -07:00

12 Commits