Support Native Kimi K2 Thinking (#1663)

* [feat]: fix k2 prefill

* Update Kimi-K2-Thinking.md

* Create Kimi-K2-Thinking-Native.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking-Native.md

* [perf] optimize K2 MoE weight loading with per-expert pointers

- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
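The per-expert-pointer idea above can be sketched as follows. This is a minimal illustration with hypothetical names (`gate_projs`, `load_expert_slices` are placeholders, and a Python thread pool stands in for the C++ worker pool doing parallel memcpy): rather than materializing one contiguous tensor via `torch.stack(...).contiguous()`, keep an array of per-expert buffers and have workers copy each expert's TP slice in parallel.

```python
# Sketch only: per-expert pointer arrays instead of one contiguous buffer.
# A ThreadPoolExecutor stands in for the C++ worker pool; list slicing
# stands in for memcpy of a tensor-parallel (TP) shard.
from concurrent.futures import ThreadPoolExecutor

def load_expert_slices(gate_projs, tp_rank, tp_size, workers=4):
    """gate_projs: list of per-expert weight buffers.
    Returns the tp_rank-th TP slice of each expert, copied in parallel."""
    def copy_slice(expert):
        rows = len(expert) // tp_size            # rows owned by this TP rank
        start = tp_rank * rows
        return list(expert[start:start + rows])  # stands in for memcpy
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_slice, gate_projs))

# Usage: 2 experts, 4 "rows" each, TP size 2 -- rank 0 copies the first half.
experts = [[0, 1, 2, 3], [10, 11, 12, 13]]
slices = load_expert_slices(experts, tp_rank=0, tp_size=2)
# slices == [[0, 1], [10, 11]]
```

The point of the design is that each worker reads directly from its expert's original buffer, so the ~6.6 s Python-side stack-and-copy is replaced by parallel slice copies with no intermediate contiguous allocation.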

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
ErvinXie, 2025-12-05 21:53:05 +08:00, committed by GitHub
parent 4850424345, commit 71f683acec
5 changed files with 419 additions and 70 deletions


@@ -0,0 +1 @@
TODO: first describe how to install and run it, then add a performance section, then link to the documentation on how to connect via Claude Code.


@@ -1,4 +1,5 @@
# KTransformers+SGLang Inference Deployment
Please note: this is a quantized deployment. For native Kimi K2 Thinking deployment, please refer to [here](./Kimi-K2-Thinking-Native.md).
## Installation