Create Kimi-K2.md

This commit is contained in:
Atream
2025-07-11 09:31:47 +08:00
committed by GitHub
parent 890b0f1622
commit b4ac21454b

doc/en/Kimi-K2.md (new file)
# Kimi-K2 Support for KTransformers
## Introduction
### Overview
We are very pleased to announce that KTransformers now supports Kimi-K2.
### Model & Resource Links
- Official Kimi-K2 Release:
- http://xxx.com
- GGUF Format (quantized models):
- Coming soon
## Installation Guide
### 1. Resource Requirements
Running the model with all 384 experts requires approximately 2 TB of system memory and 14 GB of GPU memory.
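As a rough sanity check on the 2 TB figure, assuming the weights are held in BF16 at 2 bytes per parameter and a total parameter count on the order of 1 trillion (an illustrative assumption, not an official figure), the arithmetic works out as follows:

```python
# Back-of-the-envelope weight-memory estimate.
# The ~1e12 parameter count is an assumption for illustration only.
PARAMS = 1.0e12          # approximate total parameter count (assumed)
BYTES_PER_PARAM = 2      # BF16 stores each parameter in 2 bytes

total_bytes = PARAMS * BYTES_PER_PARAM
total_tb = total_bytes / 1e12
print(f"approx. weight memory: {total_tb:.1f} TB")  # → approx. weight memory: 2.0 TB
```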
### 2. Prepare Models
The weights are distributed in FP8; you can convert them to BF16 as follows.
```bash
# download fp8
huggingface-cli download --resume-download xxx
# convert fp8 to bf16
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference
python fp8_cast_bf16.py --input-fp8-hf-path <path_to_fp8> --output-bf16-hf-path <path_to_bf16>
```
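The conversion script widens each 8-bit FP8 value to a 16-bit BF16 one. As a minimal sketch of what that entails (not the script's actual implementation), here is a pure-Python decoder for a single FP8 E4M3FN byte; `fp8_e4m3_to_float` is a hypothetical helper written for illustration:

```python
def fp8_e4m3_to_float(b: int) -> float:
    """Decode one FP8 E4M3FN byte (1 sign, 4 exponent, 3 mantissa bits)."""
    sign = (b >> 7) & 0x1
    exp = (b >> 3) & 0xF
    mant = b & 0x7
    if exp == 0xF and mant == 0x7:
        return float("nan")          # E4M3FN reserves only this code for NaN
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6
        val = (mant / 8) * 2 ** -6
    else:
        # Normal: implicit leading 1, exponent bias of 7
        val = (1 + mant / 8) * 2 ** (exp - 7)
    return -val if sign else val

print(fp8_e4m3_to_float(0b00111000))  # → 1.0
print(fp8_e4m3_to_float(0b01000000))  # → 2.0
```

BF16, by contrast, keeps float32's 8-bit exponent with a 7-bit mantissa, so the FP8 range converts without overflow and the conversion only costs storage, not precision that FP8 ever had.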
### 3. Install ktransformers
To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
### 4. Run Kimi-K2 Inference Server
```bash
python ktransformers/server/main.py \
--port 10002 \
--model_path <path_to_safetensor_config> \
--gguf_path <path_to_bf16_files> \
--optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
--max_new_tokens 1024 \
--cache_lens 32768 \
--chunk_size 256 \
--max_batch_size 4 \
    --backend_type balance_serve
```
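One way to read the serving flags above, assuming `cache_lens` is a total KV-cache token budget shared across concurrent requests (a plausible reading; confirm against the KTransformers documentation for your version):

```python
# Hypothetical back-of-the-envelope for the serving flags above.
cache_lens = 32768      # total KV-cache tokens (assumed shared pool)
max_batch_size = 4      # maximum concurrent requests
chunk_size = 256        # prefill chunk granularity

tokens_per_request = cache_lens // max_batch_size
chunks_per_request = tokens_per_request // chunk_size
print(tokens_per_request, chunks_per_request)  # → 8192 32
```

Under this reading, each of the 4 concurrent requests can hold about 8 K tokens of context, processed in 32 prefill chunks of 256 tokens.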
### 5. Access the Server
```bash
curl -X POST http://localhost:10002/v1/chat/completions \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "hello"}
],
"model": "Kimi-K2",
"temperature": 0.3,
"top_p": 1.0,
"stream": true
}'
```
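With `"stream": true`, the server replies with Server-Sent Events, one `data:` line per chunk, terminated by `data: [DONE]`. A minimal stdlib-only Python sketch for consuming that stream (the endpoint and payload mirror the `curl` call above; error handling is omitted):

```python
import json
import urllib.request

def iter_sse_content(lines):
    """Yield content deltas from OpenAI-style SSE 'data:' lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip keep-alives / blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":           # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

def chat(prompt, url="http://localhost:10002/v1/chat/completions"):
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "model": "Kimi-K2",
        "temperature": 0.3,
        "top_p": 1.0,
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for piece in iter_sse_content(line.decode() for line in resp):
            print(piece, end="", flush=True)
```

Calling `chat("hello")` prints the model's reply token by token as chunks arrive.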