Create Kimi-K2.md

This commit is contained in:
Atream
2025-07-11 09:31:47 +08:00
committed by GitHub
parent 890b0f1622
commit b4ac21454b

doc/en/Kimi-K2.md (new file)
# Kimi-K2 Support for KTransformers
## Introduction
### Overview
We are very pleased to announce that KTransformers now supports Kimi-K2.
### Model & Resource Links
- Official Kimi-K2 Release:
- http://xxx.com
- GGUF Format (quantized models):
- Coming soon
## Installation Guide
### 1. Resource Requirements
Running the model with all 384 experts requires approximately 2 TB of system memory and 14 GB of GPU memory.
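As a rough sanity check on the 2 TB figure, assuming the weights are held in BF16 at 2 bytes per parameter and a total parameter count on the order of 1 trillion (an illustrative assumption, not an official figure), the arithmetic works out as follows:

```python
# Back-of-the-envelope weight-memory estimate.
# The ~1e12 parameter count is an assumption for illustration only.
PARAMS = 1.0e12          # approximate total parameter count (assumed)
BYTES_PER_PARAM = 2      # BF16 stores each parameter in 2 bytes

total_bytes = PARAMS * BYTES_PER_PARAM
total_tb = total_bytes / 1e12
print(f"approx. weight memory: {total_tb:.1f} TB")  # → approx. weight memory: 2.0 TB
```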
### 2. Prepare Models
The weights are distributed in FP8; you can convert them to BF16 as follows.
```bash
# download fp8
huggingface-cli download --resume-download xxx
# convert fp8 to bf16
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference
python fp8_cast_bf16.py --input-fp8-hf-path <path_to_fp8> --output-bf16-hf-path <path_to_bf16>
```
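The conversion script widens each 8-bit FP8 value to a 16-bit BF16 one. As a minimal sketch of what that entails (not the script's actual implementation), here is a pure-Python decoder for a single FP8 E4M3FN byte; `fp8_e4m3_to_float` is a hypothetical helper written for illustration:

```python
def fp8_e4m3_to_float(b: int) -> float:
    """Decode one FP8 E4M3FN byte (1 sign, 4 exponent, 3 mantissa bits)."""
    sign = (b >> 7) & 0x1
    exp = (b >> 3) & 0xF
    mant = b & 0x7
    if exp == 0xF and mant == 0x7:
        return float("nan")          # E4M3FN reserves only this code for NaN
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6
        val = (mant / 8) * 2 ** -6
    else:
        # Normal: implicit leading 1, exponent bias of 7
        val = (1 + mant / 8) * 2 ** (exp - 7)
    return -val if sign else val

print(fp8_e4m3_to_float(0b00111000))  # → 1.0
print(fp8_e4m3_to_float(0b01000000))  # → 2.0
```

BF16, by contrast, keeps float32's 8-bit exponent with a 7-bit mantissa, so the FP8 range converts without overflow and the conversion only costs storage, not precision that FP8 ever had.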
### 3. Install ktransformers
To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
### 4. Run Kimi-K2 Inference Server
```bash
python ktransformers/server/main.py \
--port 10002 \
--model_path <path_to_safetensor_config> \
--gguf_path <path_to_bf16_files> \
--optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
--max_new_tokens 1024 \
--cache_lens 32768 \
--chunk_size 256 \
--max_batch_size 4 \
    --backend_type balance_serve
```
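One way to read the serving flags above, assuming `cache_lens` is a total KV-cache token budget shared across concurrent requests (a plausible reading; confirm against the KTransformers documentation for your version):

```python
# Hypothetical back-of-the-envelope for the serving flags above.
cache_lens = 32768      # total KV-cache tokens (assumed shared pool)
max_batch_size = 4      # maximum concurrent requests
chunk_size = 256        # prefill chunk granularity

tokens_per_request = cache_lens // max_batch_size
chunks_per_request = tokens_per_request // chunk_size
print(tokens_per_request, chunks_per_request)  # → 8192 32
```

Under this reading, each of the 4 concurrent requests can hold about 8 K tokens of context, processed in 32 prefill chunks of 256 tokens.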
### 5. Access the Server
```bash
curl -X POST http://localhost:10002/v1/chat/completions \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "hello"}
],
"model": "Kimi-K2",
"temperature": 0.3,
"top_p": 1.0,
"stream": true
}'
```
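With `"stream": true`, the server replies with Server-Sent Events, one `data:` line per chunk, terminated by `data: [DONE]`. A minimal stdlib-only Python sketch for consuming that stream (the endpoint and payload mirror the `curl` call above; error handling is omitted):

```python
import json
import urllib.request

def iter_sse_content(lines):
    """Yield content deltas from OpenAI-style SSE 'data:' lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip keep-alives / blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":           # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

def chat(prompt, url="http://localhost:10002/v1/chat/completions"):
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "model": "Kimi-K2",
        "temperature": 0.3,
        "top_p": 1.0,
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for piece in iter_sse_content(line.decode() for line in resp):
            print(piece, end="", flush=True)
```

Calling `chat("hello")` prints the model's reply token by token as chunks arrive.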