This tutorial demonstrates how to run Kimi-K2.5 model inference using SGLang integrated with KT-Kernel.
## Table of Contents

- [Running Kimi-K2.5 with SGLang and KT-Kernel](#running-kimi-k25-with-sglang-and-kt-kernel)
  - [Table of Contents](#table-of-contents)
  - [Hardware Requirements](#hardware-requirements)
  - [Prerequisites](#prerequisites)
  - [Step 1: Download Model Weights](#step-1-download-model-weights)
  - [Step 2: Launch SGLang Server](#step-2-launch-sglang-server)
    - [Launch Command (4x RTX 4090 Example)](#launch-command-4x-rtx-4090-example)
  - [Step 3: Send Inference Requests](#step-3-send-inference-requests)
    - [Basic Chat Completion Request](#basic-chat-completion-request)
    - [Example Response](#example-response)
## Hardware Requirements

**Minimum Configuration:**

- **GPU**: 2x NVIDIA RTX 4090 (24GB each), or equivalent with at least **48GB total** VRAM
- **CPU**: x86 CPU with AVX512F support (e.g., Intel Sapphire Rapids)
- **RAM**: At least 600GB system memory
- **Storage**: ~600GB for model weights (native INT4 weights; the same weight folder is used for both CPU and GPU)
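Since KT-Kernel's CPU kernels depend on AVX512F, it is worth confirming that your CPU actually advertises the flag before installing. A quick check on Linux (reading `/proc/cpuinfo`; this sketch is not part of the original guide):

```shell
# Check whether the CPU advertises the AVX512F flag (Linux only).
# KT-Kernel's CPU kernels require this instruction set.
if grep -q avx512f /proc/cpuinfo 2>/dev/null; then
    echo "AVX512F: supported"
else
    echo "AVX512F: NOT supported"
fi
```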
## Prerequisites

**Update (2026-01-30): Both kimi_k2.5 branches have now been merged into main, so there is no need to check out those branches anymore. The EPLB feature is also supported.**

Before starting, ensure you have:

1. **KT-Kernel installed**:

~~Note: The latest KTransformers EPLB feature for Kimi-K2.5 will be supported soon.~~
```
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git checkout kimi_k2.5
git submodule update --init --recursive
cd kt-kernel && ./install.sh
```
Note: Currently, please clone our custom SGLang repository:
```
git clone https://github.com/kvcache-ai/sglang.git
cd sglang
git checkout kimi_k2.5
pip install -e "python[all]"

# You may need to reinstall cuDNN if SGLang fails to launch:
pip install nvidia-cudnn-cu12==9.16.0.29
```
```
python -m sglang.launch_server \
    --disable-shared-experts-fusion \
    --chunked-prefill-size 32658 \
    --max-total-tokens 50000 \
    --attention-backend flashinfer \
    --kt-enable-dynamic-expert-update
```
It takes about 2–3 minutes for the server to start.
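Once the server is up, you can exercise its OpenAI-compatible endpoint directly. A minimal sketch with `curl` (the port 30000 is SGLang's default, and the model name shown here is an assumption; adjust both to match your launch command):

```shell
# Minimal chat completion request against a locally running SGLang server.
# Assumes the default SGLang port (30000) and a served model named
# "Kimi-K2.5" -- adjust both for your setup.
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Kimi-K2.5",
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
        "max_tokens": 128
      }'
```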