mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2026-03-14 18:37:23 +00:00
@@ -23,6 +23,7 @@ Our vision for KTransformers is to serve as a flexible platform for experimentin
<h2 id="Updates">🔥 Updates</h2>
* **Sept 5, 2025**: Support Kimi-K2-0905. ([Tutorial](./doc/en/Kimi-K2.md))
* **July 26, 2025**: Support SmallThinker and GLM4-MoE. ([Tutorial](./doc/en/SmallThinker_and_Glm4moe.md))
* **July 11, 2025**: Support Kimi-K2. ([Tutorial](./doc/en/Kimi-K2.md))
* **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) [prefix cache](./doc/en/prefix_cache.md) reuse.
@@ -3,7 +3,7 @@
## Introduction
### Overview
We are very pleased to announce that KTransformers now supports Kimi-K2 and Kimi-K2-0905.
On a single-socket CPU with one consumer-grade GPU, running the Q4_K_M model yields roughly 10 TPS and requires about 600 GB of DRAM.
With a dual-socket CPU and sufficient system memory, enabling NUMA optimizations increases performance to about 14 TPS.
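To make the NUMA path above concrete, here is a minimal install sketch for a dual-socket machine. It assumes the `USE_NUMA=1` build flag and `install.sh` entry point described in the project's installation docs; verify both against the current Installation Guide before use.

```shell
# Hypothetical sketch: build KTransformers with NUMA optimizations
# enabled for a dual-socket CPU (assumes USE_NUMA=1 is the documented flag).
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive

# Enable the NUMA-aware build before installing.
export USE_NUMA=1
bash install.sh
```

Without `USE_NUMA=1`, the build targets a single NUMA node; the flag only helps when both sockets have enough local memory to hold their share of the weights.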
@@ -14,6 +14,10 @@ With a dual-socket CPU and sufficient system memory, enabling NUMA optimizations
- https://huggingface.co/collections/moonshotai/kimi-k2-6871243b990f2af5ba60617d
- GGUF Format (quantized models):
- https://huggingface.co/KVCache-ai/Kimi-K2-Instruct-GGUF
- Official Kimi-K2-0905 Release:
- https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905
- GGUF Format (quantized models):
- Uploading...
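The GGUF weights listed above can be fetched with the standard `huggingface-cli download` command. This is a sketch, not part of the official guide; the `--local-dir` path is an arbitrary example, and the full Q4_K_M checkpoint is very large (the text above cites roughly 600 GB of DRAM at runtime), so make sure the target disk has space before starting.

```shell
# Hypothetical sketch: download the quantized Kimi-K2 GGUF weights
# from the repo linked above (requires the huggingface_hub CLI:
# pip install -U "huggingface_hub[cli]").
huggingface-cli download KVCache-ai/Kimi-K2-Instruct-GGUF \
    --local-dir ./Kimi-K2-Instruct-GGUF
```

The Kimi-K2-0905 GGUF link is still marked "Uploading..." above, so only the original Kimi-K2 quantized weights are downloadable at the time of this change.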
## Installation Guide