From b6d36bffbbf646fba64249e20ff5a0f17252fbb1 Mon Sep 17 00:00:00 2001
From: Azure-Tang
Date: Fri, 5 Sep 2025 03:52:43 +0000
Subject: [PATCH] update kimi-k2-0905

---
 README.md         | 1 +
 doc/en/Kimi-K2.md | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 117a5e0..0d2241e 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ Our vision for KTransformers is to serve as a flexible platform for experimentin

🔥 Updates

+* **July 11, 2025**: Support Kimi-K2-0905. ([Tutorial](./doc/en/Kimi-K2.md))
 * **July 26, 2025**: Support SmallThinker and GLM4-MoE. ([Tutorial](./doc/en/SmallThinker_and_Glm4moe.md))
 * **July 11, 2025**: Support Kimi-K2. ([Tutorial](./doc/en/Kimi-K2.md))
 * **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) [prefix cache](./doc/en/prefix_cache.md) reuse.
diff --git a/doc/en/Kimi-K2.md b/doc/en/Kimi-K2.md
index 298cb64..de1f8f4 100644
--- a/doc/en/Kimi-K2.md
+++ b/doc/en/Kimi-K2.md
@@ -3,7 +3,7 @@
 ## Introduction
 ### Overview
-We are very pleased to announce that Ktransformers now supports Kimi-K2.
+We are very pleased to announce that Ktransformers now supports Kimi-K2 and Kimi-K2-0905.

 On a single-socket CPU with one consumer-grade GPU, running the Q4_K_M model yields roughly 10 TPS and requires about 600 GB of DRAM.
 With a dual-socket CPU and sufficient system memory, enabling NUMA optimizations increases performance to about 14 TPS.
@@ -14,6 +14,10 @@ With a dual-socket CPU and sufficient system memory, enabling NUMA optimizations
   - https://huggingface.co/collections/moonshotai/kimi-k2-6871243b990f2af5ba60617d
 - GGUF Format(quantized models):
   - https://huggingface.co/KVCache-ai/Kimi-K2-Instruct-GGUF
+- Official Kimi-K2-0905 Release:
+  - https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905
+- GGUF Format(quantized models):
+  - Uploading...

 ## Installation Guide