From b67cc4095d0a677427e1a8c8cbe2b9f0c97357ba Mon Sep 17 00:00:00 2001
From: Atream <80757050+Atream@users.noreply.github.com>
Date: Sat, 8 Nov 2025 20:56:09 +0800
Subject: [PATCH] Change attention backend to 'flashinfer' in launch command

Updated the launch command to include 'flashinfer' as the attention backend.
---
 doc/en/Kimi-K2-Thinking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/en/Kimi-K2-Thinking.md b/doc/en/Kimi-K2-Thinking.md
index d6e4cb4..59094d9 100644
--- a/doc/en/Kimi-K2-Thinking.md
+++ b/doc/en/Kimi-K2-Thinking.md
@@ -24,7 +24,7 @@ Download the AMX INT4 quantized weights from https://huggingface.co/KVCache-ai/K
 ## How to start
 
 ```
-python -m sglang.launch_server --host 0.0.0.0 --port 60000 --model path/to/Kimi-K2-Thinking/ --kt-amx-weight-path path/to/Kimi-K2-Instruct-CPU-weight/ --kt-cpuinfer 56 --kt-threadpool-count 2 --kt-num-gpu-experts 200 --kt-amx-method AMXINT4 --trust-remote-code --mem-fraction-static 0.98 --chunked-prefill-size 4096 --max-running-requests 37 --max-total-tokens 37000 --enable-mixed-chunk --tensor-parallel-size 8 --enable-p2p-check --disable-shared-experts-fusion
+python -m sglang.launch_server --host 0.0.0.0 --port 60000 --model path/to/Kimi-K2-Thinking/ --kt-amx-weight-path path/to/Kimi-K2-Instruct-CPU-weight/ --kt-cpuinfer 56 --kt-threadpool-count 2 --kt-num-gpu-experts 200 --kt-amx-method AMXINT4 --attention-backend flashinfer --trust-remote-code --mem-fraction-static 0.98 --chunked-prefill-size 4096 --max-running-requests 37 --max-total-tokens 37000 --enable-mixed-chunk --tensor-parallel-size 8 --enable-p2p-check --disable-shared-experts-fusion
 ```
 
 tips: