From b67cc4095d0a677427e1a8c8cbe2b9f0c97357ba Mon Sep 17 00:00:00 2001
From: Atream <80757050+Atream@users.noreply.github.com>
Date: Sat, 8 Nov 2025 20:56:09 +0800
Subject: [PATCH] Change attention backend to 'flashinfer' in launch command

Updated the launch command to include 'flashinfer' as the attention backend.
---
 doc/en/Kimi-K2-Thinking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/en/Kimi-K2-Thinking.md b/doc/en/Kimi-K2-Thinking.md
index d6e4cb4..59094d9 100644
--- a/doc/en/Kimi-K2-Thinking.md
+++ b/doc/en/Kimi-K2-Thinking.md
@@ -24,7 +24,7 @@ Download the AMX INT4 quantized weights from https://huggingface.co/KVCache-ai/K
 ## How to start
 
 ```
-python -m sglang.launch_server --host 0.0.0.0 --port 60000 --model path/to/Kimi-K2-Thinking/ --kt-amx-weight-path path/to/Kimi-K2-Instruct-CPU-weight/ --kt-cpuinfer 56 --kt-threadpool-count 2 --kt-num-gpu-experts 200 --kt-amx-method AMXINT4 --trust-remote-code --mem-fraction-static 0.98 --chunked-prefill-size 4096 --max-running-requests 37 --max-total-tokens 37000 --enable-mixed-chunk --tensor-parallel-size 8 --enable-p2p-check --disable-shared-experts-fusion
+python -m sglang.launch_server --host 0.0.0.0 --port 60000 --model path/to/Kimi-K2-Thinking/ --kt-amx-weight-path path/to/Kimi-K2-Instruct-CPU-weight/ --kt-cpuinfer 56 --kt-threadpool-count 2 --kt-num-gpu-experts 200 --kt-amx-method AMXINT4 --attention-backend flashinfer --trust-remote-code --mem-fraction-static 0.98 --chunked-prefill-size 4096 --max-running-requests 37 --max-total-tokens 37000 --enable-mixed-chunk --tensor-parallel-size 8 --enable-p2p-check --disable-shared-experts-fusion
 ```
 
 tips: