sglang/docs_new/docs/basic_usage/kimi_k2_5.mdx
Mingyi a3291b5654 Add new Mintlify documentation site (docs_new/) (#23001)
2026-04-20 15:10:22 -07:00


---
title: "Kimi-K2.5 Usage"
metatags:
description: "Deploy Kimi-K2.5 with SGLang: 1T-parameter multimodal MoE model, 256K context, MLA attention, MoonViT vision encoder, thinking and instant modes, tool calling support."
---
[Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) is Moonshot AI's open-source, natively multimodal, agentic MoE model. It has 1T total parameters (32B active), a 256K context window, MLA attention, and a MoonViT vision encoder, and it supports both thinking and instant modes.
In SGLang, Kimi-K2.5 uses the `kimi_k2` reasoning parser and the `kimi_k2` tool-call parser so that thinking content and tool calls are parsed correctly.
<Note>
Kimi-K2.5 support is in SGLang main and will land in the next release. Use the latest main or a nightly image until then.
</Note>
Official deployment guide: [Kimi-K2.5 deployment guide](https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/docs/deploy_guidance)
## Install (Latest Main)
```bash Command
uv pip install "sglang @ git+https://github.com/sgl-project/sglang.git#subdirectory=python"
# Install the cuDNN build that matches your CUDA version (run only one).
# CUDA 12:
uv pip install "nvidia-cudnn-cu12==9.16.0.29"
# CUDA 13:
uv pip install "nvidia-cudnn-cu13==9.16.0.29"
```
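Before launching, it can help to confirm which build is actually installed in the active environment. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, not part of SGLang.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "sglang" is the distribution name used in the pip requirement above.
print("sglang:", installed_version("sglang") or "not installed")
```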
## Launch Kimi-K2.5 with SGLang
Example: a single node with 8 H200 GPUs, using tensor parallelism of 8 (`--tp 8`).
```bash Command
python3 -m sglang.launch_server \
  --model-path moonshotai/Kimi-K2.5 \
  --tp 8 \
  --trust-remote-code \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2
```
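Loading a checkpoint of this size can take a while, so a client script may want to wait until the server answers before sending requests. The sketch below polls the OpenAI-compatible `/v1/models` route. It assumes the server listens on `localhost:30000`, the address used in the curl examples on this page. `models_url` and `wait_until_ready` are hypothetical helper names, and the timeout values are arbitrary.

```python
import time
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from a server base URL."""
    return base_url.rstrip("/") + "/v1/models"


def wait_until_ready(
    base_url: str = "http://localhost:30000",  # assumed address
    timeout_s: float = 1800.0,                 # arbitrary upper bound
    interval_s: float = 5.0,
) -> bool:
    """Poll the server until it responds with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(models_url(base_url), timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: server not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_until_ready()` returns `True` once the route responds and `False` if the deadline passes first.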
### Parser Requirements
- `--tool-call-parser kimi_k2`: Required for tool calling.
- `--reasoning-parser kimi_k2`: Required to parse thinking content; thinking mode is enabled by default.
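With both parsers enabled, a chat completion response is expected to carry the thinking text, the final answer, and any tool calls in separate fields. The sketch below pulls them apart. It assumes the thinking text arrives in a `reasoning_content` field on the message and that tool calls follow the OpenAI `tool_calls` layout; check both against a real response from your server. `split_message` is a hypothetical helper, and the sample dictionary is hand-written for illustration, not real model output.

```python
def split_message(response: dict) -> dict:
    """Separate reasoning, answer text, and tool calls from the first choice."""
    message = response["choices"][0]["message"]
    return {
        "reasoning": message.get("reasoning_content"),  # assumed field name
        "answer": message.get("content"),
        "tool_calls": message.get("tool_calls") or [],
    }


# Hand-written sample in the assumed response shape (not real model output).
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "The user wants a one-sentence summary.",
                "content": "MoE models route each token to a few experts.",
                "tool_calls": None,
            }
        }
    ]
}

parts = split_message(sample)
print(parts["answer"])
```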
## Test the Deployment
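Thinking mode is enabled by default. To disable it (instant mode), set `chat_template_kwargs.thinking` to `false` in the request. With the OpenAI Python SDK, pass it via `extra_body={"chat_template_kwargs": {"thinking": False}}`.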
```bash Command
# Thinking mode (default)
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain mixture-of-experts in one sentence."}
    ],
    "max_tokens": 256
  }'
```
```bash Command
# Instant mode (thinking disabled)
# With raw HTTP, chat_template_kwargs is a top-level field of the JSON body;
# extra_body is only the OpenAI Python SDK's way of adding such fields.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.5",
    "messages": [
      {"role": "user", "content": "Give one sentence on MoE models."}
    ],
    "max_tokens": 128,
    "chat_template_kwargs": {"thinking": false}
  }'
```
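The same toggle can be driven from Python. The sketch below uses only the standard library and assumes the server reads `chat_template_kwargs` as a top-level JSON field, which is where the OpenAI Python SDK places the contents of `extra_body`. `build_chat_payload` and `chat` are hypothetical helper names, and the base URL mirrors the curl examples.

```python
import json
import urllib.request

MODEL = "moonshotai/Kimi-K2.5"


def build_chat_payload(prompt: str, thinking: bool = True, max_tokens: int = 256) -> dict:
    """Build a chat completion request body; thinking is on unless disabled."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if not thinking:
        # Instant mode: ask the chat template to skip the thinking phase.
        payload["chat_template_kwargs"] = {"thinking": False}
    return payload


def chat(prompt: str, thinking: bool = True,
         base_url: str = "http://localhost:30000") -> dict:
    """POST the payload to the chat completions route and return parsed JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt, thinking)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a running server, `chat("Give one sentence on MoE models.", thinking=False)` sends the instant-mode request and returns the response as a dictionary.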
## Multimodal Inputs (Image/Video)
Kimi-K2.5 is multimodal. In SGLang, image inputs are supported through the OpenAI-compatible vision API. For more details, see the `openai_api_vision.ipynb` notebook.
```bash Command
# Image input (SGLang)
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.5",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image."},
          {
            "type": "image_url",
            "image_url": {
              "url": "https://github.com/sgl-project/sglang/blob/main/examples/assets/example_image.png?raw=true"
            }
          }
        ]
      }
    ],
    "max_tokens": 256
  }'
```
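For images that live on disk rather than behind a URL, one common approach with OpenAI-style vision APIs is to embed the file as a base64 `data:` URL. The sketch below builds the same request body as the curl example for either kind of source. Whether your server accepts `data:` URLs is an assumption to verify. `image_url_part` and `build_image_payload` are hypothetical helper names.

```python
import base64
import mimetypes


def image_url_part(source: str) -> dict:
    """Return an image_url content part for a remote URL or a local file path."""
    if source.startswith(("http://", "https://", "data:")):
        url = source
    else:
        # Local file: embed the bytes as a base64 data URL (assumed to be accepted).
        mime = mimetypes.guess_type(source)[0] or "application/octet-stream"
        with open(source, "rb") as handle:
            encoded = base64.b64encode(handle.read()).decode("ascii")
        url = f"data:{mime};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}


def build_image_payload(prompt: str, image: str, max_tokens: int = 256) -> dict:
    """Build a chat completion body with one text part and one image part."""
    return {
        "model": "moonshotai/Kimi-K2.5",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    image_url_part(image),
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
```

The resulting dictionary can be serialized with `json.dumps` and posted to `/v1/chat/completions` in the same way as a text-only request.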
<Note>
Video chat is experimental and is currently supported only through the official Moonshot API.
</Note>