tabbyAPI/docs/10.-Tool-Calling.md

# Tool Calling in TabbyAPI

Tool calling is available for supported models, and enabled by selecting a tool format in the model
config. This can also be specifed per model using `tabby_config.yml`.

Most tool-calling models are also reasoning models and it is recommended to enable reasoning as
well, with appropriate reasoning tags (these cannot currently be inferred from the model's template).

```yml
model:
    reasoning: true
    reasoning_start_token: "<think>"
    reasoning_end_token: "</think>"
    tool_format: qwen3_5
```

### Supported formats
Below are the currently recognized formats

| tool_format   | Aliases           | Model types
|---------------|-------------------|---------------------------
| qwen3_coder   | qwen3_5, step3_5  | Qwen3-Coder<br> Qwen3-Next<br> Qwen3.5
| minimax_m2    |                   | Minimax-M2<br> Minimax-M2.1<br> Minimax-M2.5
| glm4_5        | glm4_6<br> glm4_7 | GLM4.5<br> GLM4.6<br> GLM4.7
| mistral_old ¹ |                   | (older Mistral-family models)
| mistral       |                   | Codestral 2508+<br> Devstral-Small 2507+<br> Magistral-Medium 2506+<br> Magistral-Small 2506+<br> Ministral-3 2512+<br> Mistral-Medium-3.1 2508+<br> Mistral-Small-3.2 2506+<br>
| gemma4        |                   | Gemma 4-it

**¹** Older Mistral models tend to have unreliable tool calling support and even newer ones are
often released without official chat templates or with templates that omit any tool formatting.
Tokenization also changes frequently between model releases. YMMV

# Clients

TabbyAPI should support any software that uses the OAI tool calling API. But the standard is
evolving, no two clients can agree on *exactly* what it looks like and models are trained with
different assumptions as well. Below will be collected notes pertaining to various client software
and how it relates to TabbyAPI's tool calling support.

### OpenCode

- OpenCode by default forces categorical sampling, overriding TabbyAPI's defaults with top-P = 1.0.
This confuses some models, so if you're experiencing *occasional* random gibberish in your output,
check your OpenCode config to make sure sampling is configured there, e.g.:

  ```json
  "agent": {
    "build": {
      "top_p": 0.8
    },
    "plan": {
      "top_p": 0.8
    }
  }
  ```

- OpenCode doesn't explicitly enable reasoning in the request by default. For some models this doesn't
matter. For others (e.g. Gemma4) you can configure `force_enable_thinking: true` in TabbyAPI.