Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-04-24 08:29:29 +00:00)
Function calling support for Kimi-K2 (#628)
* Implement function calling / tools for ik_llama.cpp for Kimi K2
* Implement basic tool choice
* Backport llama.cpp tool calls support
* Enhance function calls with improved chat parser and string utilities
- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components
* Enhance function calling with unified streaming and parser improvements
- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation
* Replace hardcoded values in kimi_k2_parser.hpp with named constants
- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser
* Fix duplicate common_chat_parse definition
- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse
* Fix JSON assertion failure in function call parsing
- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures
* Add comprehensive Qwen3 XML tool calling support with unit tests
- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils
* Add DeepSeek R1 function calling support with comprehensive unit tests
- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility
Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
* Add partial parsing support for JSON and regex
- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality
* Add format_chat integration tests for Qwen3 tool injection
- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation
Tests confirm tool injection works correctly - conversational preamble
issue is not in ik_llama.cpp but likely in UI configuration.
* Fix Qwen3 tool call parsing - pass model name to parser
Server was not passing model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to Kimi-K2 parser and return tool calls
as content instead of proper tool_calls array.
* Fix non-streaming path to use model-specific parsing
Non-streaming responses were hardcoded to use Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead
of proper tool_calls array. Now uses same model detection as
streaming path for consistency.
Committed by GitHub
Parent: 0451f10a42
Commit: 3701fb1686

examples/server/function_calls.md (normal file, 209 lines)
@@ -0,0 +1,209 @@
# Function Calling Support

This document describes the function calling formats supported by the ik_llama.cpp server implementation.

## Overview

The server supports multiple native function calling formats, including Kimi-K2, Qwen3 (XML), and DeepSeek R1. All function calls are automatically detected and converted to OpenAI-compatible responses.

**⚠️ Model Requirements**: Function calling support is enabled only for the following model types:

- **Kimi-K2 models**: Models containing "kimi-k2" or "kimi_k2" in the model name
- **Qwen3 models**: Models containing "qwen3", "qwen-3", or "qwen_3" in the model name
- **DeepSeek R1 models**: Models containing "deepseek-r1", "deepseek_r1", or similar patterns

Other models will not have tool injection or function call parsing enabled.

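The name-based routing described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the `ToolFormat` enum and `detect_tool_format` function are hypothetical names, and case-insensitive matching is an assumption. Only the substrings listed explicitly above are checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical enum for illustration; not taken from the ik_llama.cpp sources.
enum class ToolFormat { None, KimiK2, Qwen3, DeepSeekR1 };

// Lowercase a copy of the name so matching is case-insensitive (an assumption).
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

static bool contains(const std::string & haystack, const char * needle) {
    return haystack.find(needle) != std::string::npos;
}

// Map a model name to a tool-call format using the documented substrings.
ToolFormat detect_tool_format(const std::string & model_name) {
    const std::string name = to_lower(model_name);
    if (contains(name, "kimi-k2") || contains(name, "kimi_k2")) {
        return ToolFormat::KimiK2;
    }
    if (contains(name, "qwen3") || contains(name, "qwen-3") || contains(name, "qwen_3")) {
        return ToolFormat::Qwen3;
    }
    if (contains(name, "deepseek-r1") || contains(name, "deepseek_r1")) {
        return ToolFormat::DeepSeekR1;
    }
    return ToolFormat::None;  // no tool injection, no function call parsing
}
```

A request whose model name matches none of the patterns falls through to `ToolFormat::None`, which corresponds to the "other models" case.
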
## Supported Formats

### Kimi-K2 Native Token Format

**Detection Pattern:** `<|tool_calls_section_begin|>...<|tool_calls_section_end|>`

**Structure:**
```
<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.{name}:{index}<|tool_call_argument_begin|>
{JSON arguments}
<|tool_call_end|>
<|tool_calls_section_end|>
```

**Example:**
```
<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.get_weather:0<|tool_call_argument_begin|>
{"location": "Tokyo"}
<|tool_call_end|>
<|tool_calls_section_end|>
```

**Notes:**
- Native Kimi-K2 token format
- Multiple function calls supported with different indices
- Arguments are JSON objects
- Function names follow `functions.{name}:{index}` pattern

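To make the token layout concrete, here is a minimal sketch of extracting calls from the Kimi-K2 format with plain string searches. The `ToolCall` struct and `parse_kimi_k2_tool_calls` function are hypothetical names for illustration; the real parser lives in `kimi_k2_parser.hpp` and may differ. The arguments are kept as raw JSON text and are not validated here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type for illustration.
struct ToolCall {
    std::string id;         // e.g. "functions.get_weather:0"
    std::string name;       // e.g. "get_weather"
    std::string arguments;  // raw JSON text, not validated in this sketch
};

static std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Returns an empty vector when section tokens are missing or the structure is malformed.
std::vector<ToolCall> parse_kimi_k2_tool_calls(const std::string & text) {
    static const std::string SECTION_BEGIN = "<|tool_calls_section_begin|>";
    static const std::string SECTION_END   = "<|tool_calls_section_end|>";
    static const std::string CALL_BEGIN    = "<|tool_call_begin|>";
    static const std::string ARG_BEGIN     = "<|tool_call_argument_begin|>";
    static const std::string CALL_END      = "<|tool_call_end|>";
    static const std::string PREFIX        = "functions.";

    std::vector<ToolCall> calls;
    const size_t sec_begin = text.find(SECTION_BEGIN);
    if (sec_begin == std::string::npos) return calls;
    const size_t sec_end = text.find(SECTION_END, sec_begin);
    if (sec_end == std::string::npos) return calls;

    size_t pos = sec_begin + SECTION_BEGIN.size();
    while (true) {
        const size_t cb = text.find(CALL_BEGIN, pos);
        if (cb == std::string::npos || cb >= sec_end) break;
        const size_t ab = text.find(ARG_BEGIN, cb);
        const size_t ce = text.find(CALL_END, cb);
        if (ab == std::string::npos || ce == std::string::npos || ab > ce || ce > sec_end) break;

        ToolCall call;
        call.id        = trim(text.substr(cb + CALL_BEGIN.size(), ab - cb - CALL_BEGIN.size()));
        call.arguments = trim(text.substr(ab + ARG_BEGIN.size(), ce - ab - ARG_BEGIN.size()));

        // Strip the "functions." prefix and the ":{index}" suffix to get the bare name.
        const size_t colon = call.id.rfind(':');
        if (call.id.compare(0, PREFIX.size(), PREFIX) == 0 &&
            colon != std::string::npos && colon > PREFIX.size()) {
            call.name = call.id.substr(PREFIX.size(), colon - PREFIX.size());
            calls.push_back(call);
        }
        pos = ce + CALL_END.size();
    }
    return calls;
}
```

Because each call carries its own index, several `<|tool_call_begin|>...<|tool_call_end|>` blocks inside one section produce several entries in the returned vector.
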
### XML-Style Format (Fallback)

**Detection Pattern:** `<tool_call>...<invoke name="...">...<parameter name="...">...</parameter>...</invoke></tool_call>`

**Structure:**
```xml
<tool_call>
<invoke name="{function_name}">
<parameter name="{param_name}">{param_value}</parameter>
<parameter name="{param_name}">{param_value}</parameter>
</invoke>
</tool_call>
```

**Example:**
```xml
<tool_call>
<invoke name="Write">
<parameter name="file_path">/path/to/file.txt</parameter>
<parameter name="content">File content here</parameter>
</invoke>
</tool_call>
```

**Notes:**
- XML-style format as fallback when model generates this format instead of token format
- Parameters are extracted as key-value pairs
- Automatically converted to JSON arguments

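The conversion from `<parameter>` elements to JSON arguments can be sketched like this. The names `XmlToolCall`, `read_between`, and `parse_xml_tool_call` are hypothetical, and emitting every parameter value as a JSON string is an assumption of this sketch; the actual parser may type values differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for illustration.
struct XmlToolCall {
    std::string name;
    std::string arguments;  // JSON object text built from the parameters
};

// Escape a value for use inside a JSON string literal.
static std::string json_escape(const std::string & s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Copy the text between `open` and `close`, searching from `pos`.
// On success, `pos` is advanced past `close`. An empty `open` matches at `pos`.
static bool read_between(const std::string & text, const std::string & open,
                         const std::string & close, size_t & pos, std::string & out) {
    const size_t b = text.find(open, pos);
    if (b == std::string::npos) return false;
    const size_t e = text.find(close, b + open.size());
    if (e == std::string::npos) return false;
    out = text.substr(b + open.size(), e - b - open.size());
    pos = e + close.size();
    return true;
}

// Parse the first <tool_call> block; returns false if the structure is incomplete.
bool parse_xml_tool_call(const std::string & text, XmlToolCall & call) {
    size_t pos = 0;
    std::string body;
    if (!read_between(text, "<tool_call>", "</tool_call>", pos, body)) return false;

    pos = 0;
    if (!read_between(body, "<invoke name=\"", "\">", pos, call.name)) return false;

    std::string json = "{";
    std::string key;
    std::string value;
    bool first = true;
    while (read_between(body, "<parameter name=\"", "\">", pos, key) &&
           read_between(body, "", "</parameter>", pos, value)) {
        if (!first) json += ", ";
        // Assumption: every value is emitted as a JSON string.
        json += "\"" + json_escape(key) + "\": \"" + json_escape(value) + "\"";
        first = false;
    }
    json += "}";
    call.arguments = json;
    return true;
}
```
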
### DeepSeek R1 Native Format

**Detection Pattern:** `<|tool▁calls▁begin|>...<|tool▁calls▁end|>`

**Structure:**
````
<|tool▁calls▁begin|>
<|tool▁call▁begin|>
function<|tool▁sep|>{function_name}
```json
{JSON arguments}
```
<|tool▁call▁end|>
<|tool▁calls▁end|>
````

**Example:**
````
<|tool▁calls▁begin|>
<|tool▁call▁begin|>
function<|tool▁sep|>get_weather
```json
{"location": "Tokyo"}
```
<|tool▁call▁end|>
<|tool▁calls▁end|>
````

**Notes:**
- Native DeepSeek R1 format ported from original llama.cpp
- Supports reasoning with `<think>...</think>` tags (automatically extracted)
- Multiple function calls supported with separate call blocks
- JSON arguments are contained within markdown code blocks

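As a rough illustration of the notes above, the sketch below pulls the reasoning text out of `<think>...</think>` and collects each call block. The types `R1ToolCall` and `R1Parsed` and the function `parse_deepseek_r1` are hypothetical names, and the token spellings are copied from this document as written; the implementation ported from llama.cpp may handle more variants.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result types for illustration.
struct R1ToolCall {
    std::string name;
    std::string arguments;  // raw JSON text from the fenced block
};

struct R1Parsed {
    std::string reasoning;  // text between <think> and </think>, if any
    std::vector<R1ToolCall> tool_calls;
};

static std::string trim_ws(const std::string & s) {
    const char * ws = " \t\r\n";
    const size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

R1Parsed parse_deepseek_r1(const std::string & text) {
    static const std::string THINK_OPEN  = "<think>";
    static const std::string THINK_CLOSE = "</think>";
    static const std::string CALLS_BEGIN = "<|tool▁calls▁begin|>";
    static const std::string CALLS_END   = "<|tool▁calls▁end|>";
    static const std::string CALL_BEGIN  = "<|tool▁call▁begin|>";
    static const std::string CALL_END    = "<|tool▁call▁end|>";
    static const std::string SEP         = "<|tool▁sep|>";
    // The markdown fence is built from three backticks at runtime.
    static const std::string FENCE       = std::string(3, '`');
    static const std::string JSON_OPEN   = FENCE + "json";
    const size_t npos = std::string::npos;

    R1Parsed out;
    const size_t to = text.find(THINK_OPEN);
    if (to != npos) {
        const size_t tc = text.find(THINK_CLOSE, to);
        if (tc != npos) {
            out.reasoning = trim_ws(text.substr(to + THINK_OPEN.size(), tc - to - THINK_OPEN.size()));
        }
    }

    const size_t sb = text.find(CALLS_BEGIN);
    if (sb == npos) return out;
    const size_t se = text.find(CALLS_END, sb);
    if (se == npos) return out;

    size_t pos = sb + CALLS_BEGIN.size();
    while (true) {
        const size_t cb = text.find(CALL_BEGIN, pos);
        if (cb == npos || cb >= se) break;
        const size_t ce = text.find(CALL_END, cb);
        if (ce == npos || ce > se) break;
        const size_t sep = text.find(SEP, cb);
        const size_t jo  = (sep == npos) ? npos : text.find(JSON_OPEN, sep);
        const size_t jc  = (jo == npos) ? npos : text.find(FENCE, jo + JSON_OPEN.size());
        if (jc == npos || jc > ce) break;  // malformed call block: stop parsing

        R1ToolCall call;
        call.name      = trim_ws(text.substr(sep + SEP.size(), jo - sep - SEP.size()));
        call.arguments = trim_ws(text.substr(jo + JSON_OPEN.size(), jc - jo - JSON_OPEN.size()));
        out.tool_calls.push_back(call);
        pos = ce + CALL_END.size();
    }
    return out;
}
```
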
## OpenAI-Compatible Output

The native format is converted to the standard OpenAI function calling response:

```json
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": "filtered_content_without_function_calls",
        "tool_calls": [
          {
            "id": "functions.get_weather:0",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo\"}"
            }
          }
        ]
      }
    }
  ]
}
```

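One detail of the response above is easy to miss: `arguments` is a JSON-encoded string, not a nested object. The sketch below builds a single `tool_calls` entry by hand to show that extra level of escaping. The helper names `json_quote` and `to_openai_tool_call` are hypothetical; a real server would more likely use a JSON library than manual string building.

```cpp
#include <cassert>
#include <string>

// Wrap a string in double quotes, escaping characters that JSON requires.
static std::string json_quote(const std::string & s) {
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}

// Build one entry of the "tool_calls" array. `arguments_json` is the raw JSON
// text produced by the model; it is embedded as a string, hence quoted again.
std::string to_openai_tool_call(const std::string & id,
                                const std::string & name,
                                const std::string & arguments_json) {
    return "{\"id\": " + json_quote(id) +
           ", \"type\": \"function\", \"function\": {\"name\": " + json_quote(name) +
           ", \"arguments\": " + json_quote(arguments_json) + "}}";
}
```

Clients therefore need to parse `arguments` a second time to obtain the actual parameter object.
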
## Implementation Details

### Content Filtering

When function calls are detected:
- Function call syntax is removed from content
- Tool calls are extracted into separate array
- Content is cleaned for display

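The filtering step can be illustrated for the Kimi-K2 token format as follows. This is a sketch under assumptions: `strip_tool_call_section` is a hypothetical name, only the section markers documented earlier are handled, and leaving an unterminated section untouched is a choice made for this example rather than confirmed server behavior.

```cpp
#include <cassert>
#include <string>

// Remove every complete tool-call section from the content and trim
// trailing whitespace so the remaining text is suitable for display.
std::string strip_tool_call_section(const std::string & content) {
    static const std::string BEGIN = "<|tool_calls_section_begin|>";
    static const std::string END   = "<|tool_calls_section_end|>";

    std::string out = content;
    size_t b;
    while ((b = out.find(BEGIN)) != std::string::npos) {
        const size_t e = out.find(END, b);
        if (e == std::string::npos) break;  // incomplete section: left as is (assumption)
        out.erase(b, e + END.size() - b);
    }

    const size_t last = out.find_last_not_of(" \t\r\n");
    out.erase(last == std::string::npos ? 0 : last + 1);
    return out;
}
```
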
### Error Handling

- Missing format tokens return an empty array
- A malformed structure returns an empty array
- Invalid JSON in arguments is handled gracefully by the parser

## Usage with Tools Parameter

To enable function calling, include the `tools` parameter in your request:

```json
{
  "model": "kimi-k2",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Tokyo?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```

## Model Compatibility

- **Kimi-K2 models**: Native support with token format
- **Qwen3 models**: Native support with XML format (Hermes-style)
- **DeepSeek R1 models**: Native support with reasoning and function call format (ported from original llama.cpp)
- **Other models**: No function calling support

## Testing

Test files are provided to verify function calling:
- `test-function-calls.cpp` - Unit tests for the native Kimi-K2 format
  - Tests native token format parsing
  - Tests multiple function calls
  - Tests error handling and malformed input

## File Structure

- `function_calls.hpp` - Parser implementation for native Kimi-K2 format
- `utils.hpp` - Integration with server (includes function_calls.hpp)
- `server.cpp` - Response formatting and content filtering