ik_llama.cpp/examples/server/function_calls.md
Anton Sokolchenko f4051d9c3e Deepseek R1 function calls (more formats) (#652)
* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support

* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly - conversational preamble
issue is not in ik_llama.cpp but likely in UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

Server was not passing model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to Kimi-K2 parser and return tool calls
as content instead of proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to use Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead
of proper tool_calls array. Now uses same model detection as
streaming path for consistency.

* Update Qwen3 function call handling in server and tests

- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support

* Add DeepSeek-R1 function call parsing support

Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - <tool_call>function</think>Name\n```json\n{...}```</tool_call>

Key changes:
- Added parse_deepseek_r1_tools_array() following original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality

Resolves tool_calls_count=0 issue where DeepSeek-R1 models generated valid tool calls
but server failed to parse them correctly.

* Update function_calls.md documentation for DeepSeek-R1 Format 4

- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers

* Fix merge conflict in test-function-calls.cpp

- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing
2025-08-07 08:15:57 +03:00


# Function Calling Support
This document describes the function calling formats supported by the ik_llama.cpp server implementation.
## Overview
The server supports multiple native function calling formats including Kimi-K2, Qwen3 (XML), and DeepSeek R1. All function calls are automatically detected and converted to OpenAI-compatible responses.
**⚠️ Model Requirements**: Function calling support is enabled for the following model types:
- **Kimi-K2 models**: Models containing "kimi-k2" or "kimi_k2" in the model name
- **Qwen3 models**: Models containing "qwen3", "qwen-3", or "qwen_3" in the model name
- **DeepSeek R1 models**: Models containing "deepseek-r1", "deepseek_r1", or similar patterns

Other models will not have tool injection or function call parsing enabled.
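The name-based gating above can be sketched in a few lines. This is an illustrative Python sketch, not the server's C++ code: `detect_tool_format` and `MODEL_PATTERNS` are hypothetical names, case-insensitive matching is an assumption, and only the substrings listed above are checked (the "similar patterns" for DeepSeek R1 are not specified here, so they are left out).

```python
from typing import Optional

# Substrings copied from the list above. Lowercasing the model name before
# matching is an assumption made for this sketch.
MODEL_PATTERNS = {
    "kimi-k2": ("kimi-k2", "kimi_k2"),
    "qwen3": ("qwen3", "qwen-3", "qwen_3"),
    "deepseek-r1": ("deepseek-r1", "deepseek_r1"),
}


def detect_tool_format(model_name: str) -> Optional[str]:
    """Return the tool-call format family for a model name, or None."""
    name = model_name.lower()
    for family, needles in MODEL_PATTERNS.items():
        if any(needle in name for needle in needles):
            return family
    return None  # no tool injection or parsing for other models
```

A caller would skip tool injection and tool-call parsing whenever the function returns `None`.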
## Supported Formats
### Kimi-K2 Native Token Format
**Detection Pattern:** `<|tool_calls_section_begin|>...<|tool_calls_section_end|>`
**Structure:**
```
<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.{name}:{index}<|tool_call_argument_begin|>
{JSON arguments}
<|tool_call_end|>
<|tool_calls_section_end|>
```
**Example:**
```
<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.get_weather:0<|tool_call_argument_begin|>
{"location": "Tokyo"}
<|tool_call_end|>
<|tool_calls_section_end|>
```
**Notes:**
- Native Kimi-K2 token format
- Multiple function calls supported with different indices
- Arguments are JSON objects
- Function names follow `functions.{name}:{index}` pattern
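To make the token layout concrete, here is a minimal Python sketch that extracts calls from the structure above with regular expressions. It is not the project's parser: `parse_kimi_k2`, `SECTION_RE`, and `CALL_RE` are hypothetical names, and returning an empty list on invalid JSON is an assumption of this sketch.

```python
import json
import re

SECTION_RE = re.compile(
    r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>",
    re.DOTALL,
)
CALL_RE = re.compile(
    r"<\|tool_call_begin\|>\s*(functions\.([\w.-]+):(\d+))\s*"
    r"<\|tool_call_argument_begin\|>\s*(.*?)\s*<\|tool_call_end\|>",
    re.DOTALL,
)


def parse_kimi_k2(text):
    """Return OpenAI-style tool calls found in a Kimi-K2 token section."""
    section = SECTION_RE.search(text)
    if section is None:
        return []  # section markers missing: nothing to extract
    calls = []
    for call_id, name, _index, raw_args in CALL_RE.findall(section.group(1)):
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError:
            return []  # assumption: treat invalid JSON as "no tool calls"
        calls.append({
            "id": call_id,
            "type": "function",
            # OpenAI-style responses carry arguments as a JSON string.
            "function": {"name": name, "arguments": json.dumps(args)},
        })
    return calls
```

Applied to the example above, the sketch yields one call with id `functions.get_weather:0` and name `get_weather`.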
### XML-Style Format (Fallback)
**Detection Pattern:** `<tool_call>...<invoke name="...">...<parameter name="...">...</parameter>...</invoke></tool_call>`
**Structure:**
```xml
<tool_call>
<invoke name="{function_name}">
<parameter name="{param_name}">{param_value}</parameter>
<parameter name="{param_name}">{param_value}</parameter>
</invoke>
</tool_call>
```
**Example:**
```xml
<tool_call>
<invoke name="Write">
<parameter name="file_path">/path/to/file.txt</parameter>
<parameter name="content">File content here</parameter>
</invoke>
</tool_call>
```
**Notes:**
- Used as a fallback when the model generates XML instead of the native token format
- Parameters are extracted as key-value pairs
- Automatically converted to JSON arguments
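The conversion from `<parameter>` elements to JSON arguments might look roughly like the following Python sketch. The name `parse_xml_tool_calls` is hypothetical, and keeping every parameter value as a plain string (no type coercion) is an assumption; the actual parser may treat values differently.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
INVOKE_RE = re.compile(r'<invoke name="([^"]+)">(.*?)</invoke>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter name="([^"]+)">(.*?)</parameter>', re.DOTALL)


def parse_xml_tool_calls(text):
    """Collect (name, JSON arguments) pairs from XML-style tool calls."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in INVOKE_RE.findall(block):
            # Each <parameter> becomes one key-value pair; values stay strings.
            args = {key: value for key, value in PARAM_RE.findall(body)}
            calls.append({"name": name, "arguments": json.dumps(args)})
    return calls
```

For the `Write` example above, this produces a single call whose arguments decode to a dictionary with `file_path` and `content` keys.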
### DeepSeek R1 Native Format
**Detection Pattern:** Multiple formats supported with automatic fallback
**⚠️ Critical Implementation Note:** DeepSeek R1 models generate different formats depending on context. The parser handles all variants automatically.
#### Format 1: Full Native Format (Primary)
**Pattern:** `<tool▁calls▁begin>...<tool▁calls▁end>`
````
<tool▁calls▁begin>
<tool▁call▁begin>
function<tool▁sep>{function_name}
```json
{JSON arguments}
```
<tool▁call▁end>
<tool▁calls▁end>
````
#### Format 2: Simplified Format (Fallback)
**Pattern:** `function<{function_name}>`
````
function<get_weather>
```json
{"location": "Tokyo"}
```
````
#### Format 3: Tools Array Format (New - July 2025)
**Pattern:** `function\n```json\n{"tools": [...]}`
````
function
```json
{
  "tools": [
    {
      "name": "get_weather",
      "arguments": {
        "location": "Tokyo"
      }
    },
    {
      "name": "Read",
      "arguments": {
        "file_path": "/path/to/file.java"
      }
    }
  ]
}
```
````
#### Format 4: XML Wrapped Format (New - July 2025)
**Pattern:** `<tool_call>function</think>{function_name}\n```json\n{...}\n```</tool_call>`
````
<tool_call>
function</think>Read
```json
{
  "file_path": "/path/to/example.txt"
}
```
</tool_call>
````
**Notes:**
- XML wrapper contains function name after `function</think>`
- Single function call per XML block
- JSON arguments are enclosed in a fenced `json` code block
- Handles reasoning text before function name
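As a rough illustration of Format 4, the Python sketch below pulls the function name and JSON arguments out of one XML-wrapped block. `parse_xml_wrapped` is a hypothetical name and is not the repository's `parse_deepseek_r1_xml_wrapped()`; the regular expression is inferred from the pattern shown above.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this listing holds no literal code fence

XML_WRAPPED_RE = re.compile(
    r"<tool_call>\s*function</think>\s*([^\n`]+?)\s*"
    + FENCE + r"json\s*(.*?)\s*" + FENCE + r"\s*</tool_call>",
    re.DOTALL,
)


def parse_xml_wrapped(text):
    """Return {'name', 'arguments'} for one Format 4 block, or None."""
    match = XML_WRAPPED_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2)
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return None  # assumption: invalid JSON means "no call" in this sketch
    return {"name": name, "arguments": json.dumps(args)}
```

The sketch handles a single block, matching the one-call-per-block note above.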

**Examples:**

Format 1 (Full):
````
<tool▁calls▁begin>
<tool▁call▁begin>
function<tool▁sep>get_weather
```json
{"location": "Tokyo"}
```
<tool▁call▁end>
<tool▁calls▁end>
````
Format 2 (Simplified):
````
function<Read>
```json
{"file_path": "/path/to/file.txt"}
```
````
Format 3 (Tools Array):
````
function
```json
{
  "tools": [
    {
      "name": "Read",
      "arguments": {
        "file_path": "/path/to/example/SystemProcessor.java"
      }
    },
    {
      "name": "Edit",
      "arguments": {
        "file_path": "/path/to/file.java",
        "old_string": "old code",
        "new_string": "new code"
      }
    }
  ]
}
```
````
Format 4 (XML Wrapped):
````
<tool_call>
function</think>CompleteTask
```json
{
  "status": "completed"
}
```
</tool_call>
````
**Implementation Notes:**
- **Reasoning Support**: All formats support `<think>...</think>` reasoning tags (automatically extracted)
- **Multiple Tool Calls**: Format 1 & 2 use separate blocks, Format 3 uses array structure, Format 4 uses single XML block
- **Automatic Detection**: Parser tries formats in order: Format 3 → Format 4 → Format 1 → Format 2
- **Original llama.cpp Base**: Implementation follows original llama.cpp patterns exactly
- **Status**: All formats ✅ Working (July 2025 update)
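The detection order and reasoning extraction described in these notes can be sketched as an ordered list of parsers, where the first one that yields calls wins. This Python sketch is illustrative only: every name in it is hypothetical, the regular expressions are inferred from the patterns in this document (marker spellings are copied as they appear here), and the real implementation in `common/chat.cpp` uses an exception-handling chain rather than this loop.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this listing holds no literal code fence
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def _json_block(prefix):
    """Compile prefix + fenced json block; the last group is the JSON text."""
    return re.compile(
        prefix + r"\s*" + FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL)


FORMAT_3 = _json_block(r"function")
FORMAT_4 = _json_block(r"<tool_call>\s*function</think>\s*([^\n`]+?)")
FORMAT_1 = _json_block(r"<tool▁call▁begin>\s*function<tool▁sep>\s*([^\n`]+?)")
FORMAT_2 = _json_block(r"function<([^>\n]+)>")


def _parse_tools_array(body):
    """Format 3: one JSON object holding a 'tools' array."""
    match = FORMAT_3.search(body)
    if match is None:
        return []
    data = json.loads(match.group(1))
    return [{"name": tool["name"],
             "arguments": json.dumps(tool.get("arguments", {}))}
            for tool in data["tools"]]


def _parse_named(pattern):
    """Formats 1, 2 and 4: a function name followed by a JSON block."""
    def parse(body):
        return [{"name": name.strip(),
                 "arguments": json.dumps(json.loads(raw))}
                for name, raw in pattern.findall(body)]
    return parse


# Order taken from the notes above: Format 3, then 4, then 1, then 2.
PARSERS = [_parse_tools_array, _parse_named(FORMAT_4),
           _parse_named(FORMAT_1), _parse_named(FORMAT_2)]


def parse_deepseek_r1(text):
    """Return (reasoning, tool_calls) using the first parser that matches."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    body = THINK_RE.sub("", text)
    for parser in PARSERS:
        try:
            calls = parser(body)
        except (ValueError, KeyError, TypeError):
            calls = []  # malformed input: fall through to the next format
        if calls:
            return reasoning, calls
    return reasoning, []
```

Keeping the parsers in a list makes the fallback order explicit and easy to reorder. The sketch does not cover streaming or partially generated output.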
## OpenAI-Compatible Output
The native format is converted to the standard OpenAI function calling response:
```json
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": "filtered_content_without_function_calls",
        "tool_calls": [
          {
            "id": "functions.get_weather:0",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo\"}"
            }
          }
        ]
      }
    }
  ]
}
```
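One detail of this response shape is easy to miss: `arguments` is a JSON-encoded string, not a nested object. The Python sketch below assembles one `choices` entry from already-parsed calls. `to_openai_choice` is a hypothetical helper, and using `"stop"` as the finish reason when no calls are present is an assumption based on the usual OpenAI convention, not something this document states.

```python
import json


def to_openai_choice(content, calls):
    """Build one `choices` entry.

    `calls` is a list of (call_id, name, arguments_dict) tuples.
    """
    message = {"role": "assistant", "content": content}
    if calls:
        message["tool_calls"] = [
            {
                "id": call_id,
                "type": "function",
                # Arguments are serialized to a string, as in the JSON above.
                "function": {"name": name, "arguments": json.dumps(args)},
            }
            for call_id, name, args in calls
        ]
    return {
        "finish_reason": "tool_calls" if calls else "stop",
        "message": message,
    }
```

A client reading such a response therefore needs a second JSON decode on `function.arguments` before using the values.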
## Implementation Details
### Content Filtering
When function calls are detected:
- Function call syntax is removed from content
- Tool calls are extracted into separate array
- Content is cleaned for display
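A minimal Python sketch of the filtering step, assuming complete (non-streaming) output: the tool-call syntax is removed and the remaining text is trimmed. `clean_content` and `TOOL_SYNTAX` are hypothetical names, and only two of the formats from this document are covered.

```python
import re

# Assumption: these two patterns cover the Kimi-K2 token section and the
# <tool_call> wrapper shown in this document; other formats are omitted.
TOOL_SYNTAX = [
    re.compile(
        r"<\|tool_calls_section_begin\|>.*?<\|tool_calls_section_end\|>",
        re.DOTALL,
    ),
    re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL),
]


def clean_content(text: str) -> str:
    """Strip tool-call syntax so only displayable content remains."""
    for pattern in TOOL_SYNTAX:
        text = pattern.sub("", text)
    return text.strip()
```

Streaming output would also need to hold back partially generated markers, which this sketch does not attempt.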
### Error Handling
- If format tokens are missing, the parser returns an empty array
- A malformed structure also returns an empty array
- Invalid JSON in arguments is handled without crashing the parser
## Usage with Tools Parameter
To enable function calling, include the `tools` parameter in your request:
```json
{
  "model": "kimi-k2",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Tokyo?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```
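For completeness, here is a hedged Python sketch of a client that builds this request with the standard library. The `/v1/chat/completions` path and the base URL in the usage note are assumptions about a typical OpenAI-compatible deployment; adjust them to match your server. `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text, tools):
    """Return a POST request carrying `tools` in an OpenAI-style body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
    }
    return urllib.request.Request(
        # Assumed endpoint path for an OpenAI-compatible server.
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen()` (for example with a base URL such as `http://localhost:8080`, if that is where the server listens), decode the body with `json.load`, and read `choices[0].message.tool_calls`.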
## Model Compatibility
- **Kimi-K2 models**: Native support with token format
- **Qwen3 models**: Native support with XML format (Hermes-style)
- **DeepSeek R1 models**: Native support with reasoning and function call format (ported from original llama.cpp)
- **Other models**: No function calling support
## Testing
The test suite covers all supported formats:
### Unit Tests
- **File**: `tests/test-function-calls.cpp`
- **Coverage**: All supported model formats (Kimi-K2, Qwen3, DeepSeek R1)
- **Test Types**:
  - Native format parsing for each model type
  - Multiple function calls
  - Error handling and malformed input
  - Streaming and non-streaming responses
  - Content extraction and cleaning
  - OpenAI-compatible output generation
### DeepSeek R1 Specific Tests
- **Format 1 Tests**: Full native format with separators ✅
- **Format 2 Tests**: Simplified format without separators ✅
- **Format 3 Tests**: Tools array format ✅ (Fixed July 2025)
- **Format 4 Tests**: XML wrapped format ✅ (Added July 2025)
- **Integration Tests**: Server-to-parser call chain verification
- **Regression Tests**: Ensure existing formats continue working
### Running Tests
```bash
# Build tests
cd build && make test-function-calls -j$(nproc)
# Run all function call tests
./bin/test-function-calls
# Run DeepSeek R1 specific tests
./bin/test-function-calls | grep -E "(DeepSeek|tool_calls_count)"
# Check Format 3 specific issues
./bin/test-function-calls | grep -A5 -B5 "Real failing format"
```
### Test Status
- **Kimi-K2**: ✅ All tests passing
- **Qwen3 XML**: ✅ All tests passing
- **DeepSeek R1 Format 1 & 2**: ✅ All tests passing
- **DeepSeek R1 Format 3**: ✅ All tests passing (Fixed July 2025)
- **DeepSeek R1 Format 4**: ✅ All tests passing (Added July 2025)
## File Structure
### Server Integration
- **`examples/server/server.cpp`** - Main server entry point, calls `parse_chat_message_incremental()`
- **`examples/server/function_calls.hpp`** - Server-side parser creation and integration
- **`examples/server/utils.hpp`** - Server utilities (includes function_calls.hpp)
### Core Parsing Engine
- **`common/chat-parser.cpp`** - Main parser routing, delegates to model-specific parsers
- **`common/chat-parser.h`** - Parser interface and JSON parsing infrastructure
- **`common/chat.cpp`** - Model-specific parsing implementations:
  - `common_chat_parse_kimi_k2()` - Kimi-K2 native format
  - `common_chat_parse_qwen3()` - Qwen3 XML format
  - `common_chat_parse_deepseek_r1()` - DeepSeek R1 multiple formats
  - `parse_deepseek_r1_tools_array()` - Format 3 tools array parser
  - `parse_deepseek_r1_xml_wrapped()` - Format 4 XML wrapper parser
- **`common/chat.h`** - Function declarations and model detection
### Testing
- **`tests/test-function-calls.cpp`** - Comprehensive unit tests for all formats
- **`tests/get-model.cpp`** - Test utilities for model loading
### Integration Flow
```
server.cpp:2832
↓ parse_chat_message_incremental(generated_text, false, modelname)
function_calls.hpp:94-95
↓ common_chat_msg_parser.parse()
chat-parser.cpp:140
↓ model detection → specific parser
chat.cpp
↓ common_chat_parse_deepseek_r1() / kimi_k2() / qwen3()
↓ Format detection → regex matching → JSON parsing → tool_calls array
```
### Key Implementation Files
- **DeepSeek R1 Format 3**: `common/chat.cpp:291-332` (`parse_deepseek_r1_tools_array`)
- **DeepSeek R1 Format 4**: `common/chat.cpp:335-374` (`parse_deepseek_r1_xml_wrapped`)
- **Exception handling**: `common/chat.cpp:222-289` (Format 3 → 4 → 1 → 2 fallback chain)
- **Model detection**: `common/chat.cpp` (`is_deepseek_r1_model`, `is_qwen3_model`, etc.)
- **Comprehensive tests**: `tests/test-function-calls.cpp` (All formats with TDD coverage)