Files
ik_llama.cpp/examples/server/function_calls.md
Anton Sokolchenko f4051d9c3e Deepseek R1 function calls (more formats) (#652)
* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support

* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly - conversational preamble
issue is not in ik_llama.cpp but likely in UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

Server was not passing model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to Kimi-K2 parser and return tool calls
as content instead of proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to use Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead
of proper tool_calls array. Now uses same model detection as
streaming path for consistency.

* Update Qwen3 function call handling in server and tests

- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support

* Add DeepSeek-R1 function call parsing support

Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - <tool_call>function</think>Name\n```json\n{...}```</tool_call>

Key changes:
- Added parse_deepseek_r1_tools_array() following original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality

Resolves tool_calls_count=0 issue where DeepSeek-R1 models generated valid tool calls
but server failed to parse them correctly.

* Update function_calls.md documentation for DeepSeek-R1 Format 4

- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers

* Fix merge conflict in test-function-calls.cpp

- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing
2025-08-07 08:15:57 +03:00

10 KiB
Raw Permalink Blame History

Function Calling Support

This document describes the function calling format supported by the ik_llama.cpp server implementation.

Overview

The server supports multiple native function calling formats including Kimi-K2, Qwen3 (XML), and DeepSeek R1. All function calls are automatically detected and converted to OpenAI-compatible responses.

⚠️ Model Requirements: Function calling support is enabled for the following model types:

  • Kimi-K2 models: Models containing "kimi-k2" or "kimi_k2" in the model name
  • Qwen3 models: Models containing "qwen3", "qwen-3", or "qwen_3" in the model name
  • DeepSeek R1 models: Models containing "deepseek-r1", "deepseek_r1", or similar patterns

Other models will not have tool injection or function call parsing enabled.

Supported Formats

Kimi-K2 Native Token Format

Detection Pattern: <|tool_calls_section_begin|>...<|tool_calls_section_end|>

Structure:

<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.{name}:{index}<|tool_call_argument_begin|>
{JSON arguments}
<|tool_call_end|>
<|tool_calls_section_end|>

Example:

<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.get_weather:0<|tool_call_argument_begin|>
{"location": "Tokyo"}
<|tool_call_end|>
<|tool_calls_section_end|>

Notes:

  • Native Kimi-K2 token format
  • Multiple function calls supported with different indices
  • Arguments are JSON objects
  • Function names follow functions.{name}:{index} pattern

XML-Style Format (Fallback)

Detection Pattern: <tool_call>...<invoke name="...">...<parameter name="...">...</parameter>...</invoke></tool_call>

Structure:

<tool_call>
<invoke name="{function_name}">
<parameter name="{param_name}">{param_value}</parameter>
<parameter name="{param_name}">{param_value}</parameter>
</invoke>
</tool_call>

Example:

<tool_call>
<invoke name="Write">
<parameter name="file_path">/path/to/file.txt</parameter>
<parameter name="content">File content here</parameter>
</invoke>
</tool_call>

Notes:

  • XML-style format as fallback when model generates this format instead of token format
  • Parameters are extracted as key-value pairs
  • Automatically converted to JSON arguments

DeepSeek R1 Native Format

Detection Pattern: Multiple formats supported with automatic fallback

⚠️ Critical Implementation Note: DeepSeek R1 models generate different formats depending on context. The parser handles all variants automatically.

Format 1: Full Native Format (Primary)

Pattern: <tool▁calls▁begin>...<tool▁calls▁end>

<tool▁calls▁begin>
<tool▁call▁begin>
function<tool▁sep>{function_name}
```json
{JSON arguments}

<tool▁call▁end> <tool▁calls▁end>


#### Format 2: Simplified Format (Fallback)
**Pattern:** `function<{function_name}>`

function<get_weather>

{"location": "Tokyo"}

#### Format 3: Tools Array Format (New - July 2025)
**Pattern:** `function\n```json\n{"tools": [...]}`

function

{
  "tools": [
    {
      "name": "get_weather",
      "arguments": {
        "location": "Tokyo"
      }
    },
    {
      "name": "Read",
      "arguments": {
        "file_path": "/path/to/file.java"
      }
    }
  ]
}

#### Format 4: XML Wrapped Format (New - July 2025)
**Pattern:** `<tool_call>function</think>{function_name}\n```json\n{...}\n```</tool_call>`

<tool_call> functionRead

{
  "file_path": "/path/to/example.txt"
}

</tool_call>


**Notes:**
- XML wrapper contains function name after `function</think>`
- Single function call per XML block
- JSON arguments within ```json``` code blocks
- Handles reasoning text before function name

**Examples:**

Format 1 (Full):

<tool▁calls▁begin> <tool▁call▁begin> function<tool▁sep>get_weather

{"location": "Tokyo"}

<tool▁call▁end> <tool▁calls▁end>


Format 2 (Simplified):

function

{"file_path": "/path/to/file.txt"}

Format 3 (Tools Array):

function

{
  "tools": [
    {
      "name": "Read",
      "arguments": {
        "file_path": "/path/to/example/SystemProcessor.java"
      }
    },
    {
      "name": "Edit", 
      "arguments": {
        "file_path": "/path/to/file.java",
        "old_string": "old code",
        "new_string": "new code"
      }
    }
  ]
}

Format 4 (XML Wrapped):

<tool_call> functionCompleteTask

{
  "status": "completed"
}

</tool_call>


**Implementation Notes:**
- **Reasoning Support**: All formats support `<think>...</think>` reasoning tags (automatically extracted)
- **Multiple Tool Calls**: Format 1 & 2 use separate blocks, Format 3 uses array structure, Format 4 uses single XML block
- **Automatic Detection**: Parser tries formats in order: Format 3 → Format 4 → Format 1 → Format 2
- **Original llama.cpp Base**: Implementation follows original llama.cpp patterns exactly
- **Status**: All formats ✅ Working (July 2025 update)

## OpenAI-Compatible Output

The native format is converted to the standard OpenAI function calling response:

```json
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": "filtered_content_without_function_calls",
        "tool_calls": [
          {
            "id": "functions.get_weather:0",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo\"}"
            }
          }
        ]
      }
    }
  ]
}

Implementation Details

Content Filtering

When function calls are detected:

  • Function call syntax is removed from content
  • Tool calls are extracted into separate array
  • Content is cleaned for display

Error Handling

  • Missing tokens in format returns empty array
  • Malformed structure returns empty array
  • Parser gracefully handles invalid JSON in arguments

Usage with Tools Parameter

To enable function calling, include the tools parameter in your request:

{
  "model": "kimi-k2",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Tokyo?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Model Compatibility

  • Kimi-K2 models: Native support with token format
  • Qwen3 models: Native support with XML format (Hermes-style)
  • DeepSeek R1 models: Native support with reasoning and function call format (ported from original llama.cpp)
  • Other models: No function calling support

Testing

Comprehensive test suite for all supported formats:

Unit Tests

  • File: tests/test-function-calls.cpp
  • Coverage: All supported model formats (Kimi-K2, Qwen3, DeepSeek R1)
  • Test Types:
    • Native format parsing for each model type
    • Multiple function calls
    • Error handling and malformed input
    • Streaming and non-streaming responses
    • Content extraction and cleaning
    • OpenAI-compatible output generation

DeepSeek R1 Specific Tests

  • Format 1 Tests: Full native format with separators
  • Format 2 Tests: Simplified format without separators
  • Format 3 Tests: Tools array format (Fixed July 2025)
  • Format 4 Tests: XML wrapped format (Added July 2025)
  • Integration Tests: Server-to-parser call chain verification
  • Regression Tests: Ensure existing formats continue working

Running Tests

# Build tests
cd build && make test-function-calls -j$(nproc)

# Run all function call tests
./bin/test-function-calls

# Run DeepSeek R1 specific tests
./bin/test-function-calls | grep -E "(DeepSeek|tool_calls_count)"

# Check Format 3 specific issues
./bin/test-function-calls | grep -A5 -B5 "Real failing format"

Test Status

  • Kimi-K2: All tests passing
  • Qwen3 XML: All tests passing
  • DeepSeek R1 Format 1 & 2: All tests passing
  • DeepSeek R1 Format 3: All tests passing (Fixed July 2025)
  • DeepSeek R1 Format 4: All tests passing (Added July 2025)

File Structure

Server Integration

  • examples/server/server.cpp - Main server entry point, calls parse_chat_message_incremental()
  • examples/server/function_calls.hpp - Server-side parser creation and integration
  • examples/server/utils.hpp - Server utilities (includes function_calls.hpp)

Core Parsing Engine

  • common/chat-parser.cpp - Main parser routing, delegates to model-specific parsers
  • common/chat-parser.h - Parser interface and JSON parsing infrastructure
  • common/chat.cpp - Model-specific parsing implementations:
    • common_chat_parse_kimi_k2() - Kimi-K2 native format
    • common_chat_parse_qwen3() - Qwen3 XML format
    • common_chat_parse_deepseek_r1() - DeepSeek R1 multiple formats
    • parse_deepseek_r1_tools_array() - Format 3 tools array parser
    • parse_deepseek_r1_xml_wrapped() - Format 4 XML wrapper parser
  • common/chat.h - Function declarations and model detection

Testing

  • tests/test-function-calls.cpp - Comprehensive unit tests for all formats
  • tests/get-model.cpp - Test utilities for model loading

Integration Flow

server.cpp:2832
  ↓ parse_chat_message_incremental(generated_text, false, modelname)
function_calls.hpp:94-95  
  ↓ common_chat_msg_parser.parse()
chat-parser.cpp:140
  ↓ model detection → specific parser
chat.cpp
  ↓ common_chat_parse_deepseek_r1() / kimi_k2() / qwen3()
  ↓ Format detection → regex matching → JSON parsing → tool_calls array

Key Implementation Files

  • DeepSeek R1 Format 3: common/chat.cpp:291-332 (parse_deepseek_r1_tools_array)
  • DeepSeek R1 Format 4: common/chat.cpp:335-374 (parse_deepseek_r1_xml_wrapped)
  • Exception handling: common/chat.cpp:222-289 (Format 3 → 4 → 1 → 2 fallback chain)
  • Model detection: common/chat.cpp (is_deepseek_r1_model, is_qwen3_model, etc.)
  • Comprehensive tests: tests/test-function-calls.cpp (All formats with TDD coverage)