Function calling support for Kimi-K2 (#628)

* Implement function calling / tools support for Kimi K2 in ik_llama.cpp

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup so function-call syntax does not leak into output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout the parser (see the sketch below)
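
A minimal sketch of the pattern, assuming C++17 `std::string_view`; the exact constant names in `kimi_k2_parser.hpp` may differ:

```cpp
#include <cstddef>
#include <string_view>

namespace kimi_k2 {
// Token-format markers as compile-time constants (illustrative names).
inline constexpr std::string_view TOOL_CALLS_SECTION_BEGIN = "<|tool_calls_section_begin|>";
inline constexpr std::string_view TOOL_CALLS_SECTION_END   = "<|tool_calls_section_end|>";
inline constexpr std::string_view TOOL_CALL_BEGIN          = "<|tool_call_begin|>";
inline constexpr std::string_view TOOL_CALL_ARGUMENT_BEGIN = "<|tool_call_argument_begin|>";
inline constexpr std::string_view TOOL_CALL_END            = "<|tool_call_end|>";

// string_view::size() is constexpr, so marker lengths are computed at
// compile time instead of being counted by hand.
inline constexpr std::size_t SECTION_BEGIN_LEN = TOOL_CALLS_SECTION_BEGIN.size();
} // namespace kimi_k2
```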

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that the 'function' field is an object before accessing nested keys
- Handle a missing 'arguments' field gracefully with default "{}"
- Prevents crashes when parsing malformed tool call JSON structures (see the sketch below)
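
The fix boils down to shape-checking the JSON before reading nested keys. A minimal sketch, assuming nlohmann::json (which llama.cpp uses); the function name is illustrative:

```cpp
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// Returns false instead of asserting when a tool call is malformed.
static bool extract_tool_call(const json & tc, std::string & name, std::string & args) {
    // 'function' must exist and be an object before nested keys are read
    if (!tc.contains("function") || !tc["function"].is_object()) {
        return false;
    }
    const json & fn = tc["function"];
    if (!fn.contains("name") || !fn["name"].is_string()) {
        return false;
    }
    name = fn["name"].get<std::string>();
    // a missing 'arguments' field degrades to an empty JSON object
    args = fn.contains("arguments") ? fn["arguments"].dump() : "{}";
    return true;
}
```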

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format (see the sketch after this list)
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils

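A minimal sketch of the Qwen3 extraction step (illustrative, assuming nlohmann::json; the real parser also handles streamed, partial blocks):

```cpp
#include <nlohmann/json.hpp>
#include <string>
#include <vector>

using json = nlohmann::json;

// Extract every <tool_call>{...}</tool_call> block and parse its JSON body.
static std::vector<json> parse_qwen3_tool_calls(const std::string & text) {
    std::vector<json> calls;
    const std::string open  = "<tool_call>";
    const std::string close = "</tool_call>";
    size_t pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        const size_t start = pos + open.size();
        const size_t end   = text.find(close, start);
        if (end == std::string::npos) break; // unterminated block, stop
        json call = json::parse(text.substr(start, end - start),
                                /*cb=*/nullptr, /*allow_exceptions=*/false);
        if (call.is_object() && call.contains("name")) {
            calls.push_back(std::move(call)); // {"name": ..., "arguments": {...}}
        }
        pos = end + close.size();
    }
    return calls;
}
```
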
* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support

* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality
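
The core idea of json-partial, sketched below as a simplified illustration (not the actual API): track unclosed braces and brackets outside string literals, then append the missing closers so a truncated JSON prefix becomes parseable.

```cpp
#include <string>

// Append the closers a truncated JSON prefix is missing. Simplified: a
// real implementation must also handle dangling keys and partial literals.
static std::string heal_partial_json(const std::string & s) {
    std::string stack;           // open '{' / '[' in order of appearance
    bool in_str = false, esc = false;
    for (char c : s) {
        if (esc)    { esc = false; continue; }
        if (in_str) {
            if (c == '\\')      esc = true;
            else if (c == '"')  in_str = false;
            continue;
        }
        if (c == '"')                  in_str = true;
        else if (c == '{' || c == '[') stack += c;
        else if (c == '}' || c == ']') { if (!stack.empty()) stack.pop_back(); }
    }
    std::string out = s;
    if (in_str) out += '"';      // close an unterminated string
    for (auto it = stack.rbegin(); it != stack.rend(); ++it) {
        out += (*it == '{') ? '}' : ']';
    }
    return out;
}
```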

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly; the conversational
preamble issue is not in ik_llama.cpp but likely in the UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

The server was not passing the model name to
parse_chat_message_incremental(), causing Qwen3 to fall back to the
Kimi-K2 parser and return tool calls as content instead of a proper
tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to the Kimi-K2 format, causing
Qwen3 XML tool calls to be returned as content instead of a proper
tool_calls array. The non-streaming path now uses the same model
detection as the streaming path for consistency.
Commit 3701fb1686 (parent 0451f10a42)
Author: Anton Sokolchenko
Date: 2025-07-23 18:11:42 +02:00 (committed via GitHub)
26 changed files with 6978 additions and 9 deletions


@@ -0,0 +1,209 @@
# Function Calling Support
This document describes the function calling format supported by the ik_llama.cpp server implementation.
## Overview
The server supports multiple native function calling formats including Kimi-K2, Qwen3 (XML), and DeepSeek R1. All function calls are automatically detected and converted to OpenAI-compatible responses.
**⚠️ Model Requirements**: Function calling support is enabled for the following model types:
- **Kimi-K2 models**: Models containing "kimi-k2" or "kimi_k2" in the model name
- **Qwen3 models**: Models containing "qwen3", "qwen-3", or "qwen_3" in the model name
- **DeepSeek R1 models**: Models containing "deepseek-r1", "deepseek_r1", or similar patterns
Other models will not have tool injection or function call parsing enabled.
## Supported Formats
### Kimi-K2 Native Token Format
**Detection Pattern:** `<|tool_calls_section_begin|>...<|tool_calls_section_end|>`
**Structure:**
```
<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.{name}:{index}<|tool_call_argument_begin|>
{JSON arguments}
<|tool_call_end|>
<|tool_calls_section_end|>
```
**Example:**
```
<|tool_calls_section_begin|>
<|tool_call_begin|>
functions.get_weather:0<|tool_call_argument_begin|>
{"location": "Tokyo"}
<|tool_call_end|>
<|tool_calls_section_end|>
```
**Notes:**
- Native Kimi-K2 token format
- Multiple function calls supported with different indices
- Arguments are JSON objects
- Function names follow `functions.{name}:{index}` pattern
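A minimal sketch of how the token format can be split into calls (illustrative C++, not the exact parser code):

```cpp
#include <string>
#include <vector>

struct tool_call { std::string name_and_index; std::string json_args; };

// Extract "functions.{name}:{index}" and the JSON argument payload from
// each <|tool_call_begin|>...<|tool_call_end|> block (whitespace trimming omitted).
static std::vector<tool_call> parse_kimi_k2(const std::string & text) {
    std::vector<tool_call> out;
    const std::string call_begin = "<|tool_call_begin|>";
    const std::string arg_begin  = "<|tool_call_argument_begin|>";
    const std::string call_end   = "<|tool_call_end|>";
    size_t pos = text.find("<|tool_calls_section_begin|>");
    if (pos == std::string::npos) return out; // no section: empty result
    while ((pos = text.find(call_begin, pos)) != std::string::npos) {
        const size_t name_start = pos + call_begin.size();
        const size_t args_mark  = text.find(arg_begin, name_start);
        const size_t end        = text.find(call_end, name_start);
        if (args_mark == std::string::npos || end == std::string::npos || args_mark > end) break;
        tool_call tc;
        tc.name_and_index = text.substr(name_start, args_mark - name_start);
        tc.json_args      = text.substr(args_mark + arg_begin.size(),
                                        end - (args_mark + arg_begin.size()));
        out.push_back(std::move(tc));
        pos = end + call_end.size();
    }
    return out;
}
```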
### XML-Style Format (Fallback)
**Detection Pattern:** `<tool_call>...<invoke name="...">...<parameter name="...">...</parameter>...</invoke></tool_call>`
**Structure:**
```xml
<tool_call>
<invoke name="{function_name}">
<parameter name="{param_name}">{param_value}</parameter>
<parameter name="{param_name}">{param_value}</parameter>
</invoke>
</tool_call>
```
**Example:**
```xml
<tool_call>
<invoke name="Write">
<parameter name="file_path">/path/to/file.txt</parameter>
<parameter name="content">File content here</parameter>
</invoke>
</tool_call>
```
**Notes:**
- XML-style format as fallback when model generates this format instead of token format
- Parameters are extracted as key-value pairs
- Automatically converted to JSON arguments
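### Qwen3 XML Format
**Detection Pattern:** `<tool_call>...</tool_call>` wrapping a JSON object
**Structure:**
```
<tool_call>
{"name": "{function_name}", "arguments": {JSON arguments}}
</tool_call>
```
**Example:**
```
<tool_call>
{"name": "get_weather", "arguments": {"location": "Tokyo"}}
</tool_call>
```
**Notes:**
- Hermes-style XML wrapper used by Qwen3 models
- The JSON body carries the function name and arguments directly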
### DeepSeek R1 Native Format
**Detection Pattern:** `<|tool▁calls▁begin|>...<|tool▁calls▁end|>`
**Structure:**
````
<|tool▁calls▁begin|>
<|tool▁call▁begin|>
function<|tool▁sep|>{function_name}
```json
{JSON arguments}
```
<|tool▁call▁end|>
<|tool▁calls▁end|>
````
**Example:**
````
<|tool▁calls▁begin|>
<|tool▁call▁begin|>
function<|tool▁sep|>get_weather
```json
{"location": "Tokyo"}
```
<|tool▁call▁end|>
<|tool▁calls▁end|>
````
**Notes:**
- Native DeepSeek R1 format ported from original llama.cpp
- Supports reasoning with `<think>...</think>` tags (automatically extracted)
- Multiple function calls supported with separate call blocks
- JSON arguments are contained within markdown code blocks
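Reasoning extraction amounts to splitting on the tag pair; a sketch (the real parser must also cope with partially streamed tags):

```cpp
#include <string>

// Split "<think>reasoning</think>rest" into reasoning and visible content.
static void split_reasoning(const std::string & text,
                            std::string & reasoning, std::string & content) {
    const std::string open = "<think>", close = "</think>";
    const size_t b = text.find(open);
    const size_t e = (b == std::string::npos) ? std::string::npos
                                              : text.find(close, b + open.size());
    if (b == std::string::npos || e == std::string::npos) {
        reasoning.clear();
        content = text; // no complete think block: everything is content
        return;
    }
    reasoning = text.substr(b + open.size(), e - (b + open.size()));
    content   = text.substr(0, b) + text.substr(e + close.size());
}
```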
## OpenAI-Compatible Output
Each native format is converted to the standard OpenAI function calling response:
```json
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": "filtered_content_without_function_calls",
        "tool_calls": [
          {
            "id": "functions.get_weather:0",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo\"}"
            }
          }
        ]
      }
    }
  ]
}
```
## Implementation Details
### Content Filtering
When function calls are detected:
- Function call syntax is removed from content
- Tool calls are extracted into separate array
- Content is cleaned for display
### Error Handling
- Missing format tokens yield an empty tool-call array
- A malformed structure yields an empty tool-call array
- The parser gracefully handles invalid JSON in arguments
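In practice a failed parse degrades to "no arguments" rather than an exception; a minimal sketch, assuming nlohmann::json:

```cpp
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// Invalid JSON in the arguments degrades gracefully instead of throwing.
static json parse_arguments_or_empty(const std::string & raw) {
    json args = json::parse(raw, /*cb=*/nullptr, /*allow_exceptions=*/false);
    return args.is_discarded() ? json::object() : args;
}
```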
## Usage with Tools Parameter
To enable function calling, include the `tools` parameter in your request:
```json
{
  "model": "kimi-k2",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Tokyo?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```
## Model Compatibility
- **Kimi-K2 models**: Native support with token format
- **Qwen3 models**: Native support with XML format (Hermes-style)
- **DeepSeek R1 models**: Native support with reasoning and function call format (ported from original llama.cpp)
- **Other models**: No function calling support
## Testing
Test files are provided to verify function calling:
- `test-function-calls.cpp` - Unit tests covering the Kimi-K2, Qwen3, and DeepSeek R1 formats
- Tests native token, XML, and DeepSeek R1 format parsing
- Tests multiple function calls and streaming
- Tests error handling and malformed input
## File Structure
- `function_calls.hpp` - Parser implementation and integration for the Kimi-K2, Qwen3, and DeepSeek R1 formats
- `kimi_k2_parser.hpp` - Kimi-K2 token and XML format parsing
- `deepseek_r1_tools.hpp` - DeepSeek R1 model detection and tool injection
- `chat.h`/`chat.cpp` and `chat-parser.h`/`chat-parser.cpp` - Chat handling and incremental parsing
- `json-partial.h`/`json-partial.cpp` and `regex-partial.h`/`regex-partial.cpp` - Partial JSON and regex parsing
- `utils.hpp` - Integration with the server (includes function_calls.hpp)
- `server.cpp` - Response formatting and content filtering