ik_llama.cpp/examples/server/function_calls.hpp
Anton Sokolchenko 9ee72225dc Function calling support for Kimi-K2 (#628)
* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components
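
The string utilities named in this list are not shown on this page. As a rough illustration only, the sketch below shows what helpers of this kind usually do. The signatures and the `sketch` namespace are assumptions, not the code added by this commit. `find_partial_stop` returns the offset at which the tail of the text starts to look like the beginning of a stop marker, which is what a streaming server needs in order to hold back text that may turn out to be tool call syntax.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

namespace sketch {

// True if `s` begins with `prefix`.
inline bool starts_with(const std::string& s, const std::string& prefix) {
    return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

// True if `s` ends with `suffix`.
inline bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the offset in `text` where the longest prefix of `stop` that
// matches the end of `text` begins, or std::string::npos if none matches.
inline std::size_t find_partial_stop(const std::string& text, const std::string& stop) {
    const std::size_t max_len = std::min(text.size(), stop.size());
    for (std::size_t len = max_len; len > 0; --len) {
        if (ends_with(text, stop.substr(0, len))) {
            return text.size() - len;
        }
    }
    return std::string::npos;
}

} // namespace sketch
```

For example, with the text `Hello <tool` and the stop marker `<tool_call>`, the helper reports offset 6, so only `Hello ` would be safe to stream.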

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser
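
A minimal sketch of the pattern described above: named markers whose lengths are derived by the compiler instead of being counted by hand. The constant names and marker strings here are illustrative assumptions. The actual constants live in kimi_k2_parser.hpp, which is not shown on this page.

```cpp
#include <cstddef>

namespace sketch {

// Length of a string literal, excluding the trailing '\0', computed at compile time.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;
}

// Hypothetical marker names, chosen only for illustration.
constexpr char TOOL_CALL_OPEN[]  = "<tool_call>";
constexpr char TOOL_CALL_CLOSE[] = "</tool_call>";

constexpr std::size_t TOOL_CALL_OPEN_LEN  = literal_length(TOOL_CALL_OPEN);
constexpr std::size_t TOOL_CALL_CLOSE_LEN = literal_length(TOOL_CALL_CLOSE);

// The compiler checks the lengths, so a changed marker cannot drift out of sync.
static_assert(TOOL_CALL_OPEN_LEN == 11, "open marker length");
static_assert(TOOL_CALL_CLOSE_LEN == 12, "close marker length");

} // namespace sketch
```

With constants like these, an expression such as `pos + TOOL_CALL_OPEN_LEN` replaces a bare `pos + 11`.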

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils
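
To make the Qwen3 format above concrete, here is a small sketch that collects the JSON payload of every complete `<tool_call>...</tool_call>` block. It is not the parser from qwen3_parser.hpp. The function name is hypothetical, and the payload is returned as a raw string so the example needs only the standard library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Returns the text between each complete <tool_call> / </tool_call> pair.
// An unterminated block (for example, mid-stream) is ignored.
inline std::vector<std::string> extract_tool_call_payloads(const std::string& text) {
    static const std::string open  = "<tool_call>";
    static const std::string close = "</tool_call>";

    std::vector<std::string> payloads;
    std::size_t pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        const std::size_t body_start = pos + open.size();
        const std::size_t body_end   = text.find(close, body_start);
        if (body_end == std::string::npos) {
            break;  // closing tag not seen yet
        }
        payloads.push_back(text.substr(body_start, body_end - body_start));
        pos = body_end + close.size();
    }
    return payloads;
}

} // namespace sketch
```

Each returned payload would then be handed to a JSON parser to read the `name` and `arguments` fields.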

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
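
The reasoning extraction listed above can be pictured with the following sketch, which splits the first complete `<think>...</think>` block from the rest of the text. This is an illustration under simple assumptions (one block, no nesting), not the implementation in the common chat parser. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Returns {reasoning, remaining content}. If no complete <think> block is
// present, reasoning is empty and the text is returned unchanged.
inline std::pair<std::string, std::string> split_reasoning(const std::string& text) {
    static const std::string open  = "<think>";
    static const std::string close = "</think>";

    const std::size_t start = text.find(open);
    if (start == std::string::npos) {
        return std::make_pair(std::string(), text);
    }
    const std::size_t body = start + open.size();
    const std::size_t end  = text.find(close, body);
    if (end == std::string::npos) {
        return std::make_pair(std::string(), text);  // block still open
    }
    return std::make_pair(text.substr(body, end - body),
                          text.substr(0, start) + text.substr(end + close.size()));
}

} // namespace sketch
```

For the input `<think>plan</think>answer`, the first element is `plan` and the second is `answer`.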

* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly - conversational preamble
issue is not in ik_llama.cpp but likely in UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

Server was not passing model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to Kimi-K2 parser and return tool calls
as content instead of proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to use Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead
of proper tool_calls array. Now uses same model detection as
streaming path for consistency.
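
The two fixes above both come down to routing on the model name. The sketch below shows one way such routing could look. It assumes case-insensitive substring matching, which may differ from what `is_qwen3_model()` and `is_deepseek_r1_model()` actually do. The enum and function names are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

namespace sketch {

enum class tool_format { kimi_k2, qwen3, deepseek_r1 };

// Lowercases an ASCII string so matching is case-insensitive.
inline std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Picks a parser family from the model name; Kimi-K2 is the default,
// which is also what an empty name falls back to.
inline tool_format detect_tool_format(const std::string& model_name) {
    const std::string name = to_lower(model_name);
    if (name.find("qwen3") != std::string::npos) {
        return tool_format::qwen3;
    }
    if (name.find("deepseek-r1") != std::string::npos ||
        name.find("deepseek_r1") != std::string::npos) {
        return tool_format::deepseek_r1;
    }
    return tool_format::kimi_k2;
}

} // namespace sketch
```

The default branch shows why an empty model name matters: with no name passed in, every model is treated as Kimi-K2, which matches the symptom described in the two fixes.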
2025-07-23 18:11:42 +02:00

213 lines
8.5 KiB
C++

#pragma once

#include "json.hpp"
#include "streaming_chat.hpp"
#include "parsers/kimi_k2_parser.hpp"
#include "parsers/qwen3_parser.hpp"
#include "qwen3_tools.hpp"
#include "deepseek_r1_tools.hpp"
#include "../../common/chat.h"
#include "../../common/chat-parser.h"
#include <cstdlib>    // rand
#include <cstring>    // strlen
#include <stdexcept>  // std::runtime_error
#include <string>
#include <regex>

using json = nlohmann::ordered_json;

// Function calling interface for Kimi-K2 format
static json parse_kimi_k2_tool_calls(const std::string& text) {
    return kimi_k2::parse_tool_calls(text);
}

// Function calling interface for Qwen3 format
static json parse_qwen3_tool_calls(const std::string& text) {
    return qwen3::parse_tool_calls(text);
}

static std::string clean_function_calls_from_content(const std::string& content) {
    return kimi_k2::clean_content(content);
}

// New llama.cpp-style content extraction with streaming support
static std::string extract_content_from_mixed_input(const std::string& content, bool is_partial, const std::string& model_name = "") {
    if (is_qwen3_model(model_name)) {
        return qwen3::extract_content_during_parsing(content, is_partial);
    } else if (is_deepseek_r1_model(model_name)) {
        // DeepSeek R1 content extraction - remove <think> tags and tool calls
        std::string result = content;

        // Remove complete <think>...</think> blocks; an unterminated block is left in place
        size_t think_start = 0;
        while ((think_start = result.find("<think>", think_start)) != std::string::npos) {
            size_t think_end = result.find("</think>", think_start);
            if (think_end != std::string::npos) {
                result.erase(think_start, think_end + strlen("</think>") - think_start);
            } else {
                break;
            }
        }

        // Remove DeepSeek R1 tool call syntax.
        // The markers use the fullwidth bar (U+FF5C), as in the DeepSeek special tokens.
        size_t tool_start = 0;
        while ((tool_start = result.find("<｜tool▁calls▁begin｜>", tool_start)) != std::string::npos) {
            size_t tool_end = result.find("<｜tool▁calls▁end｜>", tool_start);
            if (tool_end != std::string::npos) {
                result.erase(tool_start, tool_end + strlen("<｜tool▁calls▁end｜>") - tool_start);
            } else {
                break;
            }
        }

        return result;
    } else {
        return kimi_k2::extract_content_during_parsing(content, is_partial);
    }
}

// Incremental parsing for streaming tool calls with model detection
static ik_chat_msg parse_chat_message_incremental(const std::string& content, bool is_partial = false, const std::string& model_name = "") {
    ik_chat_msg msg;
    msg.role = "assistant";

    try {
        json tool_calls_json;
        bool has_function_syntax = false;

        // Route parsing based on model type
        if (is_qwen3_model(model_name)) {
            // Use Qwen3 XML parser
            tool_calls_json = parse_qwen3_tool_calls(content);

            // Check for partial content during streaming
            if (is_partial && qwen3::is_partial_content_advanced(content)) {
                throw std::runtime_error("partial structured content detected");
            }

            // Check for malformed XML tool call syntax
            has_function_syntax = content.find("<tool_call>") != std::string::npos;
        } else if (is_deepseek_r1_model(model_name)) {
            // Use common chat parser for DeepSeek R1
            try {
                common_chat_syntax syntax;
                syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
                syntax.enable_tool_calls = true;

                common_chat_msg_parser parser(content, is_partial, syntax);
                parser.parse();
                auto result = parser.result();

                // Convert tool calls to JSON format expected by the system
                tool_calls_json = json::array();
                for (const auto& tool_call : result.tool_calls) {
                    json tc;
                    tc["id"] = tool_call.id.empty() ? ("call_" + std::to_string(rand())) : tool_call.id;
                    tc["type"] = "function";
                    tc["function"]["name"] = tool_call.name;
                    tc["function"]["arguments"] = tool_call.arguments;
                    tool_calls_json.push_back(tc);
                }

                // Check for malformed DeepSeek R1 tool call syntax
                // (marker uses the fullwidth bar U+FF5C, as in the DeepSeek special tokens)
                has_function_syntax = content.find("<｜tool▁calls▁begin｜>") != std::string::npos;
            } catch (const common_chat_msg_partial_exception&) {
                if (is_partial) {
                    throw std::runtime_error("partial structured content detected");
                }
                // If not partial, treat as regular content
                tool_calls_json = json::array();
                has_function_syntax = false;
            }
        } else {
            // Default to Kimi-K2 parser
            tool_calls_json = parse_kimi_k2_tool_calls(content);

            // Check for partial content during streaming
            if (is_partial && kimi_k2::is_partial_content_advanced(content)) {
                throw std::runtime_error("partial structured content detected");
            }

            // Check for malformed function call syntax
            has_function_syntax = content.find("functions.") != std::string::npos;
        }

        bool parsing_succeeded = !tool_calls_json.empty();
        if (has_function_syntax && !parsing_succeeded) {
            throw std::runtime_error("malformed function call syntax detected");
        }

        // Process successful parsing results
        if (!tool_calls_json.empty()) {
            for (const auto& tc_json : tool_calls_json) {
                try {
                    ik_chat_tool_call tc;
                    tc.id = tc_json.value("id", "");

                    // Skip entries whose "function" field is missing, not an object, or unnamed
                    if (!tc_json.contains("function") || !tc_json["function"].is_object() || !tc_json["function"].contains("name")) {
                        continue;
                    }

                    tc.name = tc_json["function"]["name"];
                    if (tc.name.empty()) {
                        continue;
                    }

                    if (tc_json["function"].contains("arguments")) {
                        tc.arguments = tc_json["function"]["arguments"];
                    } else {
                        tc.arguments = "{}";
                    }

                    // Validate arguments (only if not partial)
                    if (!is_partial && !tc.arguments.empty()) {
                        try {
                            auto parsed = json::parse(tc.arguments);
                            (void)parsed;
                        } catch (const std::exception&) {
                            continue;
                        }
                    }

                    msg.tool_calls.push_back(tc);
                } catch (const std::exception&) {
                    continue;
                }
            }
        }

        // Use model-specific content extraction whether or not tool calls were found.
        // Going through extract_content_from_mixed_input() also covers DeepSeek R1,
        // which the previous Qwen3/Kimi-K2-only branches sent to the Kimi-K2 extractor.
        msg.content = extract_content_from_mixed_input(content, is_partial, model_name);
    } catch (const std::exception&) {
        if (!is_partial) {
            // Original llama.cpp fallback pattern - use public API
            common_chat_syntax syntax;
            syntax.format = COMMON_CHAT_FORMAT_CONTENT_ONLY; // Use content-only format

            // Use the public API that handles fallback internally
            common_chat_msg fallback_result = common_chat_parse(content, is_partial, syntax);

            // Convert to ik_chat_msg
            msg.tool_calls.clear();
            msg.content = fallback_result.content;
        }
        // If is_partial=true, keep empty result (no content chunks during streaming)
    }

    return msg;
}

static std::string generate_tool_call_id() {
    static int counter = 0;
    return "call_" + std::to_string(++counter);
}