ik_llama.cpp/examples/server/deepseek_r1_tools.hpp
Anton Sokolchenko 9ee72225dc Function calling support for Kimi-K2 (#628)
* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
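The native format listed under "Key features" can be illustrated with a small standalone extractor. This is only a sketch, not the parser added by this commit: `ToolCall` and `parse_tool_calls_sketch` are hypothetical names, the marker spellings are copied from the list above (the model's tokenizer may spell the bar differently, so they are kept as named constants), and the arguments are returned as raw text without JSON validation.

```cpp
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ToolCall {
    std::string name;
    std::string arguments; // raw JSON text, not validated here
};

// Marker spellings as written in the commit message (assumption).
static const std::string CALLS_BEGIN = "<|tool▁calls▁begin|>";
static const std::string CALLS_END   = "<|tool▁calls▁end|>";
static const std::string SEP         = "<|tool▁sep|>";
static const std::string FENCE_OPEN  = "```json";
static const std::string FENCE_CLOSE = "```";

// Strip leading and trailing ASCII whitespace.
static std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Collect every complete "function<sep>name ```json ... ```" block that
// follows the calls-begin marker. Incomplete trailing blocks are ignored.
std::vector<ToolCall> parse_tool_calls_sketch(const std::string & text) {
    std::vector<ToolCall> calls;
    std::string::size_type pos = text.find(CALLS_BEGIN);
    if (pos == std::string::npos) {
        return calls; // no tool-call section at all
    }
    pos += CALLS_BEGIN.size();
    // npos here means the section was truncated; parse what is available.
    const std::string::size_type end = text.find(CALLS_END, pos);

    while (true) {
        const std::string::size_type sep = text.find(SEP, pos);
        if (sep == std::string::npos || (end != std::string::npos && sep >= end)) {
            break;
        }
        const std::string::size_type name_begin = sep + SEP.size();
        const std::string::size_type fence = text.find(FENCE_OPEN, name_begin);
        if (fence == std::string::npos) {
            break;
        }
        const std::string::size_type args_begin = fence + FENCE_OPEN.size();
        const std::string::size_type close = text.find(FENCE_CLOSE, args_begin);
        if (close == std::string::npos) {
            break; // arguments not finished yet
        }
        ToolCall call;
        call.name      = trim(text.substr(name_begin, fence - name_begin));
        call.arguments = trim(text.substr(args_begin, close - args_begin));
        calls.push_back(call);
        pos = close + FENCE_CLOSE.size();
    }
    return calls;
}
```

The sketch stops at the first closing fence, so arguments whose JSON strings themselves contain a triple backtick would be cut short; a real parser needs proper JSON handling for that case.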

* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly - conversational preamble
issue is not in ik_llama.cpp but likely in UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

Server was not passing model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to Kimi-K2 parser and return tool calls
as content instead of proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to use Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead
of proper tool_calls array. Now uses same model detection as
streaming path for consistency.
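
The last two fixes come down to both response paths choosing the parser from the model name. A minimal sketch of that idea follows, assuming a single shared helper; `ToolFormat` and `select_tool_format` are hypothetical names, and the "qwen3" substring check is an assumption about how detection might work, not the server's actual code. The Kimi-K2 fallback mirrors the behavior described above when no model name is available.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical enumeration of the tool-call formats named in this commit.
enum class ToolFormat { KimiK2, Qwen3, DeepSeekR1 };

// Lowercase copy; the cast to unsigned char keeps std::tolower well defined.
static std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper that both the streaming and the non-streaming path
// would call, so the two paths cannot disagree about the format.
ToolFormat select_tool_format(const std::string & model_name) {
    const std::string m = to_lower_copy(model_name);
    if (m.find("qwen3") != std::string::npos) {
        return ToolFormat::Qwen3; // assumed naming pattern
    }
    if (m.find("deepseek-r1") != std::string::npos ||
        m.find("deepseek_r1") != std::string::npos) {
        return ToolFormat::DeepSeekR1; // patterns listed in the commit message
    }
    return ToolFormat::KimiK2; // fallback, including an empty model name
}
```

With an empty model name the helper returns the Kimi-K2 format, which is exactly the situation that made Qwen3 tool calls show up as plain content before the model name was passed through.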
2025-07-23 18:11:42 +02:00

82 lines
3.2 KiB
C++

#pragma once
#include "json.hpp"
#include <string>
#include <vector>
#include <algorithm>
#include <cctype>
using json = nlohmann::ordered_json;
//
// DeepSeek R1 specific tool handling
// Based on original llama.cpp implementation
//
// Check if the model is DeepSeek R1 (based on common naming patterns)
inline bool is_deepseek_r1_model(const std::string & model_name) {
    if (model_name.empty()) {
        return false;
    }
    // Convert to lowercase for case-insensitive comparison.
    // The lambda takes unsigned char because passing a negative char value
    // to tolower is undefined behavior.
    std::string lower_model = model_name;
    std::transform(lower_model.begin(), lower_model.end(), lower_model.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    // Check for DeepSeek R1 patterns (more specific than general deepseek)
    return lower_model.find("deepseek-r1") != std::string::npos ||
           lower_model.find("deepseek_r1") != std::string::npos ||
           lower_model.find("deepseek r1") != std::string::npos ||
           (lower_model.find("deepseek") != std::string::npos &&
            (lower_model.find("-r1") != std::string::npos ||
             lower_model.find("_r1") != std::string::npos ||
             lower_model.find(" r1") != std::string::npos));
}
// Generate DeepSeek R1 tool format instructions (following original template patterns)
inline std::string deepseek_r1_tool_format_instructions() {
    return "\n\nFor function calls, use the DeepSeek R1 format:\n"
           "<｜tool▁calls▁begin｜>\n"
           "<｜tool▁call▁begin｜>\n"
           "function<｜tool▁sep｜><function_name>\n"
           "```json\n"
           "{\"arguments\": \"value\"}\n"
           "```\n"
           "<｜tool▁call▁end｜>\n"
           "<｜tool▁calls▁end｜>";
}
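// Generate tools description for DeepSeek R1
inline std::string deepseek_r1_tools_description(const json & tools) {
    std::string tools_desc = "# Available Tools\n\n"
                             "You have access to the following functions. "
                             "Call them when needed to assist with the user's request.\n\n";
    for (const auto & tool : tools) {
        // Skip entries whose "function" field is missing or not an object
        if (!tool.contains("function") || !tool["function"].is_object()) {
            continue;
        }
        const auto & func = tool["function"];
        // A tool without a string "name" cannot be called, so skip it
        // instead of indexing a missing key on a const json
        if (!func.contains("name") || !func["name"].is_string()) {
            continue;
        }
        tools_desc += "**" + func["name"].get<std::string>() + "**: ";
        // "description" is optional
        if (func.contains("description") && func["description"].is_string()) {
            tools_desc += func["description"].get<std::string>();
        }
        tools_desc += "\n";
    }
    return tools_desc;
}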
// Inject tools into existing system message content
inline std::string deepseek_r1_inject_tools_to_system(const std::string & content, const json & tools) {
    return content + "\n\n" + deepseek_r1_tools_description(tools) + deepseek_r1_tool_format_instructions();
}
// Create a new system message with tools for DeepSeek R1
inline std::string deepseek_r1_create_system_with_tools(const json & tools) {
    std::string tools_prompt = "You are a helpful assistant with access to function calling capabilities.\n\n";
    tools_prompt += deepseek_r1_tools_description(tools);
    tools_prompt += deepseek_r1_tool_format_instructions();
    return tools_prompt;
}
// Check if tools injection is needed for DeepSeek R1
inline bool deepseek_r1_should_inject_tools(const json & tools, const std::string & model_name) {
    return !tools.empty() && tools.is_array() && is_deepseek_r1_model(model_name);
}
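
The header offers one helper that appends the tool block to an existing system message and another that builds a fresh system message. How the server chooses between them is not shown in this file; the sketch below is one plausible way to do it, using stand-in types so it compiles without json.hpp. `Message` and `add_tools_prompt` are hypothetical, and `tools_block` stands for the combined output of `deepseek_r1_tools_description` and `deepseek_r1_tool_format_instructions`.

```cpp
#include <string>
#include <vector>

// Stand-in for one chat message (assumption; the server works on JSON).
struct Message {
    std::string role;
    std::string content;
};

// Append the tool block to the first system message if one exists,
// otherwise prepend a new system message that carries it.
void add_tools_prompt(std::vector<Message> & messages, const std::string & tools_block) {
    for (auto & msg : messages) {
        if (msg.role == "system") {
            // Same shape as deepseek_r1_inject_tools_to_system
            msg.content += "\n\n" + tools_block;
            return;
        }
    }
    // Same opening sentence as deepseek_r1_create_system_with_tools
    Message sys;
    sys.role    = "system";
    sys.content = "You are a helpful assistant with access to function calling capabilities.\n\n" + tools_block;
    messages.insert(messages.begin(), sys);
}
```

A caller would presumably guard this with `deepseek_r1_should_inject_tools(tools, model_name)` so that conversations without tools, or with a different model, are left untouched.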