Files
ik_llama.cpp/common/chat-parser.cpp
Anton Sokolchenko f4051d9c3e Deepseek R1 function calls (more formats) (#652)
* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser
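The named-constant approach described above can be sketched with `constexpr std::string_view`, which gives compile-time lengths for free. The marker strings below do appear in this parser, but the constant names and the `skip_marker` helper are illustrative, not the real identifiers in kimi_k2_parser.hpp.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Illustrative names; the real constants live in kimi_k2_parser.hpp.
namespace kimi_k2_markers {
    constexpr std::string_view TOOL_CALLS_BEGIN = "<|tool_calls_section_begin|>";
    constexpr std::string_view TOOL_CALL_BEGIN  = "<|tool_call_begin|>";
    // string_view::size() is evaluated at compile time - no manual counting.
    static_assert(TOOL_CALLS_BEGIN.size() == 28);
}

// Advance past a marker using the named constant instead of a magic number.
// Returns the new position, or `pos` unchanged if the marker is absent.
size_t skip_marker(const std::string & s, size_t pos, std::string_view marker) {
    if (s.compare(pos, marker.size(), marker) == 0) {
        return pos + marker.size();
    }
    return pos;
}
```

Replacing literals like `19` with `marker.size()` keeps the skip length and the marker text in one place.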

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils
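A minimal sketch of the span extraction behind the Qwen3 `<tool_call>...</tool_call>` format described above. The real parser additionally JSON-parses each payload and handles streamed partial input; this only shows how the delimited spans are located.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collect the payloads between <tool_call> and </tool_call> tags.
// An unterminated opening tag is left alone (partial-input handling
// is the streaming parser's job, not shown here).
std::vector<std::string> extract_tool_call_payloads(const std::string & text) {
    static const std::string open  = "<tool_call>";
    static const std::string close = "</tool_call>";
    std::vector<std::string> out;
    size_t pos = 0;
    while (true) {
        size_t b = text.find(open, pos);
        if (b == std::string::npos) break;
        size_t e = text.find(close, b + open.size());
        if (e == std::string::npos) break; // unterminated: leave for partial handling
        out.push_back(text.substr(b + open.size(), e - (b + open.size())));
        pos = e + close.size();
    }
    return out;
}
```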

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
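The `<think>...</think>` reasoning extraction listed above can be sketched as a simple split; the real implementation is `try_parse_reasoning()` in chat-parser.cpp, which also strips whitespace and tracks the parser position. Returns `{reasoning, remaining content}`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split reasoning out of a message. A missing close tag is treated as
// partial reasoning (everything after <think> is reasoning-so-far),
// mirroring the streaming behavior described above.
std::pair<std::string, std::string> split_reasoning(const std::string & s) {
    const std::string open = "<think>", close = "</think>";
    auto b = s.find(open);
    if (b == std::string::npos) return {"", s};
    auto e = s.find(close, b + open.size());
    if (e == std::string::npos) return {s.substr(b + open.size()), ""};
    return {s.substr(b + open.size(), e - b - open.size()),
            s.substr(0, b) + s.substr(e + close.size())};
}
```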

* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality
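A rough illustration of the idea behind json-partial's "healing": work out how a truncated JSON document is unbalanced and append the closers needed to make it parseable. The real implementation also inserts a healing marker so callers can tell healed content from model output; this sketch only closes open strings, braces, and brackets.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Append the closing quotes/braces/brackets a truncated JSON string needs.
// Tracks string state and backslash escapes so braces inside strings
// are not counted as structure.
std::string close_partial_json(const std::string & s) {
    std::vector<char> stack;
    bool in_string = false, escaped = false;
    for (char c : s) {
        if (escaped)       { escaped = false; continue; }
        if (c == '\\')     { escaped = true;  continue; }
        if (c == '"')      { in_string = !in_string; continue; }
        if (in_string)       continue;
        if (c == '{')        stack.push_back('}');
        else if (c == '[')   stack.push_back(']');
        else if (c == '}' || c == ']') { if (!stack.empty()) stack.pop_back(); }
    }
    std::string out = s;
    if (in_string) out += '"';
    for (auto it = stack.rbegin(); it != stack.rend(); ++it) out += *it;
    return out;
}
```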

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly; the conversational preamble
issue is not in ik_llama.cpp but likely in the UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

The server was not passing the model name to parse_chat_message_incremental(),
causing Qwen3 to fall back to the Kimi-K2 parser and return tool calls
as content instead of a proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to use the Kimi-K2 format,
causing Qwen3 XML tool calls to be returned as content instead of a
proper tool_calls array. The non-streaming path now uses the same
model detection as the streaming path for consistency.
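The model-name routing the two fixes above rely on can be sketched as below. The substring patterns (`deepseek-r1`/`deepseek_r1`, `qwen3`, `kimi`) come from this PR's description; the function and enum names are hypothetical, and the real detection also inspects the chat template (see `common_chat_format_detect` in chat-parser.cpp).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

enum class chat_fmt { kimi_k2, qwen3, deepseek_r1, generic };

// Case-insensitive substring matching on the model name,
// checked in order of specificity.
chat_fmt detect_from_model_name(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (name.find("deepseek-r1") != std::string::npos ||
        name.find("deepseek_r1") != std::string::npos) return chat_fmt::deepseek_r1;
    if (name.find("qwen3") != std::string::npos) return chat_fmt::qwen3;
    if (name.find("kimi")  != std::string::npos) return chat_fmt::kimi_k2;
    return chat_fmt::generic;
}
```

Both the streaming and non-streaming paths must call the same detection, otherwise the two response modes disagree on the output format, which is exactly the bug fixed here.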

* Update Qwen3 function call handling in server and tests

- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support

* Add DeepSeek-R1 function call parsing support

Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - <tool_call>function</think>Name\n```json\n{...}```</tool_call>

Key changes:
- Added parse_deepseek_r1_tools_array() following original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality

Resolves tool_calls_count=0 issue where DeepSeek-R1 models generated valid tool calls
but server failed to parse them correctly.
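The "exception handling chain" mentioned above can be sketched as trying each format parser in the documented order and falling through to the next on failure. The parser callables here are stand-ins, not the real `parse_deepseek_r1_*` functions.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Try parsers in order; a parser signals "not my format" by throwing.
// The first parser that succeeds wins.
std::string parse_with_fallback(
        const std::string & input,
        const std::vector<std::function<std::string(const std::string &)>> & parsers) {
    for (const auto & p : parsers) {
        try {
            return p(input);
        } catch (const std::exception &) {
            // Malformed for this format - try the next one.
        }
    }
    throw std::runtime_error("no parser accepted the input");
}
```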

* Update function_calls.md documentation for DeepSeek-R1 Format 4

- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers

* Fix merge conflict in test-function-calls.cpp

- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing
2025-08-07 08:15:57 +03:00

// Chat parser implementation
#include "chat-parser.h"
#include "../examples/server/parsers/kimi_k2_parser.hpp"
#include "json.hpp"
#include "common.h"

using json = nlohmann::ordered_json;

common_chat_msg_parser::common_chat_msg_parser(const std::string & input, bool is_partial, const common_chat_syntax & syntax)
    : input_(input), is_partial_(is_partial), syntax_(syntax) {
    // Initialize the result with the default assistant role
    result_.role = "assistant";
}
std::string common_chat_msg_parser::str(const common_string_range & rng) const {
    if (rng.begin > input_.size() || rng.end > input_.size()) {
        throw std::runtime_error("Range out of bounds");
    }
    return input_.substr(rng.begin, rng.end - rng.begin);
}

void common_chat_msg_parser::add_content(const std::string & content) {
    result_.content += content;
}

void common_chat_msg_parser::add_reasoning_content(const std::string & reasoning_content) {
    result_.reasoning_content += reasoning_content;
}

void common_chat_msg_parser::add_tool_call(const common_chat_tool_call & tool_call) {
    result_.tool_calls.push_back(tool_call);
}

bool common_chat_msg_parser::add_tool_call(const std::string & name, const std::string & id, const std::string & arguments) {
    if (name.empty()) {
        return false;
    }
    common_chat_tool_call tool_call;
    tool_call.name      = name;
    tool_call.arguments = arguments;
    tool_call.id        = id;
    result_.tool_calls.emplace_back(tool_call);
    return true;
}

bool common_chat_msg_parser::add_tool_call(const json & tool_call) {
    std::string name      = tool_call.contains("name")      ? tool_call.at("name")      : "";
    std::string id        = tool_call.contains("id")        ? tool_call.at("id")        : "";
    std::string arguments = tool_call.contains("arguments") ? tool_call.at("arguments") : "";
    return add_tool_call(name, id, arguments);
}

bool common_chat_msg_parser::add_tool_calls(const json & arr) {
    for (const auto & item : arr) {
        if (!add_tool_call(item)) {
            return false;
        }
    }
    return true;
}

void common_chat_msg_parser::clear_tools() {
    result_.tool_calls.clear();
}

std::string common_chat_msg_parser::consume_rest() {
    auto rest = input_.substr(pos_);
    pos_ = input_.size();
    return rest;
}

bool common_chat_msg_parser::try_consume_literal(const std::string & literal) {
    if (pos_ + literal.size() <= input_.size()) {
        if (input_.substr(pos_, literal.size()) == literal) {
            pos_ += literal.size();
            return true;
        }
    }
    return false;
}
bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think, const std::string & end_think) {
    auto start_pos = input_.find(start_think, pos_);
    if (start_pos == std::string::npos) {
        return false;
    }
    auto end_pos = input_.find(end_think, start_pos + start_think.size());
    if (end_pos == std::string::npos) {
        if (is_partial_) {
            // Partial reasoning content: take everything after the opening tag
            auto reasoning = input_.substr(start_pos + start_think.size());
            add_reasoning_content(string_strip(reasoning));
            pos_ = input_.size();
            return true;
        }
        return false;
    }
    // Extract the reasoning content between the tags
    auto reasoning = input_.substr(start_pos + start_think.size(), end_pos - start_pos - start_think.size());
    add_reasoning_content(string_strip(reasoning));
    pos_ = end_pos + end_think.size();
    return true;
}
// Legacy duplicate of try_find_literal below (identical logic)
std::optional<common_chat_msg_parser::find_regex_result> common_chat_msg_parser::try_find_literal_legacy(const std::string & literal) {
    auto idx = input_.find(literal, pos_);
    if (idx != std::string::npos) {
        find_regex_result res;
        res.prelude = input_.substr(pos_, idx - pos_);
        auto end = idx + literal.size();
        res.groups.emplace_back(common_string_range{idx, end});
        move_to(end);
        return res;
    }
    if (is_partial_) {
        idx = string_find_partial_stop(input_, literal);
        if (idx != std::string::npos && idx >= pos_) {
            find_regex_result res;
            res.prelude = input_.substr(pos_, idx - pos_);
            auto end = input_.size();
            res.groups.emplace_back(common_string_range{idx, end});
            move_to(end);
            return res;
        }
    }
    return std::nullopt;
}
void common_chat_msg_parser::parse() {
    switch (syntax_.format) {
        case COMMON_CHAT_FORMAT_KIMI_K2:
            parse_kimi_k2_format();
            break;
        case COMMON_CHAT_FORMAT_DEEPSEEK_R1:
            parse_deepseek_r1_format();
            break;
        case COMMON_CHAT_FORMAT_GENERIC:
            parse_generic_format();
            break;
        case COMMON_CHAT_FORMAT_CONTENT_ONLY:
            add_content(consume_rest());
            break;
        default:
            // Fall back to content-only parsing for unknown formats
            add_content(consume_rest());
            break;
    }
}
void common_chat_msg_parser::parse_kimi_k2_format() {
    json tool_calls_json = kimi_k2::parse_tool_calls(input_);
    if (is_partial_ && kimi_k2::is_partial_content_advanced(input_)) {
        throw common_chat_msg_partial_exception("partial structured content detected");
    }
    bool has_function_syntax = input_.find("functions.") != std::string::npos;
    bool parsing_succeeded   = !tool_calls_json.empty();
    if (has_function_syntax && !parsing_succeeded) {
        throw std::runtime_error("malformed function call syntax detected");
    }
    if (!tool_calls_json.empty()) {
        for (const auto & tc_json : tool_calls_json) {
            try {
                common_chat_tool_call tc;
                tc.id = tc_json.value("id", "");
                // Validate that "function" is an object before accessing nested keys
                if (!tc_json.contains("function") || !tc_json["function"].is_object() ||
                    !tc_json["function"].contains("name")) {
                    continue;
                }
                tc.name = tc_json["function"]["name"];
                if (tc.name.empty()) {
                    continue;
                }
                // A missing "arguments" field falls back to an empty JSON object
                tc.arguments = tc_json["function"].value("arguments", "{}");
                if (!is_partial_ && !tc.arguments.empty()) {
                    try {
                        auto parsed = json::parse(tc.arguments);
                        (void)parsed; // validation only
                    } catch (const std::exception &) {
                        continue; // skip tool calls with invalid JSON arguments
                    }
                }
                add_tool_call(tc);
            } catch (const std::exception &) {
                continue; // skip malformed tool call entries
            }
        }
        add_content(kimi_k2::clean_content(input_));
    } else {
        add_content(input_);
    }
    pos_ = input_.size();
}
void common_chat_msg_parser::parse_generic_format() {
    add_content(consume_rest());
}

void common_chat_msg_parser::parse_deepseek_r1_format() {
    // Delegate to the main chat.cpp function, which has the corrected implementation.
    // This follows the original llama.cpp pattern where chat-parser delegates to chat.cpp.
    common_chat_parse_deepseek_r1(*this);
}

void common_chat_msg_parser::finish() {
    // Any final processing can go here
}

common_chat_msg common_chat_msg_parser::result_and_reset() {
    auto msg = result_;
    result_ = common_chat_msg();
    result_.role = "assistant";
    pos_ = 0;
    return msg;
}
// Format detection from chat template patterns (focused on DeepSeek R1 and Kimi K2);
// unknown templates fall back to content-only generic parsing.
common_chat_format common_chat_format_detect(const std::string & chat_template) {
    if (chat_template.empty()) {
        return COMMON_CHAT_FORMAT_GENERIC;
    }
    // Detect the DeepSeek R1 format (following the original llama.cpp detection logic)
    if (chat_template.find("<tool▁calls▁begin>") != std::string::npos) {
        return COMMON_CHAT_FORMAT_DEEPSEEK_R1;
    }
    // Detect the Kimi K2 format (our custom format)
    if (chat_template.find("kimi") != std::string::npos ||
        chat_template.find("Kimi") != std::string::npos ||
        chat_template.find("functions.") != std::string::npos) {
        return COMMON_CHAT_FORMAT_KIMI_K2;
    }
    // Default to the generic format for unknown templates
    return COMMON_CHAT_FORMAT_GENERIC;
}
// Progressive parsing primitive - find a literal (following the original llama.cpp pattern)
std::optional<common_chat_msg_parser::find_regex_result> common_chat_msg_parser::try_find_literal(const std::string & literal) {
    auto idx = input_.find(literal, pos_);
    if (idx != std::string::npos) {
        find_regex_result res;
        res.prelude = input_.substr(pos_, idx - pos_);
        auto end = idx + literal.size();
        res.groups.emplace_back(common_string_range{idx, end});
        move_to(end);
        return res;
    }
    if (is_partial_) {
        idx = string_find_partial_stop(input_, literal);
        if (idx != std::string::npos && idx >= pos_) {
            find_regex_result res;
            res.prelude = input_.substr(pos_, idx - pos_);
            auto end = input_.size();
            res.groups.emplace_back(common_string_range{idx, end});
            move_to(end);
            return res;
        }
    }
    return std::nullopt;
}

bool common_chat_msg_parser::consume_spaces() {
    bool consumed = false;
    // Cast to unsigned char: std::isspace is undefined for negative char values
    while (pos_ < input_.length() && std::isspace(static_cast<unsigned char>(input_[pos_]))) {
        pos_++;
        consumed = true;
    }
    return consumed;
}

void common_chat_msg_parser::set_healing_marker(const std::string & marker) {
    healing_marker_ = marker;
}
// Enhanced JSON parsing methods (following the original llama.cpp patterns exactly)
std::optional<common_json> common_chat_msg_parser::try_consume_json() {
    auto it = input_.cbegin() + pos_;
    const auto end = input_.cend();
    common_json result;
    if (!common_json_parse(it, end, healing_marker_, result)) {
        return std::nullopt;
    }
    pos_ = std::distance(input_.cbegin(), it);
    if (result.healing_marker.marker.empty()) {
        // No healing marker, just return the parsed JSON
        return result;
    }
    if (!is_partial()) {
        throw common_chat_msg_partial_exception("JSON");
    }
    return result;
}

common_json common_chat_msg_parser::consume_json() {
    if (auto result = try_consume_json()) {
        return *result;
    }
    throw common_chat_msg_partial_exception("JSON");
}

common_chat_msg_parser::consume_json_result common_chat_msg_parser::consume_json_with_dumped_args(
    const std::vector<std::vector<std::string>> & args_paths,
    const std::vector<std::vector<std::string>> & content_paths
) {
    if (auto result = try_consume_json_with_dumped_args(args_paths, content_paths)) {
        return *result;
    }
    throw common_chat_msg_partial_exception("JSON");
}
std::optional<common_chat_msg_parser::consume_json_result> common_chat_msg_parser::try_consume_json_with_dumped_args(
    const std::vector<std::vector<std::string>> & args_paths,
    const std::vector<std::vector<std::string>> & content_paths
) {
    auto partial = try_consume_json();
    if (!partial) {
        return std::nullopt;
    }
    auto is_arguments_path = [&](const std::vector<std::string> & path) {
        return std::find(args_paths.begin(), args_paths.end(), path) != args_paths.end();
    };
    auto is_content_path = [&](const std::vector<std::string> & path) {
        return std::find(content_paths.begin(), content_paths.end(), path) != content_paths.end();
    };
    if (partial->healing_marker.marker.empty()) {
        if (args_paths.empty()) {
            // No arguments to dump, and the JSON was parsed fully.
            return consume_json_result {
                partial->json,
                /* .is_partial = */ false,
            };
        }
        if (is_arguments_path({})) {
            // The entire JSON is the arguments and was parsed fully.
            return consume_json_result {
                partial->json.dump(),
                /* .is_partial = */ false,
            };
        }
        // TODO: Implement the full path-based argument dumping logic from the original.
        // For now, return the parsed JSON as-is.
        return consume_json_result {
            partial->json,
            /* .is_partial = */ false,
        };
    }
    // Has a healing marker - this is partial JSON.
    // TODO: Implement sophisticated partial JSON handling with path-based dumping.
    // For now, return the partial result.
    return consume_json_result {
        partial->json,
        /* .is_partial = */ true,
    };
}
bool common_chat_msg_parser::detect_partial_function_call(const std::string & content) {
    if (content.empty()) {
        return false;
    }
    // Enhanced partial detection patterns
    static const std::vector<std::string> partial_patterns = {
        "functions",
        "functions.",
        "<tool_call",
        "<tool_call>",
        "<invoke",
        "<|tool_calls_section_begin|>",
        "<|tool_call_begin|>"
    };
    for (const auto & pattern : partial_patterns) {
        if (content.substr(0, pattern.length()) == pattern && content.length() <= pattern.length() + 50) {
            return true;
        }
    }
    return false;
}

void common_chat_msg_parser::handle_partial_detection() {
    if (!is_partial_) {
        return;
    }
    // Check for various partial patterns
    std::string remaining = input_.substr(pos_);
    if (remaining.empty()) {
        return;
    }
    // Detect partial function calls
    if (detect_partial_function_call(remaining)) {
        set_healing_marker(remaining);
        throw common_chat_msg_partial_exception("partial function call detected");
    }
    // Enhanced partial JSON detection
    if (remaining.find('{') != std::string::npos) {
        size_t brace_pos = remaining.find('{');
        std::string json_part = remaining.substr(brace_pos);
        // Scan for unbalanced braces, ignoring brace characters inside strings
        int  brace_count   = 0;
        bool in_string     = false;
        bool escaped       = false;
        bool is_incomplete = true;
        for (size_t i = 0; i < json_part.length(); i++) {
            char c = json_part[i];
            if (!escaped) {
                if (c == '"') {
                    in_string = !in_string;
                } else if (!in_string) {
                    if (c == '{') brace_count++;
                    else if (c == '}') brace_count--;
                }
            }
            escaped = (!escaped && c == '\\');
            if (brace_count == 0) {
                is_incomplete = false;
                break;
            }
        }
        if (is_incomplete) {
            set_healing_marker(json_part);
            throw common_chat_msg_partial_exception("partial JSON detected");
        }
    }
}
// Regex-based parsing methods (ported from the original llama.cpp)
std::optional<common_chat_msg_parser::find_regex_result> common_chat_msg_parser::try_find_regex(const common_regex & regex, size_t from, bool add_prelude_to_content) {
    auto m = regex.search(input_, from == std::string::npos ? pos_ : from);
    if (m.type == COMMON_REGEX_MATCH_TYPE_NONE) {
        return std::nullopt;
    }
    auto prelude = input_.substr(pos_, m.groups[0].begin - pos_);
    pos_ = m.groups[0].end;
    if (add_prelude_to_content) {
        add_content(prelude);
    }
    if (m.type == COMMON_REGEX_MATCH_TYPE_PARTIAL) {
        if (is_partial()) {
            throw common_chat_msg_partial_exception(regex.str());
        }
        return std::nullopt;
    }
    return find_regex_result{prelude, m.groups};
}

common_chat_msg_parser::find_regex_result common_chat_msg_parser::consume_regex(const common_regex & regex) {
    auto result = try_find_regex(regex);
    if (!result) {
        throw std::runtime_error("Expected regex not found: " + regex.str());
    }
    return *result;
}

std::optional<common_chat_msg_parser::find_regex_result> common_chat_msg_parser::try_consume_regex(const common_regex & regex) {
    return try_find_regex(regex, pos_, false);
}

void common_chat_msg_parser::consume_literal(const std::string & literal) {
    if (!try_consume_literal(literal)) {
        throw std::runtime_error("Expected literal not found: " + literal);
    }
}

// Get format name for debugging/logging (implemented in chat.cpp)