From dc1746338c2bb46c9238ae13d93b25700c1ebf21 Mon Sep 17 00:00:00 2001
From: Anton Sokolchenko
Date: Fri, 8 Aug 2025 12:56:44 +0200
Subject: [PATCH] Fix for Deepseek r1 parsing (#676)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with {"name": "func", "arguments": {...}} format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|> (see the example after this list)
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
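
Illustrative sketch (not part of the diff below): one tool call in the native DeepSeek R1 format listed above and how it is expected to surface through common_chat_parse(). The function name, arguments, and exact line breaks are invented for the illustration, and msg.tool_calls is assumed to be the parsed tool-call vector as in upstream llama.cpp.

    #include "chat.h"
    #include <cassert>
    #include <string>

    static void example_deepseek_r1_tool_call() {
        // Hypothetical model output following the token format documented above.
        std::string response =
            "<|tool▁calls▁begin|>function<|tool▁sep|>get_weather\n"
            "```json\n{\"location\": \"Berlin\"}\n```"
            "<|tool▁call▁end|><|tool▁calls▁end|>";

        common_chat_syntax syntax;
        syntax.format            = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
        syntax.enable_tool_calls = true;

        // Expected outcome: the call is returned as a structured tool call
        // rather than leaking the raw markup into msg.content.
        auto msg = common_chat_parse(response, /* is_partial */ false, syntax);
        assert(!msg.tool_calls.empty());
    }
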
* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: regex partial parsing functionality

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate the tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly - the conversational preamble issue is not in ik_llama.cpp but likely in the UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

The server was not passing the model name to parse_chat_message_incremental(), causing Qwen3 to fall back to the Kimi-K2 parser and return tool calls as content instead of a proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to the Kimi-K2 format, causing Qwen3 XML tool calls to be returned as content instead of a proper tool_calls array. The non-streaming path now uses the same model detection as the streaming path for consistency.

* Update Qwen3 function call handling in server and tests

- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support

* Add DeepSeek-R1 function call parsing support

Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - functionName\n```json\n{...}```

Key changes:
- Added parse_deepseek_r1_tools_array() following the original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into the exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality

Resolves the tool_calls_count=0 issue where DeepSeek-R1 models generated valid tool calls but the server failed to parse them correctly.

* Update function_calls.md documentation for DeepSeek-R1 Format 4

- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with the correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers

* Fix merge conflict in test-function-calls.cpp

- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing

* Fix DeepSeek R1 parsing issue with responses wrapped in think tags

Restore the missing consume_rest() call from the working PR #648 implementation. When a response contains no tool calls, the content remaining after reasoning parsing must be preserved as displayable content. Fixes the issue where responses wrapped entirely in <think> tags resulted in empty content output (a sketch of the scenario follows below).
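
Minimal sketch (illustrative, not part of the diff below) of the scenario this fix targets, using the common_chat_parse() API and syntax fields introduced in this series; the reply text is invented, and the expected results mirror the behaviour described above and exercised in the tests at the end of this patch.

    #include "chat.h"
    #include <cassert>
    #include <string>

    static void example_reasoning_then_content() {
        // DeepSeek R1 style reply: reasoning first, then plain content, no tool calls.
        std::string response = "<think>Consider the question first.</think>The answer is 42.";

        common_chat_syntax syntax;
        syntax.format           = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
        syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;

        auto msg = common_chat_parse(response, /* is_partial */ false, syntax);
        // With the restored consume_rest() call the trailing text survives as content;
        // before the fix it was dropped and msg.content came back empty.
        assert(msg.content.find("The answer is 42.") != std::string::npos);
        assert(!msg.reasoning_content.empty());
    }
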
* Implement proper reasoning handling following original llama.cpp patterns

- Add missing reasoning_format and reasoning_in_content fields to common_chat_syntax
- Update try_parse_reasoning to match original llama.cpp logic exactly
- Add TDD test case with reasoning_in_content=true for DeepSeek R1
- Following TDD: the test should now pass with the proper syntax configuration

Based on original llama.cpp implementation patterns.

* TDD SUCCESS: Fix DeepSeek R1 thinking tag termination issue

✅ Test passes with the reasoning_in_content=true configuration
- Content properly preserved: 'content' displays fully
- Reasoning field empty as expected
- Following TDD: test-first approach validates the fix

Next: Update the server to automatically apply this configuration.

* Complete server integration fix for DeepSeek R1 thinking tag termination

- Server now automatically sets reasoning_in_content=true for DeepSeek R1 models
- Fixes issue where responses wrapped in <think> tags appear empty to users

* Add TDD test case for DeepSeek R1 thinking tag termination issue

- Test reproduces the exact failure scenario reported by the user
- Validates that reasoning_in_content=true fixes the issue
- Demonstrates the empty content problem and the working solution

* Add remaining TDD test changes for DeepSeek R1 thinking tag fix

* Add debug output after upstream merge

* Remove temporary benchmark and debug files

- Remove tests/benchmark-progressive-parsing.cpp (development tool, not part of core functionality)
- Remove tests/reproduce_bug.sh (debugging script, not needed for PR)
---
 common/chat-parser.cpp             | 46 ++++++++++++++---------
 common/chat.cpp                    |  3 ++
 common/chat.h                      | 10 +++++
 examples/server/function_calls.hpp |  2 +
 tests/test-function-calls.cpp      | 59 +++++++++++++++++++++++++++++-
 5 files changed, 101 insertions(+), 19 deletions(-)

diff --git a/common/chat-parser.cpp b/common/chat-parser.cpp
index 3acba5d0..b2f5c91f 100644
--- a/common/chat-parser.cpp
+++ b/common/chat-parser.cpp
@@ -82,28 +82,38 @@ bool common_chat_msg_parser::try_consume_literal(const std::string & literal) {
 }
 
 bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think, const std::string & end_think) {
-    auto start_pos = input_.find(start_think, pos_);
-    if (start_pos == std::string::npos) {
-        return false;
-    }
+    auto handle_reasoning = [&](const std::string & reasoning, bool closed) {
+        auto stripped_reasoning = string_strip(reasoning);
+        if (stripped_reasoning.empty()) {
+            return;
+        }
+        if (syntax_.reasoning_in_content) {
+            add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "<think>" : start_think);
+            add_content(stripped_reasoning);
+            if (closed) {
+                add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "</think>" : end_think);
+            }
+        } else {
+            add_reasoning_content(stripped_reasoning);
+        }
+    };
 
-    auto end_pos = input_.find(end_think, start_pos + start_think.size());
-    if (end_pos == std::string::npos) {
-        if (is_partial_) {
-            // Partial reasoning content
-            auto reasoning = input_.substr(start_pos + start_think.size());
-            add_reasoning_content(string_strip(reasoning));
-            pos_ = input_.size();
+    if (syntax_.reasoning_format != COMMON_REASONING_FORMAT_NONE) {
+        if (syntax_.thinking_forced_open || try_consume_literal(start_think)) {
+            if (auto res = try_find_literal(end_think)) {
+                handle_reasoning(res->prelude, /* closed */ true);
+                consume_spaces();
+                return true;
+            }
+            auto rest = consume_rest();
+            if (!rest.empty()) {
+                handle_reasoning(rest, /* closed */ !is_partial());
+            }
+            // Allow unclosed thinking tags for now (following original llama.cpp)
             return true;
         }
-        return false;
     }
-
-    // Extract reasoning content
-    auto reasoning = input_.substr(start_pos + start_think.size(), end_pos - start_pos - start_think.size());
-    add_reasoning_content(string_strip(reasoning));
-    pos_ = end_pos + end_think.size();
-    return true;
+    return false;
 }
 
 std::optional<common_chat_msg_parser::find_regex_result> common_chat_msg_parser::try_find_literal_legacy(const std::string & literal) {
diff --git a/common/chat.cpp b/common/chat.cpp
index 15cfbbf0..f62c2801 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -278,6 +278,9 @@ void common_chat_parse_deepseek_r1(common_chat_msg_parser & builder) {
             throw; // Re-throw for partial mode
         }
     }
+
+    // Add any remaining content (critical for responses without tool calls)
+    builder.add_content(builder.consume_rest());
 }
 
 // Parse DeepSeek R1 tools array format following original llama.cpp parse_prefixed_json_tool_call_array pattern
diff --git a/common/chat.h b/common/chat.h
index e23f84f3..5899ef1a 100644
--- a/common/chat.h
+++ b/common/chat.h
@@ -135,8 +135,18 @@ enum common_chat_format {
     COMMON_CHAT_FORMAT_KIMI_K2, // Our custom format (keep last for backward compatibility)
 };
 
+enum common_reasoning_format {
+    COMMON_REASONING_FORMAT_NONE,
+    COMMON_REASONING_FORMAT_DEEPSEEK,
+    COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY,
+};
+
 struct common_chat_syntax {
     common_chat_format format = COMMON_CHAT_FORMAT_KIMI_K2;
+    common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_NONE;
+    // Whether reasoning_content should be inlined in the content (e.g. for reasoning_format=deepseek in stream mode)
+    bool reasoning_in_content = false;
+    bool thinking_forced_open = false;
     bool enable_thinking = false;
     bool enable_tool_calls = true;
 };
diff --git a/examples/server/function_calls.hpp b/examples/server/function_calls.hpp
index 068c5f24..92d25a0d 100644
--- a/examples/server/function_calls.hpp
+++ b/examples/server/function_calls.hpp
@@ -89,6 +89,8 @@ static ik_chat_msg parse_chat_message_incremental(const std::string& content, bo
     try {
         common_chat_syntax syntax;
         syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+        syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
+        syntax.reasoning_in_content = true; // Fix for thinking tag termination issue
         syntax.enable_tool_calls = true;
 
         common_chat_msg_parser parser(content, is_partial, syntax);
diff --git a/tests/test-function-calls.cpp b/tests/test-function-calls.cpp
index cfd560be..59af3804 100644
--- a/tests/test-function-calls.cpp
+++ b/tests/test-function-calls.cpp
@@ -3298,6 +3298,63 @@ int main() {
         std::cout << "✅ PASS: Qwen3 XML tool calls -> finish_reason='tool_calls'" << std::endl;
 
         std::cout << "🎯 All streaming finish_reason tests passed!" << std::endl;
+
+        // TDD: Test for thinking tag termination issue - reproduce the user's exact complaint
+        std::cout << std::endl;
+        std::cout << "🧠 Testing DeepSeek R1 thinking tag termination issue..." << std::endl;
+
+        // Test case: response wrapped entirely in think tags (reported issue)
+        std::string wrapped_response = "<think>This should be content but is wrapped in think tags</think>";
+
+        std::cout << "\n   1. REPRODUCING FAILURE - Without fix (reasoning_in_content=false):" << std::endl;
+
+        // First reproduce the failing behavior that the user reported
+        common_chat_syntax broken_syntax;
+        broken_syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+        broken_syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
+        broken_syntax.reasoning_in_content = false; // This causes the reported issue
+        broken_syntax.enable_tool_calls = false;
+
+        try {
+            auto broken_msg = common_chat_parse(wrapped_response, false, broken_syntax);
+            std::cout << "      Content: '" << broken_msg.content << "'" << std::endl;
+            std::cout << "      Reasoning: '" << broken_msg.reasoning_content << "'" << std::endl;
+
+            if (broken_msg.content.empty() && !broken_msg.reasoning_content.empty()) {
+                std::cout << "      ❌ REPRODUCED USER BUG: Content disappears (thinking tags don't terminate properly)" << std::endl;
+                std::cout << "      User sees: EMPTY CONTENT - this is exactly what was reported!" << std::endl;
+            }
+        } catch (const std::exception& e) {
+            std::cout << "      ❌ Exception: " << e.what() << std::endl;
+        }
+
+        std::cout << "\n   2. DEMONSTRATING FIX - With fix (reasoning_in_content=true):" << std::endl;
+
+        // Now show that the fix works
+        common_chat_syntax fixed_syntax;
+        fixed_syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+        fixed_syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
+        fixed_syntax.reasoning_in_content = true; // Key fix: display thinking as content
+        fixed_syntax.enable_tool_calls = false;
+
+        try {
+            auto msg = common_chat_parse(wrapped_response, false, fixed_syntax);
+            std::cout << "      Content: '" << msg.content << "'" << std::endl;
+            std::cout << "      Reasoning: '" << msg.reasoning_content << "'" << std::endl;
+
+            if (msg.content.find("This should be content but is wrapped in think tags") != std::string::npos) {
+                std::cout << "      ✅ PASS: Content properly preserved from think tags (with reasoning_in_content=true)" << std::endl;
+                std::cout << "      User sees: Full content - this fixes the reported issue!" << std::endl;
+            } else if (msg.content.empty() && !msg.reasoning_content.empty()) {
+                std::cout << "      ❌ FAILING TEST: Entire response treated as reasoning instead of content!" << std::endl;
+                std::cout << "      Expected: Content should contain the text from within think tags" << std::endl;
+            } else {
+                std::cout << "      ⚠️ PARTIAL: Some content found but may not contain expected text" << std::endl;
+            }
+        } catch (const std::exception& e) {
+            std::cout << "      ❌ Exception in thinking tag test: " << e.what() << std::endl;
+        }
+
     } catch (const std::exception& e) {
         std::cout << std::endl;
         std::cout << "❌ Test failed with exception: " << e.what() << std::endl;
@@ -3305,4 +3362,4 @@
     }
 
     return 0;
-}
\ No newline at end of file
+}