From dc1746338c2bb46c9238ae13d93b25700c1ebf21 Mon Sep 17 00:00:00 2001
From: Anton Sokolchenko
Date: Fri, 8 Aug 2025 12:56:44 +0200
Subject: [PATCH] Fix for Deepseek r1 parsing (#676)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with {"name": "func", "arguments": {...}} format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|> (see the example after this list)
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
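
Illustrative sketch (not part of the diff below): one tool call in the native DeepSeek R1 format listed above and how it is expected to surface through common_chat_parse(). The function name, arguments, and exact line breaks are invented for the illustration, and msg.tool_calls is assumed to be the parsed tool-call vector as in upstream llama.cpp.

    #include "chat.h"
    #include <cassert>
    #include <string>

    static void example_deepseek_r1_tool_call() {
        // Hypothetical model output following the token format documented above.
        std::string response =
            "<|tool▁calls▁begin|>function<|tool▁sep|>get_weather\n"
            "```json\n{\"location\": \"Berlin\"}\n```"
            "<|tool▁call▁end|><|tool▁calls▁end|>";

        common_chat_syntax syntax;
        syntax.format            = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
        syntax.enable_tool_calls = true;

        // Expected outcome: the call is returned as a structured tool call
        // rather than leaking the raw markup into msg.content.
        auto msg = common_chat_parse(response, /* is_partial */ false, syntax);
        assert(!msg.tool_calls.empty());
    }
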
* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: regex partial parsing functionality

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate the tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly - the conversational preamble issue is not in ik_llama.cpp but likely in the UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

The server was not passing the model name to parse_chat_message_incremental(), causing Qwen3 to fall back to the Kimi-K2 parser and return tool calls as content instead of a proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to the Kimi-K2 format, causing Qwen3 XML tool calls to be returned as content instead of a proper tool_calls array. The non-streaming path now uses the same model detection as the streaming path for consistency.

* Update Qwen3 function call handling in server and tests

- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support

* Add DeepSeek-R1 function call parsing support

Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - functionName\n```json\n{...}```

Key changes:
- Added parse_deepseek_r1_tools_array() following the original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into the exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality

Resolves the tool_calls_count=0 issue where DeepSeek-R1 models generated valid tool calls but the server failed to parse them correctly.

* Update function_calls.md documentation for DeepSeek-R1 Format 4

- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with the correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers

* Fix merge conflict in test-function-calls.cpp

- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing

* Fix DeepSeek R1 parsing issue with responses wrapped in think tags

Restore the missing consume_rest() call from the working PR #648 implementation. When a response contains no tool calls, the content remaining after reasoning parsing must be preserved as displayable content. Fixes the issue where responses wrapped entirely in <think> tags resulted in empty content output (a sketch of the scenario follows below).
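
Minimal sketch (illustrative, not part of the diff below) of the scenario this fix targets, using the common_chat_parse() API and syntax fields introduced in this series; the reply text is invented, and the expected results mirror the behaviour described above and exercised in the tests at the end of this patch.

    #include "chat.h"
    #include <cassert>
    #include <string>

    static void example_reasoning_then_content() {
        // DeepSeek R1 style reply: reasoning first, then plain content, no tool calls.
        std::string response = "<think>Consider the question first.</think>The answer is 42.";

        common_chat_syntax syntax;
        syntax.format           = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
        syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;

        auto msg = common_chat_parse(response, /* is_partial */ false, syntax);
        // With the restored consume_rest() call the trailing text survives as content;
        // before the fix it was dropped and msg.content came back empty.
        assert(msg.content.find("The answer is 42.") != std::string::npos);
        assert(!msg.reasoning_content.empty());
    }
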
* Implement proper reasoning handling following original llama.cpp patterns

- Add missing reasoning_format and reasoning_in_content fields to common_chat_syntax
- Update try_parse_reasoning to match original llama.cpp logic exactly
- Add TDD test case with reasoning_in_content=true for DeepSeek R1
- Following TDD: the test should now pass with the proper syntax configuration

Based on original llama.cpp implementation patterns.

* TDD SUCCESS: Fix DeepSeek R1 thinking tag termination issue

✅ Test passes with the reasoning_in_content=true configuration
- Content properly preserved: 'content' displays fully
- Reasoning field empty as expected
- Following TDD: test-first approach validates the fix

Next: Update the server to automatically apply this configuration.

* Complete server integration fix for DeepSeek R1 thinking tag termination

- Server now automatically sets reasoning_in_content=true for DeepSeek R1 models
- Fixes issue where responses wrapped in <think> tags appear empty to users

* Add TDD test case for DeepSeek R1 thinking tag termination issue

- Test reproduces the exact failure scenario reported by the user
- Validates that reasoning_in_content=true fixes the issue
- Demonstrates the empty content problem and the working solution

* Add remaining TDD test changes for DeepSeek R1 thinking tag fix

* Add debug output after upstream merge

* Remove temporary benchmark and debug files

- Remove tests/benchmark-progressive-parsing.cpp (development tool, not part of core functionality)
- Remove tests/reproduce_bug.sh (debugging script, not needed for PR)
---
 common/chat-parser.cpp             | 46 ++++++++++++++---------
 common/chat.cpp                    |  3 ++
 common/chat.h                      | 10 +++++
 examples/server/function_calls.hpp |  2 +
 tests/test-function-calls.cpp      | 59 +++++++++++++++++++++++++++++-
 5 files changed, 101 insertions(+), 19 deletions(-)

diff --git a/common/chat-parser.cpp b/common/chat-parser.cpp
index 3acba5d0..b2f5c91f 100644
--- a/common/chat-parser.cpp
+++ b/common/chat-parser.cpp
@@ -82,28 +82,38 @@ bool common_chat_msg_parser::try_consume_literal(const std::string & literal) {
 }
 
 bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think, const std::string & end_think) {
-    auto start_pos = input_.find(start_think, pos_);
-    if (start_pos == std::string::npos) {
-        return false;
-    }
+    auto handle_reasoning = [&](const std::string & reasoning, bool closed) {
+        auto stripped_reasoning = string_strip(reasoning);
+        if (stripped_reasoning.empty()) {
+            return;
+        }
+        if (syntax_.reasoning_in_content) {
+            add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "<think>" : start_think);
+            add_content(stripped_reasoning);
+            if (closed) {
+                add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "</think>" : end_think);
+            }
+        } else {
+            add_reasoning_content(stripped_reasoning);
+        }
+    };
 
-    auto end_pos = input_.find(end_think, start_pos + start_think.size());
-    if (end_pos == std::string::npos) {
-        if (is_partial_) {
-            // Partial reasoning content
-            auto reasoning = input_.substr(start_pos + start_think.size());
-            add_reasoning_content(string_strip(reasoning));
-            pos_ = input_.size();
+    if (syntax_.reasoning_format != COMMON_REASONING_FORMAT_NONE) {
+        if (syntax_.thinking_forced_open || try_consume_literal(start_think)) {
+            if (auto res = try_find_literal(end_think)) {
+                handle_reasoning(res->prelude, /* closed */ true);
+                consume_spaces();
+                return true;
+            }
+            auto rest = consume_rest();
+            if (!rest.empty()) {
+                handle_reasoning(rest, /* closed */ !is_partial());
+            }
+            // Allow unclosed thinking tags for now (following original llama.cpp)
             return true;
         }
-        return false;
     }
-
-    // Extract reasoning content
-    auto reasoning = input_.substr(start_pos + start_think.size(), end_pos - start_pos - start_think.size());
-    add_reasoning_content(string_strip(reasoning));
-    pos_ = end_pos + end_think.size();
-    return true;
+    return false;
 }
 
 std::optional<common_chat_msg_parser::find_regex_result> common_chat_msg_parser::try_find_literal_legacy(const std::string & literal) {
diff --git a/common/chat.cpp b/common/chat.cpp
index 15cfbbf0..f62c2801 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -278,6 +278,9 @@ void common_chat_parse_deepseek_r1(common_chat_msg_parser & builder) {
             throw; // Re-throw for partial mode
         }
     }
+
+    // Add any remaining content (critical for responses without tool calls)
+    builder.add_content(builder.consume_rest());
 }
 
 // Parse DeepSeek R1 tools array format following original llama.cpp parse_prefixed_json_tool_call_array pattern
diff --git a/common/chat.h b/common/chat.h
index e23f84f3..5899ef1a 100644
--- a/common/chat.h
+++ b/common/chat.h
@@ -135,8 +135,18 @@ enum common_chat_format {
     COMMON_CHAT_FORMAT_KIMI_K2, // Our custom format (keep last for backward compatibility)
 };
 
+enum common_reasoning_format {
+    COMMON_REASONING_FORMAT_NONE,
+    COMMON_REASONING_FORMAT_DEEPSEEK,
+    COMMON_REASONING_FORMAT_DEEPSEEK_LEGACY,
+};
+
 struct common_chat_syntax {
     common_chat_format format = COMMON_CHAT_FORMAT_KIMI_K2;
+    common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_NONE;
+    // Whether reasoning_content should be inlined in the content (e.g. for reasoning_format=deepseek in stream mode)
+    bool reasoning_in_content = false;
+    bool thinking_forced_open = false;
     bool enable_thinking = false;
     bool enable_tool_calls = true;
 };
diff --git a/examples/server/function_calls.hpp b/examples/server/function_calls.hpp
index 068c5f24..92d25a0d 100644
--- a/examples/server/function_calls.hpp
+++ b/examples/server/function_calls.hpp
@@ -89,6 +89,8 @@ static ik_chat_msg parse_chat_message_incremental(const std::string& content, bo
     try {
         common_chat_syntax syntax;
         syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+        syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
+        syntax.reasoning_in_content = true; // Fix for thinking tag termination issue
         syntax.enable_tool_calls = true;
 
         common_chat_msg_parser parser(content, is_partial, syntax);
diff --git a/tests/test-function-calls.cpp b/tests/test-function-calls.cpp
index cfd560be..59af3804 100644
--- a/tests/test-function-calls.cpp
+++ b/tests/test-function-calls.cpp
@@ -3298,6 +3298,63 @@ int main() {
         std::cout << "✅ PASS: Qwen3 XML tool calls -> finish_reason='tool_calls'" << std::endl;
 
         std::cout << "🎯 All streaming finish_reason tests passed!" << std::endl;
+
+        // TDD: Test for thinking tag termination issue - reproduce the user's exact complaint
+        std::cout << std::endl;
+        std::cout << "🧠 Testing DeepSeek R1 thinking tag termination issue..." << std::endl;
+
+        // Test case: response wrapped entirely in think tags (reported issue)
+        std::string wrapped_response = "<think>This should be content but is wrapped in think tags</think>";
+
+        std::cout << "\n   1. REPRODUCING FAILURE - Without fix (reasoning_in_content=false):" << std::endl;
+
+        // First reproduce the failing behavior that the user reported
+        common_chat_syntax broken_syntax;
+        broken_syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+        broken_syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
+        broken_syntax.reasoning_in_content = false; // This causes the reported issue
+        broken_syntax.enable_tool_calls = false;
+
+        try {
+            auto broken_msg = common_chat_parse(wrapped_response, false, broken_syntax);
+            std::cout << "      Content: '" << broken_msg.content << "'" << std::endl;
+            std::cout << "      Reasoning: '" << broken_msg.reasoning_content << "'" << std::endl;
+
+            if (broken_msg.content.empty() && !broken_msg.reasoning_content.empty()) {
+                std::cout << "      ❌ REPRODUCED USER BUG: Content disappears (thinking tags don't terminate properly)" << std::endl;
+                std::cout << "      User sees: EMPTY CONTENT - this is exactly what was reported!" << std::endl;
+            }
+        } catch (const std::exception& e) {
+            std::cout << "      ❌ Exception: " << e.what() << std::endl;
+        }
+
+        std::cout << "\n   2. DEMONSTRATING FIX - With fix (reasoning_in_content=true):" << std::endl;
+
+        // Now show that the fix works
+        common_chat_syntax fixed_syntax;
+        fixed_syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+        fixed_syntax.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
+        fixed_syntax.reasoning_in_content = true; // Key fix: display thinking as content
+        fixed_syntax.enable_tool_calls = false;
+
+        try {
+            auto msg = common_chat_parse(wrapped_response, false, fixed_syntax);
+            std::cout << "      Content: '" << msg.content << "'" << std::endl;
+            std::cout << "      Reasoning: '" << msg.reasoning_content << "'" << std::endl;
+
+            if (msg.content.find("This should be content but is wrapped in think tags") != std::string::npos) {
+                std::cout << "      ✅ PASS: Content properly preserved from think tags (with reasoning_in_content=true)" << std::endl;
+                std::cout << "      User sees: Full content - this fixes the reported issue!" << std::endl;
+            } else if (msg.content.empty() && !msg.reasoning_content.empty()) {
+                std::cout << "      ❌ FAILING TEST: Entire response treated as reasoning instead of content!" << std::endl;
+                std::cout << "      Expected: Content should contain the text from within think tags" << std::endl;
+            } else {
+                std::cout << "      ⚠️ PARTIAL: Some content found but may not contain expected text" << std::endl;
+            }
+        } catch (const std::exception& e) {
+            std::cout << "      ❌ Exception in thinking tag test: " << e.what() << std::endl;
+        }
+
     } catch (const std::exception& e) {
         std::cout << std::endl;
         std::cout << "❌ Test failed with exception: " << e.what() << std::endl;
@@ -3305,4 +3362,4 @@
     }
 
     return 0;
-}
\ No newline at end of file
+}