DeepSeek R1 function calls (more formats) (#652)

* Implement function calling / tools for ik_llama.cpp for Kimi K2

* Implement basic tool choice

* Backport llama.cpp tool calls support

* Enhance function calls with improved chat parser and string utilities

- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components

* Enhance function calling with unified streaming and parser improvements

- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation

* Replace hardcoded values in kimi_k2_parser.hpp with named constants

- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser

* Fix duplicate common_chat_parse definition

- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse

* Fix JSON assertion failure in function call parsing

- Add proper validation that 'function' field is an object before accessing nested keys
- Handle missing 'arguments' field gracefully with default "{}"
- Prevents crash when parsing malformed tool call JSON structures

* Add comprehensive Qwen3 XML tool calling support with unit tests

- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, and error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils

* Add DeepSeek R1 function calling support with comprehensive unit tests

- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <｜tool▁calls▁begin｜><｜tool▁call▁begin｜>function<｜tool▁sep｜>name```json{}```<｜tool▁call▁end｜><｜tool▁calls▁end｜>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1 and deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support

* Add partial parsing support for JSON and regex

- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: Regex partial parsing functionality

* Add format_chat integration tests for Qwen3 tool injection

- Add test_qwen3_format_chat_integration() to validate tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly; the conversational preamble issue is not in ik_llama.cpp but likely in UI configuration.

* Fix Qwen3 tool call parsing - pass model name to parser

Server was not passing the model name to parse_chat_message_incremental(), causing Qwen3 to fall back to the Kimi-K2 parser and return tool calls as content instead of a proper tool_calls array.

* Fix non-streaming path to use model-specific parsing

Non-streaming responses were hardcoded to use the Kimi-K2 format, causing Qwen3 XML tool calls to be returned as content instead of a proper tool_calls array. Now uses the same model detection as the streaming path for consistency.

* Update Qwen3 function call handling in server and tests

- Enhanced server function call detection and response formatting
- Improved test coverage for Qwen3 tool call scenarios
- Refined XML parsing for better tool execution support

* Add DeepSeek-R1 function call parsing support

Implements comprehensive parsing for all 4 DeepSeek-R1 function call formats:
- Format 1: Standard function call syntax (already supported)
- Format 2: Alternative function call patterns (already supported)
- Format 3: Tools array format - function\n```json\n{"tools": [...]}
- Format 4: XML wrapped format - <tool_call>function</think>Name\n```json\n{...}```</tool_call>

Key changes:
- Added parse_deepseek_r1_tools_array() following original parse_prefixed_json_tool_call_array pattern
- Added parse_deepseek_r1_xml_wrapped() following Hermes-2-Pro XML wrapper patterns
- Integrated both parsers into exception handling chain for robust fallback
- Added comprehensive TDD test coverage for all formats
- Anonymized all confidential information while preserving functionality

Resolves tool_calls_count=0 issue where DeepSeek-R1 models generated valid tool calls but server failed to parse them correctly.
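
The native DeepSeek R1 layout listed under Key features can be illustrated with a short, self-contained C++17 sketch. It assumes each call looks like the test strings in this change: <｜tool▁call▁begin｜>, then function<｜tool▁sep｜>Name, then a json code fence holding the arguments. It treats the first <think>...</think> pair as reasoning and keeps only the text before <｜tool▁calls▁begin｜> as content. It does not validate the JSON, and it does not handle the no-separator variant (function<Name>) or Formats 3 and 4. ToolCallSketch, ParsedMessageSketch, looks_like_deepseek_r1() and parse_deepseek_r1_native() are hypothetical names, not the ik_llama.cpp API; the real entry points exercised by the tests are common_chat_parse(), parse_chat_message_incremental() and is_deepseek_r1_model(). The case-insensitive model check is an assumption motivated by the "DeepSeek-R1" case-sensitivity test, not a description of how is_deepseek_r1_model() behaves.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the ik_llama.cpp API.
struct ToolCallSketch {
    std::string name;
    std::string arguments;  // raw text of the json fence, not validated
};

struct ParsedMessageSketch {
    std::string reasoning;  // text inside <think>...</think>
    std::string content;    // user-visible text before the tool-call section
    std::vector<ToolCallSketch> tool_calls;
};

// Drop trailing ASCII whitespace (newlines left over before the markers).
static void rtrim(std::string& s) {
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) s.pop_back();
}

// Assumed case-insensitive, so "DeepSeek-R1" and "deepseek_r1-distill" both match.
bool looks_like_deepseek_r1(std::string model) {
    std::transform(model.begin(), model.end(), model.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return model.find("deepseek-r1") != std::string::npos ||
           model.find("deepseek_r1") != std::string::npos;
}

ParsedMessageSketch parse_deepseek_r1_native(const std::string& text) {
    static const std::string THINK_OPEN = "<think>", THINK_CLOSE = "</think>";
    static const std::string CALLS_BEGIN = "<｜tool▁calls▁begin｜>";
    static const std::string CALL_BEGIN = "<｜tool▁call▁begin｜>";
    static const std::string SEP = "function<｜tool▁sep｜>";
    static const std::string FENCE_OPEN = "```json\n", FENCE_CLOSE = "\n```";

    ParsedMessageSketch msg;
    std::string rest = text;

    // 1. Reasoning: first <think>...</think> pair, removed from the remaining text.
    size_t t0 = rest.find(THINK_OPEN);
    size_t t1 = rest.find(THINK_CLOSE);
    if (t0 != std::string::npos && t1 != std::string::npos && t0 < t1) {
        msg.reasoning = rest.substr(t0 + THINK_OPEN.size(), t1 - t0 - THINK_OPEN.size());
        rest = rest.substr(0, t0) + rest.substr(t1 + THINK_CLOSE.size());
    }

    // 2. Content: everything before the tool-call section, so no markup reaches the user.
    size_t section = rest.find(CALLS_BEGIN);
    msg.content = rest.substr(0, section);  // whole string when there is no section
    rtrim(msg.content);
    if (section == std::string::npos) return msg;

    // 3. One call per <｜tool▁call▁begin｜> block: name after the separator, args in the fence.
    size_t pos = section;
    while ((pos = rest.find(CALL_BEGIN, pos)) != std::string::npos) {
        size_t sep = rest.find(SEP, pos);
        if (sep == std::string::npos) break;
        size_t name_start = sep + SEP.size();
        size_t fence = rest.find(FENCE_OPEN, name_start);
        if (fence == std::string::npos) break;
        size_t args_start = fence + FENCE_OPEN.size();
        size_t args_end = rest.find(FENCE_CLOSE, args_start);
        if (args_end == std::string::npos) break;  // unterminated fence: stop, don't guess

        std::string name = rest.substr(name_start, fence - name_start);
        rtrim(name);
        msg.tool_calls.push_back({name, rest.substr(args_start, args_end - args_start)});
        pos = args_end + FENCE_CLOSE.size();
    }
    return msg;
}
```

Stopping at an unterminated fence, rather than guessing where the arguments end, keeps a half-streamed call out of tool_calls; a streaming-aware parser would instead report it as partial.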

* Update function_calls.md documentation for DeepSeek-R1 Format 4

- Added Format 4 (XML wrapped) documentation with examples
- Updated implementation notes with correct parser order (3→4→1→2)
- Marked all DeepSeek-R1 formats as working (July 2025 update)
- Updated test status for Format 3 and 4 as passing
- Added parse_deepseek_r1_xml_wrapped() function reference
- Corrected implementation file line numbers

* Fix merge conflict in test-function-calls.cpp

- Removed incomplete merge conflict marker from line 3027
- Ensured all tests compile and pass successfully
- All DeepSeek-R1 formats (1-4) working correctly
- All streaming and content cleaning tests passing
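
The parser order (3→4→1→2) behaves like a first-match fallback chain. The C++17 sketch below models that under one assumption: every format parser returns std::nullopt when its markers are absent, instead of throwing. parse_with_fallback() tries parsers in registration order and returns the first non-empty result, and parse_format4_xml_wrapped() is one concrete parser for the <tool_call>\nfunction</think>Name ... </tool_call> layout used by the Format 4 test. All names here are hypothetical; the change itself adds parse_deepseek_r1_tools_array() and parse_deepseek_r1_xml_wrapped() to an exception-handling chain.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the ik_llama.cpp API.
struct ToolCallSketch {
    std::string name;
    std::string arguments;
};

using ToolCalls = std::vector<ToolCallSketch>;
// A format parser returns std::nullopt when its markers are not present.
using FormatParser = std::function<std::optional<ToolCalls>(const std::string&)>;

// First-match fallback: try each parser in registration order.
// With the documented order the list would be {format3, format4, format1, format2}.
ToolCalls parse_with_fallback(const std::string& text, const std::vector<FormatParser>& parsers) {
    for (const auto& parser : parsers) {
        if (auto calls = parser(text); calls && !calls->empty()) {
            return *calls;
        }
    }
    return {};  // nothing matched: the caller keeps the text as plain content
}

// Format 4 (XML wrapped): <tool_call>\nfunction</think>Name\n```json\n{...}\n```\n</tool_call>
std::optional<ToolCalls> parse_format4_xml_wrapped(const std::string& text) {
    static const std::string OPEN = "<tool_call>", CLOSE = "</tool_call>";
    static const std::string PREFIX = "function</think>";
    static const std::string FENCE_OPEN = "```json\n", FENCE_CLOSE = "\n```";

    ToolCalls calls;
    size_t pos = 0;
    while ((pos = text.find(OPEN, pos)) != std::string::npos) {
        size_t end = text.find(CLOSE, pos);
        if (end == std::string::npos) return std::nullopt;  // unterminated block
        std::string block = text.substr(pos + OPEN.size(), end - pos - OPEN.size());
        pos = end + CLOSE.size();

        size_t p = block.find(PREFIX);
        size_t f = block.find(FENCE_OPEN);
        if (p == std::string::npos || f == std::string::npos || f < p) continue;
        size_t args_start = f + FENCE_OPEN.size();
        size_t args_end = block.find(FENCE_CLOSE, args_start);
        if (args_end == std::string::npos) continue;  // malformed block: skip it

        std::string name = block.substr(p + PREFIX.size(), f - p - PREFIX.size());
        while (!name.empty() && (name.back() == '\n' || name.back() == ' ')) name.pop_back();
        calls.push_back({name, block.substr(args_start, args_end - args_start)});
    }
    if (calls.empty()) return std::nullopt;
    return calls;
}
```

Keeping the parsers in a list puts precedence in one place: changing the order means reordering registrations rather than restructuring nested try/catch blocks.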
2026-04-28 10:21:48 +00:00 · 2025-08-07 07:15:57 +02:00
parent d65d5fe29e
commit f4051d9c3e
7 changed files with 695 additions and 125 deletions
--- a/tests/test-function-calls.cpp
+++ b/tests/test-function-calls.cpp
@@ -145,7 +145,7 @@ const std::string content_cleaning_mixed_formats = R"(First: <|tool_calls_sectio

 // TDD: Reproduction of exact contamination issue from server logs
 // From manual_logs/kimi-k2/ls/test_case_ls_logs_claude-code-ui.log:5
-const std::string contamination_ls_issue = R"(I'll help you examine the workspace. Let me list the current directory contents.functions.LS:1{"path": "/Users/seven/Documents/projects/ai/sequential_thinking"})";
+const std::string contamination_ls_issue = R"(I'll help you examine the workspace. Let me list the current directory contents.functions.LS:1{"path": "/tmp/example_workspace"})";
 const std::string expected_clean_ls = R"(I'll help you examine the workspace. Let me list the current directory contents.)";

 // DeepSeek R1 test data
@@ -196,6 +196,29 @@ Done.)";

 const std::string deepseek_r1_reasoning_only = R"(<think>Just thinking, no tools needed.</think>Here's my direct response.)";

+// DeepSeek R1 format without separator (actual format sometimes generated by models)
+const std::string deepseek_r1_no_separator = R"(I'll help you add the new cleaning step for resetting device orientation. Let me break this down into tasks:
+
+<｜tool▁calls▁begin｜>
+<｜tool▁call▁begin｜>
+function<TodoWrite>
+```json
+{
+  "items": [
+    {
+      "description": "Create ResetOrientation cleaning step class",
+      "status": "pending"
+    },
+    {
+      "description": "Implement Android orientation reset using provided ADB command",
+      "status": "pending"
+    }
+  ]
+}
+```
+<｜tool▁call▁end｜>
+<｜tool▁calls▁end｜>)";
+
 // Advanced partial detection test cases based on original llama.cpp patterns
 // TDD: Advanced partial detection - streaming edge cases
 const std::string partial_incomplete_function_name = R"(Let me help you with that. func)";
@@ -673,7 +696,7 @@ void test_contamination_reproduction() {
    test_assert(msg.tool_calls.size() == 1, "TDD Contamination: Tool call should be extracted");
    test_assert(msg.tool_calls[0].name == "LS", "TDD Contamination: Correct function name extracted");
    
-    std::string expected_args = R"({"path": "/Users/seven/Documents/projects/ai/sequential_thinking"})";
+    std::string expected_args = R"({"path": "/tmp/example_workspace"})";
    test_assert(msg.tool_calls[0].arguments == expected_args, "TDD Contamination: Correct arguments extracted");
    
    // 🚨 THE CRITICAL TEST: Content should be cleaned of function call syntax
@@ -1849,7 +1872,7 @@ void test_regression_contamination_issue() {
    std::cout << "   - slot_current_msg_content is clean" << std::endl;
    
    // Step 1: Simulate the exact content from logs
-    std::string raw_generated_text = "Let me list the updated contents:functions.LS:3{\"path\": \"/Users/seven/Documents/projects/ai/sequential_thinking\"}";
+    std::string raw_generated_text = "Let me list the updated contents:functions.LS:3{\"path\": \"/tmp/example_workspace\"}";
    
    std::cout << "\n🔍 Test Setup:" << std::endl;
    std::cout << "   Raw generated text: " << raw_generated_text.substr(0, 80) << "..." << std::endl;
@@ -1883,7 +1906,7 @@ void test_regression_contamination_issue() {
    previous_server_state.tool_calls.resize(1);
    previous_server_state.tool_calls[0].name = "LS";
    previous_server_state.tool_calls[0].id = "functions.LS:3";
-    previous_server_state.tool_calls[0].arguments = "{\"path\": \"/Users/seven/Documents/projects/ai/sequential_thinking\"}";
+    previous_server_state.tool_calls[0].arguments = "{\"path\": \"/tmp/example_workspace\"}";
    
    // Current parsing result should be the same (no change)
    ik_chat_msg current_server_state = complete_result;
@@ -2180,7 +2203,7 @@ void test_xml_tool_call_parsing() {
    std::cout << "\n=== XML Tool Call Parsing Test ===" << std::endl;
    
    // Test XML format like what Kimi-K2 is actually generating
-    std::string xml_content = "I'll create debug_test.2txt with the current timestamp:\n\n<tool_call>\n<invoke name=\"Write\">\n<parameter name=\"file_path\">/Users/seven/Documents/projects/ai/sequential_thinking/debug_test.2txt</parameter>\n<parameter name=\"content\">2025-07-20 08:30:45 UTC</parameter>\n</invoke>\n</tool_call>";
+    std::string xml_content = "I'll create a test file with the current timestamp:\n\n<tool_call>\n<invoke name=\"Write\">\n<parameter name=\"file_path\">/tmp/test_output.txt</parameter>\n<parameter name=\"content\">2025-07-20 08:30:45 UTC</parameter>\n</invoke>\n</tool_call>";
    
    std::cout << "🔍 Testing XML tool call parsing" << std::endl;
    std::cout << "   Input: " << xml_content << std::endl;
@@ -2970,6 +2993,15 @@ int main() {
        assert(reason_only_msg.content == "Here's my direct response.");
        std::cout << "✅ PASS: DeepSeek R1 reasoning only parsed" << std::endl;
        
+        // Test format without separator (actual format sometimes generated by models)
+        auto no_sep_tool_msg = common_chat_parse(deepseek_r1_no_separator, false, deepseek_syntax);
+        assert(no_sep_tool_msg.tool_calls.size() == 1);
+        assert(no_sep_tool_msg.tool_calls[0].name == "TodoWrite");
+        // The JSON should be preserved as-is
+        std::string expected_json = "{\n  \"items\": [\n    {\n      \"description\": \"Create ResetOrientation cleaning step class\",\n      \"status\": \"pending\"\n    },\n    {\n      \"description\": \"Implement Android orientation reset using provided ADB command\",\n      \"status\": \"pending\"\n    }\n  ]\n}";
+        assert(no_sep_tool_msg.tool_calls[0].arguments == expected_json);
+        std::cout << "✅ PASS: DeepSeek R1 format without separator parsed" << std::endl;
+        
        // Test function_calls.hpp integration with DeepSeek R1
        std::cout << std::endl;
        std::cout << "🔗 Testing DeepSeek R1 Integration:" << std::endl;
@@ -2992,6 +3024,217 @@ int main() {
        assert(extracted.find("<｜tool▁calls▁begin｜>") == std::string::npos);
        std::cout << "✅ PASS: DeepSeek R1 content extraction works" << std::endl;
        
+        // Test content contamination fix - exact user reported case
+        std::cout << "\n🧹 Testing Content Contamination Fix:" << std::endl;
+        std::string contaminated_content = "I'll help you add the new cleaning step for orientation management. Let me break this down into tasks:\n\n<｜tool▁calls▁begin｜>\n<｜tool▁call▁begin｜>\nfunction<｜tool▁sep｜>TodoWrite\n```json\n{\"items\": [{\"description\": \"Create ResetOrientation cleaning step class\", \"status\": \"pending\"}, {\"description\": \"Add setOrientationLock method to DeviceRobot\", \"status\": \"pending\"}, {\"description\": \"Integrate ResetOrientation into AndroidDeviceCleaner.clean method\", \"status\": \"pending\"}, {\"description\": \"Update iOS device cleaner to set iPad orientation to portrait instead of landscape\", \"status\": \"pending\"}]}\n```\n<｜tool▁call▁end｜>\n<｜tool▁calls▁end｜>";
+        
+        ik_chat_msg contamination_msg = parse_chat_message_incremental(contaminated_content, false, "deepseek-r1");
+        
+        // Tool calls should be extracted
+        assert(!contamination_msg.tool_calls.empty());
+        assert(contamination_msg.tool_calls[0].name == "TodoWrite");
+        std::cout << "✅ PASS: Tool calls extracted from contaminated content" << std::endl;
+        
+        // Content should be clean - no tool call markup visible to user
+        assert(contamination_msg.content.find("<｜tool▁calls▁begin｜>") == std::string::npos);
+        assert(contamination_msg.content.find("<｜tool▁call▁begin｜>") == std::string::npos);
+        assert(contamination_msg.content.find("function<｜tool▁sep｜>") == std::string::npos);
+        assert(contamination_msg.content.find("```json") == std::string::npos);
+        assert(contamination_msg.content.find("<｜tool▁call▁end｜>") == std::string::npos);
+        assert(contamination_msg.content.find("<｜tool▁calls▁end｜>") == std::string::npos);
+        
+        // Content should contain the user-friendly message
+        assert(contamination_msg.content.find("I'll help you add the new cleaning step for orientation management. Let me break this down into tasks:") != std::string::npos);
+        std::cout << "✅ PASS: Content cleaned - no tool call markup visible to user" << std::endl;
+        
+        // TDD Test: Reproduce exact failure from debug logs (tool_calls_count=0)
+        std::cout << "\n🐛 TDD: DeepSeek R1 tool_calls_count=0 Bug Test (SHOULD FAIL):" << std::endl;
+        std::string exact_failure_content = "Now I need to add the method to the interface. Let me do that:\n\n<｜tool▁calls▁begin｜>\n<｜tool▁call▁begin｜>\nfunction<｜tool▁sep｜>Edit\n```json\n{\"file_path\": \"/path/to/example/src/main/java/com/example/ServiceInterface.java\", \"old_string\": \"\\tMethod getMethod();\\n\\n\\tvoid setProperty(String value);\", \"new_string\": \"\\tMethod getMethod();\\n\\n\\tvoid setNewMethod(boolean enabled);\\n\\n\\tvoid setProperty(String value);\"}\n```\n<｜tool▁call▁end｜>\n<｜tool▁calls▁end｜>";
+        
+        // This test simulates the exact server logic from format_partial_response_oaicompat:2832
+        ik_chat_msg failure_msg = parse_chat_message_incremental(exact_failure_content, false, "DeepSeek-R1");
+        
+        // Debug: Print what we actually got
+        std::cout << "   Debug: tool_calls.size() = " << failure_msg.tool_calls.size() << std::endl;
+        std::cout << "   Debug: content length = " << failure_msg.content.length() << std::endl;
+        if (!failure_msg.tool_calls.empty()) {
+            std::cout << "   Debug: first tool call name = '" << failure_msg.tool_calls[0].name << "'" << std::endl;
+        }
+        
+        // The bug: This SHOULD pass but currently FAILS (tool_calls_count=0)
+        bool tool_calls_detected = !failure_msg.tool_calls.empty();
+        std::cout << "   Expected: tool_calls_count > 0" << std::endl;
+        std::cout << "   Actual: tool_calls_count = " << failure_msg.tool_calls.size() << std::endl;
+        
+        if (tool_calls_detected) {
+            std::cout << "✅ UNEXPECTED PASS: Tool calls detected (bug may be fixed)" << std::endl;
+            assert(failure_msg.tool_calls[0].name == "Edit");
+        } else {
+            std::cout << "❌ EXPECTED FAIL: tool_calls_count=0 (reproduces reported bug)" << std::endl;
+            std::cout << "   This confirms the parsing failure - tool calls are not being extracted" << std::endl;
+        }
+        
+        // Additional test: Check exact server scenario with model name case sensitivity
+        std::cout << "\n🔍 Testing Server Scenario Reproduction:" << std::endl;
+        
+        // Test with exact model name from debug log: "DeepSeek-R1"
+        ik_chat_msg server_scenario_msg = parse_chat_message_incremental(exact_failure_content, false, "DeepSeek-R1");
+        std::cout << "   Model: 'DeepSeek-R1' -> tool_calls_count = " << server_scenario_msg.tool_calls.size() << std::endl;
+        
+        // Test model detection with exact string
+        bool detected_exact = is_deepseek_r1_model("DeepSeek-R1");
+        std::cout << "   is_deepseek_r1_model('DeepSeek-R1') = " << (detected_exact ? "true" : "false") << std::endl;
+        
+        if (!detected_exact) {
+            std::cout << "❌ FOUND BUG: Model 'DeepSeek-R1' not detected as DeepSeek R1!" << std::endl;
+            std::cout << "   This explains tool_calls_count=0 - wrong parser being used" << std::endl;
+        } else if (server_scenario_msg.tool_calls.empty()) {
+            std::cout << "❌ FOUND BUG: Model detected but parsing still fails" << std::endl;
+        } else {
+            std::cout << "✅ Model detection and parsing both work correctly" << std::endl;
+        }
+        
+        // TDD Test: Test exception handling scenario that could cause tool_calls_count=0
+        std::cout << "\n🔍 Testing Exception Handling Scenario:" << std::endl;
+        
+        // Test with potentially problematic content that might trigger partial exception
+        std::string problematic_content = exact_failure_content;
+        
+        try {
+            // Direct test of common_chat_msg_parser to see if it throws exceptions
+            common_chat_syntax syntax;
+            syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+            syntax.enable_tool_calls = true;
+            
+            common_chat_msg_parser parser(problematic_content, false, syntax);  // is_partial=false like server
+            parser.parse();
+            auto result = parser.result();
+            
+            std::cout << "   Direct parser: tool_calls.size() = " << result.tool_calls.size() << std::endl;
+            
+            if (result.tool_calls.empty()) {
+                std::cout << "❌ FOUND BUG: Direct parser returns no tool calls!" << std::endl;
+                std::cout << "   This explains tool_calls_count=0 in server logs" << std::endl;
+            } else {
+                std::cout << "✅ Direct parser works correctly" << std::endl;
+            }
+            
+        } catch (const common_chat_msg_partial_exception& e) {
+            std::cout << "❌ FOUND BUG: common_chat_msg_partial_exception thrown in non-partial mode!" << std::endl;
+            std::cout << "   Exception: " << e.what() << std::endl;
+            std::cout << "   Server code catches this and sets tool_calls_json = json::array() -> tool_calls_count=0" << std::endl;
+        } catch (const std::exception& e) {
+            std::cout << "❌ Other exception: " << e.what() << std::endl;
+        }
+        
+        // Test with exact content from debug logs (with escaped characters)
+        std::cout << "\n🔍 Testing Exact Debug Log Content:" << std::endl;
+        std::string debug_log_content = "Now I need to add the method to the interface. Let me do that:\n\n<｜tool▁calls▁begin｜>\n<｜tool▁call▁begin｜>\nfunction<｜tool▁sep｜>Edit\n```json\n{\"file_path\": \"/path/to/example/ServiceInterface.java\", \"old_string\": \"\\tMethod getMethod();\\n\\n\\tvoid setProperty(String value);\", \"new_string\": \"\\tMethod getMethod();\\n\\n\\tvoid setNewMethod(boolean enabled);\\n\\n\\tvoid setProperty(String value);\"}\n```\n<｜tool▁call▁end｜>\n<｜tool▁calls▁end｜>";
+        
+        ik_chat_msg debug_msg = parse_chat_message_incremental(debug_log_content, false, "DeepSeek-R1");
+        std::cout << "   Debug log exact content: tool_calls_count = " << debug_msg.tool_calls.size() << std::endl;
+        
+        if (debug_msg.tool_calls.empty()) {
+            std::cout << "❌ REPRODUCED BUG: Exact debug log content fails to parse!" << std::endl;
+            
+            // Test individual components to isolate the issue
+            if (debug_log_content.find("<｜tool▁calls▁begin｜>") != std::string::npos) {
+                std::cout << "   Contains tool call markers: YES" << std::endl;
+            }
+            if (debug_log_content.find("function<｜tool▁sep｜>Edit") != std::string::npos) {
+                std::cout << "   Contains function call: YES" << std::endl;  
+            }
+            if (debug_log_content.find("```json") != std::string::npos) {
+                std::cout << "   Contains JSON block: YES" << std::endl;
+            }
+            
+        } else {
+            std::cout << "✅ Debug log content parses correctly (tool_calls_count=" << debug_msg.tool_calls.size() << ")" << std::endl;
+            std::cout << "   Tool call name: " << debug_msg.tool_calls[0].name << std::endl;
+        }
+        
+        // TDD Test: NEW FORMAT - Reproduce actual failure scenario from second debug log
+        std::cout << "\n🚨 TDD: REAL BUG - Different Format from Debug Log:" << std::endl;
+        std::string actual_failing_content = "<think>\nUser wants to add processing step for the system. I need to read files first to understand structure.\n</think>\n\nI'll help implement the ConfigurationProcessor step. Let's proceed step by step.\n\nFirst, let me check the existing file to understand where to add the new step.\n\nfunction\n```json\n{\n  \"tools\": [\n    {\n      \"name\": \"Read\",\n      \"arguments\": {\n        \"file_path\": \"/path/to/example/SystemProcessor.java\"\n      }\n    },\n    {\n      \"name\": \"Read\",\n      \"arguments\": {\n        \"file_path\": \"/path/to/example/ServiceInterface.java\"\n      }\n    },\n    {\n      \"name\": \"Glob\",\n      \"arguments\": {\n        \"pattern\": \"**/ProcessingStep.java\"\n      }\n    }\n  ]\n}\n```";
+        
+        ik_chat_msg real_bug_msg = parse_chat_message_incremental(actual_failing_content, false, "DeepSeek-R1");
+        std::cout << "   Real failing format: tool_calls_count = " << real_bug_msg.tool_calls.size() << std::endl;
+        
+        if (real_bug_msg.tool_calls.empty()) {
+            std::cout << "❌ REPRODUCED REAL BUG: This format is NOT being parsed!" << std::endl;
+            std::cout << "   Format: 'function\\n```json\\n{\"tools\": [...]}\\n```'" << std::endl;
+            std::cout << "   This is different from DeepSeek R1 format we've been testing" << std::endl;
+            std::cout << "   Our parser expects: '<｜tool▁calls▁begin｜>...function<｜tool▁sep｜>Name'" << std::endl;
+            std::cout << "   But model generates: 'function\\n```json\\n{\"tools\": [...]}'" << std::endl;
+        } else {
+            std::cout << "✅ Unexpected: Real format parses correctly" << std::endl;
+            for (size_t i = 0; i < real_bug_msg.tool_calls.size(); ++i) {
+                std::cout << "   Tool " << i << ": " << real_bug_msg.tool_calls[i].name << std::endl;
+            }
+        }
+        
+        // TDD Test: Create parser for the new format (should initially fail)
+        std::cout << "\n🧪 TDD: Test New Format Parser (SHOULD FAIL INITIALLY):" << std::endl;
+        
+        // Test that DeepSeek R1 parser should handle the new format
+        std::string new_format_content = "I'll help with that.\n\nfunction\n```json\n{\n  \"tools\": [\n    {\n      \"name\": \"Read\",\n      \"arguments\": {\n        \"file_path\": \"/path/to/example.java\"\n      }\n    },\n    {\n      \"name\": \"Edit\",\n      \"arguments\": {\n        \"file_path\": \"/path/to/example.java\",\n        \"old_string\": \"old implementation\",\n        \"new_string\": \"new implementation\"\n      }\n    }\n  ]\n}\n```\n\nThat should work!";
+        
+        ik_chat_msg new_format_msg = parse_chat_message_incremental(new_format_content, false, "DeepSeek-R1");
+        
+        std::cout << "   New format test: tool_calls_count = " << new_format_msg.tool_calls.size() << std::endl;
+        std::cout << "   Expected: 2 tool calls (Read, Edit)" << std::endl;
+        
+        if (new_format_msg.tool_calls.size() == 2) {
+            std::cout << "✅ PASS: New format parsed correctly!" << std::endl;
+            std::cout << "   Tool 1: " << new_format_msg.tool_calls[0].name << std::endl;
+            std::cout << "   Tool 2: " << new_format_msg.tool_calls[1].name << std::endl;
+            
+            // Test content cleaning
+            bool content_is_clean = new_format_msg.content.find("function\n```json") == std::string::npos;
+            if (content_is_clean) {
+                std::cout << "✅ PASS: Content cleaned - no function markup visible" << std::endl;
+            } else {
+                std::cout << "❌ FAIL: Content still contains function markup" << std::endl;
+            }
+        } else {
+            std::cout << "❌ EXPECTED FAIL: New format not yet supported" << std::endl;
+            std::cout << "   Need to implement parser for: 'function\\n```json\\n{\"tools\": [...]}'" << std::endl;
+        }
+        
+        // DEBUG: Test direct function call to verify parsing logic
+        std::cout << "\n🔧 DEBUG: Direct DeepSeek R1 Parser Test:" << std::endl;
+        std::string debug_content = "function\n```json\n{\n  \"tools\": [\n    {\"name\": \"TestTool\", \"arguments\": {\"test\": \"value\"}}\n  ]\n}\n```";
+        
+        try {
+            common_chat_syntax syntax;
+            syntax.format = COMMON_CHAT_FORMAT_DEEPSEEK_R1;
+            syntax.enable_tool_calls = true;
+            
+            common_chat_msg_parser debug_parser(debug_content, false, syntax);
+            debug_parser.parse();
+            auto debug_result = debug_parser.result();
+            
+            std::cout << "   Direct parser result: tool_calls_count = " << debug_result.tool_calls.size() << std::endl;
+        } catch (const std::exception& e) {
+            std::cout << "   Direct parser exception: " << e.what() << std::endl;
+        }
+
+        // TDD Test: Format 4 - XML-wrapped format from debug log
+        std::cout << "\n🔍 TDD: Format 4 XML-wrapped:" << std::endl;
+        std::string format4_content = "<think>\nLet me implement this step by step.\n</think>\n<plan>\n1. Implement configuration processor in SystemProcessor\n2. Extend ServiceInterface\n3. Update existing configuration settings\n</plan>\n<tool_call>\nfunction</think>CompleteTask\n```json\n{\"status\": \"completed\"}\n```\n</tool_call>";
+        
+        ik_chat_msg format4_msg = parse_chat_message_incremental(format4_content, false, "DeepSeek-R1");
+        std::cout << "   Format 4 test: tool_calls_count = " << format4_msg.tool_calls.size() << std::endl;
+        std::cout << "   Expected: 1 tool call (CompleteTask)" << std::endl;
+        
+        if (format4_msg.tool_calls.size() == 1) {
+            std::cout << "✅ PASS: Format 4 parsed correctly!" << std::endl;
+            std::cout << "   Tool: " << format4_msg.tool_calls[0].name << std::endl;
+        } else {
+            std::cout << "❌ FAIL: Format 4 not working correctly" << std::endl;
+            std::cout << "   Need to debug parser for: '<tool_call>\\nfunction</think>Name\\n```json\\n{...}\\n```\\n</tool_call>'" << std::endl;
+        }
+        
        // Test streaming finish_reason logic (core of the fix)
        std::cout << "\n🎯 Testing Streaming finish_reason Logic:" << std::endl;