Fix Qwen3 content extraction breaking code formatting (#661)

Problem:
- qwen3::extract_content_during_parsing() used an aggressive regex to collapse multiple newlines
- This broke code formatting (e.g., the 2 blank lines PEP 8 requires between top-level functions)
- Affected non-tool-call streaming output, where formatting is critical

Solution:
- Replace the aggressive std::regex_replace(R"(\n\s*\n)", "\n") with the gentler string_strip()
- Follow the original llama.cpp pattern: trim only leading/trailing whitespace
- Preserve internal formatting, including multiple newlines
- Include common.h to access string_strip()

Changes:
- examples/server/parsers/qwen3_parser.hpp: replace the whitespace cleanup with string_strip()
- tests/test-function-calls.cpp: add test_qwen3_whitespace_preservation() to prevent regressions

Testing:
- ✅ PEP 8 compliance: 2 empty lines between functions preserved
- ✅ Tool call parsing: all Qwen3 tests continue to pass
- ✅ No regressions: existing functionality maintained
- ✅ Follows original llama.cpp whitespace handling patterns
2026-03-09 05:20:01 +00:00 · 2025-08-07 07:22:01 +02:00
parent e484944bc0
commit dee40cffb6
2 changed files with 38 additions and 6 deletions
--- a/examples/server/parsers/qwen3_parser.hpp
+++ b/examples/server/parsers/qwen3_parser.hpp
@@ -1,6 +1,7 @@
 #pragma once

 #include "json.hpp"
+#include "../../common/common.h"
 #include <string>
 #include <regex>

@@ -102,12 +103,8 @@ static std::string extract_content_during_parsing(const std::string& text, bool
            }
        }
        
-        // Clean up extra whitespace
-        content = std::regex_replace(content, std::regex(R"(\n\s*\n)"), "\n");
-        
-        // Trim leading/trailing whitespace
-        content.erase(0, content.find_first_not_of(" \t\n\r"));
-        content.erase(content.find_last_not_of(" \t\n\r") + 1);
+        // Only trim leading/trailing whitespace, preserve internal formatting
+        content = string_strip(content);
        
    } catch (const std::exception&) {
        // Return original text on regex errors