Fix Qwen3 content extraction breaking code formatting (#661)

Problem:
- qwen3::extract_content_during_parsing() used an aggressive regex that collapsed every `\n\s*\n` run into a single newline, deleting blank lines outright
- This broke code formatting (e.g., the two blank lines PEP 8 requires between top-level functions)
- It affected regular (non-tool-call) streamed content, where formatting must be preserved

Solution:
- Replace the std::regex_replace(R"(\n\s*\n)", "\n") cleanup with string_strip()
- Follow upstream llama.cpp's pattern of trimming only leading/trailing whitespace
- Preserve internal formatting, including consecutive newlines
- Include common/common.h, which declares string_strip()

Changes:
- examples/server/parsers/qwen3_parser.hpp: Replace whitespace cleanup with string_strip()
- tests/test-function-calls.cpp: Add test_qwen3_whitespace_preservation() to prevent regression

Testing:
- PEP 8 compliance: the two blank lines between functions are preserved
- Tool call parsing: all Qwen3 tests continue to pass
- No regressions: existing functionality is unchanged
- Follows upstream llama.cpp whitespace handling patterns
Anton Sokolchenko
2025-08-07 07:22:01 +02:00
committed by GitHub
parent e484944bc0
commit dee40cffb6
2 changed files with 38 additions and 6 deletions

@@ -1,6 +1,7 @@
 #pragma once
 #include "json.hpp"
+#include "../../common/common.h"
 #include <string>
 #include <regex>
@@ -102,12 +103,8 @@ static std::string extract_content_during_parsing(const std::string& text, bool
 }
 }
-// Clean up extra whitespace
-content = std::regex_replace(content, std::regex(R"(\n\s*\n)"), "\n");
-// Trim leading/trailing whitespace
-content.erase(0, content.find_first_not_of(" \t\n\r"));
-content.erase(content.find_last_not_of(" \t\n\r") + 1);
+// Only trim leading/trailing whitespace, preserve internal formatting
+content = string_strip(content);
 } catch (const std::exception&) {
 // Return original text on regex errors