crawl4ai/tests/async/test_content_extraction.py at pdf_processing

mirror of https://github.com/unclecode/crawl4ai.git synced 2026-06-10 15:58:15 +00:00

Files

Soham Kukreti 18ad3ef159 fix: Implement base tag support in link extraction (#1147 )

- Extract base href from <head><base> tag using XPath in _process_element method
- Use base URL as the primary URL for link normalization when present
- Add error handling with logging for malformed or problematic base tags
- Maintain backward compatibility when no base tag is present
- Add test to verify the functionality of the base tag extraction.

2025-08-08 20:11:57 +05:30

3.6 KiB

Raw Permalink Blame History

View Raw

3.6 KiB Raw Permalink Blame History

3.6 KiB

Raw Permalink Blame History