crawl4ai/tests
unclecode 9b571bb947 feat: HTTP strategy detects and saves file downloads (CSV, PDF, etc.)
The HTTP crawler strategy now checks Content-Type and Content-Disposition
headers to detect non-HTML file responses. When a file download is
detected, raw bytes are saved to disk and the path is returned via
downloaded_files. Text-based files (CSV, JSON, XML) also populate the
html field for backward compatibility. Binary files (PDF, images) set
html to an empty string; their content is available only via
downloaded_files.

Adds downloads_path to HTTPCrawlerConfig (defaults to ~/.crawl4ai/downloads/).
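
The behavior described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this commit: the helper names `classify_response` and `save_download`, the set of MIME types treated as text, and the result dictionary shape are all assumptions made for illustration. Only the `html` and `downloaded_files` field names and the `~/.crawl4ai/downloads/` default come from the commit message.

```python
from email.message import Message
from pathlib import Path
from urllib.parse import urlparse

# Assumed classification sets; the real strategy may use different lists.
HTML_TYPES = {"text/html", "application/xhtml+xml"}
TEXT_TYPES = {"text/csv", "application/json", "application/xml", "text/xml"}


def classify_response(url, content_type="", content_disposition=""):
    """Return (filename, is_text) for a file download, or None for HTML."""
    msg = Message()  # reuse the stdlib header parser for MIME parameters
    if content_type:
        msg["Content-Type"] = content_type
    if content_disposition:
        msg["Content-Disposition"] = content_disposition

    mime = msg.get_content_type() if content_type else ""
    is_attachment = msg.get_content_disposition() == "attachment"
    if not is_attachment and (mime == "" or mime in HTML_TYPES):
        return None  # treat as a normal page

    # Prefer the header filename, then the last URL path segment.
    name = msg.get_filename() or Path(urlparse(url).path).name
    name = Path(name).name if name else ""  # drop directory components
    if name in {"", ".", ".."}:
        name = "download"
    is_text = mime in TEXT_TYPES or mime.startswith("text/")
    return name, is_text


def save_download(url, body, content_type="", content_disposition="",
                  downloads_path=None):
    """Hypothetical helper mirroring the html / downloaded_files contract."""
    result = {"html": "", "downloaded_files": []}
    info = classify_response(url, content_type, content_disposition)
    if info is None:
        result["html"] = body.decode("utf-8", errors="replace")
        return result

    name, is_text = info
    target_dir = Path(downloads_path or Path.home() / ".crawl4ai" / "downloads")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(body)  # raw bytes always go to disk
    result["downloaded_files"].append(str(target))
    if is_text:
        # Text-based files also populate html for backward compatibility.
        result["html"] = body.decode("utf-8", errors="replace")
    return result
```

In this sketch a CSV served with `Content-Disposition: attachment` is written to disk and also echoed into `html`, while a PDF is written to disk and leaves `html` empty. Callers that previously read only `html` would therefore need to check `downloaded_files` for binary content.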
2026-03-16 14:03:43 +00:00