crawl4ai

mirror of https://github.com/unclecode/crawl4ai.git synced 2026-06-10 07:48:50 +00:00

Files

unclecode 9b571bb947 feat: HTTP strategy detects and saves file downloads (CSV, PDF, etc.)

The HTTP crawler strategy now checks Content-Type and Content-Disposition
headers to detect non-HTML file responses. When a file download is
detected, raw bytes are saved to disk and the path is returned via
downloaded_files. Text-based files (CSV, JSON, XML) also populate the
html field for backward compatibility. Binary files (PDF, images) set
html to an empty string; their content is available only via downloaded_files.

Adds downloads_path to HTTPCrawlerConfig (defaults to ~/.crawl4ai/downloads/).
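
The behavior described above can be illustrated with a minimal, standalone sketch. It is not the code from this commit: `handle_response`, `SketchResult`, the MIME-type sets, and the `download.bin` fallback name are all hypothetical names chosen for illustration, and only the header checks, the `html` / `downloaded_files` fields, and the default download directory come from the commit message.

```python
from dataclasses import dataclass, field
from email.message import Message
from pathlib import Path

# Assumed classification sets; the real strategy may use different lists.
HTML_TYPES = {"text/html", "application/xhtml+xml"}
TEXT_FILE_TYPES = {"text/csv", "application/json", "application/xml", "text/xml"}

# Mirrors the default named in the commit message (~/.crawl4ai/downloads/).
DEFAULT_DOWNLOADS_PATH = Path.home() / ".crawl4ai" / "downloads"


@dataclass
class SketchResult:
    """Hypothetical stand-in for a crawl result with the two fields discussed."""
    html: str = ""
    downloaded_files: list[str] = field(default_factory=list)


def handle_response(headers, body, downloads_path=DEFAULT_DOWNLOADS_PATH,
                    fallback_name="download.bin"):
    """Classify a response by its headers and save it if it is a file download."""
    # Reuse the stdlib header parser for Content-Type / Content-Disposition.
    msg = Message()
    for name in ("Content-Type", "Content-Disposition"):
        value = next((v for k, v in headers.items() if k.lower() == name.lower()), None)
        if value:
            msg[name] = value

    # Assumption: a response with no Content-Type is treated as HTML.
    content_type = msg.get_content_type() if "Content-Type" in msg else "text/html"
    is_attachment = msg.get_content_disposition() == "attachment"
    charset = msg.get_content_charset() or "utf-8"

    # Ordinary HTML page: nothing is written to disk.
    if content_type in HTML_TYPES and not is_attachment:
        return SketchResult(html=body.decode(charset, errors="replace"))

    # File download: keep only the base name so a hostile filename
    # cannot escape the download directory.
    filename = Path(msg.get_filename() or fallback_name).name or fallback_name
    downloads_path.mkdir(parents=True, exist_ok=True)
    target = downloads_path / filename
    target.write_bytes(body)

    # Text-based files also populate html; binary files leave it empty.
    html = body.decode(charset, errors="replace") if content_type in TEXT_FILE_TYPES else ""
    return SketchResult(html=html, downloaded_files=[str(target)])
```

Called with a `text/csv` response, this sketch returns the CSV text in `html` and the saved path in `downloaded_files`; called with `application/pdf`, `html` stays empty and only the saved path is returned.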

2026-03-16 14:03:43 +00:00

PR-TODOLIST.md