crawl4ai/tests at 9b571bb947c5ce0750dff783296afcb0efcd2bd0 - crawl4ai - Public git mirror

unclecode/crawl4ai

mirror of https://github.com/unclecode/crawl4ai.git synced 2026-06-10 07:48:50 +00:00


unclecode 9b571bb947 feat: HTTP strategy detects and saves file downloads (CSV, PDF, etc.)

The HTTP crawler strategy now checks the Content-Type and Content-Disposition
headers to detect non-HTML file responses. When a file download is
detected, the raw bytes are saved to disk and the path is returned via
downloaded_files. Text-based files (CSV, JSON, XML) also populate the
html field for backward compatibility. Binary files (PDF, images) set
html to an empty string; their content is only available via downloaded_files.

Adds downloads_path to HTTPCrawlerConfig (defaults to ~/.crawl4ai/downloads/).

2026-03-16 14:03:43 +00:00


fix: batch fix for 10 open issues (#1520, #1489, #1374, #1424, #1183, #1354, #880, #1031, #1251, #1758)

2026-03-07 09:47:38 +00:00

feat: HTTP strategy detects and saves file downloads (CSV, PDF, etc.)

2026-03-16 14:03:43 +00:00

async_assistant

test(async_assistant): add new tests for extract pipeline

2025-06-23 10:44:27 +08:00

feat: add avoid_ads/avoid_css resource filtering and pool release lifecycle

2026-02-25 07:12:28 +00:00

cache_validation

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

When using --deep-crawl, output all pages, not just the first one.

2025-12-10 10:12:01 -07:00

Add cancellation support for deep crawl strategies

2026-01-22 06:08:25 +00:00

#1167 Add PHP MIME types to ContentTypeFilter for better file handling

2025-06-09 11:49:33 +08:00

feat: add avoid_ads/avoid_css resource filtering and pool release lifecycle

2026-02-25 07:12:28 +00:00

Add token usage tracking to generate_schema / agenerate_schema

2026-02-18 06:44:17 +00:00

refactor(crawler): improve HTML handling and cleanup codebase

2025-02-07 21:56:27 +08:00

Release prep (#749)

2025-02-28 19:53:35 +08:00

feat(docker): update Docker deployment for v0.6.0

2025-04-22 22:35:25 +08:00

#1375: refactor(proxy) Deprecate 'proxy' parameter in BrowserConfig and enhance proxy string parsing

2025-08-28 17:21:49 +08:00

chore(profile-test): fix filename typo (test_crteate_profile.py → test_create_profile.py)

2025-06-12 14:38:32 +03:00

Fix anti-bot detection for large SPA block pages (403/503)

2026-02-20 10:07:59 +00:00

test: add comprehensive regression test suite (291 tests)

2026-03-08 03:20:52 +00:00

test(releases): Add test cases for release 0.7.0

2025-07-11 22:27:18 +08:00

feat: add avoid_ads/avoid_css resource filtering and pool release lifecycle

2026-02-25 07:12:28 +00:00

__init__.py

- Test all methods

2024-05-14 21:27:41 +08:00

check_dependencies.py

fix: Replace tf-playwright-stealth with playwright-stealth dependency

2026-01-17 11:06:44 +01:00

docker_example.py

docs: remove CRAWL4AI_API_TOKEN references and use correct endpoints in Docker example scripts (#1015)

2025-08-09 19:37:22 +05:30

test_arun_many.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_bug_batch_1622_1786_1796.py

fix: batch merge of community PRs (#1622, #1786, #1796, #1795, #1798, #1734, #1290, #1668)

2026-03-07 08:45:11 +00:00

test_cdp_changes.py

Improve CDP connection handling

2026-01-31 11:07:26 +00:00

test_cli_docs.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_cloud_bugs_batch.py

fix: anti-bot false positive on browser JSON, URLPatternFilter prefix match, PDF deserialization

2026-03-09 14:52:58 +00:00

test_config_defaults.py

Add set_defaults/get_defaults/reset_defaults to config classes

2026-01-31 11:44:07 +00:00

test_config_matching_only.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_config_selection.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_docker_api_with_llm_provider.py

feat(docker): add flexible LLM provider configuration

2025-08-05 14:09:54 +08:00

test_docker.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_issue_1213_bm25_dedup.py

fix: deduplicate BM25ContentFilter output (#1213) (#1824)

2026-03-12 14:23:34 +08:00

test_issue_1370_1818_1762_1509.py

fix: screenshot distortion, deep crawl timeout/arun_many, CLI encoding (#1370, #1818, #1509, #1762)

2026-03-12 18:17:13 +08:00

test_issue_1484_css_selector.py

fix: css_selector ignored in LXML scraping for raw:// URLs (#1484)

2026-03-12 20:00:33 +08:00

test_issue_1594_mcp_sse.py

fix: MCP SSE endpoint crash; mount via raw ASGI Route (#1594)

2026-03-12 11:22:48 +00:00

test_issue_1611_llm_provider.py

fix: /llm per-request provider override, Redis config from host/port/password (#1611, #1817)

2026-03-12 15:53:04 +08:00

test_issue_1748_screenshot_scroll_delay.py

Fix scroll_delay ignored in take_screenshot_scroller for full-page screenshots

2026-02-25 06:52:53 +03:00

test_issue_1750_screenshot_scan_full_page.py

fix: screenshot respects scan_full_page=False (#1750)

2026-03-12 12:04:45 +08:00

test_link_extractor.py

feat: cleanup unused code and enhance documentation for v0.7.1

2025-07-17 11:35:16 +02:00

test_llm_extraction_parallel_issue_1055.py

This commit resolves issue #1055 where LLM extraction was blocking async

2025-11-06 11:22:45 +01:00

test_llm_simple_url.py

refactor: Update LLMTableExtraction examples and tests

2025-08-15 18:47:31 +08:00

test_llmtxt.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_main.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_memory_macos.py

refactor(utils): move memory utilities to utils and update imports

2025-08-17 19:14:55 +08:00

test_merge_head_data_scoring.py

Fix total_score not calculated for links that fail head extraction

2026-02-16 20:41:30 +05:30

test_multi_config.py

fix: Correct URL matcher fallback behavior and improve memory monitoring

2025-08-03 16:50:54 +08:00

test_normalize_url.py

#1103 fix(url): enhance URL normalization to handle invalid schemes and trailing slashes

2025-05-19 13:51:16 +08:00

test_pr_1290_1668.py

fix: batch merge of community PRs (#1622, #1786, #1796, #1795, #1798, #1734, #1290, #1668)

2026-03-07 08:45:11 +00:00

test_pr_1435_redirected_status_code.py

Add tests, docs, and contributors for PRs #1463 and #1435

2026-02-06 09:30:19 +00:00

test_pr_1463_device_scale_factor.py

Add tests, docs, and contributors for PRs #1463 and #1435

2026-02-06 09:30:19 +00:00

test_pr_1795_1798_1734.py

fix: batch merge of community PRs (#1622, #1786, #1796, #1795, #1798, #1734, #1290, #1668)

2026-03-07 08:45:11 +00:00

test_prefetch_integration.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_prefetch_mode.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_prefetch_regression.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_preserve_https_for_internal_links.py

feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling. Ref #1410

2025-08-28 17:38:40 +08:00

test_pyopenssl_security_fix.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_pyopenssl_update.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_raw_html_browser.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_raw_html_edge_cases.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_raw_html_redirected_url.py

Fix redirected_url containing raw HTML content for raw: URLs

2026-01-20 00:45:15 +00:00

test_scraping_strategy.py

Release prep (#749)

2025-02-28 19:53:35 +08:00

test_source_sibling_selector.py

Add source (sibling selector) support to JSON extraction strategies

2026-02-17 09:04:40 +00:00

test_table_gfm_compliance.py

fix: add leading/trailing pipes to GFM tables (pad_tables=False)

2026-02-17 21:14:36 -05:00

test_virtual_scroll.py

feat: Add virtual scroll support for modern web scraping

2025-06-29 20:41:37 +08:00

test_web_crawler.py

Update all documentation to import extraction strategies directly from crawl4ai.

2025-06-10 18:08:27 +08:00

test_webhook_feature.sh

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00

WEBHOOK_TEST_README.md

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00