crawl4ai/tests at 697c2b2a58cb2f03fd6884cd431ca4b5c6a5efac - crawl4ai - Public git mirror

unclecode/crawl4ai

mirror of https://github.com/unclecode/crawl4ai.git synced 2026-06-10 15:58:15 +00:00

unclecode a4cc0a9f04 feat: add separate query_llm_config for adaptive crawler query expansion (#1682)

The embedding strategy uses two incompatible API call types: embedding
calls (text-to-vector) and query expansion (chat completion). Previously
both used a single embedding_llm_config, so setting an embedding model
broke query expansion and vice versa.

Add query_llm_config to AdaptiveConfig and EmbeddingStrategy so users
can specify separate models for each call type. Fallback chain preserves
backward compatibility: query_llm_config -> llm_config -> hardcoded defaults.
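The fallback chain described above can be sketched as follows. This is a minimal illustration, not the crawl4ai implementation: `SketchLLMConfig`, `DEFAULT_QUERY_CONFIG`, and `resolve_query_config` are hypothetical names, and the default provider string is a placeholder, since the commit message only says "hardcoded defaults".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchLLMConfig:
    """Hypothetical stand-in for an LLM config object (not the crawl4ai class)."""
    provider: str
    base_url: Optional[str] = None


# Placeholder default; the real hardcoded defaults are not given in the commit message.
DEFAULT_QUERY_CONFIG = SketchLLMConfig(provider="example/default-chat-model")


def resolve_query_config(
    query_llm_config: Optional[SketchLLMConfig] = None,
    llm_config: Optional[SketchLLMConfig] = None,
) -> SketchLLMConfig:
    """Pick the config for query expansion: query_llm_config -> llm_config -> default."""
    if query_llm_config is not None:
        return query_llm_config
    if llm_config is not None:
        return llm_config
    return DEFAULT_QUERY_CONFIG


# Usage: a dedicated chat model for query expansion wins over the shared config.
shared = SketchLLMConfig(provider="example/shared-model")
chat = SketchLLMConfig(provider="example/chat-model")
chosen = resolve_query_config(query_llm_config=chat, llm_config=shared)
```

Callers that never set a separate query config keep their previous behavior, because the second and third steps of the chain are unchanged.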

Also fixes base_url and backoff params not being passed to
perform_completion_with_backoff in query expansion, and simplifies
_embedding_llm_config_dict to use LLMConfig.to_dict() (which includes
the 3 backoff fields the manual extraction was missing).
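To illustrate why deriving the dict from the config object is less fragile than copying fields by hand, here is a small sketch. All names are assumptions: `SketchConfig` is not the real `LLMConfig`, and the three backoff field names are invented for the example, since the commit message does not list them.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchConfig:
    """Hypothetical config; the backoff field names are illustrative only."""
    provider: str = "example/model"
    base_url: Optional[str] = None
    backoff_base_delay: int = 2
    backoff_max_attempts: int = 3
    backoff_exponential_factor: int = 2

    def to_dict(self) -> Dict[str, Any]:
        # Every declared field is included, so newly added fields are never forgotten.
        return asdict(self)


def manual_extract(cfg: SketchConfig) -> Dict[str, Any]:
    """Hand-written extraction that was never updated for the backoff fields."""
    return {"provider": cfg.provider, "base_url": cfg.base_url}


def missing_keys(cfg: SketchConfig) -> List[str]:
    """Fields present in to_dict() but dropped by the manual extraction."""
    return sorted(set(cfg.to_dict()) - set(manual_extract(cfg)))


dropped = missing_keys(SketchConfig())
```

In this sketch `dropped` holds exactly the three backoff field names, which mirrors the gap the commit describes.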

Inspired by PR #1683 from @sthakrar — thank you for identifying the
issue and proposing the initial approach.

2026-02-25 12:26:39 +00:00

feat: add separate query_llm_config for adaptive crawler query expansion (#1682)

2026-02-25 12:26:39 +00:00

Fix browser recycling under high concurrency — version-based approach

2026-02-05 07:48:12 +00:00

async_assistant

test(async_assistant): add new tests for extract pipeline

2025-06-23 10:44:27 +08:00

feat: add avoid_ads/avoid_css resource filtering and pool release lifecycle

2026-02-25 07:12:28 +00:00

cache_validation

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

When using --deep-crawl, output all pages, not just the first one.

2025-12-10 10:12:01 -07:00

Add cancellation support for deep crawl strategies

2026-01-22 06:08:25 +00:00

#1167 Add PHP MIME types to ContentTypeFilter for better file handling

2025-06-09 11:49:33 +08:00

feat: add avoid_ads/avoid_css resource filtering and pool release lifecycle

2026-02-25 07:12:28 +00:00

Add token usage tracking to generate_schema / agenerate_schema

2026-02-18 06:44:17 +00:00

refactor(crawler): improve HTML handling and cleanup codebase

2025-02-07 21:56:27 +08:00

Release prep (#749)

2025-02-28 19:53:35 +08:00

feat(docker): update Docker deployment for v0.6.0

2025-04-22 22:35:25 +08:00

#1375: refactor(proxy) Deprecate 'proxy' parameter in BrowserConfig and enhance proxy string parsing

2025-08-28 17:21:49 +08:00

chore(profile-test): fix filename typo (test_crteate_profile.py → test_create_profile.py)

2025-06-12 14:38:32 +03:00

Fix anti-bot detection for large SPA block pages (403/503)

2026-02-20 10:07:59 +00:00

test(releases): Add test cases for release 0.7.0

2025-07-11 22:27:18 +08:00

feat: add avoid_ads/avoid_css resource filtering and pool release lifecycle

2026-02-25 07:12:28 +00:00

__init__.py

- Test all methods

2024-05-14 21:27:41 +08:00

check_dependencies.py

fix: Replace tf-playwright-stealth with playwright-stealth dependency

2026-01-17 11:06:44 +01:00

docker_example.py

docs: remove CRAWL4AI_API_TOKEN references and use correct endpoints in Docker example scripts (#1015)

2025-08-09 19:37:22 +05:30

test_arun_many.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_cdp_changes.py

Improve CDP connection handling

2026-01-31 11:07:26 +00:00

test_cli_docs.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_config_defaults.py

Add set_defaults/get_defaults/reset_defaults to config classes

2026-01-31 11:44:07 +00:00

test_config_matching_only.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_config_selection.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_docker_api_with_llm_provider.py

feat(docker): add flexible LLM provider configuration

2025-08-05 14:09:54 +08:00

test_docker.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_issue_1748_screenshot_scroll_delay.py

Fix scroll_delay ignored in take_screenshot_scroller for full-page screenshots

2026-02-25 06:52:53 +03:00

test_link_extractor.py

feat: cleanup unused code and enhance documentation for v0.7.1

2025-07-17 11:35:16 +02:00

test_llm_extraction_parallel_issue_1055.py

This commit resolves issue #1055 where LLM extraction was blocking async

2025-11-06 11:22:45 +01:00

test_llm_simple_url.py

refactor: Update LLMTableExtraction examples and tests

2025-08-15 18:47:31 +08:00

test_llmtxt.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_main.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_memory_macos.py

refactor(utils): move memory utilities to utils and update imports

2025-08-17 19:14:55 +08:00

test_merge_head_data_scoring.py

Fix total_score not calculated for links that fail head extraction

2026-02-16 20:41:30 +05:30

test_multi_config.py

fix: Correct URL matcher fallback behavior and improve memory monitoring

2025-08-03 16:50:54 +08:00

test_normalize_url.py

#1103 fix(url): enhance URL normalization to handle invalid schemes and trailing slashes

2025-05-19 13:51:16 +08:00

test_pr_1435_redirected_status_code.py

Add tests, docs, and contributors for PRs #1463 and #1435

2026-02-06 09:30:19 +00:00

test_pr_1463_device_scale_factor.py

Add tests, docs, and contributors for PRs #1463 and #1435

2026-02-06 09:30:19 +00:00

test_prefetch_integration.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_prefetch_mode.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_prefetch_regression.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_preserve_https_for_internal_links.py

feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling. Ref #1410

2025-08-28 17:38:40 +08:00

test_pyopenssl_security_fix.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_pyopenssl_update.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_raw_html_browser.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_raw_html_edge_cases.py

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)

2026-01-17 14:19:15 +01:00

test_raw_html_redirected_url.py

Fix redirected_url containing raw HTML content for raw: URLs

2026-01-20 00:45:15 +00:00

test_scraping_strategy.py

Release prep (#749)

2025-02-28 19:53:35 +08:00

test_source_sibling_selector.py

Add source (sibling selector) support to JSON extraction strategies

2026-02-17 09:04:40 +00:00

test_table_gfm_compliance.py

fix: add leading/trailing pipes to GFM tables (pad_tables=False)

2026-02-17 21:14:36 -05:00

test_virtual_scroll.py

feat: Add virtual scroll support for modern web scraping

2025-06-29 20:41:37 +08:00

test_web_crawler.py

Update all documentation to import extraction strategies directly from crawl4ai.

2025-06-10 18:08:27 +08:00

test_webhook_feature.sh

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00

WEBHOOK_TEST_README.md

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00