Commit Graph

  • 3f481e9e5c fix: screenshot distortion, deep crawl timeout/arun_many, CLI encoding (#1370, #1818, #1509, #1762) hafezparast 2026-03-12 18:17:13 +08:00
  • 480d938f67 fix: /llm per-request provider override, Redis config from host/port/password (#1611, #1817) hafezparast 2026-03-12 15:53:04 +08:00
  • d907e167a5 Merge pull request #1823 from hafezparast/fix/maysam-screenshot-scan-full-page-1750 Nasrin 2026-03-12 07:39:52 +01:00
  • 57b0d09934 fix: deduplicate BM25ContentFilter output (#1213) (#1824) Maysam Hafezparast 2026-03-12 14:23:34 +08:00
  • 35034f551b docs: add hafezparast to CONTRIBUTORS.md unclecode 2026-03-12 05:43:48 +00:00
  • 6efbffe345 fix: screenshot respects scan_full_page=False (#1750) hafezparast 2026-03-12 12:04:45 +08:00
  • 11b45760da fix: anti-bot false positive on browser JSON, URLPatternFilter prefix match, PDF deserialization unclecode 2026-03-09 14:52:58 +00:00
  • 55956a874d fix: 3 bug fixes (#1487, #1512, #1666) + close 3 already-fixed issues unclecode 2026-03-08 08:44:04 +00:00
  • a7e6da0b19 Merge fix/batch-easy-issues-10: 10 bug fixes + regression test suite unclecode 2026-03-08 03:20:56 +00:00
  • d788c28315 test: add comprehensive regression test suite (291 tests) unclecode 2026-03-08 03:20:52 +00:00
  • 3a75dd3f4c fix: batch fix for 10 open issues (#1520, #1489, #1374, #1424, #1183, #1354, #880, #1031, #1251, #1758) fix/batch-easy-issues-10 unclecode 2026-03-07 09:47:38 +00:00
  • 0c9e3c427e Update CONTRIBUTORS and PR-TODOLIST for batch 5 (15 PRs resolved) unclecode 2026-03-07 08:49:32 +00:00
  • 7c0cc3ed88 fix: batch merge of community PRs (#1622, #1786, #1796, #1795, #1798, #1734, #1290, #1668) unclecode 2026-03-07 08:45:11 +00:00
  • 11ed854155 Update CONTRIBUTORS for PR #462 unclecode 2026-03-07 07:06:49 +00:00
  • 697c2b2a58 fix: add newline before opening code fence in html2text (#462) unclecode 2026-03-07 07:06:41 +00:00
  • 3704758746 Update CONTRIBUTORS for PR #1770 unclecode 2026-03-07 07:01:54 +00:00
  • 04e83aa3c7 docs: modernize deprecated API usage across shipped docs (#1770) unclecode 2026-03-07 07:01:06 +00:00
  • 31d0de23df Update PR-TODOLIST for batch 4 merge (10 PRs) and refresh open PR list unclecode 2026-03-07 06:50:26 +00:00
  • db98aefb03 Update CONTRIBUTORS for PRs #1494, #1715, #1716, #1308, #1789, #1793, #1792, #1794, #1784, #1730 unclecode 2026-03-07 06:47:03 +00:00
  • 761664d29e fix: add TTL expiry for Redis task data to prevent memory growth (#1730) integrate-verified-prs unclecode 2026-03-07 06:17:58 +00:00
  • e47e810aca fix: handle UnicodeEncodeError in URL seeder and strip zero-width chars (#1784) unclecode 2026-03-07 06:16:41 +00:00
  • 1029815fd4 fix: add Windows support for crawler monitor keyboard input (#1794) unclecode 2026-03-07 06:16:12 +00:00
  • d229beeaf8 fix: add wait_for_images option to screenshot endpoint (#1792) unclecode 2026-03-07 06:15:54 +00:00
  • c73aa271ac fix: make link_preview_timeout configurable in AdaptiveConfig (#1793) unclecode 2026-03-07 06:15:44 +00:00
  • 91330ef179 fix: add explicit utf-8 encoding to CLI file output (#1789) unclecode 2026-03-07 06:15:32 +00:00
  • d6a8f57fdd docs: fix css_selector type from list to string in examples (#1308) unclecode 2026-03-07 06:15:14 +00:00
  • e6c2a65625 docs: fix return type annotations to use RunManyReturn (#1716) unclecode 2026-03-07 06:14:49 +00:00
  • 5601861555 docs: add missing CacheMode import in quickstart example (#1715) unclecode 2026-03-07 06:13:32 +00:00
  • 72cc17c113 docs: fix docstring param name crawler_config -> config (#1494) unclecode 2026-03-07 06:13:18 +00:00
  • 814bc4df47 Update CONTRIBUTORS for PRs #1782, #1788, #1783, #1179 unclecode 2026-03-07 04:15:49 +00:00
  • 93f2f03fab Merge PR #1783: fix: strip port from URL domain in is_external_url comparison unclecode 2026-03-07 04:15:35 +00:00
  • 5f65d2d1fd Merge PR #1788: fix: guard against None LLM content and propagate finish_reason unclecode 2026-03-07 04:15:22 +00:00
  • 122be00076 Merge PR #1782: fix: preserve class and id attributes in cleaned_html unclecode 2026-03-07 04:14:21 +00:00
  • 4bde952ade Update CONTRIBUTORS for PRs #1787, #1790, #1804 unclecode 2026-03-07 04:00:36 +00:00
  • ff2ea3429a Merge PR #1804: feat: add score_threshold support to BestFirstCrawlingStrategy unclecode 2026-03-07 03:59:28 +00:00
  • 9ec2969d99 Merge PR #1790: fix: handle nested brackets and parentheses in LINK_PATTERN regex unclecode 2026-03-07 03:59:17 +00:00
  • bd0f6e1bd5 fix: strip markdown fences in force_json_response path (LLM extraction) unclecode 2026-03-07 03:59:00 +00:00
  • d4588904b3 Update PR-TODOLIST and CONTRIBUTORS for merged PRs #1805, #1763, #1803 unclecode 2026-03-07 03:40:36 +00:00
  • b008671345 Merge PR #1803: fix from_serializable_dict to ignore plain data dicts with "type" key unclecode 2026-03-07 03:21:33 +00:00
  • fdb3f8fd98 Merge PR #1763: fix: return in finally block silently suppressing exceptions unclecode 2026-03-07 03:21:22 +00:00
  • 8a677a9db1 Merge PR #1805: fix: prevent AdaptiveCrawler from crawling external domains unclecode 2026-03-07 03:21:11 +00:00
  • 78434eadac fix: prevent AdaptiveCrawler from crawling external domains nightcityblade 2026-03-07 10:57:42 +08:00
  • 379591047d fix: add score_threshold support to BestFirstCrawlingStrategy nightcityblade 2026-03-07 10:55:09 +08:00
  • 71a6526459 fix(docker): narrow from_serializable_dict to ignore plain data dicts with "type" key fix/deserialize-schema-type-false-positive Soham Kukreti 2026-03-06 13:10:35 +05:30
  • d8fd895314 fix(docker): make deep-crawl streaming mirror Python library behaviour (#1779) fix/deep-crawl-stream-docker Soham Kukreti 2026-03-03 12:14:56 +05:30
  • 0273b27821 Fix MediaItem crash on non-numeric width values (e.g. "100%", "auto") ntohidi 2026-03-02 09:51:59 +08:00
  • 0d151eba82 Merge branch 'develop' of https://github.com/unclecode/crawl4ai into develop ntohidi 2026-03-02 09:42:28 +08:00
  • 669b466667 fix: handle nested brackets and parentheses in LINK_PATTERN regex Br1an67 2026-03-02 01:24:02 +08:00
  • b138c949b5 fix: guard against None LLM content and propagate finish_reason Br1an67 2026-03-02 01:18:47 +08:00
  • 20488620cd fix: strip port from URL domain in is_external_url comparison Br1an67 2026-03-02 00:48:50 +08:00
  • 500d047654 fix: preserve class and id attributes in cleaned_html Br1an67 2026-03-02 00:43:23 +08:00
  • 0a45c1056d feat: add separate query_llm_config for adaptive crawler query expansion (#1682) unclecode 2026-02-25 12:26:39 +00:00
  • e05cbfcefe Add unit test for adaptive crawler filters to exclude external links fix/issue-1776-adaptive-external-filter Ahmed-tawfik94 2026-02-26 12:59:09 +03:00
  • a4cc0a9f04 feat: add separate query_llm_config for adaptive crawler query expansion (#1682) unclecode 2026-02-25 12:26:39 +00:00
  • 8f2c2e1f90 docs: add mzyfree to contributors for PR #1689 unclecode 2026-02-25 07:29:28 +00:00
  • c0912f7234 feat: add avoid_ads/avoid_css resource filtering and pool release lifecycle unclecode 2026-02-25 05:56:29 +00:00
  • 8d35d17d01 Merge pull request #1722 from YuriNachos/fix/issue-1652-md-docstring Nasrin 2026-02-25 06:00:09 +01:00
  • d419199a4c Merge pull request #1775 from unclecode/fix/issue-1748-screenshot-scroll-delay Nasrin 2026-02-25 05:54:24 +01:00
  • 9cfeb4626d Document scroll_delay parameter for full-page screenshot crawling fix/issue-1748-screenshot-scroll-delay Ahmed-tawfik94 2026-02-25 06:52:59 +03:00
  • cd81e3cd19 Fix scroll_delay ignored in take_screenshot_scroller for full-page screenshots Ahmed-tawfik94 2026-02-25 06:52:53 +03:00
  • 4f9cc0810b Merge pull request #1764 from PatD42/fix/table-gfm-pipes Nasrin 2026-02-25 03:32:54 +01:00
  • c4cdc02e27 Merge pull request #1761 from AtharvaJaiswal005/fix/total-score-missing-for-failed-head-extraction-1749 Nasrin 2026-02-25 02:25:22 +01:00
  • cbd36b74b2 Add stats dashboard page for LP summit unclecode 2026-02-24 12:58:34 +00:00
  • 5b815c278e Fix redirect URL mismatch in head data merging Atharva Jaiswal 2026-02-24 16:02:16 +05:30
  • 1a9f68d825 Fix cascading context crash from duplicate add_init_script (#1768) unclecode 2026-02-24 09:42:45 +00:00
  • 731388c65c Merge pull request #1760 from nitesh-77/fix/async-chardet-block Nasrin 2026-02-24 09:42:21 +01:00
  • 57be8b8732 Merge pull request #1759 from nitesh-77/fix/filterchain-tuple-attribute-error Nasrin 2026-02-24 08:48:46 +01:00
  • 7435a1654c Merge pull request #1771 from hafezparast/claude/check-fork-sync-S9SSz Nasrin 2026-02-23 06:28:10 +01:00
  • 0e9b677870 Fix MCP bridge httpx timeout: add configurable timeout parameter Claude 2026-02-23 02:10:04 +00:00
  • 254ef0510b Fix anti-bot detection for large SPA block pages (403/503) unclecode 2026-02-20 10:07:59 +00:00
  • aa7b05072d Change security contact email and update date UncleCode 2026-02-20 04:31:12 +01:00
  • 7226f8face Extend try/finally to cover all post-get_page setup code (#1640) unclecode 2026-02-20 02:30:49 +00:00
  • c854e2b899 Fix simulate_user destroying page content via ArrowDown keypress unclecode 2026-02-19 15:03:28 +00:00
  • 8df3541ac4 Skip anti-bot checks and fallback for raw: URLs unclecode 2026-02-19 14:05:56 +00:00
  • 94a77eea30 Move test_repro_1640.py to tests/browser/ unclecode 2026-02-19 06:33:46 +00:00
  • 2060c7e965 Fix browser recycling deadlock under sustained concurrent load (#1640) unclecode 2026-02-19 06:27:25 +00:00
  • 13048a106b Add Tier 3 structural integrity check to anti-bot detector unclecode 2026-02-18 06:59:22 +00:00
  • c9cb0160cf Add token usage tracking to generate_schema / agenerate_schema unclecode 2026-02-18 06:44:17 +00:00
  • 8576331d4e Add Shadow DOM flattening and reorder js_code execution pipeline unclecode 2026-02-18 06:43:00 +00:00
  • c70ab31abd fix: add leading/trailing pipes to GFM tables (pad_tables=False) Patrick 2026-02-17 21:14:36 -05:00
  • 6ea0e38325 Re-raise exceptions in MemoryAdaptiveDispatcher.run_urls after logging Otman404 2026-02-18 00:34:50 +00:00
  • 4fb02f8b50 Warn LLM against hashed/generated CSS class names in schema prompts unclecode 2026-02-17 12:02:58 +00:00
  • d267c650cb Add source (sibling selector) support to JSON extraction strategies unclecode 2026-02-17 09:04:40 +00:00
  • 87f57f1675 Fix return in finally block silently suppressing exceptions Otman404 2026-02-17 02:26:09 +00:00
  • 094242d4a7 Fix total_score not calculated for links that fail head extraction Atharva Jaiswal 2026-02-16 20:41:30 +05:30
  • 4298e26525 fix: run blocking chardet.detect in thread executor #1751 nitesh-77 2026-02-16 06:33:09 +05:30
  • cfa73084ea fix: resolve AttributeError in FilterChain.add_filter by handling tuple immutability nitesh-77 2026-02-16 05:41:22 +05:30
  • ccd24aa824 Fix fallback fetch: run when all proxies crash, skip re-check, never return None unclecode 2026-02-15 10:55:00 +00:00
  • 45d8e1450f Fix proxy escalation: don't re-raise on first proxy exception when chain has alternatives unclecode 2026-02-15 09:55:55 +00:00
  • d028a889d0 Make proxy_config a property so direct assignment also normalizes unclecode 2026-02-14 13:16:36 +00:00
  • 879553955c Add ProxyConfig.DIRECT sentinel for direct-then-proxy escalation unclecode 2026-02-14 10:25:07 +00:00
  • 875207287e Unify proxy_config to accept list, add crawl_stats tracking unclecode 2026-02-14 07:53:46 +00:00
  • 72b546c48d Add anti-bot detection, retry, and fallback system unclecode 2026-02-14 05:24:07 +00:00
  • fdd989785f Sync sec-ch-ua with User-Agent and keep WebGL alive in stealth mode checkpoint-pre-antibot-fallback unclecode 2026-02-13 04:10:47 +00:00
  • 112f44a97d Fix proxy auth for persistent browser contexts unclecode 2026-02-12 11:19:29 +00:00
  • 1a24ac785e Refactor from_kwargs to respect set_defaults and use __init__ defaults unclecode 2026-02-11 13:35:36 +00:00
  • 3fc7730aaf Add remove_consent_popups flag and fix from_kwargs dict deserialization unclecode 2026-02-11 12:46:47 +00:00
  • 44b8afb6dc Improve schema generation prompt for sibling-based layouts unclecode 2026-02-10 08:34:22 +00:00
  • fbc52813a4 Add tests, docs, and contributors for PRs #1463 and #1435 unclecode 2026-02-06 09:30:19 +00:00
  • 37a49c5315 Merge PR #1435: Add redirected_status_code to CrawlResult unclecode 2026-02-06 09:23:54 +00:00