Commit Graph

  • 0aacafed0a Merge PR #1463: Add configurable device_scale_factor for screenshot quality unclecode 2026-02-06 09:19:42 +00:00
  • 719e83e105 Update PR todolist — refresh open PRs, add 6 new, classify unclecode 2026-02-06 09:06:13 +00:00
  • 3401dd1620 Fix browser recycling under high concurrency — version-based approach unclecode 2026-02-05 07:48:12 +00:00
  • c046918bb4 Add memory-saving mode, browser recycling, and CDP leak fixes unclecode 2026-02-04 02:00:13 +00:00
  • 4e56f3e00d Add contributing guide and update mkdocs navigation for community resources ntohidi 2026-02-03 09:46:54 +01:00
  • 0bfcf080dd Add contributors from PRs #1133, #729 unclecode 2026-02-02 07:56:37 +00:00
  • b962699c0d Add contributors from PRs #973, #1073, #931 unclecode 2026-02-02 07:14:12 +00:00
  • ffd3face6b Remove duplicate PROMPT_EXTRACT_BLOCKS definition in prompts.py unclecode 2026-02-02 07:04:35 +00:00
  • c790231aba Fix browser context memory leak — signature shrink + LRU eviction (#943) unclecode 2026-02-01 14:23:04 +00:00
  • bb523b6c6c Merge PRs #1077, #1281 — bs4 deprecation and proxy auth fix unclecode 2026-02-01 07:06:39 +00:00
  • 980dc73156 Merge PR #1281: Fix proxy auth ERR_INVALID_AUTH_CREDENTIALS unclecode 2026-02-01 07:05:00 +00:00
  • 98aea2fb46 Merge PR #1077: Fix bs4 deprecation warning (text -> string) unclecode 2026-02-01 07:04:31 +00:00
  • a56dd07559 Merge PRs #1667, #1296, #1364 — CLI deep-crawl, env var, script tags unclecode 2026-02-01 06:53:53 +00:00
  • 312cef8633 Fix PR #1296: restore .crawl4ai subfolder in VersionManager path unclecode 2026-02-01 06:22:16 +00:00
  • a244e4d781 Merge PR #1364: Fix script tag removal losing adjacent text in cleaned_html unclecode 2026-02-01 06:22:10 +00:00
  • 0f83b05a2d Merge PR #1296: Fix VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var unclecode 2026-02-01 06:21:40 +00:00
  • 37995d4d3f Merge PR #1667: Fix deep-crawl CLI outputting only the first page unclecode 2026-02-01 06:21:25 +00:00
  • dc4ae73221 Merge PRs #1714, #1721, #1719, #1717 and fix base tag pipeline unclecode 2026-02-01 05:41:33 +00:00
  • 5cd0648d71 Merge PR #1717: Allow local embeddings by removing OpenAI fallback unclecode 2026-02-01 05:02:18 +00:00
  • 9172581416 Merge PR #1719: Include GoogleSearchCrawler script.js in package distribution unclecode 2026-02-01 05:02:05 +00:00
  • c39e796a18 Merge PR #1721: Fix <base> tag ignored in html2text relative link resolution unclecode 2026-02-01 05:01:52 +00:00
  • ccab926f1f Merge PR #1714: Replace tf-playwright-stealth with playwright-stealth unclecode 2026-02-01 05:01:31 +00:00
  • 43738c9ed2 Fix can_process_url() to receive normalized URL in deep crawl strategies unclecode 2026-02-01 03:45:52 +00:00
  • ee717dc019 Add contributor for PR #1746 and fix test pytest marker unclecode 2026-02-01 03:10:32 +00:00
  • 7c5933e2e7 Merge PR #1746: Fix sitemap-only URL seeding avoiding Common Crawl calls unclecode 2026-02-01 02:57:06 +00:00
  • 5be0d2d75e Add contributor and docs for force_viewport_screenshot feature unclecode 2026-02-01 01:10:20 +00:00
  • e19492a82e Merge PR #1694: feat: add force viewport screenshot unclecode 2026-02-01 01:05:52 +00:00
  • 55a2cc8181 Document set_defaults/get_defaults/reset_defaults in config guides unclecode 2026-01-31 11:46:53 +00:00
  • 13a414802b Add set_defaults/get_defaults/reset_defaults to config classes unclecode 2026-01-31 11:44:07 +00:00
  • 19b9140c68 Improve CDP connection handling unclecode 2026-01-31 11:07:26 +00:00
  • 694ba44a04 Added fix for URL Seeder forcing Common Crawl index in case of a "sitemap" ChiragBellara 2026-01-30 09:33:30 -08:00
  • 0104db6de2 Fix critical RCE via deserialization and eval() in /crawl endpoint unclecode 2026-01-30 08:46:01 +00:00
  • ad5ebf166a Merge pull request #1718 from YuriNachos/fix/issue-1704-default-logger Nasrin 2026-01-29 13:03:11 +01:00
  • 034bddf557 Merge pull request #1733 from jose-blockchain/fix/1686-docker-health-version Nasrin 2026-01-29 12:55:24 +01:00
  • 911bbce8b1 Fix agenerate_schema() JSON parsing for Anthropic models unclecode 2026-01-29 11:38:53 +00:00
  • 0a17fe8f19 Improve page tracking with global CDP endpoint-based tracking unclecode 2026-01-28 09:30:20 +00:00
  • 9b52c1490b Fix page reuse race condition when create_isolated_context=False unclecode 2026-01-28 01:43:21 +00:00
  • 656b938ef8 Merge branch 'main' into develop unclecode 2026-01-27 01:58:45 +00:00
  • 55de32d925 Add CycloneDX SBOM and generation script unclecode 2026-01-27 01:45:42 +00:00
  • 21e6c418be Fix: Keep storage_state.json in profile shrink unclecode 2026-01-26 13:06:31 +00:00
  • 18d2ef4a24 Fix: Disable cookie encryption for portable profiles unclecode 2026-01-26 12:57:17 +00:00
  • ef226f5787 Add: Cloud CLI module for profile management unclecode 2026-01-25 09:35:48 +00:00
  • 94e19a4c72 Enhance browser profile management capabilities unclecode 2026-01-24 08:02:52 +00:00
  • 79ebfce913 Refactor HTML block delimiter to use config constant unclecode 2026-01-24 04:19:50 +00:00
  • 2d5e5306c5 Add support for parallel URL processing in extraction utilities unclecode 2026-01-24 04:13:39 +00:00
  • b0b3ca1222 Refactor extraction strategy internals and improve error handling unclecode 2026-01-24 03:08:41 +00:00
  • 777d0878f2 Update security contact emails in SECURITY.md ntohidi 2026-01-22 09:53:24 +01:00
  • fbfbc6995c Fix deep crawl cancellation example to use DFS for precise control unclecode 2026-01-22 06:25:34 +00:00
  • 1e2b7fe7e6 Add documentation and example for deep crawl cancellation unclecode 2026-01-22 06:10:54 +00:00
  • f6897d1429 Add cancellation support for deep crawl strategies unclecode 2026-01-22 06:08:25 +00:00
  • c9a271a3ff Merge branch 'fix/1686-docker-health-version' of https://github.com/jose-blockchain/crawl4ai into fix/1686-docker-health-version José 2026-01-20 23:45:13 +01:00
  • 9123f65140 Fix #1686: Use dynamic version from crawl4ai package in health endpoint José 2026-01-20 23:40:38 +01:00
  • fe1c1cb0bc Fix #1686: Use dynamic version from crawl4ai package in health endpoint José 2026-01-20 23:40:38 +01:00
  • 418bfcfd3b Fix redirected_url containing raw HTML content for raw: URLs unclecode 2026-01-20 00:31:12 +00:00
  • 857b1ed23b Merge branch 'main' into develop ntohidi 2026-01-19 13:25:56 +01:00
  • f6f7f1b551 Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712) Nasrin 2026-01-17 14:19:15 +01:00
  • 2016d669a9 fix: Respect <base> tag for relative link resolution in html2text Yurii Chukhlib 2026-01-17 11:17:28 +01:00
  • 232f00752c fix: Initialize default logger in AsyncPlaywrightCrawlerStrategy Yurii Chukhlib 2026-01-17 11:14:42 +01:00
  • ef8f0c6096 fix: Include GoogleSearchCrawler script.js in package distribution Yurii Chukhlib 2026-01-17 11:15:30 +01:00
  • 37ff85f4b1 fix: Add docstring to MCP tool 'md' endpoint Yurii Chukhlib 2026-01-17 11:24:20 +01:00
  • 2a04fc319a fix: Allow local embeddings by removing OpenAI fallback in EmbeddingStrategy Yurii Chukhlib 2026-01-17 11:10:33 +01:00
  • 624dfe7af5 fix: Replace tf-playwright-stealth with playwright-stealth dependency Yurii Chukhlib 2026-01-17 11:06:44 +01:00
  • a5354f267a Merge branch 'develop' into release/v0.8.0 (v0.8.0, docker-rebuild-v0.8.0, release/v0.8.0) ntohidi 2026-01-16 11:34:24 +01:00
  • 6090629ee0 Fix: Enable litellm.drop_params for O-series/GPT-5 model compatibility unclecode 2026-01-16 09:56:38 +00:00
  • a00da6557b Add async agenerate_schema method for schema generation unclecode 2026-01-16 06:19:33 +00:00
  • 177e298af0 Update security researcher acknowledgment with a hyperlink for Neo by ProjectDiscovery ntohidi 2026-01-14 14:19:23 +01:00
  • f09146c435 Release v0.8.0: The v0.8.0 Update ntohidi 2026-01-14 13:46:42 +01:00
  • 315eae9e6f Add examples for deep crawl crash recovery and prefetch mode in documentation ntohidi 2026-01-14 12:58:44 +01:00
  • 530cde351f Add release notes for v0.8.0, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates unclecode 2026-01-12 13:45:42 +00:00
  • 122b4fe3f0 Add release notes for v0.7.9, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates ntohidi 2026-01-12 13:46:39 +01:00
  • acfab80dd4 Enhance authentication flow by implementing JWT token retrieval and adding authorization headers to API requests ntohidi 2026-01-12 13:46:32 +01:00
  • f24396c23e Fix critical RCE and LFI vulnerabilities in Docker API deployment unclecode 2026-01-12 04:14:37 +00:00
  • cee79a8129 feat: add force viewport screenshot TheRedRad 2026-01-06 21:12:17 +01:00
  • 6b2dca76c3 Docs: Add multi-sample schema generation section unclecode 2026-01-04 12:50:08 +00:00
  • 0d3f9e65b0 Add MEMORY.md to gitignore unclecode 2025-12-30 03:04:30 +00:00
  • db61ab8559 Update URL seeder docs with smart TTL cache parameters unclecode 2025-12-30 03:03:41 +00:00
  • 3d78001c30 Add smart TTL cache for sitemap URL seeder unclecode 2025-12-30 01:59:09 +00:00
  • 2550f3d2d5 Add browser pipeline support for raw:/file:// URLs unclecode 2025-12-27 12:32:42 +00:00
  • a43256b27a Add proxy support to HTTP crawler strategy unclecode 2025-12-26 13:17:28 +00:00
  • 9e7f5aa44b Updates on proxy rotation and proxy configuration unclecode 2025-12-26 12:45:57 +00:00
  • c85f56b085 Merge pull request #1677 from unclecode/sponsors/thor_data UncleCode 2025-12-25 12:08:21 +08:00
  • fde4e9f0c6 Add prefetch mode for two-phase deep crawling unclecode 2025-12-25 01:55:08 +00:00
  • 3937efcf0b Add base_url parameter to CrawlerRunConfig for raw HTML processing unclecode 2025-12-24 06:05:55 +00:00
  • 624e34164d Fix: HTTP strategy raw: URL parsing truncates at # character unclecode 2025-12-24 04:31:57 +00:00
  • a234959b12 sponsors: Add thor data as sponsor (sponsors/thor_data) Aravind Karnam 2025-12-23 20:45:00 +05:30
  • da82f0ada5 sponsors: Add thor data as sponsor Aravind Karnam 2025-12-23 16:28:26 +05:30
  • 31ebf37252 Add crash recovery for deep crawl strategies unclecode 2025-12-22 14:51:10 +00:00
  • 67e03d64b8 Add PDF and MHTML support for raw: and file:// URLs unclecode 2025-12-22 01:24:51 +00:00
  • 444cb14f82 Add _generate_screenshot_from_html for raw: and file:// URLs unclecode 2025-12-22 01:10:20 +00:00
  • 48426f73f0 Some debugging for caching unclecode 2025-12-21 04:45:52 +00:00
  • f6b29a8f9f Update gitignore unclecode 2025-12-21 03:15:15 +00:00
  • 02acad1dc6 Fix CDP connection handling: support WS URLs and proper cleanup unclecode 2025-12-18 22:04:52 +08:00
  • d5a0866e03 fix: pdf processing to target only css_selector, thereby giving users a choice to discard unnecessary elements from the page in the generated pdf (pdf_processing) Aravind Karnam 2025-12-18 16:32:16 +05:30
  • d10ca38599 Add init_scripts support to BrowserConfig for pre-page-load JS injection unclecode 2025-12-14 01:58:11 +00:00
  • ecedb6113e Add context caching to create_isolated_context branch unclecode 2025-12-13 08:58:21 +00:00
  • 55eb968a8d Add create_isolated_context flag for concurrent CDP crawls unclecode 2025-12-13 08:29:05 +00:00
  • 6185d3cb32 Revert context matching attempts - Playwright cannot see CDP-created contexts unclecode 2025-12-13 07:57:29 +00:00
  • 8014805c17 Fix: use CDP to find context by browserContextId for concurrent sessions unclecode 2025-12-13 07:02:23 +00:00
  • c1e485e0b0 Fix: use target_id to find correct page in get_page unclecode 2025-12-13 06:51:54 +00:00
  • b2e4a1f2e3 Fix: find context by target_id for concurrent CDP connections unclecode 2025-12-13 06:41:13 +00:00