Commit Graph

  • dc85481180 refactor: Update LLM extraction example with the updated structure ntohidi 2025-06-12 12:23:03 +02:00
  • 5d9213a0e9 fix: Update JavaScript execution in AsyncPlaywrightCrawlerStrategy to handle script errors and add basic download test case. ref #1215 ntohidi 2025-06-12 12:21:40 +02:00
  • c0fd36982d Update all documentation to import extraction strategies directly from crawl4ai. UncleCode 2025-06-10 18:08:27 +08:00
  • 4679ee023d fix: Enhance URLPatternFilter to enforce path boundary checks for prefix matching. ref #1003 ntohidi 2025-06-10 11:19:18 +02:00
  • f9b7090084 Merge pull request #1186 from zimmski/fix-typo-provoder Nasrin 2025-06-10 10:26:45 +02:00
  • cab457e9c7 Merge branch 'next' of https://github.com/unclecode/crawl4ai into next UncleCode 2025-06-10 15:54:20 +08:00
  • 2a0c0ed18d chore(deps): add httpx extras (#1195) UncleCode 2025-06-08 10:06:38 +02:00
  • c73a130c50 Set memory_wait_timeout default to 10 minutes (#1193) UncleCode 2025-06-08 07:53:09 +02:00
  • ef6f4329fa Add use_stemming option to BM25ContentFilter (#1192) UncleCode 2025-06-08 06:57:37 +02:00
  • 4eb90b41b6 Refactor Crawl4AI Assistant: Rename Schema Builder to Click2Crawl, update UI elements, and remove deprecated files feature/c4a-script UncleCode 2025-06-10 15:40:26 +08:00
  • 9442597f81 #1127: Improve URL handling and normalization in scraping strategies AHMET YILMAZ 2025-06-10 11:57:06 +08:00
  • 0ac12da9f3 feat: Major Chrome Extension overhaul with Click2Crawl, instant Schema extraction, and modular architecture UncleCode 2025-06-09 23:18:27 +08:00
  • 74b06d4b80 #1167 Add PHP MIME types to ContentTypeFilter for better file handling AHMET YILMAZ 2025-06-05 11:29:35 +08:00
  • 40640badad feat: add Script Builder to Chrome Extension and reorganize LLM context files UncleCode 2025-06-08 22:02:12 +08:00
  • 926592649e Add Crawl4AI Assistant Chrome Extension UncleCode 2025-06-08 18:34:05 +08:00
  • b870bfdb6c chore(deps): add httpx extras (#1195) UncleCode 2025-06-08 10:06:38 +02:00
  • f54db649c5 chore(deps): add httpx extras codex/add-httpx-and-https-http2]-packages UncleCode 2025-06-08 10:06:13 +02:00
  • 6f3a0ea38e Create "Apps" section in documentation and Add interactive c4a-script playground and LLM context builder for Crawl4AI UncleCode 2025-06-08 15:48:17 +08:00
  • 451b0d6c9a Set memory_wait_timeout default to 10 minutes (#1193) UncleCode 2025-06-08 07:53:09 +02:00
  • c8456d8a01 Set memory_wait_timeout default to 10 minutes codex/add-memory_wait_timeout-parameter-to-memoryadaptivedispatche UncleCode 2025-06-08 07:52:21 +02:00
  • 8b215e17af Add use_stemming option to BM25ContentFilter (#1192) UncleCode 2025-06-08 06:57:37 +02:00
  • 5100dd28be Add use_stemming option to BM25ContentFilter codex/add-use_stemming-parameter-to-bm25contentfiler UncleCode 2025-06-08 06:56:33 +02:00
  • b4bb0ccea0 Update simple-crawling.md UncleCode 2025-06-08 11:33:28 +08:00
  • 08a2cdae53 Add C4A-Script support and documentation UncleCode 2025-06-07 23:07:19 +08:00
  • ca03acbc82 Add some new commands for the Crawl4ai script transpiler and creating an interactive tutorial that allows users to go through multiple steps and apply the syntax to automate the page. Fixed some issues and add several new commands for setting input values, variables, clearing input fields, and more. UncleCode 2025-06-06 23:03:26 +08:00
  • 3f6f2e998c feat(script): add new scripting capabilities and documentation UncleCode 2025-06-06 17:16:53 +08:00
  • 5ac19a61d7 feat: Implement max_scroll_steps parameter for full page scanning. ref: #1168 ntohidi 2025-06-05 16:40:34 +02:00
  • 022cc2d92a fix, Typo Markus Zimmermann 2025-06-05 15:30:38 +02:00
  • e731596315 docs(tutorial_url_seeder): refine summary and next steps, enhance agentic design patterns section UncleCode 2025-06-05 16:20:58 +08:00
  • 641526af81 docs(tutorial_url_seeder): add advanced agentic patterns and implementation examples UncleCode 2025-06-05 16:07:05 +08:00
  • 82a25c037a feat(async_url_seeder): add smart URL filtering to exclude nonsense URLs UncleCode 2025-06-05 15:46:24 +08:00
  • c6fc5c0518 docs(linkdin, url_seeder): update and reorganize LinkedIn data discovery and URL seeder documentation UncleCode 2025-06-05 15:06:25 +08:00
  • b5c2732f88 Add BBC Sport Research Assistant pipeline example UncleCode 2025-06-04 23:23:21 +08:00
  • 09fd3e152a fix: Import os and adjust file saving path in URL seeder demo UncleCode 2025-06-03 23:34:11 +08:00
  • 3f9424e884 Update CHANGELOG UncleCode 2025-06-03 23:27:31 +08:00
  • 3048cc1ff9 feat: Add AsyncUrlSeeder for intelligent URL discovery and filtering UncleCode 2025-06-03 23:27:12 +08:00
  • fcc2abe4db (fix): Update document about LLM extraction strategy to use LLMConfig. REF #1146 ntohidi 2025-06-03 12:53:59 +02:00
  • cc95d3abd4 Fix raw URL parsing logic to correctly handle "raw://" and "raw:" prefixes. REF #1118 ntohidi 2025-06-03 11:19:08 +02:00
  • 5ce3e682f3 Merge pull request #752 from jl-martins/fix-raw-url-parsing Nasrin 2025-06-03 11:10:29 +02:00
  • 28125c1980 Merge branch 'next' into 2025-MAY-2 ntohidi 2025-06-02 20:26:40 +02:00
  • 773ed7b281 Merge branch '2025-APR-1' into 2025-MAY-2 ntohidi 2025-06-02 20:25:58 +02:00
  • 58c1e17170 Merge branch 'main' into fix-raw-url-parsing João Martins 2025-05-30 13:03:25 +01:00
  • 4bcb7171a3 fix(browser_profiler): cross-platform 'q' to quit prokopis3 2025-05-30 14:43:18 +03:00
  • 2b3b728dcd fix(metadata): improve title extraction with fallbacks for edge cases. REF #995 feature/scraping-strategy ntohidi 2025-05-28 10:17:50 +02:00
  • bfec5156ad Refactor content scraping strategies: comment out WebScrapingStrategy references and update to use LXMLWebScrapingStrategy across multiple files. Bring WebScrapingStrategy methods to LXMLWebScrapingStrategy ntohidi 2025-05-27 17:32:45 +02:00
  • 2b2ef12e25 #1156: Refactor completion function calls to use asynchronous version feature/async-llm-extaction Ahmed-Tawfik94 2025-05-27 15:10:34 +08:00
  • b55e27d2ef fix: changed error variable name in handle_crawl_request, docker api ntohidi 2025-05-26 11:08:23 +02:00
  • d9b3db925a Refactor extraction and completion functions to support asynchronous execution Ahmed-Tawfik94 2025-05-26 16:01:38 +08:00
  • 3b766e1aac Add Google Colab button to LinkedIn Prospect Wizard README UncleCode 2025-05-26 14:35:06 +08:00
  • c3b7b7e918 Add linkedin example ipynb. UncleCode 2025-05-25 17:55:22 +08:00
  • 7d0b447e1c Update setup script to clarify virtual display setup message UncleCode 2025-05-25 16:55:18 +08:00
  • 33b0e222ca Add Colab utilities and rename setup function for clarity UncleCode 2025-05-25 16:50:56 +08:00
  • 1fc45ffac8 Fix temperature typo and enhance LinkedIn extraction with Colab support UncleCode 2025-05-25 16:47:12 +08:00
  • 9c2cc7f73c Fix BM25ContentFilter documentation to use language parameter instead of use_stemming (#1152) devin-ai-integration[bot] 2025-05-25 10:02:13 +08:00
  • c8d28316b9 Fix BM25ContentFilter documentation to use language parameter instead of use_stemming devin/1748137705-fix-bm25contentfilter-docs Devin AI 2025-05-25 01:51:21 +00:00
  • 1c5e76d51a Adjust positioning and set only core component as selected item by default UncleCode 2025-05-24 20:49:44 +08:00
  • 7665a6832f Add LLMContext article and update JS to not show all components. UncleCode 2025-05-24 20:46:24 +08:00
  • a06710ff03 Adding LLMContext generator to website. UncleCode 2025-05-24 20:37:09 +08:00
  • ad078c3f18 fix(pdf): add timeout to PDF downloads to prevent hanging (#1141) unclecode 2025-05-23 16:05:44 +08:00
  • 400a6621ee Add debug folder to gitignore unclecode 2025-05-23 10:43:05 +08:00
  • 3d46d89759 docs: fix https://github.com/unclecode/crawl4ai/issues/1109 Aravind Karnam 2025-05-22 17:21:42 +05:30
  • da8f0dbb93 fix(browser_profiler): change logger print to info for consistent logging in interactive manager ntohidi 2025-05-22 11:25:51 +02:00
  • 33a0c7a17a fix(logger): add RED color to LogColor enum for enhanced logging options ntohidi 2025-05-22 11:17:28 +02:00
  • bf56787874 refactor(browser): remove commented-out code for clarity UncleCode 2025-05-21 20:32:40 +08:00
  • 08ad7ef257 feat(browser): improve browser session management and profile handling UncleCode 2025-05-21 20:23:17 +08:00
  • 984524ca1c fix(auth): add token authorization header in request preparation to ensure authenticated requests are made Ahmed-Tawfik94 2025-05-21 13:26:11 +08:00
  • 1c0ce41328 Fix managed browser page retrieval when no pages (#1137) UncleCode 2025-05-20 21:12:32 +08:00
  • 0e840aea2b Fix managed browser page retrieval when no pages codex/fix-indexerror-in-browser-manager-py-with-use-managed-browse UncleCode 2025-05-20 21:06:12 +08:00
  • cb8d581e47 fix(docs): update CrawlerRunConfig to use CacheMode for bypassing cache. REF: #1125 ntohidi 2025-05-19 18:03:05 +02:00
  • a55c2b3f88 refactor(logging): update extraction logging to use url_status method Ahmed-Tawfik94 2025-05-19 16:32:22 +08:00
  • ce09648af1 Merge pull request #1054 from Sacristaan/feature/readme_example Ahmed Tawfik 2025-05-19 14:20:21 +08:00
  • a97654270b #1086 fix(markdown): update BM25 filter to use language parameter for stemming Ahmed-Tawfik94 2025-05-19 14:11:46 +08:00
  • b4fc60a555 #1103 fix(url): enhance URL normalization to handle invalid schemes and trailing slashes Ahmed-Tawfik94 2025-05-19 13:51:16 +08:00
  • 137ac014fb #1105 :fix(metadata): optimize article metadata extraction using XPath for improved performance Ahmed-Tawfik94 2025-05-19 13:48:02 +08:00
  • faa98eefbc #1105 got fixed (metadata now matches with meta property article:*) Ahmed-Tawfik94 2025-05-19 11:35:13 +08:00
  • 6029097114 feat: add VNC streaming support codex/add-vnc-streaming-endpoint-to-docker-server UncleCode 2025-05-17 19:12:15 +08:00
  • 85ac6fa523 Merge branch 'next' of https://github.com/unclecode/crawl4ai into next UncleCode 2025-05-17 19:04:03 +08:00
  • becc4624bb feat(favicon): add new favicon images for improved branding UncleCode 2025-05-17 19:03:51 +08:00
  • 754ba731fa Fix chunk splitting utilities (#1122) UncleCode 2025-05-17 15:06:53 +08:00
  • 9d8ead59b8 📝 Add docstrings to codex/find-and-fix-a-bug (#1123) codex/find-and-fix-a-bug coderabbitai[bot] 2025-05-17 10:52:55 +08:00
  • 32fcacafa6 📝 Add docstrings to codex/find-and-fix-a-bug coderabbitai/docstrings/14vTVzYa3bH06l5wYNY9jTghrrj9FxxWL coderabbitai[bot] 2025-05-17 02:37:00 +00:00
  • 45f1652d98 Fix merge_chunks splitter usage and remove incorrect return UncleCode 2025-05-17 10:31:19 +08:00
  • ac9981a1f5 feat(favicon): add favicon image and update mkdocs configuration UncleCode 2025-05-16 21:59:23 +08:00
  • 83ef15fd47 feat(favicon): add favicon.ico for improved branding UncleCode 2025-05-16 21:55:07 +08:00
  • a3cb938675 feat(theme): enable dark color mode in mkdocs configuration UncleCode 2025-05-16 21:44:56 +08:00
  • 9b60988232 feat(feedback): add feedback modal styles and integrate into mkdocs configuration UncleCode 2025-05-16 21:25:10 +08:00
  • 98e951f611 fix(mkdocs): remove duplicate gtag.js entry in extra_javascript UncleCode 2025-05-16 20:52:41 +08:00
  • baca2df8df feat(analytics): add Google Tag Manager script and gtag.js for tracking UncleCode 2025-05-16 20:49:02 +08:00
  • 8a5e23d374 feat(crawler): add separate timeout for wait_for condition UncleCode 2025-05-16 17:00:45 +08:00
  • 22725ca87b fix(crawler): initialize captured_console to prevent unbound local error for local HTML files. REF: #1072 ntohidi 2025-05-15 11:29:36 +02:00
  • e0fbd2b0a0 fix(schema): update f parameter description to use lowercase enum values. REF: #1070 ntohidi 2025-05-15 10:45:23 +02:00
  • 32966bea11 fix(extraction): resolve 'str' object has no attribute 'choices' error in LLMExtractionStrategy. Refs: #979 ntohidi 2025-05-15 10:09:19 +02:00
  • a3b0cab52a #1088 is solved: flag -bc is now for --byPass-cache Ahmed-Tawfik94 2025-05-15 11:25:06 +08:00
  • 137556b3dc fix the EXTRACT to match the styling of the other methods medo94my 2025-05-14 16:01:10 +08:00
  • 260e2dc347 fix(browser): create browser config before launching managed browser instance. REF: https://discord.com/channels/1278297938551902308/1278298697540567132/1371683009459392716 ntohidi 2025-05-13 14:03:20 +02:00
  • 25d97d56e4 fix(dependencies): remove duplicated aiofiles from project dependencies. REF #1045 ntohidi 2025-05-13 13:56:12 +02:00
  • 98a56e6e01 Merge next branch Aravind Karnam 2025-05-13 17:12:11 +05:30
  • 1e1c887a2f fix(docker-api): migrate to modern datetime library API Emmanuel Ferdman 2025-05-13 00:04:58 -07:00
  • 897e017361 Set version to 0.6.3 vr0.6.3 v0.6.3 UncleCode 2025-05-12 21:20:10 +08:00
  • a3e9ef91ad fix(crawler): remove automatic page closure in screenshot methods UncleCode 2025-05-12 21:17:57 +08:00