Commit Graph

  • cba4a466e5 feat(browser): add BrowserProfiler class for identity-based browsing UncleCode 2025-03-02 20:32:29 +08:00
  • 7c1705712d fix: https://github.com/unclecode/crawl4ai/issues/756 Aravind Karnam 2025-03-01 18:17:11 +05:30
  • a9e24307cc Release prep (#749) Aravind 2025-02-28 17:23:35 +05:30
  • 3a87b4e43b fix(dependencies): update cchardet to faust-cchardet for compatibility UncleCode 2025-02-26 18:25:58 +08:00
  • 4bcd4cbda1 refactor(pdf): improve PDF processor dependency handling UncleCode 2025-02-25 22:27:55 +08:00
  • 71ce01c9e1 feat(browser): add cdp_url parameter to BrowserManager initialization UncleCode 2025-02-24 14:48:02 +08:00
  • c6d48080a4 feat(logger): add abstract logger base class and file logger implementation UncleCode 2025-02-23 21:23:41 +08:00
  • 46d2f12851 chore: remove old Dockerfile and server script UncleCode 2025-02-22 13:45:04 +08:00
  • 367cd71db9 feat(core): release version 0.5.0 with deep crawling and CLI UncleCode 2025-02-21 19:55:02 +08:00
  • 2af958e12c Feat/llm config (#724) Aravind 2025-02-21 13:11:37 +05:30
  • 3cb28875c3 refactor(config): enhance serialization and config handling UncleCode 2025-02-19 17:23:25 +08:00
  • dad592c801 2025 feb alpha 1 (#685) Aravind 2025-02-19 11:43:17 +05:30
  • c171891999 Merge branch 'main' into next UncleCode 2025-02-19 13:26:42 +08:00
  • 3b1025abbb Merge branch 'main' of https://github.com/unclecode/crawl4ai UncleCode 2025-02-19 13:24:18 +08:00
  • f00dcc276f Update README.md (#562) UncleCode 2025-01-26 04:00:28 +01:00
  • 392c923980 feat(docker): add JWT authentication and improve server architecture UncleCode 2025-02-18 22:07:13 +08:00
  • 2864015469 feat(docker): implement supervisor and secure API endpoints UncleCode 2025-02-17 20:31:20 +08:00
  • 27af4cc27b Fix "raw://" URL parsing logic João Martins 2025-02-15 15:34:59 +00:00
  • 8bb799068e feat(crawler): add HTTP crawler strategy for lightweight web scraping UncleCode 2025-02-15 19:26:30 +08:00
  • 063df572b0 docs(examples): add SERP API project example UncleCode 2025-02-14 23:06:16 +08:00
  • 966fb47e64 feat(config): enhance serialization and add deep crawling exports UncleCode 2025-02-13 21:45:19 +08:00
  • 43e09da694 refactor(crawler): remove content filter functionality UncleCode 2025-02-12 21:59:19 +08:00
  • 69705df0b3 fix(install): ensure proper exit after running doctor command UncleCode 2025-02-11 19:48:23 +08:00
  • 91a5fea11f feat(cli): add command line interface with comprehensive features UncleCode 2025-02-10 16:58:52 +08:00
  • 467be9ac76 feat(deep-crawling): add DFS strategy and update exports; refactor CLI entry point UncleCode 2025-02-09 20:23:40 +08:00
  • 19df96ed56 feat(proxy): add proxy rotation strategy UncleCode 2025-02-09 18:49:10 +08:00
  • b957ff2ecd refactor(crawler): improve HTML handling and cleanup codebase UncleCode 2025-02-07 21:56:27 +08:00
  • 91073c1244 refactor(crawling): improve type hints and code cleanup UncleCode 2025-02-07 19:01:59 +08:00
  • 926beee832 base-config structure is changed (#618) Sezer Bozkır 2025-02-07 12:11:51 +03:00
  • a9415aaaf6 refactor(deep-crawling): reorganize deep crawling strategies and add new implementations UncleCode 2025-02-05 22:50:39 +08:00
  • c308a794e8 refactor(deep-crawl): reorganize deep crawling functionality into dedicated module UncleCode 2025-02-04 23:28:17 +08:00
  • bc7559586f feat(crawler): add deep crawling capabilities with BFS strategy UncleCode 2025-02-04 01:24:49 +08:00
  • 04bc643cec feat(api): improve cache handling and add API tests UncleCode 2025-02-02 20:53:31 +08:00
  • 33a21d6a7a refactor(docker): improve server architecture and configuration UncleCode 2025-02-02 20:19:51 +08:00
  • 7b1ef07c41 refactor(docker): remove unused models and utilities for cleaner codebase UncleCode 2025-02-01 20:10:13 +08:00
  • 2f15976b34 feat(docker): enhance Docker deployment setup and configuration UncleCode 2025-02-01 19:33:27 +08:00
  • 20920fa17b refactor(docker): clean up import statements in server.py UncleCode 2025-02-01 14:28:28 +08:00
  • 53ac3ec0b4 feat(docker): add Docker service integration and config serialization UncleCode 2025-01-31 18:00:16 +08:00
  • ce4f04dad2 feat(docker): add Docker deployment configuration and API server UncleCode 2025-01-31 15:22:21 +08:00
  • f7ce2d42c9 feat: Add deep crawl capabilities to arun_many function [feature/scraper] Aravind Karnam 2025-01-30 17:49:58 +05:30
  • f81712eb91 refactor(core): reorganize project structure and remove legacy code UncleCode 2025-01-30 19:35:06 +08:00
  • f6edb8342e Refactor: remove the old deep_crawl method Aravind Karnam 2025-01-30 16:22:41 +05:30
  • ca3f0126d3 Refactor:Moved deep_crawl_strategy, inside crawler run config Aravind Karnam 2025-01-30 16:18:15 +05:30
  • 31938fb922 feat(crawler): enhance JavaScript execution and PDF processing UncleCode 2025-01-29 21:03:39 +08:00
  • 858c18df39 fix: removed child_urls from CrawlResult Aravind Karnam 2025-01-29 18:08:34 +05:30
  • 2c8f2ec5a6 Refactor: Renamed scrape to traverse and deep_crawl in a few sections where it applies Aravind Karnam 2025-01-29 16:24:11 +05:30
  • 9ef43bc5f0 Refactor: Move adeep_crawl as method of crawler itself. Create attributes in CrawlResult to reconstruct the tree once deep crawling is completed Aravind Karnam 2025-01-29 15:58:21 +05:30
  • 84ffdaab9a Refactor: Move adeep_crawl as method of crawler itself. Create attributes in CrawlResult to reconstruct the tree once deep crawling is completed Aravind Karnam 2025-01-29 13:06:09 +05:30
  • 78223bc847 feat: create ScraperPageResult model to attach score and depth attributes to yielded/returned crawl results Aravind Karnam 2025-01-28 16:47:30 +05:30
  • 60ce8bbf55 Merge: with v-0.4.3b Aravind Karnam 2025-01-28 12:59:53 +05:30
  • 85847ff13f feat: Aravind Karnam 2025-01-28 12:39:45 +05:30
  • f34b4878cf fix: code formatting Aravind Karnam 2025-01-28 10:00:01 +05:30
  • f8fd9d9eff feat(pdf): add PDF processing capabilities UncleCode 2025-01-27 21:24:15 +08:00
  • d9324e3454 fix: Move the creation of crawler outside the main loop Aravind Karnam 2025-01-27 18:31:13 +05:30
  • 0ff95c83bc feat: change input params to scraper, Add asynchronous context manager to AsyncWebScraper, Optimise filter application Aravind Karnam 2025-01-27 18:13:33 +05:30
  • bb6450f458 Remove robots.txt compliance from scraper Aravind Karnam 2025-01-27 11:58:54 +05:30
  • 513d008de5 feat: Merge reviews from unclecode for scorers and filters & Remove the robots.txt compliance from scraper since that will be now handled by crawler Aravind Karnam 2025-01-27 11:54:10 +05:30
  • 0f00821df5 Fix version vr0.4.3b3 UncleCode 2025-01-26 18:08:24 +08:00
  • dde14eba7d Update README.md (#562) UncleCode 2025-01-26 04:00:28 +01:00
  • 149b69c832 Update README.md [unclecode-patch-7] UncleCode 2025-01-26 10:59:48 +08:00
  • 54c84079c4 docs(api): improve formatting and readability of API documentation UncleCode 2025-01-25 22:06:11 +08:00
  • d0586f09a9 Merge branch 'vr0.4.3b3' UncleCode 2025-01-25 21:57:29 +08:00
  • 09ac7ed008 feat(demo): uncomment feature demos and add fake-useragent dependency UncleCode 2025-01-25 21:56:08 +08:00
  • 97796f39d2 docs(examples): update proxy rotation demo and disable other demos UncleCode 2025-01-25 21:52:35 +08:00
  • 4d7f91b378 refactor(user-agent): improve user agent generation system UncleCode 2025-01-25 21:16:39 +08:00
  • 69a77222ef feat(browser): add CDP URL configuration support UncleCode 2025-01-24 15:53:47 +08:00
  • 0afc3e9e5e refactor(examples): update API usage in features demo UncleCode 2025-01-23 22:37:29 +08:00
  • 65d33bcc0f style(docs): improve code formatting in features demo UncleCode 2025-01-23 22:36:58 +08:00
  • 6a01008a2b docs(multi-url): improve documentation clarity and update examples UncleCode 2025-01-23 22:33:36 +08:00
  • cf3e1e748d feat(scraper): add optimized URL scoring system UncleCode 2025-01-23 20:46:33 +08:00
  • 6dc01eae3a refactor(core): improve type hints and remove unused file UncleCode 2025-01-23 18:53:22 +08:00
  • 7b7fe84e0d docs(readme): resolve merge conflict and update version info UncleCode 2025-01-22 20:52:42 +08:00
  • 5c36f4308f Merge branch 'main' of https://github.com/unclecode/crawl4ai UncleCode 2025-01-22 20:51:52 +08:00
  • 45809d1c91 Merge branch 'vr0.4.3b2' UncleCode 2025-01-22 20:51:46 +08:00
  • 357414c345 docs(readme): update version references and fix links UncleCode 2025-01-22 20:46:39 +08:00
  • 260b9120c3 docs(examples): update v0.4.3 features demo to v0.4.3b2 [vr0.4.3b2] UncleCode 2025-01-22 20:41:43 +08:00
  • 976ea52167 docs(examples): update demo scripts and fix output formats UncleCode 2025-01-22 20:40:03 +08:00
  • e6ef8d91ba refactor(scraper): optimize URL validation and filter performance UncleCode 2025-01-22 19:45:56 +08:00
  • d21ffad3a2 chore(git): update gitignore patterns [scrapper] UncleCode 2025-01-22 17:22:26 +08:00
  • 2d69bf2366 refactor(models): rename final_url to redirected_url for consistency UncleCode 2025-01-22 17:14:24 +08:00
  • dee5fe9851 feat(proxy): add proxy rotation support and documentation UncleCode 2025-01-22 16:11:01 +08:00
  • 88697c4630 docs(readme): update version and feature announcements for v0.4.3b1 [vr0.4.3b1] UncleCode 2025-01-21 21:20:04 +08:00
  • 6e78c56dda Refactor: Removed all scheduling logic from scraper. From now scraper expects arun_many to handle all scheduling. Scraper will only do traversal, validations, compliance checks, URL filtering and scoring etc. Reformatted some of the scraper files with Black code formatter Aravind Karnam 2025-01-21 18:44:43 +05:30
  • 16b8d4945b feat(release): prepare v0.4.3 beta release UncleCode 2025-01-21 21:03:11 +08:00
  • 67fa06c09b Refactor: Removed all scheduling logic from scraper. From now scraper expects arun_many to handle all scheduling. Scraper will only do traversal, validations, compliance checks, URL filtering and scoring etc. Reformatted some of the scraper files with Black code formatter Aravind Karnam 2025-01-21 17:49:51 +05:30
  • d09c611d15 feat(robots): add robots.txt compliance support UncleCode 2025-01-21 17:54:13 +08:00
  • 26d78d8512 Merge branch 'next' into feature/scraper Aravind Karnam 2025-01-21 12:35:45 +05:30
  • 1079965453 refactor: Remove the URL processing logic out of scraper Aravind Karnam 2025-01-21 12:16:59 +05:30
  • 9247877037 feat(proxy): add proxy configuration support to CrawlerRunConfig UncleCode 2025-01-20 22:14:05 +08:00
  • a677c2b61d Merge pull request #496 from aravindkarnam/scraper-uc Aravind 2025-01-20 16:55:41 +05:30
  • 2cec527a22 feat(extraction): add LLM-powered schema generation utility UncleCode 2025-01-20 17:28:00 +08:00
  • 4b1309cbf2 feat(crawler): add URL redirection tracking UncleCode 2025-01-19 19:53:38 +08:00
  • 8b6fe6a98f docs(api): add streaming mode documentation and examples UncleCode 2025-01-19 18:21:34 +08:00
  • 91463e34f1 feat(config): add streaming support and config cloning UncleCode 2025-01-19 17:51:47 +08:00
  • 1221be30a3 feat(browser): improve browser context management and add shared data support UncleCode 2025-01-19 17:12:03 +08:00
  • 6dfa9cb703 Streamline Feature requests, bug reports and Forums with Forms & Templates (#465) Aravind 2025-01-19 14:23:03 +05:30
  • e363234172 feat(dispatcher): add streaming support for URL processing UncleCode 2025-01-19 14:03:34 +08:00
  • 3d09b6a221 feat(content-filter): add LLMContentFilter for intelligent markdown generation UncleCode 2025-01-18 19:31:07 +08:00
  • 2d6b19e1a2 refactor(browser): improve browser path management UncleCode 2025-01-17 22:14:37 +08:00
  • ece9202b61 fix(dispatcher): adjust memory threshold and fix dispatcher initialization UncleCode 2025-01-16 21:58:52 +08:00