Commit Graph

  • 76dd86d1b3 Merge remote-tracking branch 'origin/linkedin-prep' into next UncleCode 2025-05-08 17:13:59 +08:00
  • 206a9dfabd feat(crawler): add session management and view-source support UncleCode 2025-05-08 17:13:35 +08:00
  • 1af3d1c2e0 Merge branch '2025-APR-1' of https://github.com/unclecode/crawl4ai into 2025-APR-1 ntohidi 2025-05-08 11:11:32 +02:00
  • c1041b9bbe fix: exclude_external_images flag simply discards elements ref:https://github.com/unclecode/crawl4ai/issues/345 Aravind Karnam 2025-05-07 18:43:29 +05:30
  • f6e25e2a6b fix: check_robots_txt to support wildcard rules ref: #699 Aravind Karnam 2025-05-07 17:53:30 +05:30
  • ee93acbd06 fix(async_playwright_crawler): use config directly instead of self.config for verbosity check ntohidi 2025-05-07 12:32:38 +02:00
  • 2b17f234f8 docs: update direct passing of content_filter to CrawlerRunConfig and instead pass it via MarkdownGenerator. Ref: #603 Aravind Karnam 2025-05-07 15:20:36 +05:30
  • eebb8c84f0 fix(requirements): add PyPDF2 dependency for PDF processing ntohidi 2025-05-07 11:18:44 +02:00
  • 12783fabda fix(dependencies): update pillow version constraint to allow newer releases. ref #709 ntohidi 2025-05-07 11:18:13 +02:00
  • 39e3b792a1 Merge branch 'next' into 2025-APR-1 Aravind Karnam 2025-05-07 10:25:25 +05:30
  • aaf05910eb fix: removed unnecessary imports and installs Aravind Karnam 2025-05-06 15:53:55 +05:30
  • a0555d5fa6 merge:from next branch Aravind Karnam 2025-05-06 15:16:47 +05:30
  • 9a0585c8f6 fix bs4 warning on text kwarg - switch to string RoyLeviLangware 2025-05-06 11:44:48 +03:00
  • 38ebcbb304 fix: provide support for local llm by adding it to the arguments Aravind Karnam 2025-05-05 10:34:38 +05:30
  • 9b5ccac76e feat(extraction): add RegexExtractionStrategy for pattern-based extraction UncleCode 2025-05-02 21:15:24 +08:00
  • 87d4b0fff4 format bash scripts properly so copy & paste may work without issues Aravind Karnam 2025-05-02 17:21:09 +05:30
  • bd5a9ac632 updated readme with arguments for litellm Aravind Karnam 2025-05-02 17:04:42 +05:30
  • 6650b2f34a fix: replace openAI with litellm to support multiple llm providers Aravind Karnam 2025-05-02 16:51:15 +05:30
  • 5cc58f9bb3 fix: 1. duplicate verbose flag 2. inconsistency in argument name --profile-name 3. duplicate initialisation of env_defaults Aravind Karnam 2025-05-02 16:40:58 +05:30
  • baf7f6a6f5 fix: typo in readme Aravind Karnam 2025-05-02 16:33:11 +05:30
  • e0cd3e10de fix(crawler): initialize captured_console variable for local file processing ntohidi 2025-05-02 10:35:35 +02:00
  • 94e9959fe0 feat(docker-api): add job-based polling endpoints for crawl and LLM tasks UncleCode 2025-05-01 21:24:52 +08:00
  • 7c2fd5202e fix: incorrect params and commands in linkedin app readme Aravind Karnam 2025-05-01 18:27:03 +05:30
  • ee01b81f3e Merge branch 'merge-pr971' into next UncleCode 2025-05-01 18:58:41 +08:00
  • 0e5d672763 Merge branch 'pr-971' into merge-pr971 [merge-pr971] UncleCode 2025-05-01 18:57:28 +08:00
  • cd2b490b40 refactor(logger): Apply the Enumeration for color wakaka6 2025-05-01 16:59:33 +08:00
  • 50f0b83fcd feat(linkedin): add prospect-wizard app with scraping and visualization UncleCode 2025-04-30 19:38:25 +08:00
  • 1d6a2b9979 fix(crawler): surface real redirect status codes and keep redirect chain. the 30x response instead of always returning 200. Refs #660 ntohidi 2025-04-30 12:29:17 +02:00
  • 039be1b1ce feat: add pdf2image dependency to requirements ntohidi 2025-04-30 11:41:35 +02:00
  • 9499164d3c feat(browser): improve browser profile management and cleanup UncleCode 2025-04-29 23:04:32 +08:00
  • 53245e4e0e Fix: README.md urls list Marc Sacristán 2025-04-29 16:26:35 +02:00
  • 2140d9aca4 fix(browser): correct headless mode default behavior UncleCode 2025-04-26 21:09:50 +08:00
  • ccec40ed17 feat(models): add dedicated tables field to CrawlResult UncleCode 2025-04-24 18:36:25 +08:00
  • 094201ab2a Merge next + resolve conflicts Aravind Karnam 2025-04-23 19:44:50 +05:30
  • ad4dfb21e1 Remove "rc1" UncleCode 2025-04-23 21:00:00 +08:00
  • 7784b2468e feat(docs): enhance Ask AI button UX and add v0.6.0 release notes UncleCode 2025-04-23 20:07:03 +08:00
  • 146f9d415f Update README vr0.6.0 UncleCode 2025-04-23 19:50:33 +08:00
  • 37fd80e4b9 feat(docs): add mobile-friendly navigation menu UncleCode 2025-04-23 19:44:25 +08:00
  • 949a93982e feat(docs): update documentation and disable Ask AI feature UncleCode 2025-04-23 19:02:39 +08:00
  • c4f5651199 chore(deps): upgrade to Python 3.12 and prepare for 0.6.0 release UncleCode 2025-04-23 16:35:15 +08:00
  • b0aa8bc9f7 Update README vr0.6.0rc1 UncleCode 2025-04-22 23:21:42 +08:00
  • c98ffe2130 Update CHANGELOG UncleCode 2025-04-22 22:36:41 +08:00
  • 4812f08a73 feat(docker): update Docker deployment for v0.6.0 UncleCode 2025-04-22 22:35:25 +08:00
  • f3ebb38edf Merge PR #899 into next, resolve conflicts in server.py and docs/browser-crawler-config.md [v0.5.5] unclecode 2025-04-22 14:56:47 +08:00
  • 0007aea204 Update changelog UncleCode 2025-04-21 23:21:49 +08:00
  • b5c25731e6 feat(browser): add geolocation, locale and timezone support UncleCode 2025-04-21 23:20:59 +08:00
  • 5297e362f3 feat(mcp): Implement MCP protocol and enhance server capabilities UncleCode 2025-04-21 22:22:02 +08:00
  • 14a31456ef fix(docs): update browser-crawler-config example to include LLMContentFilter and DefaultMarkdownGenerator, fix syntax errors ntohidi 2025-04-21 13:59:49 +02:00
  • a58c8000aa refactor(server): migrate to pool-based crawler management UncleCode 2025-04-20 20:14:26 +08:00
  • b27bb367e8 merge next. Resolve conflicts. Fix some import errors and error handling in server.py Aravind Karnam 2025-04-19 20:27:47 +05:30
  • d2648eaa39 fix: solved with deepcopy of elements https://github.com/unclecode/crawl4ai/issues/902 Aravind Karnam 2025-04-19 20:08:36 +05:30
  • c2902fd200 reverse:last change in order of execution for it introduced a new issue in content generated. https://github.com/unclecode/crawl4ai/issues/902 Aravind Karnam 2025-04-19 19:46:20 +05:30
  • 16b2318242 feat(api): implement crawler pool manager for improved resource handling UncleCode 2025-04-18 22:26:24 +08:00
  • 907cba194f Merge branch 'next-stress' into next UncleCode 2025-04-17 22:34:43 +08:00
  • 3bf78ff47a refactor(docker-demo): enhance error handling and output formatting UncleCode 2025-04-17 22:32:58 +08:00
  • 921e0c46b6 feat(tests): implement high volume stress testing framework UncleCode 2025-04-17 22:31:51 +08:00
  • fd899f66aa Merge branch 'next-fix-markdown-source' into next UncleCode 2025-04-17 20:16:15 +08:00
  • 30ec4f571f feat(docs): add comprehensive Docker API demo script UncleCode 2025-04-17 20:16:11 +08:00
  • 7db6b468d9 feat(markdown): add content source selection for markdown generation UncleCode 2025-04-17 20:13:53 +08:00
  • 0886153d6a fix(async_playwright_crawler): improve segment handling and viewport adjustments during screenshot capture (Fixed bug: Capturing Screenshot Twice and Increasing Image Size) ntohidi 2025-04-17 12:48:11 +02:00
  • 0ec3c4a788 fix(crawler): handle navigation aborts during file downloads in AsyncPlaywrightCrawlerStrategy ntohidi 2025-04-17 12:11:12 +02:00
  • eed7f88f29 Merge branch 'next' into 2025-MAR-ALPHA-1 Aravind Karnam 2025-04-17 10:50:02 +05:30
  • 94d486579c docs(tests): clarify server URL comments in deep crawl tests UncleCode 2025-04-15 22:32:27 +08:00
  • 5206c6f2d6 Modify the test file UncleCode 2025-04-15 22:28:01 +08:00
  • 230f22da86 refactor(proxy): move ProxyConfig to async_configs and improve LLM token handling UncleCode 2025-04-15 22:27:18 +08:00
  • 05085b6e3d fix(requirements): add fake-useragent to requirements ntohidi 2025-04-15 13:05:19 +02:00
  • 793668a413 Remove parameter_updates.txt UncleCode 2025-04-14 23:05:24 +08:00
  • 82aa53aa59 Merge branch 'next-alpine-docker' into next UncleCode 2025-04-14 23:01:22 +08:00
  • cd7ff6f9c1 feat(docs): add AI assistant interface and code copy button [next-alpine-docker] UncleCode 2025-04-14 23:00:47 +08:00
  • c56974cf59 feat(docs): enhance documentation UI with ToC and GitHub stats UncleCode 2025-04-14 20:46:32 +08:00
  • 1f3b1251d0 docs(cli): add Crawl4AI CLI installation instructions to the CLI guide ntohidi 2025-04-14 12:16:31 +02:00
  • 7b9aabc64a fix(crawler): ensure max_pages limit is respected during batch processing in crawling strategies ntohidi 2025-04-14 12:11:22 +02:00
  • dcc265458c fix: Add a nominal wait time for remove overlay elements since it's already controllable through delay_before_return_html Aravind Karnam 2025-04-14 12:39:05 +05:30
  • ecec53a8c1 Docker tested on Windows machine. UncleCode 2025-04-13 20:14:41 +08:00
  • 7d8e81fb2e fix: fix target_elements, in a less invasive and more efficient way simply by changing order of execution :) https://github.com/unclecode/crawl4ai/issues/902 Aravind Karnam 2025-04-12 12:44:00 +05:30
  • 9fc5d315af fix: revert the old target_elms code in LXMLwebscraping strategy Aravind Karnam 2025-04-12 12:07:04 +05:30
  • d84508b4d5 fix: revert the old target_elms code in regular webscraping strategy Aravind Karnam 2025-04-12 12:05:17 +05:30
  • 022f5c9e25 Merged next branch Aravind Karnam 2025-04-12 10:47:02 +05:30
  • 3179d6ad0c fix(core): improve error handling and stability in core components UncleCode 2025-04-11 20:58:39 +08:00
  • b2f3cb0dfa WIP: logger migrate to rich wakaka6 2025-04-10 23:02:19 +08:00
  • 18e8227dfb feat(crawler): add console message capture functionality UncleCode 2025-04-10 23:26:09 +08:00
  • 7c358a1aee fix(browser): add null check for crawlerRunConfig.url UncleCode 2025-04-10 23:25:07 +08:00
  • 108b2a8bfb Fixed capturing console messages for case the url is the local file. Update docker configuration (work in progress) UncleCode 2025-04-10 23:22:38 +08:00
  • 66ac07b4f3 feat(crawler): add network request and console message capturing unclecode 2025-04-10 16:03:48 +08:00
  • a2061bf31e feat(crawler): add MHTML capture functionality UncleCode 2025-04-09 15:39:04 +08:00
  • 6f7ab9c927 fix: Revert changes to session management in AsyncHttpWebcrawler and solve the underlying issue by removing the session closure in finally block of session context. Aravind Karnam 2025-04-08 18:31:00 +05:30
  • 9038e9acbd Merge branch 'main' into next UncleCode 2025-04-08 17:43:42 +08:00
  • 02e627e0bd fix(crawler): simplify page retrieval logic in AsyncPlaywrightCrawlerStrategy UncleCode 2025-04-08 17:43:36 +08:00
  • 72d8e679ad feat(pipeline): add high-level Crawler utility class for simplified web crawling [next-2-batch-crawl] UncleCode 2025-04-07 22:50:44 +08:00
  • 67a790b4a6 Add test file for Pipeline batch crawl. UncleCode 2025-04-06 19:38:31 +08:00
  • 5b66208a7e Refactor next branch UncleCode 2025-04-06 18:33:09 +08:00
  • d95b2dc9f2 Some refactoring, move pipeline submodule folder into the main. UncleCode 2025-04-06 18:28:28 +08:00
  • 591f55edc7 refactor(browser): rename methods and update type hints in BrowserHub for clarity UncleCode 2025-04-06 18:22:05 +08:00
  • e1d9e2489c refactor(docs): update import statement in quickstart.py for improved clarity UncleCode 2025-04-05 23:12:06 +08:00
  • b1693b1c21 Remove old quickstart files UncleCode 2025-04-05 23:10:25 +08:00
  • 49d904ca0a refactor(docs): enhance quickstart_examples.py with improved configuration and file handling UncleCode 2025-04-05 22:57:45 +08:00
  • ca9351252a refactor(docs): update import paths and clean up example code in quickstart_examples.py UncleCode 2025-04-05 22:55:56 +08:00
  • 935d9d39f8 Add quickstart example set UncleCode 2025-04-05 21:37:25 +08:00
  • f8213c32b9 Merge branch 'vr0.5.0.post8' UncleCode 2025-04-05 21:36:17 +08:00
  • 14894b4d70 feat(config): set DefaultMarkdownGenerator as the default markdown generator in CrawlerRunConfig; feat(logger): add color mapping for log message formatting options UncleCode 2025-04-03 20:34:19 +08:00