mirror of
https://github.com/unclecode/crawl4ai.git
synced 2026-06-12 00:38:00 +00:00
@@ -25,7 +25,7 @@ We would like to thank the following people for their contributions to Crawl4AI:
- [paulokuong](https://github.com/paulokuong) - fix: CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string [#298](https://github.com/unclecode/crawl4ai/pull/298)
- [TheRedRad](https://github.com/theredrad) - feat: add force viewport screenshot option [#1694](https://github.com/unclecode/crawl4ai/pull/1694)
- [ChiragBellara](https://github.com/ChiragBellara) - fix: avoid Common Crawl calls for sitemap-only URL seeding [#1746](https://github.com/unclecode/crawl4ai/pull/1746)
-- [YuriNachos](https://github.com/YuriNachos) - fix: replace tf-playwright-stealth with playwright-stealth [#1714](https://github.com/unclecode/crawl4ai/pull/1714), fix: respect `<base>` tag for relative link resolution [#1721](https://github.com/unclecode/crawl4ai/pull/1721), fix: include GoogleSearchCrawler script.js in package [#1719](https://github.com/unclecode/crawl4ai/pull/1719), fix: allow local embeddings by removing OpenAI fallback [#1717](https://github.com/unclecode/crawl4ai/pull/1717)
+- [YuriNachos](https://github.com/YuriNachos) - fix: replace tf-playwright-stealth with playwright-stealth [#1714](https://github.com/unclecode/crawl4ai/pull/1714), fix: respect `<base>` tag for relative link resolution [#1721](https://github.com/unclecode/crawl4ai/pull/1721), fix: include GoogleSearchCrawler script.js in package [#1719](https://github.com/unclecode/crawl4ai/pull/1719), fix: allow local embeddings by removing OpenAI fallback [#1717](https://github.com/unclecode/crawl4ai/pull/1717), docs: add missing CacheMode import [#1715](https://github.com/unclecode/crawl4ai/pull/1715), docs: fix return types to RunManyReturn [#1716](https://github.com/unclecode/crawl4ai/pull/1716)
- [christian-oudard](https://github.com/christian-oudard) - fix: deep-crawl CLI outputting only the first page [#1667](https://github.com/unclecode/crawl4ai/pull/1667)
- [vladmandic](https://github.com/vladmandic) - fix: VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var [#1296](https://github.com/unclecode/crawl4ai/pull/1296)
- [nnxiong](https://github.com/nnxiong) - fix: script tag removal losing adjacent text in cleaned_html [#1364](https://github.com/unclecode/crawl4ai/pull/1364)
@@ -43,9 +43,12 @@ We would like to thank the following people for their contributions to Crawl4AI:
- [nightcityblade](https://github.com/nightcityblade) - fix: prevent AdaptiveCrawler from crawling external domains [#1805](https://github.com/unclecode/crawl4ai/pull/1805)
- [Otman404](https://github.com/Otman404) - fix: return in finally block silently suppressing exceptions in dispatcher [#1763](https://github.com/unclecode/crawl4ai/pull/1763)
- [SohamKukreti](https://github.com/SohamKukreti) - fix: from_serializable_dict ignoring plain data dicts with "type" key [#1803](https://github.com/unclecode/crawl4ai/pull/1803)
-- [Br1an67](https://github.com/Br1an67) - fix: handle nested brackets and parentheses in LINK_PATTERN regex [#1790](https://github.com/unclecode/crawl4ai/pull/1790), identified: strip markdown fences in LLM JSON responses [#1787](https://github.com/unclecode/crawl4ai/pull/1787), fix: preserve class/id in cleaned_html [#1782](https://github.com/unclecode/crawl4ai/pull/1782), fix: guard against None LLM content [#1788](https://github.com/unclecode/crawl4ai/pull/1788), fix: strip port from domain in is_external_url [#1783](https://github.com/unclecode/crawl4ai/pull/1783)
+- [Br1an67](https://github.com/Br1an67) - fix: handle nested brackets and parentheses in LINK_PATTERN regex [#1790](https://github.com/unclecode/crawl4ai/pull/1790), identified: strip markdown fences in LLM JSON responses [#1787](https://github.com/unclecode/crawl4ai/pull/1787), fix: preserve class/id in cleaned_html [#1782](https://github.com/unclecode/crawl4ai/pull/1782), fix: guard against None LLM content [#1788](https://github.com/unclecode/crawl4ai/pull/1788), fix: strip port from domain in is_external_url [#1783](https://github.com/unclecode/crawl4ai/pull/1783), fix: UTF-8 encoding for CLI output [#1789](https://github.com/unclecode/crawl4ai/pull/1789), fix: configurable link_preview_timeout [#1793](https://github.com/unclecode/crawl4ai/pull/1793), fix: wait_for_images on screenshot endpoint [#1792](https://github.com/unclecode/crawl4ai/pull/1792), fix: cross-platform terminal input in CrawlerMonitor [#1794](https://github.com/unclecode/crawl4ai/pull/1794), fix: UnicodeEncodeError in URL seeder [#1784](https://github.com/unclecode/crawl4ai/pull/1784)
- [nightcityblade](https://github.com/nightcityblade) - feat: add score_threshold to BestFirstCrawlingStrategy [#1804](https://github.com/unclecode/crawl4ai/pull/1804)
- [phamngocquy](https://github.com/phamngocquy) - identified: raw HTML URL token leak [#1179](https://github.com/unclecode/crawl4ai/pull/1179)
- [AkosLukacs](https://github.com/AkosLukacs) - docs: fix docstring param name crawler_config -> config [#1494](https://github.com/unclecode/crawl4ai/pull/1494)
- [dominicx](https://github.com/dominicx) - docs: fix css_selector type from list to string [#1308](https://github.com/unclecode/crawl4ai/pull/1308)
- [hoi](https://github.com/hoi) - fix: add TTL expiry for Redis task data [#1730](https://github.com/unclecode/crawl4ai/pull/1730)
#### Feb-Alpha-1
- [sufianuddin](https://github.com/sufianuddin) - fix: [Documentation for JsonCssExtractionStrategy](https://github.com/unclecode/crawl4ai/issues/651)