mirror of
https://github.com/unclecode/crawl4ai.git
synced 2026-06-10 15:58:15 +00:00
Commit Graph
Select branches
Hide Pull Requests
0.3.5
0.3.6
0.3.7
0.3.72
0.3.73
0.3.74
0.3.742
0.3.743
0.3.744
0.3.745
0.3.75
0.4.0
0.4.1
0.4.2
2025-JUN-1
add-claude-github-actions-1759553116682
bug/proxy_config
bugfix/arun-many-cdp-managed-browser
claude/fix-update-pyopenssl-security-011CUPexU25DkNvoxfu5ZrnB
claude/implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp
coderabbitai/docstrings/14vTVzYa3bH06l5wYNY9jTghrrj9FxxWL
codex/add-httpx-and-https-http2]-packages
codex/add-memory_wait_timeout-parameter-to-memoryadaptivedispatche
codex/add-use_stemming-parameter-to-bm25contentfiler
codex/add-vnc-streaming-endpoint-to-docker-server
codex/find-and-fix-a-bug
codex/fix-indexerror-in-browser-manager-py-with-use-managed-browse
copilot/modify-page-creation-and-logging
deploy
develop
devin/1748137705-fix-bm25contentfilter-docs
docker-test
docker/add_features
docker/base_config_overrides
docker/fix_sig
docs
docs-llm-strategies-update
docs-proxy-security
extract-media
feat/ahmed_dev
feat/follow-frameset
feat/undetected-browser
feature/agent-oai
feature/async-llm-extaction
feature/c4a-script
feature/configHealthMonitor
feature/content-filter
feature/content-filter-nasrin-1
feature/docker-cluster
feature/docker-hooks
feature/docker-llm-parameters
feature/marketplace-sponsor-logo
feature/nasrin-cli-deep-crawl
feature/scraper
feature/scraping-strategy
feature/telemetry
fix-async-url-seeder-redirect-verification
fix-cors-disable-web-security
fix/adaptive-crawler-llm-config
fix/arun-return-type-1898
fix/async-llm-extraction-arunMany
fix/batch-easy-issues-10
fix/bedrock-provider-prefix
fix/case_senstive_params
fix/cdp
fix/configurable-backoff
fix/deep-crawl-scoring
fix/deep-crawl-scoring-priority
fix/deep-crawl-stream-docker
fix/deep-crawl-streaming-contextvar-1917
fix/deprecated_pydantic
fix/deserialize-schema-type-false-positive
fix/dfs_deep_crawling
fix/docker
fix/docker-filter
fix/docker-jwt
fix/docker-llmEnvFile
fix/exit_with_q
fix/https-reditrect
fix/issue-1748-screenshot-scroll-delay
fix/issue-1776-adaptive-external-filter
fix/json-infinity-serialization
fix/linkPreviewScoring
fix/marketplace
fix/mcp-crawler-config-passthrough
fix/mcp-ensure-ascii-cjk-encoding
fix/n-playwright-stealth
fix/nlp-sentence-chunking-1909
fix/playwright-stealth
fix/preserve-tail-text-1938
fix/proxy_deprecation
fix/rate-limiter-burst-and-headers-1095
fix/relative_url
fix/release-notes-demo-code
fix/request-crawl-stream
fix/sandbox-escape-allowlist-attrs
fix/serialize-proxy-config
fix/sitemap_seeder
fix/timeline-deadlock-shared-lock-1754
fix/viewport_in_managed_browser
format-inline-tags
hooks
image-description
image-filterizer
implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp
integrate-verified-prs
main
main-0.3.7
main-1
main-75
main-img-captionify
main-v0.2.72
merge-pr971
new-release-0.0.2
new-release-0.0.2-no-spacy
next
next-2-batch-crawl
next-JUN
next-MAY
next-alpine-docker
next-browser-farm
patch/generate_schema
pdf_processing
proxy-support
pull-84
release/v0.7.0
release/v0.7.1
release/v0.7.2
release/v0.7.3
release/v0.7.4
release/v0.7.5
release/v0.7.6
release/v0.7.7
release/v0.7.8
release/v0.8.0
release/v0.8.5
release/v0.8.7
release/v0.8.8
release/v0.8.9
run-many-deep-crawling
scraper-uc
scrapper
sponsors/thor_data
ssh-server
staging
unclecode-patch-1
unclecode-patch-2
unclecode-patch-3
unclecode-patch-4
unclecode-patch-5
unclecode-patch-6
unclecode-patch-7
unclecode-patch-8
unclecode/issue157
unclecode/issue167
v0.2.74
v0.2.76
v0.4.24
v0.4.241
v0.4.242
v0.4.243
v0.5.5
vr0.4.244
vr0.4.245
vr0.4.246
vr0.4.267
vr0.4.3b1
vr0.4.3b2
vr0.4.3b3
vr0.5.0.post1
vr0.5.0.post5
#1004
#1030
#1054
#1058
#1059
#1060
#1062
#1065
#1068
#1073
#1074
#1077
#1078
#108
#1081
#1083
#1085
#1085
#109
#1090
#1093
#1094
#1098
#1100
#1102
#1104
#1106
#1107
#1108
#1110
#1113
#1122
#1123
#1124
#1124
#1133
#1137
#1140
#1145
#1152
#1155
#1155
#1156
#1157
#1159
#1161
#1170
#1175
#1179
#1180
#1184
#1186
#119
#1192
#1193
#1195
#1200
#1207
#1208
#1209
#1210
#1211
#1212
#1214
#1220
#1223
#1225
#1232
#1234
#1238
#1239
#1245
#1249
#125
#1255
#1263
#1265
#1266
#1267
#1272
#1274
#128
#1281
#1282
#1285
#1289
#1289
#129
#1290
#1296
#13
#1303
#1304
#1305
#1307
#1308
#1313
#1319
#1334
#1334
#1336
#1337
#1339
#134
#135
#1351
#1356
#1358
#1361
#1364
#1366
#1368
#1369
#1371
#1372
#1373
#1376
#1378
#1381
#1383
#1384
#1386
#1387
#1388
#1389
#139
#1390
#1393
#1395
#1398
#1399
#14
#1402
#1408
#1413
#1416
#1417
#1420
#1422
#1425
#1426
#1432
#1433
#1435
#1436
#1440
#1441
#1444
#1447
#1448
#1450
#1451
#1454
#1463
#1464
#1465
#1467
#1469
#1470
#1471
#1478
#1482
#1483
#1486
#1488
#149
#1494
#1495
#1496
#1497
#1501
#1508
#1513
#1514
#1518
#1519
#1525
#1527
#1528
#1529
#1530
#1531
#1532
#1533
#1533
#1535
#1536
#1537
#1539
#1546
#1547
#1548
#1550
#1554
#1555
#1556
#1557
#1558
#1560
#1565
#1568
#1569
#1570
#1572
#1576
#158
#1580
#1588
#1589
#1590
#1592
#1595
#1596
#1597
#1598
#1599
#1600
#1605
#1607
#1609
#1612
#1613
#1617
#1617
#1619
#1620
#1622
#1623
#1624
#1628
#1630
#1633
#1637
#1640
#1641
#1643
#1645
#1648
#1650
#1653
#1655
#1661
#1662
#1667
#1668
#1674
#1676
#1677
#1681
#1683
#1685
#1689
#169
#1694
#1696
#1697
#1698
#1700
#1702
#1703
#1706
#1707
#1710
#1712
#1713
#1714
#1715
#1716
#1717
#1718
#1719
#172
#1720
#1721
#1722
#1723
#1724
#1729
#1730
#1733
#1734
#1744
#1746
#1752
#1755
#1756
#1756
#1759
#176
#1760
#1761
#1763
#1764
#1765
#1766
#1768
#1770
#1771
#1772
#1773
#1774
#1775
#1777
#1778
#1782
#1783
#1784
#1785
#1786
#1787
#1788
#1789
#1790
#1791
#1792
#1793
#1794
#1795
#1796
#1798
#1803
#1804
#1805
#1806
#1807
#1807
#1808
#1808
#1809
#1809
#1810
#1810
#1811
#1811
#1812
#1812
#1813
#1814
#1814
#1816
#1816
#1822
#1822
#1823
#1824
#1826
#1827
#1828
#1829
#1830
#1831
#1832
#1833
#1834
#1835
#1835
#1836
#1838
#1838
#1840
#1840
#1844
#1845
#1846
#1847
#1847
#1849
#1851
#1852
#1853
#1853
#1854
#1854
#1855
#1856
#1856
#1857
#1857
#1858
#1858
#1859
#1859
#1860
#1860
#1861
#1861
#1862
#1862
#1866
#1866
#1868
#1868
#1869
#1869
#1870
#1870
#1871
#1871
#1873
#1873
#1874
#1874
#1875
#1875
#1876
#1876
#1877
#1879
#1881
#1881
#1882
#1884
#1884
#1885
#1886
#1887
#1887
#1891
#1891
#1892
#1892
#1893
#1893
#1895
#1895
#1896
#1896
#1897
#1899
#1899
#1901
#1902
#1902
#1904
#1904
#1906
#1906
#1907
#1908
#1908
#1910
#1911
#1913
#1914
#1915
#1915
#1922
#1923
#1923
#1925
#1929
#1931
#1932
#1932
#1933
#1934
#1935
#1935
#1936
#1937
#1939
#194
#1940
#1941
#1941
#1943
#1944
#1944
#1946
#1946
#1947
#1951
#1952
#1953
#1955
#1955
#1957
#1957
#1960
#1965
#1965
#1967
#1969
#1970
#1970
#1971
#1975
#1976
#1977
#1977
#1978
#1979
#1981
#1983
#1983
#1984
#1984
#1985
#1985
#1986
#1986
#1987
#1987
#1988
#1988
#1989
#1990
#1991
#1991
#1993
#1993
#1994
#1994
#1995
#1995
#1997
#1997
#200
#2001
#2001
#2003
#2003
#2004
#2004
#2005
#2005
#2008
#2008
#2009
#2009
#215
#218
#229
#232
#234
#24
#249
#255
#269
#271
#279
#286
#288
#293
#294
#298
#299
#3
#300
#304
#312
#313
#314
#324
#33
#332
#335
#337
#34
#357
#358
#369
#37
#379
#387
#389
#390
#394
#403
#410
#411
#416
#419
#419
#427
#440
#444
#445
#458
#462
#465
#472
#475
#496
#510
#562
#581
#60
#605
#606
#609
#612
#617
#618
#622
#64
#640
#65
#657
#658
#66
#662
#671
#679
#680
#681
#685
#687
#706
#708
#723
#724
#729
#734
#741
#749
#75
#752
#754
#775
#776
#777
#788
#792
#799
#80
#800
#806
#808
#821
#84
#84
#846
#85
#864
#865
#868
#891
#899
#901
#903
#914
#915
#916
#918
#929
#93
#931
#945
#948
#95
#961
#967
#969
#970
#971
#973
#977
#983
#988
#988
#990
#994
#999
0.3.4
checkpoint-pre-antibot-fallback
docker-rebuild-v0.7.5
docker-rebuild-v0.7.6
docker-rebuild-v0.7.7
docker-rebuild-v0.7.8
docker-rebuild-v0.8.0
docker-rebuild-v0.8.5
docker-rebuild-v0.8.6
docker-rebuild-v0.8.7
docker-rebuild-v0.8.8
docker-rebuild-v0.8.9
v.3.72
v0.0.75
v0.1.0
v0.2.0
v0.2.1
v0.2.2
v0.2.4
v0.2.6
v0.2.7
v0.2.71
v0.2.72
v0.2.73
v0.2.74
v0.2.77
v0.3.0
v0.3.3
v0.3.6
v0.3.745
v0.3.746
v0.4.24
v0.4.243
v0.5.0.post1
v0.6.3
v0.7.0
v0.7.1
v0.7.2
v0.7.3
v0.7.4
v0.7.5
v0.7.6
v0.7.7
v0.7.8
v0.8.0
v0.8.5
v0.8.6
v0.8.7
v0.8.8
v0.8.9
vr0.6.0
vr0.6.0rc1
vr0.6.3
-
de43505ae4
feat: update version to 0.3.742
0.3.742
unclecode
2024-11-24 19:36:30 +08:00 -
d7c5b900b8
feat: add support for arm64 platform in Docker commands and update INSTALL_TYPE variable in docker-compose
unclecode
2024-11-24 19:35:53 +08:00 -
edad7b6a74
chore: remove Railway deployment configuration and related documentation
unclecode
2024-11-24 18:48:39 +08:00 -
829a1f7992
feat: update version to 0.3.741 and enhance content filtering with heuristic strategy. Fixes the case where the HTML passed to the BM25 content filter does not contain any HTML elements.
UncleCode
2024-11-23 19:45:41 +08:00 -
d729aa7d5e
refactor: Add group ID for images extracted from srcset.
UncleCode
2024-11-23 18:00:32 +08:00 -
2226ef53c8
fix: Exempting the start_url from can_process_url
Aravind Karnam
2024-11-23 14:59:14 +05:30 -
3d52b551f2
Merge pull request #8 from aravindkarnam/main
aravind
2024-11-23 13:57:36 +05:30 -
f8e85b1499
Fixed a bug in _process_links, handled condition for when url_scorer is passed as None, renamed the scrapper folder to scraper.
Aravind Karnam
2024-11-23 13:52:34 +05:30 -
c1797037c0
Fixed a few bugs, import errors and changed to asyncio wait_for instead of timeout to support python versions < 3.11
Aravind Karnam
2024-11-23 12:39:25 +05:30 -
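The commit above mentions switching to asyncio wait_for so that timeouts work on Python versions older than 3.11 (the asyncio.timeout() context manager was only added in 3.11). A minimal sketch of that pattern follows; the helper name run_with_timeout is hypothetical and is not taken from the repository.

```python
import asyncio


async def run_with_timeout(awaitable, seconds):
    """Await `awaitable`, giving up after `seconds`.

    Hypothetical helper for illustration only. asyncio.wait_for is
    available on Python versions before 3.11, unlike asyncio.timeout().
    """
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising, so nothing leaks.
        return None


if __name__ == "__main__":
    # Finishes well inside the limit, so the result is passed through.
    print(asyncio.run(run_with_timeout(asyncio.sleep(0.01, result="ok"), 1.0)))
    # Exceeds the limit, so the helper returns None instead of raising.
    print(asyncio.run(run_with_timeout(asyncio.sleep(1), 0.01)))
```

Returning None on timeout is only one possible choice; a crawler may prefer to re-raise or record the failed URL.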
0d0cef3438
feat: add enhanced markdown generation example with citations and file output
UncleCode
2024-11-22 20:14:58 +08:00 -
d7a112fefe
Merge branch 'main' of https://github.com/unclecode/crawl4ai
UncleCode
2024-11-22 19:56:56 +08:00 -
a5decaa7cf
Merge branch '0.3.74'
UncleCode
2024-11-22 19:55:52 +08:00 -
8dea3f470f
chore: update README to include new features and improvements for version 0.3.74
0.3.74
UncleCode
2024-11-22 18:50:12 +08:00 -
e02935dc5b
chore: update README to reflect new features and improvements in version 0.3.74
UncleCode
2024-11-22 18:49:22 +08:00 -
24ad2fe2dd
feat: enhance Markdown generation to include fit_html attribute
UncleCode
2024-11-22 18:47:17 +08:00 -
571dda6549
Update README
UncleCode
2024-11-22 18:27:43 +08:00 -
006bee4a5a
feat: enhance image processing capabilities - Enhanced image processing with srcset support and validation checks for better image selection.
UncleCode
2024-11-22 16:00:17 +08:00 -
dbb751c8f0
In this commit, we introduce the new concept of MarkdownGenerationStrategy, which allows us to expand our future strategies to generate better markdown. Right now, we generate raw markdown as we were doing before. We have a new algorithm for fitting markdown based on BM25, and now we add the ability to refine markdown into a citation form. Our links will be extracted and replaced by a citation reference number, and a reference section at the very end lists all the links with their descriptions. This format is more suitable for large language models. In case we don't need to pass links, we can reduce the size of the markdown significantly and also attach the list of references as a separate file to a large language model. This commit contains changes for this direction.
UncleCode
2024-11-21 18:21:43 +08:00 -
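The citation form described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the library's implementation: the function name to_citations, the [n] marker style, and the layout of the reference list are all hypothetical.

```python
import re

# Inline markdown links such as [text](url); the lookbehind skips images (![alt](src)).
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def to_citations(markdown):
    """Replace inline links with numbered citations.

    Returns (body, references). Hypothetical sketch: a repeated URL
    reuses the number it was given the first time it appeared.
    """
    numbers = {}   # url -> citation number
    entries = []   # (url, link text) in order of first appearance

    def replace(match):
        text, url = match.group(1), match.group(2)
        if url not in numbers:
            numbers[url] = len(numbers) + 1
            entries.append((url, text))
        return f"{text}[{numbers[url]}]"

    body = LINK_RE.sub(replace, markdown)
    references = "\n".join(
        f"[{i}] {url}: {text}" for i, (url, text) in enumerate(entries, 1)
    )
    return body, references


if __name__ == "__main__":
    body, refs = to_citations(
        "See [docs](https://a.example) and [repo](https://b.example)."
    )
    print(body)
    print(refs)
```

Because the body and the reference list are returned separately, the references can be dropped or saved to a separate file when the links are not needed, which is the size reduction the commit message refers to.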
3439f7886d
fix: crawler strategy exception handling and fixes (#271)
程序员阿江(Relakkes)
2024-11-20 20:30:25 +08:00 -
d418a04602
Fix #260 prevent pass duplicated kwargs to scrapping_strategy (#269)
Darwing Medina
2024-11-20 04:52:11 -06:00 -
8179cae765
feat: adding test file to my branch
feature/content-filter-nasrin-1
feature/content-filter
ntohidikplay
2024-11-19 13:23:25 +01:00 -
fde35f644d
feat: adding test file to my branch
ntohidikplay
2024-11-19 13:02:52 +01:00 -
7047422e48
Merge branch '0.3.74' of https://github.com/unclecode/crawl4ai into 0.3.74
UncleCode
2024-11-19 19:33:08 +08:00 -
2bdec1fa5a
chore: add manage-collab.sh to .gitignore
UncleCode
2024-11-19 19:33:04 +08:00 -
b654c49e55
Update .gitignore to exclude additional scripts and files
UncleCode
2024-11-19 19:32:06 +08:00 -
f2cb7d506d
Delete test3.txt
UncleCode
2024-11-19 19:12:14 +08:00 -
a6dad3fc6d
test: trying to push to 0.3.74
ntohidikplay
2024-11-19 12:09:33 +01:00 -
fbcff85ecb
Remove test files
UncleCode
2024-11-19 19:03:23 +08:00 -
788c67c29a
Merge branch 'main' of https://github.com/unclecode/crawl4ai
UncleCode
2024-11-19 19:02:44 +08:00 -
2f19d38693
Update .gitignore to include .gitboss/ and todo_executor.md
UncleCode
2024-11-19 19:02:41 +08:00 -
3aae30ed2a
test1: trying to push to main
ntohidikplay
2024-11-19 11:57:07 +01:00 -
5eeb682719
Delete test.txt
unclecode-patch-1
UncleCode
2024-11-19 18:55:11 +08:00 -
593c7ad307
test: trying to push to main
ntohidikplay
2024-11-19 11:45:26 +01:00 -
73658c758a
chore: update .gitignore to include manage-collab.sh
UncleCode
2024-11-19 16:10:43 +08:00 -
b6af94cbbb
Merge remote-tracking branch 'origin/main' into 0.3.74
UncleCode
2024-11-18 21:15:04 +08:00 -
852729ff38
feat(docker): add Docker Compose configurations for local and hub deployment; enhance GPU support checks in Dockerfile feat(requirements): update requirements.txt to include snowballstemmer fix(version_manager): correct version parsing to use __version__.__version__ feat(main): introduce chunking strategy and content filter in CrawlRequest model feat(content_filter): enhance BM25 algorithm with priority tag scoring for improved content relevance feat(logger): implement new async logger engine replacing print statements throughout library fix(database): resolve version-related deadlock and circular lock issues in database operations docs(docker): expand Docker deployment documentation with usage instructions for Docker Compose
UncleCode
2024-11-18 21:00:06 +08:00 -
152ac35bc2
feat(docs): update README for version 0.3.74 with new features and improvements fix(version): update version number to 0.3.74 refactor(async_webcrawler): enhance logging and add domain-based request delay
UncleCode
2024-11-17 21:09:26 +08:00 -
df63a40606
feat(docs): update examples and documentation to replace bypass_cache with cache_mode for improved clarity
UncleCode
2024-11-17 19:44:45 +08:00 -
a59c107b23
Update changelog for 0.3.74
UncleCode
2024-11-17 18:42:43 +08:00 -
f9fe6f89fe
feat(database): implement version management and migration checks during initialization
UncleCode
2024-11-17 18:09:33 +08:00 -
2a82455b3d
feat(crawl): implement direct crawl functionality and introduce CacheMode for improved caching control
UncleCode
2024-11-17 17:17:34 +08:00 -
3a524a3bdd
fix(docs): remove unnecessary blank line in README for improved readability
UncleCode
2024-11-17 16:00:39 +08:00 -
3a66aa8a60
feat(cache): introduce CacheMode and CacheContext for enhanced caching behavior chore(requirements): add colorama dependency refactor(config): add SHOW_DEPRECATION_WARNINGS flag and clean up code fix(docs): update example scripts for clarity and consistency
UncleCode
2024-11-17 15:30:56 +08:00 -
4b45b28f25
feat(docs): enhance deployment documentation with one-click setup, API security details, and Docker Compose examples
UncleCode
2024-11-16 18:44:47 +08:00 -
9139ef3125
feat(docker): update Dockerfile for improved installation process and enhance deployment documentation with Docker Compose setup and API token security
UncleCode
2024-11-16 18:19:44 +08:00 -
6360d0545a
feat(api): add API token authentication and update Dockerfile description
UncleCode
2024-11-16 18:08:56 +08:00 -
1961adb530
refactor(docker): remove shared memory size configuration to streamline Dockerfile
UncleCode
2024-11-16 17:35:27 +08:00 -
79feab89c4
refactor(deploy): remove memory utilization alert configuration from deployment template
UncleCode
2024-11-16 17:28:42 +08:00 -
5d0b13294c
feat(deploy): change instance size to professional-xs and update memory utilization alert window to 300 seconds
UncleCode
2024-11-16 17:25:07 +08:00 -
67edc2d641
feat(deploy): update instance size to professional-xs and add memory utilization alert parameters
UncleCode
2024-11-16 17:23:32 +08:00 -
6b569cceb5
feat(deploy): update branch to 0.3.74 and change instance size to basic-xs
UncleCode
2024-11-16 17:21:45 +08:00 -
6f2fe5954f
feat(deploy): update instance size to professional-xs and add memory utilization alert
UncleCode
2024-11-16 17:12:41 +08:00 -
fca1319b7d
feat(docker): add MkDocs installation and build step for documentation
UncleCode
2024-11-16 17:10:30 +08:00 -
f77f06a3bd
feat(deploy): add deployment configuration and templates for crawl4ai
UncleCode
2024-11-16 16:43:31 +08:00 -
e62c807295
feat(deploy): add Railway deployment configuration and setup instructions
UncleCode
2024-11-16 16:38:13 +08:00 -
90df6921b7
feat(crawl_sync): add synchronous crawl endpoint and corresponding test
UncleCode
2024-11-16 15:34:30 +08:00 -
5098442086
refactor: migrate versioning to __version__.py and remove deprecated _version.py
UncleCode
2024-11-16 15:30:24 +08:00 -
d0014c6793
New async database manager and migration support - Introduced AsyncDatabaseManager for async DB management. - Added migration feature to transition to file-based storage. - Enhanced web crawler with improved caching logic. - Updated requirements and setup for async processing.
UncleCode
2024-11-16 14:54:41 +08:00 -
60670b2af6
Merge pull request #7 from aravindkarnam/main
aravind
2024-11-15 20:43:54 +05:30 -
ae7ebc0bd8
chore: update .gitignore and enhance changelog with major feature additions and examples
UncleCode
2024-11-15 20:16:13 +08:00 -
1f269f9834
test(content_filter): add comprehensive tests for BM25ContentFilter functionality
UncleCode
2024-11-15 18:11:11 +08:00 -
7f1ae5adcf
Update changelog
UncleCode
2024-11-14 22:51:51 +08:00 -
3d00fee6c2
- In this commit, the library is updated to process file downloads. Users can now specify a download folder and trigger the download process via JavaScript or other means, with all files being saved. The list of downloaded files will also be added to the crawl result object. - Another thing this commit introduces is the concept of the Relevance Content Filter. This is an improvement over Fit Markdown. This class of strategies aims to extract the main content from a given page - the part that really matters and is useful to be processed. One strategy has been created using the BM25 algorithm, which finds chunks of text from the web page relevant to its title, descriptions, and keywords, or supports a given user query and matches them. The result is then returned to the main engine to be converted to Markdown. Plans include adding approaches using language models as well. - The cache database was updated to hold information about response headers and downloaded files.
UncleCode
2024-11-14 22:50:59 +08:00 -
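To illustrate the BM25 idea mentioned in the commit above, here is a minimal, self-contained sketch that scores text chunks against a query with the standard Okapi BM25 formula. It is not the repository's BM25ContentFilter; the function name bm25_scores, the whitespace tokenizer, and the defaults k1=1.5 and b=0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(chunks, query, k1=1.5, b=0.75):
    """Score each chunk against the query with Okapi BM25.

    Hypothetical sketch: tokens are lowercased, whitespace-split words.
    """
    docs = [chunk.lower().split() for chunk in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of chunks containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average chunks are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    chunks = [
        "cookie banner accept all",
        "async crawler extracts markdown from pages",
        "footer links privacy terms",
    ]
    scores = bm25_scores(chunks, "async crawler markdown")
    # Keep only chunks with a positive score as candidate main content.
    print([c for c, s in zip(chunks, scores) if s > 0])
```

In a content filter, the query would typically be built from the page title, description, and keywords, or supplied by the user, as the commit message describes.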
17913f5acf
feat(crawler): support local files and raw HTML input in AsyncWebCrawler
UncleCode
2024-11-13 20:00:29 +08:00 -
3a2cb7dacf
test: Add comprehensive unit tests for AsyncExecutor functionality
0.3.75
UncleCode
2024-11-13 19:46:05 +08:00 -
c38ac29edb
perf(crawler): major performance improvements & raw HTML support
UncleCode
2024-11-13 19:40:40 +08:00 -
38044d4afe
Merge pull request #255 from maheshpec/feature/configure-cache-directory
UncleCode
2024-11-13 09:43:29 +01:00 -
61b93ebf36
Update change log
UncleCode
2024-11-13 15:38:30 +08:00 -
bf91adf3f8
fix: Resolve unexpected BrowserContext closure during crawl in Docker
UncleCode
2024-11-13 15:37:16 +08:00 -
00026b5f8b
feat(config): Adding a configurable way of setting the cache directory for constrained environments
Mahesh
2024-11-12 14:52:51 -07:00 -
8c22396d8b
Merge pull request #234 from devatnull/patch-1
UncleCode
2024-11-12 08:37:14 +01:00 -
b6d6631b12
Enhance Async Crawler with Playwright support - Implemented new async crawler strategy using Playwright. - Introduced ManagedBrowser for better browser management. - Added support for persistent browser sessions and improved error handling. - Updated version from 0.3.73 to 0.3.731. - Enhanced logic in main.py for conditional mounting of static files. - Updated requirements to replace playwright_stealth with tf-playwright-stealth.
UncleCode
2024-11-12 12:10:58 +08:00 -
a098483cbb
Update Roadmap
UncleCode
2024-11-09 20:40:30 +08:00 -
f9a297e08d
Add Docker example script for testing Crawl4AI functionality
UncleCode
2024-11-08 19:39:05 +08:00 -
bcdd80911f
Remove some old files.
UncleCode
2024-11-08 19:08:58 +08:00 -
0d357ab7d2
feat(scraper): Enhance URL filtering and scoring systems
scraper-uc
UncleCode
2024-11-08 19:02:28 +08:00 -
bae4665949
feat(scraper): Enhance URL filtering and scoring systems
UncleCode
2024-11-08 18:45:12 +08:00 -
d11c004fbb
Enhanced BFS Strategy: Improved monitoring, resource management & configuration
UncleCode
2024-11-08 15:57:23 +08:00 -
b120965b6a
Fixed issues with the Manage Browser, including its inability to connect to the user directory and inability to create new pages within the Manage Browser context; all issues are now resolved.
UncleCode
2024-11-07 20:15:03 +08:00 -
16f918621f
Merge branch 'main' of https://github.com/unclecode/crawl4ai
UncleCode
2024-11-07 19:30:22 +08:00 -
f7574230a1
Update API server request object. text_docker file and Readme
UncleCode
2024-11-07 19:29:31 +08:00 -
3d1c9a8434
Reviewing the BFS strategy.
UncleCode
2024-11-07 18:54:53 +08:00 -
2879344d9c
Update README.md
devatnull
2024-11-06 17:36:46 +03:00 -
9f5eef1f38
Refactored the CustomHTML2Text class in content_scrapping_strategy.py to remove the handling logic for header tags (h1-h6), which are now commented out. This cleanup improves code readability and reduces maintenance overhead.
UncleCode
2024-11-06 21:50:09 +08:00 -
be472c624c
Refactored AsyncWebScraper to include comprehensive error handling and progress tracking capabilities. Introduced a ScrapingProgress data class to monitor processed and failed URLs. Enhanced scraping methods to log errors and track stats throughout the scraping process.
UncleCode
2024-11-06 21:09:47 +08:00 -
06b21dcc50
Update .gitignore to include new directories for issues and documentation
UncleCode
2024-11-06 18:44:03 +08:00 -
c5aa1bec18
Merge pull request #229 from bizrockman/main
UncleCode
2024-11-06 07:31:07 +01:00 -
0f0f60527d
Merge pull request #172 from aravindkarnam/scraper
UncleCode
2024-11-06 07:00:44 +01:00 -
11721eb0ce
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-11-05 13:02:59 +00:00 -
b51263664e
feat(api): add CORS support and static file serving, update root redirect
UncleCode
2024-11-05 21:02:47 +08:00 -
1222e456fb
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-11-05 12:58:30 +00:00 -
1e7db0d293
docs(README): update release notes for version 0.3.73 with new features and improvements
UncleCode
2024-11-05 20:12:20 +08:00 -
2a54f3c048
refactor(core): remove main_v0.py file and associated functionality
UncleCode
2024-11-05 20:11:07 +08:00 -
1c20b815b3
docs(README): update Docker usage instructions and add deployment options
UncleCode
2024-11-05 20:10:24 +08:00 -
43a2b26f63
Merge branch 'main' of https://github.com/unclecode/crawl4ai
UncleCode
2024-11-05 20:08:20 +08:00 -
3cf19a1bc2
chore(version): bump version to 0.3.73
0.3.73
UncleCode
2024-11-05 20:05:58 +08:00 -
67a23c3182
feat(core): Release v0.3.73 with Browser Takeover and Docker Support
UncleCode
2024-11-05 20:04:18 +08:00 -
796dbaf08c
Rename episode_11_3_Extraction_Strategies:_Cosine.md to episode_11_3_Extraction_Strategies_Cosine.md
bizrockman
2024-11-04 20:19:43 +01:00 -
3a3c88a2d0
Rename episode_11_2_Extraction_Strategies:_LLM.md to episode_11_2_Extraction_Strategies_LLM.md
bizrockman
2024-11-04 20:19:20 +01:00 -
870296fa7e
Rename episode_11_1_Extraction_Strategies:_JSON_CSS.md to episode_11_1_Extraction_Strategies_JSON_CSS.md
bizrockman
2024-11-04 20:18:58 +01:00