
Adaptive Crawling Examples

This directory contains examples demonstrating various aspects of Crawl4AI's Adaptive Crawling feature.

Examples Overview

1. basic_usage.py

  • Simple introduction to adaptive crawling
  • Uses the default statistical strategy
  • Shows how to get crawl statistics and relevant content

2. embedding_strategy.py NEW

  • Demonstrates the embedding-based strategy for semantic understanding
  • Shows query expansion and irrelevance detection
  • Includes configuration for both local and API-based embeddings

3. embedding_vs_statistical.py NEW

  • Direct comparison between statistical and embedding strategies
  • Helps you choose the right strategy for your use case
  • Shows performance and accuracy trade-offs

4. embedding_configuration.py NEW

  • Advanced configuration options for embedding strategy
  • Parameter tuning guide for different scenarios
  • Examples for research, exploration, and quality-focused crawling

5. advanced_configuration.py

  • Shows various configuration options for both strategies
  • Demonstrates threshold tuning and performance optimization

6. custom_strategies.py

  • Shows how to implement your own crawling strategy
  • Extends the base CrawlStrategy class
  • Advanced use case for specialized requirements

7. export_import_kb.py

  • Exports a crawled knowledge base to JSONL
  • Imports a saved knowledge base and continues crawling from that state
  • Useful for building persistent knowledge bases
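The JSONL format mentioned above stores one JSON object per line, which makes a knowledge base easy to append to and to stream back in. The following is a minimal stdlib-only sketch of that round trip. It does not use Crawl4AI's own export/import methods, and the helper names (export_jsonl, import_jsonl) and the record fields (url, content) are hypothetical, chosen only for illustration; see export_import_kb.py for the library's actual workflow.

```python
import json
import tempfile
from pathlib import Path


def export_jsonl(documents, path):
    """Write one JSON object per line (the JSONL convention)."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


def import_jsonl(path):
    """Read the records back in order, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; the field names are illustrative only and are
# not Crawl4AI's actual export schema.
kb = [
    {"url": "https://example.com/a", "content": "async context managers"},
    {"url": "https://example.com/b", "content": "event loops"},
]

with tempfile.TemporaryDirectory() as tmp:
    kb_path = Path(tmp) / "knowledge_base.jsonl"
    export_jsonl(kb, kb_path)
    restored = import_jsonl(kb_path)

print(f"Restored {len(restored)} records")
```

Because each record sits on its own line, a later crawl can append new records to the same file without rewriting what is already there.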

Quick Start

For your first adaptive crawling experience, run:

python basic_usage.py

To try the new embedding strategy with semantic understanding:

python embedding_strategy.py

To compare strategies and see which works best for your use case:

python embedding_vs_statistical.py

Strategy Selection Guide

Use Statistical Strategy (Default) When:

  • Working with technical documentation
  • Queries contain specific terms or code
  • Speed is critical
  • No API access available

Use Embedding Strategy When:

  • Queries are conceptual or ambiguous
  • You need semantic understanding beyond exact matches
  • You want to detect irrelevant content
  • Working with diverse content sources
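The guide above can be read as a small decision rule. The sketch below encodes it as a hypothetical helper, suggest_strategy, which is not part of Crawl4AI. The precedence is an assumption made for illustration: hard constraints (speed, no embedding backend) win over semantic needs, and the statistical strategy is the fallback because it is the default.

```python
def suggest_strategy(
    conceptual_query=False,
    needs_semantic_match=False,
    detect_irrelevant=False,
    diverse_sources=False,
    speed_critical=False,
    embeddings_available=True,
):
    """Return "statistical" or "embedding" following the guide above.

    Hypothetical helper: the parameter names and the precedence of the
    rules are assumptions, not Crawl4AI behavior.
    """
    # Hard constraints first: without an embedding backend, or when
    # speed matters most, stay with the default statistical strategy.
    if speed_critical or not embeddings_available:
        return "statistical"

    # Any semantic requirement points to the embedding strategy.
    if any([conceptual_query, needs_semantic_match,
            detect_irrelevant, diverse_sources]):
        return "embedding"

    # Technical docs and exact-term queries: keep the default.
    return "statistical"


print(suggest_strategy())                       # statistical
print(suggest_strategy(conceptual_query=True))  # embedding
```

When the answer is unclear, running embedding_vs_statistical.py on a representative query is a more reliable guide than any fixed rule.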

Requirements

  • Crawl4AI installed
  • For the embedding strategy with local models: the sentence-transformers package
  • For the embedding strategy with OpenAI: the OPENAI_API_KEY environment variable must be set
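Before running the embedding examples, it can help to check which of these requirements are met. The sketch below is a stdlib-only preflight check. The function name embedding_backends is hypothetical; the only facts it relies on are that the sentence-transformers package is imported as sentence_transformers and that the OpenAI key is read from OPENAI_API_KEY.

```python
import importlib.util
import os


def embedding_backends(environ=None):
    """Report which embedding backends look usable in this environment.

    Hypothetical helper, not part of Crawl4AI. Pass a dict as
    `environ` to check something other than the real environment.
    """
    env = os.environ if environ is None else environ
    return {
        # find_spec returns None when the package is not installed.
        "local": importlib.util.find_spec("sentence_transformers") is not None,
        # An empty or missing key counts as not configured.
        "openai": bool(env.get("OPENAI_API_KEY")),
    }


status = embedding_backends()
for backend, ready in status.items():
    print(f"{backend}: {'ready' if ready else 'not configured'}")
```

If neither backend is ready, the statistical strategy used by basic_usage.py still works, since it needs only Crawl4AI itself.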

Learn More