Mirror of https://github.com/unclecode/crawl4ai.git (synced 2026-06-12 00:38:00 +00:00)
Document scroll_delay parameter for full-page screenshot crawling
@@ -56,6 +56,18 @@ if __name__ == "__main__":
- If `screenshot=True` and a PDF is already available, it directly converts the first page of that PDF to an image, with no repeated loading or scrolling.
- Finally, you get your PDF and/or screenshot ready to use.

**Controlling scroll speed for full-page screenshots:**
When a page is taller than `screenshot_height_threshold` (default ~20,000px) and no PDF is available, Crawl4AI scrolls through the page to capture a stitched full-page screenshot. Use `scroll_delay` to control the pause between scroll steps:

```python
config = CrawlerRunConfig(
    screenshot=True,
    scroll_delay=0.5,  # Wait 0.5s between scroll steps (default: 0.2)
)
```

This is particularly useful for pages with lazy-loaded images or animations that need time to render during scrolling.
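
To get a feel for what a larger `scroll_delay` costs, here is a rough back-of-the-envelope sketch. It assumes one scroll step per viewport-sized slice of the page, which is a simplification and not necessarily how Crawl4AI chooses its steps. The helper name `estimate_scroll_time` and the 30,000px / 1,000px figures are hypothetical, chosen only for illustration.

```python
import math


def estimate_scroll_time(page_height: int, viewport_height: int,
                         scroll_delay: float = 0.2) -> float:
    """Estimate the seconds spent pausing between scroll steps.

    Assumption (not taken from the library): the page is scrolled one
    viewport height at a time, with one pause per step.
    """
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("page_height and viewport_height must be positive")
    if scroll_delay < 0:
        raise ValueError("scroll_delay must not be negative")
    steps = math.ceil(page_height / viewport_height)
    return steps * scroll_delay


# Compare the default delay with the slower setting from the example above.
for delay in (0.2, 0.5):
    seconds = estimate_scroll_time(30_000, 1_000, scroll_delay=delay)
    print(f"scroll_delay={delay}: about {seconds:.1f}s of pauses")
```

Under that assumption, a 30,000px page scrolled in 1,000px steps takes 30 steps, so raising `scroll_delay` from `0.2` to `0.5` increases the time spent pausing from roughly 6 seconds to 15 seconds.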

---

## Viewport-Only Screenshots
@@ -112,6 +112,7 @@ if __name__ == "__main__":
**Relevant Parameters**
- **`pdf=True`**: Exports the current page as a PDF (base64-encoded in `result.pdf`).
- **`screenshot=True`**: Creates a screenshot (base64-encoded in `result.screenshot`).
- **`scroll_delay`**: Controls the delay (in seconds) between scroll steps when taking a full-page screenshot of a tall page. Defaults to `0.2`. Increase it for pages with slow-loading assets.
- **`scan_full_page`** or advanced hooking can further refine how the crawler captures content.
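
Since the list above describes `result.screenshot` as base64-encoded, a minimal sketch of turning such a string back into an image file might look like the following. The helper `save_base64_file` is a hypothetical name, and the payload is a stand-in string rather than real crawler output, so the block runs without a browser or network access.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_file(encoded: str, path: Path) -> int:
    """Decode a base64 string, write the raw bytes to `path`,
    and return the number of bytes written."""
    raw = base64.b64decode(encoded)
    path.write_bytes(raw)
    return len(raw)


# Stand-in for `result.screenshot`: a base64 string wrapping a few
# placeholder bytes that start with the PNG signature.
payload = b"\x89PNG\r\n\x1a\n demo bytes"
fake_screenshot = base64.b64encode(payload).decode("ascii")

out_path = Path(tempfile.gettempdir()) / "crawl4ai_demo_screenshot.png"
written = save_base64_file(fake_screenshot, out_path)
print(f"wrote {written} bytes to {out_path}")
```

With a real crawl, the same helper would be called with `result.screenshot` in place of `fake_screenshot`, assuming the field holds a base64 string as described above.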

---
@@ -158,7 +158,7 @@ Use these for controlling whether you read or write from a local content cache.
| **`js_only`** | `bool` (False) | If `True`, indicates we're reusing an existing session and only applying JS. No full reload. |
| **`ignore_body_visibility`** | `bool` (True) | Skip checking if `<body>` is visible. Usually best to keep `True`. |
| **`scan_full_page`** | `bool` (False) | If `True`, auto-scroll the page to load dynamic content (infinite scroll). |
-| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps if `scan_full_page=True`. |
+| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps when scanning the full page (`scan_full_page=True`) or capturing full-page screenshots. |
| **`max_scroll_steps`** | `int or None` (None) | Maximum number of scroll steps during full page scan. If None, scrolls until entire page is loaded. |
| **`process_iframes`** | `bool` (False) | Inlines iframe content for single-page extraction. |
| **`flatten_shadow_dom`** | `bool` (False) | Flattens Shadow DOM content into the light DOM before HTML capture. Resolves slots, strips shadow-scoped styles, and force-opens closed shadow roots. Essential for sites built with Web Components (Stencil, Lit, Shoelace, etc.). |
@@ -1786,7 +1786,7 @@ run_cfg = CrawlerRunConfig(
| **`js_only`** | `bool` (False) | If `True`, indicates we're reusing an existing session and only applying JS. No full reload. |
| **`ignore_body_visibility`** | `bool` (True) | Skip checking if `<body>` is visible. Usually best to keep `True`. |
| **`scan_full_page`** | `bool` (False) | If `True`, auto-scroll the page to load dynamic content (infinite scroll). |
-| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps if `scan_full_page=True`. |
+| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps when scanning the full page (`scan_full_page=True`) or capturing full-page screenshots. |
| **`process_iframes`** | `bool` (False) | Inlines iframe content for single-page extraction. |
| **`flatten_shadow_dom`** | `bool` (False) | Flattens Shadow DOM content into the light DOM before HTML capture. Resolves slots, strips shadow-scoped styles, and force-opens closed shadow roots. Essential for sites built with Web Components. |
| **`remove_overlay_elements`** | `bool` (False) | Removes potential modals/popups blocking the main content. |