Document scroll_delay parameter for full-page screenshot crawling

Ahmed-tawfik94
2026-02-25 06:52:59 +03:00
parent cd81e3cd19
commit 9cfeb4626d
4 changed files with 15 additions and 2 deletions


@@ -56,6 +56,18 @@ if __name__ == "__main__":
- If `screenshot=True`, and a PDF is already available, it directly converts the first page of that PDF to an image for you—no repeated loading or scrolling.
- Finally, you get your PDF and/or screenshot ready to use.
**Controlling scroll speed for full-page screenshots:**
When a page is taller than `screenshot_height_threshold` (default ~20,000px) and no PDF is available, Crawl4AI scrolls through the page to capture a stitched full-page screenshot. Use `scroll_delay` to control the pause between scroll steps:
```python
config = CrawlerRunConfig(
    screenshot=True,
    scroll_delay=0.5,  # Wait 0.5s between scroll steps (default: 0.2)
)
```
This is particularly useful for pages with lazy-loaded images or animations that need time to render during scrolling.
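To get a feel for how much time a larger `scroll_delay` adds, the pause overhead can be roughed out with simple arithmetic. The sketch below is not Crawl4AI code: `estimate_scroll_overhead` is a hypothetical helper, and it assumes the page is scrolled one viewport height per step with one pause per step, which may not match the crawler's actual stepping.

```python
import math


def estimate_scroll_overhead(
    page_height_px: int,
    viewport_height_px: int,
    scroll_delay: float = 0.2,
) -> float:
    """Estimate the seconds spent pausing between scroll steps.

    Assumption (not taken from Crawl4AI): one scroll step per viewport
    height, and exactly one pause of `scroll_delay` seconds per step.
    """
    if page_height_px <= 0 or viewport_height_px <= 0:
        raise ValueError("heights must be positive")
    if scroll_delay < 0:
        raise ValueError("scroll_delay must be non-negative")

    # Number of viewport-sized steps needed to cover the whole page.
    steps = math.ceil(page_height_px / viewport_height_px)
    return steps * scroll_delay


if __name__ == "__main__":
    # Compare the default delay with a slower setting on a tall page.
    for delay in (0.2, 0.5):
        overhead = estimate_scroll_overhead(30_000, 1_000, delay)
        print(f"scroll_delay={delay}: about {overhead:.1f}s of pauses")
```

Under these assumptions the overhead grows linearly with both page height and `scroll_delay`, so it is worth raising the delay only as far as lazy-loaded content needs.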
---
## Viewport-Only Screenshots


@@ -112,6 +112,7 @@ if __name__ == "__main__":
**Relevant Parameters**
- **`pdf=True`**: Exports the current page as a PDF (base64-encoded in `result.pdf`).
- **`screenshot=True`**: Creates a screenshot (base64-encoded in `result.screenshot`).
- **`scroll_delay`**: Controls the delay (seconds) between scroll steps when taking a full-page screenshot of a tall page. Defaults to `0.2`. Increase for pages with slow-loading assets.
- **`scan_full_page`** or advanced hooking can further refine how the crawler captures content.
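Because the screenshot is described above as base64-encoded, it has to be decoded before it can be written to disk. The following is a minimal sketch using only the standard library; `save_base64_capture` is a hypothetical helper name, and the payload would come from `result.screenshot` on a completed crawl.

```python
import base64
from pathlib import Path


def save_base64_capture(payload: str, path: str) -> int:
    """Decode a base64 string and write the raw bytes to `path`.

    Returns the number of bytes written. `payload` is assumed to be a
    plain base64 string with no data-URI prefix.
    """
    data = base64.b64decode(payload)
    Path(path).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    # Stand-in payload; in practice pass result.screenshot here.
    demo_payload = base64.b64encode(b"not a real image").decode("ascii")
    written = save_base64_capture(demo_payload, "screenshot.png")
    print(f"wrote {written} bytes")
```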
---


@@ -158,7 +158,7 @@ Use these for controlling whether you read or write from a local content cache.
| **`js_only`** | `bool` (False) | If `True`, indicates we're reusing an existing session and only applying JS. No full reload. |
| **`ignore_body_visibility`** | `bool` (True) | Skip checking if `<body>` is visible. Usually best to keep `True`. |
| **`scan_full_page`** | `bool` (False) | If `True`, auto-scroll the page to load dynamic content (infinite scroll). |
-| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps if `scan_full_page=True`. |
+| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps when scanning the full page (`scan_full_page=True`) or capturing full-page screenshots. |
| **`max_scroll_steps`** | `int or None` (None) | Maximum number of scroll steps during full page scan. If None, scrolls until entire page is loaded. |
| **`process_iframes`** | `bool` (False) | Inlines iframe content for single-page extraction. |
| **`flatten_shadow_dom`** | `bool` (False) | Flattens Shadow DOM content into the light DOM before HTML capture. Resolves slots, strips shadow-scoped styles, and force-opens closed shadow roots. Essential for sites built with Web Components (Stencil, Lit, Shoelace, etc.). |


@@ -1786,7 +1786,7 @@ run_cfg = CrawlerRunConfig(
| **`js_only`** | `bool` (False) | If `True`, indicates we're reusing an existing session and only applying JS. No full reload. |
| **`ignore_body_visibility`** | `bool` (True) | Skip checking if `<body>` is visible. Usually best to keep `True`. |
| **`scan_full_page`** | `bool` (False) | If `True`, auto-scroll the page to load dynamic content (infinite scroll). |
-| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps if `scan_full_page=True`. |
+| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps when scanning the full page (`scan_full_page=True`) or capturing full-page screenshots. |
| **`process_iframes`** | `bool` (False) | Inlines iframe content for single-page extraction. |
| **`flatten_shadow_dom`** | `bool` (False) | Flattens Shadow DOM content into the light DOM before HTML capture. Resolves slots, strips shadow-scoped styles, and force-opens closed shadow roots. Essential for sites built with Web Components. |
| **`remove_overlay_elements`** | `bool` (False) | Removes potential modals/popups blocking the main content. |