diff --git a/docs/examples/full_page_screenshot_and_pdf_export.md b/docs/examples/full_page_screenshot_and_pdf_export.md
index d34c9096..6d7c632d 100644
--- a/docs/examples/full_page_screenshot_and_pdf_export.md
+++ b/docs/examples/full_page_screenshot_and_pdf_export.md
@@ -56,6 +56,18 @@ if __name__ == "__main__":
 - If `screenshot=True`, and a PDF is already available, it directly converts the first page of that PDF to an image for you—no repeated loading or scrolling.
 - Finally, you get your PDF and/or screenshot ready to use.
 
+**Controlling scroll speed for full-page screenshots:**
+When a page is taller than `screenshot_height_threshold` (default ~20,000px) and no PDF is available, Crawl4AI scrolls through the page to capture a stitched full-page screenshot. Use `scroll_delay` to control the pause between scroll steps:
+
+```python
+config = CrawlerRunConfig(
+    screenshot=True,
+    scroll_delay=0.5,  # Wait 0.5s between scroll steps (default: 0.2)
+)
+```
+
+This is particularly useful for pages with lazy-loaded images or animations that need time to render during scrolling.
+
 ---
 
 ## Viewport-Only Screenshots
diff --git a/docs/md_v2/advanced/advanced-features.md b/docs/md_v2/advanced/advanced-features.md
index 211869c3..75ba1cce 100644
--- a/docs/md_v2/advanced/advanced-features.md
+++ b/docs/md_v2/advanced/advanced-features.md
@@ -112,6 +112,7 @@ if __name__ == "__main__":
 **Relevant Parameters**
 - **`pdf=True`**: Exports the current page as a PDF (base64-encoded in `result.pdf`).
 - **`screenshot=True`**: Creates a screenshot (base64-encoded in `result.screenshot`).
+- **`scroll_delay`**: Controls the delay (seconds) between scroll steps when taking a full-page screenshot of a tall page. Defaults to `0.2`. Increase for pages with slow-loading assets.
 - **`scan_full_page`** or advanced hooking can further refine how the crawler captures content.
 
 ---
diff --git a/docs/md_v2/api/parameters.md b/docs/md_v2/api/parameters.md
index f361137a..064a2388 100644
--- a/docs/md_v2/api/parameters.md
+++ b/docs/md_v2/api/parameters.md
@@ -158,7 +158,7 @@ Use these for controlling whether you read or write from a local content cache.
 | **`js_only`** | `bool` (False) | If `True`, indicates we're reusing an existing session and only applying JS. No full reload. |
 | **`ignore_body_visibility`** | `bool` (True) | Skip checking if `<body>` is visible. Usually best to keep `True`. |
 | **`scan_full_page`** | `bool` (False) | If `True`, auto-scroll the page to load dynamic content (infinite scroll). |
-| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps if `scan_full_page=True`. |
+| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps when scanning the full page (`scan_full_page=True`) or capturing full-page screenshots. |
 | **`max_scroll_steps`** | `int or None` (None) | Maximum number of scroll steps during full page scan. If None, scrolls until entire page is loaded. |
 | **`process_iframes`** | `bool` (False) | Inlines iframe content for single-page extraction. |
 | **`flatten_shadow_dom`** | `bool` (False) | Flattens Shadow DOM content into the light DOM before HTML capture. Resolves slots, strips shadow-scoped styles, and force-opens closed shadow roots. Essential for sites built with Web Components (Stencil, Lit, Shoelace, etc.). |
diff --git a/docs/md_v2/complete-sdk-reference.md b/docs/md_v2/complete-sdk-reference.md
index 3d639edc..e79b63b8 100644
--- a/docs/md_v2/complete-sdk-reference.md
+++ b/docs/md_v2/complete-sdk-reference.md
@@ -1786,7 +1786,7 @@ run_cfg = CrawlerRunConfig(
 | **`js_only`** | `bool` (False) | If `True`, indicates we're reusing an existing session and only applying JS. No full reload. |
 | **`ignore_body_visibility`** | `bool` (True) | Skip checking if `<body>` is visible. Usually best to keep `True`. |
 | **`scan_full_page`** | `bool` (False) | If `True`, auto-scroll the page to load dynamic content (infinite scroll). |
-| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps if `scan_full_page=True`. |
+| **`scroll_delay`** | `float` (0.2) | Delay between scroll steps when scanning the full page (`scan_full_page=True`) or capturing full-page screenshots. |
 | **`process_iframes`** | `bool` (False) | Inlines iframe content for single-page extraction. |
 | **`flatten_shadow_dom`** | `bool` (False) | Flattens Shadow DOM content into the light DOM before HTML capture. Resolves slots, strips shadow-scoped styles, and force-opens closed shadow roots. Essential for sites built with Web Components. |
 | **`remove_overlay_elements`** | `bool` (False) | Removes potential modals/popups blocking the main content. |
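
Note on choosing a `scroll_delay` value: the documented behavior is a pause between scroll steps, so the added wait grows with the number of steps. The sketch below is a rough back-of-the-envelope estimate only. It assumes one scroll step per viewport height, which is an illustrative assumption and may not match how Crawl4AI actually steps through a page; `estimate_scroll_wait` is a hypothetical helper, not part of the library.

```python
import math


def estimate_scroll_wait(page_height_px: int,
                         viewport_height_px: int,
                         scroll_delay: float = 0.2) -> float:
    """Estimate total seconds spent pausing between scroll steps.

    Assumption (not taken from Crawl4AI): the page is covered in
    viewport-sized steps, with one pause of `scroll_delay` per step.
    """
    if page_height_px < 0 or viewport_height_px <= 0 or scroll_delay < 0:
        raise ValueError("heights and delay must be non-negative, viewport > 0")
    # Number of viewport-sized steps needed to cover the page.
    steps = math.ceil(page_height_px / viewport_height_px)
    return steps * scroll_delay


if __name__ == "__main__":
    # A 24,000px page in a 1080px viewport needs 23 steps under this assumption.
    print(estimate_scroll_wait(24_000, 1080, scroll_delay=0.5))  # 11.5
```

Under this assumption, raising `scroll_delay` from `0.2` to `0.5` on such a page adds several seconds per capture, which is worth weighing against how slowly the page's lazy-loaded assets render.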