0.8.8's SSRF check validated the crawl target URL but not the proxy address, so
an unauthenticated /crawl, /crawl/stream, or /crawl/job request could route the
browser through a proxy pointing at an internal IP and reach internal services
or cloud metadata. Reported by Geo (geo-chen).
Fix (backward compatible): before the browser is built, validate every proxy
destination (browser_config.proxy, browser_config.proxy_config.server,
crawler_config.proxy_config.server) with the same not-is_global check used for
crawl URLs, and strip proxy/DNS-redirecting flags
(--proxy-server / --proxy-pac-url / --proxy-bypass-list / --host-resolver-rules)
from extra_args. A legitimate public proxy still works; configure proxies via
proxy_config (validated), not raw extra_args flags. _enforce_proxy_safety is
called in both crawl handlers (and covers /crawl/job transitively); an
HTTPException passthrough was added so the 400 is not masked as a 500.
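The two checks above can be sketched roughly as follows. This is an illustration, not the project's `_enforce_proxy_safety`: the helper names `proxy_host_is_public` and `strip_proxy_flags` are hypothetical, and for brevity the sketch only accepts proxy servers given as IP literals, whereas a real check would resolve hostnames first.

```python
import ipaddress
from urllib.parse import urlsplit

# Flags that can redirect traffic or DNS, as listed in the fix description.
BLOCKED_FLAGS = (
    "--proxy-server",
    "--proxy-pac-url",
    "--proxy-bypass-list",
    "--host-resolver-rules",
)


def proxy_host_is_public(server: str) -> bool:
    """Return True only if the proxy host is a globally routable IP literal.

    Hypothetical helper: hostnames are rejected because this sketch does
    no DNS resolution.
    """
    try:
        # Accept both "http://1.2.3.4:8080" and bare "1.2.3.4:8080".
        parts = urlsplit(server if "://" in server else f"//{server}")
        host = parts.hostname
        if not host:
            return False
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # malformed URL or not an IP literal
    return ip.is_global


def strip_proxy_flags(extra_args: list[str]) -> list[str]:
    """Drop proxy/DNS-redirecting flags, keeping all other browser arguments."""
    return [
        arg for arg in extra_args
        if arg.split("=", 1)[0] not in BLOCKED_FLAGS
    ]
```

For example, `strip_proxy_flags(["--headless", "--proxy-server=http://10.0.0.1:8080"])` keeps only `--headless`, while a proxy given through the validated config path is accepted only when `proxy_host_is_public` returns True.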
Bump 0.8.8 -> 0.8.9 (__version__ + Dockerfile). 20 new tests; full security
suite 161 pass. Changelog, release blog, README, SECURITY-CREDITS updated.
This vector was already fixed in the upcoming secure-by-default release; 0.8.9
brings it forward because it is an unauthenticated SSRF.
Backward-compatible fixes for the Docker server - features keep working, only
the unsafe behavior is closed. (The secure-by-default redesign comes in the
later major release.)
- SSRF: replace the explicit blocklist with a single rule (reject any resolved
IP where not ip.is_global), also evaluated on embedded IPv4 transition forms.
This closes the gaps: IPv6 unspecified ::, NAT64 64:ff9b::/96, 6to4 2002::/16,
v4-mapped. Error messages are now opaque (no resolved-IP leak).
- output_path arbitrary write: harden validate_output_path with realpath
containment (defeats a symlinked path component) and write via O_NOFOLLOW
(write_output_file). output_path stays supported.
- LLM base_url key exfil: ignore a request-supplied base_url in /md, /llm,
/llm/job; the endpoint is always server-derived. Field still accepted (no
4xx) for compatibility.
- env:SECRET_KEY exfil gadget: LLMConfig refuses env: resolution of protected
names (SECRET/PASSWORD/PRIVATE substrings, CRAWL4AI*/AWS_SECRET* prefixes,
SECRET_KEY/REDIS_PASSWORD/TOKEN). Normal provider keys (OPENAI_API_KEY, ...)
unaffected.
- CRLF log injection: CRLFSafeFilter strips CR, LF, and other control
characters from log records.
- Webhook header injection: sanitize_webhook_headers (name pattern, no control
chars, deny hop-by-hop/sensitive) at send time + a WebhookConfig validator
for early 422.
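As an illustration of the CRLF log-injection item in the list above, a log filter of this kind might look like the sketch below. The class name `ControlCharStrippingFilter` is hypothetical and this is not the project's CRLFSafeFilter; it assumes that all C0 control characters (including tab) plus DEL should be removed.

```python
import logging
import re

# CR, LF and the other C0 control characters, plus DEL.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


class ControlCharStrippingFilter(logging.Filter):
    """Hypothetical stand-in for a CRLF-safe filter."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message once so %-style args are sanitized too,
        # then store the cleaned text and clear args to avoid re-formatting.
        cleaned = _CONTROL_CHARS.sub("", record.getMessage())
        record.msg = cleaned
        record.args = None
        return True  # never drop the record, only sanitize it
```

Attached with `logger.addFilter(ControlCharStrippingFilter())`, a value such as `"http://x/\r\nINFO forged"` is logged on a single line, so it can no longer start a forged log entry.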
Bump 0.8.7 -> 0.8.8 (__version__ + Dockerfile C4AI_VER). 30 new behavioral
tests; existing 111 security tests + 112 library config tests still pass.
NOT included (breaking -> deferred to the major): auth-by-default, trust
boundary, declarative hooks, output_path removal, base_url/provider removal,
loopback bind, redis password, TLS-verify-on, CORS, bounded queue. The
exec-hook RCE and unauth-by-default criticals have no non-breaking fix and are
closed only in the major (hooks are already off by default).
Caught during internal review. `http://[::ffff:127.0.0.1]/` bypassed
validate_webhook_url because getaddrinfo returns ::ffff:7f00:1, which
matches neither the IPv4 blocklist (127.0.0.0/8) nor the IPv6 blocklist
(::1/128).
Fix: added an _expand_ip_candidates() helper that unwraps the IPv4 address
from IPv4-mapped (::ffff:X.Y.Z.W, via .ipv4_mapped) and IPv4-compatible
(::X.Y.Z.W, via the low 32 bits) IPv6 addresses. The blocklist now checks
both the original IP and the unwrapped IPv4 form.
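The unwrapping described above can be sketched like this. It is a minimal illustration rather than the project's helper: `expand_ip_candidates`, `is_blocked`, and the two blocklists are hypothetical names, and the blocklists are a small subset chosen to match the cases named in this message.

```python
import ipaddress

# Illustrative subset of a blocklist (loopback, RFC 1918, link-local).
BLOCKED_V4 = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "169.254.0.0/16",
)]
BLOCKED_V6 = [ipaddress.ip_network("::1/128")]


def expand_ip_candidates(ip):
    """Return the address itself plus any IPv4 address embedded in it."""
    candidates = [ip]
    if isinstance(ip, ipaddress.IPv6Address):
        if ip.ipv4_mapped is not None:
            # IPv4-mapped form ::ffff:X.Y.Z.W
            candidates.append(ip.ipv4_mapped)
        elif int(ip) >> 32 == 0 and int(ip) > 1:
            # IPv4-compatible form ::X.Y.Z.W: the high 96 bits are zero.
            # :: and ::1 are skipped because they are not embedded IPv4.
            candidates.append(ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF))
    return candidates


def is_blocked(address: str) -> bool:
    """Check the original address and every unwrapped form against the lists."""
    ip = ipaddress.ip_address(address)
    for cand in expand_ip_candidates(ip):
        nets = BLOCKED_V4 if cand.version == 4 else BLOCKED_V6
        if any(cand in net for net in nets):
            return True
    return False
```

With this in place, `is_blocked("::ffff:127.0.0.1")` and `is_blocked("::127.0.0.1")` both return True because the embedded 127.0.0.1 is checked against the IPv4 list, while plain `::1` is still caught by the IPv6 list.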
Added 6 new TestIPv6MappedBypass tests covering:
- Loopback, RFC 1918, link-local (cloud metadata) via ::ffff: mapping
- IPv4-compatible variant (::127.0.0.1)
- Regression test that plain ::1 is still blocked
Also updated a stale test assertion in test_eval_security_adversarial:
hasattr, type, and __build_class__ were removed from hook builtins in
batch 2, but the test still expected hasattr to remain.
DO NOT PUSH until release day.
Reported by secsys_codex (2026-04-18): /md, /crawl, /llm endpoints
pass user URLs to crawler.arun() with no private IP validation.
- Add validate_url_destination() to utils.py with opt-out via
CRAWL4AI_ALLOW_INTERNAL_URLS=true env var for users who need
to crawl internal services.
- Integrate into validate_url_scheme() (covers all server.py endpoints).
- Add validation at all 4 URL entry points in api.py (handle_llm_qa,
handle_markdown_request, create_new_task, handle_crawl_request).
- raw: URLs bypass the check (inline HTML, no network fetch).
- 16 adversarial + source coverage tests added.
- secsys_codex added to SECURITY-CREDITS.md.
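The destination check and its opt-out, as described in the list above, could look roughly like the following sketch. `check_url_destination` is a hypothetical name, not the project's validate_url_destination(), and it assumes a `not is_global` test on each resolved address for brevity; the real helper's blocklist may differ.

```python
import ipaddress
import os
import socket
from urllib.parse import urlsplit


def check_url_destination(url: str) -> None:
    """Raise ValueError if the URL's host resolves to a non-public address."""
    if url.startswith("raw:"):
        return  # inline HTML, nothing is fetched over the network
    if os.environ.get("CRAWL4AI_ALLOW_INTERNAL_URLS", "").lower() == "true":
        return  # operator explicitly opted out of the check
    host = urlsplit(url).hostname
    if not host:
        raise ValueError("URL has no host")
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        raise ValueError("host could not be resolved") from exc
    for info in infos:
        # info[4] is the sockaddr tuple; its first element is the address.
        addr = info[4][0].split("%", 1)[0]  # drop any IPv6 zone id
        if not ipaddress.ip_address(addr).is_global:
            raise ValueError("destination is not a public address")
```

Every resolved address is checked, not only the first, so a hostname that resolves to both a public and a private address is still rejected.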
DO NOT PUSH until release day.
- Replace raw eval() in _compute_field() with AST-validated
_safe_eval_expression() that blocks __import__, dunder attribute
access, and import statements while preserving safe transforms
- Add ALLOWED_DESERIALIZE_TYPES allowlist to from_serializable_dict()
preventing arbitrary class instantiation from API input
- Update security contact email and add v0.8.1 security fixes to
SECURITY.md with researcher acknowledgment
- Add 17 security tests covering both fixes
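A rough sketch of the AST-validation idea from the first item above: parse the expression, reject dunder names and dunder attribute access, then evaluate with a restricted set of builtins. The name `safe_eval_expression` and the `_SAFE_BUILTINS` table are assumptions for illustration, not the project's _safe_eval_expression(), and a check like this is not a complete sandbox on its own.

```python
import ast

# Hypothetical allowlist of builtins exposed to expressions.
_SAFE_BUILTINS = {"len": len, "str": str, "int": int, "float": float, "round": round}


def safe_eval_expression(expression: str, variables: dict):
    """Evaluate a single expression after rejecting dunder access."""
    try:
        # mode="eval" accepts expressions only, so "import os" and other
        # statements fail here with SyntaxError.
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError(f"name not allowed: {node.id}")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError(f"attribute not allowed: {node.attr}")
    code = compile(tree, "<expression>", "eval")
    # Replace the real builtins so __import__, open, etc. are unavailable.
    return eval(code, {"__builtins__": _SAFE_BUILTINS}, dict(variables))
```

A transform such as `safe_eval_expression("price * 2", {"price": 21})` still works, while `__import__('os')` and `x.__class__` are rejected before anything is executed.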
Security fixes for vulnerabilities reported by ProjectDiscovery:
1. Remote Code Execution via Hooks (CVE pending)
- Remove __import__ from allowed_builtins in hook_manager.py
- Prevents arbitrary module imports (os, subprocess, etc.)
- Hooks now disabled by default via CRAWL4AI_HOOKS_ENABLED env var
2. Local File Inclusion via file:// URLs (CVE pending)
- Add URL scheme validation to /execute_js, /screenshot, /pdf, /html
- Block file://, javascript:, data: and other dangerous schemes
- Only allow http://, https://, and raw: (where appropriate)
3. Security hardening
- Add CRAWL4AI_HOOKS_ENABLED=false as default (opt-in for hooks)
- Add security warning comments in config.yml
- Add validate_url_scheme() helper for consistent validation
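A scheme allowlist of the kind described in item 2 can be sketched as below. `check_url_scheme` and its `allow_raw` parameter are hypothetical, chosen to mirror the "raw: (where appropriate)" wording; this is not the project's validate_url_scheme().

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def check_url_scheme(url: str, allow_raw: bool = False) -> None:
    """Raise ValueError unless the URL uses an allowed scheme."""
    if allow_raw and url.startswith("raw:"):
        return  # inline HTML is accepted only where the caller opts in
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        # Covers file://, javascript:, data: and anything else unexpected.
        raise ValueError(f"URL scheme not allowed: {scheme or '(none)'}")
```

Using an allowlist rather than a list of forbidden schemes means a scheme nobody thought to block is rejected by default.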
Testing:
- Add unit tests (test_security_fixes.py) - 16 tests
- Add integration tests (run_security_tests.py) for live server
Affected endpoints:
- POST /crawl (hooks disabled by default)
- POST /crawl/stream (hooks disabled by default)
- POST /execute_js (URL validation added)
- POST /screenshot (URL validation added)
- POST /pdf (URL validation added)
- POST /html (URL validation added)
Breaking changes:
- Hooks require CRAWL4AI_HOOKS_ENABLED=true to function
- file:// URLs no longer work on API endpoints (use library directly)
- Introduced a demo script (`demo_monitor_dashboard.py`) to showcase various monitoring features through simulated activity.
- Implemented a test script (`test_monitor_demo.py`) to generate dashboard activity and verify monitor health and endpoint statistics.
- Added a logo image to the static assets for branding purposes.