2 Commits

Author SHA1 Message Date
unclecode
1e25edcb5c fix(security): block IPv6-mapped IPv4 SSRF bypass
Caught during internal review: `http://[::ffff:127.0.0.1]/` bypassed
validate_webhook_url because getaddrinfo returns ::ffff:7f00:1, which
is in neither the IPv4 blocklist (127.0.0.0/8) nor the IPv6 blocklist
(::1/128).
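
The bypass is easy to reproduce with Python's standard ipaddress
module. This sketch hard-codes the address the commit reports
getaddrinfo returning (rather than calling getaddrinfo) and uses a
naive check against only the two ranges named above.

```python
import ipaddress

# The address getaddrinfo returned for http://[::ffff:127.0.0.1]/ (per the commit).
resolved = ipaddress.ip_address("::ffff:7f00:1")

# Naive check: only the IPv4 and IPv6 loopback ranges, no unwrapping.
naive_blocked = (
    resolved in ipaddress.ip_network("127.0.0.0/8")   # False: v6 address vs v4 network
    or resolved in ipaddress.ip_network("::1/128")    # False: not ::1
)

print(naive_blocked)         # False: the check is bypassed
print(resolved.ipv4_mapped)  # 127.0.0.1: the real target is loopback
```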

Fix: added an _expand_ip_candidates() helper that unwraps the IPv4
address from IPv4-mapped (::ffff:X.Y.Z.W, via .ipv4_mapped) and
IPv4-compatible (::X.Y.Z.W, via the low 32 bits) IPv6 addresses. The
blocklist check now covers both the original IP and the unwrapped
IPv4 form.
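
A minimal sketch of the unwrap-and-check logic using the standard
ipaddress module. The body of _expand_ip_candidates() is an assumption
(the commit only describes it), and BLOCKED_NETWORKS and is_blocked()
are hypothetical names with an illustrative blocklist, not the one in
the codebase.

```python
import ipaddress

# Hypothetical, illustrative blocklist; the real one is not shown in the commit.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in (
        "0.0.0.0/8",
        "10.0.0.0/8",
        "127.0.0.0/8",
        "169.254.0.0/16",  # link-local, includes cloud metadata 169.254.169.254
        "172.16.0.0/12",
        "192.168.0.0/16",
        "::1/128",
        "fc00::/7",
        "fe80::/10",
    )
]


def _expand_ip_candidates(ip):
    """Return the IP itself plus any IPv4 address embedded in an IPv6 form."""
    candidates = [ip]
    if isinstance(ip, ipaddress.IPv6Address):
        if ip.ipv4_mapped is not None:
            # IPv4-mapped: ::ffff:X.Y.Z.W
            candidates.append(ip.ipv4_mapped)
        elif ip.packed[:12] == bytes(12):
            # IPv4-compatible: ::X.Y.Z.W, take the low 32 bits
            candidates.append(ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF))
    return candidates


def is_blocked(ip_str):
    """Hypothetical check: blocked if any candidate falls in a blocked network."""
    ip = ipaddress.ip_address(ip_str)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    return any(
        cand in net
        for cand in _expand_ip_candidates(ip)
        for net in BLOCKED_NETWORKS
    )
```

With this, is_blocked("::ffff:127.0.0.1") and is_blocked("::127.0.0.1")
both return True, while a public address such as 8.8.8.8 is not
blocked.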

Added 6 new TestIPv6MappedBypass tests covering:
- Loopback, RFC 1918, link-local (cloud metadata) via ::ffff: mapping
- IPv4-compatible variant (::127.0.0.1)
- Regression test that plain ::1 is still blocked

Also updated a stale assertion in test_eval_security_adversarial:
hasattr, type, and __build_class__ were removed from the hook builtins
in batch 2, but the test still expected hasattr to remain.
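
To illustrate what the updated assertion checks, a hypothetical sketch
of a hook builtins dict with the batch-2 names removed. CPython looks
up __build_class__ in the builtins for every class statement, so
removing it disables class definitions in hook code. The names
hook_builtins and hook_raises_name_error are invented for illustration,
and restricting builtins alone is not a complete sandbox.

```python
import builtins

# Hypothetical allowlist; the commit does not show the real hook builtins.
REMOVED_IN_BATCH_2 = {"hasattr", "type", "__build_class__"}
CANDIDATES = ("len", "range", "print", "hasattr", "type", "__build_class__")

hook_builtins = {
    name: getattr(builtins, name)
    for name in CANDIDATES
    if name not in REMOVED_IN_BATCH_2
}


def hook_raises_name_error(source):
    """Run hook source with only hook_builtins visible; report a NameError."""
    try:
        exec(source, {"__builtins__": hook_builtins})
    except NameError:
        return True
    return False


# The updated expectation: hasattr is gone rather than kept.
assert "hasattr" not in hook_builtins
```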

DO NOT PUSH until release day.
2026-04-20 10:10:59 +00:00
unclecode
f77c0a856f fix(security): SSRF protection on all crawl/md/llm URL entry points
Reported by secsys_codex (2026-04-18): the /md, /crawl, and /llm
endpoints passed user-supplied URLs to crawler.arun() without any
private-IP validation.

- Add validate_url_destination() to utils.py, with an opt-out via the
  CRAWL4AI_ALLOW_INTERNAL_URLS=true env var for users who need to
  crawl internal services.
- Integrate into validate_url_scheme() (covers all server.py endpoints).
- Add validation at all 4 URL entry points in api.py (handle_llm_qa,
  handle_markdown_request, create_new_task, handle_crawl_request).
- raw: URLs bypass the check (inline HTML, no network fetch).
- Add 16 adversarial and source-coverage tests.
- Add secsys_codex to SECURITY-CREDITS.md.
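
A rough sketch of what validate_url_destination() might look like:
skip raw: URLs, honor the CRAWL4AI_ALLOW_INTERNAL_URLS opt-out, resolve
the host, and reject any resolved address that is internal. The
signature, the ValueError, and the hypothetical _is_internal() policy
(built on the standard ipaddress properties) are assumptions;
IPv4-mapped unwrapping from the later fix is left out for brevity.

```python
import ipaddress
import os
import socket
from urllib.parse import urlsplit


def _is_internal(ip):
    # Assumed policy: reject private, loopback, link-local, reserved,
    # unspecified, and multicast addresses.
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_unspecified or ip.is_multicast
    )


def validate_url_destination(url):
    """Raise ValueError if url would make the crawler fetch an internal address."""
    if url.startswith("raw:"):
        return  # inline HTML, no network fetch
    if os.environ.get("CRAWL4AI_ALLOW_INTERNAL_URLS", "").lower() == "true":
        return  # explicit opt-out for crawling internal services
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"URL has no host: {url!r}")
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host!r}") from exc
    # Check every resolved address, not just the first one.
    for *_, sockaddr in infos:
        # Strip any IPv6 scope id such as "%eth0" before parsing.
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])
        if _is_internal(ip):
            raise ValueError(f"{url!r} resolves to internal address {ip}")
```

Checking every address returned by getaddrinfo, rather than only the
first, keeps a hostname with mixed public and private records from
slipping through.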

DO NOT PUSH until release day.
2026-04-20 09:42:43 +00:00