unclecode
11b45760da
fix: anti-bot false positive on browser JSON, URLPatternFilter prefix match, PDF deserialization
...
- antibot_detector: add <pre> to content elements regex, detect
  browser-wrapped JSON in _looks_like_data() so httpbin-style
  responses are not flagged as blocked
- deep_crawling/filters: use urlparse().path for path-only prefix
  patterns (/docs/*) instead of matching against full URL, which
  always failed; full-URL prefixes still match correctly
- async_configs: add PDFContentScrapingStrategy to
  ALLOWED_DESERIALIZE_TYPES so /crawl API can deserialize it
- __init__: export PDFContentScrapingStrategy for type resolution
- tests: add 86-test suite covering all three fixes with adversarial
  and edge cases
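
The antibot_detector fix can be pictured with a minimal sketch. This is not the code from the commit: `looks_like_wrapped_json` and its regex are hypothetical, and the sketch assumes the browser renders a raw JSON response as a single <pre> element inside <body>.

```python
import html
import json
import re

# Hypothetical pattern: a <body> whose only content is one <pre> element.
_PRE_BODY = re.compile(
    r"<body[^>]*>\s*<pre[^>]*>(.*?)</pre>\s*</body>",
    re.IGNORECASE | re.DOTALL,
)


def looks_like_wrapped_json(page: str) -> bool:
    """Return True if the page looks like JSON wrapped in <pre> by a browser."""
    match = _PRE_BODY.search(page)
    if not match:
        return False
    # Browsers HTML-escape the payload, so undo that before parsing.
    text = html.unescape(match.group(1)).strip()
    if not text.startswith(("{", "[")):
        return False
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

A page that passes this check carries real data, so a detector could reasonably skip the "blocked" verdict for it even though the page has almost no ordinary HTML content.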
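
The deep_crawling/filters fix comes down to which string the prefix is compared against. A minimal sketch follows, assuming patterns end in a trailing `*`; `matches_prefix` is a hypothetical helper, not the URLPatternFilter implementation.

```python
from urllib.parse import urlparse


def matches_prefix(url: str, pattern: str) -> bool:
    """Hypothetical helper: match a trailing-* prefix pattern against a URL."""
    prefix = pattern.rstrip("*")
    if prefix.startswith("/"):
        # Path-only pattern such as /docs/*: compare against the URL path,
        # because the full URL starts with the scheme and never with "/".
        return urlparse(url).path.startswith(prefix)
    # Full-URL pattern such as https://example.com/docs/*: compare the whole URL.
    return url.startswith(prefix)
```

With this split, `/docs/*` matches `https://example.com/docs/intro` but not `https://example.com/blog/docs/intro`, while a full-URL prefix keeps its earlier behavior.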
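
The async_configs fix relies on an allowlist: a type named in a request payload is instantiated only if it has been registered. The sketch below shows the general idea under stated assumptions; `PDFStrategy`, `ALLOWED_TYPES`, `from_payload`, and the payload shape are hypothetical stand-ins, not the library's actual names or wire format.

```python
from dataclasses import dataclass


@dataclass
class PDFStrategy:
    """Hypothetical stand-in for a strategy class carried in a request."""
    extract_images: bool = False


# Hypothetical allowlist: only types registered here may be rebuilt from a payload.
ALLOWED_TYPES = {"PDFStrategy": PDFStrategy}


def from_payload(payload: dict):
    """Rebuild an object from {"type": ..., "params": {...}} if the type is allowed."""
    type_name = payload["type"]
    cls = ALLOWED_TYPES.get(type_name)
    if cls is None:
        # A type missing from the allowlist is rejected rather than resolved.
        raise ValueError(f"type not allowed for deserialization: {type_name}")
    return cls(**payload.get("params", {}))
```

In this sketch, a type that is absent from the allowlist fails at the lookup, which is the kind of failure the commit addresses by registering the PDF strategy.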
2026-03-09 14:52:58 +00:00