116 lines
3.1 KiB
Plaintext
# Simple json request module

aliases:
    retry = "retry 10 times with 1 time(s) cooldown"

start
    try get request
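
The "retry" alias above can be sketched as a small helper. This is only an illustration: the function name, its parameters, and the reading of "cooldown" as seconds between attempts are assumptions, not part of the spec.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 10, cooldown: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action` up to `attempts` times, pausing `cooldown` between failures.

    Assumption: cooldown is measured in seconds. `sleep` is injectable so the
    helper can be exercised without real delays.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # each block of the spec narrows this
            last_error = error
            if attempt < attempts - 1:
                sleep(cooldown)
    # All attempts failed: hand the last error to the caller.
    assert last_error is not None
    raise last_error
```

Re-raising the last error after the final attempt matches the "Throw Exception on higher level" steps used later in the spec.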

-- Network errors block
Having: url

    Exception: Network is unreachable or temporary failure in name resolution
        Wait until the network becomes available
    Exception: Name not resolved
        Repeat 10 times with 10 times cooldown
        Fatal: target site is dead
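
One possible way to tell the two network cases apart in Python is by error code. This is a sketch under assumptions: the enum and function names are hypothetical, and mapping "temporary failure in name resolution" to socket.EAI_AGAIN and "name not resolved" to socket.EAI_NONAME is one interpretation of the spec.

```python
import errno
import socket
from enum import Enum


class NetworkAction(Enum):
    WAIT_FOR_NETWORK = "wait until the network becomes available"
    REPEAT_THEN_FATAL = "repeat 10 times, then fatal: target site is dead"
    UNHANDLED = "not covered by the network errors block"


def classify_network_error(error: OSError) -> NetworkAction:
    """Map a low-level network error to the action named in the spec."""
    # socket.gaierror is an OSError subclass, so check it first.
    if isinstance(error, socket.gaierror):
        if error.errno == socket.EAI_AGAIN:   # temporary resolver failure
            return NetworkAction.WAIT_FOR_NETWORK
        if error.errno == socket.EAI_NONAME:  # host name does not resolve
            return NetworkAction.REPEAT_THEN_FATAL
        return NetworkAction.UNHANDLED
    if error.errno == errno.ENETUNREACH:      # network is unreachable
        return NetworkAction.WAIT_FOR_NETWORK
    return NetworkAction.UNHANDLED
```

HTTP client libraries usually wrap these errors in their own exception types, so a real crawler would first have to unwrap the underlying OSError.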

-- HTTP errors block
Having: Some HTTP response

    Exception: Service unavailable
        Repeat 10 times with 10 times cooldown
        Throw Exception on higher level
    Exception: Internal server error and other HTTP errors (403, 404...)
        retry
        Throw Exception on higher level
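
The HTTP errors block can be read as a lookup from status code to retry policy. The sketch below assumes that "Service unavailable" means status 503, that "other HTTP errors" means any other status of 400 or above, and that the cooldown figures are seconds. HttpPolicy and http_error_policy are hypothetical names.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Optional


@dataclass(frozen=True)
class HttpPolicy:
    attempts: int    # how many times to repeat the request
    cooldown: float  # pause between attempts (assumed to be seconds)
    then: str        # what to do once the attempts are exhausted


def http_error_policy(status: int) -> Optional[HttpPolicy]:
    """Return the retry policy for an HTTP status, or None if it is not an error."""
    if status == HTTPStatus.SERVICE_UNAVAILABLE:
        # "Repeat 10 times with 10 times cooldown"
        return HttpPolicy(attempts=10, cooldown=10.0, then="throw to higher level")
    if status >= 400:
        # The "retry" alias: 10 attempts with the short cooldown.
        return HttpPolicy(attempts=10, cooldown=1.0, then="throw to higher level")
    return None
```

Both branches end by throwing to a higher level, which is where the "Raised:" cases of the content errors block pick the error up.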

-- Content errors block
Having: Some successful HTTP response

    Raised: Service unavailable
        wait until the initial page becomes available
        retry
        try strip cursor if cursor crawler
        retry
        try decrement cursor/page
        retry
        try increment cursor/page
        retry

    Raised: Internal server error and other HTTP errors (403, 404...)
        try strip cursor if cursor crawler
        retry
        try decrement cursor/page
        retry
        try increment cursor/page
        retry
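
The strip / decrement / increment ladder used by the "Raised:" cases above could be modelled as a generator of alternative requests to try in order. This sketch assumes the position is a numeric page plus an optional cursor string, and that page 0 cannot be decremented. Request and recovery_candidates are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Iterator, Optional


@dataclass(frozen=True)
class Request:
    page: int
    cursor: Optional[str] = None
    cursor_crawler: bool = False


def recovery_candidates(request: Request) -> Iterator[Request]:
    """Yield the alternative requests in the order the spec tries them."""
    # 1. try strip cursor if cursor crawler
    if request.cursor_crawler and request.cursor is not None:
        yield replace(request, cursor=None)
    # 2. try decrement cursor/page (assumption: pages start at 0)
    if request.page > 0:
        yield replace(request, page=request.page - 1)
    # 3. try increment cursor/page
    yield replace(request, page=request.page + 1)
```

Each candidate would then be sent through the "retry" alias before moving on to the next one.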

    Exception: Response is not json data
        retry
        try strip cursor and retry if cursor crawler
        try decrement cursor/page and retry 1 time
        try increment cursor/page and retry 1 time
        log error and end crawl

Having: Some json data

    Exception: Response does not contain {items: list, metadata: dict} fields
        retry
        try strip cursor and retry if cursor crawler
        try decrement cursor/page and retry 1 time
        try increment cursor/page and retry 1 time
        log error and end crawl

    Exception: items is empty and metadata is empty
        retry
        try strip cursor and retry if cursor crawler
        try decrement cursor/page and retry 1 time
        try increment cursor/page and retry 1 time
        log error and end crawl

    Exception: items is empty and metadata is not empty
        if result of (try decrement cursor/page and retry 1 time) is 1: end crawl
        retry
        try strip cursor and retry if cursor crawler
        try decrement cursor/page and retry 1 time
        log error and end crawl

    Exception: if cursor crawler: metadata does not have required field "nextPage"
        retry
        try strip cursor and retry if cursor crawler
        try decrement cursor/page and retry 1 time
        try increment cursor/page and retry 1 time
        log error and end crawl

    ExitPoint: items is not empty and metadata is empty
        end crawl
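
The content checks above amount to classifying a response body into one of the listed cases. The sketch below is one reading of them: the function name and the returned labels are hypothetical, and it checks the ExitPoint (items present, metadata empty) before the "nextPage" requirement, on the assumption that the exit point would otherwise be unreachable for cursor crawlers.

```python
import json


def classify_response(body: str, cursor_crawler: bool) -> str:
    """Return a label naming which case of the spec a response body falls into."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return "not_json"
    if not isinstance(payload, dict):
        return "missing_fields"
    items = payload.get("items")
    metadata = payload.get("metadata")
    if not isinstance(items, list) or not isinstance(metadata, dict):
        return "missing_fields"
    if not items and not metadata:
        return "empty_items_empty_metadata"
    if not items:
        return "empty_items_with_metadata"
    if not metadata:
        return "exit_point"  # items present, metadata empty: end crawl
    if cursor_crawler and "nextPage" not in metadata:
        return "missing_next_page"
    return "valid"
```

Every label except "valid" and "exit_point" would lead into the retry ladder of the matching Exception case.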

Having: Some valid json api response (items is not empty and not cursor crawler or (end crawl flag is set or metadata is not empty))

    Exception: Cursor slip (nextPage url equals request url)
        if not cursor crawler: pass (not possible)
        try increment cursor/page and retry 1 time

    Exception: response "items" has no new items (may be caused by a broken cursor system or by the rare situation where total_items mod page_items_limit is 0)
        try strip cursor and retry if cursor crawler
        try increment cursor/page and retry 1 time
        log error and end crawl

    Warning: Added items != page_items_limit and not end crawl
        log warning

Having: some items, added to all crawl items dict
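
The last stage (cursor slip, no new items, short-page warning) can be sketched as a single merge step over the dict of all crawled items. The function name, the finding labels, and the use of an "id" field as the dict key are assumptions made for illustration; the spec does not say how items are keyed.

```python
from typing import Any, Dict, List, Optional


def merge_page(all_items: Dict[str, Any], items: List[Dict[str, Any]],
               request_url: str, next_page: Optional[str],
               page_items_limit: int, end_crawl: bool) -> List[str]:
    """Merge one page of items into `all_items` and report what was detected."""
    # Cursor slip: nextPage points back at the request that was just made.
    if next_page is not None and next_page == request_url:
        return ["cursor_slip"]
    added = 0
    for item in items:
        key = item["id"]  # assumption: every item carries a unique "id"
        if key not in all_items:
            all_items[key] = item
            added += 1
    if added == 0:
        return ["no_new_items"]
    if added != page_items_limit and not end_crawl:
        return ["short_page_warning"]
    return []
```

An empty result means the page was merged cleanly and the crawl can move on to the next page.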