Common commit
115
doc/CivitFetchPaseudocode
Normal file
@@ -0,0 +1,115 @@
# Simple JSON request module

aliases:
    retry = "retry 10 times with 1 time(s) cooldown"

start
    try GET request

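The `retry` alias above can be sketched as a small helper. This is a minimal illustration, not the module's actual code: the function name, its parameters, and the injectable `sleep` are assumptions, and the cooldown unit is treated as seconds only because the source does not state one.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 10,          # "retry 10 times"
    cooldown: float = 1.0,       # assumed to be seconds; the source gives no unit
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` up to `attempts` times, pausing `cooldown` between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            last_error = error
            # No pause after the final attempt: the error goes to the caller.
            if attempt < attempts:
                sleep(cooldown)
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; the caller decides what "throw to the higher level" means by catching the re-raised error.
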
-- Network errors block
Having: url

Exception: Network is unreachable or temporary failure in name resolution
    Wait until the network becomes available
Exception: Name not resolved
    Repeat 10 times with 10 time(s) cooldown
    Fatal: target site is dead

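One way to tell the two network cases apart is to inspect the low-level error. The sketch below assumes the raw `OSError` or `socket.gaierror` is available; HTTP client libraries usually wrap these in their own exception types, so real code would have to unwrap them first. The function and constant names are hypothetical.

```python
import errno
import socket

# Hypothetical action labels mirroring the two branches of the block above.
WAIT_FOR_NETWORK = "wait_for_network"
RETRY_THEN_FATAL = "retry_then_fatal"
UNKNOWN = "unknown"


def classify_network_error(error: BaseException) -> str:
    """Map a low-level network error to the action the pseudocode prescribes."""
    if isinstance(error, socket.gaierror):
        # gaierror carries a getaddrinfo() code in .errno, not a C errno value.
        if error.errno == getattr(socket, "EAI_AGAIN", None):
            return WAIT_FOR_NETWORK      # temporary failure in name resolution
        if error.errno == getattr(socket, "EAI_NONAME", None):
            return RETRY_THEN_FATAL      # name not resolved
        return UNKNOWN
    if isinstance(error, OSError) and error.errno == errno.ENETUNREACH:
        return WAIT_FOR_NETWORK          # network is unreachable
    return UNKNOWN
```
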
-- HTTP errors block
Having: Some HTTP response

Exception: Service unavailable
    Repeat 10 times with 10 time(s) cooldown
    Throw exception to the higher level
Exception: Internal server error and other HTTP errors (403, 404...)
    retry
    Throw exception to the higher level

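The HTTP block boils down to choosing a retry budget from the status code. A minimal sketch follows; the function name and the `(attempts, cooldown)` tuple shape are assumptions, and the numbers are taken from the pseudocode as written.

```python
from http import HTTPStatus
from typing import Optional, Tuple


def http_retry_policy(status: int) -> Optional[Tuple[int, int]]:
    """Return (attempts, cooldown) for an HTTP error status, or None if no error."""
    if status < 400:
        return None                      # successful or redirect response
    if status == HTTPStatus.SERVICE_UNAVAILABLE:
        return (10, 10)                  # "Repeat 10 times with 10 time(s) cooldown"
    return (10, 1)                       # the plain `retry` alias for 500, 403, 404...
```

After the budget is exhausted, the caller re-raises the error to the higher level, as both branches of the block require.
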
-- Content errors block
Having: Some successful HTTP response

Raised: Service unavailable
    wait until the initial page becomes available
    retry
    try strip cursor if cursor crawler
    retry
    try decrement cursor/page
    retry
    try increment cursor/page
    retry

Raised: Internal server error and other HTTP errors (403, 404...)
    try strip cursor if cursor crawler
    retry
    try decrement cursor/page
    retry
    try increment cursor/page
    retry

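The "strip cursor" and "decrement/increment cursor/page" fallbacks can be sketched as URL rewrites. This assumes the position is carried in a query parameter; the parameter names `cursor` and `page`, the numeric page value, and the function names are all hypothetical and not taken from the source.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_cursor(url: str, key: str = "cursor") -> str:
    """Return `url` without the `key` query parameter."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != key]
    return urlunsplit(parts._replace(query=urlencode(params)))


def shift_page(url: str, delta: int, key: str = "page") -> Optional[str]:
    """Add `delta` to a numeric `key` parameter; None if that is not possible."""
    parts = urlsplit(url)
    shifted = []
    changed = False
    for k, v in parse_qsl(parts.query, keep_blank_values=True):
        if k == key:
            try:
                new_value = int(v) + delta
            except ValueError:
                return None              # non-numeric value cannot be shifted
            if new_value < 1:
                return None              # there is no page before the first one
            v = str(new_value)
            changed = True
        shifted.append((k, v))
    if not changed:
        return None                      # parameter absent from the URL
    return urlunsplit(parts._replace(query=urlencode(shifted)))
```

Returning `None` when a step is impossible lets the caller skip straight to the next fallback in the list.
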
Exception: Response is not JSON data
    retry
    try strip cursor and retry if cursor crawler
    try decrement cursor/page and retry 1 time
    try increment cursor/page and retry 1 time
    log error and end crawl

Having: Some JSON data

Exception: Response does not contain {items: list, metadata: dict} fields
    retry
    try strip cursor and retry if cursor crawler
    try decrement cursor/page and retry 1 time
    try increment cursor/page and retry 1 time
    log error and end crawl

Exception: items is empty and metadata is empty
    retry
    try strip cursor and retry if cursor crawler
    try decrement cursor/page and retry 1 time
    try increment cursor/page and retry 1 time
    log error and end crawl

Exception: items is empty and metadata is not empty
    if result of (try decrement cursor/page and retry 1 time) is 1: end crawl
    retry
    try strip cursor and retry if cursor crawler
    try decrement cursor/page and retry 1 time
    log error and end crawl

Exception: if cursor crawler: metadata does not have required field "nextPage"
    retry
    try strip cursor and retry if cursor crawler
    try decrement cursor/page and retry 1 time
    try increment cursor/page and retry 1 time
    log error and end crawl

ExitPoint: items is not empty and metadata is empty
    end crawl

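The content checks above can be read as one classification step over the response body. The sketch below is an illustration under assumptions: the function name and the returned labels are hypothetical, and the "last page" exit is tested before the `nextPage` check, an ordering the pseudocode does not fix explicitly.

```python
import json
from typing import Any


def classify_payload(text: str, cursor_crawler: bool) -> str:
    """Return a label naming which branch of the content checks applies."""
    try:
        payload: Any = json.loads(text)
    except ValueError:                   # json.JSONDecodeError subclasses ValueError
        return "not_json"
    if not isinstance(payload, dict):
        return "bad_structure"
    items = payload.get("items")
    metadata = payload.get("metadata")
    if not isinstance(items, list) or not isinstance(metadata, dict):
        return "bad_structure"           # missing {items: list, metadata: dict}
    if not items and not metadata:
        return "empty_response"
    if not items:
        return "empty_items"             # metadata present, but nothing to store
    if not metadata:
        return "last_page"               # ExitPoint: end crawl
    if cursor_crawler and "nextPage" not in metadata:
        return "missing_next_page"
    return "ok"
```

Each label other than `ok` and `last_page` would map to the fallback sequence listed under the matching exception.
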
Having: Some valid JSON API response (items is not empty and not cursor crawler or (end crawl flag is set or metadata is not empty))

Exception: Cursor slip (nextPage URL equals request URL)
    if not cursor crawler: pass (not possible)
    try increment cursor/page and retry 1 time

Exception: response "items" has no new items (may be caused by a broken cursor system or by the rare case where total_items mod page_items_limit is 0)
    try strip cursor and retry if cursor crawler
    try increment cursor/page and retry 1 time
    log error and end crawl

Warning: Added items != page_items_limit and not end crawl
    log warning

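The last three checks (cursor slip, no new items, short page) can be sketched together. This assumes each item is a dict with a unique `id` field and that all crawled items live in one dict keyed by that field; the names `is_cursor_slip`, `add_items`, and the logger name are hypothetical.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("crawl")      # hypothetical logger name


def is_cursor_slip(request_url: str, next_page: Optional[str]) -> bool:
    """True when the API points the crawler back at the URL it just requested."""
    return next_page is not None and next_page == request_url


def add_items(
    all_items: Dict[Any, dict],
    page_items: List[dict],
    page_items_limit: int,
    end_crawl: bool = False,
    key: str = "id",                     # assumed unique item field
) -> int:
    """Merge a page into `all_items`; return how many items were new."""
    added = 0
    for item in page_items:
        item_key = item[key]
        if item_key not in all_items:
            all_items[item_key] = item
            added += 1
    if added != page_items_limit and not end_crawl:
        logger.warning("added %d items, expected %d", added, page_items_limit)
    return added
```

A return value of 0 corresponds to the "no new items" exception, so the caller can start the strip/increment fallbacks from there.
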
Having: some items added to the dict of all crawled items