Common commit

2025-10-16 18:42:32 +07:00
parent 124065d2ac
commit 40c320a1ac
21 changed files with 1916 additions and 379 deletions

doc/CivitFetchPaseudocode Normal file

@@ -0,0 +1,115 @@
# Simple JSON request module
aliases:
retry = "retry 10 times with 1 time(s) cooldown"
start
try get request
-- Network errors block
Having: url
Exception: Network is unreachable or temporary failure in name resolution
Wait until the network becomes available
Exception: Name not resolved
Repeat 10 times with 10 times cooldown
Fatal: target site is dead
-- HTTP errors block
Having: Some HTTP response
Exception: Service unavailable
Repeat 10 times with 10 times cooldown
Throw Exception on higher level
Exception: Internal server error and other HTTP errors (403, 404...)
retry
Throw Exception on higher level
-- Content errors block
Having: Some successful HTTP response
Raised: Service unavailable
wait until the initial page becomes available
retry
try strip cursor if cursor crawler
retry
try decrement cursor/page
retry
try increment cursor/page
retry
Raised: Internal server error and other HTTP errors (403, 404...)
try strip cursor if cursor crawler
retry
try decrement cursor/page
retry
try increment cursor/page
retry
Exception: Response is not JSON data
retry
try strip cursor and retry if cursor crawler
try decrement cursor/page and retry 1 time
try increment cursor/page and retry 1 time
log error and end crawl
Having: Some JSON data
Exception: Response does not contain {items: list, metadata: dict} fields
retry
try strip cursor and retry if cursor crawler
try decrement cursor/page and retry 1 time
try increment cursor/page and retry 1 time
log error and end crawl
Exception: items is empty and metadata is empty
retry
try strip cursor and retry if cursor crawler
try decrement cursor/page and retry 1 time
try increment cursor/page and retry 1 time
log error and end crawl
Exception: items is empty and metadata is not empty
if result of (try decrement cursor/page and retry 1 time) is 1: end crawl
retry
try strip cursor and retry if cursor crawler
try decrement cursor/page and retry 1 time
log error and end crawl
Exception: if cursor crawler: metadata lacks the required field "nextPage"
retry
try strip cursor and retry if cursor crawler
try decrement cursor/page and retry 1 time
try increment cursor/page and retry 1 time
log error and end crawl
ExitPoint: items is not empty and metadata is empty
end crawl
Having: Some valid JSON API response (items is not empty and not cursor crawler or (end crawl flag is set or metadata is not empty))
Exception: Cursor slip (nextPage url equals request url)
if not cursor crawler: pass (not possible)
try increment cursor/page and retry 1 time
Exception: response "items" has no new items (may be caused by a broken cursor system or by the rare situation where total_items mod page_items_limit is 0)
try strip cursor and retry if cursor crawler
try increment cursor/page and retry 1 time
log error and end crawl
Warning: Added items != page_items_limit and not end crawl
log warning
Having: some items, added to the dict of all crawled items
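
The content checks and the repeated fallback chain above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the function names `classify_response` and `fallback_steps` and the outcome labels are hypothetical, and the checks are applied in the same order as they appear in the pseudocode. Only the `items`, `metadata` and `nextPage` field names come from the pseudocode itself.

```python
from typing import Any, List


def classify_response(payload: Any, cursor_crawler: bool) -> str:
    """Map a decoded JSON payload to one of the content-check outcomes.

    Outcome labels are hypothetical names for the pseudocode's cases.
    """
    # The response must contain {items: list, metadata: dict}.
    if not isinstance(payload, dict):
        return "bad_shape"
    items = payload.get("items")
    metadata = payload.get("metadata")
    if not isinstance(items, list) or not isinstance(metadata, dict):
        return "bad_shape"
    if not items and not metadata:
        return "empty_items_and_metadata"
    if not items:
        return "empty_items"
    # Cursor crawlers need metadata["nextPage"] to continue.
    if cursor_crawler and "nextPage" not in metadata:
        return "missing_next_page"
    # ExitPoint: items present but metadata empty means the crawl ends.
    if not metadata:
        return "end_crawl"
    return "valid"


def fallback_steps(cursor_crawler: bool) -> List[str]:
    """Return the recovery steps in the order the pseudocode tries them."""
    steps = ["retry"]
    if cursor_crawler:
        # Stripping the cursor only makes sense for cursor crawlers.
        steps.append("strip_cursor")
    steps += ["decrement", "increment", "log_and_end_crawl"]
    return steps
```

A caller would decode the response body, pass the result to `classify_response`, and walk through `fallback_steps` for any outcome other than `valid` or `end_crawl`. Note that the "items is empty and metadata is not empty" case in the pseudocode skips the increment step, so a real implementation would need a per-case chain rather than this single list.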