Commit Graph

15 Commits

Author SHA1 Message Date
Luke Mino-Altherr
58ddf46c0a Rename bulk_set_needs_verify to bulk_update_needs_verify for readability
Amp-Thread-ID: https://ampcode.com/threads/T-019c3167-e8f1-7409-904b-5fc0edaeef37
Co-authored-by: Amp <amp@ampcode.com>
2026-02-05 21:25:58 -08:00
Luke Mino-Altherr
9222ff6d81 feat: non-destructive asset pruning with is_missing flag
- Add is_missing column to AssetCacheState for soft-delete
- Replace hard-delete pruning with mark_cache_states_missing_outside_prefixes
- Auto-restore missing cache states when files are re-scanned
- Filter out missing cache states from queries by default
- Rename functions for clarity:
  - mark_cache_states_missing_outside_prefixes (was delete_cache_states_outside_prefixes)
  - get_unreferenced_unhashed_asset_ids (was get_orphaned_seed_asset_ids)
  - mark_assets_missing_outside_prefixes (was prune_orphaned_assets)
  - mark_missing_outside_prefixes_safely (was prune_orphans_safely)
- Add restore_cache_states_by_paths for explicit restoration
- Add cleanup_unreferenced_assets for explicit hard-delete when needed
- Update API endpoint /api/assets/prune to use new soft-delete behavior

This preserves user metadata (tags, etc.) when base directories change,
allowing assets to be restored when the original paths become available again.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3114-bf28-73a9-a4d2-85b208fd5462
Co-authored-by: Amp <amp@ampcode.com>
2026-02-05 21:21:46 -08:00
Luke Mino-Altherr
ebb2f5b0e9 refactor: make scanner helper functions public
Rename _sync_root_safely, _prune_orphans_safely, _collect_paths_for_roots,
_build_asset_specs, and _insert_asset_specs to remove underscore prefix
since they are used by seeder.py as part of the public API.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3037-df32-7138-99d8-b4b824d896b3
Co-authored-by: Amp <amp@ampcode.com>
2026-02-05 19:01:46 -08:00
Luke Mino-Altherr
28c4b58dd6 Make ingest_file_from_path and register_existing_asset private
Amp-Thread-ID: https://ampcode.com/threads/T-019c2fe5-a3de-71cc-a6e5-67fe944a101e
Co-authored-by: Amp <amp@ampcode.com>
2026-02-05 14:26:36 -08:00
Luke Mino-Altherr
56e9a75ca2 Decouple orphan pruning from asset seeding
- Remove automatic pruning from scan loop to prevent partial scans from
  deleting assets belonging to other roots
- Add get_all_known_prefixes() helper to get prefixes for all root types
- Add prune_orphans() method to AssetSeeder for explicit pruning
- Add prune_first parameter to start() for optional pre-scan pruning
- Add POST /api/assets/prune endpoint for explicit pruning via API
- Update main.py startup to use prune_first=True for full startup scans
- Add tests for new prune_orphans functionality

Fixes issue where a models-only scan would delete all input/output assets.

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ba0-e004-7229-81bf-452b2f7f57a1
Co-authored-by: Amp <amp@ampcode.com>
2026-02-04 19:38:29 -08:00
Luke Mino-Altherr
fa3749ced7 Add TypedDict types to scanner and bulk_ingest
Amp-Thread-ID: https://ampcode.com/threads/T-019c2af9-4d41-73e9-b38d-78d06bc28a3f
Co-authored-by: Amp <amp@ampcode.com>
2026-02-04 15:32:26 -08:00
Luke Mino-Altherr
abeec3072b refactor(assets): extract scanner logic into service modules
- Create file_utils.py with shared file utilities:
  - get_mtime_ns() - extract mtime in nanoseconds from stat
  - get_size_and_mtime_ns() - get both size and mtime
  - verify_file_unchanged() - check file matches DB mtime/size
  - list_files_recursively() - recursive directory listing

- Create bulk_ingest.py for bulk operations:
  - BulkInsertResult dataclass
  - batch_insert_seed_assets() - batch insert with conflict handling
  - prune_orphaned_assets() - clean up orphaned assets

- Update scanner.py to use new service modules instead of
  calling database queries directly

- Update ingest.py to use shared get_size_and_mtime_ns()

- Export new functions from services/__init__.py

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ae7-f701-716a-a0dd-1feb988732fb
Co-authored-by: Amp <amp@ampcode.com>
2026-02-04 15:17:31 -08:00
Luke Mino-Altherr
b23302f372 refactor(assets): consolidate duplicated query utilities and remove unused code
- Extract shared helpers to database/queries/common.py:
  - MAX_BIND_PARAMS, calculate_rows_per_statement, iter_chunks, iter_row_chunks
  - build_visible_owner_clause

- Remove duplicate _compute_filename_for_asset, consolidate in path_utils.py

- Remove unused get_asset_info_with_tags (duplicated get_asset_detail)

- Remove redundant __all__ from cache_state.py

- Make internal helpers private (_check_is_scalar)

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ad9-9432-7451-94a8-79287dbbb19e
Co-authored-by: Amp <amp@ampcode.com>
2026-02-04 15:04:30 -08:00
Luke Mino-Altherr
37ecc5b663 chore: remove obvious/self-documenting comments from assets package
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 20:14:11 -08:00
Luke Mino-Altherr
d8cb122dfb chore: sort imports in assets package
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 20:02:52 -08:00
Luke Mino-Altherr
0f75def5b5 refactor: move scanner.py out of services to top-level assets module
Scanner is used externally by main.py and server.py for startup/maintenance,
not as part of the regular service layer. Moving it to app/assets/scanner.py
makes the public API clearer.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 19:56:29 -08:00
Luke Mino-Altherr
64d2f51dfc refactor: move scanner to services layer with pure query extraction
- Move app/assets/scanner.py to app/assets/services/scanner.py
- Extract pure queries from fast_db_consistency_pass:
  - get_cache_states_for_prefixes()
  - bulk_set_needs_verify()
  - delete_cache_states_by_ids()
  - delete_orphaned_seed_asset()
- Split prune_orphaned_assets into pure queries:
  - delete_cache_states_outside_prefixes()
  - get_orphaned_seed_asset_ids()
  - delete_assets_by_ids()
- Add reconcile_cache_states_for_root() service function
- Add prune_orphaned_assets() service function
- Remove function injection pattern
- Update imports in main.py, server.py, routes.py

Amp-Thread-ID: https://ampcode.com/threads/T-019c24f1-3385-701b-87e0-8b6bc87e841b
Co-authored-by: Amp <amp@ampcode.com>
2026-02-03 13:08:04 -08:00
Luke Mino-Altherr
17ad7e393f refactor(assets): split queries.py into modular query modules
Split the ~1000 line app/assets/database/queries.py into focused modules:

- queries/asset.py - Asset entity queries (asset_exists_by_hash, get_asset_by_hash)
- queries/asset_info.py - AssetInfo queries (~15 functions)
- queries/cache_state.py - AssetCacheState queries (list_cache_states_by_asset_id,
  pick_best_live_path, prune_orphaned_assets, fast_db_consistency_pass)
- queries/tags.py - Tag queries (8 functions including ensure_tags_exist,
  add/remove tag functions, list_tags_with_usage)
- queries/__init__.py - Re-exports all public functions for backward compatibility

Also adds comprehensive unit tests using in-memory SQLite:
- tests-unit/assets_test/queries/conftest.py - Session fixture
- tests-unit/assets_test/queries/test_asset.py - 5 tests
- tests-unit/assets_test/queries/test_asset_info.py - 23 tests
- tests-unit/assets_test/queries/test_cache_state.py - 8 tests
- tests-unit/assets_test/queries/test_metadata.py - 12 tests for _apply_metadata_filter
- tests-unit/assets_test/queries/test_tags.py - 23 tests

All 71 unit tests pass. Existing integration tests unaffected.

Amp-Thread-ID: https://ampcode.com/threads/T-019c24bb-475b-7442-9ff9-8288edea3345
Co-authored-by: Amp <amp@ampcode.com>
2026-02-03 13:08:04 -08:00
Jedrzej Kosinski
6ea8c128a3 Assets Part 2 - add more endpoints (#12125) 2026-01-31 02:22:05 -05:00
Jedrzej Kosinski
1dc3da6314 Add most basic Asset support for models (#11315)
* Brought over minimal elements from PR 10045 to reproduce seed_assets and register_assets_system without adding anything to the DB or server routes yet, for now making everything sync (can introduce async once everything is cleaned up and brought over)

* Added db script to insert assets stuff, cleaned up some code; assets (models) now get added/rescanned

* Added support for 5 http endpoints for assets

* Replaced Optional with | None in schemas_in.py and schemas_out.py

* Remove two routes that will not be relevant yet in this PR: HEAD /api/assets/hash/<hash> and PUT /api/assets/<id>/preview

* Remove some functions the two deleted endpoints were using

* Don't show assets scan message upon calling /object_info endpoint

* removed unsued import to satisfy ruff

* Simplified hashing function tpye hint and _hash_file_obj

* Satisfied ruff
2026-01-08 22:21:51 -05:00