Commit Graph

6 Commits

Author SHA1 Message Date
Luke Mino-Altherr
3fee0962c6 feat: implement two-phase scanning architecture (fast + enrich)
Phase 1 (FAST): Creates stub records with filesystem metadata only
- path, size, mtime - no file content reading
- Populates asset database quickly on startup

Phase 2 (ENRICH): Extracts metadata and computes hashes
- Safetensors header parsing, MIME types
- Optional blake3 hash computation
- Updates existing stub records

Changes:
- Add ScanPhase enum (FAST, ENRICH, FULL)
- Add enrichment_level column to AssetCacheState (0=stub, 1=metadata, 2=hashed)
- Add build_stub_specs() for fast scanning without metadata extraction
- Add get_unenriched_cache_states(), enrich_asset(), enrich_assets_batch()
- Add start_fast(), start_enrich() convenience methods to AssetSeeder
- Update start() to accept phase parameter (defaults to FULL)
- Split _run_scan() into _run_fast_phase() and _run_enrich_phase()
- Add migration 0003_add_enrichment_level.py
- Update tests for new architecture

Amp-Thread-ID: https://ampcode.com/threads/T-019c4eef-1568-778f-aede-38254728f848
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
f13b7aeba2 feat: non-destructive asset pruning with is_missing flag
- Add is_missing column to AssetCacheState for soft-delete
- Replace hard-delete pruning with mark_cache_states_missing_outside_prefixes
- Auto-restore missing cache states when files are re-scanned
- Filter out missing cache states from queries by default
- Rename functions for clarity:
  - mark_cache_states_missing_outside_prefixes (was delete_cache_states_outside_prefixes)
  - get_unreferenced_unhashed_asset_ids (was get_orphaned_seed_asset_ids)
  - mark_assets_missing_outside_prefixes (was prune_orphaned_assets)
  - mark_missing_outside_prefixes_safely (was prune_orphans_safely)
- Add restore_cache_states_by_paths for explicit restoration
- Add cleanup_unreferenced_assets for explicit hard-delete when needed
- Update API endpoint /api/assets/prune to use new soft-delete behavior

This preserves user metadata (tags, etc.) when base directories change,
allowing assets to be restored when the original paths become available again.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3114-bf28-73a9-a4d2-85b208fd5462
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
8c4eb9a659 refactor(assets): consolidate duplicated query utilities and remove unused code
- Extract shared helpers to database/queries/common.py:
  - MAX_BIND_PARAMS, calculate_rows_per_statement, iter_chunks, iter_row_chunks
  - build_visible_owner_clause

- Remove duplicate _compute_filename_for_asset, consolidate in path_utils.py

- Remove unused get_asset_info_with_tags (duplicated get_asset_detail)

- Remove redundant __all__ from cache_state.py

- Make internal helpers private (_check_is_scalar)

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ad9-9432-7451-94a8-79287dbbb19e
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
fe59234476 chore: sort imports in assets package
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
1f9272bc94 refactor: rename functions to verb-based naming convention
Rename functions across app/assets/ to follow verb-based naming:
- is_scalar → check_is_scalar
- project_kv → expand_metadata_to_rows
- _visible_owner_clause → _build_visible_owner_clause
- _chunk_rows → _iter_row_chunks
- _at_least_one → _validate_at_least_one_field
- _tags_norm → _normalize_tags_field
- _ser_dt → _serialize_datetime
- _ser_updated → _serialize_updated_at
- _error_response → _build_error_response
- _validation_error_response → _build_validation_error_response
- file_sender → stream_file_chunks
- seed_assets_endpoint → seed_assets
- utcnow → get_utc_now
- _safe_sort_field → _validate_sort_field
- _safe_filename → _sanitize_filename
- fast_asset_file_check → check_asset_file_fast
- prefixes_for_root → get_prefixes_for_root
- blake3_hash → compute_blake3_hash
- blake3_hash_async → compute_blake3_hash_async
- _is_within → _check_is_within
- _rel → _compute_relative

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Jedrzej Kosinski
1dc3da6314 Add most basic Asset support for models (#11315)
* Brought over minimal elements from PR 10045 to reproduce seed_assets and register_assets_system without adding anything to the DB or server routes yet, for now making everything sync (can introduce async once everything is cleaned up and brought over)

* Added db script to insert assets stuff, cleaned up some code; assets (models) now get added/rescanned

* Added support for 5 http endpoints for assets

* Replaced Optional with | None in schemas_in.py and schemas_out.py

* Remove two routes that will not be relevant yet in this PR: HEAD /api/assets/hash/<hash> and PUT /api/assets/<id>/preview

* Remove some functions the two deleted endpoints were using

* Don't show assets scan message upon calling /object_info endpoint

* removed unsued import to satisfy ruff

* Simplified hashing function tpye hint and _hash_file_obj

* Satisfied ruff
2026-01-08 22:21:51 -05:00