Commit Graph

9 Commits

Author SHA1 Message Date
Luke Mino-Altherr
105e54e420 Populate mime_type for assets in scanner and API paths
- Add custom MIME type registrations for model files (.safetensors, .pt, .ckpt, .gguf, .yaml)
- Pass mime_type through SeedAssetSpec to bulk_ingest
- Re-register types before use since server.py mimetypes.init() resets them
- Add tests for bulk ingest mime_type handling

Amp-Thread-ID: https://ampcode.com/threads/T-019c3626-c6ad-7139-a570-62da4e656a1a
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
53869fb0c7 Fix FK constraint violation in bulk_ingest by filtering dropped assets
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c3626-c6ad-7139-a570-62da4e656a1a
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
7519a556df Add optional blake3 hashing during asset scanning
- Make blake3 import lazy in hashing.py (only imported when needed)
- Add compute_hashes parameter to AssetSeeder.start(), build_asset_specs(), and seed_assets()
- Fix missing tag clearing: include is_missing states in sync when update_missing_tags=True
- Clear is_missing flag on cache states when files are restored with matching mtime/size
- Fix validation error serialization in routes.py (use json.loads(ve.json()))

Amp-Thread-ID: https://ampcode.com/threads/T-019c3614-56d4-74a8-a717-19922d6dbbee
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
947ca6b61b Fix is_missing state updates for asset cache states on startup
- Add bulk_update_is_missing() to efficiently update is_missing flag
- Update sync_cache_states_with_filesystem() to mark non-existent files as is_missing=True
- Call restore_cache_states_by_paths() in batch_insert_seed_assets() to restore
  previously-missing states when files reappear during scanning

Amp-Thread-ID: https://ampcode.com/threads/T-019c3177-e591-7666-ac6b-7e05c71c8ebf
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
8587725c20 refactor(bulk_ingest): improve variable naming and add typed dicts
- Rename shorthand variables to explicit names (sp -> spec, aid -> asset_id, etc.)
- Move imports to top of file
- Add TypedDict definitions for AssetRow, CacheStateRow, AssetInfoRow, TagRow, MetadataRow
- Replace bare dict types with typed alternatives

Amp-Thread-ID: https://ampcode.com/threads/T-019c316d-13f7-77f8-b92b-ea7276c3e09c
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
f13b7aeba2 feat: non-destructive asset pruning with is_missing flag
- Add is_missing column to AssetCacheState for soft-delete
- Replace hard-delete pruning with mark_cache_states_missing_outside_prefixes
- Auto-restore missing cache states when files are re-scanned
- Filter out missing cache states from queries by default
- Rename functions for clarity:
  - mark_cache_states_missing_outside_prefixes (was delete_cache_states_outside_prefixes)
  - get_unreferenced_unhashed_asset_ids (was get_orphaned_seed_asset_ids)
  - mark_assets_missing_outside_prefixes (was prune_orphaned_assets)
  - mark_missing_outside_prefixes_safely (was prune_orphans_safely)
- Add restore_cache_states_by_paths for explicit restoration
- Add cleanup_unreferenced_assets for explicit hard-delete when needed
- Update API endpoint /api/assets/prune to use new soft-delete behavior

This preserves user metadata (tags, etc.) when base directories change,
allowing assets to be restored when the original paths become available again.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3114-bf28-73a9-a4d2-85b208fd5462
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
e054a40765 Make ingest_file_from_path and register_existing_asset private
Amp-Thread-ID: https://ampcode.com/threads/T-019c2fe5-a3de-71cc-a6e5-67fe944a101e
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
469576ed87 Add TypedDict types to scanner and bulk_ingest
Amp-Thread-ID: https://ampcode.com/threads/T-019c2af9-4d41-73e9-b38d-78d06bc28a3f
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
bc92ae4a0d refactor(assets): extract scanner logic into service modules
- Create file_utils.py with shared file utilities:
  - get_mtime_ns() - extract mtime in nanoseconds from stat
  - get_size_and_mtime_ns() - get both size and mtime
  - verify_file_unchanged() - check file matches DB mtime/size
  - list_files_recursively() - recursive directory listing

- Create bulk_ingest.py for bulk operations:
  - BulkInsertResult dataclass
  - batch_insert_seed_assets() - batch insert with conflict handling
  - prune_orphaned_assets() - clean up orphaned assets

- Update scanner.py to use new service modules instead of
  calling database queries directly

- Update ingest.py to use shared get_size_and_mtime_ns()

- Export new functions from services/__init__.py

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ae7-f701-716a-a0dd-1feb988732fb
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00