Commit Graph

36 Commits

Author SHA1 Message Date
Luke Mino-Altherr
4e1c1d8bdb Fix ruff linting issues
- Remove debug print statements
- Remove trailing whitespace on blank lines
- Remove unused pytest import

Amp-Thread-ID: https://ampcode.com/threads/T-019c3a8d-3b4f-75b4-8513-1c77914782f7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
c5e788e610 Skip hidden files and directories in asset scanner
Amp-Thread-ID: https://ampcode.com/threads/T-019c3a75-046e-758d-ac96-08d45281a0c8
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
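For commit c5e788e610 above, a minimal sketch of what skipping hidden entries during a scan could look like. It assumes "hidden" means a dot-prefixed name; the helper name `iter_visible_files` is hypothetical and does not come from the repository.

```python
import os
from typing import Iterator


def iter_visible_files(root: str) -> Iterator[str]:
    """Yield file paths under root, skipping dot-prefixed files and directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if not name.startswith("."):
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place relies on the default top-down traversal of `os.walk`, which avoids listing the contents of hidden directories at all.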
Luke Mino-Altherr
105e54e420 Populate mime_type for assets in scanner and API paths
- Add custom MIME type registrations for model files (.safetensors, .pt, .ckpt, .gguf, .yaml)
- Pass mime_type through SeedAssetSpec to bulk_ingest
- Re-register types before use since server.py mimetypes.init() resets them
- Add tests for bulk ingest mime_type handling

Amp-Thread-ID: https://ampcode.com/threads/T-019c3626-c6ad-7139-a570-62da4e656a1a
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
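For commit 105e54e420 above, a rough sketch of registering custom MIME types and re-registering them before each lookup. The MIME strings and both function names are assumptions made for illustration; the log does not show the values the repository uses.

```python
import mimetypes
from typing import Optional

# Hypothetical extension-to-type mapping; the MIME strings are illustrative only.
CUSTOM_MIME_TYPES = {
    ".safetensors": "application/x-safetensors",
    ".pt": "application/x-pytorch",
    ".gguf": "application/x-gguf",
}


def register_custom_mime_types() -> None:
    """Register the custom types; cheap and safe to call repeatedly."""
    for ext, mime in CUSTOM_MIME_TYPES.items():
        mimetypes.add_type(mime, ext)


def guess_asset_mime_type(path: str) -> Optional[str]:
    # Re-register right before use: a later mimetypes.init() call with no
    # arguments rebuilds the global table and drops earlier add_type() entries.
    register_custom_mime_types()
    mime, _encoding = mimetypes.guess_type(path)
    return mime
```

Re-registering on every lookup trades a few dictionary writes for independence from import order, which matches the motivation given in the commit message.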
Luke Mino-Altherr
53869fb0c7 Fix FK constraint violation in bulk_ingest by filtering dropped assets
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c3626-c6ad-7139-a570-62da4e656a1a
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
7519a556df Add optional blake3 hashing during asset scanning
- Make blake3 import lazy in hashing.py (only imported when needed)
- Add compute_hashes parameter to AssetSeeder.start(), build_asset_specs(), and seed_assets()
- Fix missing tag clearing: include is_missing states in sync when update_missing_tags=True
- Clear is_missing flag on cache states when files are restored with matching mtime/size
- Fix validation error serialization in routes.py (use json.loads(ve.json()))

Amp-Thread-ID: https://ampcode.com/threads/T-019c3614-56d4-74a8-a717-19922d6dbbee
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
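For commit 7519a556df above, a simplified sketch of a lazy `blake3` import combined with an opt-in `compute_hashes` flag. The dictionary layout and the signatures are assumptions; only the names `compute_blake3_hash`, `build_asset_specs`, `compute_hashes`, and `DependencyMissingError` appear in the log.

```python
import os
from typing import List, Optional


class DependencyMissingError(RuntimeError):
    """Raised when an optional dependency is needed but not installed."""


def compute_blake3_hash(path: str, chunk_size: int = 1 << 20) -> str:
    try:
        import blake3  # imported lazily, only when hashing is requested
    except ImportError as exc:
        raise DependencyMissingError("blake3 is required for hashing") from exc
    hasher = blake3.blake3()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_asset_specs(paths: List[str], compute_hashes: bool = False) -> List[dict]:
    """Build one spec per file; hashing is skipped unless compute_hashes is set."""
    specs = []
    for path in paths:
        st = os.stat(path)
        file_hash: Optional[str] = compute_blake3_hash(path) if compute_hashes else None
        specs.append(
            {"path": path, "size_bytes": st.st_size, "mtime_ns": st.st_mtime_ns, "hash": file_hash}
        )
    return specs
```

With the default `compute_hashes=False`, the module can be imported and used on a system without `blake3` installed.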
Luke Mino-Altherr
882ae8df12 Fix magic number and function name typo
- Add MAX_SAFETENSORS_HEADER_SIZE constant in metadata_extract.py
- Fix double 'compute' typo: compute_compute_blake3_hash_async → compute_blake3_hash_async

Amp-Thread-ID: https://ampcode.com/threads/T-019c3550-4dbc-7301-a5e8-e6e23aa2d7b1
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
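For commit 882ae8df12 above, a sketch of how a named cap on the safetensors header size might be applied. The safetensors format begins with an unsigned little-endian 64-bit header length followed by a JSON header. The 8 MiB value and the function name `read_safetensors_header` are assumptions; the log does not show the actual constant value.

```python
import json
import struct
from typing import Any, Dict, Optional

# Assumed cap for illustration; the real value in metadata_extract.py is not shown.
MAX_SAFETENSORS_HEADER_SIZE = 8 * 1024 * 1024


def read_safetensors_header(path: str) -> Optional[Dict[str, Any]]:
    """Return the parsed JSON header, or None if the file looks invalid."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return None
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_SAFETENSORS_HEADER_SIZE:
            return None  # refuse to read an implausibly large header
        raw = f.read(header_len)
    if len(raw) != header_len:
        return None
    try:
        header = json.loads(raw)
    except ValueError:
        return None
    return header if isinstance(header, dict) else None
```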
Luke Mino-Altherr
947ca6b61b Fix is_missing state updates for asset cache states on startup
- Add bulk_update_is_missing() to efficiently update is_missing flag
- Update sync_cache_states_with_filesystem() to mark non-existent files as is_missing=True
- Call restore_cache_states_by_paths() in batch_insert_seed_assets() to restore
  previously-missing states when files reappear during scanning

Amp-Thread-ID: https://ampcode.com/threads/T-019c3177-e591-7666-ac6b-7e05c71c8ebf
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
8587725c20 refactor(bulk_ingest): improve variable naming and add typed dicts
- Rename shorthand variables to explicit names (sp -> spec, aid -> asset_id, etc.)
- Move imports to top of file
- Add TypedDict definitions for AssetRow, CacheStateRow, AssetInfoRow, TagRow, MetadataRow
- Replace bare dict types with typed alternatives

Amp-Thread-ID: https://ampcode.com/threads/T-019c316d-13f7-77f8-b92b-ea7276c3e09c
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
f13b7aeba2 feat: non-destructive asset pruning with is_missing flag
- Add is_missing column to AssetCacheState for soft-delete
- Replace hard-delete pruning with mark_cache_states_missing_outside_prefixes
- Auto-restore missing cache states when files are re-scanned
- Filter out missing cache states from queries by default
- Rename functions for clarity:
  - mark_cache_states_missing_outside_prefixes (was delete_cache_states_outside_prefixes)
  - get_unreferenced_unhashed_asset_ids (was get_orphaned_seed_asset_ids)
  - mark_assets_missing_outside_prefixes (was prune_orphaned_assets)
  - mark_missing_outside_prefixes_safely (was prune_orphans_safely)
- Add restore_cache_states_by_paths for explicit restoration
- Add cleanup_unreferenced_assets for explicit hard-delete when needed
- Update API endpoint /api/assets/prune to use new soft-delete behavior

This preserves user metadata (tags, etc.) when base directories change,
allowing assets to be restored when the original paths become available again.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3114-bf28-73a9-a4d2-85b208fd5462
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
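For commit f13b7aeba2 above, a minimal sketch of the soft-delete idea using the standard-library `sqlite3` module as a stand-in for the project's database layer. The table layout, the signatures, and the helper `list_visible_paths` are assumptions; only the two function names and the `is_missing` column come from the log.

```python
import sqlite3
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE asset_cache_state ("
        "id INTEGER PRIMARY KEY, file_path TEXT UNIQUE, "
        "is_missing INTEGER NOT NULL DEFAULT 0)"
    )


def mark_cache_states_missing_outside_prefixes(conn: sqlite3.Connection, prefixes: List[str]) -> int:
    """Flag rows whose file_path is under none of the prefixes; return rows changed."""
    # substr() comparison gives a literal prefix match without LIKE escaping.
    conditions = " OR ".join("substr(file_path, 1, length(?)) = ?" for _ in prefixes)
    params = [value for prefix in prefixes for value in (prefix, prefix)]
    where = f"AND NOT ({conditions})" if prefixes else ""
    cur = conn.execute(
        f"UPDATE asset_cache_state SET is_missing = 1 WHERE is_missing = 0 {where}",
        params,
    )
    return cur.rowcount


def restore_cache_states_by_paths(conn: sqlite3.Connection, paths: List[str]) -> int:
    """Clear is_missing for the given paths; return rows changed."""
    cur = conn.executemany(
        "UPDATE asset_cache_state SET is_missing = 0 WHERE file_path = ? AND is_missing = 1",
        [(p,) for p in paths],
    )
    return cur.rowcount


def list_visible_paths(conn: sqlite3.Connection) -> List[str]:
    # Queries exclude missing rows by default, mirroring the behavior in the log.
    rows = conn.execute(
        "SELECT file_path FROM asset_cache_state WHERE is_missing = 0 ORDER BY file_path"
    )
    return [row[0] for row in rows]
```

Because rows are flagged rather than deleted, anything keyed to them (tags and other user metadata) survives until the path reappears and the flag is cleared.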
Luke Mino-Altherr
e054a40765 Make ingest_file_from_path and register_existing_asset private
Amp-Thread-ID: https://ampcode.com/threads/T-019c2fe5-a3de-71cc-a6e5-67fe944a101e
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
469576ed87 Add TypedDict types to scanner and bulk_ingest
Amp-Thread-ID: https://ampcode.com/threads/T-019c2af9-4d41-73e9-b38d-78d06bc28a3f
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
bc92ae4a0d refactor(assets): extract scanner logic into service modules
- Create file_utils.py with shared file utilities:
  - get_mtime_ns() - extract mtime in nanoseconds from stat
  - get_size_and_mtime_ns() - get both size and mtime
  - verify_file_unchanged() - check file matches DB mtime/size
  - list_files_recursively() - recursive directory listing

- Create bulk_ingest.py for bulk operations:
  - BulkInsertResult dataclass
  - batch_insert_seed_assets() - batch insert with conflict handling
  - prune_orphaned_assets() - clean up orphaned assets

- Update scanner.py to use new service modules instead of
  calling database queries directly

- Update ingest.py to use shared get_size_and_mtime_ns()

- Export new functions from services/__init__.py

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ae7-f701-716a-a0dd-1feb988732fb
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
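For commit bc92ae4a0d above, a sketch of what the stat-based helpers in file_utils.py might look like. The function names are taken from the log, but the signatures and bodies here are assumptions.

```python
import os
from typing import Tuple


def get_mtime_ns(stat_result: os.stat_result) -> int:
    """Return the modification time in integer nanoseconds."""
    return stat_result.st_mtime_ns


def get_size_and_mtime_ns(path: str) -> Tuple[int, int]:
    """Stat the file once and return (size in bytes, mtime in nanoseconds)."""
    st = os.stat(path)
    return st.st_size, get_mtime_ns(st)


def verify_file_unchanged(path: str, expected_size: int, expected_mtime_ns: int) -> bool:
    """Return True if the file exists and matches the recorded size and mtime."""
    try:
        size, mtime_ns = get_size_and_mtime_ns(path)
    except OSError:
        return False  # a missing or unreadable file counts as changed
    return size == expected_size and mtime_ns == expected_mtime_ns
```

Comparing size and mtime is a cheap heuristic: it avoids re-hashing large files but cannot detect an edit that preserves both values.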
Luke Mino-Altherr
8c4eb9a659 refactor(assets): consolidate duplicated query utilities and remove unused code
- Extract shared helpers to database/queries/common.py:
  - MAX_BIND_PARAMS, calculate_rows_per_statement, iter_chunks, iter_row_chunks
  - build_visible_owner_clause

- Remove duplicate _compute_filename_for_asset, consolidate in path_utils.py

- Remove unused get_asset_info_with_tags (duplicated get_asset_detail)

- Remove redundant __all__ from cache_state.py

- Make internal helpers private (_check_is_scalar)

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ad9-9432-7451-94a8-79287dbbb19e
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:38 -08:00
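For commit 8c4eb9a659 above, a sketch of how the shared chunking helpers could keep a bulk statement under the database's bound-parameter limit. The helper names come from the log; the value of `MAX_BIND_PARAMS` and the signatures are assumptions.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Assumed limit for illustration; SQLite builds before 3.32.0 default to 999
# bound variables per statement, so a value below that leaves headroom.
MAX_BIND_PARAMS = 800


def calculate_rows_per_statement(columns_per_row: int) -> int:
    """Number of rows that fit in one statement without exceeding the limit."""
    return max(1, MAX_BIND_PARAMS // max(1, columns_per_row))


def iter_chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def iter_row_chunks(rows: Sequence[dict], columns_per_row: int) -> Iterator[Sequence[dict]]:
    """Chunk rows so each chunk binds at most MAX_BIND_PARAMS values."""
    yield from iter_chunks(rows, calculate_rows_per_statement(columns_per_row))
```

Each chunk can then be passed to one multi-row INSERT, so the number of statements scales with the row count divided by the per-statement capacity.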
Luke Mino-Altherr
5060b9f67e refactor: eliminate manager layer, routes call services directly
- Delete app/assets/manager.py
- Move upload logic (upload_from_temp_path, create_from_hash) to ingest service
- Add HashMismatchError and DependencyMissingError to ingest service
- Add UploadResult schema for upload responses
- Update routes.py to import services directly and do schema conversion inline
- Add asset lookup/listing service functions to asset_management.py

Routes now call the service layer directly, removing an unnecessary
layer of indirection. The manager was only converting between service
dataclasses and Pydantic response schemas.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
d06a538a04 refactor: require blake3 package directly in hashing module
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
2ddf91c6d9 refactor: add explicit types to asset service functions
- Add typed result dataclasses: IngestResult, AddTagsResult,
  RemoveTagsResult, SetTagsResult, TagUsage
- Add UserMetadata type alias for user_metadata parameters
- Type helper functions with Session parameters
- Use TypedDicts at query layer to avoid circular imports
- Update manager.py and tests to use attribute access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
4695694263 chore: remove obvious/self-documenting comments from assets package
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
a6a8d3ad74 chore: remove module-level comments and docstrings from assets package
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
fe59234476 chore: sort imports in assets package
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
227b68c696 refactor: move scanner.py out of services to top-level assets module
Scanner is used externally by main.py and server.py for startup/maintenance,
not as part of the regular service layer. Moving it to app/assets/scanner.py
makes the public API clearer.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:38 -08:00
Luke Mino-Altherr
80dcb9f896 chore: remove unused get_utc_now import
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
1b169a2b2e refactor: use query functions instead of direct ORM modifications in service layer
Add update_asset_info_name and update_asset_info_updated_at query functions
and update asset_management.py to use them instead of modifying ORM objects
directly. This ensures the service layer only uses explicit operations from
the queries package.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
cff2f43bb8 refactor: use explicit dataclasses instead of ORM objects in service layer
Replace dict/ORM object returns with explicit dataclasses to fix
DetachedInstanceError when accessing ORM attributes after session closes.

- Add app/assets/services/schemas.py with AssetData, AssetInfoData,
  AssetDetailResult, and RegisterAssetResult dataclasses
- Update asset_management.py and ingest.py to return dataclasses
- Update manager.py to use attribute access on dataclasses
- Fix created_new to be False in create_asset_from_hash (content exists)
- Add DependencyMissingError for better blake3 missing error handling
- Update tests to use attribute access instead of dict subscripting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
2c9fd1c785 fix: handle missing blake3 module gracefully to prevent server crash
Make blake3 an optional import that fails gracefully at import time,
with a clear error message when hashing functions are actually called.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
0b742326ea refactor: remove try-finally wrapper in seed_assets by extracting helpers
Extract focused helper functions to eliminate the try-finally block that
wrapped ~50 lines just for logging. The new helpers (_collect_paths_for_roots,
_build_asset_specs, _insert_asset_specs) make seed_assets a simple linear flow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
6d6ab81b72 refactor: flatten nested try blocks and if statements in assets package
Extract helper functions to eliminate nested try-except blocks in scanner.py
and remove duplicated type-checking logic in asset_info.py. Simplify nested
conditionals in asset_management.py for clearer control flow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
915d21afcb refactor: improve function naming for clarity and consistency
Rename functions to use clearer verb-based names:
- pick_best_live_path → select_best_live_path
- escape_like_prefix → escape_sql_like_string
- list_tree → list_files_recursively
- check_asset_file_fast → verify_asset_file_unchanged
- _seed_from_paths_batch → _batch_insert_assets_from_paths
- reconcile_cache_states_for_root → sync_cache_states_with_filesystem
- touch_asset_info_by_id → update_asset_info_access_time
- replace_asset_info_metadata_projection → set_asset_info_metadata
- expand_metadata_to_rows → convert_metadata_to_rows
- _rows_per_stmt → _calculate_rows_per_statement
- ensure_within_base → validate_path_within_base
- _cleanup_temp → _delete_temp_file_if_exists
- validate_hash_format → normalize_and_validate_hash
- get_relative_to_root_category_path_of_asset → get_asset_category_and_relative_path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
1f9272bc94 refactor: rename functions to verb-based naming convention
Rename functions across app/assets/ to follow verb-based naming:
- is_scalar → check_is_scalar
- project_kv → expand_metadata_to_rows
- _visible_owner_clause → _build_visible_owner_clause
- _chunk_rows → _iter_row_chunks
- _at_least_one → _validate_at_least_one_field
- _tags_norm → _normalize_tags_field
- _ser_dt → _serialize_datetime
- _ser_updated → _serialize_updated_at
- _error_response → _build_error_response
- _validation_error_response → _build_validation_error_response
- file_sender → stream_file_chunks
- seed_assets_endpoint → seed_assets
- utcnow → get_utc_now
- _safe_sort_field → _validate_sort_field
- _safe_filename → _sanitize_filename
- fast_asset_file_check → check_asset_file_fast
- prefixes_for_root → get_prefixes_for_root
- blake3_hash → compute_blake3_hash
- blake3_hash_async → compute_blake3_hash_async
- _is_within → _check_is_within
- _rel → _compute_relative

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
c546e9315b fix: ruff linting errors and add comprehensive test coverage for asset queries
- Fix unused imports in routes.py, asset.py, manager.py, asset_management.py, ingest.py
- Fix whitespace issues in upload.py, asset_info.py, ingest.py
- Fix typo in manager.py (stray character after result["asset"])
- Fix broken import in test_metadata.py (project_kv moved to asset_info.py)
- Add fixture override in queries/conftest.py for unit test isolation

Add 48 new tests covering all previously untested query functions:
- asset.py: upsert_asset, bulk_insert_assets
- cache_state.py: upsert_cache_state, delete_cache_states_outside_prefixes,
  get_orphaned_seed_asset_ids, delete_assets_by_ids, get_cache_states_for_prefixes,
  bulk_set_needs_verify, delete_cache_states_by_ids, delete_orphaned_seed_asset,
  bulk_insert_cache_states_ignore_conflicts, get_cache_states_by_paths_and_asset_ids
- asset_info.py: insert_asset_info, get_or_create_asset_info,
  update_asset_info_timestamps, replace_asset_info_metadata_projection,
  bulk_insert_asset_infos_ignore_conflicts, get_asset_info_ids_by_ids
- tags.py: bulk_insert_tags_and_meta

Total: 119 tests pass (up from 71)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
e8edc4aa93 Move get_comfy_models_folders to path_utils.py to avoid late import
Amp-Thread-ID: https://ampcode.com/threads/T-019c2510-33fa-7199-ae4b-bc31102277a7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
24ca007bf6 Refactor helpers.py: move functions to their respective modules
- Move scanner-only functions to scanner.py
- Move query-only functions (is_scalar, project_kv) to asset_info.py
- Move get_query_dict to routes.py
- Create path_utils.py service for path-related functions
- Reduce helpers.py to shared utilities only

Amp-Thread-ID: https://ampcode.com/threads/T-019c2510-33fa-7199-ae4b-bc31102277a7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
e4efb072b0 Move hashing.py to services directory
Amp-Thread-ID: https://ampcode.com/threads/T-019c2510-33fa-7199-ae4b-bc31102277a7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
ef97ea8880 refactor: move bulk_ops to queries and scanner service
- Delete bulk_ops.py, moving logic to appropriate layers
- Add bulk insert query functions:
  - queries/asset.bulk_insert_assets
  - queries/cache_state.bulk_insert_cache_states_ignore_conflicts
  - queries/cache_state.get_cache_states_by_paths_and_asset_ids
  - queries/asset_info.bulk_insert_asset_infos_ignore_conflicts
  - queries/asset_info.get_asset_info_ids_by_ids
  - queries/tags.bulk_insert_tags_and_meta
- Move seed_from_paths_batch orchestration to scanner._seed_from_paths_batch

Amp-Thread-ID: https://ampcode.com/threads/T-019c24fd-157d-776a-ad24-4f19cf5d3afe
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
48bfd29fb6 refactor: move scanner to services layer with pure query extraction
- Move app/assets/scanner.py to app/assets/services/scanner.py
- Extract pure queries from fast_db_consistency_pass:
  - get_cache_states_for_prefixes()
  - bulk_set_needs_verify()
  - delete_cache_states_by_ids()
  - delete_orphaned_seed_asset()
- Split prune_orphaned_assets into pure queries:
  - delete_cache_states_outside_prefixes()
  - get_orphaned_seed_asset_ids()
  - delete_assets_by_ids()
- Add reconcile_cache_states_for_root() service function
- Add prune_orphaned_assets() service function
- Remove function injection pattern
- Update imports in main.py, server.py, routes.py

Amp-Thread-ID: https://ampcode.com/threads/T-019c24f1-3385-701b-87e0-8b6bc87e841b
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
7372858f12 refactor: move in-function imports to top-level and remove keyword-only argument pattern
- Move imports from inside functions to module top-level in:
  - app/assets/database/queries/asset.py
  - app/assets/database/queries/asset_info.py
  - app/assets/database/queries/cache_state.py
  - app/assets/manager.py
  - app/assets/services/asset_management.py
  - app/assets/services/ingest.py

- Remove keyword-only argument markers (*,) from app/assets/ to match codebase conventions

Amp-Thread-ID: https://ampcode.com/threads/T-019c24eb-bfa2-727f-8212-8bc976048604
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:37 -08:00
Luke Mino-Altherr
02117ae130 Refactor asset database: separate business logic from queries
Architecture changes:
- API Routes -> manager.py (thin adapter) -> services/ (business logic) -> queries/ (atomic DB ops)
- Services own session lifecycle via create_session()
- Queries accept Session as parameter, do single-table atomic operations

New app/assets/services/ layer:
- __init__.py - exports all service functions
- ingest.py - ingest_file_from_path(), register_existing_asset()
- asset_management.py - get_asset_detail(), update_asset_metadata(), delete_asset_reference(), set_asset_preview()
- tagging.py - apply_tags(), remove_tags(), list_tags()

Removed from queries/asset_info.py:
- ingest_fs_asset (moved to services/ingest.py as ingest_file_from_path)
- update_asset_info_full (moved to services/asset_management.py as update_asset_metadata)
- create_asset_info_for_existing_asset (moved to services/ingest.py as register_existing_asset)

Updated manager.py:
- Now a thin adapter that transforms API schemas to/from service calls
- Delegates all business logic to services layer
- No longer imports sqlalchemy.orm.Session or models directly

Test updates:
- Fixed test_cache_state.py import of pick_best_live_path (moved to helpers.py)
- Added comprehensive service layer tests (41 new tests)
- All 112 query + service tests pass

Amp-Thread-ID: https://ampcode.com/threads/T-019c24e2-7ae4-707f-ad19-c775ed8b82b5
Co-authored-by: Amp <amp@ampcode.com>
2026-02-11 17:41:37 -08:00
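For commit 02117ae130 above, a small sketch of the layering it describes: services own the session lifecycle, and queries take the session as a parameter and perform a single-table operation. The standard-library `sqlite3` module stands in for the SQLAlchemy session here, and every function name except `create_session` is hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def create_session(db_path: str) -> Iterator[sqlite3.Connection]:
    """Open a connection, commit on success, roll back on error, always close."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


# --- query layer: receives the session, touches one table, never commits ---
def insert_asset_row(session: sqlite3.Connection, name: str) -> int:
    cur = session.execute("INSERT INTO assets (name) VALUES (?)", (name,))
    return cur.lastrowid


def select_asset_names(session: sqlite3.Connection) -> List[str]:
    return [row[0] for row in session.execute("SELECT name FROM assets ORDER BY id")]


# --- service layer: owns the session and composes query calls ---
def register_asset(db_path: str, name: str) -> int:
    with create_session(db_path) as session:
        session.execute(
            "CREATE TABLE IF NOT EXISTS assets (id INTEGER PRIMARY KEY, name TEXT)"
        )
        return insert_asset_row(session, name)


def list_asset_names(db_path: str) -> List[str]:
    with create_session(db_path) as session:
        return select_asset_names(session)
```

Keeping commits out of the query functions lets a service combine several queries into one transaction.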