From d460d409d2c9596cce9908d8061030dc597bad31 Mon Sep 17 00:00:00 2001 From: Luke Mino-Altherr Date: Thu, 5 Feb 2026 21:35:58 -0800 Subject: [PATCH] docs: remove background-asset-seeder.md Amp-Thread-ID: https://ampcode.com/threads/T-019c316d-13f7-77f8-b92b-ea7276c3e09c Co-authored-by: Amp --- docs/design/background-asset-seeder.md | 195 ------------------------- 1 file changed, 195 deletions(-) delete mode 100644 docs/design/background-asset-seeder.md diff --git a/docs/design/background-asset-seeder.md b/docs/design/background-asset-seeder.md deleted file mode 100644 index 496560fca..000000000 --- a/docs/design/background-asset-seeder.md +++ /dev/null @@ -1,195 +0,0 @@ -# Background Asset Seeder Design Document - -## Problem Statement - -The `seed_assets` function in `app/assets/scanner.py` scans filesystem directories and imports assets into the database. Currently it runs synchronously, which causes two problems: - -1. **Startup blocking**: When ComfyUI starts, `setup_database()` calls `seed_assets(["models"])` synchronously. If the models directory contains thousands of files, startup is delayed significantly before the UI becomes available. - -2. **API request blocking**: The `POST /api/assets/seed` endpoint runs synchronously, blocking the HTTP request until scanning completes. For large directories, this causes request timeouts and a poor user experience. - -## Goals - -- Move asset scanning to a background thread so startup and API requests return immediately -- Provide visibility into scan progress via API and WebSocket events -- Support graceful cancellation of in-progress scans -- Maintain backward compatibility for tests that rely on synchronous behavior -- Ensure thread safety when accessing shared state - -## Non-Goals - -- Targeted/priority scanning of specific paths (addressed separately via synchronous `scan_paths()`) -- Parallel scanning across multiple threads -- Persistent scan state across restarts - ---- - -## Architecture Overview - -### Component: AssetSeeder Singleton - -A new `AssetSeeder` class in `app/assets/seeder.py` manages background scanning with the following responsibilities: - -- Owns a single background `threading.Thread` for scanning work -- Tracks scan state and progress in a thread-safe manner -- Provides cancellation support via `threading.Event` -- Emits WebSocket events for UI progress updates - -### State Machine - -``` -IDLE ──start()──► RUNNING ──(completes)──► IDLE - │ - cancel() - │y - ▼ - CANCELLING ──(thread exits)──► IDLE -``` - -### Integration Points - -| Component | Change | -|-----------|--------| -| `main.py` | Call `asset_seeder.start()` (non-blocking) instead of `seed_assets()` | -| `main.py` | Call `asset_seeder.shutdown()` in `finally` block alongside `cleanup_temp()` | -| API routes | New endpoints for status and cancellation | -| WebSocket | Emit progress events during scanning | -| Test helper | Use `?wait=true` query param for synchronous behavior | - ---- - -## Tasks - -### Task 1: Create AssetSeeder Class - -**Description**: Implement the core `AssetSeeder` singleton class with thread management, state tracking, and cancellation support. - -**Acceptance Criteria**: -- [ ] `AssetSeeder` is a singleton accessible via module-level instance -- [ ] `State` enum with values: `IDLE`, `RUNNING`, `CANCELLING` -- [ ] `start(roots)` method spawns a daemon thread and returns immediately -- [ ] `start()` is idempotent—calling while already running is a no-op and returns `False` -- [ ] `cancel()` method signals the thread to stop gracefully -- [ ] `wait(timeout)` method blocks until thread completes or timeout expires -- [ ] `get_status()` returns current state and progress information -- [ ] All state access is protected by `threading.Lock` -- [ ] Thread creates its own database sessions (no session sharing across threads) -- [ ] Progress tuple tracks: `(scanned, total, created, skipped)` -- [ ] Errors during scanning are captured and available via `get_status()` - -### Task 2: Add Cancellation Checkpoints - -**Description**: Modify the scanning logic to check for cancellation between batches, allowing graceful early termination. - -**Acceptance Criteria**: -- [ ] `threading.Event` is checked between batch operations -- [ ] When cancellation is requested, current batch completes before stopping -- [ ] Partial progress is committed (assets already scanned remain in database) -- [ ] State transitions to `IDLE` after cancellation completes -- [ ] Cancellation is logged with partial progress statistics - -### Task 3: Update Startup Integration - -**Description**: Modify `main.py` to use non-blocking asset seeding at startup. - -**Acceptance Criteria**: -- [ ] `setup_database()` calls `asset_seeder.start(roots=["models"])` instead of `seed_assets()` -- [ ] Startup proceeds immediately without waiting for scan completion -- [ ] `asset_seeder.shutdown()` called in `finally` block alongside `cleanup_temp()` -- [ ] `--disable-assets-autoscan` flag continues to skip seeding entirely -- [ ] Startup logs indicate background scan was initiated (not completed) - -### Task 4: Create API Endpoints - -**Description**: Add REST endpoints for triggering, monitoring, and cancelling background scans. - -**Endpoints**: - -#### POST /api/assets/seed -Trigger a background scan for specified roots. - -**Acceptance Criteria**: -- [ ] Accepts `{"roots": ["models", "input", "output"]}` in request body -- [ ] Returns `202 Accepted` with `{"status": "started"}` when scan begins -- [ ] Returns `409 Conflict` with `{"status": "already_running"}` if scan in progress -- [ ] Supports `?wait=true` query param for synchronous behavior (blocks until complete) -- [ ] With `?wait=true`, returns `200 OK` with final statistics on completion - -#### GET /api/assets/seed/status -Get current scan status and progress. - -**Acceptance Criteria**: -- [ ] Returns `{"state": "IDLE|RUNNING|CANCELLING", "progress": {...}, "errors": [...]}` -- [ ] Progress object includes: `scanned`, `total`, `created`, `skipped` -- [ ] When idle, progress reflects last completed scan (or null if never run) -- [ ] Errors array contains messages from any failures during last/current scan - -#### POST /api/assets/seed/cancel -Request cancellation of in-progress scan. - -**Acceptance Criteria**: -- [ ] Returns `200 OK` with `{"status": "cancelling"}` if scan was running -- [ ] Returns `200 OK` with `{"status": "idle"}` if no scan was running -- [ ] Cancellation is graceful—does not corrupt database state - -### Task 5: Add WebSocket Progress Events - -**Description**: Emit WebSocket events during scanning so the UI can display progress. - -**Acceptance Criteria**: -- [ ] Event type: `assets.seed.started` with `{"roots": [...], "total": N}` -- [ ] Event type: `assets.seed.progress` with `{"scanned": N, "total": M, "created": C}` -- [ ] Event type: `assets.seed.completed` with final statistics -- [ ] Event type: `assets.seed.cancelled` if scan was cancelled -- [ ] Event type: `assets.seed.error` if scan failed with error message -- [ ] Progress events emitted at reasonable intervals (not every file, ~every 100 files or 1 second) - -### Task 6: Update Test Helper - -**Description**: Modify the test helper to use synchronous behavior via query parameter. - -**Acceptance Criteria**: -- [ ] `trigger_sync_seed_assets()` uses `?wait=true` query parameter -- [ ] Tests continue to pass with synchronous blocking behavior -- [ ] Remove artificial `time.sleep(0.2)` delay (no longer needed with `wait=true`) - -### Task 7: Unit Tests for AssetSeeder - -**Description**: Add unit tests covering the seeder state machine and thread safety. - -**Acceptance Criteria**: -- [ ] Test: `start()` transitions state from IDLE to RUNNING -- [ ] Test: `start()` while RUNNING returns False (idempotent) -- [ ] Test: `cancel()` transitions state from RUNNING to CANCELLING -- [ ] Test: `wait()` blocks until thread completes -- [ ] Test: `wait(timeout)` returns False if timeout expires -- [ ] Test: `get_status()` returns correct progress during scan -- [ ] Test: Concurrent `start()` calls are safe (only one thread spawned) -- [ ] Test: Scan commits partial progress on cancellation -- [ ] Test: Database errors are captured in status, don't crash thread - ---- - -## Thread Safety Considerations - -| Shared Resource | Protection Strategy | -|-----------------|---------------------| -| `_state` enum | Protected by `threading.Lock` | -| `_progress` tuple | Protected by `threading.Lock` | -| `_errors` list | Protected by `threading.Lock` | -| `_thread` reference | Protected by `threading.Lock` | -| `_cancel_event` | `threading.Event` (inherently thread-safe) | -| Database sessions | Created per-operation inside thread (no sharing) | - -## Error Handling - -- Database connection failures: Log error, set state to IDLE, populate errors list -- Individual file scan failures: Log warning, continue with next file, increment error count -- Thread crashes: Caught by outer try/except, state reset to IDLE, error captured - -## Future Considerations - -- **Priority queue**: If targeted scans need to be non-blocking in the future, the seeder could be extended with a priority queue -- **Persistent state**: Scan progress could be persisted to allow resume after restart -- **Parallel scanning**: Multiple threads could scan different roots concurrently (requires careful session management) -- **Throttling**: If scanning competes with generation (e.g., disk I/O contention when hashing large files), add configurable sleep between batches. Currently considered low risk since scanning is I/O-bound and generation is GPU-bound.