* mm: re-instantiate smart memory for VRAM
* mm: restore non-dynamic smart memory
By popular demand. We aren't quite ready for the deprecation, as
non-dynamic-enabled GPUs and some high-VRAM custom model loader setups
prefer the old fully hands-on behaviour.
* memory_management: Add direct read-to-GPU mode
Make destination optional (or make it optionally GPU) and use aimdo
to file_read direct to GPU.
* ops: Remove stream pin buffers and use aimdo reads
This consumed too much RAM and it's better to just take the hit of
the CPU syncing back the stream on a short ring buffer. Aimdo
implements this, so just rip the stream pin buffer out of comfy.
* model_management: allow active pin registration movement
It's better to just let the active model load past the pin limit as
pins and let the pins move around. This saves the HDD and SATA
people disk traffic while only costing a few GPU syncs.
* utils: use aimdo file handle
This opens the file on Windows with more favourable flags.
* mp: only count the model proper for loaded_ram and vram
Exclude live loras from the numbers to avoid the case where the reported
loaded memory exceeds the size of the model.
This caused me confusion in the Kijai visualizer when a model looked
fully loaded but was hitting disk due to this accounting discrepancy.
* utils: add bit reverse utility
Useful for maximally scattering something ordered.
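For illustration only, a minimal sketch of what a bit reverse helper
can look like. The name `bit_reverse` and its signature are assumptions
for this example, not the utility added to comfy's utils.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value` (hypothetical helper)."""
    result = 0
    for _ in range(bits):
        # Shift the lowest remaining bit of value onto the end of result.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Walking 0..7 in bit-reversed order makes consecutive picks land far apart.
order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```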
* pinned_memory: Implement offload balancing
Use a max scatter algorithm to prioritize pins of the same size such
that when doing a little bit of offloading it gets scattered, allowing
the prefetcher to more evenly swallow the offload.
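A rough sketch of the scattering idea, assuming the pins of one size
are held in load order and ranked by the bit-reversed value of their
index. `scatter_order` and `pick_offload` are hypothetical names; the
real pinned_memory code may rank and select differently.

```python
def scatter_order(n: int) -> list[int]:
    """Return indices 0..n-1 sorted by their bit-reversed value."""
    bits = max(1, (n - 1).bit_length())

    def rev(i: int) -> int:
        r = 0
        for _ in range(bits):
            r = (r << 1) | (i & 1)
            i >>= 1
        return r

    return sorted(range(n), key=rev)


def pick_offload(n_pins: int, n_offload: int) -> list[int]:
    """Choose which of n_pins equally sized pins to offload first."""
    return sorted(scatter_order(n_pins)[:n_offload])


print(pick_offload(8, 2))  # [0, 4]
print(pick_offload(8, 4))  # [0, 2, 4, 6]
```

With a small offload the chosen pins are spread across the model
instead of clustering at one end, so each prefetch window sees roughly
the same amount of extra work.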
* comfy-aimdo 0.4.7
Aimdo 0.4.7 implements VRAM buffer exhaustion prediction to avoid
early speculative loading of weights that definitely won't fit once
the inference gets further in.
* model-prefetch: consolidate pin ensures on the sync point
This could happen mid prefetch block, causing a sync of the entire
block and losing overlap. Get ahead of the problem with a free-down
at the natural compute stream sync point.
* mm: Put a 2GB min on the pin ceiling
This is reasonably bad if it starts causing swap pressure, more so than
during normal RAM-cache proceedings. Clamp it.
* add --fast-disk
* Move dataset/text nodes to text category
* Rename category utils into utilities
* Rename category api node into partner
* Move categories conditioning, latent, sampling, model_patches, training, etc. under model category
* Dispatch partner nodes into 3d, audio, image, text, video categories
* Move PreviewAny node to utilities category
* openapi: document QueueManageResponse body on POST /api/queue
The Cloud runtime returns a JSON body from POST /api/queue describing which
prompts were deleted and whether the queue was cleared. The spec previously
declared a bare 200 with no schema, so generated clients had no type for the
response.
Adds a QueueManageResponse schema ({deleted, cleared}) and references it from
the 200 response. Tagged x-runtime: [cloud] with a [cloud-only] description:
local ComfyUI returns an empty 200 body, so both fields are nullable.
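As a client-side sketch of handling both runtimes: the field names
come from the schema above, but the field types (a list of prompt IDs
for deleted, a boolean for cleared) and the helper name are assumptions
made for this example.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueManageResponse:
    # Both fields are nullable: local ComfyUI sends an empty 200 body.
    deleted: Optional[list] = None   # assumed: IDs of deleted prompts
    cleared: Optional[bool] = None   # assumed: True if the queue was cleared


def parse_queue_manage_response(body: str) -> QueueManageResponse:
    """Parse a POST /api/queue response body from either runtime."""
    if not body.strip():
        return QueueManageResponse()
    data = json.loads(body)
    return QueueManageResponse(
        deleted=data.get("deleted"),
        cleared=data.get("cleared"),
    )
```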
* openapi: fix GET /api/hub/labels response to the label-catalog shape (#14118)
GET /api/hub/labels returns the catalog of available labels you can filter by,
which the Cloud runtime serves as {labels: HubLabelInfo[]} (slug name,
display_name, and a type category: tag/model/custom_node).
The spec had this operation returning a bare array of HubLabel ({id, name,
color}) — that schema models the label chips attached to a published workflow
(HubWorkflow.labels), a different object. The catalog schema (HubLabelInfo)
already existed but was unreferenced.
Repoints the 200 response to a new HubLabelListResponse wrapper over the
existing HubLabelInfo. HubLabel is unchanged and still used by
HubWorkflow.labels. Endpoint remains x-runtime: [cloud].
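To make the difference between the two shapes concrete, here is a
small sketch with made-up values. Only the field names are taken from
the description above; the sample labels and the `labels_of_type`
helper are hypothetical.

```python
# Catalog shape served by GET /api/hub/labels (HubLabelListResponse).
catalog_response = {
    "labels": [
        {"name": "upscaling", "display_name": "Upscaling", "type": "tag"},
        {"name": "example-model", "display_name": "Example Model", "type": "model"},
    ]
}

# Label chip attached to a published workflow (HubLabel), a different object.
workflow_label = {"id": "1", "name": "Upscaling", "color": "#ff8800"}


def labels_of_type(response: dict, label_type: str) -> list[str]:
    """Return the slug names of catalog labels in one type category."""
    return [l["name"] for l in response["labels"] if l["type"] == label_type]


print(labels_of_type(catalog_response, "tag"))  # ['upscaling']
```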
* openapi: add Cloud-runtime fields (workflow_id, execution_error) to JobEntry (#14119)
* openapi: add Cloud-runtime fields workflow_id, execution_error to JobEntry
The Cloud runtime returns two additional fields on JobEntry that the spec
didn't declare:
- workflow_id: UUID of the Cloud workflow entity the job is associated with
- execution_error: structured ComfyUI execution error for failed jobs
(reuses the existing ExecutionError schema)
Both tagged x-runtime: [cloud] with [cloud-only] descriptions; local ComfyUI
does not populate them.
* openapi: document Cloud-runtime request fields on POST /api/assets/export (#14120)
The Cloud runtime accepts three request fields on /api/assets/export that the
spec didn't declare:
- job_ids: include all assets associated with the given jobs
- naming_strategy: how to name files in the ZIP (enum, default group_by_job_time)
- job_asset_name_filters: optional per-job asset-name allowlist
Also drops asset_ids from required: the runtime supports exporting by job_ids
alone, so neither field is individually required.
/api/assets/export is already x-runtime: [cloud]; these are plain field
additions under that endpoint-level tag.
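A sketch of building a request body with the new fields. The field
names and the group_by_job_time default come from the description
above; the helper name, the job ID format, and the shape of
job_asset_name_filters (a map from job ID to allowed asset names) are
assumptions.

```python
from typing import Optional


def build_export_request(
    asset_ids: Optional[list] = None,
    job_ids: Optional[list] = None,
    naming_strategy: str = "group_by_job_time",
    job_asset_name_filters: Optional[dict] = None,  # assumed: job ID -> names
) -> dict:
    """Build a POST /api/assets/export body, omitting unset fields."""
    body = {"naming_strategy": naming_strategy}
    if asset_ids is not None:
        body["asset_ids"] = asset_ids
    if job_ids is not None:
        body["job_ids"] = job_ids
    if job_asset_name_filters is not None:
        body["job_asset_name_filters"] = job_asset_name_filters
    return body


# Export by job alone, which is why asset_ids is no longer required.
print(build_export_request(job_ids=["job-1"]))
```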
* Emit `hash` alongside `asset_hash` on all Asset responses
Add a `hash` field to the Asset response schema that carries the same
value as the existing `asset_hash` field. Both fields are now populated
in _build_asset_response, so every Asset-returning endpoint (GET, POST,
PUT) includes both.
No existing fields are removed. Tests updated to assert both fields.
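Roughly, the builder change amounts to the following. This is a plain
dict stand-in written for illustration, not the actual
_build_asset_response, which works with the response schema models;
the record layout and sample hash value are made up.

```python
def build_asset_response(asset: dict) -> dict:
    """Build an Asset response that carries the hash under both names."""
    response = {"id": asset["id"], "name": asset["name"]}
    asset_hash = asset.get("asset_hash")
    if asset_hash is not None:
        # Same value under both keys; unset hashes are omitted entirely.
        response["asset_hash"] = asset_hash
        response["hash"] = asset_hash
    return response


print(build_asset_response({"id": "a1", "name": "cat.png", "asset_hash": "abc123"}))
```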
Co-authored-by: Matt Miller <MillerMedia@users.noreply.github.com>
* Tighten hash field tests and DRY response builder
- Extract assert_hash_fields_consistent() helper that verifies presence
parity and value equality, replacing body.get()-based assertions that
treated missing keys and explicit nulls identically.
- Conftest seeded_asset fixture and seed-asset list assertions now check
key absence directly, so a regression that surfaces null fields would
be caught (validates exclude_none behavior).
- DRY duplicate hash expression in _build_asset_response.
- Add list-endpoint coverage asserting hash is present and consistent on
populated assets.
- Add schema-level test asserting AssetCreated inherits the hash field
from Asset, guarding against future inheritance drift.
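A sketch of what such a helper could look like. The name
assert_hash_fields_consistent is from the commit above, but the body
here is a guess at its behaviour, not the code in the test suite.

```python
def assert_hash_fields_consistent(body: dict) -> None:
    """Check presence parity and value equality of hash and asset_hash."""
    # Key membership tells a missing key apart from an explicit null,
    # which body.get() cannot do.
    assert ("hash" in body) == ("asset_hash" in body), "presence mismatch"
    if "hash" in body:
        assert body["hash"] == body["asset_hash"], "value mismatch"


assert_hash_fields_consistent({"hash": "abc123", "asset_hash": "abc123"})
assert_hash_fields_consistent({"name": "no hash yet"})
```

A body such as {"hash": None} passes a body.get()-based comparison,
since both lookups return None, but fails the presence check here.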
---------
Co-authored-by: Matt Miller <MillerMedia@users.noreply.github.com>
Co-authored-by: guill <jacob.e.segal@gmail.com>
Use the RAM right up to the wire, as the community is a bit accustomed
to. This trades off headroom for the case where large chunky
intermediates arrive and potentially hit pagefile/swap, but a lot of
people have "it just fits" workflows out there, so strike a compromise
with 75->90%.
Disable the inactive cache for all but the very high RAM users.