Add ESSENTIALS_CATEGORY or essentials_category to 12 node classes and all
36 blueprint JSONs. Update SubgraphEntry TypedDict and subgraph_manager to
extract and pass through the field.
Fixes COM-15221
Amp-Thread-ID: https://ampcode.com/threads/T-019c83de-f7ab-7779-a451-0ba5940b56a9
* chore: tune CodeRabbit config to limit review scope and disable for drafts
- Add tone_instructions to focus only on newly introduced issues
- Add global path_instructions entry to ignore pre-existing issues in moved/reformatted code
- Disable draft PR reviews (drafts: false) and add WIP title keywords
- Disable ruff tool to prevent linter-based outside-diff-range comments
Addresses feedback from maintainers about CodeRabbit flagging pre-existing
issues in code that was merely moved or de-indented (e.g., PR #12557),
which can discourage community contributions and cause scope creep.
Amp-Thread-ID: https://ampcode.com/threads/T-019c82de-0481-7253-ad42-20cb595bb1ba
* chore: add 'DO NOT MERGE' to ignore_title_keywords
Amp-Thread-ID: https://ampcode.com/threads/T-019c82de-0481-7253-ad42-20cb595bb1ba
On Windows, Python defaults to cp1252 encoding when no encoding is
specified. JSON files containing UTF-8 characters (e.g., non-ASCII
characters) cause UnicodeDecodeError when read with cp1252.
This fixes the error that occurs when loading blueprint subgraphs
on Windows systems.
https://claude.ai/code/session_014WHi3SL9Gzsi3U6kbSjbSb
Co-authored-by: Claude <noreply@anthropic.com>
Integrate comfy-aimdo 0.2 which takes a different approach to
installing the memory allocator hook. Instead of using the complicated
and buggy pytorch MemPool+CudaPluggableAlloctor, cuda is directly hooked
making the process much more transparent to both comfy and pytorch. As
far as pytorch knows, aimdo doesnt exist anymore, and just operates
behind the scenes.
Remove all the mempool setup stuff for dynamic_vram and bump the
comfy-aimdo version. Remove the allocator object from memory_management
and demote its use as an enablment check to a boolean flag.
Comfy-aimdo 0.2 also support the pytorch cuda async allocator, so
remove the dynamic_vram based force disablement of cuda_malloc and
just go back to the old settings of allocators based on command line
input.
Add 24 non-cloud essential blueprints from comfyui-wiki/Subgraph-Blueprints.
These cover common workflows: text/image/video generation, editing,
inpainting, outpainting, upscaling, depth maps, pose, captioning, and more.
Cloud-only blueprints (5) are excluded and will be added once
client-side distribution filtering lands.
Amp-Thread-ID: https://ampcode.com/threads/T-019c6f43-6212-7308-bea6-bfc35a486cbf
gemini-3-pro-image-preview nondeterministically returns image/jpeg
instead of image/png. get_image_from_response() hardcoded
get_parts_by_type(response, "image/png"), silently dropping JPEG
responses and falling back to torch.zeros (all-black output).
Add _mime_matches() helper using fnmatch for glob-style MIME matching.
Change get_image_from_response() to request "image/*" so any image
format returned by the API is correctly captured.
This check was far too broad and the dtype is not a reliable indicator
of wanting the requant (as QT returns the compute dtype as the dtype).
So explictly plumb whether fp8mm wants the requant or not.
* lora: add weight shape calculations.
This lets the loader know if a lora will change the shape of a weight
so it can take appropriate action.
* MPDynamic: force load flux img_in weight
This weight is a bit special, in that the lora changes its geometry.
This is rather unique, not handled by existing estimate and doesn't
work for either offloading or dynamic_vram.
Fix for dynamic_vram as a special case. Ideally we can fully precalculate
these lora geometry changes at load time, but just get these models
working first.
* lora_extract: Add a trange
If you bite off more than your GPU can chew, this kinda just hangs.
Give a rough indication of progress counting the weights in a trange.
* lora_extract: Support on-the-fly patching
Use the on-the-fly approach from the regular model saving logic for
lora extraction too. Switch off force_cast_weights accordingly.
This gets extraction working in dynamic vram while also supporting
extraction on GPU offloaded.