Commit Graph

2244 Commits

Author SHA1 Message Date
comfyanonymous
514bb8ba21 Fix ideogram if model dtype gets set to fp8. (#14291) 2026-06-04 19:20:22 -07:00
comfyanonymous
8e3045a90b Memory usage factor for ideogram 4 on non dynamic vram. (#14264) 2026-06-03 12:19:18 -04:00
Jukka Seppänen
24f9a020ce Support Ideogram4 (#14259) 2026-06-03 08:41:44 -07:00
comfyanonymous
d4c7ebff9c Remove old, useless no-comfy-kitchen fallback. (#14245)
* Remove old fallback used when no comfy kitchen.

* Remove unused logging import
2026-06-02 17:52:41 -07:00
Quasar of Mikus
e9207aa7cc fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split (#14235)
* fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split

Problem:
Upon manual abort, the application hangs indefinitely.
`InterruptProcessingException` inherits from `BaseException`, so it bypasses the `except Exception` error-handling block in MultiGPU's worker. The worker thread dies silently, leaving the main thread waiting forever on `result_q.get()`.

Fix:
Catch `comfy.model_management.InterruptProcessingException` instead of `Exception` so that it is caught and passed back via `result_q`, unblocking the main thread when the manual abort signal fires.

* oops
2026-06-02 10:05:24 -07:00
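The failure mode described in this commit can be reproduced with the standard library alone. In the sketch below, `InterruptProcessingException` is a hypothetical stand-in that subclasses `BaseException` as the commit describes, and `worker_buggy`, `worker_fixed` and `run` are illustrative names, not ComfyUI's MultiGPU code. The fixed worker catches both exception types so ordinary errors still reach the main thread; the actual patch may be structured differently.

```python
import queue
import threading


class InterruptProcessingException(BaseException):
    """Hypothetical stand-in for comfy.model_management.InterruptProcessingException."""


def worker_buggy(result_q: queue.Queue) -> None:
    try:
        raise InterruptProcessingException("manual abort")
    except Exception as e:  # does NOT match a BaseException-only subclass
        result_q.put(e)


def worker_fixed(result_q: queue.Queue) -> None:
    try:
        raise InterruptProcessingException("manual abort")
    except (Exception, InterruptProcessingException) as e:
        result_q.put(e)  # hand the error back so the main thread wakes up


def run(worker):
    """Run a worker thread and return what it put on the queue (None if nothing)."""
    result_q: queue.Queue = queue.Queue()
    t = threading.Thread(target=worker, args=(result_q,), daemon=True)
    t.start()
    t.join()
    # A bare result_q.get() here would block forever on the buggy path.
    try:
        return result_q.get_nowait()
    except queue.Empty:
        return None


print(run(worker_buggy))                 # None (the thread died; its traceback goes to stderr)
print(type(run(worker_fixed)).__name__)  # InterruptProcessingException
```

The demo joins the thread and uses `get_nowait()` only so that it terminates; in the real bug the main thread has no such escape hatch.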
person4268
c96fcddb81 Radiance: support variant with nonzero txt_ids (#14206) 2026-06-01 22:07:48 -07:00
comfyanonymous
4b48535a7d Do tripo dinov3 inference in fp32. (#14221) 2026-06-01 18:08:20 -07:00
comfyanonymous
e785f0d212 Some cast/dtype fixes for the birefnet and dino3 models. (#14217) 2026-06-01 14:35:26 -07:00
Jukka Seppänen
462c27fdb2 feat: Add TripoSplat support (#14210) 2026-06-01 07:01:50 -07:00
savvadesogle
cd45f42a83 fix(multigpu): replace hardcoded torch.cuda.set_device with device-agnostic set_torch_device (#14191) 2026-05-30 21:18:42 -04:00
comfyanonymous
81aa5a38b2 Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192) 2026-05-30 17:53:37 -07:00
rattus
f7297bc5a9 Revert deprecation of non-dynamic smart memory (CORE-152 (revert)) (#14183)
* mm: re-instantiate smart memory for VRAM

* mm: restore non-dynamic smart memory

By popular demand. We aren't quite ready for the deprecation, as GPUs
where dynamic VRAM isn't enabled and some high-VRAM custom model loader
setups prefer the old, fully hands-on approach.
2026-05-30 15:20:33 -04:00
rattus
e154da83b1 Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116)
* memory_management: Add direct-to-GPU read mode

Make the destination optional (or optionally on the GPU) and use aimdo
to file_read directly to the GPU.

* ops: Remove stream pin buffers and use aimdo reads

This consumed too much RAM, and it's better to just take the hit of
the CPU syncing back the stream on a short ring buffer. Aimdo
implements this, so just rip the stream pin buffer out of comfy.

* model_management: allow active pin registration movement

It's better to just let the active model load past the pin limit as
pins and let the pins move around. This saves HDD and SATA users
disk traffic while only costing a few GPU syncs.

* utils: use aimdo file handle

This opens files on Windows with more favourable flags.

* mp: only count the model proper for loaded_ram and vram

Exclude live loras from the numbers to avoid the case where the reported
loaded memory exceeds the size of the model.

This caused me confusion in the Kijai visualizer, where the model looked
fully loaded but was hitting disk due to this accounting discrepancy.

* utils: add bit reverse utility

Useful for max-scattering an ordered sequence.

* pinned_memory: Implement offload balancing

Use a max scatter algorithm to prioritize pins of the same size such
that when doing a little bit of offloading it gets scattered, allowing
the prefetcher to more evenly swallow the offload.

* comfy-aimdo 0.4.7

Aimdo 0.4.7 implements VRAM buffer exhaustion prediction to avoid
early speculative loading of weights that definitely won't fit once the
inference gets further in.

* model-prefetch: consolidate pin ensures on the sync point

This could happen mid-prefetch-block, causing a sync of the entire
block and losing overlap. Get ahead of the problem by freeing down
at the natural compute stream sync point.

* mm: Put a 2GB min on the pin ceiling

This is reasonably bad if it starts causing swap pressure, more so than
during normal ram-cache proceedings. Clamp it.

* add --fast-disk
2026-05-30 15:20:04 -04:00
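The bit-reverse utility and the max-scatter offload balancing in this commit can be illustrated with a small sketch. The commit does not show the implementation; `bit_reverse`, `max_scatter_order` and `offload_indices` are hypothetical names, and the idea shown is the standard bit-reversal permutation: visiting indices in bit-reversed order makes each prefix of the visit order spread roughly evenly over the range, so offloading the first few same-size pins in that order scatters them instead of offloading a contiguous run.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value` (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def max_scatter_order(n: int) -> list[int]:
    """Visit order for indices 0..n-1 in which each prefix is spread roughly evenly."""
    if n <= 0:
        return []
    bits = max(1, (n - 1).bit_length())
    # Bit-reverse every index of the enclosing power of two, then drop out-of-range ones.
    return [r for r in (bit_reverse(i, bits) for i in range(1 << bits)) if r < n]


def offload_indices(n: int, count: int) -> list[int]:
    """Pick `count` of `n` same-size pins to offload, scattered rather than contiguous."""
    return sorted(max_scatter_order(n)[:count])


print(max_scatter_order(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
print(offload_indices(8, 2))  # [0, 4]
```

Because bit reversal is a permutation of the enclosing power-of-two range, filtering out indices `>= n` still visits every index exactly once for non-power-of-two sizes.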
comfyanonymous
0b04660ba3 Speed up anima a bit on nvidia. (#14181) 2026-05-29 22:47:10 -07:00
comfyanonymous
6e1ef2311b Remove useless code. (#14178) 2026-05-29 16:26:46 -07:00
Jukka Seppänen
54d5be4a8e Fix background removal mask output shape (#14171) 2026-05-29 09:14:32 -07:00
rattus
684296148e float: use CK stochastic rounding cuda kernel (#13971) 2026-05-28 19:23:42 -07:00
comfyanonymous
85a403d1ea Disable sage attention in stable audio dit and VAE. (#14148) 2026-05-27 20:35:03 -04:00
Jukka Seppänen
987a937658 Support context window for PiD and fix lq_latent rounding (#14136) 2026-05-27 12:08:06 -07:00
comfyanonymous
e75a92c1b6 Add memory usage factor for lens model. (#14124) 2026-05-26 18:06:51 -07:00
comfyanonymous
d8d860a588 Closer memory usage factors for PID (#14123) 2026-05-26 18:04:55 -07:00
Jukka Seppänen
28f4ef277c feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103) 2026-05-26 17:50:14 -07:00
Jukka Seppänen
f9f54cae42 Lens: some cleanup (#14112)
* Lens: remove redundant memory optimization
2026-05-26 10:32:53 +03:00
Jukka Seppänen
41812fa0ac feat: Microsoft Lens support (CORE-248) (#14077) 2026-05-25 23:01:51 -07:00
Ivan Zorin
57414dadfe fix: cross-attention AdaLN scale, shift, sigma parameters calculation (#14097) 2026-05-25 20:07:09 -07:00
comfyanonymous
da49b7d0b6 Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
Jedrzej Kosinski
0a2dd86e78 MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
rattus
b30e980a20 cache-ram: lower thresholds (#14089)
Use the RAM right up to the wire, as the community is a bit accustomed to.

This trades off headroom for the case where large chunky intermediates
arrive and potentially hit pagefile/swap, but a lot of people have
"it just fits" workflows out there, so strike a compromise with
75->90%.

Disable the inactive cache for all but the very high RAM users.
2026-05-24 15:26:50 -07:00
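The headroom trade-off in this commit can be made concrete with a little arithmetic. The sketch assumes (the commit does not say so) that the threshold is a percentage of total system RAM at which cached intermediates start being released; `should_release_cache` and `headroom_gib` are hypothetical helpers, and the 32 GiB / 27 GiB figures are illustrative. At 27 GiB used, the old 75% threshold would already release cache while the new 90% one would not, but the room left for a large intermediate shrinks from 8 GiB to 3.2 GiB.

```python
def should_release_cache(used_gib: int, total_gib: int, threshold_pct: int) -> bool:
    """True once RAM usage reaches threshold_pct percent of total RAM (assumed semantics)."""
    return used_gib * 100 >= total_gib * threshold_pct


def headroom_gib(total_gib: int, threshold_pct: int) -> float:
    """RAM left free for large intermediates once the cache fills to the threshold."""
    return total_gib * (100 - threshold_pct) / 100


for pct in (75, 90):
    print(pct, should_release_cache(27, 32, pct), headroom_gib(32, pct))
# 75 True 8.0
# 90 False 3.2
```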
rattus
39f963b4b0 mark loads to pins as cold immediately (#14088)
This does the posix_fadvise to kick pins out of the disk cache (to
avoid a double copy in RAM).
2026-05-24 15:25:59 -07:00
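The commit does not include code, but the underlying call is standard POSIX. A minimal Python sketch, assuming a plain file read stands in for the copy into pinned memory (the real loader goes through aimdo, which is not shown here): once the bytes are in the destination buffer, `POSIX_FADV_DONTNEED` tells the kernel the file's cached pages are no longer needed, so the data is not held twice in RAM. `read_then_drop_cache` is a hypothetical helper; `os.posix_fadvise` is unavailable on Windows and some other platforms, so the sketch skips the advice there.

```python
import os
import tempfile


def read_then_drop_cache(path: str) -> bytearray:
    """Read a whole file, then advise the kernel to evict its cached pages."""
    with open(path, "rb") as f:
        buf = bytearray(f.read())  # stand-in for the copy into a pinned buffer
        if hasattr(os, "posix_fadvise"):
            # offset=0, len=0 covers the whole file; the advice is a hint, not a guarantee.
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    return buf


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.bin")
    with open(path, "wb") as f:
        f.write(b"\x00\x01" * 1024)
    data = read_then_drop_cache(path)

print(len(data))  # 2048
```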
comfyanonymous
08d809d128 Fix --use-flash-attention ignored when xformers installed. (#14083) 2026-05-23 17:44:28 -07:00
comfyanonymous
d80fcafee7 Remove dead code. (#14072) 2026-05-22 19:56:36 -07:00
rattus
03e511862e Fix reshaping lora application (#14031)
* ModelPatcherDynamic: purge stale vbar allocs on force cast

* ModelPatcherDynamic: restore backups before load

If doing a clean reload, mutative changes (lora application) could be
applied on top of the already loaded weight. Restore from backup
unconditionally so that the new load is clean.
2026-05-21 09:47:16 -07:00
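Why a clean reload has to restore backups first can be shown with a toy patcher. `ToyPatcher`, `add_patch` and `load` are hypothetical and do not reflect the real `ModelPatcherDynamic` API; the sketch only assumes that LoRA deltas are applied to weights in place and that the original weight is backed up the first time it is patched. Without the restore step, a second load applies the same delta on top of the already-patched weight.

```python
class ToyPatcher:
    """Hypothetical patcher that applies additive LoRA-style deltas in place."""

    def __init__(self, weights: dict[str, float]) -> None:
        self.weights = dict(weights)
        self.backup: dict[str, float] = {}
        self.patches: dict[str, float] = {}

    def add_patch(self, key: str, delta: float) -> None:
        self.patches[key] = delta

    def load(self, restore_first: bool = True) -> None:
        if restore_first:
            # Undo earlier in-place patching so this load starts from clean weights.
            for key, original in self.backup.items():
                self.weights[key] = original
        for key, delta in self.patches.items():
            # Back up the unpatched weight only the first time it is touched.
            self.backup.setdefault(key, self.weights[key])
            self.weights[key] = self.weights[key] + delta


buggy = ToyPatcher({"w": 1.0})
buggy.add_patch("w", 0.5)
buggy.load(restore_first=False)
buggy.load(restore_first=False)
print(buggy.weights["w"])  # 2.0: the delta was applied twice

clean = ToyPatcher({"w": 1.0})
clean.add_patch("w", 0.5)
clean.load()
clean.load()
print(clean.weights["w"])  # 1.5
```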
Edoardo Carmignani
aab41a9ddb fix(lanczos): correct dimension transposition for single-channel tensors (#12679) 2026-05-21 23:47:20 +08:00
rattus
5aa5ccc9e0 Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802)
* model_management: disable non-dynamic smart memory

Disable smart memory outright for non dynamic models.

This is a minor step towards deprecation of --disable-dynamic-vram
and the legacy ModelPatcher.

This is needed for estimate-free model development, where new models
can opt-out of supplying a memory estimate and not have to worry
about hard VRAM allocations due to legacy non-dynamic model patchers.

This is also a general stability increase for a lot of stray use cases
where estimates may still be off and going forward we are not going
to accurately maintain such estimates.

* pinned_memory: implement with aimdo growable buffer

Use a single growable buffer so we can do threaded pre-warming on
pinned memory.

* mm: use aimdo to do transfer from disk to pin

Aimdo implements a faster threaded loader.

* Add stream host pin buffer for AIMDO casts

Introduce per-offload-stream HostBuffer reuse for pinned staging,
include it in cast buffer reset synchronization.

Defer actual casts that go via this pin path to a separate pass
such that the buffer can be allocated monolithically (to avoid
cudaHostRegister thrash).

* remove old pin path

* Implement JIT pinned memory pressure

Replace the predictive pin pressure mechanism with JIT PIN memory
pressure.

* LowVRAMPatch: change to two-phase visit

* lora: re-implement as inplace swiss-army-knife operation

* prepare for multiple pin sets

* implement pinned loras

* requirements: comfy-aimdo 0.4.0

* ops: remove unused arg

This was defeatured in aimdo iteration

* ops: sync the CPU with only the offload stream activity

This was syncing with the offload stream which itself is synced with the
compute stream, so this was syncing CPU with compute transitively. Define
the event to sync it more gently.

* pins: implement freeing intermediate for pinned memory

Pinning is more important than inactive intermediates and the stream
pin buffer is more important than even active intermediates.

* execution: implement pin eviction on RAM pressure

Add back proper pin freeing on RAM pressure

* implement pin registration swaps

Uncap the Windows pins from 50% by extending the pool, and add a pressure
mechanism to move the pin reservations on demand.

This unfortunately implies a GPU sync to do the freeing so significant
hysteresis needs to be added to consolidate these pressure events.

* cli_args/execution: Implement lower background cache-ram threshold

Limit the amount of RAM background intermediates can use, so that
switching workflows doesn't degrade performance too much.

* make default

* bump aimdo

* model-patcher: force-cast tiny weights

Flux 2 gets crazy stalls because its mix of tiny and giant weights
creates lopsided stream buffer rotations.

* ops: refactor in prep for chunking

* mm: delegate pin-on-the-way to aimdo

Aimdo is able to chunk and slice this on the way for better CPU->GPU
overlap. The main advantage is the ability to shorten the bus contention
window between the previous weight transfer and the next weight's vbar
fault.

* bump aimdo

* pinning updates

* specify hostbuf max allocation size

There are signs of virtual memory exhaustion on some Linux systems when
throwing 128GB at every little piece. Pass the actual size to save aimdo
from over-estimates.

* tests: update execution tests for caching

The default caching changed to ram-cache so update these tests
accordingly.

Remove the LRU 0 test as this also falls through to RAM cache.
2026-05-20 17:03:58 -07:00
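The hysteresis mentioned under "implement pin registration swaps" can be illustrated with a small simulation. `HysteresisFreer` and `simulate` are hypothetical, usage is counted in abstract units, and each freeing pass is assumed to cost one GPU sync, as the commit describes. Freeing only just below the trigger point causes a sync almost every step once the limit is reached; freeing down to a much lower watermark consolidates many pressure events into one sync.

```python
class HysteresisFreer:
    """Hypothetical sketch: batch pin freeing so each expensive GPU sync frees a lot."""

    def __init__(self, high: int, low: int) -> None:
        if not low < high:
            raise ValueError("low watermark must be below the high watermark")
        self.high = high  # start freeing once usage exceeds this
        self.low = low    # ...and free all the way down to this
        self.syncs = 0    # number of freeing passes (one GPU sync each)

    def on_usage(self, used: int) -> int:
        """Return usage after any freeing this pressure event triggers."""
        if used > self.high:
            self.syncs += 1
            return self.low
        return used


def simulate(freer: HysteresisFreer, steps: int, growth: int = 1) -> int:
    """Grow usage by `growth` per step and count the syncs the freer performs."""
    used = 0
    for _ in range(steps):
        used = freer.on_usage(used + growth)
    return freer.syncs


print(simulate(HysteresisFreer(high=10, low=9), steps=100))  # 45: free just enough
print(simulate(HysteresisFreer(high=10, low=2), steps=100))  # 10: free well below the trigger
```

The price of the wider gap is that more pins are released than strictly necessary on each pass, which is the usual hysteresis trade-off.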
comfyanonymous
f9c84c94b4 Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
Cezarijus Kivylius
78b5dec6b6 fix: Hunyuan3D 2.1 batch size crashes in attention and forward pass (#13699) 2026-05-20 19:58:49 +08:00
yy
626b082838 Fix typo in ops.py (#11925) 2026-05-20 05:45:04 +08:00
comfyanonymous
a4382e056e Use temporal downscale to make empty audio latent nodes more reusable. (#13975) 2026-05-19 00:14:30 -04:00
comfyanonymous
990a7ae7f2 Initial work to make downscale_ratio_temporal work. (#13972) 2026-05-18 23:01:43 -04:00
Yousef R. Gamaleldin
187e5237e1 Fix BiRefNet issue (#13966) 2026-05-19 05:03:22 +08:00
rattus
16f862f02a implement dynamic clip saving (#13959)
Fix clip saving by doing the same patching process as diffusion
models.
2026-05-18 11:46:40 -07:00
Jukka Seppänen
971c9e3518 HiDream-O1: support area conditioning (#13944) 2026-05-18 01:17:05 -04:00
Jukka Seppänen
b39af210d0 Fix Qwen3.5 text generation with multiple input images (#13943) 2026-05-18 01:16:42 -04:00
comfyanonymous
f48d2a017e Log which quant ops are enabled/emulated. (#13946) 2026-05-17 16:30:54 -04:00
drozbay
d3607a8e6d feat: Add downscaled IC-LoRA support to LTXVAddGuide (CORE-102) (#13896) 2026-05-16 15:02:57 +08:00
comfyanonymous
5d5a4554e1 Remove useless option and clarify what lowvram does. (#13922) 2026-05-15 17:59:02 -07:00
Jukka Seppänen
33ce449c8b Reduce LTX2.3 peak VRAM when guide_mask is in use (CORE-166) (#13735)
- Reduce peak VRAM by handling self_attn_mask more efficiently
- Fallback to SDPA when self_attention_mask is used
2026-05-16 00:02:27 +03:00
Jukka Seppänen
77e2ed5e01 feat: Support MoGe (CORE-168) (#13878) 2026-05-15 10:34:56 +08:00
Talmaj
74c17a25e5 Fix void failing with RuntimeError: start (0) + length (464) exceeds dimension size (461). (#13873) 2026-05-13 12:37:30 -07:00
comfyanonymous
2bd65f2091 Better Hidream O1 mem usage factor for non dynamic vram. (#13864) 2026-05-12 20:55:38 -07:00