ComfyUI

Mirror of https://github.com/comfyanonymous/ComfyUI.git, synced 2026-06-07 00:04:37 +00:00

Author	SHA1	Message	Date
comfyanonymous	514bb8ba21	Fix ideogram if model dtype gets set to fp8. (#14291)	2026-06-04 19:20:22 -07:00
comfyanonymous	8e3045a90b	Memory usage factor for ideogram 4 on non dynamic vram. (#14264)	2026-06-03 12:19:18 -04:00
Jukka Seppänen	24f9a020ce	Support Ideogram4 (#14259)	2026-06-03 08:41:44 -07:00
comfyanonymous	d4c7ebff9c	Remove old useless no comfy kitchen fallback. (#14245) * Remove old fallback used when no comfy kitchen. * Remove unused logging import	2026-06-02 17:52:41 -07:00
Quasar of Mikus	e9207aa7cc	fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split (#14235) * fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split Problem: Upon manual abort, the application hangs indefinitely. `InterruptProcessingException` inherits from `BaseException` and bypasses MultiGPU's worker error-handling block, so the thread dies silently, leaving the main thread waiting forever on `result_q.get()`. Fix: Catch `comfy.model_management.InterruptProcessingException` instead of `Exception` so it's caught and passed back via `result_q`, unblocking the main thread when the manual abort signal fires. * oops	2026-06-02 10:05:24 -07:00
person4268	c96fcddb81	Radiance: support variant with nonzero txt_ids (#14206)	2026-06-01 22:07:48 -07:00
comfyanonymous	4b48535a7d	Do tripo dinov3 inference in fp32. (#14221)	2026-06-01 18:08:20 -07:00
comfyanonymous	e785f0d212	Some cast/dtype fixes for the birefnet and dino3 models. (#14217)	2026-06-01 14:35:26 -07:00
Jukka Seppänen	462c27fdb2	feat: Add TripoSplat support (#14210)	2026-06-01 07:01:50 -07:00
savvadesogle	cd45f42a83	fix(multigpu): replace hardcoded torch.cuda.set_device with device-agnostic set_torch_device (#14191)	2026-05-30 21:18:42 -04:00
comfyanonymous	81aa5a38b2	Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192)	2026-05-30 17:53:37 -07:00
rattus	f7297bc5a9	Revert deprecation of non-dynamic smart memory (CORE-152 (revert)) (#14183) * mm: re-instantiate smart memory for VRAM * mm: restore non-dynamic smart memory. By popular demand. We aren't quite ready for the deprecation, as non-dynamic-enabled GPUs and some high-VRAM custom model loader setups prefer the old, fully hands-on behavior.	2026-05-30 15:20:33 -04:00
rattus	e154da83b1	Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) * memory_management: Add direct-read-to-GPU mode. Make the destination optional (or optionally a GPU) and use aimdo to file_read directly to GPU. * ops: Remove stream pin buffers and use aimdo reads. This consumed too much RAM, and it's better to just take the hit of the CPU syncing back the stream on a short ring buffer. Aimdo implements this, so just rip the stream pin buffer out of comfy. * model_management: all active pin registration movement. It's better to just let the active model load past the pin limit as pins and let the pins move around. This saves HDD and SATA users disk traffic while only costing a few GPU syncs. * utils: use aimdo file handle. This opens files on Windows with more favourable flags. * mp: only count the model proper for loaded_ram and vram. Exclude live LoRAs from the numbers to avoid the case where the reported loaded memory exceeds the size of the model. This caused me confusion in the Kijai visualizer when it looked fully loaded but was hitting disk due to this accounting discrepancy. * utils: add bit-reverse utility, useful for max-scattering something ordered. * pinned_memory: Implement offload balancing. Use a max-scatter algorithm to prioritize pins of the same size, so that when doing a little offloading it gets scattered, allowing the prefetcher to swallow the offload more evenly. * comfy-aimdo 0.4.7. Aimdo 0.4.7 implements VRAM buffer exhaustion prediction to avoid early speculative loads of weights that definitely won't fit once inference gets further in. * model-prefetch: consolidate pin ensures on the sync point. This could happen mid prefetch block, causing a sync of the entire block and losing overlap. Get ahead of the problem with a free-down at the natural compute stream sync point. * mm: Put a 2GB minimum on the pin ceiling. This is reasonably bad if it starts causing swap pressure, more so than during normal ram-cache proceedings. Clamp it. * add --fast-disk	2026-05-30 15:20:04 -04:00
comfyanonymous	0b04660ba3	Speed up anima a bit on nvidia. (#14181)	2026-05-29 22:47:10 -07:00
comfyanonymous	6e1ef2311b	Remove useless code. (#14178)	2026-05-29 16:26:46 -07:00
Jukka Seppänen	54d5be4a8e	Fix background removal mask output shape (#14171)	2026-05-29 09:14:32 -07:00
rattus	684296148e	float: use CK stochastic rounding cuda kernel (#13971)	2026-05-28 19:23:42 -07:00
comfyanonymous	85a403d1ea	Disable sage attention in stable audio dit and VAE. (#14148)	2026-05-27 20:35:03 -04:00
Jukka Seppänen	987a937658	Support context window for PiD and fix lq_latent rounding (#14136)	2026-05-27 12:08:06 -07:00
comfyanonymous	e75a92c1b6	Add memory usage factor for lens model. (#14124)	2026-05-26 18:06:51 -07:00
comfyanonymous	d8d860a588	Closer memory usage factors for PID (#14123)	2026-05-26 18:04:55 -07:00
Jukka Seppänen	28f4ef277c	feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103)	2026-05-26 17:50:14 -07:00
Jukka Seppänen	f9f54cae42	Lens: some cleanup (#14112) * Lens: remove redundant memory optimization	2026-05-26 10:32:53 +03:00
Jukka Seppänen	41812fa0ac	feat: Microsoft Lens support (CORE-248) (#14077)	2026-05-25 23:01:51 -07:00
Ivan Zorin	57414dadfe	fix: cross-attention AdaLN scale, shift, sigma parameters calculation (#14097)	2026-05-25 20:07:09 -07:00
comfyanonymous	da49b7d0b6	Remove useless annotations imports. (#14105)	2026-05-25 19:23:29 -07:00
Jedrzej Kosinski	0a2dd86e78	MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063)	2026-05-25 18:26:40 -07:00
rattus	b30e980a20	cache-ram: lower thresholds (#14089) Use the RAM right up to the wire, as the community is a bit accustomed to. This trades off headroom for the case where large, chunky intermediates arrive and potentially hit pagefile/swap, but a lot of people have "it just fits" workflows out there, so strike a compromise with 75->90%. Disable the inactive cache for all but very-high-RAM users.	2026-05-24 15:26:50 -07:00
rattus	39f963b4b0	mark loads to pins as cold immediately (#14088) This calls posix_fadvise to kick pins out of the disk cache (to avoid a double copy in RAM).	2026-05-24 15:25:59 -07:00
comfyanonymous	08d809d128	Fix --use-flash-attention ignored when xformers installed. (#14083)	2026-05-23 17:44:28 -07:00
comfyanonymous	d80fcafee7	Remove dead code. (#14072)	2026-05-22 19:56:36 -07:00
rattus	03e511862e	Fix reshaping lora application (#14031) * ModelPatcherDynamic: purge stale vbar allocs on force cast * ModelPatcherDynamic: restore backups before load. If doing a clean reload, mutative changes (LoRA application) could be applied on top of the already-loaded weight. Restore from backup unconditionally so that the new load is clean.	2026-05-21 09:47:16 -07:00
Edoardo Carmignani	aab41a9ddb	fix(lanczos): correct dimension transposition for single-channel tensors (#12679)	2026-05-21 23:47:20 +08:00
rattus	5aa5ccc9e0	Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802) * model_management: disable non-dynamic smart memory. Disable smart memory outright for non-dynamic models. This is a minor step towards deprecating --disable-dynamic-vram and the legacy ModelPatcher. It is needed for estimate-free model development, where new models can opt out of supplying a memory estimate without worrying about hard VRAM allocations from legacy non-dynamic model patchers. It is also a general stability improvement for many stray use cases where estimates may still be off, and going forward we are not going to accurately maintain such estimates. * pinned_memory: implement with aimdo growable buffer. Use a single growable buffer so we can do threaded pre-warming on pinned memory. * mm: use aimdo to transfer from disk to pin. Aimdo implements a faster threaded loader. * Add stream host pin buffer for AIMDO casts. Introduce per-offload-stream HostBuffer reuse for pinned staging and include it in cast buffer reset synchronization. Defer actual casts that go via this pin path to a separate pass so that the buffer can be allocated monolithically (to avoid cudaHostRegister thrash). * remove old pin path * Implement JIT pinned memory pressure. Replace the predictive pin pressure mechanism with JIT pin memory pressure. * LowVRAMPatch: change to two-phase visit * lora: re-implement as in-place swiss-army-knife operation * prepare for multiple pin sets * implement pinned loras * requirements: comfy-aimdo 0.4.0 * ops: remove unused arg. This was defeatured in an aimdo iteration. * ops: sync the CPU with only the offload stream activity. This was syncing with the offload stream, which itself is synced with the compute stream, so it was transitively syncing the CPU with compute. Define the event to sync it more gently. * pins: implement freeing intermediates for pinned memory. Pinning is more important than inactive intermediates, and the stream pin buffer is more important than even active intermediates. * execution: implement pin eviction on RAM pressure. Add back proper pin freeing on RAM pressure. * implement pin registration swaps. Uncap the Windows pins from 50% by extending the pool, and add a pressure mechanism to move the pin reservations on demand. This unfortunately implies a GPU sync to do the freeing, so significant hysteresis needs to be added to consolidate these pressure events. * cli_args/execution: Implement lower background cache-ram threshold. Limit the amount of RAM background intermediates can use, so that switching workflows doesn't degrade performance too much. * make default * bump aimdo * model-patcher: force-cast tiny weights. Flux 2 gets crazy stalls because a mix of tiny and giant weights creates lopsided stream buffer rotations. * ops: refactor in prep for chunking * mm: delegate pin-on-the-way to aimdo. Aimdo is able to chunk and slice this on the way for better CPU->GPU overlap. The main advantage is the ability to shorten the bus contention window between the previous weight transfer and the next weight's vbar fault. * bump aimdo * pinning updates * specify hostbuf max allocation size. There are signs of virtual memory exhaustion on some Linux systems when throwing 128GB at every little piece. Pass the actual size to save aimdo from over-estimates. * tests: update execution tests for caching. The default caching changed to ram-cache, so update these tests accordingly. Remove the LRU 0 test, as this also falls through to RAM cache.	2026-05-20 17:03:58 -07:00
comfyanonymous	f9c84c94b4	Support Stable Audio 3 model. (#14010)	2026-05-20 11:34:22 -04:00
Cezarijus Kivylius	78b5dec6b6	fix: Hunyuan3D 2.1 batch size crashes in attention and forward pass (#13699)	2026-05-20 19:58:49 +08:00
yy	626b082838	Fix typo in ops.py (#11925)	2026-05-20 05:45:04 +08:00
comfyanonymous	a4382e056e	Use temporal downscale to make empty audio latent nodes more reusable. (#13975)	2026-05-19 00:14:30 -04:00
comfyanonymous	990a7ae7f2	Initial work to make downscale_ratio_temporal work. (#13972)	2026-05-18 23:01:43 -04:00
Yousef R. Gamaleldin	187e5237e1	Fix BiRefNet issue (#13966)	2026-05-19 05:03:22 +08:00
rattus	16f862f02a	implement dynamic clip saving (#13959) Fix clip saving by doing the same patching process as diffusion models.	2026-05-18 11:46:40 -07:00
Jukka Seppänen	971c9e3518	HiDream-O1: support area conditioning (#13944)	2026-05-18 01:17:05 -04:00
Jukka Seppänen	b39af210d0	Fix Qwen3.5 text generation with multiple input images (#13943)	2026-05-18 01:16:42 -04:00
comfyanonymous	f48d2a017e	Log which quant ops are enabled/emulated. (#13946)	2026-05-17 16:30:54 -04:00
drozbay	d3607a8e6d	feat: Add downscaled IC-LoRA support to LTXVAddGuide (CORE-102) (#13896)	2026-05-16 15:02:57 +08:00
comfyanonymous	5d5a4554e1	Remove useless option and clarify what lowvram does. (#13922)	2026-05-15 17:59:02 -07:00
Jukka Seppänen	33ce449c8b	Reduce LTX2.3 peak VRAM when guide_mask is in use (CORE-166) (#13735) - Reduce peak VRAM by handling self_attn_mask more efficiently - Fall back to SDPA when self_attention_mask is used	2026-05-16 00:02:27 +03:00
Jukka Seppänen	77e2ed5e01	feat: Support MoGe (CORE-168) (#13878)	2026-05-15 10:34:56 +08:00
Talmaj	74c17a25e5	Fix void failing with RuntimeError: start (0) + length (464) exceeds dimension size (461). (#13873)	2026-05-13 12:37:30 -07:00
comfyanonymous	2bd65f2091	Better Hidream O1 mem usage factor for non dynamic vram. (#13864)	2026-05-12 20:55:38 -07:00


2244 Commits
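The MultiGPU CFG Split fix (#14235) depends on a Python detail: an exception class derived from `BaseException` is not caught by `except Exception`. The code below is a minimal sketch of that failure mode and one way to handle it. It assumes a single worker thread that reports back through a `queue.Queue`. The names `worker` and `run_on_worker`, and the local `InterruptProcessingException` stand-in, are hypothetical illustrations, not ComfyUI's actual MultiGPU code. The commit message says the patch catches the interrupt "instead of `Exception`"; this sketch catches both, only to show the mechanism.

```python
import queue
import threading


class InterruptProcessingException(BaseException):
    """Hypothetical local stand-in for comfy.model_management.InterruptProcessingException.
    Per the commit message, it derives from BaseException, not Exception."""


def worker(task, result_q):
    """Hypothetical worker that always reports back through result_q."""
    try:
        result_q.put(("ok", task()))
    except Exception as e:
        # Ordinary errors are forwarded to the main thread.
        result_q.put(("error", e))
    except InterruptProcessingException as e:
        # Without this clause the interrupt escapes the worker, the thread
        # ends, and the main thread blocks forever in result_q.get().
        result_q.put(("interrupted", e))


def run_on_worker(task):
    """Run task on a worker thread and return its result or re-raise its error."""
    result_q = queue.Queue()
    t = threading.Thread(target=worker, args=(task, result_q))
    t.start()
    status, payload = result_q.get()  # returns in every case handled above
    t.join()
    if status == "ok":
        return payload
    raise payload  # re-raise the error or interrupt on the calling thread
```

Forwarding the interrupt through the queue guarantees that `result_q.get()` always returns. The abort is then re-raised on the main thread, where the caller can handle it.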