Commit Graph

279 Commits

Author SHA1 Message Date
Daxiong (Lin)
2ca1480f91 chore: update workflow templates to v0.9.82 (#14034) 2026-05-21 11:48:20 -07:00
rattus
5aa5ccc9e0 Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802)
* model_management: disable non-dynamic smart memory

Disable smart memory outright for non dynamic models.

This is a minor step towards deprecation of --disable-dynamic-vram
and the legacy ModelPatcher.

This is needed for estimate-free model development, where new models
can opt-out of supplying a memory estimate and not have to worry
about hard VRAM allocations due to legacy non-dynamic model patchers

This is also a general stability increase for a lot of stray use cases
where estimates may still be off and going forward we are not going
to accurately maintain such estimates.

* pinned_memory: implement with aimdo growable buffer

Use a single growable buffer so we can do threaded pre-warming on
pinned memory.

* mm: use aimdo to do transfer from disk to pin

Aimdo implements a faster threaded loader.

* Add stream host pin buffer for AIMDO casts

Introduce per-offload-stream HostBuffer reuse for pinned staging,
include it in cast buffer reset synchronization.

Defer actual casts that go via this pin path to a separate pass
such that the buffer can be allocated monolithically (to avoid
cudaHostRegister thrash).

* remove old pin path

* Implement JIT pinned memory pressure

Replace the predictive pin pressure mechanism with JIT PIN memory
pressure.

* LowVRAMPatch: change to two-phase visit

* lora: re-implement as inplace swiss-army-knife operation

* prepare for multiple pin sets

* implement pinned loras

* requirements: comfy-aimdo 0.4.0

* ops: remove unused arg

This was defeatured in aimdo iteration

* ops: sync the CPU with only the offload stream activity

This was syncing with the offload stream which itself is synced with the
compute stream, so this was syncing CPU with compute transitively. Define
the event to sync it more gently.

* pins: implement freeing intermediate for pinned memory

Pinning is more important than inactive intermediates and the stream
pin buffer is more important than even active intermediates.

* execution: implement pin eviction on RAM presure

Add back proper pin freeing on RAM pressure

* implement pin registration swaps

Uncap the windows pins from 50% by extending the pool and have a pressure
mechanism to move the pin reservations om demand.

This unfortunately implies a GPU sync to do the freeing so significant
hysterisis needs to be added to consolidate these pressure events.

* cli_args/execution: Implement lower background cache-ram threshold

Limit the amount of RAM background intermediates can use, so that
switching workflows doesn't degrade performance too much.

* make default

* bump aimdo

* model-patcher: force-cast tiny weights

Flux 2 gets crazy stalls due to a mix of tiny and giant weights
creating lopsided steam buffer rotations which creates stalls.

* ops: refactor in prep for chunking

* mm: delegate pin-on-the-way to aimdo

Aimdo is able to chunk and slice this on the way for better CPU->GPU
overlap. The main advantage is the ability to shorten the bus contention
window between previous weight transfer and the next weights vbar
fault.

* bump aimdo

* pinning updates

* specify hostbuf max allocation size

There a signs of virtual memory exhaustion on some linux systems when
throwing 128GB for every little piece. Pass the actual to save aimdo
from over-estimates

* tests: update execution tests for caching

The default caching changed to ram-cache so update these tests
accordingly.

Remove the LRU 0 test as this also falls through to RAM cache.
2026-05-20 17:03:58 -07:00
Daxiong (Lin)
4efe1ddb5c chore: update workflow templates to v0.9.79 (#14011) 2026-05-20 23:46:20 +08:00
Daxiong (Lin)
3d870ff51f chore: update workflow templates to v0.9.77 (#13895) 2026-05-15 01:25:18 +08:00
Daxiong (Lin)
afb4fa15d5 chore: update workflow templates to v0.9.75 (#13877)
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-05-13 12:33:12 -07:00
Daxiong (Lin)
240363f11e chore: update embedded docs to v0.5.0 (#13865) 2026-05-13 13:33:29 +08:00
Daxiong (Lin)
aa9d2fc713 chore: update workflow templates to v0.9.73 (#13822) 2026-05-10 19:10:13 +08:00
Comfy Org PR Bot
a4b7e3beed Bump comfyui-frontend-package to 1.43.18 (#13809)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-05-09 07:53:10 -07:00
Daxiong (Lin)
25757a53c9 chore: update workflow templates to v0.9.72 (#13732)
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-05-07 00:28:18 -07:00
Comfy Org PR Bot
9c34f5f36a Bump comfyui-frontend-package to 1.43.17 (#13723)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Alexander Brown <DrJKL0424@gmail.com>
2026-05-05 22:22:48 -07:00
Daxiong (Lin)
d794b62939 Update workflow templates to v0.9.69 (#13714)
* chore: update workflow templates to v0.9.69

* Update comfyui-workflow-templates to version 0.9.70

* Downgrade comfyui-workflow-templates to 0.9.69

---------

Co-authored-by: Alexander Piskun <13381981+bigcat88@users.noreply.github.com>
2026-05-05 16:57:27 +03:00
Daxiong (Lin)
1d23a875ed chore: update workflow templates to v0.9.68 (#13678) 2026-05-03 10:06:55 +08:00
Daxiong (Lin)
10b45a71cd chore: update workflow templates to v0.9.66 (#13662)
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-05-01 12:11:30 -07:00
Daxiong (Lin)
e8e8fee224 chore: update workflow templates to v0.9.65 (#13644) 2026-04-30 18:14:28 -07:00
Jedrzej Kosinski
a7d82baa06 Fix SQLAlchemy version format in requirements.txt (#13547)
Change SQLAlchemy>=2.0 to SQLAlchemy>=2.0.0 to satisfy the X.Y.Z
version format expected by install_util.is_valid_version().
2026-04-29 23:30:01 -04:00
rattus
e514119e1e comfy-aimdo v0.3.0 (#13604)
Comfy-aimdo 0.3.0 contains several major new features.

multi-GPU support
ARM support
AMD support

Refactorings include:

Linkless architecture - linkage is now performed purely at runtime
to stop host library lookups completely and only interact with the
torch-loaded Nvidia stack.

Elimination of cudart integration on linux. Its no consistent with
windows.

Misc bugfixes and minor features.
2026-04-28 16:34:37 -04:00
Daxiong (Lin)
1233f077b1 chore: update workflow templates to v0.9.63 (#13586)
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-04-27 10:06:03 -07:00
Comfy Org PR Bot
5e3f15a830 Bump comfyui-frontend-package to 1.42.15 (#13556) 2026-04-24 17:21:39 -07:00
Daxiong (Lin)
47ccecaee0 chore: update workflow templates to v0.9.62 (#13539) 2026-04-23 16:56:13 -07:00
rattus
ef8f3cbcdc comfy-aimdo 0.2.14: Hotfix async allocator estimations (#13534)
This was doing an over-estimate of VRAM used by the async allocator when lots
of little small tensors were in play.

Also change the versioning scheme to == so we can roll forward aimdo without
worrying about stable regressions downstream in comfyUI core.
2026-04-23 11:14:13 -07:00
Daxiong (Lin)
2a14e1e96a chore: update embedded docs to v0.4.4 (#13535) 2026-04-23 08:15:47 -07:00
Daxiong (Lin)
5edbdf4364 chore: update workflow templates to v0.9.61 (#13533) 2026-04-23 07:51:20 -07:00
Daxiong (Lin)
6045c11d8b chore: update workflow templates to v0.9.59 (#13507) 2026-04-21 20:45:25 -07:00
Comfy Org PR Bot
102773cd2c Bump comfyui-frontend-package to 1.42.14 (#13493) 2026-04-21 11:35:45 -07:00
Comfy Org PR Bot
e75f775ae8 Bump comfyui-frontend-package to 1.42.12 (#13489) 2026-04-21 00:43:11 -07:00
Octopus
543e9fba64 fix: pin SQLAlchemy>=2.0 in requirements.txt (fixes #13036) (#13316) 2026-04-20 15:30:23 -07:00
Daxiong (Lin)
f8d92cf313 chore: update workflow templates to v0.9.57 (#13455) 2026-04-17 12:16:39 -05:00
Daxiong (Lin)
7ce3f64c78 Update workflow templates to v0.9.54 (#13412) 2026-04-14 17:35:27 -07:00
Comfy Org PR Bot
c16db7fd69 Bump comfyui-frontend-package to 1.42.11 (#13398) 2026-04-14 14:13:35 -04:00
Daxiong (Lin)
fed4ac031a chore: update workflow templates to v0.9.50 (#13399) 2026-04-14 14:24:37 +08:00
Daxiong (Lin)
559501e4b8 chore: update workflow templates to v0.9.47 (#13385) 2026-04-12 23:19:09 -07:00
Daxiong (Lin)
b920bdd77d chore: update workflow templates to v0.9.45 (#13353) 2026-04-10 15:50:40 -04:00
comfyanonymous
3d4aca8084 Bump comfyui-frontend-package version to 1.42.10 (#13346) 2026-04-09 21:56:49 -04:00
Daxiong (Lin)
8cbbea8f6a chore: update workflow templates to v0.9.44 (#13290) 2026-04-05 13:31:11 +08:00
Daxiong (Lin)
eb0686bbb6 Update template to 0.9.43 (#13265) 2026-04-02 23:52:10 -07:00
Daxiong (Lin)
7d437687c2 chore: update workflow templates to v0.9.41 (#13242) 2026-03-31 20:23:25 -07:00
ComfyUI Wiki
85b7495135 chore: update workflow templates to v0.9.39 (#13196) 2026-03-27 10:13:02 -07:00
ComfyUI Wiki
359559c913 chore: update workflow templates to v0.9.38 (#13176) 2026-03-26 12:07:38 -07:00
comfyanonymous
c2862b24af Update templates package version. (#13141) 2026-03-24 17:36:12 -04:00
comfyanonymous
2d4970ff67 Update frontend version to 1.42.8 (#13126) 2026-03-23 20:43:41 -04:00
Alexander Brown
b67ed2a45f Update comfyui-frontend-package version to 1.41.21 (#13035) 2026-03-18 16:36:39 -04:00
ComfyUI Wiki
379fbd1a82 chore: update workflow templates to v0.9.26 (#13012) 2026-03-16 21:53:18 -07:00
comfyanonymous
4941cd046e Update comfyui-frontend-package to version 1.41.20 (#12954) 2026-03-14 19:53:31 -04:00
rattus
4c4be1bba5 comfy-aimdo 0.2.12 (#12941)
comfy-aimdo 0.2.12 fixes support for non-ASCII filepaths in the new
mmap helper.
2026-03-14 07:53:00 -07:00
rattus
7810f49702 comfy aimdo 0.2.11 + Improved RAM Pressure release strategies - Windows speedups (#12925)
* Implement seek and read for pins

Source pins from an mmap is pad because its its a CPU->CPU copy that
attempts to fully buffer the same data twice. Instead, use seek and
read which avoids the mmap buffering while usually being a faster
read in the first place (avoiding mmap faulting etc).

* pinned_memory: Use Aimdo pinner

The aimdo pinner bypasses pytorches CPU allocator which can leak
windows commit charge.

* ops: bypass init() of weight for embedding layer

This similarly consumes large commit charge especially for TEs. It can
cause a permanement leaked commit charge which can destabilize on
systems close to the commit ceiling and generally confuses the RAM
stats.

* model_patcher: implement pinned memory counter

Implement a pinned memory counter for better accounting of what volume
of memory pins have.

* implement touch accounting

Implement accounting of touching mmapped tensors.

* mm+mp: add residency mmap getter

* utils: use the aimdo mmap to load sft files

* model_management: Implement tigher RAM pressure semantics

Implement a pressure release on entire MMAPs as windows does perform
faster when mmaps are unloaded and model loads free ramp into fully
unallocated RAM.

Make the concept of freeing for pins a completely separate concept.
Now that pins are loadable directly from original file and don' touch
the mmap, tighten the freeing budget to just the current loaded model
- what you have left over. This still over-frees pins, but its a lot
better than before.

So after the pins are freed with that algorithm, bounce entire MMAPs
to free RAM based on what the model needs, deducting off any known
resident-in-mmap tensors to the free quota to keep it as tight as
possible.

* comfy-aimdo 0.2.11

Comfy aimdo 0.2.11

* mm: Implement file_slice path for QT

* ruff

* ops: put meta-tensors in place to allow custom nodes to check geo
2026-03-13 22:18:08 -04:00
Comfy Org PR Bot
6cd35a0c5f Bump comfyui-frontend-package to 1.41.19 (#12923) 2026-03-13 14:31:25 -04:00
Christian Byrne
8d9faaa181 Update requirements.txt (#12910) 2026-03-12 18:14:59 -04:00
ComfyUI Wiki
712411d539 chore: update workflow templates to v0.9.21 (#12908) 2026-03-12 12:16:54 -07:00
comfyanonymous
8f9ea49571 Bump comfy-kitchen version to 0.2.8 (#12895) 2026-03-12 00:17:31 -04:00
Comfy Org PR Bot
9ce4c3dd87 Bump comfyui-frontend-package to 1.41.16 (#12894)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-03-11 18:16:30 -07:00