Commit Graph

2182 Commits

Author SHA1 Message Date
Jukka Seppänen
cd8c7a2306 Throttle dynamic VRAM prepare logging (#13704) 2026-05-07 10:41:13 +08:00
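The commit subject does not describe how the throttling works. As an illustration only, here is a time-based throttle that emits at most one message per interval and reports how many messages it suppressed. `ThrottledLogger`, its `interval` parameter and the injectable `clock` are hypothetical names, not ComfyUI code.

```python
import time


class ThrottledLogger:
    """Hypothetical throttle: emit at most one message per `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock          # injectable so the behaviour can be tested
        self.last_emit = None
        self.suppressed = 0
        self.emitted = []           # stands in for a real logging call

    def log(self, msg):
        now = self.clock()
        if self.last_emit is not None and now - self.last_emit < self.interval:
            self.suppressed += 1    # too soon: count it and drop it
            return False
        if self.suppressed:
            msg = f"{msg} ({self.suppressed} similar messages suppressed)"
            self.suppressed = 0
        self.emitted.append(msg)
        self.last_emit = now
        return True
```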
Talmaj
78b3096bf3 Void model - pass 1 & 2 (CORE-38) (#13403) 2026-05-05 19:59:04 -07:00
drozbay
e5369c0eec feat: Context windows - add causal_window_fix to improve blending of context windows (CORE-100) (#13563)
* Context windows: add causal_window_fix toggle

* Fix slice_cond to correctly handle causal anchor index for temporal offsets
2026-05-05 16:40:53 -07:00
drozbay
1655f8089a Add temporal_downscale_ratio to LatentFormat (#13702)
Co-authored-by: ozbayb <17261091+ozbayb@users.noreply.github.com>
Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com>
Co-authored-by: Jukka Seppänen <40791699+kijai@users.noreply.github.com>
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-05-05 16:30:00 -07:00
Talmaj
fed8d5efa6 feat: Auto-regressive video generation (CORE-25) (#13082) 2026-05-04 21:01:22 -07:00
Jedrzej Kosinski
e758594e3b Add deploy environment header (Comfy-Env) to partner node API calls (#13425) 2026-05-04 20:17:56 -07:00
Jedrzej Kosinski
ae457da84b feat: add generic --feature-flag CLI arg and --list-feature-flags registry (#13685) 2026-05-04 19:50:26 -07:00
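Below is a minimal sketch of one way to wire a generic, repeatable `--feature-flag` argument to a registry using `argparse`. Only the two option names come from the commit subject. The registry, `register_feature_flag`, `resolve` and the example flag names are hypothetical.

```python
import argparse

# Hypothetical registry mapping flag name -> description (not ComfyUI's real flags).
FEATURE_FLAGS = {}


def register_feature_flag(name, description):
    FEATURE_FLAGS[name] = description


register_feature_flag("example_fast_path", "Hypothetical flag for illustration")
register_feature_flag("example_new_cache", "Another hypothetical flag")


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    # Repeatable: --feature-flag a --feature-flag b
    parser.add_argument("--feature-flag", action="append", default=[], metavar="NAME",
                        help="Enable a registered feature flag (repeatable).")
    parser.add_argument("--list-feature-flags", action="store_true",
                        help="List registered feature flags.")
    return parser


def resolve(argv):
    """Parse argv and return (list_requested, enabled_flags), rejecting unknown names."""
    args = build_parser().parse_args(argv)
    unknown = sorted(set(args.feature_flag) - set(FEATURE_FLAGS))
    if unknown:
        raise ValueError("unknown feature flag(s): " + ", ".join(unknown))
    return args.list_feature_flags, frozenset(args.feature_flag)
```

When `--list-feature-flags` is set, a caller would print `sorted(FEATURE_FLAGS)` and exit. Checking names against the registry turns a typo into an error instead of a flag that is silently ignored.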
rattus
1265955b34 ops: handle multi-compute of the same weight (#13705)
If the same weight is used multiple times within the same prefetch
window, it should only apply compute state mutations once. Mark the
weight as fully resident on the first pass accordingly.
2026-05-04 16:40:57 -07:00
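The sketch below shows the once-per-window rule described in this commit. `Weight`, `resident`, `mutations` and `prepare_window` are hypothetical stand-ins, and incrementing a counter represents whatever cast or patch the real code performs.

```python
class Weight:
    """Hypothetical stand-in for a layer weight managed by an offload system."""

    def __init__(self, name):
        self.name = name
        self.resident = False  # fully prepared on the compute device?
        self.mutations = 0     # how many times compute state was mutated


def prepare_window(window):
    """Prepare every weight used in one prefetch window.

    The same Weight object may appear several times (shared/tied weights);
    only the first occurrence mutates state, later ones reuse it.
    """
    for w in window:
        if w.resident:
            continue            # already handled earlier in this window
        w.mutations += 1        # stands in for cast / LoRA application
        w.resident = True       # mark as fully resident on first pass
```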
rattus
1ac78180b3 make control-net load order deterministic (#13701)
Make this deterministic so speeds don't change based on load order. Load
them in reverse order so whatever the caller lists first is the top
priority.
2026-05-04 12:58:06 -07:00
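Why reversing the load order gives the first-listed entry top priority depends on the memory manager. For illustration only, this sketch assumes a fixed-capacity store that evicts the earliest-loaded item first. Under that assumption, loading in reverse keeps the caller's first entry resident, and the result depends only on the input order. `simulate_loads` and the eviction model are hypothetical.

```python
from collections import OrderedDict


def simulate_loads(control_nets, capacity):
    """Return what stays resident after loading `control_nets` in reverse order.

    Assumed (hypothetical) memory model: a fixed-capacity store that evicts
    the earliest-loaded item when it overflows.
    """
    resident = OrderedDict()
    for name in reversed(control_nets):   # caller's first entry is loaded last
        resident[name] = True
        if len(resident) > capacity:
            resident.popitem(last=False)  # evict the oldest load
    return list(resident)
```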
rattus
c47633f3be prefetch: guard against no offload (#13703)
cast_ will return no stream if there is no work to do. Guard against
this in the consume logic.
2026-05-04 12:56:05 -07:00
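Here is a sketch of the guard. `issue_prefetch` plays the role of `cast_` and returns `None` when nothing needs transferring, while `consume_prefetch` is the consume side. Both names and the dict-based "stream" are hypothetical placeholders for real device stream objects.

```python
def issue_prefetch(weights, needs_transfer):
    """Hypothetical stand-in for cast_: return a 'stream' handle, or None if idle."""
    pending = [w for w in weights if needs_transfer(w)]
    if not pending:
        return None                      # no work, so no stream is created
    return {"stream": "copy-stream", "items": pending}


def consume_prefetch(handle):
    """Consume side: only wait on / read from a stream that actually exists."""
    if handle is None:
        return []                        # nothing was offloaded; nothing to sync
    # A real implementation would synchronize the stream here.
    return handle["items"]
```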
Silver
b138133ffa Enable triton comfy kitchen via cli-arg (#12730) 2026-05-03 14:07:21 -04:00
Jukka Seppänen
be95871adc feat: Gemma4 text generation support (CORE-30) (#13376)
* initial gemma4 support

* parity with reference implementation

Outputs can match transformers 100% with the same SDPA flags; checkpoint this and then optimize.

* Cleanup, video fixes

* cleanup, enable fused rms norm by default

* update comment

* Cleanup

* Update sd.py

* Various fixes

* Add fp8 scaled embedding support

* small fixes

* Translate think tokens

* Fix image encoder attention mask type

So it works with basic attention

* Handle thinking tokens differently only for Gemma4

* Code cleanup

* Update nodes_textgen.py

* Use embed scale class instead of buffer

Slight difference to HF, but technically more accurate and simpler code

* Default to fused rms_norm

* Update gemma4.py
2026-05-02 22:46:15 -04:00
rattus
783782d5d7 Implement block prefetch + Lora Async load + and adopt in LTX (Speedup!) (CORE-111) (#13618)
* mm: Use Aimdo raw allocator for cast buffers

PyTorch manages allocation of growing buffers on streams poorly. PyTorch
has no Windows support for the expandable segments allocator (which is
the right tool for this job), while also segmenting the memory by
stream such that it can be generally re-used. So kick the problem to
aimdo, which can just grow a virtual region that's freed per stream.

* plan

* ops: move cpu handler up to the caller

* ops: split up prefetch from weight prep block prefetching API

Split up the casting and weight formatting/LoRA stuff in prep for
arbitrary prefetch support.

* ops: implement block prefetching API

allow a model to construct a prefetch list and operate it for increased
async offload.

* ltxv2: Implement block prefetching

* Implement lora async offload

Implement async offload of loras.
2026-05-02 19:23:24 -04:00
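Block prefetching overlaps the transfer of block i+1 with the computation of block i. The sketch below shows the idea with a single background worker. `run_blocks`, `load` and `compute` are hypothetical, and a real implementation would use device streams rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def run_blocks(blocks, load, compute, x):
    """Run `blocks` in order, prefetching block i+1 while block i computes.

    `load(block)` stands in for an async host-to-device copy and
    `compute(weights, x)` for the block's forward pass (both hypothetical).
    """
    if not blocks:
        return x
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, blocks[0])
        for i in range(len(blocks)):
            weights = pending.result()  # wait until this block's weights are ready
            if i + 1 < len(blocks):
                pending = pool.submit(load, blocks[i + 1])  # start next transfer
            x = compute(weights, x)     # overlaps with the transfer above
    return x
```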
Simon Lui
63103d519e Remove IPEX and clean up checks and add missing synchronize during empty cache. (#13653) 2026-05-01 14:16:41 -07:00
Talmaj
cf9cbec596 Reformat models variable into multiline array CORE-59 (#13513)
Co-authored-by: Talmaj Marinc <talmaj@comfy.org>
2026-05-01 17:20:11 +08:00
Rainer
e9c311b245 OneTainer ERNIE LoRA support (#13640) 2026-04-30 19:33:41 -04:00
blepping
a164c82913 Add high quality preview support for Flux2 latents (#13496) 2026-04-29 19:37:30 -04:00
Talmaj
5eeae3f1d8 Cogvideox (#13402)

Co-authored-by: kijai <40791699+kijai@users.noreply.github.com>
Co-authored-by: Talmaj Marinc <talmaj@comfy.org>
2026-04-29 19:30:08 -04:00
Jukka Seppänen
0e25a6936e Reduce video tiny VAE peak VRAM and decode time (CORE-127) (#13617)
* Update taehv.py

* Simplify

* Simplify pixel_unshuffle dispatch
2026-04-29 12:15:10 -07:00
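For reference, `pixel_unshuffle` (mentioned in the last bullet) folds each r-by-r spatial patch into channels. This pure-Python sketch on nested lists reproduces the channel ordering of `torch.nn.functional.pixel_unshuffle`. It is not the code in `taehv.py`.

```python
def pixel_unshuffle(x, r):
    """Pure-Python pixel_unshuffle on a nested list shaped [C][H*r][W*r].

    Returns [C*r*r][H][W] with out[c*r*r + i*r + j][h][w] = x[c][h*r + i][w*r + j],
    the same channel ordering as torch.nn.functional.pixel_unshuffle.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    if H % r or W % r:
        raise ValueError("spatial dims must be divisible by r")
    h, w = H // r, W // r
    out = []
    for c in range(C):
        for i in range(r):          # row offset inside each r x r patch
            for j in range(r):      # column offset inside each r x r patch
                out.append([[x[c][hh * r + i][ww * r + j] for ww in range(w)]
                            for hh in range(h)])
    return out
```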
rattus
fce0398470 dynamicVRAM + --cache-ram 2 (CORE-117) (#13603)
* pinned_memory: remove JIT RAM pressure release

This doesn't work, as freeing intermediates for pins needs to be
higher-priority than freeing pins-for-pins if and when you are going
to do that. So this is too late as pins-for-pins is model load time
and we don't have JIT pins-for-pins.

* caching: Add a filter to only free intermediates from inactive workflows

This is to get priorities in amongst pins straight.

* mm: free inactive-ram from RAM cache first

Stuff from inactive workflows should be freed before anything else.

* caching: purge old ModelPatchers first

Dont try and score them, just dump them at the first sign of trouble
if they arent part of the workflow.
2026-04-28 19:15:02 -04:00
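The freeing priorities described in these bullets can be summarized as a sort key. This sketch assumes a simple three-tier policy: intermediates from inactive workflows first, then ModelPatchers outside the workflow, then everything else by a usefulness score. `CacheEntry`, `score` and the exact tier order are hypothetical, not ComfyUI's caching code.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    kind: str                  # "intermediate" or "model_patcher"
    in_active_workflow: bool
    score: float = 0.0         # hypothetical usefulness score; lower frees sooner


def eviction_order(entries):
    """Order entries for freeing under RAM pressure (illustrative policy only)."""

    def priority(e):
        if not e.in_active_workflow and e.kind == "intermediate":
            return (0, 0.0)        # inactive-workflow intermediates go first
        if not e.in_active_workflow and e.kind == "model_patcher":
            return (1, 0.0)        # stale ModelPatchers: dropped without scoring
        return (2, e.score)        # everything else: least useful first

    return [e.key for e in sorted(entries, key=priority)]
```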
rattus
b47f15f25a fix: Handle un-inited meta-tensors in models (fixes a CPU TE crash) (CORE-67) (#13578) 2026-04-27 22:22:31 -04:00
Jukka Seppänen
084e08c6e2 Disable sageattention for SAM3 (#13529)
Causes NaNs
2026-04-23 11:14:42 -07:00
Jukka Seppänen
749d5b4e8d feat: SAM (segment anything) 3.1 support (CORE-34) (#13408) 2026-04-23 00:07:43 -04:00
rattus
ec4b1659ab ModelPatcherDynamic: force cast stray weights on comfy layers (#13487)
The mixed_precision ops can have input_scale parameters that are used
in tensor math but aren't a weight or bias, so they don't get proper VRAM
management. Treat these as force-castable parameters, as non-comfy
weights, random params and buffers already are.
2026-04-22 18:13:38 -04:00
blepping
9949c19c63 Derive InterruptProcessingException from BaseException (#13523) 2026-04-22 18:08:19 -04:00
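The commit subject does not state the motivation. One practical effect of deriving from `BaseException` is that broad `except Exception:` handlers, for example inside a custom node, no longer swallow an interrupt. The sketch below demonstrates that Python behaviour. The class body and the helper functions are illustrative only.

```python
class InterruptProcessingException(BaseException):
    """Illustrative: not a subclass of Exception, so `except Exception` misses it."""


def node_body():
    raise InterruptProcessingException()


def careless_wrapper():
    try:
        node_body()
    except Exception:              # would swallow an Exception-derived interrupt
        return "swallowed"
    return "finished"


def run():
    try:
        return careless_wrapper()
    except InterruptProcessingException:
        return "interrupted"       # the interrupt reaches the executor intact
```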
Octopus
cc6f9500a1 fix: use Parameter assignment for Stable_Zero123 cc_projection weights (fixes #13492) (#13518)
On Windows with aimdo enabled, disable_weight_init.Linear uses lazy
initialization that sets weight and bias to None to avoid unnecessary
memory allocation. This caused a crash when copy_() was called on the
None weight attribute in Stable_Zero123.__init__.

Replace copy_() with direct torch.nn.Parameter assignment, which works
correctly on both Windows (aimdo enabled) and other platforms.
2026-04-22 15:05:43 -07:00
Jukka Seppänen
eb22225387 Support standalone LTXV audio VAEs (#13499) 2026-04-21 10:46:37 -07:00
comfyanonymous
ad94d47221 Make the ltx audio vae more native. (#13486) 2026-04-21 11:02:42 -04:00
comfyanonymous
3d816db07f Some optimizations to make Ernie inference a bit faster. (#13472) 2026-04-18 23:02:29 -04:00
Jukka Seppänen
b9dedea57d feat: SUPIR model support (CORE-17) (#13250) 2026-04-18 23:02:01 -04:00
Bedovyy
b41ab53b6f Use ErnieTEModel_ not ErnieTEModel. (#13431) 2026-04-16 10:11:58 -04:00
Jun Yamog
1de83f91c3 Fix OOM regression in _apply() for quantized models during inference (#13372)
Skip unnecessary clone of inference-mode tensors when already inside
torch.inference_mode(), matching the existing guard in set_attr_param.
The unconditional clone introduced in 20561aa9 caused transient VRAM
doubling during model movement for FP8/quantized models.
2026-04-15 02:10:36 -07:00
comfyanonymous
cb0bbde402 Fix ernie on devices that don't support fp64. (#13414) 2026-04-14 22:54:47 -04:00
comfyanonymous
722bc73319 Make text generation work with ministral model. (#13395)
Needs template before it works properly.
2026-04-13 20:43:57 -04:00
comfyanonymous
402ff1cdb7 Fix issue with ernie image. (#13393) 2026-04-13 16:38:42 -04:00
comfyanonymous
c2657d5fb9 Fix typo. (#13382) 2026-04-12 23:37:13 -04:00
comfyanonymous
31283d2892 Implement Ernie Image model. (#13369) 2026-04-11 22:29:31 -04:00
comfyanonymous
55ebd287ee Add a supports_fp64 function. (#13368) 2026-04-11 21:06:36 -04:00
Jukka Seppänen
a134423890 SDPose: resize input always (#13349) 2026-04-10 11:26:55 -10:00
huemin
b615af1c65 Add support for small flux.2 decoder (#13314) 2026-04-07 03:44:18 -04:00
comfyanonymous
40862c0776 Support Ace Step 1.5 XL model. (#13317) 2026-04-07 03:13:47 -04:00
comfyanonymous
0c63b4f6e3 Remove dead code. (#13251) 2026-04-01 20:22:06 -04:00
comfyanonymous
e2ddf28d78 Fix some fp8 scaled checkpoints no longer working. (#13239) 2026-03-31 14:27:17 -07:00
rattus
8d723d2caa Fix/tweak pinned memory accounting (#13221)
* mm: Lower windows pin threshold

Some workflows have more extraneous use of shared GPU memory than is
accounted for in the 5% pin headroom. Lower this for safety.

* mm: Remove pin count clearing threshold.

TOTAL_PINNED_MEMORY is shared between the legacy and aimdo pinning
systems, however this catch-all assumes only the legacy system exists.
Remove the catch-all as the PINNED_MEMORY buffer is coherent already.
2026-03-29 16:43:24 -07:00
Jukka Seppänen
a500f1edac CORE-13 feat: Support RT-DETRv4 detection model (#12748) 2026-03-28 23:34:10 -04:00
comfyanonymous
3f77450ef1 Fix #13214 (#13216) 2026-03-28 22:35:59 -04:00
rattus
b353a7c863 Integrate RAM cache with model RAM management (#13173) 2026-03-27 21:34:16 -04:00
comfyanonymous
3a56201da5 Allow flux conditioning without a pooled output. (#13198) 2026-03-27 20:36:26 -04:00
Jukka Seppänen
b0fd65e884 fix: regression in text generate with LTXAV model (#13170) 2026-03-26 09:55:05 -07:00
comfyanonymous
2a1f402601 Make Qwen 8B work with TextGenerate node. (#13160) 2026-03-25 23:21:44 -04:00