Commit Graph

545 Commits

Author SHA1 Message Date
comfyanonymous
3d816db07f Some optimizations to make Ernie inference a bit faster. (#13472) 2026-04-18 23:02:29 -04:00
Jukka Seppänen
b9dedea57d feat: SUPIR model support (CORE-17) (#13250) 2026-04-18 23:02:01 -04:00
comfyanonymous
cb0bbde402 Fix ernie on devices that don't support fp64. (#13414) 2026-04-14 22:54:47 -04:00
comfyanonymous
402ff1cdb7 Fix issue with ernie image. (#13393) 2026-04-13 16:38:42 -04:00
comfyanonymous
31283d2892 Implement Ernie Image model. (#13369) 2026-04-11 22:29:31 -04:00
comfyanonymous
55ebd287ee Add a supports_fp64 function. (#13368) 2026-04-11 21:06:36 -04:00
Jukka Seppänen
a134423890 SDPose: resize input always (#13349) 2026-04-10 11:26:55 -10:00
huemin
b615af1c65 Add support for small flux.2 decoder (#13314) 2026-04-07 03:44:18 -04:00
comfyanonymous
40862c0776 Support Ace Step 1.5 XL model. (#13317) 2026-04-07 03:13:47 -04:00
comfyanonymous
0c63b4f6e3 Remove dead code. (#13251) 2026-04-01 20:22:06 -04:00
Jukka Seppänen
a500f1edac CORE-13 feat: Support RT-DETRv4 detection model (#12748) 2026-03-28 23:34:10 -04:00
Jukka Seppänen
e87858e974 feat: LTX2: Support reference audio (ID-LoRA) (#13111) 2026-03-23 18:22:24 -04:00
Talmaj
d49420b3c7 LongCat-Image edit (#13003) 2026-03-21 23:51:05 -04:00
rattus
25b6d1d629 wan: vae: Fix light/color change (#13101)
There was an issue where the resample split was too early and dropped one
of the rolling convolutions a frame early. This is most noticable as a
lighting/color change between pixel frames 5->6 (latent 2->3), or as a
lighting change between the first and last frame in an FLF wan flow.
2026-03-21 18:44:35 -04:00
rattus
f49856af57 ltx: vae: Fix missing init variable (#13074)
Forgot to push this ammendment. Previous test results apply to this.
2026-03-19 22:34:58 -04:00
rattus
82b868a45a Fix VRAM leak in tiler fallback in video VAEs (#13073)
* sd: soft_empty_cache on tiler fallback

This doesnt cost a lot and creates the expected VRAM reduction in
resource monitors when you fallback to tiler.

* wan: vae: Don't recursion in local fns (move run_up)

Moved Decoder3d’s recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays garbage of tensors which in turn delays
VRAM release before tiled fallback.

* ltx: vae: Don't recursion in local fns (move run_up)

Mov the recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays garbage of tensors which in turn delays
VRAM release before tiled fallback.
2026-03-19 22:30:27 -04:00
rattus
fabed694a2 ltx: vae: implement chunked encoder + CPU IO chunking (Big VRAM reductions) (#13062)
* ltx: vae: add cache state to downsample block

* ltx: vae: Add time stride awareness to causal_conv_3d

* ltx: vae: Automate truncation for encoder

Other VAEs just truncate without error. Do the same.

* sd/ltx: Make chunked_io a flag in its own right

Taking this bi-direcitonal, so make it a for-purpose named flag.

* ltx: vae: implement chunked encoder + CPU IO chunking

People are doing things with big frame counts in LTX including V2V
flows. Implement the time-chunked encoder to keep the VRAM down, with
the converse of the new CPU pre-allocation technique, where the chunks
are brought from the CPU JIT.

* ltx: vae-encode: round chunk sizes more strictly

Only powers of 2 and multiple of 8 are valid due to cache slicing.
2026-03-19 09:58:47 -07:00
Jukka Seppänen
9fff091f35 Further Reduce LTX VAE decode peak RAM usage (#13052) 2026-03-18 18:32:26 -04:00
rattus
cad24ce262 cascade: remove dead weight init code (#13026)
This weight init process is fully shadowed be the weight load and
doesnt work in dynamic_vram were the weight allocation is deferred.
2026-03-17 20:59:10 -04:00
rattus
035414ede4 Reduce WAN VAE VRAM, Save use cases for OOM/Tiler (#13014)
* wan: vae: encoder: Add feature cache layer that corks singles

If a downsample only gives you a single frame, save it to the feature
cache and return nothing to the top level. This increases the
efficiency of cacheability, but also prepares support for going two
by two rather than four by four on the frames.

* wan: remove all concatentation with the feature cache

The loopers are now responsible for ensuring that non-final frames are
processes at least two-by-two, elimiating the need for this cat case.

* wan: vae: recurse and chunk for 2+2 frames on decode

Avoid having to clone off slices of 4 frame chunks and reduce the size
of the big 6 frame convolutions down to 4. Save the VRAMs.

* wan: encode frames 2x2.

Reduce VRAM usage greatly by encoding frames 2 at a time rather than
4.

* wan: vae: remove cloning

The loopers now control the chunking such there is noever more than 2
frames, so just cache these slices directly and avoid the clone
allocations completely.

* wan: vae: free consumer caller tensors on recursion

* wan: vae: restyle a little to match LTX
2026-03-17 17:34:39 -04:00
rattus
1a157e1f97 Reduce LTX VAE VRAM usage and save use cases from OOMs/Tiler (#13013)
* ltx: vae: scale the chunk size with the users VRAM

Scale this linearly down for users with low VRAM.

* ltx: vae: free non-chunking recursive intermediates

* ltx: vae: cleanup some intermediates

The conv layer can be the VRAM peak and it does a torch.cat. So cleanup
the pieces of the cat. Also clear our the cache ASAP as each layer detect
its end as this VAE surges in VRAM at the end due to the ended padding
increasing the size of the final frame convolutions off-the-books to
the chunker. So if all the earlier layers free up their cache it can
offset that surge.

Its a fragmentation nightmare, and the chance of it having to recache the
pyt allocator is very high, but you wont OOM.
2026-03-17 17:32:43 -04:00
Paulo Muggler Moreira
8cc746a864 fix: disable SageAttention for Hunyuan3D v2.1 DiT (#12772) 2026-03-16 22:27:27 -04:00
Jukka Seppänen
0904cc3fe5 LTXV: Accumulate VAE decode results on intermediate_device (#12955) 2026-03-14 18:09:09 -07:00
comfyanonymous
44f1246c89 Support flux 2 klein kv cache model: Use the FluxKVCache node. (#12905) 2026-03-12 11:30:50 -04:00
comfyanonymous
f6274c06b4 Fix issue with batch_size > 1 on some models. (#12892) 2026-03-11 16:37:31 -04:00
comfyanonymous
9642e4407b Add pre attention and post input patches to qwen image model. (#12879) 2026-03-11 00:09:35 -04:00
rattus
535c16ce6e Widen OOM_EXCEPTION to AcceleratorError form (#12835)
Pytorch only filters for OOMs in its own allocators however there are
paths that can OOM on allocators made outside the pytorch allocators.
These manifest as an AllocatorError as pytorch does not have universal
error translation to its OOM type on exception. Handle it. A log I have
for this also shows a double report of the error async, so call the
async discarder to cleanup and make these OOMs look like OOMs.
2026-03-10 00:41:02 -04:00
comfyanonymous
c4fb0271cd Add a way for nodes to add pre attn patches to flux model. (#12861) 2026-03-09 23:37:58 -04:00
comfyanonymous
17b43c2b87 LTX audio vae novram fixes. (#12796) 2026-03-05 16:31:28 -05:00
Jukka Seppänen
8befce5c7b Add manual cast to LTX2 vocoder conv_transpose1d (#12795)
* Add manual cast to LTX2 vocoder

* Update vocoder.py
2026-03-05 12:37:25 -08:00
comfyanonymous
43c64b6308 Support the LTXAV 2.3 model. (#12773) 2026-03-04 20:06:20 -05:00
Lodestone
9ebee0a217 Feat: z-image pixel space (model still training atm) (#12709)
* draft zeta (z-image pixel space)

* revert gitignore

* model loaded and able to run however vector direction still wrong tho

* flip the vector direction to original again this time

* Move wrongly positioned Z image pixel space class

* inherit Radiance LatentFormat class

* Fix parameters in classes for Zeta x0 dino

* remove arbitrary nn.init instances

* Remove unused import of lru_cache

---------

Co-authored-by: silveroxides <ishimarukaito@gmail.com>
2026-03-02 19:43:47 -05:00
Jukka Seppänen
1f6744162f feat: Support SCAIL WanVideo model (#12614) 2026-02-28 16:49:12 -05:00
Reiner "Tiles" Prokein
25ec3d96a3 Class WanVAE, def encode, feat_map is using self.decoder instead of self.encoder (#12682) 2026-02-27 19:03:45 -05:00
Jukka Seppänen
c7f7d52b68 feat: Support SDPose-OOD (#12661) 2026-02-26 19:59:05 -05:00
Tavi Halperin
a4522017c5 feat: per-guide attention strength control in self-attention (#12518)
Implements per-guide attention attenuation via log-space additive bias
in self-attention. Each guide reference tracks its own strength and
optional spatial mask in conditioning metadata (guide_attention_entries).
2026-02-26 01:25:23 -05:00
Jukka Seppänen
907e5dcbbf initial FlowRVS support (#12637) 2026-02-25 23:38:46 -05:00
comfyanonymous
caa43d2395 Fix issue loading fp8 ltxav checkpoints. (#12582) 2026-02-22 16:00:02 -05:00
comfyanonymous
07ca6852e8 Fix dtype issue in embeddings connector. (#12570) 2026-02-22 03:18:20 -05:00
comfyanonymous
f266b8d352 Move LTXAV av embedding connectors to diffusion model. (#12569) 2026-02-21 22:29:58 -05:00
chaObserv
44f8598521 Fix anima LLM adapter forward when manual cast (#12504) 2026-02-17 07:56:44 -08:00
comfyanonymous
18927538a1 Implement NAG on all the models based on the Flux code. (#12500)
Use the Normalized Attention Guidance node.

Flux, Flux2, Klein, Chroma, Chroma radiance, Hunyuan Video, etc..
2026-02-16 23:30:34 -05:00
comfyanonymous
88e6370527 Remove workaround for old pytorch. (#12480) 2026-02-15 20:43:53 -05:00
krigeta
dc9822b7df Add working Qwen 2512 ControlNet (Fun ControlNet) support (#12359) 2026-02-13 22:23:52 -05:00
comfyanonymous
726af73867 Fix some custom nodes. (#12455) 2026-02-13 20:21:10 -05:00
comfyanonymous
e1add563f9 Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
comfyanonymous
76a7fa96db Make built in lora training work on anima. (#12402) 2026-02-10 22:04:32 -05:00
Kohaku-Blueleaf
cdcf4119b3 [Trainer] training with proper offloading (#12189)
* Fix bypass dtype/device moving

* Force offloading mode for training

* training context var

* offloading implementation in training node

* fix wrong input type

* Support bypass load lora model, correct adapter/offloading handling
2026-02-10 21:45:19 -05:00
comfyanonymous
039955c527 Some fixes to previous pr. (#12339) 2026-02-06 20:14:52 -05:00
tdrussell
6a26328842 Support fp16 for Cosmos-Predict2 and Anima (#12249) 2026-02-06 20:12:15 -05:00