gemini-3-pro-image-preview nondeterministically returns image/jpeg
instead of image/png. get_image_from_response() hardcoded
get_parts_by_type(response, "image/png"), silently dropping JPEG
responses and falling back to torch.zeros (all-black output).
Add _mime_matches() helper using fnmatch for glob-style MIME matching.
Change get_image_from_response() to request "image/*" so any image
format returned by the API is correctly captured.
* lora: add weight shape calculations.
This lets the loader know if a lora will change the shape of a weight
so it can take appropriate action.
* MPDynamic: force load flux img_in weight
This weight is a bit special, in that the lora changes its geometry.
This is rather unique, not handled by existing estimate and doesn't
work for either offloading or dynamic_vram.
Fix for dynamic_vram as a special case. Ideally we can fully precalculate
these lora geometry changes at load time, but just get these models
working first.
* lora_extract: Add a trange
If you bite off more than your GPU can chew, this kinda just hangs.
Give a rough indication of progress counting the weights in a trange.
* lora_extract: Support on-the-fly patching
Use the on-the-fly approach from the regular model saving logic for
lora extraction too. Switch off force_cast_weights accordingly.
This gets extraction working in dynamic vram while also supporting
extraction on GPU offloaded.
Get rid of the cat and unary negation and inplace add-cmul the two
halves of the rope. Precompute -sin once at the start of the model
rather than every transformer block.
This is slightly faster on both GPU and CPU bound setups.
The current behaviour of the default ModelPatcher is to .to a model
only if its fully loaded, which is how random non-leaf weights get
loaded in non-LowVRAM conditions.
The however means they never get loaded in dynamic_vram. In the
dynamic_vram case, force load them to the GPU.
* model_management: lazy-cache aimdo_tensor
These tensors cosntructed from aimdo-allocations are CPU expensive to
make on the pytorch side. Add a cache version that will be valid with
signature match to fast path past whatever torch is doing.
* dynamic_vram: Minimize fast path CPU work
Move as much as possible inside the not resident if block and cache
the formed weight and bias rather than the flat intermediates. In
extreme layer weight rates this adds up.
* Fix bypass dtype/device moving
* Force offloading mode for training
* training context var
* offloading implementation in training node
* fix wrong input type
* Support bypass load lora model, correct adapter/offloading handling
This was missing the stochastic rounding required for fp8 downcast
to be consistent with model_patcher.patch_weight_to_device.
Missed in testing as I spend too much time with quantized tensors
and overlooked the simpler ones.
If there are non-trivial python objects nested in the model_options, this
causes all sorts of issues. Traverse lists and dicts so clones can safely
overide settings and BYO objects but stop there on the deepclone.
* feat(api-nodes-Kling): add new models (V3, O3)
* remove storyboard from VideoToVideo node
* added check for total duration of storyboards
* fixed other small things
* updated display name for nodes
* added "fake" seed