In low-VRAM mode, keep the transformer and Qwen text encoder off CUDA during initial load and quantization, so startup does not hit a full-model OOM before offloading and quantization take effect.
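
A minimal sketch of the load order described above, not the actual ai-toolkit code. The names `load_model`, `build_model`, and `shrink` are hypothetical, and `shrink` is only a stand-in for the real quantization step (here a plain dtype cast).

```python
import torch
from torch import nn


def load_model(build_model, shrink, device, low_vram):
    """Hypothetical helper: choose where the model lives while it is still full size."""
    # In low-VRAM mode the full-precision weights stay on CPU until they are shrunk.
    load_device = torch.device("cpu") if low_vram else device
    model = build_model().to(load_device)
    # Stand-in for quantization; it runs on whichever device the model was loaded to.
    shrink(model)
    # In low-VRAM mode the caller (or an offloading manager) moves parts to the
    # accelerator later, so nothing is pushed to CUDA here.
    return model


target = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = load_model(
    build_model=lambda: nn.Linear(4, 4),
    shrink=lambda m: m.to(torch.bfloat16),  # nn.Module.to() modifies the module in place
    device=target,
    low_vram=True,
)
first_param = next(model.parameters())
```

With `low_vram=True` the parameters remain on CPU after the shrink step, regardless of whether CUDA is available.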
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jaret Burkett <jaretburkett@gmail.com>
When training Klein models with a `control_path` (edit/kontext-style
paired datasets), `encode_image_refs()` returns tensors that reside on
the VAE's device (CPU, since the VAE weights are loaded via
`load_file(..., device="cpu")` and are never explicitly moved to the
training device). Concatenating those CPU tensors with the training
latents (`packed_latents`) that live on CUDA raises:
RuntimeError: Expected all tensors to be on the same device
Fix: move `img_cond_seq` and `img_cond_seq_ids` to the same device
(and dtype) as `img_input` / `img_input_ids` before concatenation.
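
The fix can be sketched roughly as follows. This is an illustration under assumptions, not the exact patch: the helper name `align_and_concat`, the tensor shapes, and the choice of `dim=1` as the sequence dimension are all hypothetical.

```python
import torch


def align_and_concat(img_input, img_input_ids, img_cond_seq, img_cond_seq_ids):
    """Hypothetical helper: align conditioning tensors, then concatenate."""
    # Match the device and dtype of the training latents before concatenating.
    img_cond_seq = img_cond_seq.to(device=img_input.device, dtype=img_input.dtype)
    img_cond_seq_ids = img_cond_seq_ids.to(
        device=img_input_ids.device, dtype=img_input_ids.dtype
    )
    # Assumption: the sequence dimension is dim 1.
    img_input = torch.cat([img_input, img_cond_seq], dim=1)
    img_input_ids = torch.cat([img_input_ids, img_cond_seq_ids], dim=1)
    return img_input, img_input_ids


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Training latents live on the training device in the training dtype.
packed_latents = torch.randn(1, 16, 8).to(device=device, dtype=torch.bfloat16)
latent_ids = torch.zeros(1, 16, 4, device=device)
# Reference-image tensors come back on CPU in float32, like the VAE output.
cond_seq = torch.randn(1, 4, 8)
cond_ids = torch.zeros(1, 4, 4)

out, out_ids = align_and_concat(packed_latents, latent_ids, cond_seq, cond_ids)
```

Calling `.to()` with a matching device and dtype is a no-op, so the same code path is safe when the VAE already sits on the training device.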
Co-authored-by: HuangYuChuh <HuangYuChuh@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix Qwen Image mask handling
* Fix Qwen attention mask crash with diffusers >=0.37
diffusers v0.37 (PR #12987) optimizes all-ones attention masks to None
in encode_prompt() when there is no padding. This breaks ai-toolkit's
Qwen extensions which call .to() on the mask unconditionally.
Fix: reconstruct the all-ones mask at the boundary (get_prompt_embeds)
right after encode_prompt() returns. This keeps the rest of the code
unchanged and works with both old and new diffusers versions.
Also removes redundant duplicate mask assignments in qwen_image_edit
and qwen_image_edit_plus.
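
A minimal sketch of the boundary fix, assuming `encode_prompt()` yields embeddings shaped `(batch, seq_len, dim)` plus a mask that may be `None`. The helper name `ensure_attention_mask` and the `torch.long` mask dtype are assumptions for illustration, not the actual ai-toolkit code.

```python
import torch


def ensure_attention_mask(prompt_embeds, attention_mask):
    """Hypothetical helper: rebuild the all-ones mask when it was optimized away."""
    if attention_mask is None:
        # No padding was present, so every token position is valid.
        attention_mask = torch.ones(
            prompt_embeds.shape[:2],
            dtype=torch.long,  # assumption: integer mask
            device=prompt_embeds.device,
        )
    return attention_mask


embeds = torch.zeros(2, 5, 8)
rebuilt = ensure_attention_mask(embeds, None)

existing = torch.tensor([[1, 1, 0, 0, 0], [1, 1, 1, 1, 1]])
passed_through = ensure_attention_mask(embeds, existing)
```

Downstream code can then call `.to()` on the mask unconditionally, whichever diffusers version produced it.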
Fixes #740
* Initial support for ltx 2.3. Still needs a lot of testing to make sure it is all right.
* bump version
* Handle lora renaming keys for new ltx 2.3 layers
* WIP, adding support for LTX2
* Training on images working
* Fix loading comfy models
* Handle converting and deconverting lora so it matches original format
* Reworked ui to handle ltx and proper dataset default overwriting.
* Update the way lokr saves so it is more compatible with comfy
* Audio loading and synchronization/resampling is working
* Add audio to training. Does it work? Maybe, still testing.
* Fixed fps default issue for sound
* Have ui set fps for accurate audio mapping on ltx
* Added audio processing options to the ui for ltx
* Clean up requirements