ai-toolkit/extensions_built_in/diffusion_models/flux2
M. Hofer f213e3b1e5 Fix FLUX2 Klein load-time VRAM spikes on low-memory GPUs. (#726)
In low-VRAM mode, keep the transformer and the Qwen text encoder off CUDA during the initial load and quantization, so startup does not OOM by materializing the full model on the GPU before offloading and quantization can take effect.
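The pattern the commit describes can be sketched roughly as follows. This is a minimal illustration, not the actual ai-toolkit code: the model, the `load_and_quantize` helper, and the `low_vram` flag are all hypothetical stand-ins for the FLUX2 transformer / Qwen text-encoder loading path.

```python
import torch
import torch.nn as nn

def load_and_quantize(low_vram: bool = True) -> nn.Module:
    # Stand-in for the FLUX2 transformer or Qwen text encoder (hypothetical).
    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

    # Keep the full-precision weights on CPU while loading/quantizing so they
    # never hit CUDA all at once -- this avoids the load-time VRAM spike.
    load_device = "cpu" if (low_vram or not torch.cuda.is_available()) else "cuda"
    model.to(load_device)

    # Dynamic int8 quantization of the Linear layers runs on CPU.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Only after quantization would offloading / partial moves to CUDA occur.
    return quantized

model = load_and_quantize(low_vram=True)
out = model(torch.randn(2, 64))  # forward pass works on CPU
```

The key ordering is that quantization finishes before anything touches CUDA, so peak GPU memory is bounded by the quantized (and offloaded) footprint rather than the full-precision model.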

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jaret Burkett <jaretburkett@gmail.com>
2026-04-01 09:36:55 -06:00