mirror of
https://github.com/ostris/ai-toolkit.git
synced 2026-05-12 00:42:07 +00:00
Keep the transformer and Qwen text encoder off CUDA during initial load and quantization in low-VRAM mode, so that model startup does not hit a full-model OOM before offloading and quantization can take effect.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jaret Burkett <jaretburkett@gmail.com>
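
The placement rule described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: `LoadPlan`, `plan_model_load`, and the `low_vram` / `cuda_available` parameters are hypothetical names, and the assumption that the quantized module ends up computing on CUDA is a guess about the surrounding offloading logic.

```python
from dataclasses import dataclass


@dataclass
class LoadPlan:
    """Hypothetical record of where a large module lives at each startup stage."""
    load_device: str      # where the weights are first materialized
    quantize_device: str  # where quantization runs
    compute_device: str   # where the module is used once startup finishes


def plan_model_load(low_vram: bool, cuda_available: bool = True) -> LoadPlan:
    """Hypothetical helper: choose device placement for the transformer or text encoder."""
    if not cuda_available:
        # Nothing to protect: everything stays on the CPU.
        return LoadPlan("cpu", "cpu", "cpu")
    if low_vram:
        # Keep the full-precision weights off the GPU during load and
        # quantization; only the smaller quantized module is moved (or
        # offloaded on demand) afterwards. Assumption: compute is on CUDA.
        return LoadPlan("cpu", "cpu", "cuda")
    # Enough VRAM: load and quantize directly on the GPU.
    return LoadPlan("cuda", "cuda", "cuda")


if __name__ == "__main__":
    print(plan_model_load(low_vram=True))
    print(plan_model_load(low_vram=False))
```

The point of the low-VRAM branch is ordering: the unquantized weights never occupy GPU memory, so peak VRAM use is bounded by the quantized or offloaded model rather than by the full-precision one.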