* Base ACE-Step 1.5 XL added. Generation works; training and UI are still WIP
* Base training code done
* Fix some issues with caching text embeddings. Update sample cards to show audio
* Fix issue with quantizing ACE-Step
* Add album artwork to samples with waveform.
* Cleanup logs
* Add album art endpoint to speed up album art loading
* Made a make-video-with-artwork script
* Make UI handle basic audio models. Make multi-line adjustments to the editor and improve syntax highlighting.
* Add prompt tagging system for special tagged models.
* Prompt tagging processing for the UI working.
* Moved default samples to a special file so we can add more when needed and they can be adjusted for a specific model
* Add a captioner job with a music captioner, prepped for use with the UI
* Add basic UI setup for the captioning modal and handling captioning jobs
* Starting a captioning job from the UI is working. Still needs better management.
* Better filtering of job options in the job view for captioning jobs
* Added Qwen3-VL as a captioner for images
* Have an indicator when a dataset is being captioned.
* Adjust the way caption jobs look in the queue
* Fix a few issues. Adjust defaults.
* Version bump
* Added ACE-Step to the README.
Keep the transformer and Qwen text encoder off CUDA during initial load/quantization in low-VRAM mode, so model startup avoids a full-model OOM before offloading and quantization can take effect.
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jaret Burkett <jaretburkett@gmail.com>
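The low-VRAM startup path above can be sketched roughly as follows. This is a minimal illustration, not the actual ace-step loader: `build_model` and `quantize_fn` are hypothetical stand-ins for the real transformer/text-encoder constructors and quantizer.

```python
import torch
import torch.nn as nn

def load_low_vram(build_model, quantize_fn, device="cuda"):
    """Build and quantize on CPU first, then move to the target device.

    `build_model` and `quantize_fn` are illustrative callables; the point is
    the ordering: the full-precision weights never sit on CUDA all at once.
    """
    # 1. Materialize weights on CPU so the unquantized model never
    #    occupies GPU memory in full.
    model = build_model().to("cpu")

    # 2. Quantize while still on CPU; the quantized weights are much
    #    smaller than the originals.
    model = quantize_fn(model)

    # 3. Only now move the (smaller) quantized model to the target device.
    return model.to(device)

# Tiny usage example with a stand-in "quantizer" that just halves precision.
model = load_low_vram(
    build_model=lambda: nn.Linear(8, 8),
    quantize_fn=lambda m: m.half(),
    device="cpu",  # would be "cuda" in the real low-VRAM path
)
print(next(model.parameters()).dtype)  # torch.float16
```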
* Made an install script that auto-updates the env for Mac
* GPU sensors and initial training working for Mac. Still WIP.
* Switch dataloader to single-threaded until I can work around some Mac pickling issues.
* Get quantization working on Mac
* Fix Mac-exclusive imports so they don't break other builds.
* Add Mac instructions to the UI
Changes to the EMA Decay input aren't preserved when switching back and forth between the Advanced and Simple views. I believe the onChange handler is not writing the value correctly here.
* Add 1328 native resolution for Qwen Image training
Qwen-Image and Qwen-Image-2512 have a native 1:1 resolution of 1328x1328
as documented in the official model card's aspect ratio table. Adding it
to the resolution buckets and UI allows training at the model's native
resolution for improved quality.
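Resolution buckets like the 1328 one added above are typically matched against an image's size; a hedged sketch of how a native 1328x1328 image would land in that bucket (the bucket list and helper are illustrative, not the actual trainer code):

```python
# Illustrative square bucket sizes; 1328 is Qwen-Image's native 1:1 resolution
# per the official model card's aspect ratio table.
BUCKETS = [512, 768, 1024, 1328]

def nearest_bucket(width, height):
    """Pick the bucket size closest to the image's longer side."""
    target = max(width, height)
    return min(BUCKETS, key=lambda b: abs(b - target))

print(nearest_bucket(1328, 1328))  # 1328 (native resolution maps exactly)
print(nearest_bucket(1200, 900))   # 1328 (closer to 1328 than to 1024)
```

Without the 1328 entry, a native-resolution image would be forced down into the 1024 bucket, which is why adding it lets training run at the model's native size.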
* Revert example config change (24GB OOM at 1328)
When training Klein models with a `control_path` (edit/kontext-style
paired datasets), `encode_image_refs()` returns tensors that reside on
the VAE's device (CPU, since the VAE weights are loaded via
`load_file(..., device="cpu")` and are never explicitly moved to the
training device). Concatenating those CPU tensors with the training
latents (`packed_latents`) that live on CUDA raises:
RuntimeError: Expected all tensors to be on the same device
Fix: move `img_cond_seq` and `img_cond_seq_ids` to the same device
(and dtype) as `img_input` / `img_input_ids` before concatenation.
Co-authored-by: HuangYuChuh <HuangYuChuh@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
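The fix described above amounts to aligning the conditioning tensors with the training latents before concatenation. A minimal sketch (tensor names follow the commit message; shapes are illustrative, and CPU stands in for the CUDA training device so the sketch runs anywhere):

```python
import torch

# Training latents live on the training device (CUDA in the real run).
img_input = torch.randn(1, 16, 64, dtype=torch.bfloat16)
img_input_ids = torch.zeros(1, 16, 3)

# encode_image_refs() returned tensors on the VAE's device (CPU, float32),
# which mismatched the CUDA training latents and raised the RuntimeError.
img_cond_seq = torch.randn(1, 16, 64, dtype=torch.float32)
img_cond_seq_ids = torch.zeros(1, 16, 3)

# Fix: move the conditioning tensors to the same device (and dtype) as the
# inputs before concatenating along the sequence dimension.
img_cond_seq = img_cond_seq.to(device=img_input.device, dtype=img_input.dtype)
img_cond_seq_ids = img_cond_seq_ids.to(
    device=img_input_ids.device, dtype=img_input_ids.dtype
)

packed_latents = torch.cat([img_input, img_cond_seq], dim=1)
packed_ids = torch.cat([img_input_ids, img_cond_seq_ids], dim=1)
print(packed_latents.shape, packed_latents.dtype)
# torch.Size([1, 32, 64]) torch.bfloat16
```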