Modal cloud training support, fixed typo in toolkit/scheduler.py, Schnell training support for Colab, issue #92, issue #114 (#115)

* issue #76, load_checkpoint_and_dispatch() 'force_hooks'

https://github.com/ostris/ai-toolkit/issues/76

* RunPod cloud config

https://github.com/ostris/ai-toolkit/issues/90

* change 2x A40 to 1x A40 and update the price per hour

referring to https://github.com/ostris/ai-toolkit/issues/90#issuecomment-2294894929

* include missed FLUX.1-schnell setup guide in last commit

* huggingface-cli login required for auth

* #92 peft, #114 colab, schnell training in colab

* modal cloud - run_modal.py and .yaml configs

* run_modal.py mount path example

* modal_examples renamed to modal

* Training in Modal README.md setup guide

* rename run command in title for consistency
martintomov
2024-08-23 06:25:44 +03:00
committed by GitHub
parent 4d35a29c97
commit 34db804c76
8 changed files with 817 additions and 89 deletions

View File

@@ -117,7 +117,7 @@ Please do not open a bug report unless it is a bug in the code. You are welcome
 and ask for help there. However, please refrain from PMing me directly with general question or support. Ask in the discord
 and I will answer when I can.
-### Training in RunPod cloud
+## Training in RunPod
 Example RunPod template: **runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04**
 > You need a minimum of 24GB VRAM, pick a GPU by your preference.
@@ -142,26 +142,72 @@ pip install -r requirements.txt
 pip install --upgrade accelerate transformers diffusers huggingface_hub #Optional, run it if you run into issues
 ```
 ### 2. Upload your dataset
-- Create a new folder in the root, name it `dataset` or whatever you like
-- Drag and drop your .jpg and .txt files inside the newly created dataset folder
+- Create a new folder in the root, name it `dataset` or whatever you like.
+- Drag and drop your .jpg, .jpeg, or .png images and .txt files inside the newly created dataset folder.
 ### 3. Login into Hugging Face with an Access Token
-- Get a READ token from [here](https://huggingface.co/settings/tokens)
-- Run ```huggingface-cli login``` and paste your token
+- Get a READ token from [here](https://huggingface.co/settings/tokens) and request access to Flux.1-dev model from [here](https://huggingface.co/black-forest-labs/FLUX.1-dev).
+- Run ```huggingface-cli login``` and paste your token.
 ### 4. Training
-- Copy an example config file located at ```config/examples``` to the config folder and rename it to ```whatever_you_want.yml```
-- Edit the config following the comments in the file
-- Change ```folder_path: "/path/to/images/folder"``` to your dataset path like ```folder_path: "/workspace/ai-toolkit/your-dataset"```
-- Run the file: ```python run.py config/whatever_you_want.yml```
+- Copy an example config file located at ```config/examples``` to the config folder and rename it to ```whatever_you_want.yml```.
+- Edit the config following the comments in the file.
+- Change ```folder_path: "/path/to/images/folder"``` to your dataset path like ```folder_path: "/workspace/ai-toolkit/your-dataset"```.
+- Run the file: ```python run.py config/whatever_you_want.yml```.
 ### Screenshot from RunPod
 <img width="1728" alt="RunPod Training Screenshot" src="https://github.com/user-attachments/assets/53a1b8ef-92fa-4481-81a7-bde45a14a7b5">
-<!---
-### Training in the cloud
-Coming very soon. Getting base out then will have a notebook that makes all that work.
--->
+## Training in Modal
+### 1. Setup
+#### ai-toolkit:
```
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
python -m venv venv
source venv/bin/activate
pip install torch
pip install -r requirements.txt
pip install --upgrade accelerate transformers diffusers huggingface_hub #Optional, run it if you run into issues
```
#### Modal:
- Run `pip install modal` to install the modal Python package.
- Run `modal setup` to authenticate (if this doesn't work, try `python -m modal setup`).
#### Hugging Face:
- Get a READ token from [here](https://huggingface.co/settings/tokens) and request access to Flux.1-dev model from [here](https://huggingface.co/black-forest-labs/FLUX.1-dev).
- Run `huggingface-cli login` and paste your token.
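If you prefer a non-interactive login (for example when you script the setup), a minimal sketch using `huggingface_hub` is shown below; it assumes the READ token is exported as `HF_TOKEN`, which is only a convention used here and in the Colab notebooks, not a requirement of the toolkit.
```
# Hedged sketch: non-interactive alternative to `huggingface-cli login`.
# Assumes the READ token is exported as HF_TOKEN before running this.
import os
from huggingface_hub import login, whoami

token = os.environ.get("HF_TOKEN")
if token:
    login(token=token)  # stores the token in the local Hugging Face cache
    print("Logged in as:", whoami()["name"])
else:
    print("HF_TOKEN is not set; run `huggingface-cli login` instead.")
```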
### 2. Upload your dataset
- Drag and drop your dataset folder containing the .jpg, .jpeg, or .png images and .txt files into `ai-toolkit`.
### 3. Configs
- Copy an example config file located at ```config/examples/modal``` to the `config` folder and rename it to ```whatever_you_want.yml```.
- Edit the config following the comments in the file, **<ins>be careful and follow the example `/root/ai-toolkit` paths</ins>**.
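Because the container only sees files under `/root/ai-toolkit`, a quick local sanity check of the paths in your config can save a failed run. The snippet below is a hypothetical helper, not part of the toolkit; it assumes `pyyaml` is installed and that your config follows the example layout in `config/examples/modal`.
```
# Hypothetical pre-flight check: every path in the config should live under
# /root/ai-toolkit, because that is where run_modal.py mounts your local copy.
import yaml

with open("config/whatever_you_want.yml") as f:
    cfg = yaml.safe_load(f)

process = cfg["config"]["process"][0]
paths = [process["training_folder"]] + [d["folder_path"] for d in process["datasets"]]
for p in paths:
    assert p.startswith("/root/ai-toolkit"), f"{p} will not exist inside the Modal container"
print("All paths point under /root/ai-toolkit.")
```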
### 4. Edit run_modal.py
- Set your entire local `ai-toolkit` path at `code_mount = modal.Mount.from_local_dir` like:
```
code_mount = modal.Mount.from_local_dir("/Users/username/ai-toolkit", remote_path="/root/ai-toolkit")
```
- Choose a `GPU` and `Timeout` in `@app.function` _(default is A100 40GB and 2 hour timeout)_.
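For reference, the relevant decorator in `run_modal.py` (the full file appears later in this commit) boils down to the sketch below; the alternative GPU strings are examples from Modal's GPU guide, so check current availability and pricing before relying on them.
```
# Sketch of the @app.function settings you would edit; not a complete script.
import modal

app = modal.App("flux-lora-training")

@app.function(
    gpu="A100",    # e.g. "H100" or "A10G" are other Modal GPU options
    timeout=7200,  # seconds; 2 hours by default, raise it for longer runs
)
def main(config_file_list_str: str, recover: bool = False, name: str = None):
    ...
```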
### 5. Training
- Run the config file in your terminal: `modal run run_modal.py --config-file-list-str=/root/ai-toolkit/config/whatever_you_want.yml`.
- You can monitor your training in your local terminal, or on [modal.com](https://modal.com/).
- Models, samples, and optimizer state will be stored in `Storage > flux-lora-models`.
### 6. Saving the model
- Check contents of the volume by running `modal volume ls flux-lora-models`.
- Download the content by running `modal volume get flux-lora-models your-model-name`.
- Example: `modal volume get flux-lora-models my_first_flux_lora_v1`.
### Screenshot from Modal
<img width="1728" alt="Modal Training Screenshot" src="https://github.com/user-attachments/assets/7497eb38-0090-49d6-8ad9-9c8ea7b5388b">
---

View File

@@ -0,0 +1,96 @@
---
job: extension
config:
# this name will be the folder and filename name
name: "my_first_flux_lora_v1"
process:
- type: 'sd_trainer'
# root folder to save training sessions/samples/weights
training_folder: "/root/ai-toolkit/modal_output" # must match MOUNT_DIR from run_modal.py
# uncomment to see performance stats in the terminal every N steps
# performance_log_every: 1000
device: cuda:0
# if a trigger word is specified, it will be added to captions of training data if it does not already exist
# alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
# trigger_word: "p3r5on"
network:
type: "lora"
linear: 16
linear_alpha: 16
save:
dtype: float16 # precision to save
save_every: 250 # save every this many steps
max_step_saves_to_keep: 4 # how many intermittent saves to keep
datasets:
# datasets are a folder of images. captions need to be txt files with the same name as the image
# for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
# images will automatically be resized and bucketed into the resolution specified
# on windows, escape back slashes with another backslash so
# "C:\\path\\to\\images\\folder"
# your dataset must be placed in /ai-toolkit and /root is for modal to find the dir:
- folder_path: "/root/ai-toolkit/your-dataset"
caption_ext: "txt"
caption_dropout_rate: 0.05 # will drop out the caption 5% of the time
shuffle_tokens: false # shuffle caption order, split by commas
cache_latents_to_disk: true # leave this true unless you know what you're doing
resolution: [ 512, 768, 1024 ] # flux enjoys multiple resolutions
train:
batch_size: 1
steps: 2000 # total number of steps to train 500 - 4000 is a good range
gradient_accumulation_steps: 1
train_unet: true
train_text_encoder: false # probably won't work with flux
gradient_checkpointing: true # need this on unless you have a ton of vram
noise_scheduler: "flowmatch" # for training only
optimizer: "adamw8bit"
lr: 1e-4
# uncomment this to skip the pre training sample
# skip_first_sample: true
# uncomment to completely disable sampling
# disable_sampling: true
# uncomment to use new bell curved weighting. Experimental but may produce better results
# linear_timesteps: true
# ema will smooth out learning, but could slow it down. Recommended to leave on.
ema_config:
use_ema: true
ema_decay: 0.99
# will probably need this if gpu supports it for flux, other dtypes may not work correctly
dtype: bf16
model:
# huggingface model name or path
# if you get an error, or get stuck while downloading,
# check https://github.com/ostris/ai-toolkit/issues/84, download the model locally and
# place it like "/root/ai-toolkit/FLUX.1-dev"
name_or_path: "black-forest-labs/FLUX.1-dev"
is_flux: true
quantize: true # run 8bit mixed precision
# low_vram: true # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
sample:
sampler: "flowmatch" # must match train.noise_scheduler
sample_every: 250 # sample every this many steps
width: 1024
height: 1024
prompts:
# you can add [trigger] to the prompts here and it will be replaced with the trigger word
# - "[trigger] holding a sign that says 'I LOVE PROMPTS!'"\
- "woman with red hair, playing chess at the park, bomb going off in the background"
- "a woman holding a coffee cup, in a beanie, sitting at a cafe"
- "a horse is a DJ at a night club, fish eye lens, smoke machine, lazer lights, holding a martini"
- "a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background"
- "a bear building a log cabin in the snow covered mountains"
- "woman playing the guitar, on stage, singing a song, laser lights, punk rocker"
- "hipster man with a beard, building a chair, in a wood shop"
- "photo of a man, white background, medium shot, modeling clothing, studio lighting, white backdrop"
- "a man holding a sign that says, 'this is a sign'"
- "a bulldog, in a post apocalyptic world, with a shotgun, in a leather jacket, in a desert, with a motorcycle"
neg: "" # not used on flux
seed: 42
walk_seed: true
guidance_scale: 4
sample_steps: 20
# you can add any additional meta info here. [name] is replaced with config name at top
meta:
name: "[name]"
version: '1.0'
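The `datasets` section above expects one `.txt` caption per image. A small, hypothetical pre-flight check along these lines (not part of the toolkit) can catch missing caption files before you pay for GPU time:
```
# Hypothetical dataset check: every jpg/jpeg/png should have a same-named .txt caption.
from pathlib import Path

dataset = Path("your-dataset")  # the folder referenced by folder_path above
images = [p for p in dataset.iterdir() if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]
missing = [p.name for p in images if not p.with_suffix(".txt").exists()]

print(f"{len(images)} images found")
if missing:
    print("Images without captions:", ", ".join(missing))
else:
    print("Every image has a caption file.")
```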

View File

@@ -0,0 +1,98 @@
---
job: extension
config:
# this name will be the folder and filename name
name: "my_first_flux_lora_v1"
process:
- type: 'sd_trainer'
# root folder to save training sessions/samples/weights
training_folder: "/root/ai-toolkit/modal_output" # must match MOUNT_DIR from run_modal.py
# uncomment to see performance stats in the terminal every N steps
# performance_log_every: 1000
device: cuda:0
# if a trigger word is specified, it will be added to captions of training data if it does not already exist
# alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
# trigger_word: "p3r5on"
network:
type: "lora"
linear: 16
linear_alpha: 16
save:
dtype: float16 # precision to save
save_every: 250 # save every this many steps
max_step_saves_to_keep: 4 # how many intermittent saves to keep
datasets:
# datasets are a folder of images. captions need to be txt files with the same name as the image
# for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
# images will automatically be resized and bucketed into the resolution specified
# on windows, escape back slashes with another backslash so
# "C:\\path\\to\\images\\folder"
# your dataset must be placed in /ai-toolkit and /root is for modal to find the dir:
- folder_path: "/root/ai-toolkit/your-dataset"
caption_ext: "txt"
caption_dropout_rate: 0.05 # will drop out the caption 5% of the time
shuffle_tokens: false # shuffle caption order, split by commas
cache_latents_to_disk: true # leave this true unless you know what you're doing
resolution: [ 512, 768, 1024 ] # flux enjoys multiple resolutions
train:
batch_size: 1
steps: 2000 # total number of steps to train 500 - 4000 is a good range
gradient_accumulation_steps: 1
train_unet: true
train_text_encoder: false # probably won't work with flux
gradient_checkpointing: true # need this on unless you have a ton of vram
noise_scheduler: "flowmatch" # for training only
optimizer: "adamw8bit"
lr: 1e-4
# uncomment this to skip the pre training sample
# skip_first_sample: true
# uncomment to completely disable sampling
# disable_sampling: true
# uncomment to use new bell curved weighting. Experimental but may produce better results
# linear_timesteps: true
# ema will smooth out learning, but could slow it down. Recommended to leave on.
ema_config:
use_ema: true
ema_decay: 0.99
# will probably need this if gpu supports it for flux, other dtypes may not work correctly
dtype: bf16
model:
# huggingface model name or path
# if you get an error, or get stuck while downloading,
# check https://github.com/ostris/ai-toolkit/issues/84, download the models locally and
# place them like "/root/ai-toolkit/FLUX.1-schnell" and "/root/ai-toolkit/FLUX.1-schnell-training-adapter"
name_or_path: "black-forest-labs/FLUX.1-schnell"
assistant_lora_path: "ostris/FLUX.1-schnell-training-adapter" # Required for flux schnell training
is_flux: true
quantize: true # run 8bit mixed precision
# low_vram is painfully slow to fuse in the adapter; avoid it unless absolutely necessary
# low_vram: true # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
sample:
sampler: "flowmatch" # must match train.noise_scheduler
sample_every: 250 # sample every this many steps
width: 1024
height: 1024
prompts:
# you can add [trigger] to the prompts here and it will be replaced with the trigger word
# - "[trigger] holding a sign that says 'I LOVE PROMPTS!'"\
- "woman with red hair, playing chess at the park, bomb going off in the background"
- "a woman holding a coffee cup, in a beanie, sitting at a cafe"
- "a horse is a DJ at a night club, fish eye lens, smoke machine, lazer lights, holding a martini"
- "a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background"
- "a bear building a log cabin in the snow covered mountains"
- "woman playing the guitar, on stage, singing a song, laser lights, punk rocker"
- "hipster man with a beard, building a chair, in a wood shop"
- "photo of a man, white background, medium shot, modeling clothing, studio lighting, white backdrop"
- "a man holding a sign that says, 'this is a sign'"
- "a bulldog, in a post apocalyptic world, with a shotgun, in a leather jacket, in a desert, with a motorcycle"
neg: "" # not used on flux
seed: 42
walk_seed: true
guidance_scale: 1 # schnell does not do guidance
sample_steps: 4 # 1 - 4 works well
# you can add any additional meta info here. [name] is replaced with config name at top
meta:
name: "[name]"
version: '1.0'
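Once training finishes and you have downloaded the LoRA from the `flux-lora-models` volume, inference with `diffusers` might look roughly like the sketch below. The file name is illustrative, and whether `load_lora_weights` accepts the saved format can depend on your `diffusers` version, so treat this as a starting point rather than a guarantee.
```
# Hedged sketch: load a trained LoRA on top of FLUX.1-schnell with diffusers.
# The LoRA filename below is illustrative; use whatever checkpoint you downloaded.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("my_first_flux_lora_v1.safetensors")
pipe.enable_model_cpu_offload()  # helps on smaller GPUs

image = pipe(
    "woman with red hair, playing chess at the park",
    num_inference_steps=4,   # schnell is a few-step model
    guidance_scale=0.0,      # schnell does not use guidance
    height=1024,
    width=1024,
).images[0]
image.save("sample.png")
```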

View File

@@ -1,53 +1,45 @@
 {
-"nbformat": 4,
-"nbformat_minor": 0,
-"metadata": {
-"colab": {
-"provenance": [],
-"machine_shape": "hm",
-"gpuType": "A100"
-},
-"kernelspec": {
-"name": "python3",
-"display_name": "Python 3"
-},
-"language_info": {
-"name": "python"
-},
-"accelerator": "GPU"
-},
 "cells": [
 {
 "cell_type": "markdown",
-"source": [
-"# AI Toolkit by Ostris\n",
-"## FLUX.1 Training\n"
-],
 "metadata": {
 "collapsed": false,
 "id": "zl-S0m3pkQC5"
-}
+},
+"source": [
+"# AI Toolkit by Ostris\n",
+"## FLUX.1-dev Training\n"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
 "source": [
-"!git clone https://github.com/ostris/ai-toolkit\n",
-"!mkdir -p /content/dataset"
-],
+"!nvidia-smi"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
 "metadata": {
 "id": "BvAG0GKAh59G"
 },
-"execution_count": null,
-"outputs": []
+"outputs": [],
+"source": [
+"!git clone https://github.com/ostris/ai-toolkit\n",
+"!mkdir -p /content/dataset"
+]
 },
 {
 "cell_type": "markdown",
-"source": [
-"Put your image dataset in the `/content/dataset` folder"
-],
 "metadata": {
 "id": "UFUW4ZMmnp1V"
-}
+},
+"source": [
+"Put your image dataset in the `/content/dataset` folder"
+]
 },
 {
 "cell_type": "code",
@@ -62,6 +54,9 @@
 },
 {
 "cell_type": "markdown",
+"metadata": {
+"id": "OV0HnOI6o8V6"
+},
 "source": [
 "## Model License\n",
 "Training currently only works with FLUX.1-dev. Which means anything you train will inherit the non-commercial license. It is also a gated model, so you need to accept the license on HF before using it. Otherwise, this will fail. Here are the required steps to setup a license.\n",
@@ -69,13 +64,15 @@
 "Sign into HF and accept the model access here [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)\n",
 "\n",
 "[Get a READ key from huggingface](https://huggingface.co/settings/tokens/new?) and place it in the next cell after running it."
-],
-"metadata": {
-"id": "OV0HnOI6o8V6"
-}
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "3yZZdhFRoj2m"
+},
+"outputs": [],
 "source": [
 "import getpass\n",
 "import os\n",
@@ -87,15 +84,15 @@
 "os.environ['HF_TOKEN'] = hf_token\n",
 "\n",
 "print(\"HF_TOKEN environment variable has been set.\")"
-],
-"metadata": {
-"id": "3yZZdhFRoj2m"
-},
-"execution_count": null,
-"outputs": []
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "9gO2EzQ1kQC8"
+},
+"outputs": [],
 "source": [
 "import os\n",
 "import sys\n",
@@ -105,26 +102,26 @@
 "from PIL import Image\n",
 "import os\n",
 "os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\""
-],
-"metadata": {
-"id": "9gO2EzQ1kQC8"
-},
-"outputs": [],
-"execution_count": null
+]
 },
 {
 "cell_type": "markdown",
+"metadata": {
+"id": "N8UUFzVRigbC"
+},
 "source": [
 "## Setup\n",
 "\n",
 "This is your config. It is documented pretty well. Normally you would do this as a yaml file, but for colab, this will work. This will run as is without modification, but feel free to edit as you want."
-],
-"metadata": {
-"id": "N8UUFzVRigbC"
-}
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "_t28QURYjRQO"
+},
+"outputs": [],
 "source": [
 "from collections import OrderedDict\n",
 "\n",
@@ -169,7 +166,7 @@
 " ]),\n",
 " ('train', OrderedDict([\n",
 " ('batch_size', 1),\n",
-" ('steps', 4000), # total number of steps to train 500 - 4000 is a good range\n",
+" ('steps', 2000), # total number of steps to train 500 - 4000 is a good range\n",
 " ('gradient_accumulation_steps', 1),\n",
 " ('train_unet', True),\n",
 " ('train_text_encoder', False), # probably won't work with flux\n",
@@ -177,9 +174,16 @@
 " ('gradient_checkpointing', True), # need the on unless you have a ton of vram\n",
 " ('noise_scheduler', 'flowmatch'), # for training only\n",
 " ('optimizer', 'adamw8bit'),\n",
-" ('lr', 4e-4),\n",
+" ('lr', 1e-4),\n",
+"\n",
 " # uncomment this to skip the pre training sample\n",
-" #('skip_first_sample', True),\n",
+" # ('skip_first_sample', True),\n",
+"\n",
+" # uncomment to completely disable sampling\n",
+" # ('disable_sampling', True),\n",
+"\n",
+" # uncomment to use new vell curved weighting. Experimental but may produce better results\n",
+" # ('linear_timesteps', True),\n",
 "\n",
 " # ema will smooth out learning, but could slow it down. Recommended to leave on.\n",
 " ('ema_config', OrderedDict([\n",
@@ -231,45 +235,57 @@
 " ('version', '1.0')\n",
 " ]))\n",
 "])\n"
-],
-"metadata": {
-"id": "_t28QURYjRQO"
-},
-"execution_count": null,
-"outputs": []
+]
 },
 {
 "cell_type": "markdown",
+"metadata": {
+"id": "h6F1FlM2Wb3l"
+},
 "source": [
 "## Run it\n",
 "\n",
 "Below does all the magic. Check your folders to the left. Items will be in output/LoRA/your_name_v1 In the samples folder, there are preiodic sampled. This doesnt work great with colab. They will be in /content/output"
-],
-"metadata": {
-"id": "h6F1FlM2Wb3l"
-}
+]
 },
 {
 "cell_type": "code",
-"source": [
-"run_job(job_to_run)\n"
-],
+"execution_count": null,
 "metadata": {
 "id": "HkajwI8gteOh"
 },
-"execution_count": null,
-"outputs": []
+"outputs": [],
+"source": [
+"run_job(job_to_run)\n"
+]
 },
 {
 "cell_type": "markdown",
+"metadata": {
+"id": "Hblgb5uwW5SD"
+},
 "source": [
 "## Done\n",
 "\n",
 "Check your ourput dir and get your slider\n"
+]
+}
 ],
 "metadata": {
-"id": "Hblgb5uwW5SD"
+"accelerator": "GPU",
+"colab": {
+"gpuType": "A100",
+"machine_shape": "hm",
+"provenance": []
+},
+"kernelspec": {
+"display_name": "Python 3",
+"name": "python3"
+},
+"language_info": {
+"name": "python"
 }
-}
-],
+},
+"nbformat": 4,
+"nbformat_minor": 0
 }

View File

@@ -0,0 +1,296 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"id": "zl-S0m3pkQC5"
},
"source": [
"# AI Toolkit by Ostris\n",
"## FLUX.1-schnell Training\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3cokMT-WC6rG"
},
"outputs": [],
"source": [
"!nvidia-smi"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"id": "BvAG0GKAh59G"
},
"outputs": [],
"source": [
"!git clone https://github.com/ostris/ai-toolkit\n",
"!mkdir -p /content/dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UFUW4ZMmnp1V"
},
"source": [
"Put your image dataset in the `/content/dataset` folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"id": "XGZqVER_aQJW"
},
"outputs": [],
"source": [
"!cd ai-toolkit && git submodule update --init --recursive && pip install -r requirements.txt\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OV0HnOI6o8V6"
},
"source": [
"## Model License\n",
"Training currently only works with FLUX.1-dev. Which means anything you train will inherit the non-commercial license. It is also a gated model, so you need to accept the license on HF before using it. Otherwise, this will fail. Here are the required steps to setup a license.\n",
"\n",
"Sign into HF and accept the model access here [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)\n",
"\n",
"[Get a READ key from huggingface](https://huggingface.co/settings/tokens/new?) and place it in the next cell after running it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3yZZdhFRoj2m"
},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"# Prompt for the token\n",
"hf_token = getpass.getpass('Enter your HF access token and press enter: ')\n",
"\n",
"# Set the environment variable\n",
"os.environ['HF_TOKEN'] = hf_token\n",
"\n",
"print(\"HF_TOKEN environment variable has been set.\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "9gO2EzQ1kQC8"
},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"sys.path.append('/content/ai-toolkit')\n",
"from toolkit.job import run_job\n",
"from collections import OrderedDict\n",
"from PIL import Image\n",
"import os\n",
"os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "N8UUFzVRigbC"
},
"source": [
"## Setup\n",
"\n",
"This is your config. It is documented pretty well. Normally you would do this as a yaml file, but for colab, this will work. This will run as is without modification, but feel free to edit as you want."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "_t28QURYjRQO"
},
"outputs": [],
"source": [
"from collections import OrderedDict\n",
"\n",
"job_to_run = OrderedDict([\n",
" ('job', 'extension'),\n",
" ('config', OrderedDict([\n",
" # this name will be the folder and filename name\n",
" ('name', 'my_first_flux_lora_v1'),\n",
" ('process', [\n",
" OrderedDict([\n",
" ('type', 'sd_trainer'),\n",
" # root folder to save training sessions/samples/weights\n",
" ('training_folder', '/content/output'),\n",
" # uncomment to see performance stats in the terminal every N steps\n",
" #('performance_log_every', 1000),\n",
" ('device', 'cuda:0'),\n",
" # if a trigger word is specified, it will be added to captions of training data if it does not already exist\n",
" # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word\n",
" # ('trigger_word', 'image'),\n",
" ('network', OrderedDict([\n",
" ('type', 'lora'),\n",
" ('linear', 16),\n",
" ('linear_alpha', 16)\n",
" ])),\n",
" ('save', OrderedDict([\n",
" ('dtype', 'float16'), # precision to save\n",
" ('save_every', 250), # save every this many steps\n",
" ('max_step_saves_to_keep', 4) # how many intermittent saves to keep\n",
" ])),\n",
" ('datasets', [\n",
" # datasets are a folder of images. captions need to be txt files with the same name as the image\n",
" # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently\n",
" # images will automatically be resized and bucketed into the resolution specified\n",
" OrderedDict([\n",
" ('folder_path', '/content/dataset'),\n",
" ('caption_ext', 'txt'),\n",
" ('caption_dropout_rate', 0.05), # will drop out the caption 5% of time\n",
" ('shuffle_tokens', False), # shuffle caption order, split by commas\n",
" ('cache_latents_to_disk', True), # leave this true unless you know what you're doing\n",
" ('resolution', [512, 768, 1024]) # flux enjoys multiple resolutions\n",
" ])\n",
" ]),\n",
" ('train', OrderedDict([\n",
" ('batch_size', 1),\n",
" ('steps', 2000), # total number of steps to train 500 - 4000 is a good range\n",
" ('gradient_accumulation_steps', 1),\n",
" ('train_unet', True),\n",
" ('train_text_encoder', False), # probably won't work with flux\n",
" ('gradient_checkpointing', True), # need the on unless you have a ton of vram\n",
" ('noise_scheduler', 'flowmatch'), # for training only\n",
" ('optimizer', 'adamw8bit'),\n",
" ('lr', 1e-4),\n",
"\n",
" # uncomment this to skip the pre training sample\n",
" # ('skip_first_sample', True),\n",
"\n",
" # uncomment to completely disable sampling\n",
" # ('disable_sampling', True),\n",
"\n",
" # uncomment to use new vell curved weighting. Experimental but may produce better results\n",
" # ('linear_timesteps', True),\n",
"\n",
" # ema will smooth out learning, but could slow it down. Recommended to leave on.\n",
" ('ema_config', OrderedDict([\n",
" ('use_ema', True),\n",
" ('ema_decay', 0.99)\n",
" ])),\n",
"\n",
" # will probably need this if gpu supports it for flux, other dtypes may not work correctly\n",
" ('dtype', 'bf16')\n",
" ])),\n",
" ('model', OrderedDict([\n",
" # huggingface model name or path\n",
" ('name_or_path', 'black-forest-labs/FLUX.1-schnell'),\n",
" ('assistant_lora_path', 'ostris/FLUX.1-schnell-training-adapter'), # Required for flux schnell training\n",
" ('is_flux', True),\n",
" ('quantize', True), # run 8bit mixed precision\n",
" # low_vram is painfully slow to fuse in the adapter avoid it unless absolutely necessary\n",
" #('low_vram', True), # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.\n",
" ])),\n",
" ('sample', OrderedDict([\n",
" ('sampler', 'flowmatch'), # must match train.noise_scheduler\n",
" ('sample_every', 250), # sample every this many steps\n",
" ('width', 1024),\n",
" ('height', 1024),\n",
" ('prompts', [\n",
" # you can add [trigger] to the prompts here and it will be replaced with the trigger word\n",
" #'[trigger] holding a sign that says \\'I LOVE PROMPTS!\\'',\n",
" 'woman with red hair, playing chess at the park, bomb going off in the background',\n",
" 'a woman holding a coffee cup, in a beanie, sitting at a cafe',\n",
" 'a horse is a DJ at a night club, fish eye lens, smoke machine, lazer lights, holding a martini',\n",
" 'a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background',\n",
" 'a bear building a log cabin in the snow covered mountains',\n",
" 'woman playing the guitar, on stage, singing a song, laser lights, punk rocker',\n",
" 'hipster man with a beard, building a chair, in a wood shop',\n",
" 'photo of a man, white background, medium shot, modeling clothing, studio lighting, white backdrop',\n",
" 'a man holding a sign that says, \\'this is a sign\\'',\n",
" 'a bulldog, in a post apocalyptic world, with a shotgun, in a leather jacket, in a desert, with a motorcycle'\n",
" ]),\n",
" ('neg', ''), # not used on flux\n",
" ('seed', 42),\n",
" ('walk_seed', True),\n",
" ('guidance_scale', 1), # schnell does not do guidance\n",
" ('sample_steps', 4) # 1 - 4 works well\n",
" ]))\n",
" ])\n",
" ])\n",
" ])),\n",
" # you can add any additional meta info here. [name] is replaced with config name at top\n",
" ('meta', OrderedDict([\n",
" ('name', '[name]'),\n",
" ('version', '1.0')\n",
" ]))\n",
"])\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h6F1FlM2Wb3l"
},
"source": [
"## Run it\n",
"\n",
"Below does all the magic. Check your folders to the left. Items will be in output/LoRA/your_name_v1 In the samples folder, there are preiodic sampled. This doesnt work great with colab. They will be in /content/output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HkajwI8gteOh"
},
"outputs": [],
"source": [
"run_job(job_to_run)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Hblgb5uwW5SD"
},
"source": [
"## Done\n",
"\n",
"Check your ourput dir and get your slider\n"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "A100",
"machine_shape": "hm",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -29,3 +29,4 @@ pytorch_fid
 optimum-quanto
 sentencepiece
 huggingface_hub
+peft

run_modal.py (new file, 175 lines)
View File

@@ -0,0 +1,175 @@
'''
ostris/ai-toolkit on https://modal.com
Run training with the following command:
modal run run_modal.py --config-file-list-str=/root/ai-toolkit/config/whatever_you_want.yml
'''
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
import sys
import modal
from dotenv import load_dotenv
# Load the .env file if it exists
load_dotenv()
sys.path.insert(0, "/root/ai-toolkit")
# must come before ANY torch or fastai imports
# import toolkit.cuda_malloc
# turn off diffusers telemetry until I can figure out how to make it opt-in
os.environ['DISABLE_TELEMETRY'] = 'YES'
# define the volume for storing model outputs, using "creating volumes lazily": https://modal.com/docs/guide/volumes
# you will find your model, samples and optimizer stored in: https://modal.com/storage/your-username/main/flux-lora-models
model_volume = modal.Volume.from_name("flux-lora-models", create_if_missing=True)
# modal_output, due to "cannot mount volume on non-empty path" requirement
MOUNT_DIR = "/root/ai-toolkit/modal_output" # modal_output, due to "cannot mount volume on non-empty path" requirement
# define modal app
image = (
modal.Image.debian_slim(python_version="3.11")
# install required system and pip packages, more about this modal approach: https://modal.com/docs/examples/dreambooth_app
.apt_install("libgl1", "libglib2.0-0")
.pip_install(
"python-dotenv",
"torch",
"diffusers[torch]",
"transformers",
"ftfy",
"torchvision",
"oyaml",
"opencv-python",
"albumentations",
"safetensors",
"lycoris-lora==1.8.3",
"flatten_json",
"pyyaml",
"tensorboard",
"kornia",
"invisible-watermark",
"einops",
"accelerate",
"toml",
"pydantic",
"omegaconf",
"k-diffusion",
"open_clip_torch",
"timm",
"prodigyopt",
"controlnet_aux==0.0.7",
"bitsandbytes",
"hf_transfer",
"lpips",
"pytorch_fid",
"optimum-quanto",
"sentencepiece",
"huggingface_hub",
"peft"
)
)
# mount for the entire ai-toolkit directory
# example: "/Users/username/ai-toolkit" is the local directory, "/root/ai-toolkit" is the remote directory
code_mount = modal.Mount.from_local_dir("/Users/username/ai-toolkit", remote_path="/root/ai-toolkit")
# create the Modal app with the necessary mounts and volumes
app = modal.App(name="flux-lora-training", image=image, mounts=[code_mount], volumes={MOUNT_DIR: model_volume})
# Check if we have DEBUG_TOOLKIT in env
if os.environ.get("DEBUG_TOOLKIT", "0") == "1":
    # Set torch to trace mode
    import torch
    torch.autograd.set_detect_anomaly(True)

import argparse
from toolkit.job import get_job


def print_end_message(jobs_completed, jobs_failed):
    failure_string = f"{jobs_failed} failure{'' if jobs_failed == 1 else 's'}" if jobs_failed > 0 else ""
    completed_string = f"{jobs_completed} completed job{'' if jobs_completed == 1 else 's'}"

    print("")
    print("========================================")
    print("Result:")
    if len(completed_string) > 0:
        print(f" - {completed_string}")
    if len(failure_string) > 0:
        print(f" - {failure_string}")
    print("========================================")
@app.function(
    # request a GPU with at least 24GB VRAM
    # more about modal GPU's: https://modal.com/docs/guide/gpu
    gpu="A100",  # gpu="H100"
    # more about modal timeouts: https://modal.com/docs/guide/timeouts
    timeout=7200  # 2 hours, increase or decrease if needed
)
def main(config_file_list_str: str, recover: bool = False, name: str = None):
    # convert the config file list from a string to a list
    config_file_list = config_file_list_str.split(",")

    jobs_completed = 0
    jobs_failed = 0

    print(f"Running {len(config_file_list)} job{'' if len(config_file_list) == 1 else 's'}")

    for config_file in config_file_list:
        try:
            job = get_job(config_file, name)

            job.config['process'][0]['training_folder'] = MOUNT_DIR
            os.makedirs(MOUNT_DIR, exist_ok=True)
            print(f"Training outputs will be saved to: {MOUNT_DIR}")

            # run the job
            job.run()

            # commit the volume after training
            model_volume.commit()

            job.cleanup()
            jobs_completed += 1
        except Exception as e:
            print(f"Error running job: {e}")
            jobs_failed += 1
            if not recover:
                print_end_message(jobs_completed, jobs_failed)
                raise e

    print_end_message(jobs_completed, jobs_failed)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    # require at least one config file
    parser.add_argument(
        'config_file_list',
        nargs='+',
        type=str,
        help='Name of config file (eg: person_v1 for config/person_v1.json/yaml), or full path if it is not in config folder, you can pass multiple config files and run them all sequentially'
    )

    # flag to continue if a job fails
    parser.add_argument(
        '-r', '--recover',
        action='store_true',
        help='Continue running additional jobs even if a job fails'
    )

    # optional name replacement for config file
    parser.add_argument(
        '-n', '--name',
        type=str,
        default=None,
        help='Name to replace [name] tag in config file, useful for shared config file'
    )
    args = parser.parse_args()

    # convert list of config files to a comma-separated string for Modal compatibility
    config_file_list_str = ",".join(args.config_file_list)

    main.call(config_file_list_str=config_file_list_str, recover=args.recover, name=args.name)

View File

@@ -26,7 +26,7 @@ def get_lr_scheduler(
             optimizer, **kwargs
         )
     elif name == "constant":
-        if 'facor' not in kwargs:
+        if 'factor' not in kwargs:
             kwargs['factor'] = 1.0
         return torch.optim.lr_scheduler.ConstantLR(optimizer, **kwargs)
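
The one-character fix above matters more than it looks: with the misspelled key, `'facor' not in kwargs` was always true, so `factor` was overwritten with `1.0` even when the caller passed one. A minimal sketch of the corrected branch follows; the helper name is illustrative, the real code lives in `get_lr_scheduler`.
```
# Minimal sketch of the corrected "constant" branch and why the typo mattered.
import torch

def get_constant_scheduler(optimizer, **kwargs):
    # after the fix: only default factor when the caller did not provide it
    if 'factor' not in kwargs:
        kwargs['factor'] = 1.0
    return torch.optim.lr_scheduler.ConstantLR(optimizer, **kwargs)

model = torch.nn.Linear(4, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

sched = get_constant_scheduler(opt, factor=0.5, total_iters=10)
print(sched.get_last_lr())  # ~5e-5: the caller's factor=0.5 is respected now
```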