Modal cloud training support, fixed typo in toolkit/scheduler.py, Schnell training support for Colab, issue #92, issue #114 (#115)

* issue #76, load_checkpoint_and_dispatch() 'force_hooks'

https://github.com/ostris/ai-toolkit/issues/76

* RunPod cloud config

https://github.com/ostris/ai-toolkit/issues/90

* change 2x A40 to 1x A40 and price per hour

referring to https://github.com/ostris/ai-toolkit/issues/90#issuecomment-2294894929

* include FLUX.1-schnell setup guide missed in last commit

* huggingface-cli login required auth

* #92 peft, #114 colab, schnell training in colab

* modal cloud - run_modal.py and .yaml configs

* run_modal.py mount path example

* modal_examples renamed to modal

* Training in Modal README.md setup guide

* rename run command in title for consistency
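The notebook diff below configures the training job as a nested `OrderedDict` instead of a YAML file. A trimmed sketch of the `train` section with the hyperparameters this commit settles on (steps 2000, lr 1e-4) — key names and comments taken from the diff, surrounding nesting elided:

```python
# Sketch of the notebook's `train` config section (abbreviated; the full
# config wraps this in further 'job'/'config'/'process' nesting).
from collections import OrderedDict

train_config = OrderedDict([
    ('batch_size', 1),
    ('steps', 2000),                  # 500 - 4000 is a good range
    ('gradient_accumulation_steps', 1),
    ('train_unet', True),
    ('train_text_encoder', False),    # probably won't work with flux
    ('gradient_checkpointing', True), # needed unless you have a ton of vram
    ('noise_scheduler', 'flowmatch'), # for training only
    ('optimizer', 'adamw8bit'),
    ('lr', 1e-4),
])

print(train_config['steps'], train_config['lr'])
```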
This commit is contained in:
martintomov
2024-08-23 06:25:44 +03:00
committed by GitHub
parent 4d35a29c97
commit 34db804c76
8 changed files with 817 additions and 89 deletions


@@ -1,53 +1,45 @@
 {
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
-  "colab": {
-   "provenance": [],
-   "machine_shape": "hm",
-   "gpuType": "A100"
-  },
-  "kernelspec": {
-   "name": "python3",
-   "display_name": "Python 3"
-  },
-  "language_info": {
-   "name": "python"
-  },
-  "accelerator": "GPU"
- },
  "cells": [
   {
    "cell_type": "markdown",
-   "source": [
-    "# AI Toolkit by Ostris\n",
-    "## FLUX.1 Training\n"
-   ],
    "metadata": {
     "collapsed": false,
     "id": "zl-S0m3pkQC5"
-   }
+   },
+   "source": [
+    "# AI Toolkit by Ostris\n",
+    "## FLUX.1-dev Training\n"
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
-    "!git clone https://github.com/ostris/ai-toolkit\n",
-    "!mkdir -p /content/dataset"
-   ],
+    "!nvidia-smi"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
    "metadata": {
     "id": "BvAG0GKAh59G"
    },
-   "execution_count": null,
-   "outputs": []
+   "outputs": [],
+   "source": [
+    "!git clone https://github.com/ostris/ai-toolkit\n",
+    "!mkdir -p /content/dataset"
+   ]
   },
   {
    "cell_type": "markdown",
-   "source": [
-    "Put your image dataset in the `/content/dataset` folder"
-   ],
    "metadata": {
     "id": "UFUW4ZMmnp1V"
-   }
+   },
+   "source": [
+    "Put your image dataset in the `/content/dataset` folder"
+   ]
   },
   {
    "cell_type": "code",
@@ -62,6 +54,9 @@
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "OV0HnOI6o8V6"
+   },
    "source": [
     "## Model License\n",
     "Training currently only works with FLUX.1-dev. Which means anything you train will inherit the non-commercial license. It is also a gated model, so you need to accept the license on HF before using it. Otherwise, this will fail. Here are the required steps to set up a license.\n",
@@ -69,13 +64,15 @@
     "Sign into HF and accept the model access here [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)\n",
     "\n",
     "[Get a READ key from huggingface](https://huggingface.co/settings/tokens/new?) and place it in the next cell after running it."
-   ],
-   "metadata": {
-    "id": "OV0HnOI6o8V6"
-   }
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "3yZZdhFRoj2m"
+   },
+   "outputs": [],
    "source": [
     "import getpass\n",
     "import os\n",
@@ -87,15 +84,15 @@
     "os.environ['HF_TOKEN'] = hf_token\n",
     "\n",
     "print(\"HF_TOKEN environment variable has been set.\")"
-   ],
-   "metadata": {
-    "id": "3yZZdhFRoj2m"
-   },
-   "execution_count": null,
-   "outputs": []
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "9gO2EzQ1kQC8"
+   },
+   "outputs": [],
    "source": [
     "import os\n",
     "import sys\n",
@@ -105,26 +102,26 @@
     "from PIL import Image\n",
     "import os\n",
     "os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\""
-   ],
-   "metadata": {
-    "id": "9gO2EzQ1kQC8"
-   },
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "N8UUFzVRigbC"
+   },
    "source": [
     "## Setup\n",
     "\n",
     "This is your config. It is documented pretty well. Normally you would do this as a yaml file, but for colab, this will work. This will run as is without modification, but feel free to edit as you want."
-   ],
-   "metadata": {
-    "id": "N8UUFzVRigbC"
-   }
+   ]
   },
   {
    "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "_t28QURYjRQO"
+   },
+   "outputs": [],
    "source": [
     "from collections import OrderedDict\n",
     "\n",
@@ -169,7 +166,7 @@
     " ]),\n",
     " ('train', OrderedDict([\n",
     " ('batch_size', 1),\n",
-    " ('steps', 4000), # total number of steps to train 500 - 4000 is a good range\n",
+    " ('steps', 2000), # total number of steps to train 500 - 4000 is a good range\n",
     " ('gradient_accumulation_steps', 1),\n",
     " ('train_unet', True),\n",
     " ('train_text_encoder', False), # probably won't work with flux\n",
@@ -177,9 +174,16 @@
     " ('gradient_checkpointing', True), # need this on unless you have a ton of vram\n",
     " ('noise_scheduler', 'flowmatch'), # for training only\n",
     " ('optimizer', 'adamw8bit'),\n",
-    " ('lr', 4e-4),\n",
+    " ('lr', 1e-4),\n",
     "\n",
     " # uncomment this to skip the pre training sample\n",
-    " #('skip_first_sample', True),\n",
+    " # ('skip_first_sample', True),\n",
+    "\n",
+    " # uncomment to completely disable sampling\n",
+    " # ('disable_sampling', True),\n",
+    "\n",
+    " # uncomment to use new bell curved weighting. Experimental but may produce better results\n",
+    " # ('linear_timesteps', True),\n",
+    "\n",
     " # ema will smooth out learning, but could slow it down. Recommended to leave on.\n",
     " ('ema_config', OrderedDict([\n",
@@ -231,45 +235,57 @@
     " ('version', '1.0')\n",
     " ]))\n",
     "])\n"
-   ],
-   "metadata": {
-    "id": "_t28QURYjRQO"
-   },
-   "execution_count": null,
-   "outputs": []
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "h6F1FlM2Wb3l"
+   },
    "source": [
     "## Run it\n",
     "\n",
     "Below does all the magic. Check your folders to the left. Items will be in output/LoRA/your_name_v1. In the samples folder, there are periodic samples. This doesn't work great with colab. They will be in /content/output"
-   ],
-   "metadata": {
-    "id": "h6F1FlM2Wb3l"
-   }
+   ]
   },
   {
    "cell_type": "code",
-   "source": [
-    "run_job(job_to_run)\n"
-   ],
+   "execution_count": null,
    "metadata": {
     "id": "HkajwI8gteOh"
    },
-   "execution_count": null,
-   "outputs": []
+   "outputs": [],
+   "source": [
+    "run_job(job_to_run)\n"
+   ]
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "Hblgb5uwW5SD"
+   },
    "source": [
     "## Done\n",
     "\n",
     "Check your output dir and get your slider\n"
-   ],
-   "metadata": {
-    "id": "Hblgb5uwW5SD"
-   }
+   ]
   }
- ]
-}
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "gpuType": "A100",
+   "machine_shape": "hm",
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}