From 91342853c1bddb32df72294fad361a12e45077a7 Mon Sep 17 00:00:00 2001
From: raziel2001au <7069692+raziel2001au@users.noreply.github.com>
Date: Sun, 21 Dec 2025 00:20:07 +1000
Subject: [PATCH] Add support for DGX OS (#567)
---
README.md | 3 ++
dgx_instructions.md | 84 ++++++++++++++++++++++++++++++++++++++++++++
dgx_requirements.txt | 14 ++++++++
3 files changed, 101 insertions(+)
create mode 100644 dgx_instructions.md
create mode 100644 dgx_requirements.txt
diff --git a/README.md b/README.md
index 6f26fc1e..2beb02fc 100644
--- a/README.md
+++ b/README.md
@@ -232,6 +232,9 @@ pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 -
pip3 install -r requirements.txt
```
+For devices running **DGX OS** (including DGX Spark), follow [these](dgx_instructions.md) instructions.
+
+
Windows:
If you are having issues with Windows. I recommend using the easy install script at [https://github.com/Tavris1/AI-Toolkit-Easy-Install](https://github.com/Tavris1/AI-Toolkit-Easy-Install)
diff --git a/dgx_instructions.md b/dgx_instructions.md
new file mode 100644
index 00000000..f9798382
--- /dev/null
+++ b/dgx_instructions.md
@@ -0,0 +1,84 @@
+# AI Toolkit by Ostris
+
+## DGX OS installation instructions
+
+You need to use Python 3.11 to run AI Toolkit on DGX OS. The easiest way to do this without affecting the system installation of Python is to create a virtual environment with **miniconda**, which allows you to specify the version of Python to use in the environment.
+
+This guide will assume you have a fresh installation of DGX OS, and will guide you through the installation of all requirements.
+
+### Installation instructions for DGX OS:
+
+**1) Get Python 3.11 (via miniconda)**
+
+Install the latest version of miniconda:
+```
+$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
+$ chmod u+x Miniconda3-latest-Linux-aarch64.sh
+$ ./Miniconda3-latest-Linux-aarch64.sh
+```
+
+Restart your bash or ssh session. If miniconda was installed successfully, it will automatically load the 'base' environment by default. If you want to disable this behaviour, run:
+```
+$ conda config --set auto_activate_base false
+```
+
+Now you can create a Python 3.11 environment for ai-toolkit:
+```
+$ conda create --name ai-toolkit python=3.11
+```
+
+Then activate the environment with:
+
+```
+$ conda activate ai-toolkit
+```
+
+
+**2) Install PyTorch**
+
+```
+$ pip3 install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu130
+```
+
+
+**3) Install the remaining requirements (dgx_requirements.txt)**
+
+```
+$ pip3 install -r dgx_requirements.txt
+```
+
+### Running the UI on DGX OS:
+
+Running the UI works much the same as on other systems. However, you need to install the ARM64 build of Node.js for Linux, which is compatible with the NVIDIA Grace CPU.
+
+
+**1) Install Node.js**
+
+Download a Linux ARM64 build of Node.js from: https://nodejs.org (for example: https://nodejs.org/dist/v24.11.1/node-v24.11.1-linux-arm64.tar.xz)
+
+Extract it and add the bin directory to your path. I extracted it to **/opt** and added the following to my ~/.bashrc file:
+```
+export PATH="/opt/node-v24.11.1-linux-arm64/bin:$PATH"
+```
+
+
+**2) Compile and run the Node.js UI**
+
+Change to the ui directory, then build and run the UI:
+```
+$ cd ui
+$ npm run build_and_start
+```
+
+If all went well, you’ll be able to access the UI on port 8675 and start training.
+
+
+
+### Troubleshooting issues
+If you’re not getting any output when starting a training job from the UI, the job is probably crashing before the training process starts. The best way to debug this is to run the Python training script directly (the UI normally starts it for you). To do this, set up a training job in the UI, go to the advanced config screen, and copy the configuration into a file such as train.yaml. Then, with the conda environment active, run the training script like this:
+
+```
+$ python run.py path/to/train.yaml
+```
+
+
\ No newline at end of file
diff --git a/dgx_requirements.txt b/dgx_requirements.txt
new file mode 100644
index 00000000..b97cc586
--- /dev/null
+++ b/dgx_requirements.txt
@@ -0,0 +1,14 @@
+# You need to use Python 3.11. The easiest way to get this on DGX OS without impacting the system version of Python is to create an environment with miniconda.
+
+# specific dependency versions needed on DGX OS devices:
+scipy==1.16.0
+tifffile==2025.6.11
+imageio==2.37.0
+scikit_image==0.25.2
+clean_fid==0.1.35
+pywavelets==1.9.0
+contourpy==1.3.3
+opencv_python_headless==4.11.0.86
+
+# we include the base requirements.txt for the remaining dependencies:
+-r requirements.txt
\ No newline at end of file