Merge branch 'main' into dev

This commit is contained in:
lllyasviel
2024-01-27 15:48:30 -08:00
committed by GitHub
66 changed files with 2746 additions and 2892 deletions

View File

@@ -1,12 +1 @@
* @AUTOMATIC1111
# if you were managing a localization and were removed from this file, this is because
# the intended way to do localizations now is via extensions. See:
# https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Developing-extensions
# Make a repo with your localization and since you are still listed as a collaborator
# you can add it to the wiki page yourself. This change is because some people complained
# the git commit log is cluttered with things unrelated to almost everyone and
# because I believe this is the best overall for the project to handle localizations almost
# entirely without my oversight.
* @lllyasviel

552
README.md
View File

@@ -1,182 +1,370 @@
# Stable Diffusion web UI
A web interface for Stable Diffusion, implemented using Gradio library.
![](screenshot.png)
## Features
[Detailed feature showcase with images](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features):
- Original txt2img and img2img modes
- One click install and run script (but you still must install python and git)
- Outpainting
- Inpainting
- Color Sketch
- Prompt Matrix
- Stable Diffusion Upscale
- Attention, specify parts of text that the model should pay more attention to
- a man in a `((tuxedo))` - will pay more attention to tuxedo
- a man in a `(tuxedo:1.21)` - alternative syntax
- select text and press `Ctrl+Up` or `Ctrl+Down` (or `Command+Up` or `Command+Down` if you're on a MacOS) to automatically adjust attention to selected text (code contributed by anonymous user)
- Loopback, run img2img processing multiple times
- X/Y/Z plot, a way to draw a 3 dimensional plot of images with different parameters
- Textual Inversion
- have as many embeddings as you want and use any names you like for them
- use multiple embeddings with different numbers of vectors per token
- works with half precision floating point numbers
- train embeddings on 8GB (also reports of 6GB working)
- Extras tab with:
- GFPGAN, neural network that fixes faces
- CodeFormer, face restoration tool as an alternative to GFPGAN
- RealESRGAN, neural network upscaler
- ESRGAN, neural network upscaler with a lot of third party models
- SwinIR and Swin2SR ([see here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2092)), neural network upscalers
- LDSR, Latent diffusion super resolution upscaling
- Resizing aspect ratio options
- Sampling method selection
- Adjust sampler eta values (noise multiplier)
- More advanced noise setting options
- Interrupt processing at any time
- 4GB video card support (also reports of 2GB working)
- Correct seeds for batches
- Live prompt token length validation
- Generation parameters
- parameters you used to generate images are saved with that image
- in PNG chunks for PNG, in EXIF for JPEG
- can drag the image to PNG info tab to restore generation parameters and automatically copy them into UI
- can be disabled in settings
- drag and drop an image/text-parameters to promptbox
- Read Generation Parameters Button, loads parameters in promptbox to UI
- Settings page
- Running arbitrary python code from UI (must run with `--allow-code` to enable)
- Mouseover hints for most UI elements
- Possible to change defaults/mix/max/step values for UI elements via text config
- Tiling support, a checkbox to create images that can be tiled like textures
- Progress bar and live image generation preview
- Can use a separate neural network to produce previews with almost none VRAM or compute requirement
- Negative prompt, an extra text field that allows you to list what you don't want to see in generated image
- Styles, a way to save part of prompt and easily apply them via dropdown later
- Variations, a way to generate same image but with tiny differences
- Seed resizing, a way to generate same image but at slightly different resolution
- CLIP interrogator, a button that tries to guess prompt from an image
- Prompt Editing, a way to change prompt mid-generation, say to start making a watermelon and switch to anime girl midway
- Batch Processing, process a group of files using img2img
- Img2img Alternative, reverse Euler method of cross attention control
- Highres Fix, a convenience option to produce high resolution pictures in one click without usual distortions
- Reloading checkpoints on the fly
- Checkpoint Merger, a tab that allows you to merge up to 3 checkpoints into one
- [Custom scripts](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Custom-Scripts) with many extensions from community
- [Composable-Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/), a way to use multiple prompts at once
- separate prompts using uppercase `AND`
- also supports weights for prompts: `a cat :1.2 AND a dog AND a penguin :2.2`
- No token limit for prompts (original stable diffusion lets you use up to 75 tokens)
- DeepDanbooru integration, creates danbooru style tags for anime prompts
- [xformers](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Xformers), major speed increase for select cards: (add `--xformers` to commandline args)
- via extension: [History tab](https://github.com/yfszzx/stable-diffusion-webui-images-browser): view, direct and delete images conveniently within the UI
- Generate forever option
- Training tab
- hypernetworks and embeddings options
- Preprocessing images: cropping, mirroring, autotagging using BLIP or deepdanbooru (for anime)
- Clip skip
- Hypernetworks
- Loras (same as Hypernetworks but more pretty)
- A separate UI where you can choose, with preview, which embeddings, hypernetworks or Loras to add to your prompt
- Can select to load a different VAE from settings screen
- Estimated completion time in progress bar
- API
- Support for dedicated [inpainting model](https://github.com/runwayml/stable-diffusion#inpainting-with-stable-diffusion) by RunwayML
- via extension: [Aesthetic Gradients](https://github.com/AUTOMATIC1111/stable-diffusion-webui-aesthetic-gradients), a way to generate images with a specific aesthetic by using clip images embeds (implementation of [https://github.com/vicgalle/stable-diffusion-aesthetic-gradients](https://github.com/vicgalle/stable-diffusion-aesthetic-gradients))
- [Stable Diffusion 2.0](https://github.com/Stability-AI/stablediffusion) support - see [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#stable-diffusion-20) for instructions
- [Alt-Diffusion](https://arxiv.org/abs/2211.06679) support - see [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#alt-diffusion) for instructions
- Now without any bad letters!
- Load checkpoints in safetensors format
- Eased resolution restriction: generated image's dimensions must be a multiple of 8 rather than 64
- Now with a license!
- Reorder elements in the UI from settings screen
- [Segmind Stable Diffusion](https://huggingface.co/segmind/SSD-1B) support
## Installation and Running
Make sure the required [dependencies](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Dependencies) are met and follow the instructions available for:
- [NVidia](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs) (recommended)
- [AMD](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs) GPUs.
- [Intel CPUs, Intel GPUs (both integrated and discrete)](https://github.com/openvinotoolkit/stable-diffusion-webui/wiki/Installation-on-Intel-Silicon) (external wiki page)
Alternatively, use online services (like Google Colab):
- [List of Online Services](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Online-Services)
### Installation on Windows 10/11 with NVidia-GPUs using release package
1. Download `sd.webui.zip` from [v1.0.0-pre](https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.0.0-pre) and extract its contents.
2. Run `update.bat`.
3. Run `run.bat`.
> For more details see [Install-and-Run-on-NVidia-GPUs](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs)
### Automatic Installation on Windows
1. Install [Python 3.10.6](https://www.python.org/downloads/release/python-3106/) (Newer version of Python does not support torch), checking "Add Python to PATH".
2. Install [git](https://git-scm.com/download/win).
3. Download the stable-diffusion-webui repository, for example by running `git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git`.
4. Run `webui-user.bat` from Windows Explorer as normal, non-administrator, user.
### Automatic Installation on Linux
1. Install the dependencies:
```bash
# Debian-based:
sudo apt install wget git python3 python3-venv libgl1 libglib2.0-0
# Red Hat-based:
sudo dnf install wget git python3 gperftools-libs libglvnd-glx
# openSUSE-based:
sudo zypper install wget git python3 libtcmalloc4 libglvnd
# Arch-based:
sudo pacman -S wget git python3
```
2. Navigate to the directory you would like the webui to be installed and execute the following command:
```bash
wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
```
3. Run `webui.sh`.
4. Check `webui-user.sh` for options.
### Installation on Apple Silicon
Find the instructions [here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon).
## Contributing
Here's how to add code to this repo: [Contributing](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Contributing)
## Documentation
The documentation was moved from this README over to the project's [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki).
For the purposes of getting Google and other search engines to crawl the wiki, here's a link to the (not for humans) [crawlable wiki](https://github-wiki-see.page/m/AUTOMATIC1111/stable-diffusion-webui/wiki).
## Credits
Licenses for borrowed code can be found in `Settings -> Licenses` screen, and also in `html/licenses.html` file.
- Stable Diffusion - https://github.com/Stability-AI/stablediffusion, https://github.com/CompVis/taming-transformers
- k-diffusion - https://github.com/crowsonkb/k-diffusion.git
- Spandrel - https://github.com/chaiNNer-org/spandrel implementing
- GFPGAN - https://github.com/TencentARC/GFPGAN.git
- CodeFormer - https://github.com/sczhou/CodeFormer
- ESRGAN - https://github.com/xinntao/ESRGAN
- SwinIR - https://github.com/JingyunLiang/SwinIR
- Swin2SR - https://github.com/mv-lab/swin2sr
- LDSR - https://github.com/Hafiidz/latent-diffusion
- MiDaS - https://github.com/isl-org/MiDaS
- Ideas for optimizations - https://github.com/basujindal/stable-diffusion
- Cross Attention layer optimization - Doggettx - https://github.com/Doggettx/stable-diffusion, original idea for prompt editing.
- Cross Attention layer optimization - InvokeAI, lstein - https://github.com/invoke-ai/InvokeAI (originally http://github.com/lstein/stable-diffusion)
- Sub-quadratic Cross Attention layer optimization - Alex Birch (https://github.com/Birch-san/diffusers/pull/1), Amin Rezaei (https://github.com/AminRezaei0x443/memory-efficient-attention)
- Textual Inversion - Rinon Gal - https://github.com/rinongal/textual_inversion (we're not using his code, but we are using his ideas).
- Idea for SD upscale - https://github.com/jquesnelle/txt2imghd
- Noise generation for outpainting mk2 - https://github.com/parlance-zz/g-diffuser-bot
- CLIP interrogator idea and borrowing some code - https://github.com/pharmapsychotic/clip-interrogator
- Idea for Composable Diffusion - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
- xformers - https://github.com/facebookresearch/xformers
- DeepDanbooru - interrogator for anime diffusers https://github.com/KichangKim/DeepDanbooru
- Sampling in float32 precision from a float16 UNet - marunine for the idea, Birch-san for the example Diffusers implementation (https://github.com/Birch-san/diffusers-play/tree/92feee6)
- Instruct pix2pix - Tim Brooks (star), Aleksander Holynski (star), Alexei A. Efros (no star) - https://github.com/timothybrooks/instruct-pix2pix
- Security advice - RyotaK
- UniPC sampler - Wenliang Zhao - https://github.com/wl-zhao/UniPC
- TAESD - Ollin Boer Bohan - https://github.com/madebyollin/taesd
- LyCORIS - KohakuBlueleaf
- Restart sampling - lambertae - https://github.com/Newbeeer/diffusion_restart_sampling
- Hypertile - tfernd - https://github.com/tfernd/HyperTile
- Initial Gradio script - posted on 4chan by an Anonymous user. Thank you Anonymous user.
- (You)
# This is a Private Project
Currently, we are only sending invitations to people who may be interested in development of this project.
Please do not share codes or info from this project to public.
If you see this, please join our private Discord server for discussion: https://discord.gg/2uvDhfAZ
# Stable Diffusion Web UI Forge
Stable Diffusion Web UI Forge is a platform on top of [Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to make development easier, and optimize the speed and resource consumption.
The name "Forge" is inspired from "Minecraft Forge". This project will become SD WebUI's Forge.
Forge will give you:
1. Improved optimization. (Fastest speed and minimal memory use among all alternative software.)
2. Patchable UNet and CLIP objects. (Developer-friendly platform.)
# Improved Optimization
I tested with several devices, and this is a typical result from 8GB VRAM (3070ti laptop) with SDXL.
**This is WebUI:**
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/c32baedd-500b-408f-8cfb-ed4570c883bd)
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/cb6098de-f2d4-4b25-9566-df4302dda396)
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/5447e8b7-f3ca-4003-9961-02027c8181e8)
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/f3cb57d9-ac7a-4667-8b3f-303139e38afa)
(average about 7.4GB/8GB, peak at about 7.9GB/8GB)
**This is WebUI Forge:**
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/0c45cd98-0b14-42c3-9556-28e48d4d5fa0)
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/3a71f5d4-39e5-4ab1-81cf-8eaa790a2dc8)
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/65fbb4a5-ee73-4bb9-9c5f-8a958cd9674d)
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/76f181a1-c5fb-4323-a6cc-b6308a45587e)
(average and peak are all 6.3GB/8GB)
Also, you can see that Forge does not change WebUI results. Installing Forge is not a seed breaking change.
We do not change any UI. But you will see the version of Forge here
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/93fdbccf-2f9b-4d45-ad81-c7c4106a357b)
"f0.0.1v1.7.0" means WebUI 1.7.0 with Forge 0.0.1
### Changes
Forge removes all WebUI's codes related to speed and memory optimization and reworked everything. All previous cmd flags like medvram, lowvram, medvram-sdxl, precision full, no half, no half vae, attention_xxx, upcast unet, ... are all REMOVED. Adding these flags will not cause error but they will not do anything now. **We highly encourage Forge users to remove all cmd flags and let Forge to decide how to load models.**
Without any cmd flag, Forge can run SDXL with 4GB vram and SD1.5 with 2GB vram.
**The only one flag that you may still need** is `--always-offload-from-vram` (This flag will make things **slower**). This option will let Forge always unload models from VRAM. This can be useful is you use multiple software together and want Forge to use less VRAM and give some vram to other software, or when you are using some old extensions that will compete vram with main UI, or (very rarely) when you get OOM.
If you really want to play with cmd flags, you can additionally control the GPU with:
(extreme VRAM cases)
--always-gpu
--always-cpu
(rare attention cases)
--attention-split
--attention-quad
--attention-pytorch
--disable-xformers
--disable-attention-upcast
(float point type)
--all-in-fp32
--all-in-fp16
--unet-in-bf16
--unet-in-fp16
--unet-in-fp8-e4m3fn
--unet-in-fp8-e5m2
--vae-in-fp16
--vae-in-fp32
--vae-in-bf16
--clip-in-fp8-e4m3fn
--clip-in-fp8-e5m2
--clip-in-fp16
--clip-in-fp32
(rare platforms)
--directml
--disable-ipex-hijack
--pytorch-deterministic
Again, Forge do not recommend users to use any cmd flags unless you are very sure that you really need these.
# Patchable UNet
Now developing an extension is super simple. We finally have patchable UNet.
Below is using one single file with 80 lines of codes to support FreeU:
`extensions-builtin/sd_forge_freeu/scripts/forge_freeu.py`
```python
import torch
import gradio as gr
from modules import scripts
def Fourier_filter(x, threshold, scale):
x_freq = torch.fft.fftn(x.float(), dim=(-2, -1))
x_freq = torch.fft.fftshift(x_freq, dim=(-2, -1))
B, C, H, W = x_freq.shape
mask = torch.ones((B, C, H, W), device=x.device)
crow, ccol = H // 2, W //2
mask[..., crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
x_freq = x_freq * mask
x_freq = torch.fft.ifftshift(x_freq, dim=(-2, -1))
x_filtered = torch.fft.ifftn(x_freq, dim=(-2, -1)).real
return x_filtered.to(x.dtype)
def set_freeu_v2_patch(model, b1, b2, s1, s2):
model_channels = model.model.model_config.unet_config["model_channels"]
scale_dict = {model_channels * 4: (b1, s1), model_channels * 2: (b2, s2)}
def output_block_patch(h, hsp, *args, **kwargs):
scale = scale_dict.get(h.shape[1], None)
if scale is not None:
hidden_mean = h.mean(1).unsqueeze(1)
B = hidden_mean.shape[0]
hidden_max, _ = torch.max(hidden_mean.view(B, -1), dim=-1, keepdim=True)
hidden_min, _ = torch.min(hidden_mean.view(B, -1), dim=-1, keepdim=True)
hidden_mean = (hidden_mean - hidden_min.unsqueeze(2).unsqueeze(3)) / \
(hidden_max - hidden_min).unsqueeze(2).unsqueeze(3)
h[:, :h.shape[1] // 2] = h[:, :h.shape[1] // 2] * ((scale[0] - 1) * hidden_mean + 1)
hsp = Fourier_filter(hsp, threshold=1, scale=scale[1])
return h, hsp
m = model.clone()
m.set_model_output_block_patch(output_block_patch)
return m
class FreeUForForge(scripts.Script):
def title(self):
return "FreeU Integrated"
def show(self, is_img2img):
# make this extension visible in both txt2img and img2img tab.
return scripts.AlwaysVisible
def ui(self, *args, **kwargs):
with gr.Accordion(open=False, label=self.title()):
freeu_enabled = gr.Checkbox(label='Enabled', value=False)
freeu_b1 = gr.Slider(label='B1', minimum=0, maximum=2, step=0.01, value=1.01)
freeu_b2 = gr.Slider(label='B2', minimum=0, maximum=2, step=0.01, value=1.02)
freeu_s1 = gr.Slider(label='S1', minimum=0, maximum=4, step=0.01, value=0.99)
freeu_s2 = gr.Slider(label='S2', minimum=0, maximum=4, step=0.01, value=0.95)
return freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2
def process_batch(self, p, *script_args, **kwargs):
freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2 = script_args
if not freeu_enabled:
return
unet = p.sd_model.forge_objects.unet
unet = set_freeu_v2_patch(unet, freeu_b1, freeu_b2, freeu_s1, freeu_s2)
p.sd_model.forge_objects.unet = unet
# Below codes will add some logs to the texts below the image outputs on UI.
# The extra_generation_params does not influence results.
p.extra_generation_params.update(dict(
freeu_enabled=freeu_enabled,
freeu_b1=freeu_b1,
freeu_b2=freeu_b2,
freeu_s1=freeu_s1,
freeu_s2=freeu_s2,
))
return
```
It looks like this:
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/a7798cf2-057c-43e0-883a-5f8643af8529)
Similar components like HyperTile, KohyaHighResFix, SAG, can all be implemented within 100 lines of codes (see also the codes).
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/e2fc1b73-e6ee-405e-864c-c67afd92a1db)
ControlNets can finally be called by different extensions. (80% codes of ControlNet can be removed now, will start soon)
Implementing Stable Video Diffusion and Zero123 are also super simple now (see also the codes).
*Stable Video Diffusion:*
`extensions-builtin/sd_forge_svd/scripts/forge_svd.py`
```python
import torch
import gradio as gr
import os
import pathlib
from modules import script_callbacks
from modules.paths import models_path
from modules.ui_common import ToolButton, refresh_symbol
from modules import shared
from modules_forge.forge_util import numpy_to_pytorch, pytorch_to_numpy
from ldm_patched.modules.sd import load_checkpoint_guess_config
from ldm_patched.contrib.external_video_model import VideoLinearCFGGuidance, SVD_img2vid_Conditioning
from ldm_patched.contrib.external import KSampler, VAEDecode
opVideoLinearCFGGuidance = VideoLinearCFGGuidance()
opSVD_img2vid_Conditioning = SVD_img2vid_Conditioning()
opKSampler = KSampler()
opVAEDecode = VAEDecode()
svd_root = os.path.join(models_path, 'svd')
os.makedirs(svd_root, exist_ok=True)
svd_filenames = []
def update_svd_filenames():
global svd_filenames
svd_filenames = [
pathlib.Path(x).name for x in
shared.walk_files(svd_root, allowed_extensions=[".pt", ".ckpt", ".safetensors"])
]
return svd_filenames
@torch.inference_mode()
@torch.no_grad()
def predict(filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
sampling_denoise, guidance_min_cfg, input_image):
filename = os.path.join(svd_root, filename)
model_raw, _, vae, clip_vision = \
load_checkpoint_guess_config(filename, output_vae=True, output_clip=False, output_clipvision=True)
model = opVideoLinearCFGGuidance.patch(model_raw, guidance_min_cfg)[0]
init_image = numpy_to_pytorch(input_image)
positive, negative, latent_image = opSVD_img2vid_Conditioning.encode(
clip_vision, init_image, vae, width, height, video_frames, motion_bucket_id, fps, augmentation_level)
output_latent = opKSampler.sample(model, sampling_seed, sampling_steps, sampling_cfg,
sampling_sampler_name, sampling_scheduler, positive,
negative, latent_image, sampling_denoise)[0]
output_pixels = opVAEDecode.decode(vae, output_latent)[0]
outputs = pytorch_to_numpy(output_pixels)
return outputs
def on_ui_tabs():
with gr.Blocks() as svd_block:
with gr.Row():
with gr.Column():
input_image = gr.Image(label='Input Image', source='upload', type='numpy', height=400)
with gr.Row():
filename = gr.Dropdown(label="SVD Checkpoint Filename",
choices=svd_filenames,
value=svd_filenames[0] if len(svd_filenames) > 0 else None)
refresh_button = ToolButton(value=refresh_symbol, tooltip="Refresh")
refresh_button.click(
fn=lambda: gr.update(choices=update_svd_filenames),
inputs=[], outputs=filename)
width = gr.Slider(label='Width', minimum=16, maximum=8192, step=8, value=1024)
height = gr.Slider(label='Height', minimum=16, maximum=8192, step=8, value=576)
video_frames = gr.Slider(label='Video Frames', minimum=1, maximum=4096, step=1, value=14)
motion_bucket_id = gr.Slider(label='Motion Bucket Id', minimum=1, maximum=1023, step=1, value=127)
fps = gr.Slider(label='Fps', minimum=1, maximum=1024, step=1, value=6)
augmentation_level = gr.Slider(label='Augmentation Level', minimum=0.0, maximum=10.0, step=0.01,
value=0.0)
sampling_steps = gr.Slider(label='Sampling Steps', minimum=1, maximum=200, step=1, value=20)
sampling_cfg = gr.Slider(label='CFG Scale', minimum=0.0, maximum=50.0, step=0.1, value=2.5)
sampling_denoise = gr.Slider(label='Sampling Denoise', minimum=0.0, maximum=1.0, step=0.01, value=1.0)
guidance_min_cfg = gr.Slider(label='Guidance Min Cfg', minimum=0.0, maximum=100.0, step=0.5, value=1.0)
sampling_sampler_name = gr.Radio(label='Sampler Name',
choices=['euler', 'euler_ancestral', 'heun', 'heunpp2', 'dpm_2',
'dpm_2_ancestral', 'lms', 'dpm_fast', 'dpm_adaptive',
'dpmpp_2s_ancestral', 'dpmpp_sde', 'dpmpp_sde_gpu',
'dpmpp_2m', 'dpmpp_2m_sde', 'dpmpp_2m_sde_gpu',
'dpmpp_3m_sde', 'dpmpp_3m_sde_gpu', 'ddpm', 'lcm', 'ddim',
'uni_pc', 'uni_pc_bh2'], value='euler')
sampling_scheduler = gr.Radio(label='Scheduler',
choices=['normal', 'karras', 'exponential', 'sgm_uniform', 'simple',
'ddim_uniform'], value='karras')
sampling_seed = gr.Number(label='Seed', value=12345, precision=0)
generate_button = gr.Button(value="Generate")
ctrls = [filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
sampling_denoise, guidance_min_cfg, input_image]
with gr.Column():
output_gallery = gr.Gallery(label='Gallery', show_label=False, object_fit='contain',
visible=True, height=1024, columns=4)
generate_button.click(predict, inputs=ctrls, outputs=[output_gallery])
return [(svd_block, "SVD", "svd")]
update_svd_filenames()
script_callbacks.on_ui_tabs(on_ui_tabs)
```
Note that although the above codes look like independent codes, they actually will automatically offload/unload any other models. For example, below is me opening webui, load SDXL, generated an image, then go to SVD, then generated image frames. You can see that the GPU memory is perfectly managed and the SDXL is moved to RAM then SVD is moved to GPU.
Note that this management is fully automatic. This makes writing extensions super simple.
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/ac7ed152-cd33-4645-94af-4c43bb8c3d88)
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/cdcb23ad-02dc-4e39-be74-98e927550ef6)
Similarly, Zero123:
![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/d1a4a17d-f382-442d-91f2-fc5b6c10737f)
# About Extensions
All UI related extensions should work without problems, like:
canvas-zoom
different translations
etc
Below extensions are tested and worked well:
Dynamic Prompts
Adetailer
Ultimate SD Upscale
Reactor
Below extensions will be tested soon:
Regional Prompter (I have not figure out how to use that UI yet.. will test later)
Below extensions will be given up but they may still work
MultiDiffusion / Tiled Diffusison
Deform
Roop
(Tiled diffusion is integrated now and no need to install extra extensions. Also the current smart unet offload is much better than multi-diffusion and people can directly generate 4k images without using multi-diffusion, by automatically offload unet to RAM. If bigger than 4k, use Ultimate SD Upscale.)
(But if you want to use some special features in MultiDiffusion like inversion or region prompt, probably you can still use it, but it can be very rare.)
Below extensions will be reworked soon
sd-webui-controlnet
controlnet will not be replaced by another extension. We will have 10+ extensions for different preprocessors, and 10+ extensions for different control. Each extension will have less than 100 lines of codes. Everyone will be able to add preprocessor/control model by adding new extensions. And adding those extensions will be super easy. Those extensions will share a UI managed by Forge.

View File

@@ -1,7 +1,7 @@
import os
from modules.modelloader import load_file_from_url
from modules.upscaler import Upscaler, UpscalerData
from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
from ldsr_model_arch import LDSR
from modules import shared, script_callbacks, errors
import sd_hijack_autoencoder # noqa: F401
@@ -49,6 +49,7 @@ class UpscalerLDSR(Upscaler):
return LDSR(model, yaml)
def do_upscale(self, img, path):
prepare_free_memory()
try:
ldsr = self.load_model(path)
except Exception:

View File

@@ -1,31 +1,6 @@
import torch
import networks
from modules import patches
class LoraPatches:
def __init__(self):
self.Linear_forward = patches.patch(__name__, torch.nn.Linear, 'forward', networks.network_Linear_forward)
self.Linear_load_state_dict = patches.patch(__name__, torch.nn.Linear, '_load_from_state_dict', networks.network_Linear_load_state_dict)
self.Conv2d_forward = patches.patch(__name__, torch.nn.Conv2d, 'forward', networks.network_Conv2d_forward)
self.Conv2d_load_state_dict = patches.patch(__name__, torch.nn.Conv2d, '_load_from_state_dict', networks.network_Conv2d_load_state_dict)
self.GroupNorm_forward = patches.patch(__name__, torch.nn.GroupNorm, 'forward', networks.network_GroupNorm_forward)
self.GroupNorm_load_state_dict = patches.patch(__name__, torch.nn.GroupNorm, '_load_from_state_dict', networks.network_GroupNorm_load_state_dict)
self.LayerNorm_forward = patches.patch(__name__, torch.nn.LayerNorm, 'forward', networks.network_LayerNorm_forward)
self.LayerNorm_load_state_dict = patches.patch(__name__, torch.nn.LayerNorm, '_load_from_state_dict', networks.network_LayerNorm_load_state_dict)
self.MultiheadAttention_forward = patches.patch(__name__, torch.nn.MultiheadAttention, 'forward', networks.network_MultiheadAttention_forward)
self.MultiheadAttention_load_state_dict = patches.patch(__name__, torch.nn.MultiheadAttention, '_load_from_state_dict', networks.network_MultiheadAttention_load_state_dict)
pass
def undo(self):
self.Linear_forward = patches.undo(__name__, torch.nn.Linear, 'forward')
self.Linear_load_state_dict = patches.undo(__name__, torch.nn.Linear, '_load_from_state_dict')
self.Conv2d_forward = patches.undo(__name__, torch.nn.Conv2d, 'forward')
self.Conv2d_load_state_dict = patches.undo(__name__, torch.nn.Conv2d, '_load_from_state_dict')
self.GroupNorm_forward = patches.undo(__name__, torch.nn.GroupNorm, 'forward')
self.GroupNorm_load_state_dict = patches.undo(__name__, torch.nn.GroupNorm, '_load_from_state_dict')
self.LayerNorm_forward = patches.undo(__name__, torch.nn.LayerNorm, 'forward')
self.LayerNorm_load_state_dict = patches.undo(__name__, torch.nn.LayerNorm, '_load_from_state_dict')
self.MultiheadAttention_forward = patches.undo(__name__, torch.nn.MultiheadAttention, 'forward')
self.MultiheadAttention_load_state_dict = patches.undo(__name__, torch.nn.MultiheadAttention, '_load_from_state_dict')
pass

View File

@@ -1,68 +0,0 @@
import torch
def make_weight_cp(t, wa, wb):
temp = torch.einsum('i j k l, j r -> i r k l', t, wb)
return torch.einsum('i j k l, i r -> r j k l', temp, wa)
def rebuild_conventional(up, down, shape, dyn_dim=None):
up = up.reshape(up.size(0), -1)
down = down.reshape(down.size(0), -1)
if dyn_dim is not None:
up = up[:, :dyn_dim]
down = down[:dyn_dim, :]
return (up @ down).reshape(shape)
def rebuild_cp_decomposition(up, down, mid):
up = up.reshape(up.size(0), -1)
down = down.reshape(down.size(0), -1)
return torch.einsum('n m k l, i n, m j -> i j k l', mid, up, down)
# copied from https://github.com/KohakuBlueleaf/LyCORIS/blob/dev/lycoris/modules/lokr.py
def factorization(dimension: int, factor:int=-1) -> tuple[int, int]:
'''
return a tuple of two value of input dimension decomposed by the number closest to factor
second value is higher or equal than first value.
In LoRA with Kroneckor Product, first value is a value for weight scale.
secon value is a value for weight.
Becuase of non-commutative property, A⊗B ≠ B⊗A. Meaning of two matrices is slightly different.
examples)
factor
-1 2 4 8 16 ...
127 -> 1, 127 127 -> 1, 127 127 -> 1, 127 127 -> 1, 127 127 -> 1, 127
128 -> 8, 16 128 -> 2, 64 128 -> 4, 32 128 -> 8, 16 128 -> 8, 16
250 -> 10, 25 250 -> 2, 125 250 -> 2, 125 250 -> 5, 50 250 -> 10, 25
360 -> 8, 45 360 -> 2, 180 360 -> 4, 90 360 -> 8, 45 360 -> 12, 30
512 -> 16, 32 512 -> 2, 256 512 -> 4, 128 512 -> 8, 64 512 -> 16, 32
1024 -> 32, 32 1024 -> 2, 512 1024 -> 4, 256 1024 -> 8, 128 1024 -> 16, 64
'''
if factor > 0 and (dimension % factor) == 0:
m = factor
n = dimension // factor
if m > n:
n, m = m, n
return m, n
if factor < 0:
factor = dimension
m, n = 1, dimension
length = m + n
while m<n:
new_m = m + 1
while dimension%new_m != 0:
new_m += 1
new_n = dimension // new_m
if new_m + new_n > length or new_m>factor:
break
else:
m, n = new_m, new_n
if m > n:
n, m = m, n
return m, n

View File

@@ -1,190 +1,190 @@
from __future__ import annotations
import os
from collections import namedtuple
import enum
import torch.nn as nn
import torch.nn.functional as F
from modules import sd_models, cache, errors, hashes, shared
NetworkWeights = namedtuple('NetworkWeights', ['network_key', 'sd_key', 'w', 'sd_module'])
metadata_tags_order = {"ss_sd_model_name": 1, "ss_resolution": 2, "ss_clip_skip": 3, "ss_num_train_images": 10, "ss_tag_frequency": 20}
class SdVersion(enum.Enum):
Unknown = 1
SD1 = 2
SD2 = 3
SDXL = 4
class NetworkOnDisk:
def __init__(self, name, filename):
self.name = name
self.filename = filename
self.metadata = {}
self.is_safetensors = os.path.splitext(filename)[1].lower() == ".safetensors"
def read_metadata():
metadata = sd_models.read_metadata_from_safetensors(filename)
metadata.pop('ssmd_cover_images', None) # those are cover images, and they are too big to display in UI as text
return metadata
if self.is_safetensors:
try:
self.metadata = cache.cached_data_for_file('safetensors-metadata', "lora/" + self.name, filename, read_metadata)
except Exception as e:
errors.display(e, f"reading lora {filename}")
if self.metadata:
m = {}
for k, v in sorted(self.metadata.items(), key=lambda x: metadata_tags_order.get(x[0], 999)):
m[k] = v
self.metadata = m
self.alias = self.metadata.get('ss_output_name', self.name)
self.hash = None
self.shorthash = None
self.set_hash(
self.metadata.get('sshs_model_hash') or
hashes.sha256_from_cache(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or
''
)
self.sd_version = self.detect_version()
def detect_version(self):
if str(self.metadata.get('ss_base_model_version', "")).startswith("sdxl_"):
return SdVersion.SDXL
elif str(self.metadata.get('ss_v2', "")) == "True":
return SdVersion.SD2
elif len(self.metadata):
return SdVersion.SD1
return SdVersion.Unknown
def set_hash(self, v):
self.hash = v
self.shorthash = self.hash[0:12]
if self.shorthash:
import networks
networks.available_network_hash_lookup[self.shorthash] = self
def read_hash(self):
if not self.hash:
self.set_hash(hashes.sha256(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or '')
def get_alias(self):
import networks
if shared.opts.lora_preferred_name == "Filename" or self.alias.lower() in networks.forbidden_network_aliases:
return self.name
else:
return self.alias
class Network: # LoraModule
def __init__(self, name, network_on_disk: NetworkOnDisk):
self.name = name
self.network_on_disk = network_on_disk
self.te_multiplier = 1.0
self.unet_multiplier = 1.0
self.dyn_dim = None
self.modules = {}
self.bundle_embeddings = {}
self.mtime = None
self.mentioned_name = None
"""the text that was used to add the network to prompt - can be either name or an alias"""
class ModuleType:
def create_module(self, net: Network, weights: NetworkWeights) -> Network | None:
return None
class NetworkModule:
def __init__(self, net: Network, weights: NetworkWeights):
self.network = net
self.network_key = weights.network_key
self.sd_key = weights.sd_key
self.sd_module = weights.sd_module
if hasattr(self.sd_module, 'weight'):
self.shape = self.sd_module.weight.shape
self.ops = None
self.extra_kwargs = {}
if isinstance(self.sd_module, nn.Conv2d):
self.ops = F.conv2d
self.extra_kwargs = {
'stride': self.sd_module.stride,
'padding': self.sd_module.padding
}
elif isinstance(self.sd_module, nn.Linear):
self.ops = F.linear
elif isinstance(self.sd_module, nn.LayerNorm):
self.ops = F.layer_norm
self.extra_kwargs = {
'normalized_shape': self.sd_module.normalized_shape,
'eps': self.sd_module.eps
}
elif isinstance(self.sd_module, nn.GroupNorm):
self.ops = F.group_norm
self.extra_kwargs = {
'num_groups': self.sd_module.num_groups,
'eps': self.sd_module.eps
}
self.dim = None
self.bias = weights.w.get("bias")
self.alpha = weights.w["alpha"].item() if "alpha" in weights.w else None
self.scale = weights.w["scale"].item() if "scale" in weights.w else None
def multiplier(self):
if 'transformer' in self.sd_key[:20]:
return self.network.te_multiplier
else:
return self.network.unet_multiplier
def calc_scale(self):
if self.scale is not None:
return self.scale
if self.dim is not None and self.alpha is not None:
return self.alpha / self.dim
return 1.0
def finalize_updown(self, updown, orig_weight, output_shape, ex_bias=None):
if self.bias is not None:
updown = updown.reshape(self.bias.shape)
updown += self.bias.to(orig_weight.device, dtype=updown.dtype)
updown = updown.reshape(output_shape)
if len(output_shape) == 4:
updown = updown.reshape(output_shape)
if orig_weight.size().numel() == updown.size().numel():
updown = updown.reshape(orig_weight.shape)
if ex_bias is not None:
ex_bias = ex_bias * self.multiplier()
return updown * self.calc_scale() * self.multiplier(), ex_bias
def calc_updown(self, target):
raise NotImplementedError()
def forward(self, x, y):
"""A general forward implementation for all modules"""
if self.ops is None:
raise NotImplementedError()
else:
updown, ex_bias = self.calc_updown(self.sd_module.weight)
return y + self.ops(x, weight=updown, bias=ex_bias, **self.extra_kwargs)
from __future__ import annotations
import os
from collections import namedtuple
import enum
import torch.nn as nn
import torch.nn.functional as F
from modules import sd_models, cache, errors, hashes, shared
NetworkWeights = namedtuple('NetworkWeights', ['network_key', 'sd_key', 'w', 'sd_module'])
metadata_tags_order = {"ss_sd_model_name": 1, "ss_resolution": 2, "ss_clip_skip": 3, "ss_num_train_images": 10, "ss_tag_frequency": 20}
class SdVersion(enum.Enum):
Unknown = 1
SD1 = 2
SD2 = 3
SDXL = 4
class NetworkOnDisk:
def __init__(self, name, filename):
self.name = name
self.filename = filename
self.metadata = {}
self.is_safetensors = os.path.splitext(filename)[1].lower() == ".safetensors"
def read_metadata():
metadata = sd_models.read_metadata_from_safetensors(filename)
metadata.pop('ssmd_cover_images', None) # those are cover images, and they are too big to display in UI as text
return metadata
if self.is_safetensors:
try:
self.metadata = cache.cached_data_for_file('safetensors-metadata', "lora/" + self.name, filename, read_metadata)
except Exception as e:
errors.display(e, f"reading lora {filename}")
if self.metadata:
m = {}
for k, v in sorted(self.metadata.items(), key=lambda x: metadata_tags_order.get(x[0], 999)):
m[k] = v
self.metadata = m
self.alias = self.metadata.get('ss_output_name', self.name)
self.hash = None
self.shorthash = None
self.set_hash(
self.metadata.get('sshs_model_hash') or
hashes.sha256_from_cache(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or
''
)
self.sd_version = self.detect_version()
def detect_version(self):
if str(self.metadata.get('ss_base_model_version', "")).startswith("sdxl_"):
return SdVersion.SDXL
elif str(self.metadata.get('ss_v2', "")) == "True":
return SdVersion.SD2
elif len(self.metadata):
return SdVersion.SD1
return SdVersion.Unknown
def set_hash(self, v):
self.hash = v
self.shorthash = self.hash[0:12]
if self.shorthash:
import networks
networks.available_network_hash_lookup[self.shorthash] = self
def read_hash(self):
if not self.hash:
self.set_hash(hashes.sha256(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or '')
def get_alias(self):
import networks
if shared.opts.lora_preferred_name == "Filename" or self.alias.lower() in networks.forbidden_network_aliases:
return self.name
else:
return self.alias
class Network: # LoraModule
def __init__(self, name, network_on_disk: NetworkOnDisk):
self.name = name
self.network_on_disk = network_on_disk
self.te_multiplier = 1.0
self.unet_multiplier = 1.0
self.dyn_dim = None
self.modules = {}
self.bundle_embeddings = {}
self.mtime = None
self.mentioned_name = None
"""the text that was used to add the network to prompt - can be either name or an alias"""
class ModuleType:
def create_module(self, net: Network, weights: NetworkWeights) -> Network | None:
return None
class NetworkModule:
def __init__(self, net: Network, weights: NetworkWeights):
self.network = net
self.network_key = weights.network_key
self.sd_key = weights.sd_key
self.sd_module = weights.sd_module
if hasattr(self.sd_module, 'weight'):
self.shape = self.sd_module.weight.shape
self.ops = None
self.extra_kwargs = {}
if isinstance(self.sd_module, nn.Conv2d):
self.ops = F.conv2d
self.extra_kwargs = {
'stride': self.sd_module.stride,
'padding': self.sd_module.padding
}
elif isinstance(self.sd_module, nn.Linear):
self.ops = F.linear
elif isinstance(self.sd_module, nn.LayerNorm):
self.ops = F.layer_norm
self.extra_kwargs = {
'normalized_shape': self.sd_module.normalized_shape,
'eps': self.sd_module.eps
}
elif isinstance(self.sd_module, nn.GroupNorm):
self.ops = F.group_norm
self.extra_kwargs = {
'num_groups': self.sd_module.num_groups,
'eps': self.sd_module.eps
}
self.dim = None
self.bias = weights.w.get("bias")
self.alpha = weights.w["alpha"].item() if "alpha" in weights.w else None
self.scale = weights.w["scale"].item() if "scale" in weights.w else None
def multiplier(self):
if 'transformer' in self.sd_key[:20]:
return self.network.te_multiplier
else:
return self.network.unet_multiplier
def calc_scale(self):
if self.scale is not None:
return self.scale
if self.dim is not None and self.alpha is not None:
return self.alpha / self.dim
return 1.0
def finalize_updown(self, updown, orig_weight, output_shape, ex_bias=None):
if self.bias is not None:
updown = updown.reshape(self.bias.shape)
updown += self.bias.to(orig_weight.device, dtype=updown.dtype)
updown = updown.reshape(output_shape)
if len(output_shape) == 4:
updown = updown.reshape(output_shape)
if orig_weight.size().numel() == updown.size().numel():
updown = updown.reshape(orig_weight.shape)
if ex_bias is not None:
ex_bias = ex_bias * self.multiplier()
return updown * self.calc_scale() * self.multiplier(), ex_bias
def calc_updown(self, target):
raise NotImplementedError()
def forward(self, x, y):
"""A general forward implementation for all modules"""
if self.ops is None:
raise NotImplementedError()
else:
updown, ex_bias = self.calc_updown(self.sd_module.weight)
return y + self.ops(x, weight=updown, bias=ex_bias, **self.extra_kwargs)

View File

@@ -1,27 +0,0 @@
import network
class ModuleTypeFull(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
if all(x in weights.w for x in ["diff"]):
return NetworkModuleFull(net, weights)
return None
class NetworkModuleFull(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
self.weight = weights.w.get("diff")
self.ex_bias = weights.w.get("diff_b")
def calc_updown(self, orig_weight):
output_shape = self.weight.shape
updown = self.weight.to(orig_weight.device)
if self.ex_bias is not None:
ex_bias = self.ex_bias.to(orig_weight.device)
else:
ex_bias = None
return self.finalize_updown(updown, orig_weight, output_shape, ex_bias)

View File

@@ -1,33 +0,0 @@
import network
class ModuleTypeGLora(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
if all(x in weights.w for x in ["a1.weight", "a2.weight", "alpha", "b1.weight", "b2.weight"]):
return NetworkModuleGLora(net, weights)
return None
# adapted from https://github.com/KohakuBlueleaf/LyCORIS
class NetworkModuleGLora(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
if hasattr(self.sd_module, 'weight'):
self.shape = self.sd_module.weight.shape
self.w1a = weights.w["a1.weight"]
self.w1b = weights.w["b1.weight"]
self.w2a = weights.w["a2.weight"]
self.w2b = weights.w["b2.weight"]
def calc_updown(self, orig_weight):
w1a = self.w1a.to(orig_weight.device)
w1b = self.w1b.to(orig_weight.device)
w2a = self.w2a.to(orig_weight.device)
w2b = self.w2b.to(orig_weight.device)
output_shape = [w1a.size(0), w1b.size(1)]
updown = ((w2b @ w1b) + ((orig_weight.to(dtype = w1a.dtype) @ w2a) @ w1a))
return self.finalize_updown(updown, orig_weight, output_shape)

View File

@@ -1,55 +0,0 @@
import lyco_helpers
import network
class ModuleTypeHada(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
if all(x in weights.w for x in ["hada_w1_a", "hada_w1_b", "hada_w2_a", "hada_w2_b"]):
return NetworkModuleHada(net, weights)
return None
class NetworkModuleHada(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
if hasattr(self.sd_module, 'weight'):
self.shape = self.sd_module.weight.shape
self.w1a = weights.w["hada_w1_a"]
self.w1b = weights.w["hada_w1_b"]
self.dim = self.w1b.shape[0]
self.w2a = weights.w["hada_w2_a"]
self.w2b = weights.w["hada_w2_b"]
self.t1 = weights.w.get("hada_t1")
self.t2 = weights.w.get("hada_t2")
def calc_updown(self, orig_weight):
w1a = self.w1a.to(orig_weight.device)
w1b = self.w1b.to(orig_weight.device)
w2a = self.w2a.to(orig_weight.device)
w2b = self.w2b.to(orig_weight.device)
output_shape = [w1a.size(0), w1b.size(1)]
if self.t1 is not None:
output_shape = [w1a.size(1), w1b.size(1)]
t1 = self.t1.to(orig_weight.device)
updown1 = lyco_helpers.make_weight_cp(t1, w1a, w1b)
output_shape += t1.shape[2:]
else:
if len(w1b.shape) == 4:
output_shape += w1b.shape[2:]
updown1 = lyco_helpers.rebuild_conventional(w1a, w1b, output_shape)
if self.t2 is not None:
t2 = self.t2.to(orig_weight.device)
updown2 = lyco_helpers.make_weight_cp(t2, w2a, w2b)
else:
updown2 = lyco_helpers.rebuild_conventional(w2a, w2b, output_shape)
updown = updown1 * updown2
return self.finalize_updown(updown, orig_weight, output_shape)

View File

@@ -1,30 +0,0 @@
import network
class ModuleTypeIa3(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
if all(x in weights.w for x in ["weight"]):
return NetworkModuleIa3(net, weights)
return None
class NetworkModuleIa3(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
self.w = weights.w["weight"]
self.on_input = weights.w["on_input"].item()
def calc_updown(self, orig_weight):
w = self.w.to(orig_weight.device)
output_shape = [w.size(0), orig_weight.size(1)]
if self.on_input:
output_shape.reverse()
else:
w = w.reshape(-1, 1)
updown = orig_weight * w
return self.finalize_updown(updown, orig_weight, output_shape)

View File

@@ -1,64 +0,0 @@
import torch
import lyco_helpers
import network
class ModuleTypeLokr(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
has_1 = "lokr_w1" in weights.w or ("lokr_w1_a" in weights.w and "lokr_w1_b" in weights.w)
has_2 = "lokr_w2" in weights.w or ("lokr_w2_a" in weights.w and "lokr_w2_b" in weights.w)
if has_1 and has_2:
return NetworkModuleLokr(net, weights)
return None
def make_kron(orig_shape, w1, w2):
if len(w2.shape) == 4:
w1 = w1.unsqueeze(2).unsqueeze(2)
w2 = w2.contiguous()
return torch.kron(w1, w2).reshape(orig_shape)
class NetworkModuleLokr(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
self.w1 = weights.w.get("lokr_w1")
self.w1a = weights.w.get("lokr_w1_a")
self.w1b = weights.w.get("lokr_w1_b")
self.dim = self.w1b.shape[0] if self.w1b is not None else self.dim
self.w2 = weights.w.get("lokr_w2")
self.w2a = weights.w.get("lokr_w2_a")
self.w2b = weights.w.get("lokr_w2_b")
self.dim = self.w2b.shape[0] if self.w2b is not None else self.dim
self.t2 = weights.w.get("lokr_t2")
def calc_updown(self, orig_weight):
if self.w1 is not None:
w1 = self.w1.to(orig_weight.device)
else:
w1a = self.w1a.to(orig_weight.device)
w1b = self.w1b.to(orig_weight.device)
w1 = w1a @ w1b
if self.w2 is not None:
w2 = self.w2.to(orig_weight.device)
elif self.t2 is None:
w2a = self.w2a.to(orig_weight.device)
w2b = self.w2b.to(orig_weight.device)
w2 = w2a @ w2b
else:
t2 = self.t2.to(orig_weight.device)
w2a = self.w2a.to(orig_weight.device)
w2b = self.w2b.to(orig_weight.device)
w2 = lyco_helpers.make_weight_cp(t2, w2a, w2b)
output_shape = [w1.size(0) * w2.size(0), w1.size(1) * w2.size(1)]
if len(orig_weight.shape) == 4:
output_shape = orig_weight.shape
updown = make_kron(output_shape, w1, w2)
return self.finalize_updown(updown, orig_weight, output_shape)

View File

@@ -1,86 +0,0 @@
import torch
import lyco_helpers
import network
from modules import devices
class ModuleTypeLora(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
if all(x in weights.w for x in ["lora_up.weight", "lora_down.weight"]):
return NetworkModuleLora(net, weights)
return None
class NetworkModuleLora(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
self.up_model = self.create_module(weights.w, "lora_up.weight")
self.down_model = self.create_module(weights.w, "lora_down.weight")
self.mid_model = self.create_module(weights.w, "lora_mid.weight", none_ok=True)
self.dim = weights.w["lora_down.weight"].shape[0]
def create_module(self, weights, key, none_ok=False):
weight = weights.get(key)
if weight is None and none_ok:
return None
is_linear = type(self.sd_module) in [torch.nn.Linear, torch.nn.modules.linear.NonDynamicallyQuantizableLinear, torch.nn.MultiheadAttention]
is_conv = type(self.sd_module) in [torch.nn.Conv2d]
if is_linear:
weight = weight.reshape(weight.shape[0], -1)
module = torch.nn.Linear(weight.shape[1], weight.shape[0], bias=False)
elif is_conv and key == "lora_down.weight" or key == "dyn_up":
if len(weight.shape) == 2:
weight = weight.reshape(weight.shape[0], -1, 1, 1)
if weight.shape[2] != 1 or weight.shape[3] != 1:
module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], self.sd_module.kernel_size, self.sd_module.stride, self.sd_module.padding, bias=False)
else:
module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], (1, 1), bias=False)
elif is_conv and key == "lora_mid.weight":
module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], self.sd_module.kernel_size, self.sd_module.stride, self.sd_module.padding, bias=False)
elif is_conv and key == "lora_up.weight" or key == "dyn_down":
module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], (1, 1), bias=False)
else:
raise AssertionError(f'Lora layer {self.network_key} matched a layer with unsupported type: {type(self.sd_module).__name__}')
with torch.no_grad():
if weight.shape != module.weight.shape:
weight = weight.reshape(module.weight.shape)
module.weight.copy_(weight)
module.to(device=devices.cpu, dtype=devices.dtype)
module.weight.requires_grad_(False)
return module
def calc_updown(self, orig_weight):
up = self.up_model.weight.to(orig_weight.device)
down = self.down_model.weight.to(orig_weight.device)
output_shape = [up.size(0), down.size(1)]
if self.mid_model is not None:
# cp-decomposition
mid = self.mid_model.weight.to(orig_weight.device)
updown = lyco_helpers.rebuild_cp_decomposition(up, down, mid)
output_shape += mid.shape[2:]
else:
if len(down.shape) == 4:
output_shape += down.shape[2:]
updown = lyco_helpers.rebuild_conventional(up, down, output_shape, self.network.dyn_dim)
return self.finalize_updown(updown, orig_weight, output_shape)
def forward(self, x, y):
self.up_model.to(device=devices.device)
self.down_model.to(device=devices.device)
return y + self.up_model(self.down_model(x)) * self.multiplier() * self.calc_scale()

View File

@@ -1,28 +0,0 @@
import network
class ModuleTypeNorm(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
if all(x in weights.w for x in ["w_norm", "b_norm"]):
return NetworkModuleNorm(net, weights)
return None
class NetworkModuleNorm(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
self.w_norm = weights.w.get("w_norm")
self.b_norm = weights.w.get("b_norm")
def calc_updown(self, orig_weight):
output_shape = self.w_norm.shape
updown = self.w_norm.to(orig_weight.device)
if self.b_norm is not None:
ex_bias = self.b_norm.to(orig_weight.device)
else:
ex_bias = None
return self.finalize_updown(updown, orig_weight, output_shape, ex_bias)

View File

@@ -1,82 +0,0 @@
import torch
import network
from lyco_helpers import factorization
from einops import rearrange
class ModuleTypeOFT(network.ModuleType):
def create_module(self, net: network.Network, weights: network.NetworkWeights):
if all(x in weights.w for x in ["oft_blocks"]) or all(x in weights.w for x in ["oft_diag"]):
return NetworkModuleOFT(net, weights)
return None
# Supports both kohya-ss' implementation of COFT https://github.com/kohya-ss/sd-scripts/blob/main/networks/oft.py
# and KohakuBlueleaf's implementation of OFT/COFT https://github.com/KohakuBlueleaf/LyCORIS/blob/dev/lycoris/modules/diag_oft.py
class NetworkModuleOFT(network.NetworkModule):
def __init__(self, net: network.Network, weights: network.NetworkWeights):
super().__init__(net, weights)
self.lin_module = None
self.org_module: list[torch.Module] = [self.sd_module]
self.scale = 1.0
# kohya-ss
if "oft_blocks" in weights.w.keys():
self.is_kohya = True
self.oft_blocks = weights.w["oft_blocks"] # (num_blocks, block_size, block_size)
self.alpha = weights.w["alpha"] # alpha is constraint
self.dim = self.oft_blocks.shape[0] # lora dim
# LyCORIS
elif "oft_diag" in weights.w.keys():
self.is_kohya = False
self.oft_blocks = weights.w["oft_diag"]
# self.alpha is unused
self.dim = self.oft_blocks.shape[1] # (num_blocks, block_size, block_size)
is_linear = type(self.sd_module) in [torch.nn.Linear, torch.nn.modules.linear.NonDynamicallyQuantizableLinear]
is_conv = type(self.sd_module) in [torch.nn.Conv2d]
is_other_linear = type(self.sd_module) in [torch.nn.MultiheadAttention] # unsupported
if is_linear:
self.out_dim = self.sd_module.out_features
elif is_conv:
self.out_dim = self.sd_module.out_channels
elif is_other_linear:
self.out_dim = self.sd_module.embed_dim
if self.is_kohya:
self.constraint = self.alpha * self.out_dim
self.num_blocks = self.dim
self.block_size = self.out_dim // self.dim
else:
self.constraint = None
self.block_size, self.num_blocks = factorization(self.out_dim, self.dim)
def calc_updown(self, orig_weight):
oft_blocks = self.oft_blocks.to(orig_weight.device)
eye = torch.eye(self.block_size, device=oft_blocks.device)
if self.is_kohya:
block_Q = oft_blocks - oft_blocks.transpose(1, 2) # ensure skew-symmetric orthogonal matrix
norm_Q = torch.norm(block_Q.flatten())
new_norm_Q = torch.clamp(norm_Q, max=self.constraint.to(oft_blocks.device))
block_Q = block_Q * ((new_norm_Q + 1e-8) / (norm_Q + 1e-8))
oft_blocks = torch.matmul(eye + block_Q, (eye - block_Q).float().inverse())
R = oft_blocks.to(orig_weight.device)
# This errors out for MultiheadAttention, might need to be handled up-stream
merged_weight = rearrange(orig_weight, '(k n) ... -> k n ...', k=self.num_blocks, n=self.block_size)
merged_weight = torch.einsum(
'k n m, k n ... -> k m ...',
R,
merged_weight
)
merged_weight = rearrange(merged_weight, 'k m ... -> (k m) ...')
updown = merged_weight.to(orig_weight.device) - orig_weight.to(merged_weight.dtype)
output_shape = orig_weight.shape
return self.finalize_updown(updown, orig_weight, output_shape)

View File

@@ -1,566 +1,127 @@
import gradio as gr
import logging
import os
import re
import lora_patches
import functools
import network
import network_lora
import network_glora
import network_hada
import network_ia3
import network_lokr
import network_full
import network_norm
import network_oft
import torch
from typing import Union
from modules import shared, devices, sd_models, errors, scripts, sd_hijack
import modules.textual_inversion.textual_inversion as textual_inversion
from lora_logger import logger
module_types = [
network_lora.ModuleTypeLora(),
network_hada.ModuleTypeHada(),
network_ia3.ModuleTypeIa3(),
network_lokr.ModuleTypeLokr(),
network_full.ModuleTypeFull(),
network_norm.ModuleTypeNorm(),
network_glora.ModuleTypeGLora(),
network_oft.ModuleTypeOFT(),
]
from modules import shared, sd_models, errors, scripts
from ldm_patched.modules.utils import load_torch_file
from ldm_patched.modules.sd import load_lora_for_models
re_digits = re.compile(r"\d+")
re_x_proj = re.compile(r"(.*)_([qkv]_proj)$")
re_compiled = {}
suffix_conversion = {
"attentions": {},
"resnets": {
"conv1": "in_layers_2",
"conv2": "out_layers_3",
"norm1": "in_layers_0",
"norm2": "out_layers_0",
"time_emb_proj": "emb_layers_1",
"conv_shortcut": "skip_connection",
}
}
@functools.lru_cache(maxsize=5)
def load_lora_state_dict(filename):
return load_torch_file(filename, safe_load=True)
def convert_diffusers_name_to_compvis(key, is_sd2):
def match(match_list, regex_text):
regex = re_compiled.get(regex_text)
if regex is None:
regex = re.compile(regex_text)
re_compiled[regex_text] = regex
r = re.match(regex, key)
if not r:
return False
match_list.clear()
match_list.extend([int(x) if re.match(re_digits, x) else x for x in r.groups()])
return True
m = []
if match(m, r"lora_unet_conv_in(.*)"):
return f'diffusion_model_input_blocks_0_0{m[0]}'
if match(m, r"lora_unet_conv_out(.*)"):
return f'diffusion_model_out_2{m[0]}'
if match(m, r"lora_unet_time_embedding_linear_(\d+)(.*)"):
return f"diffusion_model_time_embed_{m[0] * 2 - 2}{m[1]}"
if match(m, r"lora_unet_down_blocks_(\d+)_(attentions|resnets)_(\d+)_(.+)"):
suffix = suffix_conversion.get(m[1], {}).get(m[3], m[3])
return f"diffusion_model_input_blocks_{1 + m[0] * 3 + m[2]}_{1 if m[1] == 'attentions' else 0}_{suffix}"
if match(m, r"lora_unet_mid_block_(attentions|resnets)_(\d+)_(.+)"):
suffix = suffix_conversion.get(m[0], {}).get(m[2], m[2])
return f"diffusion_model_middle_block_{1 if m[0] == 'attentions' else m[1] * 2}_{suffix}"
if match(m, r"lora_unet_up_blocks_(\d+)_(attentions|resnets)_(\d+)_(.+)"):
suffix = suffix_conversion.get(m[1], {}).get(m[3], m[3])
return f"diffusion_model_output_blocks_{m[0] * 3 + m[2]}_{1 if m[1] == 'attentions' else 0}_{suffix}"
if match(m, r"lora_unet_down_blocks_(\d+)_downsamplers_0_conv"):
return f"diffusion_model_input_blocks_{3 + m[0] * 3}_0_op"
if match(m, r"lora_unet_up_blocks_(\d+)_upsamplers_0_conv"):
return f"diffusion_model_output_blocks_{2 + m[0] * 3}_{2 if m[0]>0 else 1}_conv"
if match(m, r"lora_te_text_model_encoder_layers_(\d+)_(.+)"):
if is_sd2:
if 'mlp_fc1' in m[1]:
return f"model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc1', 'mlp_c_fc')}"
elif 'mlp_fc2' in m[1]:
return f"model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc2', 'mlp_c_proj')}"
else:
return f"model_transformer_resblocks_{m[0]}_{m[1].replace('self_attn', 'attn')}"
return f"transformer_text_model_encoder_layers_{m[0]}_{m[1]}"
if match(m, r"lora_te2_text_model_encoder_layers_(\d+)_(.+)"):
if 'mlp_fc1' in m[1]:
return f"1_model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc1', 'mlp_c_fc')}"
elif 'mlp_fc2' in m[1]:
return f"1_model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc2', 'mlp_c_proj')}"
else:
return f"1_model_transformer_resblocks_{m[0]}_{m[1].replace('self_attn', 'attn')}"
return key
pass
def assign_network_names_to_compvis_modules(sd_model):
network_layer_mapping = {}
if shared.sd_model.is_sdxl:
for i, embedder in enumerate(shared.sd_model.conditioner.embedders):
if not hasattr(embedder, 'wrapped'):
continue
for name, module in embedder.wrapped.named_modules():
network_name = f'{i}_{name.replace(".", "_")}'
network_layer_mapping[network_name] = module
module.network_layer_name = network_name
else:
for name, module in shared.sd_model.cond_stage_model.wrapped.named_modules():
network_name = name.replace(".", "_")
network_layer_mapping[network_name] = module
module.network_layer_name = network_name
for name, module in shared.sd_model.model.named_modules():
network_name = name.replace(".", "_")
network_layer_mapping[network_name] = module
module.network_layer_name = network_name
sd_model.network_layer_mapping = network_layer_mapping
pass
def load_network(name, network_on_disk):
net = network.Network(name, network_on_disk)
net.mtime = os.path.getmtime(network_on_disk.filename)
sd = sd_models.read_state_dict(network_on_disk.filename)
# this should not be needed but is here as an emergency fix for an unknown error people are experiencing in 1.2.0
if not hasattr(shared.sd_model, 'network_layer_mapping'):
assign_network_names_to_compvis_modules(shared.sd_model)
keys_failed_to_match = {}
is_sd2 = 'model_transformer_resblocks' in shared.sd_model.network_layer_mapping
matched_networks = {}
bundle_embeddings = {}
for key_network, weight in sd.items():
key_network_without_network_parts, _, network_part = key_network.partition(".")
if key_network_without_network_parts == "bundle_emb":
emb_name, vec_name = network_part.split(".", 1)
emb_dict = bundle_embeddings.get(emb_name, {})
if vec_name.split('.')[0] == 'string_to_param':
_, k2 = vec_name.split('.', 1)
emb_dict['string_to_param'] = {k2: weight}
else:
emb_dict[vec_name] = weight
bundle_embeddings[emb_name] = emb_dict
key = convert_diffusers_name_to_compvis(key_network_without_network_parts, is_sd2)
sd_module = shared.sd_model.network_layer_mapping.get(key, None)
if sd_module is None:
m = re_x_proj.match(key)
if m:
sd_module = shared.sd_model.network_layer_mapping.get(m.group(1), None)
# SDXL loras seem to already have correct compvis keys, so only need to replace "lora_unet" with "diffusion_model"
if sd_module is None and "lora_unet" in key_network_without_network_parts:
key = key_network_without_network_parts.replace("lora_unet", "diffusion_model")
sd_module = shared.sd_model.network_layer_mapping.get(key, None)
elif sd_module is None and "lora_te1_text_model" in key_network_without_network_parts:
key = key_network_without_network_parts.replace("lora_te1_text_model", "0_transformer_text_model")
sd_module = shared.sd_model.network_layer_mapping.get(key, None)
# some SD1 Loras also have correct compvis keys
if sd_module is None:
key = key_network_without_network_parts.replace("lora_te1_text_model", "transformer_text_model")
sd_module = shared.sd_model.network_layer_mapping.get(key, None)
# kohya_ss OFT module
elif sd_module is None and "oft_unet" in key_network_without_network_parts:
key = key_network_without_network_parts.replace("oft_unet", "diffusion_model")
sd_module = shared.sd_model.network_layer_mapping.get(key, None)
# KohakuBlueLeaf OFT module
if sd_module is None and "oft_diag" in key:
key = key_network_without_network_parts.replace("lora_unet", "diffusion_model")
key = key_network_without_network_parts.replace("lora_te1_text_model", "0_transformer_text_model")
sd_module = shared.sd_model.network_layer_mapping.get(key, None)
if sd_module is None:
keys_failed_to_match[key_network] = key
continue
if key not in matched_networks:
matched_networks[key] = network.NetworkWeights(network_key=key_network, sd_key=key, w={}, sd_module=sd_module)
matched_networks[key].w[network_part] = weight
for key, weights in matched_networks.items():
net_module = None
for nettype in module_types:
net_module = nettype.create_module(net, weights)
if net_module is not None:
break
if net_module is None:
raise AssertionError(f"Could not find a module type (out of {', '.join([x.__class__.__name__ for x in module_types])}) that would accept those keys: {', '.join(weights.w)}")
net.modules[key] = net_module
embeddings = {}
for emb_name, data in bundle_embeddings.items():
embedding = textual_inversion.create_embedding_from_data(data, emb_name, filename=network_on_disk.filename + "/" + emb_name)
embedding.loaded = None
embeddings[emb_name] = embedding
net.bundle_embeddings = embeddings
if keys_failed_to_match:
logging.debug(f"Network {network_on_disk.filename} didn't match keys: {keys_failed_to_match}")
return net
pass
def purge_networks_from_memory():
while len(networks_in_memory) > shared.opts.lora_in_memory_limit and len(networks_in_memory) > 0:
name = next(iter(networks_in_memory))
networks_in_memory.pop(name, None)
devices.torch_gc()
pass
def load_networks(names, te_multipliers=None, unet_multipliers=None, dyn_dims=None):
emb_db = sd_hijack.model_hijack.embedding_db
already_loaded = {}
global lora_state_dict_cache
for net in loaded_networks:
if net.name in names:
already_loaded[net.name] = net
for emb_name, embedding in net.bundle_embeddings.items():
if embedding.loaded:
emb_db.register_embedding_by_name(None, shared.sd_model, emb_name)
loaded_networks.clear()
current_sd = sd_models.model_data.get_sd_model()
if current_sd is None:
return
networks_on_disk = [available_networks.get(name, None) if name.lower() in forbidden_network_aliases else available_network_aliases.get(name, None) for name in names]
if any(x is None for x in networks_on_disk):
list_available_networks()
networks_on_disk = [available_networks.get(name, None) if name.lower() in forbidden_network_aliases else available_network_aliases.get(name, None) for name in names]
failed_to_load_networks = []
compiled_lora_targets = []
for a, b, c in zip(networks_on_disk, unet_multipliers, te_multipliers):
compiled_lora_targets.append([a.filename, b, c])
for i, (network_on_disk, name) in enumerate(zip(networks_on_disk, names)):
net = already_loaded.get(name, None)
compiled_lora_targets_hash = str(compiled_lora_targets)
if network_on_disk is not None:
if net is None:
net = networks_in_memory.get(name)
if current_sd.current_lora_hash == compiled_lora_targets_hash:
return
if net is None or os.path.getmtime(network_on_disk.filename) > net.mtime:
try:
net = load_network(name, network_on_disk)
current_sd.current_lora_hash = compiled_lora_targets_hash
current_sd.forge_objects.unet = current_sd.forge_objects_original.unet
current_sd.forge_objects.clip = current_sd.forge_objects_original.clip
networks_in_memory.pop(name, None)
networks_in_memory[name] = net
except Exception as e:
errors.display(e, f"loading network {network_on_disk.filename}")
continue
for filename, strength_model, strength_clip in compiled_lora_targets:
lora_sd = load_lora_state_dict(filename)
current_sd.forge_objects.unet, current_sd.forge_objects.clip = load_lora_for_models(
current_sd.forge_objects.unet, current_sd.forge_objects.clip, lora_sd, strength_model, strength_clip)
net.mentioned_name = name
network_on_disk.read_hash()
if net is None:
failed_to_load_networks.append(name)
logging.info(f"Couldn't find network with name {name}")
continue
net.te_multiplier = te_multipliers[i] if te_multipliers else 1.0
net.unet_multiplier = unet_multipliers[i] if unet_multipliers else 1.0
net.dyn_dim = dyn_dims[i] if dyn_dims else 1.0
loaded_networks.append(net)
for emb_name, embedding in net.bundle_embeddings.items():
if embedding.loaded is None and emb_name in emb_db.word_embeddings:
logger.warning(
f'Skip bundle embedding: "{emb_name}"'
' as it was already loaded from embeddings folder'
)
continue
embedding.loaded = False
if emb_db.expected_shape == -1 or emb_db.expected_shape == embedding.shape:
embedding.loaded = True
emb_db.register_embedding(embedding, shared.sd_model)
else:
emb_db.skipped_embeddings[name] = embedding
if failed_to_load_networks:
lora_not_found_message = f'Lora not found: {", ".join(failed_to_load_networks)}'
sd_hijack.model_hijack.comments.append(lora_not_found_message)
if shared.opts.lora_not_found_warning_console:
print(f'\n{lora_not_found_message}\n')
if shared.opts.lora_not_found_gradio_warning:
gr.Warning(lora_not_found_message)
purge_networks_from_memory()
current_sd.forge_objects_after_applying_lora = current_sd.forge_objects.shallow_copy()
return
def network_restore_weights_from_backup(self: Union[torch.nn.Conv2d, torch.nn.Linear, torch.nn.GroupNorm, torch.nn.LayerNorm, torch.nn.MultiheadAttention]):
weights_backup = getattr(self, "network_weights_backup", None)
bias_backup = getattr(self, "network_bias_backup", None)
if weights_backup is None and bias_backup is None:
return
if weights_backup is not None:
if isinstance(self, torch.nn.MultiheadAttention):
self.in_proj_weight.copy_(weights_backup[0])
self.out_proj.weight.copy_(weights_backup[1])
else:
self.weight.copy_(weights_backup)
if bias_backup is not None:
if isinstance(self, torch.nn.MultiheadAttention):
self.out_proj.bias.copy_(bias_backup)
else:
self.bias.copy_(bias_backup)
else:
if isinstance(self, torch.nn.MultiheadAttention):
self.out_proj.bias = None
else:
self.bias = None
pass
def network_apply_weights(self: Union[torch.nn.Conv2d, torch.nn.Linear, torch.nn.GroupNorm, torch.nn.LayerNorm, torch.nn.MultiheadAttention]):
"""
Applies the currently selected set of networks to the weights of torch layer self.
If weights already have this particular set of networks applied, does nothing.
If not, restores orginal weights from backup and alters weights according to networks.
"""
network_layer_name = getattr(self, 'network_layer_name', None)
if network_layer_name is None:
return
current_names = getattr(self, "network_current_names", ())
wanted_names = tuple((x.name, x.te_multiplier, x.unet_multiplier, x.dyn_dim) for x in loaded_networks)
weights_backup = getattr(self, "network_weights_backup", None)
if weights_backup is None and wanted_names != ():
if current_names != ():
raise RuntimeError("no backup weights found and current weights are not unchanged")
if isinstance(self, torch.nn.MultiheadAttention):
weights_backup = (self.in_proj_weight.to(devices.cpu, copy=True), self.out_proj.weight.to(devices.cpu, copy=True))
else:
weights_backup = self.weight.to(devices.cpu, copy=True)
self.network_weights_backup = weights_backup
bias_backup = getattr(self, "network_bias_backup", None)
if bias_backup is None:
if isinstance(self, torch.nn.MultiheadAttention) and self.out_proj.bias is not None:
bias_backup = self.out_proj.bias.to(devices.cpu, copy=True)
elif getattr(self, 'bias', None) is not None:
bias_backup = self.bias.to(devices.cpu, copy=True)
else:
bias_backup = None
self.network_bias_backup = bias_backup
if current_names != wanted_names:
network_restore_weights_from_backup(self)
for net in loaded_networks:
module = net.modules.get(network_layer_name, None)
if module is not None and hasattr(self, 'weight'):
try:
with torch.no_grad():
if getattr(self, 'fp16_weight', None) is None:
weight = self.weight
bias = self.bias
else:
weight = self.fp16_weight.clone().to(self.weight.device)
bias = getattr(self, 'fp16_bias', None)
if bias is not None:
bias = bias.clone().to(self.bias.device)
updown, ex_bias = module.calc_updown(weight)
if len(weight.shape) == 4 and weight.shape[1] == 9:
# inpainting model. zero pad updown to make channel[1] 4 to 9
updown = torch.nn.functional.pad(updown, (0, 0, 0, 0, 0, 5))
self.weight.copy_((weight.to(dtype=updown.dtype) + updown).to(dtype=self.weight.dtype))
if ex_bias is not None and hasattr(self, 'bias'):
if self.bias is None:
self.bias = torch.nn.Parameter(ex_bias).to(self.weight.dtype)
else:
self.bias.copy_((bias + ex_bias).to(dtype=self.bias.dtype))
except RuntimeError as e:
logging.debug(f"Network {net.name} layer {network_layer_name}: {e}")
extra_network_lora.errors[net.name] = extra_network_lora.errors.get(net.name, 0) + 1
continue
module_q = net.modules.get(network_layer_name + "_q_proj", None)
module_k = net.modules.get(network_layer_name + "_k_proj", None)
module_v = net.modules.get(network_layer_name + "_v_proj", None)
module_out = net.modules.get(network_layer_name + "_out_proj", None)
if isinstance(self, torch.nn.MultiheadAttention) and module_q and module_k and module_v and module_out:
try:
with torch.no_grad():
updown_q, _ = module_q.calc_updown(self.in_proj_weight)
updown_k, _ = module_k.calc_updown(self.in_proj_weight)
updown_v, _ = module_v.calc_updown(self.in_proj_weight)
updown_qkv = torch.vstack([updown_q, updown_k, updown_v])
updown_out, ex_bias = module_out.calc_updown(self.out_proj.weight)
self.in_proj_weight += updown_qkv
self.out_proj.weight += updown_out
if ex_bias is not None:
if self.out_proj.bias is None:
self.out_proj.bias = torch.nn.Parameter(ex_bias)
else:
self.out_proj.bias += ex_bias
except RuntimeError as e:
logging.debug(f"Network {net.name} layer {network_layer_name}: {e}")
extra_network_lora.errors[net.name] = extra_network_lora.errors.get(net.name, 0) + 1
continue
if module is None:
continue
logging.debug(f"Network {net.name} layer {network_layer_name}: couldn't find supported operation")
extra_network_lora.errors[net.name] = extra_network_lora.errors.get(net.name, 0) + 1
self.network_current_names = wanted_names
pass
def network_forward(org_module, input, original_forward):
"""
Old way of applying Lora by executing operations during layer's forward.
Stacking many loras this way results in big performance degradation.
"""
if len(loaded_networks) == 0:
return original_forward(org_module, input)
input = devices.cond_cast_unet(input)
network_restore_weights_from_backup(org_module)
network_reset_cached_weight(org_module)
y = original_forward(org_module, input)
network_layer_name = getattr(org_module, 'network_layer_name', None)
for lora in loaded_networks:
module = lora.modules.get(network_layer_name, None)
if module is None:
continue
y = module.forward(input, y)
return y
pass
def network_reset_cached_weight(self: Union[torch.nn.Conv2d, torch.nn.Linear]):
self.network_current_names = ()
self.network_weights_backup = None
self.network_bias_backup = None
pass
def network_Linear_forward(self, input):
if shared.opts.lora_functional:
return network_forward(self, input, originals.Linear_forward)
network_apply_weights(self)
return originals.Linear_forward(self, input)
pass
def network_Linear_load_state_dict(self, *args, **kwargs):
network_reset_cached_weight(self)
return originals.Linear_load_state_dict(self, *args, **kwargs)
pass
def network_Conv2d_forward(self, input):
if shared.opts.lora_functional:
return network_forward(self, input, originals.Conv2d_forward)
network_apply_weights(self)
return originals.Conv2d_forward(self, input)
pass
def network_Conv2d_load_state_dict(self, *args, **kwargs):
network_reset_cached_weight(self)
return originals.Conv2d_load_state_dict(self, *args, **kwargs)
pass
def network_GroupNorm_forward(self, input):
if shared.opts.lora_functional:
return network_forward(self, input, originals.GroupNorm_forward)
network_apply_weights(self)
return originals.GroupNorm_forward(self, input)
pass
def network_GroupNorm_load_state_dict(self, *args, **kwargs):
network_reset_cached_weight(self)
return originals.GroupNorm_load_state_dict(self, *args, **kwargs)
pass
def network_LayerNorm_forward(self, input):
if shared.opts.lora_functional:
return network_forward(self, input, originals.LayerNorm_forward)
network_apply_weights(self)
return originals.LayerNorm_forward(self, input)
pass
def network_LayerNorm_load_state_dict(self, *args, **kwargs):
network_reset_cached_weight(self)
return originals.LayerNorm_load_state_dict(self, *args, **kwargs)
pass
def network_MultiheadAttention_forward(self, *args, **kwargs):
network_apply_weights(self)
return originals.MultiheadAttention_forward(self, *args, **kwargs)
pass
def network_MultiheadAttention_load_state_dict(self, *args, **kwargs):
network_reset_cached_weight(self)
return originals.MultiheadAttention_load_state_dict(self, *args, **kwargs)
pass
def list_available_networks():

View File

@@ -5,7 +5,7 @@ import torch
from PIL import Image
from modules import devices, modelloader, script_callbacks, shared, upscaler_utils
from modules.upscaler import Upscaler, UpscalerData
from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
SWINIR_MODEL_URL = "https://github.com/JingyunLiang/SwinIR/releases/download/v0.0/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth"
@@ -33,6 +33,8 @@ class UpscalerSwinIR(Upscaler):
self.scalers = scalers
def do_upscale(self, img: Image.Image, model_file: str) -> Image.Image:
prepare_free_memory()
current_config = (model_file, shared.opts.SWIN_tile)
if self._cached_model_config == current_config:

View File

@@ -1,351 +0,0 @@
"""
Hypertile module for splitting attention layers in SD-1.5 U-Net and SD-1.5 VAE
Warn: The patch works well only if the input image has a width and height that are multiples of 128
Original author: @tfernd Github: https://github.com/tfernd/HyperTile
"""
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable
from functools import wraps, cache
import math
import torch.nn as nn
import random
from einops import rearrange
@dataclass
class HypertileParams:
depth = 0
layer_name = ""
tile_size: int = 0
swap_size: int = 0
aspect_ratio: float = 1.0
forward = None
enabled = False
# TODO add SD-XL layers
DEPTH_LAYERS = {
0: [
# SD 1.5 U-Net (diffusers)
"down_blocks.0.attentions.0.transformer_blocks.0.attn1",
"down_blocks.0.attentions.1.transformer_blocks.0.attn1",
"up_blocks.3.attentions.0.transformer_blocks.0.attn1",
"up_blocks.3.attentions.1.transformer_blocks.0.attn1",
"up_blocks.3.attentions.2.transformer_blocks.0.attn1",
# SD 1.5 U-Net (ldm)
"input_blocks.1.1.transformer_blocks.0.attn1",
"input_blocks.2.1.transformer_blocks.0.attn1",
"output_blocks.9.1.transformer_blocks.0.attn1",
"output_blocks.10.1.transformer_blocks.0.attn1",
"output_blocks.11.1.transformer_blocks.0.attn1",
# SD 1.5 VAE
"decoder.mid_block.attentions.0",
"decoder.mid.attn_1",
],
1: [
# SD 1.5 U-Net (diffusers)
"down_blocks.1.attentions.0.transformer_blocks.0.attn1",
"down_blocks.1.attentions.1.transformer_blocks.0.attn1",
"up_blocks.2.attentions.0.transformer_blocks.0.attn1",
"up_blocks.2.attentions.1.transformer_blocks.0.attn1",
"up_blocks.2.attentions.2.transformer_blocks.0.attn1",
# SD 1.5 U-Net (ldm)
"input_blocks.4.1.transformer_blocks.0.attn1",
"input_blocks.5.1.transformer_blocks.0.attn1",
"output_blocks.6.1.transformer_blocks.0.attn1",
"output_blocks.7.1.transformer_blocks.0.attn1",
"output_blocks.8.1.transformer_blocks.0.attn1",
],
2: [
# SD 1.5 U-Net (diffusers)
"down_blocks.2.attentions.0.transformer_blocks.0.attn1",
"down_blocks.2.attentions.1.transformer_blocks.0.attn1",
"up_blocks.1.attentions.0.transformer_blocks.0.attn1",
"up_blocks.1.attentions.1.transformer_blocks.0.attn1",
"up_blocks.1.attentions.2.transformer_blocks.0.attn1",
# SD 1.5 U-Net (ldm)
"input_blocks.7.1.transformer_blocks.0.attn1",
"input_blocks.8.1.transformer_blocks.0.attn1",
"output_blocks.3.1.transformer_blocks.0.attn1",
"output_blocks.4.1.transformer_blocks.0.attn1",
"output_blocks.5.1.transformer_blocks.0.attn1",
],
3: [
# SD 1.5 U-Net (diffusers)
"mid_block.attentions.0.transformer_blocks.0.attn1",
# SD 1.5 U-Net (ldm)
"middle_block.1.transformer_blocks.0.attn1",
],
}
# XL layers, thanks for GitHub@gel-crabs for the help
DEPTH_LAYERS_XL = {
0: [
# SD 1.5 U-Net (diffusers)
"down_blocks.0.attentions.0.transformer_blocks.0.attn1",
"down_blocks.0.attentions.1.transformer_blocks.0.attn1",
"up_blocks.3.attentions.0.transformer_blocks.0.attn1",
"up_blocks.3.attentions.1.transformer_blocks.0.attn1",
"up_blocks.3.attentions.2.transformer_blocks.0.attn1",
# SD 1.5 U-Net (ldm)
"input_blocks.4.1.transformer_blocks.0.attn1",
"input_blocks.5.1.transformer_blocks.0.attn1",
"output_blocks.3.1.transformer_blocks.0.attn1",
"output_blocks.4.1.transformer_blocks.0.attn1",
"output_blocks.5.1.transformer_blocks.0.attn1",
# SD 1.5 VAE
"decoder.mid_block.attentions.0",
"decoder.mid.attn_1",
],
1: [
# SD 1.5 U-Net (diffusers)
#"down_blocks.1.attentions.0.transformer_blocks.0.attn1",
#"down_blocks.1.attentions.1.transformer_blocks.0.attn1",
#"up_blocks.2.attentions.0.transformer_blocks.0.attn1",
#"up_blocks.2.attentions.1.transformer_blocks.0.attn1",
#"up_blocks.2.attentions.2.transformer_blocks.0.attn1",
# SD 1.5 U-Net (ldm)
"input_blocks.4.1.transformer_blocks.1.attn1",
"input_blocks.5.1.transformer_blocks.1.attn1",
"output_blocks.3.1.transformer_blocks.1.attn1",
"output_blocks.4.1.transformer_blocks.1.attn1",
"output_blocks.5.1.transformer_blocks.1.attn1",
"input_blocks.7.1.transformer_blocks.0.attn1",
"input_blocks.8.1.transformer_blocks.0.attn1",
"output_blocks.0.1.transformer_blocks.0.attn1",
"output_blocks.1.1.transformer_blocks.0.attn1",
"output_blocks.2.1.transformer_blocks.0.attn1",
"input_blocks.7.1.transformer_blocks.1.attn1",
"input_blocks.8.1.transformer_blocks.1.attn1",
"output_blocks.0.1.transformer_blocks.1.attn1",
"output_blocks.1.1.transformer_blocks.1.attn1",
"output_blocks.2.1.transformer_blocks.1.attn1",
"input_blocks.7.1.transformer_blocks.2.attn1",
"input_blocks.8.1.transformer_blocks.2.attn1",
"output_blocks.0.1.transformer_blocks.2.attn1",
"output_blocks.1.1.transformer_blocks.2.attn1",
"output_blocks.2.1.transformer_blocks.2.attn1",
"input_blocks.7.1.transformer_blocks.3.attn1",
"input_blocks.8.1.transformer_blocks.3.attn1",
"output_blocks.0.1.transformer_blocks.3.attn1",
"output_blocks.1.1.transformer_blocks.3.attn1",
"output_blocks.2.1.transformer_blocks.3.attn1",
"input_blocks.7.1.transformer_blocks.4.attn1",
"input_blocks.8.1.transformer_blocks.4.attn1",
"output_blocks.0.1.transformer_blocks.4.attn1",
"output_blocks.1.1.transformer_blocks.4.attn1",
"output_blocks.2.1.transformer_blocks.4.attn1",
"input_blocks.7.1.transformer_blocks.5.attn1",
"input_blocks.8.1.transformer_blocks.5.attn1",
"output_blocks.0.1.transformer_blocks.5.attn1",
"output_blocks.1.1.transformer_blocks.5.attn1",
"output_blocks.2.1.transformer_blocks.5.attn1",
"input_blocks.7.1.transformer_blocks.6.attn1",
"input_blocks.8.1.transformer_blocks.6.attn1",
"output_blocks.0.1.transformer_blocks.6.attn1",
"output_blocks.1.1.transformer_blocks.6.attn1",
"output_blocks.2.1.transformer_blocks.6.attn1",
"input_blocks.7.1.transformer_blocks.7.attn1",
"input_blocks.8.1.transformer_blocks.7.attn1",
"output_blocks.0.1.transformer_blocks.7.attn1",
"output_blocks.1.1.transformer_blocks.7.attn1",
"output_blocks.2.1.transformer_blocks.7.attn1",
"input_blocks.7.1.transformer_blocks.8.attn1",
"input_blocks.8.1.transformer_blocks.8.attn1",
"output_blocks.0.1.transformer_blocks.8.attn1",
"output_blocks.1.1.transformer_blocks.8.attn1",
"output_blocks.2.1.transformer_blocks.8.attn1",
"input_blocks.7.1.transformer_blocks.9.attn1",
"input_blocks.8.1.transformer_blocks.9.attn1",
"output_blocks.0.1.transformer_blocks.9.attn1",
"output_blocks.1.1.transformer_blocks.9.attn1",
"output_blocks.2.1.transformer_blocks.9.attn1",
],
2: [
# SD 1.5 U-Net (diffusers)
"mid_block.attentions.0.transformer_blocks.0.attn1",
# SD 1.5 U-Net (ldm)
"middle_block.1.transformer_blocks.0.attn1",
"middle_block.1.transformer_blocks.1.attn1",
"middle_block.1.transformer_blocks.2.attn1",
"middle_block.1.transformer_blocks.3.attn1",
"middle_block.1.transformer_blocks.4.attn1",
"middle_block.1.transformer_blocks.5.attn1",
"middle_block.1.transformer_blocks.6.attn1",
"middle_block.1.transformer_blocks.7.attn1",
"middle_block.1.transformer_blocks.8.attn1",
"middle_block.1.transformer_blocks.9.attn1",
],
3 : [] # TODO - separate layers for SD-XL
}
RNG_INSTANCE = random.Random()
@cache
def get_divisors(value: int, min_value: int, /, max_options: int = 1) -> list[int]:
"""
Returns divisors of value that
x * min_value <= value
in big -> small order, amount of divisors is limited by max_options
"""
max_options = max(1, max_options) # at least 1 option should be returned
min_value = min(min_value, value)
divisors = [i for i in range(min_value, value + 1) if value % i == 0] # divisors in small -> big order
ns = [value // i for i in divisors[:max_options]] # has at least 1 element # big -> small order
return ns
def random_divisor(value: int, min_value: int, /, max_options: int = 1) -> int:
"""
Returns a random divisor of value that
x * min_value <= value
if max_options is 1, the behavior is deterministic
"""
ns = get_divisors(value, min_value, max_options=max_options) # get cached divisors
idx = RNG_INSTANCE.randint(0, len(ns) - 1)
return ns[idx]
def set_hypertile_seed(seed: int) -> None:
RNG_INSTANCE.seed(seed)
@cache
def largest_tile_size_available(width: int, height: int) -> int:
"""
Calculates the largest tile size available for a given width and height
Tile size is always a power of 2
"""
gcd = math.gcd(width, height)
largest_tile_size_available = 1
while gcd % (largest_tile_size_available * 2) == 0:
largest_tile_size_available *= 2
return largest_tile_size_available
def iterative_closest_divisors(hw:int, aspect_ratio:float) -> tuple[int, int]:
"""
Finds h and w such that h*w = hw and h/w = aspect_ratio
We check all possible divisors of hw and return the closest to the aspect ratio
"""
divisors = [i for i in range(2, hw + 1) if hw % i == 0] # all divisors of hw
pairs = [(i, hw // i) for i in divisors] # all pairs of divisors of hw
ratios = [w/h for h, w in pairs] # all ratios of pairs of divisors of hw
closest_ratio = min(ratios, key=lambda x: abs(x - aspect_ratio)) # closest ratio to aspect_ratio
closest_pair = pairs[ratios.index(closest_ratio)] # closest pair of divisors to aspect_ratio
return closest_pair
@cache
def find_hw_candidates(hw:int, aspect_ratio:float) -> tuple[int, int]:
"""
Finds h and w such that h*w = hw and h/w = aspect_ratio
"""
h, w = round(math.sqrt(hw * aspect_ratio)), round(math.sqrt(hw / aspect_ratio))
# find h and w such that h*w = hw and h/w = aspect_ratio
if h * w != hw:
w_candidate = hw / h
# check if w is an integer
if not w_candidate.is_integer():
h_candidate = hw / w
# check if h is an integer
if not h_candidate.is_integer():
return iterative_closest_divisors(hw, aspect_ratio)
else:
h = int(h_candidate)
else:
w = int(w_candidate)
return h, w
def self_attn_forward(params: HypertileParams, scale_depth=True) -> Callable:
@wraps(params.forward)
def wrapper(*args, **kwargs):
if not params.enabled:
return params.forward(*args, **kwargs)
latent_tile_size = max(128, params.tile_size) // 8
x = args[0]
# VAE
if x.ndim == 4:
b, c, h, w = x.shape
nh = random_divisor(h, latent_tile_size, params.swap_size)
nw = random_divisor(w, latent_tile_size, params.swap_size)
if nh * nw > 1:
x = rearrange(x, "b c (nh h) (nw w) -> (b nh nw) c h w", nh=nh, nw=nw) # split into nh * nw tiles
out = params.forward(x, *args[1:], **kwargs)
if nh * nw > 1:
out = rearrange(out, "(b nh nw) c h w -> b c (nh h) (nw w)", nh=nh, nw=nw)
# U-Net
else:
hw: int = x.size(1)
h, w = find_hw_candidates(hw, params.aspect_ratio)
assert h * w == hw, f"Invalid aspect ratio {params.aspect_ratio} for input of shape {x.shape}, hw={hw}, h={h}, w={w}"
factor = 2 ** params.depth if scale_depth else 1
nh = random_divisor(h, latent_tile_size * factor, params.swap_size)
nw = random_divisor(w, latent_tile_size * factor, params.swap_size)
if nh * nw > 1:
x = rearrange(x, "b (nh h nw w) c -> (b nh nw) (h w) c", h=h // nh, w=w // nw, nh=nh, nw=nw)
out = params.forward(x, *args[1:], **kwargs)
if nh * nw > 1:
out = rearrange(out, "(b nh nw) hw c -> b nh nw hw c", nh=nh, nw=nw)
out = rearrange(out, "b nh nw (h w) c -> b (nh h nw w) c", h=h // nh, w=w // nw)
return out
return wrapper
def hypertile_hook_model(model: nn.Module, width, height, *, enable=False, tile_size_max=128, swap_size=1, max_depth=3, is_sdxl=False):
hypertile_layers = getattr(model, "__webui_hypertile_layers", None)
if hypertile_layers is None:
if not enable:
return
hypertile_layers = {}
layers = DEPTH_LAYERS_XL if is_sdxl else DEPTH_LAYERS
for depth in range(4):
for layer_name, module in model.named_modules():
if any(layer_name.endswith(try_name) for try_name in layers[depth]):
params = HypertileParams()
module.__webui_hypertile_params = params
params.forward = module.forward
params.depth = depth
params.layer_name = layer_name
module.forward = self_attn_forward(params)
hypertile_layers[layer_name] = 1
model.__webui_hypertile_layers = hypertile_layers
aspect_ratio = width / height
tile_size = min(largest_tile_size_available(width, height), tile_size_max)
for layer_name, module in model.named_modules():
if layer_name in hypertile_layers:
params = module.__webui_hypertile_params
params.tile_size = tile_size
params.swap_size = swap_size
params.aspect_ratio = aspect_ratio
params.enabled = enable and params.depth <= max_depth

View File

@@ -1,109 +0,0 @@
import hypertile
from modules import scripts, script_callbacks, shared
from scripts.hypertile_xyz import add_axis_options
class ScriptHypertile(scripts.Script):
name = "Hypertile"
def title(self):
return self.name
def show(self, is_img2img):
return scripts.AlwaysVisible
def process(self, p, *args):
hypertile.set_hypertile_seed(p.all_seeds[0])
configure_hypertile(p.width, p.height, enable_unet=shared.opts.hypertile_enable_unet)
self.add_infotext(p)
def before_hr(self, p, *args):
enable = shared.opts.hypertile_enable_unet_secondpass or shared.opts.hypertile_enable_unet
# exclusive hypertile seed for the second pass
if enable:
hypertile.set_hypertile_seed(p.all_seeds[0])
configure_hypertile(p.hr_upscale_to_x, p.hr_upscale_to_y, enable_unet=enable)
if enable and not shared.opts.hypertile_enable_unet:
p.extra_generation_params["Hypertile U-Net second pass"] = True
self.add_infotext(p, add_unet_params=True)
def add_infotext(self, p, add_unet_params=False):
def option(name):
value = getattr(shared.opts, name)
default_value = shared.opts.get_default(name)
return None if value == default_value else value
if shared.opts.hypertile_enable_unet:
p.extra_generation_params["Hypertile U-Net"] = True
if shared.opts.hypertile_enable_unet or add_unet_params:
p.extra_generation_params["Hypertile U-Net max depth"] = option('hypertile_max_depth_unet')
p.extra_generation_params["Hypertile U-Net max tile size"] = option('hypertile_max_tile_unet')
p.extra_generation_params["Hypertile U-Net swap size"] = option('hypertile_swap_size_unet')
if shared.opts.hypertile_enable_vae:
p.extra_generation_params["Hypertile VAE"] = True
p.extra_generation_params["Hypertile VAE max depth"] = option('hypertile_max_depth_vae')
p.extra_generation_params["Hypertile VAE max tile size"] = option('hypertile_max_tile_vae')
p.extra_generation_params["Hypertile VAE swap size"] = option('hypertile_swap_size_vae')
def configure_hypertile(width, height, enable_unet=True):
hypertile.hypertile_hook_model(
shared.sd_model.first_stage_model,
width,
height,
swap_size=shared.opts.hypertile_swap_size_vae,
max_depth=shared.opts.hypertile_max_depth_vae,
tile_size_max=shared.opts.hypertile_max_tile_vae,
enable=shared.opts.hypertile_enable_vae,
)
hypertile.hypertile_hook_model(
shared.sd_model.model,
width,
height,
swap_size=shared.opts.hypertile_swap_size_unet,
max_depth=shared.opts.hypertile_max_depth_unet,
tile_size_max=shared.opts.hypertile_max_tile_unet,
enable=enable_unet,
is_sdxl=shared.sd_model.is_sdxl
)
def on_ui_settings():
import gradio as gr
options = {
"hypertile_explanation": shared.OptionHTML("""
<a href='https://github.com/tfernd/HyperTile'>Hypertile</a> optimizes the self-attention layer within U-Net and VAE models,
resulting in a reduction in computation time ranging from 1 to 4 times. The larger the generated image is, the greater the
benefit.
"""),
"hypertile_enable_unet": shared.OptionInfo(False, "Enable Hypertile U-Net", infotext="Hypertile U-Net").info("enables hypertile for all modes, including hires fix second pass; noticeable change in details of the generated picture"),
"hypertile_enable_unet_secondpass": shared.OptionInfo(False, "Enable Hypertile U-Net for hires fix second pass", infotext="Hypertile U-Net second pass").info("enables hypertile just for hires fix second pass - regardless of whether the above setting is enabled"),
"hypertile_max_depth_unet": shared.OptionInfo(3, "Hypertile U-Net max depth", gr.Slider, {"minimum": 0, "maximum": 3, "step": 1}, infotext="Hypertile U-Net max depth").info("larger = more neural network layers affected; minor effect on performance"),
"hypertile_max_tile_unet": shared.OptionInfo(256, "Hypertile U-Net max tile size", gr.Slider, {"minimum": 0, "maximum": 512, "step": 16}, infotext="Hypertile U-Net max tile size").info("larger = worse performance"),
"hypertile_swap_size_unet": shared.OptionInfo(3, "Hypertile U-Net swap size", gr.Slider, {"minimum": 0, "maximum": 64, "step": 1}, infotext="Hypertile U-Net swap size"),
"hypertile_enable_vae": shared.OptionInfo(False, "Enable Hypertile VAE", infotext="Hypertile VAE").info("minimal change in the generated picture"),
"hypertile_max_depth_vae": shared.OptionInfo(3, "Hypertile VAE max depth", gr.Slider, {"minimum": 0, "maximum": 3, "step": 1}, infotext="Hypertile VAE max depth"),
"hypertile_max_tile_vae": shared.OptionInfo(128, "Hypertile VAE max tile size", gr.Slider, {"minimum": 0, "maximum": 512, "step": 16}, infotext="Hypertile VAE max tile size"),
"hypertile_swap_size_vae": shared.OptionInfo(3, "Hypertile VAE swap size ", gr.Slider, {"minimum": 0, "maximum": 64, "step": 1}, infotext="Hypertile VAE swap size"),
}
for name, opt in options.items():
opt.section = ('hypertile', "Hypertile")
shared.opts.add_option(name, opt)
script_callbacks.on_ui_settings(on_ui_settings)
script_callbacks.on_before_ui(add_axis_options)

View File

@@ -1,51 +0,0 @@
from modules import scripts
from modules.shared import opts
xyz_grid = [x for x in scripts.scripts_data if x.script_class.__module__ == "xyz_grid.py"][0].module
def int_applier(value_name:str, min_range:int = -1, max_range:int = -1):
"""
Returns a function that applies the given value to the given value_name in opts.data.
"""
def validate(value_name:str, value:str):
value = int(value)
# validate value
if not min_range == -1:
assert value >= min_range, f"Value {value} for {value_name} must be greater than or equal to {min_range}"
if not max_range == -1:
assert value <= max_range, f"Value {value} for {value_name} must be less than or equal to {max_range}"
def apply_int(p, x, xs):
validate(value_name, x)
opts.data[value_name] = int(x)
return apply_int
def bool_applier(value_name:str):
"""
Returns a function that applies the given value to the given value_name in opts.data.
"""
def validate(value_name:str, value:str):
assert value.lower() in ["true", "false"], f"Value {value} for {value_name} must be either true or false"
def apply_bool(p, x, xs):
validate(value_name, x)
value_boolean = x.lower() == "true"
opts.data[value_name] = value_boolean
return apply_bool
def add_axis_options():
extra_axis_options = [
xyz_grid.AxisOption("[Hypertile] Unet First pass Enabled", str, bool_applier("hypertile_enable_unet"), choices=xyz_grid.boolean_choice(reverse=True)),
xyz_grid.AxisOption("[Hypertile] Unet Second pass Enabled", str, bool_applier("hypertile_enable_unet_secondpass"), choices=xyz_grid.boolean_choice(reverse=True)),
xyz_grid.AxisOption("[Hypertile] Unet Max Depth", int, int_applier("hypertile_max_depth_unet", 0, 3), choices=lambda: [str(x) for x in range(4)]),
xyz_grid.AxisOption("[Hypertile] Unet Max Tile Size", int, int_applier("hypertile_max_tile_unet", 0, 512)),
xyz_grid.AxisOption("[Hypertile] Unet Swap Size", int, int_applier("hypertile_swap_size_unet", 0, 64)),
xyz_grid.AxisOption("[Hypertile] VAE Enabled", str, bool_applier("hypertile_enable_vae"), choices=xyz_grid.boolean_choice(reverse=True)),
xyz_grid.AxisOption("[Hypertile] VAE Max Depth", int, int_applier("hypertile_max_depth_vae", 0, 3), choices=lambda: [str(x) for x in range(4)]),
xyz_grid.AxisOption("[Hypertile] VAE Max Tile Size", int, int_applier("hypertile_max_tile_vae", 0, 512)),
xyz_grid.AxisOption("[Hypertile] VAE Swap Size", int, int_applier("hypertile_swap_size_vae", 0, 64)),
]
set_a = {opt.label for opt in xyz_grid.axis_options}
set_b = {opt.label for opt in extra_axis_options}
if set_a.intersection(set_b):
return
xyz_grid.axis_options.extend(extra_axis_options)

View File

@@ -0,0 +1,674 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<https://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.

View File

@@ -0,0 +1,86 @@
import gradio as gr
from modules import scripts
from ldm_patched.contrib.external_freelunch import FreeU_V2
opFreeU_V2 = FreeU_V2()
# def Fourier_filter(x, threshold, scale):
# x_freq = torch.fft.fftn(x.float(), dim=(-2, -1))
# x_freq = torch.fft.fftshift(x_freq, dim=(-2, -1))
# B, C, H, W = x_freq.shape
# mask = torch.ones((B, C, H, W), device=x.device)
# crow, ccol = H // 2, W //2
# mask[..., crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
# x_freq = x_freq * mask
# x_freq = torch.fft.ifftshift(x_freq, dim=(-2, -1))
# x_filtered = torch.fft.ifftn(x_freq, dim=(-2, -1)).real
# return x_filtered.to(x.dtype)
#
#
# def set_freeu_v2_patch(model, b1, b2, s1, s2):
# model_channels = model.model.model_config.unet_config["model_channels"]
# scale_dict = {model_channels * 4: (b1, s1), model_channels * 2: (b2, s2)}
#
# def output_block_patch(h, hsp, *args, **kwargs):
# scale = scale_dict.get(h.shape[1], None)
# if scale is not None:
# hidden_mean = h.mean(1).unsqueeze(1)
# B = hidden_mean.shape[0]
# hidden_max, _ = torch.max(hidden_mean.view(B, -1), dim=-1, keepdim=True)
# hidden_min, _ = torch.min(hidden_mean.view(B, -1), dim=-1, keepdim=True)
# hidden_mean = (hidden_mean - hidden_min.unsqueeze(2).unsqueeze(3)) / \
# (hidden_max - hidden_min).unsqueeze(2).unsqueeze(3)
# h[:, :h.shape[1] // 2] = h[:, :h.shape[1] // 2] * ((scale[0] - 1) * hidden_mean + 1)
# hsp = Fourier_filter(hsp, threshold=1, scale=scale[1])
# return h, hsp
#
# m = model.clone()
# m.set_model_output_block_patch(output_block_patch)
# return m
class FreeUForForge(scripts.Script):
def title(self):
return "FreeU Integrated"
def show(self, is_img2img):
# make this extension visible in both txt2img and img2img tab.
return scripts.AlwaysVisible
def ui(self, *args, **kwargs):
with gr.Accordion(open=False, label=self.title()):
freeu_enabled = gr.Checkbox(label='Enabled', value=False)
freeu_b1 = gr.Slider(label='B1', minimum=0, maximum=2, step=0.01, value=1.01)
freeu_b2 = gr.Slider(label='B2', minimum=0, maximum=2, step=0.01, value=1.02)
freeu_s1 = gr.Slider(label='S1', minimum=0, maximum=4, step=0.01, value=0.99)
freeu_s2 = gr.Slider(label='S2', minimum=0, maximum=4, step=0.01, value=0.95)
return freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2
def process_batch(self, p, *script_args, **kwargs):
freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2 = script_args
if not freeu_enabled:
return
unet = p.sd_model.forge_objects.unet
# unet = set_freeu_v2_patch(unet, freeu_b1, freeu_b2, freeu_s1, freeu_s2)
unet = opFreeU_V2.patch(unet, freeu_b1, freeu_b2, freeu_s1, freeu_s2)[0]
p.sd_model.forge_objects.unet = unet
# Below codes will add some logs to the texts below the image outputs on UI.
# The extra_generation_params does not influence results.
p.extra_generation_params.update(dict(
freeu_enabled=freeu_enabled,
freeu_b1=freeu_b1,
freeu_b2=freeu_b2,
freeu_s1=freeu_s1,
freeu_s2=freeu_s2,
))
return

View File

@@ -0,0 +1,48 @@
import gradio as gr
from modules import scripts
from ldm_patched.contrib.external_hypertile import HyperTile
opHyperTile = HyperTile()
class HyperTileForForge(scripts.Script):
def title(self):
return "HyperTile Integrated"
def show(self, is_img2img):
return scripts.AlwaysVisible
def ui(self, *args, **kwargs):
with gr.Accordion(open=False, label=self.title()):
enabled = gr.Checkbox(label='Enabled', value=False)
tile_size = gr.Slider(label='Tile Size', minimum=1, maximum=2048, step=1, value=256)
swap_size = gr.Slider(label='Swap Size', minimum=1, maximum=128, step=1, value=2)
max_depth = gr.Slider(label='Max Depth', minimum=0, maximum=10, step=1, value=0)
scale_depth = gr.Checkbox(label='Scale Depth', value=False)
return enabled, tile_size, swap_size, max_depth, scale_depth
def process_batch(self, p, *script_args, **kwargs):
enabled, tile_size, swap_size, max_depth, scale_depth = script_args
tile_size, swap_size, max_depth = int(tile_size), int(swap_size), int(max_depth)
if not enabled:
return
unet = p.sd_model.forge_objects.unet
unet = opHyperTile.patch(unet, tile_size, swap_size, max_depth, scale_depth)[0]
p.sd_model.forge_objects.unet = unet
p.extra_generation_params.update(dict(
HyperTile_enabled=enabled,
HyperTile_tile_size=tile_size,
HyperTile_swap_size=swap_size,
HyperTile_max_depth=max_depth,
HyperTile_scale_depth=scale_depth,
))
return

View File

@@ -0,0 +1,55 @@
import gradio as gr
from modules import scripts
from ldm_patched.contrib.external_model_downscale import PatchModelAddDownscale
opPatchModelAddDownscale = PatchModelAddDownscale()
class KohyaHRFixForForge(scripts.Script):
def title(self):
return "Kohya HRFix Integrated"
def show(self, is_img2img):
return scripts.AlwaysVisible
def ui(self, *args, **kwargs):
upscale_methods = ["bicubic", "nearest-exact", "bilinear", "area", "bislerp"]
with gr.Accordion(open=False, label=self.title()):
enabled = gr.Checkbox(label='Enabled', value=False)
block_number = gr.Slider(label='Block Number', value=3, minimum=1, maximum=32, step=1)
downscale_factor = gr.Slider(label='Downscale Factor', value=2.0, minimum=0.1, maximum=9.0, step=0.001)
start_percent = gr.Slider(label='Start Percent', value=0.0, minimum=0.0, maximum=1.0, step=0.001)
end_percent = gr.Slider(label='End Percent', value=0.35, minimum=0.0, maximum=1.0, step=0.001)
downscale_after_skip = gr.Checkbox(label='Downscale After Skip', value=True)
downscale_method = gr.Radio(label='Downscale Method', choices=upscale_methods, value=upscale_methods[0])
upscale_method = gr.Radio(label='Upscale Method', choices=upscale_methods, value=upscale_methods[0])
return enabled, block_number, downscale_factor, start_percent, end_percent, downscale_after_skip, downscale_method, upscale_method
def process_batch(self, p, *script_args, **kwargs):
enabled, block_number, downscale_factor, start_percent, end_percent, downscale_after_skip, downscale_method, upscale_method = script_args
block_number = int(block_number)
if not enabled:
return
unet = p.sd_model.forge_objects.unet
unet = opPatchModelAddDownscale.patch(unet, block_number, downscale_factor, start_percent, end_percent, downscale_after_skip, downscale_method, upscale_method)[0]
p.sd_model.forge_objects.unet = unet
p.extra_generation_params.update(dict(
kohya_hrfix_enabled=enabled,
kohya_hrfix_block_number=block_number,
kohya_hrfix_downscale_factor=downscale_factor,
kohya_hrfix_start_percent=start_percent,
kohya_hrfix_end_percent=end_percent,
kohya_hrfix_downscale_after_skip=downscale_after_skip,
kohya_hrfix_downscale_method=downscale_method,
kohya_hrfix_upscale_method=upscale_method,
))
return

View File

@@ -0,0 +1,43 @@
import gradio as gr
from modules import scripts
from ldm_patched.contrib.external_sag import SelfAttentionGuidance
opSelfAttentionGuidance = SelfAttentionGuidance()
class SAGForForge(scripts.Script):
def title(self):
return "SelfAttentionGuidance Integrated"
def show(self, is_img2img):
return scripts.AlwaysVisible
def ui(self, *args, **kwargs):
with gr.Accordion(open=False, label=self.title()):
enabled = gr.Checkbox(label='Enabled', value=False)
scale = gr.Slider(label='Scale', minimum=-2.0, maximum=5.0, step=0.01, value=0.5)
blur_sigma = gr.Slider(label='Blur Sigma', minimum=0.0, maximum=10.0, step=0.01, value=2.0)
return enabled, scale, blur_sigma
def process_batch(self, p, *script_args, **kwargs):
enabled, scale, blur_sigma = script_args
if not enabled:
return
unet = p.sd_model.forge_objects.unet
unet = opSelfAttentionGuidance.patch(unet, scale, blur_sigma)[0]
p.sd_model.forge_objects.unet = unet
p.extra_generation_params.update(dict(
sag_enabled=enabled,
sag_scale=scale,
sag_blur_sigma=blur_sigma
))
return

View File

@@ -0,0 +1,113 @@
import torch
import gradio as gr
import os
import pathlib
from modules import script_callbacks
from modules.paths import models_path
from modules.ui_common import ToolButton, refresh_symbol
from modules import shared
from modules_forge.forge_util import numpy_to_pytorch, pytorch_to_numpy, write_images_to_mp4
from ldm_patched.modules.sd import load_checkpoint_guess_config
from ldm_patched.contrib.external_video_model import VideoLinearCFGGuidance, SVD_img2vid_Conditioning
from ldm_patched.contrib.external import KSampler, VAEDecode
opVideoLinearCFGGuidance = VideoLinearCFGGuidance()
opSVD_img2vid_Conditioning = SVD_img2vid_Conditioning()
opKSampler = KSampler()
opVAEDecode = VAEDecode()
svd_root = os.path.join(models_path, 'svd')
os.makedirs(svd_root, exist_ok=True)
svd_filenames = []
def update_svd_filenames():
global svd_filenames
svd_filenames = [
pathlib.Path(x).name for x in
shared.walk_files(svd_root, allowed_extensions=[".pt", ".ckpt", ".safetensors"])
]
return svd_filenames
@torch.inference_mode()
@torch.no_grad()
def predict(filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
sampling_denoise, guidance_min_cfg, input_image):
filename = os.path.join(svd_root, filename)
model_raw, _, vae, clip_vision = \
load_checkpoint_guess_config(filename, output_vae=True, output_clip=False, output_clipvision=True)
model = opVideoLinearCFGGuidance.patch(model_raw, guidance_min_cfg)[0]
init_image = numpy_to_pytorch(input_image)
positive, negative, latent_image = opSVD_img2vid_Conditioning.encode(
clip_vision, init_image, vae, width, height, video_frames, motion_bucket_id, fps, augmentation_level)
output_latent = opKSampler.sample(model, sampling_seed, sampling_steps, sampling_cfg,
sampling_sampler_name, sampling_scheduler, positive,
negative, latent_image, sampling_denoise)[0]
output_pixels = opVAEDecode.decode(vae, output_latent)[0]
outputs = pytorch_to_numpy(output_pixels)
video_filename = write_images_to_mp4(outputs, fps=fps)
return outputs, video_filename
def on_ui_tabs():
with gr.Blocks() as svd_block:
with gr.Row():
with gr.Column():
input_image = gr.Image(label='Input Image', source='upload', type='numpy', height=400)
with gr.Row():
filename = gr.Dropdown(label="SVD Checkpoint Filename",
choices=svd_filenames,
value=svd_filenames[0] if len(svd_filenames) > 0 else None)
refresh_button = ToolButton(value=refresh_symbol, tooltip="Refresh")
refresh_button.click(
fn=lambda: gr.update(choices=update_svd_filenames),
inputs=[], outputs=filename)
width = gr.Slider(label='Width', minimum=16, maximum=8192, step=8, value=1024)
height = gr.Slider(label='Height', minimum=16, maximum=8192, step=8, value=576)
video_frames = gr.Slider(label='Video Frames', minimum=1, maximum=4096, step=1, value=14)
motion_bucket_id = gr.Slider(label='Motion Bucket Id', minimum=1, maximum=1023, step=1, value=127)
fps = gr.Slider(label='Fps', minimum=1, maximum=1024, step=1, value=6)
augmentation_level = gr.Slider(label='Augmentation Level', minimum=0.0, maximum=10.0, step=0.01,
value=0.0)
sampling_steps = gr.Slider(label='Sampling Steps', minimum=1, maximum=200, step=1, value=20)
sampling_cfg = gr.Slider(label='CFG Scale', minimum=0.0, maximum=50.0, step=0.1, value=2.5)
sampling_denoise = gr.Slider(label='Sampling Denoise', minimum=0.0, maximum=1.0, step=0.01, value=1.0)
guidance_min_cfg = gr.Slider(label='Guidance Min Cfg', minimum=0.0, maximum=100.0, step=0.5, value=1.0)
sampling_sampler_name = gr.Radio(label='Sampler Name',
choices=['euler', 'euler_ancestral', 'heun', 'heunpp2', 'dpm_2',
'dpm_2_ancestral', 'lms', 'dpm_fast', 'dpm_adaptive',
'dpmpp_2s_ancestral', 'dpmpp_sde', 'dpmpp_sde_gpu',
'dpmpp_2m', 'dpmpp_2m_sde', 'dpmpp_2m_sde_gpu',
'dpmpp_3m_sde', 'dpmpp_3m_sde_gpu', 'ddpm', 'lcm', 'ddim',
'uni_pc', 'uni_pc_bh2'], value='euler')
sampling_scheduler = gr.Radio(label='Scheduler',
choices=['normal', 'karras', 'exponential', 'sgm_uniform', 'simple',
'ddim_uniform'], value='karras')
sampling_seed = gr.Number(label='Seed', value=12345, precision=0)
generate_button = gr.Button(value="Generate")
ctrls = [filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
sampling_denoise, guidance_min_cfg, input_image]
with gr.Column():
output_video = gr.Video(autoplay=True)
output_gallery = gr.Gallery(label='Gallery', show_label=False, object_fit='contain',
visible=True, height=1024, columns=4)
generate_button.click(predict, inputs=ctrls, outputs=[output_gallery, output_video])
return [(svd_block, "SVD", "svd")]
update_svd_filenames()
script_callbacks.on_ui_tabs(on_ui_tabs)

View File

@@ -0,0 +1,100 @@
import torch
import gradio as gr
import os
import pathlib
from modules import script_callbacks
from modules.paths import models_path
from modules.ui_common import ToolButton, refresh_symbol
from modules import shared
from modules_forge.forge_util import numpy_to_pytorch, pytorch_to_numpy
from ldm_patched.modules.sd import load_checkpoint_guess_config
from ldm_patched.contrib.external_stable3d import StableZero123_Conditioning
from ldm_patched.contrib.external import KSampler, VAEDecode
opStableZero123_Conditioning = StableZero123_Conditioning()
opKSampler = KSampler()
opVAEDecode = VAEDecode()
model_root = os.path.join(models_path, 'z123')
os.makedirs(model_root, exist_ok=True)
model_filenames = []
def update_model_filenames():
global model_filenames
model_filenames = [
pathlib.Path(x).name for x in
shared.walk_files(model_root, allowed_extensions=[".pt", ".ckpt", ".safetensors"])
]
return model_filenames
@torch.inference_mode()
@torch.no_grad()
def predict(filename, width, height, batch_size, elevation, azimuth,
sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler, sampling_denoise, input_image):
filename = os.path.join(model_root, filename)
model, _, vae, clip_vision = \
load_checkpoint_guess_config(filename, output_vae=True, output_clip=False, output_clipvision=True)
init_image = numpy_to_pytorch(input_image)
positive, negative, latent_image = opStableZero123_Conditioning.encode(
clip_vision, init_image, vae, width, height, batch_size, elevation, azimuth)
output_latent = opKSampler.sample(model, sampling_seed, sampling_steps, sampling_cfg,
sampling_sampler_name, sampling_scheduler, positive,
negative, latent_image, sampling_denoise)[0]
output_pixels = opVAEDecode.decode(vae, output_latent)[0]
outputs = pytorch_to_numpy(output_pixels)
return outputs
def on_ui_tabs():
with gr.Blocks() as model_block:
with gr.Row():
with gr.Column():
input_image = gr.Image(label='Input Image', source='upload', type='numpy', height=400)
with gr.Row():
filename = gr.Dropdown(label="Zero123 Checkpoint Filename",
choices=model_filenames,
value=model_filenames[0] if len(model_filenames) > 0 else None)
refresh_button = ToolButton(value=refresh_symbol, tooltip="Refresh")
refresh_button.click(
fn=lambda: gr.update(choices=update_model_filenames),
inputs=[], outputs=filename)
width = gr.Slider(label='Width', minimum=16, maximum=8192, step=8, value=256)
height = gr.Slider(label='Height', minimum=16, maximum=8192, step=8, value=256)
batch_size = gr.Slider(label='Batch Size', minimum=1, maximum=4096, step=1, value=4)
elevation = gr.Slider(label='Elevation', minimum=-180.0, maximum=180.0, step=0.001, value=10.0)
azimuth = gr.Slider(label='Azimuth', minimum=-180.0, maximum=180.0, step=0.001, value=142.0)
sampling_denoise = gr.Slider(label='Sampling Denoise', minimum=0.0, maximum=1.0, step=0.01, value=1.0)
sampling_steps = gr.Slider(label='Sampling Steps', minimum=1, maximum=10000, step=1, value=20)
sampling_cfg = gr.Slider(label='CFG Scale', minimum=0.0, maximum=100.0, step=0.1, value=5.0)
sampling_sampler_name = gr.Radio(label='Sampler Name',
choices=['euler', 'euler_ancestral', 'heun', 'heunpp2', 'dpm_2',
'dpm_2_ancestral', 'lms', 'dpm_fast', 'dpm_adaptive',
'dpmpp_2s_ancestral', 'dpmpp_sde', 'dpmpp_sde_gpu',
'dpmpp_2m', 'dpmpp_2m_sde', 'dpmpp_2m_sde_gpu',
'dpmpp_3m_sde', 'dpmpp_3m_sde_gpu', 'ddpm', 'lcm', 'ddim',
'uni_pc', 'uni_pc_bh2'], value='euler')
sampling_scheduler = gr.Radio(label='Sampling Scheduler',
choices=['normal', 'karras', 'exponential', 'sgm_uniform', 'simple',
'ddim_uniform'], value='sgm_uniform')
sampling_seed = gr.Number(label='Seed', value=12345, precision=0)
generate_button = gr.Button(value="Generate")
ctrls = [filename, width, height, batch_size, elevation, azimuth, sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler, sampling_denoise, input_image]
with gr.Column():
output_gallery = gr.Gallery(label='Gallery', show_label=False, object_fit='contain',
visible=True, height=1024, columns=4)
generate_button.click(predict, inputs=ctrls, outputs=[output_gallery])
return [(model_block, "Z123", "z123")]
update_model_filenames()
script_callbacks.on_ui_tabs(on_ui_tabs)

View File

View File

View File

@@ -2,8 +2,9 @@ import argparse
import json
import os
from modules.paths_internal import models_path, script_path, data_path, extensions_dir, extensions_builtin_dir, sd_default_config, sd_model_file # noqa: F401
from ldm_patched.modules import args_parser
parser = argparse.ArgumentParser()
parser = args_parser.parser
parser.add_argument("-f", action='store_true', help=argparse.SUPPRESS) # allows running as root; implemented outside of webui
parser.add_argument("--update-all-extensions", action='store_true', help="launch.py argument: download updates for all extensions when starting the program")

View File

@@ -2,7 +2,7 @@ import os
from modules import modelloader, errors
from modules.shared import cmd_opts, opts
from modules.upscaler import Upscaler, UpscalerData
from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
from modules.upscaler_utils import upscale_with_model
@@ -23,6 +23,7 @@ class UpscalerDAT(Upscaler):
self.scalers.append(model)
def do_upscale(self, img, path):
prepare_free_memory()
try:
info = self.load_model(path)
except Exception:

View File

@@ -4,7 +4,10 @@ import re
import torch
import numpy as np
from modules import modelloader, paths, deepbooru_model, devices, images, shared
from modules import modelloader, paths, deepbooru_model, images, shared
from ldm_patched.modules import model_management
from ldm_patched.modules.model_patcher import ModelPatcher
re_special = re.compile(r'([\\()])')
@@ -12,6 +15,14 @@ re_special = re.compile(r'([\\()])')
class DeepDanbooru:
def __init__(self):
self.model = None
self.load_device = model_management.text_encoder_device()
self.offload_device = model_management.text_encoder_offload_device()
self.dtype = torch.float32
if model_management.should_use_fp16(device=self.load_device):
self.dtype = torch.float16
self.patcher = None
def load(self):
if self.model is not None:
@@ -28,16 +39,16 @@ class DeepDanbooru:
self.model.load_state_dict(torch.load(files[0], map_location="cpu"))
self.model.eval()
self.model.to(devices.cpu, devices.dtype)
self.model.to(self.offload_device, self.dtype)
self.patcher = ModelPatcher(self.model, load_device=self.load_device, offload_device=self.offload_device)
def start(self):
self.load()
self.model.to(devices.device)
model_management.load_models_gpu([self.patcher])
def stop(self):
if not shared.opts.interrogate_keep_models_in_memory:
self.model.to(devices.cpu)
devices.torch_gc()
pass
def tag(self, pil_image):
self.start()
@@ -56,8 +67,8 @@ class DeepDanbooru:
pic = images.resize_image(2, pil_image.convert("RGB"), 512, 512)
a = np.expand_dims(np.array(pic, dtype=np.float32), 0) / 255
with torch.no_grad(), devices.autocast():
x = torch.from_numpy(a).to(devices.device)
with torch.no_grad():
x = torch.from_numpy(a).to(self.load_device, self.dtype)
y = self.model(x)[0].detach().cpu().numpy()
probability_dict = {}

View File

@@ -1,211 +1,89 @@
import sys
import contextlib
from functools import lru_cache
import torch
from modules import errors, shared
from modules import torch_utils
if sys.platform == "darwin":
from modules import mac_specific
if shared.cmd_opts.use_ipex:
from modules import xpu_specific
import ldm_patched.modules.model_management as model_management
def has_xpu() -> bool:
return shared.cmd_opts.use_ipex and xpu_specific.has_xpu
return model_management.xpu_available
def has_mps() -> bool:
if sys.platform != "darwin":
return False
else:
return mac_specific.has_mps
return model_management.mps_mode()
def cuda_no_autocast(device_id=None) -> bool:
if device_id is None:
device_id = get_cuda_device_id()
return (
torch.cuda.get_device_capability(device_id) == (7, 5)
and torch.cuda.get_device_name(device_id).startswith("NVIDIA GeForce GTX 16")
)
return False
def get_cuda_device_id():
return (
int(shared.cmd_opts.device_id)
if shared.cmd_opts.device_id is not None and shared.cmd_opts.device_id.isdigit()
else 0
) or torch.cuda.current_device()
return model_management.get_torch_device().index
def get_cuda_device_string():
if shared.cmd_opts.device_id is not None:
return f"cuda:{shared.cmd_opts.device_id}"
return "cuda"
return str(model_management.get_torch_device())
def get_optimal_device_name():
if torch.cuda.is_available():
return get_cuda_device_string()
if has_mps():
return "mps"
if has_xpu():
return xpu_specific.get_xpu_device_string()
return "cpu"
return model_management.get_torch_device().type
def get_optimal_device():
return torch.device(get_optimal_device_name())
return model_management.get_torch_device()
def get_device_for(task):
if task in shared.cmd_opts.use_cpu or "all" in shared.cmd_opts.use_cpu:
return cpu
return get_optimal_device()
def torch_gc():
if torch.cuda.is_available():
with torch.cuda.device(get_cuda_device_string()):
torch.cuda.empty_cache()
torch.cuda.ipc_collect()
if has_mps():
mac_specific.torch_mps_gc()
if has_xpu():
xpu_specific.torch_xpu_gc()
model_management.soft_empty_cache()
def enable_tf32():
if torch.cuda.is_available():
return
# enabling benchmark option seems to enable a range of cards to do fp16 when they otherwise can't
# see https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4407
if cuda_no_autocast():
torch.backends.cudnn.benchmark = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
errors.run(enable_tf32, "Enabling TF32")
cpu: torch.device = torch.device("cpu")
fp8: bool = False
device: torch.device = None
device_interrogate: torch.device = None
device_gfpgan: torch.device = None
device_esrgan: torch.device = None
device_codeformer: torch.device = None
dtype: torch.dtype = torch.float16
dtype_vae: torch.dtype = torch.float16
dtype_unet: torch.dtype = torch.float16
dtype_inference: torch.dtype = torch.float16
device: torch.device = model_management.get_torch_device()
device_interrogate: torch.device = cpu # not used
device_gfpgan: torch.device = cpu
device_esrgan: torch.device = model_management.get_torch_device() # will be managed in special way
device_codeformer: torch.device = cpu
dtype: torch.dtype = model_management.unet_dtype()
dtype_vae: torch.dtype = model_management.vae_dtype()
dtype_unet: torch.dtype = model_management.unet_dtype()
dtype_inference: torch.dtype = model_management.unet_dtype()
unet_needs_upcast = False
def cond_cast_unet(input):
return input.to(dtype_unet) if unet_needs_upcast else input
return input
def cond_cast_float(input):
return input.float() if unet_needs_upcast else input
return input
nv_rng = None
patch_module_list = [
torch.nn.Linear,
torch.nn.Conv2d,
torch.nn.MultiheadAttention,
torch.nn.GroupNorm,
torch.nn.LayerNorm,
]
patch_module_list = []
def manual_cast_forward(target_dtype):
def forward_wrapper(self, *args, **kwargs):
if any(
isinstance(arg, torch.Tensor) and arg.dtype != target_dtype
for arg in args
):
args = [arg.to(target_dtype) if isinstance(arg, torch.Tensor) else arg for arg in args]
kwargs = {k: v.to(target_dtype) if isinstance(v, torch.Tensor) else v for k, v in kwargs.items()}
org_dtype = torch_utils.get_param(self).dtype
if org_dtype != target_dtype:
self.to(target_dtype)
result = self.org_forward(*args, **kwargs)
if org_dtype != target_dtype:
self.to(org_dtype)
if target_dtype != dtype_inference:
if isinstance(result, tuple):
result = tuple(
i.to(dtype_inference)
if isinstance(i, torch.Tensor)
else i
for i in result
)
elif isinstance(result, torch.Tensor):
result = result.to(dtype_inference)
return result
return forward_wrapper
return
@contextlib.contextmanager
def manual_cast(target_dtype):
applied = False
for module_type in patch_module_list:
if hasattr(module_type, "org_forward"):
continue
applied = True
org_forward = module_type.forward
if module_type == torch.nn.MultiheadAttention and has_xpu():
module_type.forward = manual_cast_forward(torch.float32)
else:
module_type.forward = manual_cast_forward(target_dtype)
module_type.org_forward = org_forward
try:
yield None
finally:
if applied:
for module_type in patch_module_list:
if hasattr(module_type, "org_forward"):
module_type.forward = module_type.org_forward
delattr(module_type, "org_forward")
return
def autocast(disable=False):
if disable:
return contextlib.nullcontext()
if fp8 and device==cpu:
return torch.autocast("cpu", dtype=torch.bfloat16, enabled=True)
if fp8 and dtype_inference == torch.float32:
return manual_cast(dtype)
if dtype == torch.float32 or dtype_inference == torch.float32:
return contextlib.nullcontext()
if has_xpu() or has_mps() or cuda_no_autocast():
return manual_cast(dtype)
return torch.autocast("cuda")
return contextlib.nullcontext()
def without_autocast(disable=False):
return torch.autocast("cuda", enabled=False) if torch.is_autocast_enabled() and not disable else contextlib.nullcontext()
return contextlib.nullcontext()
class NansException(Exception):
@@ -213,43 +91,9 @@ class NansException(Exception):
def test_for_nans(x, where):
if shared.cmd_opts.disable_nan_check:
return
if not torch.all(torch.isnan(x)).item():
return
if where == "unet":
message = "A tensor with all NaNs was produced in Unet."
if not shared.cmd_opts.no_half:
message += " This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the \"Upcast cross attention layer to float32\" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this."
elif where == "vae":
message = "A tensor with all NaNs was produced in VAE."
if not shared.cmd_opts.no_half and not shared.cmd_opts.no_half_vae:
message += " This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this."
else:
message = "A tensor with all NaNs was produced."
message += " Use --disable-nan-check commandline argument to disable this check."
raise NansException(message)
return
@lru_cache
def first_time_calculation():
"""
just do any calculation with pytorch layers - the first time this is done it allocaltes about 700MB of memory and
spends about 2.7 seconds doing that, at least wih NVidia.
"""
x = torch.zeros((1, 1)).to(device, dtype)
linear = torch.nn.Linear(1, 1).to(device, dtype)
linear(x)
x = torch.zeros((1, 1, 3, 3)).to(device, dtype)
conv2d = torch.nn.Conv2d(1, 1, (3, 3)).to(device, dtype)
conv2d(x)
return

View File

@@ -1,6 +1,6 @@
from modules import modelloader, devices, errors
from modules.shared import opts
from modules.upscaler import Upscaler, UpscalerData
from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
from modules.upscaler_utils import upscale_with_model
@@ -27,6 +27,7 @@ class UpscalerESRGAN(Upscaler):
self.scalers.append(scaler_data)
def do_upscale(self, img, selected_model):
prepare_free_memory()
try:
model = self.load_model(selected_model)
except Exception:

View File

@@ -3,7 +3,7 @@ import sys
from modules import modelloader, devices
from modules.shared import opts
from modules.upscaler import Upscaler, UpscalerData
from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
from modules.upscaler_utils import upscale_with_model
@@ -20,6 +20,7 @@ class UpscalerHAT(Upscaler):
self.scalers.append(scaler_data)
def do_upscale(self, img, selected_model):
prepare_free_memory()
try:
model = self.load_model(selected_model)
except Exception as e:

View File

@@ -12,6 +12,10 @@ def imports():
logging.getLogger("torch.distributed.nn").setLevel(logging.ERROR) # sshh...
logging.getLogger("xformers").addFilter(lambda record: 'A matching Triton is not available' not in record.getMessage())
from modules_forge.initialization import initialize_forge
initialize_forge()
startup_timer.record("initialize forge")
import torch # noqa: F401
startup_timer.record("import torch")
import pytorch_lightning # noqa: F401

View File

@@ -10,7 +10,10 @@ import torch.hub
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from modules import devices, paths, shared, lowvram, modelloader, errors, torch_utils
from modules import devices, paths, shared, modelloader, errors
from ldm_patched.modules import model_management
from ldm_patched.modules.model_patcher import ModelPatcher
blip_image_eval_size = 384
clip_model_name = 'ViT-L/14'
@@ -53,7 +56,16 @@ class InterrogateModels:
self.loaded_categories = None
self.skip_categories = []
self.content_dir = content_dir
self.running_on_cpu = devices.device_interrogate == torch.device("cpu")
self.load_device = model_management.text_encoder_device()
self.offload_device = model_management.text_encoder_offload_device()
self.dtype = torch.float32
if model_management.should_use_fp16(device=self.load_device):
self.dtype = torch.float16
self.blip_patcher = None
self.clip_patcher = None
def categories(self):
if not os.path.exists(self.content_dir):
@@ -105,49 +117,37 @@ class InterrogateModels:
def load_clip_model(self):
import clip
import clip.model
if self.running_on_cpu:
model, preprocess = clip.load(clip_model_name, device="cpu", download_root=shared.cmd_opts.clip_models_path)
else:
model, preprocess = clip.load(clip_model_name, download_root=shared.cmd_opts.clip_models_path)
clip.model.LayerNorm = torch.nn.LayerNorm
model, preprocess = clip.load(clip_model_name, device="cpu", download_root=shared.cmd_opts.clip_models_path)
model.eval()
model = model.to(devices.device_interrogate)
return model, preprocess
def load(self):
if self.blip_model is None:
self.blip_model = self.load_blip_model()
if not shared.cmd_opts.no_half and not self.running_on_cpu:
self.blip_model = self.blip_model.half()
self.blip_model = self.blip_model.to(devices.device_interrogate)
self.blip_model = self.blip_model.to(device=self.offload_device, dtype=self.dtype)
self.blip_patcher = ModelPatcher(self.blip_model, load_device=self.load_device, offload_device=self.offload_device)
if self.clip_model is None:
self.clip_model, self.clip_preprocess = self.load_clip_model()
if not shared.cmd_opts.no_half and not self.running_on_cpu:
self.clip_model = self.clip_model.half()
self.clip_model = self.clip_model.to(device=self.offload_device, dtype=self.dtype)
self.clip_patcher = ModelPatcher(self.clip_model, load_device=self.load_device, offload_device=self.offload_device)
self.clip_model = self.clip_model.to(devices.device_interrogate)
self.dtype = torch_utils.get_param(self.clip_model).dtype
model_management.load_models_gpu([self.blip_patcher, self.clip_patcher])
return
def send_clip_to_ram(self):
if not shared.opts.interrogate_keep_models_in_memory:
if self.clip_model is not None:
self.clip_model = self.clip_model.to(devices.cpu)
pass
def send_blip_to_ram(self):
if not shared.opts.interrogate_keep_models_in_memory:
if self.blip_model is not None:
self.blip_model = self.blip_model.to(devices.cpu)
pass
def unload(self):
self.send_clip_to_ram()
self.send_blip_to_ram()
devices.torch_gc()
pass
def rank(self, image_features, text_array, top_count=1):
import clip
@@ -158,11 +158,11 @@ class InterrogateModels:
text_array = text_array[0:int(shared.opts.interrogate_clip_dict_limit)]
top_count = min(top_count, len(text_array))
text_tokens = clip.tokenize(list(text_array), truncate=True).to(devices.device_interrogate)
text_tokens = clip.tokenize(list(text_array), truncate=True).to(self.load_device)
text_features = self.clip_model.encode_text(text_tokens).type(self.dtype)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = torch.zeros((1, len(text_array))).to(devices.device_interrogate)
similarity = torch.zeros((1, len(text_array))).to(self.load_device)
for i in range(image_features.shape[0]):
similarity += (100.0 * image_features[i].unsqueeze(0) @ text_features.T).softmax(dim=-1)
similarity /= image_features.shape[0]
@@ -175,7 +175,7 @@ class InterrogateModels:
transforms.Resize((blip_image_eval_size, blip_image_eval_size), interpolation=InterpolationMode.BICUBIC),
transforms.ToTensor(),
transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711))
])(pil_image).unsqueeze(0).type(self.dtype).to(devices.device_interrogate)
])(pil_image).unsqueeze(0).type(self.dtype).to(self.load_device)
with torch.no_grad():
caption = self.blip_model.generate(gpu_image, sample=False, num_beams=shared.opts.interrogate_clip_num_beams, min_length=shared.opts.interrogate_clip_min_length, max_length=shared.opts.interrogate_clip_max_length)
@@ -186,9 +186,6 @@ class InterrogateModels:
res = ""
shared.state.begin(job="interrogate")
try:
lowvram.send_everything_to_cpu()
devices.torch_gc()
self.load()
caption = self.generate_caption(pil_image)
@@ -197,7 +194,7 @@ class InterrogateModels:
res = caption
clip_image = self.clip_preprocess(pil_image).unsqueeze(0).type(self.dtype).to(devices.device_interrogate)
clip_image = self.clip_preprocess(pil_image).unsqueeze(0).type(self.dtype).to(self.load_device)
with torch.no_grad(), devices.autocast():
image_features = self.clip_model.encode_image(clip_image).type(self.dtype)

View File

@@ -15,6 +15,7 @@ from modules import cmd_args, errors
from modules.paths_internal import script_path, extensions_dir
from modules.timer import startup_timer
from modules import logging_config
from modules_forge import forge_version
args, _ = cmd_args.parser.parse_known_args()
logging_config.setup_logging(args.loglevel)
@@ -70,7 +71,7 @@ def commit_hash():
@lru_cache()
def git_tag():
def git_tag_a1111():
try:
return subprocess.check_output([git, "-C", script_path, "describe", "--tags"], shell=False, encoding='utf8').strip()
except Exception:
@@ -85,6 +86,10 @@ def git_tag():
return "<none>"
def git_tag():
return 'f' + forge_version.version + '-' + git_tag_a1111()
def run(command, desc=None, errdesc=None, custom_env=None, live: bool = default_command_live) -> str:
if desc is not None:
print(desc)

View File

@@ -6,142 +6,20 @@ cpu = torch.device("cpu")
def send_everything_to_cpu():
global module_in_gpu
if module_in_gpu is not None:
module_in_gpu.to(cpu)
module_in_gpu = None
return
def is_needed(sd_model):
return shared.cmd_opts.lowvram or shared.cmd_opts.medvram or shared.cmd_opts.medvram_sdxl and hasattr(sd_model, 'conditioner')
return False
def apply(sd_model):
enable = is_needed(sd_model)
shared.parallel_processing_allowed = not enable
if enable:
setup_for_low_vram(sd_model, not shared.cmd_opts.lowvram)
else:
sd_model.lowvram = False
return
def setup_for_low_vram(sd_model, use_medvram):
if getattr(sd_model, 'lowvram', False):
return
sd_model.lowvram = True
parents = {}
def send_me_to_gpu(module, _):
"""send this module to GPU; send whatever tracked module was previous in GPU to CPU;
we add this as forward_pre_hook to a lot of modules and this way all but one of them will
be in CPU
"""
global module_in_gpu
module = parents.get(module, module)
if module_in_gpu == module:
return
if module_in_gpu is not None:
module_in_gpu.to(cpu)
module.to(devices.device)
module_in_gpu = module
# see below for register_forward_pre_hook;
# first_stage_model does not use forward(), it uses encode/decode, so register_forward_pre_hook is
# useless here, and we just replace those methods
first_stage_model = sd_model.first_stage_model
first_stage_model_encode = sd_model.first_stage_model.encode
first_stage_model_decode = sd_model.first_stage_model.decode
def first_stage_model_encode_wrap(x):
send_me_to_gpu(first_stage_model, None)
return first_stage_model_encode(x)
def first_stage_model_decode_wrap(z):
send_me_to_gpu(first_stage_model, None)
return first_stage_model_decode(z)
to_remain_in_cpu = [
(sd_model, 'first_stage_model'),
(sd_model, 'depth_model'),
(sd_model, 'embedder'),
(sd_model, 'model'),
(sd_model, 'embedder'),
]
is_sdxl = hasattr(sd_model, 'conditioner')
is_sd2 = not is_sdxl and hasattr(sd_model.cond_stage_model, 'model')
if is_sdxl:
to_remain_in_cpu.append((sd_model, 'conditioner'))
elif is_sd2:
to_remain_in_cpu.append((sd_model.cond_stage_model, 'model'))
else:
to_remain_in_cpu.append((sd_model.cond_stage_model, 'transformer'))
# remove several big modules: cond, first_stage, depth/embedder (if applicable), and unet from the model
stored = []
for obj, field in to_remain_in_cpu:
module = getattr(obj, field, None)
stored.append(module)
setattr(obj, field, None)
# send the model to GPU.
sd_model.to(devices.device)
# put modules back. the modules will be in CPU.
for (obj, field), module in zip(to_remain_in_cpu, stored):
setattr(obj, field, module)
# register hooks for those the first three models
if is_sdxl:
sd_model.conditioner.register_forward_pre_hook(send_me_to_gpu)
elif is_sd2:
sd_model.cond_stage_model.model.register_forward_pre_hook(send_me_to_gpu)
sd_model.cond_stage_model.model.token_embedding.register_forward_pre_hook(send_me_to_gpu)
parents[sd_model.cond_stage_model.model] = sd_model.cond_stage_model
parents[sd_model.cond_stage_model.model.token_embedding] = sd_model.cond_stage_model
else:
sd_model.cond_stage_model.transformer.register_forward_pre_hook(send_me_to_gpu)
parents[sd_model.cond_stage_model.transformer] = sd_model.cond_stage_model
sd_model.first_stage_model.register_forward_pre_hook(send_me_to_gpu)
sd_model.first_stage_model.encode = first_stage_model_encode_wrap
sd_model.first_stage_model.decode = first_stage_model_decode_wrap
if sd_model.depth_model:
sd_model.depth_model.register_forward_pre_hook(send_me_to_gpu)
if sd_model.embedder:
sd_model.embedder.register_forward_pre_hook(send_me_to_gpu)
if use_medvram:
sd_model.model.register_forward_pre_hook(send_me_to_gpu)
else:
diff_model = sd_model.model.diffusion_model
# the third remaining model is still too big for 4 GB, so we also do the same for its submodules
# so that only one of them is in GPU at a time
stored = diff_model.input_blocks, diff_model.middle_block, diff_model.output_blocks, diff_model.time_embed
diff_model.input_blocks, diff_model.middle_block, diff_model.output_blocks, diff_model.time_embed = None, None, None, None
sd_model.model.to(devices.device)
diff_model.input_blocks, diff_model.middle_block, diff_model.output_blocks, diff_model.time_embed = stored
# install hooks for bits of third model
diff_model.time_embed.register_forward_pre_hook(send_me_to_gpu)
for block in diff_model.input_blocks:
block.register_forward_pre_hook(send_me_to_gpu)
diff_model.middle_block.register_forward_pre_hook(send_me_to_gpu)
for block in diff_model.output_blocks:
block.register_forward_pre_hook(send_me_to_gpu)
return
def is_enabled(sd_model):
return sd_model.lowvram
return False

View File

@@ -627,44 +627,7 @@ def decode_latent_batch(model, batch, target_device=None, check_for_nans=False):
for i in range(batch.shape[0]):
sample = decode_first_stage(model, batch[i:i + 1])[0]
if check_for_nans:
try:
devices.test_for_nans(sample, "vae")
except devices.NansException as e:
if shared.opts.auto_vae_precision_bfloat16:
autofix_dtype = torch.bfloat16
autofix_dtype_text = "bfloat16"
autofix_dtype_setting = "Automatically convert VAE to bfloat16"
autofix_dtype_comment = ""
elif shared.opts.auto_vae_precision:
autofix_dtype = torch.float32
autofix_dtype_text = "32-bit float"
autofix_dtype_setting = "Automatically revert VAE to 32-bit floats"
autofix_dtype_comment = "\nTo always start with 32-bit VAE, use --no-half-vae commandline flag."
else:
raise e
if devices.dtype_vae == autofix_dtype:
raise e
errors.print_error_explanation(
"A tensor with all NaNs was produced in VAE.\n"
f"Web UI will now convert VAE into {autofix_dtype_text} and retry.\n"
f"To disable this behavior, disable the '{autofix_dtype_setting}' setting.{autofix_dtype_comment}"
)
devices.dtype_vae = autofix_dtype
model.first_stage_model.to(devices.dtype_vae)
batch = batch.to(devices.dtype_vae)
sample = decode_first_stage(model, batch[i:i + 1])[0]
if target_device is not None:
sample = sample.to(target_device)
samples.append(sample)
samples.append(sample.to(target_device))
return samples
@@ -847,7 +810,7 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
infotexts = []
output_images = []
with torch.no_grad(), p.sd_model.ema_scope():
with torch.no_grad():
with devices.autocast():
p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
@@ -871,6 +834,7 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
sd_models.reload_model_weights() # model can be changed for example by refiner
p.sd_model.forge_objects = p.sd_model.forge_objects_original.shallow_copy()
p.prompts = p.all_prompts[n * p.batch_size:(n + 1) * p.batch_size]
p.negative_prompts = p.all_negative_prompts[n * p.batch_size:(n + 1) * p.batch_size]
p.seeds = p.all_seeds[n * p.batch_size:(n + 1) * p.batch_size]
@@ -887,8 +851,9 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
p.parse_extra_network_prompts()
if not p.disable_extra_networks:
with devices.autocast():
extra_networks.activate(p, p.extra_network_data)
extra_networks.activate(p, p.extra_network_data)
p.sd_model.forge_objects = p.sd_model.forge_objects_after_applying_lora.shallow_copy()
if p.scripts is not None:
p.scripts.process_batch(p, batch_number=n, prompts=p.prompts, seeds=p.seeds, subseeds=p.subseeds)
@@ -940,8 +905,7 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
p.extra_generation_params['Noise Schedule'] = opts.sd_noise_schedule
p.sd_model.alphas_cumprod = rescale_zero_terminal_snr_abar(p.sd_model.alphas_cumprod).to(shared.device)
with devices.without_autocast() if devices.unet_needs_upcast else devices.autocast():
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
if p.scripts is not None:
ps = scripts.PostSampleArgs(samples_ddim)
@@ -960,9 +924,6 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
del samples_ddim
if lowvram.is_enabled(shared.sd_model):
lowvram.send_everything_to_cpu()
devices.torch_gc()
state.nextjob()
@@ -1269,7 +1230,7 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
image = np.array(self.firstpass_image).astype(np.float32) / 255.0
image = np.moveaxis(image, 2, 0)
image = torch.from_numpy(np.expand_dims(image, axis=0))
image = image.to(shared.device, dtype=devices.dtype_vae)
image = image.to(shared.device, dtype=torch.float32)
if opts.sd_vae_encode_method != 'Full':
self.extra_generation_params['VAE Encoder'] = opts.sd_vae_encode_method
@@ -1353,7 +1314,7 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
batch_images.append(image)
decoded_samples = torch.from_numpy(np.array(batch_images))
decoded_samples = decoded_samples.to(shared.device, dtype=devices.dtype_vae)
decoded_samples = decoded_samples.to(shared.device, dtype=torch.float32)
if opts.sd_vae_encode_method != 'Full':
self.extra_generation_params['VAE Encoder'] = opts.sd_vae_encode_method
@@ -1458,7 +1419,7 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
if shared.opts.hires_fix_use_firstpass_conds:
self.calculate_hr_conds()
elif lowvram.is_enabled(shared.sd_model) and shared.sd_model.sd_checkpoint_info == sd_models.select_checkpoint(): # if in lowvram mode, we need to calculate conds right away, before the cond NN is unloaded
elif shared.sd_model.sd_checkpoint_info == sd_models.select_checkpoint(): # if in lowvram mode, we need to calculate conds right away, before the cond NN is unloaded
with devices.autocast():
extra_networks.activate(self, self.hr_extra_network_data)
@@ -1645,7 +1606,7 @@ class StableDiffusionProcessingImg2Img(StableDiffusionProcessing):
raise RuntimeError(f"bad number of images passed: {len(imgs)}; expecting {self.batch_size} or less")
image = torch.from_numpy(batch_images)
image = image.to(shared.device, dtype=devices.dtype_vae)
image = image.to(shared.device, dtype=torch.float32)
if opts.sd_vae_encode_method != 'Full':
self.extra_generation_params['VAE Encoder'] = opts.sd_vae_encode_method

View File

@@ -276,6 +276,12 @@ class DictWithShape(dict):
def shape(self):
return self["crossattn"].shape
def to(self, *args, **kwargs):
for k in self.keys():
if isinstance(self[k], torch.Tensor):
self[k] = self[k].to(*args, **kwargs)
return self
def reconstruct_cond_batch(c: list[list[ScheduledPromptConditioning]], current_step):
param = c[0][0].cond
@@ -317,15 +323,32 @@ def stack_conds(tensors):
return torch.stack(tensors)
def stack_conds_alter(tensors, weights):
token_count = max([x.shape[0] for x in tensors])
for i in range(len(tensors)):
if tensors[i].shape[0] != token_count:
last_vector = tensors[i][-1:]
last_vector_repeated = last_vector.repeat([token_count - tensors[i].shape[0], 1])
tensors[i] = torch.vstack([tensors[i], last_vector_repeated])
result = 0
full_weights = 0
for x, w in zip(tensors, weights):
result = result + x * float(w)
full_weights = full_weights + float(w)
result = result / full_weights
return result
def reconstruct_multicond_batch(c: MulticondLearnedConditioning, current_step):
param = c.batch[0][0].schedules[0].cond
tensors = []
conds_list = []
results = []
for composable_prompts in c.batch:
conds_for_batch = []
tensors = []
weights = []
for composable_prompt in composable_prompts:
target_index = 0
@@ -334,19 +357,24 @@ def reconstruct_multicond_batch(c: MulticondLearnedConditioning, current_step):
target_index = current
break
conds_for_batch.append((len(tensors), composable_prompt.weight))
weights.append(composable_prompt.weight)
tensors.append(composable_prompt.schedules[target_index].cond)
conds_list.append(conds_for_batch)
if isinstance(tensors[0], dict):
weighted = {k: stack_conds_alter([x[k] for x in tensors], weights) for k in tensors[0].keys()}
else:
weighted = stack_conds_alter(tensors, weights)
if isinstance(tensors[0], dict):
keys = list(tensors[0].keys())
stacked = {k: stack_conds([x[k] for x in tensors]) for k in keys}
stacked = DictWithShape(stacked, stacked['crossattn'].shape)
results.append(weighted)
if isinstance(results[0], dict):
results = {k: torch.stack([x[k] for x in results])
for k in results[0].keys()}
results = DictWithShape(results, results['crossattn'].shape)
else:
stacked = stack_conds(tensors).to(device=param.device, dtype=param.dtype)
results = torch.stack(results).to(device=param.device, dtype=param.dtype)
return conds_list, stacked
return results
re_attention = re.compile(r"""

View File

@@ -2,7 +2,7 @@ import os
from modules import modelloader, errors
from modules.shared import cmd_opts, opts
from modules.upscaler import Upscaler, UpscalerData
from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
from modules.upscaler_utils import upscale_with_model
@@ -27,6 +27,8 @@ class UpscalerRealESRGAN(Upscaler):
self.scalers.append(scaler)
def do_upscale(self, img, path):
prepare_free_memory()
if not self.enable:
return img

View File

@@ -57,57 +57,11 @@ def list_optimizers():
def apply_optimizations(option=None):
global current_optimizer
undo_optimizations()
if len(optimizers) == 0:
# a script can access the model very early, and optimizations would not be filled by then
current_optimizer = None
return ''
ldm.modules.diffusionmodules.model.nonlinearity = silu
ldm.modules.diffusionmodules.openaimodel.th = sd_hijack_unet.th
sgm.modules.diffusionmodules.model.nonlinearity = silu
sgm.modules.diffusionmodules.openaimodel.th = sd_hijack_unet.th
if current_optimizer is not None:
current_optimizer.undo()
current_optimizer = None
selection = option or shared.opts.cross_attention_optimization
if selection == "Automatic" and len(optimizers) > 0:
matching_optimizer = next(iter([x for x in optimizers if x.cmd_opt and getattr(shared.cmd_opts, x.cmd_opt, False)]), optimizers[0])
else:
matching_optimizer = next(iter([x for x in optimizers if x.title() == selection]), None)
if selection == "None":
matching_optimizer = None
elif selection == "Automatic" and shared.cmd_opts.disable_opt_split_attention:
matching_optimizer = None
elif matching_optimizer is None:
matching_optimizer = optimizers[0]
if matching_optimizer is not None:
print(f"Applying attention optimization: {matching_optimizer.name}... ", end='')
matching_optimizer.apply()
print("done.")
current_optimizer = matching_optimizer
return current_optimizer.name
else:
print("Disabling attention optimization")
return ''
return
def undo_optimizations():
ldm.modules.diffusionmodules.model.nonlinearity = diffusionmodules_model_nonlinearity
ldm.modules.attention.CrossAttention.forward = hypernetwork.attention_CrossAttention_forward
ldm.modules.diffusionmodules.model.AttnBlock.forward = diffusionmodules_model_AttnBlock_forward
sgm.modules.diffusionmodules.model.nonlinearity = diffusionmodules_model_nonlinearity
sgm.modules.attention.CrossAttention.forward = hypernetwork.attention_CrossAttention_forward
sgm.modules.diffusionmodules.model.AttnBlock.forward = diffusionmodules_model_AttnBlock_forward
return
def fix_checkpoint():
@@ -182,131 +136,16 @@ class StableDiffusionModelHijack:
self.embedding_db.add_embedding_dir(cmd_opts.embeddings_dir)
def apply_optimizations(self, option=None):
try:
self.optimization_method = apply_optimizations(option)
except Exception as e:
errors.display(e, "applying cross attention optimization")
undo_optimizations()
pass
def convert_sdxl_to_ssd(self, m):
"""Converts an SDXL model to a Segmind Stable Diffusion model (see https://huggingface.co/segmind/SSD-1B)"""
delattr(m.model.diffusion_model.middle_block, '1')
delattr(m.model.diffusion_model.middle_block, '2')
for i in ['9', '8', '7', '6', '5', '4']:
delattr(m.model.diffusion_model.input_blocks[7][1].transformer_blocks, i)
delattr(m.model.diffusion_model.input_blocks[8][1].transformer_blocks, i)
delattr(m.model.diffusion_model.output_blocks[0][1].transformer_blocks, i)
delattr(m.model.diffusion_model.output_blocks[1][1].transformer_blocks, i)
delattr(m.model.diffusion_model.output_blocks[4][1].transformer_blocks, '1')
delattr(m.model.diffusion_model.output_blocks[5][1].transformer_blocks, '1')
devices.torch_gc()
pass
def hijack(self, m):
conditioner = getattr(m, 'conditioner', None)
if conditioner:
text_cond_models = []
for i in range(len(conditioner.embedders)):
embedder = conditioner.embedders[i]
typename = type(embedder).__name__
if typename == 'FrozenOpenCLIPEmbedder':
embedder.model.token_embedding = EmbeddingsWithFixes(embedder.model.token_embedding, self)
conditioner.embedders[i] = sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords(embedder, self)
text_cond_models.append(conditioner.embedders[i])
if typename == 'FrozenCLIPEmbedder':
model_embeddings = embedder.transformer.text_model.embeddings
model_embeddings.token_embedding = EmbeddingsWithFixes(model_embeddings.token_embedding, self)
conditioner.embedders[i] = sd_hijack_clip.FrozenCLIPEmbedderForSDXLWithCustomWords(embedder, self)
text_cond_models.append(conditioner.embedders[i])
if typename == 'FrozenOpenCLIPEmbedder2':
embedder.model.token_embedding = EmbeddingsWithFixes(embedder.model.token_embedding, self, textual_inversion_key='clip_g')
conditioner.embedders[i] = sd_hijack_open_clip.FrozenOpenCLIPEmbedder2WithCustomWords(embedder, self)
text_cond_models.append(conditioner.embedders[i])
if len(text_cond_models) == 1:
m.cond_stage_model = text_cond_models[0]
else:
m.cond_stage_model = conditioner
if type(m.cond_stage_model) == xlmr.BertSeriesModelWithTransformation or type(m.cond_stage_model) == xlmr_m18.BertSeriesModelWithTransformation:
model_embeddings = m.cond_stage_model.roberta.embeddings
model_embeddings.token_embedding = EmbeddingsWithFixes(model_embeddings.word_embeddings, self)
m.cond_stage_model = sd_hijack_xlmr.FrozenXLMREmbedderWithCustomWords(m.cond_stage_model, self)
elif type(m.cond_stage_model) == ldm.modules.encoders.modules.FrozenCLIPEmbedder:
model_embeddings = m.cond_stage_model.transformer.text_model.embeddings
model_embeddings.token_embedding = EmbeddingsWithFixes(model_embeddings.token_embedding, self)
m.cond_stage_model = sd_hijack_clip.FrozenCLIPEmbedderWithCustomWords(m.cond_stage_model, self)
elif type(m.cond_stage_model) == ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder:
m.cond_stage_model.model.token_embedding = EmbeddingsWithFixes(m.cond_stage_model.model.token_embedding, self)
m.cond_stage_model = sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords(m.cond_stage_model, self)
apply_weighted_forward(m)
if m.cond_stage_key == "edit":
sd_hijack_unet.hijack_ddpm_edit()
self.apply_optimizations()
self.clip = m.cond_stage_model
def flatten(el):
flattened = [flatten(children) for children in el.children()]
res = [el]
for c in flattened:
res += c
return res
self.layers = flatten(m)
import modules.models.diffusion.ddpm_edit
if isinstance(m, ldm.models.diffusion.ddpm.LatentDiffusion):
sd_unet.original_forward = ldm_original_forward
elif isinstance(m, modules.models.diffusion.ddpm_edit.LatentDiffusion):
sd_unet.original_forward = ldm_original_forward
elif isinstance(m, sgm.models.diffusion.DiffusionEngine):
sd_unet.original_forward = sgm_original_forward
else:
sd_unet.original_forward = None
pass
def undo_hijack(self, m):
conditioner = getattr(m, 'conditioner', None)
if conditioner:
for i in range(len(conditioner.embedders)):
embedder = conditioner.embedders[i]
if isinstance(embedder, (sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords, sd_hijack_open_clip.FrozenOpenCLIPEmbedder2WithCustomWords)):
embedder.wrapped.model.token_embedding = embedder.wrapped.model.token_embedding.wrapped
conditioner.embedders[i] = embedder.wrapped
if isinstance(embedder, sd_hijack_clip.FrozenCLIPEmbedderForSDXLWithCustomWords):
embedder.wrapped.transformer.text_model.embeddings.token_embedding = embedder.wrapped.transformer.text_model.embeddings.token_embedding.wrapped
conditioner.embedders[i] = embedder.wrapped
if hasattr(m, 'cond_stage_model'):
delattr(m, 'cond_stage_model')
elif type(m.cond_stage_model) == sd_hijack_xlmr.FrozenXLMREmbedderWithCustomWords:
m.cond_stage_model = m.cond_stage_model.wrapped
elif type(m.cond_stage_model) == sd_hijack_clip.FrozenCLIPEmbedderWithCustomWords:
m.cond_stage_model = m.cond_stage_model.wrapped
model_embeddings = m.cond_stage_model.transformer.text_model.embeddings
if type(model_embeddings.token_embedding) == EmbeddingsWithFixes:
model_embeddings.token_embedding = model_embeddings.token_embedding.wrapped
elif type(m.cond_stage_model) == sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords:
m.cond_stage_model.wrapped.model.token_embedding = m.cond_stage_model.wrapped.model.token_embedding.wrapped
m.cond_stage_model = m.cond_stage_model.wrapped
undo_optimizations()
undo_weighted_forward(m)
self.apply_circular(False)
self.layers = None
self.clip = None
pass
def apply_circular(self, enable):
if self.circular_enabled == enable:
@@ -321,17 +160,12 @@ class StableDiffusionModelHijack:
self.comments = []
self.extra_generation_params = {}
def get_prompt_lengths(self, text):
if self.clip is None:
return "-", "-"
_, token_count = self.clip.process_texts([text])
return token_count, self.clip.get_target_prompt_token_count(token_count)
def get_prompt_lengths(self, text, cond_stage_model):
_, token_count = cond_stage_model.process_texts([text])
return token_count, cond_stage_model.get_target_prompt_token_count(token_count)
def redo_hijack(self, m):
self.undo_hijack(m)
self.hijack(m)
pass
class EmbeddingsWithFixes(torch.nn.Module):

View File

@@ -10,6 +10,7 @@ from omegaconf import OmegaConf, ListConfig
from os import mkdir
from urllib import request
import ldm.modules.midas as midas
import gc
from ldm.util import instantiate_from_config
@@ -17,6 +18,12 @@ from modules import paths, shared, modelloader, devices, script_callbacks, sd_va
from modules.timer import Timer
import tomesd
import numpy as np
from modules_forge import forge_loader
import modules_forge.ops as forge_ops
from ldm_patched.modules.ops import manual_cast
from ldm_patched.modules import model_management as model_management
import ldm_patched.modules.model_patcher
model_dir = "Stable-diffusion"
model_path = os.path.abspath(os.path.join(paths.models_path, model_dir))
@@ -366,26 +373,12 @@ def load_model_weights(model, checkpoint_info: CheckpointInfo, state_dict, timer
sd_model_hash = checkpoint_info.calculate_shorthash()
timer.record("calculate hash")
if devices.fp8:
# prevent model to load state dict in fp8
model.half()
if not SkipWritingToConfig.skip:
shared.opts.data["sd_model_checkpoint"] = checkpoint_info.title
if state_dict is None:
state_dict = get_checkpoint_state_dict(checkpoint_info, timer)
model.is_sdxl = hasattr(model, 'conditioner')
model.is_sd2 = not model.is_sdxl and hasattr(model.cond_stage_model, 'model')
model.is_sd1 = not model.is_sdxl and not model.is_sd2
model.is_ssd = model.is_sdxl and 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_q.weight' not in state_dict.keys()
if model.is_sdxl:
sd_models_xl.extend_sdxl(model)
if model.is_ssd:
sd_hijack.model_hijack.convert_sdxl_to_ssd(model)
if shared.opts.sd_checkpoint_cache > 0:
# cache newly loaded model
checkpoints_loaded[checkpoint_info] = state_dict.copy()
@@ -395,65 +388,6 @@ def load_model_weights(model, checkpoint_info: CheckpointInfo, state_dict, timer
del state_dict
if shared.cmd_opts.opt_channelslast:
model.to(memory_format=torch.channels_last)
timer.record("apply channels_last")
if shared.cmd_opts.no_half:
model.float()
model.alphas_cumprod_original = model.alphas_cumprod
devices.dtype_unet = torch.float32
timer.record("apply float()")
else:
vae = model.first_stage_model
depth_model = getattr(model, 'depth_model', None)
# with --no-half-vae, remove VAE from model when doing half() to prevent its weights from being converted to float16
if shared.cmd_opts.no_half_vae:
model.first_stage_model = None
# with --upcast-sampling, don't convert the depth model weights to float16
if shared.cmd_opts.upcast_sampling and depth_model:
model.depth_model = None
alphas_cumprod = model.alphas_cumprod
model.alphas_cumprod = None
model.half()
model.alphas_cumprod = alphas_cumprod
model.alphas_cumprod_original = alphas_cumprod
model.first_stage_model = vae
if depth_model:
model.depth_model = depth_model
devices.dtype_unet = torch.float16
timer.record("apply half()")
for module in model.modules():
if hasattr(module, 'fp16_weight'):
del module.fp16_weight
if hasattr(module, 'fp16_bias'):
del module.fp16_bias
if check_fp8(model):
devices.fp8 = True
first_stage = model.first_stage_model
model.first_stage_model = None
for module in model.modules():
if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
if shared.opts.cache_fp16_weight:
module.fp16_weight = module.weight.data.clone().cpu().half()
if module.bias is not None:
module.fp16_bias = module.bias.data.clone().cpu().half()
module.to(torch.float8_e4m3fn)
model.first_stage_model = first_stage
timer.record("apply fp8")
else:
devices.fp8 = False
devices.unet_needs_upcast = shared.cmd_opts.upcast_sampling and devices.dtype == torch.float16 and devices.dtype_unet == torch.float16
model.first_stage_model.to(devices.dtype_vae)
timer.record("apply dtype to VAE")
# clean up cache if limit is reached
while len(checkpoints_loaded) > shared.opts.sd_checkpoint_cache:
checkpoints_loaded.popitem(last=False)
@@ -590,14 +524,6 @@ class SdModelData:
sd_vae.loaded_vae_file = getattr(v, "loaded_vae_file", None)
sd_vae.checkpoint_info = v.sd_checkpoint_info
try:
self.loaded_sd_models.remove(v)
except ValueError:
pass
if v is not None:
self.loaded_sd_models.insert(0, v)
model_data = SdModelData()
@@ -615,31 +541,19 @@ def get_empty_cond(sd_model):
def send_model_to_cpu(m):
if m.lowvram:
lowvram.send_everything_to_cpu()
else:
m.to(devices.cpu)
devices.torch_gc()
pass
def model_target_device(m):
if lowvram.is_needed(m):
return devices.cpu
else:
return devices.device
return devices.device
def send_model_to_device(m):
lowvram.apply(m)
if not m.lowvram:
m.to(shared.device)
pass
def send_model_to_trash(m):
m.to(device="meta")
devices.torch_gc()
pass
def load_model(checkpoint_info=None, already_loaded_state_dict=None):
@@ -649,9 +563,14 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):
timer = Timer()
if model_data.sd_model:
send_model_to_trash(model_data.sd_model)
if model_data.sd_model.filename == checkpoint_info.filename:
return model_data.sd_model
model_data.sd_model = None
devices.torch_gc()
model_data.loaded_sd_models = []
model_management.unload_all_models()
model_management.soft_empty_cache()
gc.collect()
timer.record("unload existing model")
@@ -660,58 +579,27 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):
else:
state_dict = get_checkpoint_state_dict(checkpoint_info, timer)
checkpoint_config = sd_models_config.find_checkpoint_config(state_dict, checkpoint_info)
clip_is_included_into_sd = any(x for x in [sd1_clip_weight, sd2_clip_weight, sdxl_clip_weight, sdxl_refiner_clip_weight] if x in state_dict)
if shared.opts.sd_checkpoint_cache > 0:
# cache newly loaded model
checkpoints_loaded[checkpoint_info] = state_dict.copy()
timer.record("find config")
sd_model = forge_loader.load_model_for_a1111(timer=timer, checkpoint_info=checkpoint_info, state_dict=state_dict)
sd_model.filename = checkpoint_info.filename
sd_config = OmegaConf.load(checkpoint_config)
repair_config(sd_config)
del state_dict
timer.record("load config")
# clean up cache if limit is reached
while len(checkpoints_loaded) > shared.opts.sd_checkpoint_cache:
checkpoints_loaded.popitem(last=False)
print(f"Creating model from config: {checkpoint_config}")
shared.opts.data["sd_checkpoint_hash"] = checkpoint_info.sha256
sd_model = None
try:
with sd_disable_initialization.DisableInitialization(disable_clip=clip_is_included_into_sd or shared.cmd_opts.do_not_download_clip):
with sd_disable_initialization.InitializeOnMeta():
sd_model = instantiate_from_config(sd_config.model)
sd_vae.delete_base_vae()
sd_vae.clear_loaded_vae()
vae_file, vae_source = sd_vae.resolve_vae(checkpoint_info.filename).tuple()
sd_vae.load_vae(sd_model, vae_file, vae_source)
timer.record("load VAE")
except Exception as e:
errors.display(e, "creating model quickly", full_traceback=True)
if sd_model is None:
print('Failed to create model quickly; will retry using slow method.', file=sys.stderr)
with sd_disable_initialization.InitializeOnMeta():
sd_model = instantiate_from_config(sd_config.model)
sd_model.used_config = checkpoint_config
timer.record("create model")
if shared.cmd_opts.no_half:
weight_dtype_conversion = None
else:
weight_dtype_conversion = {
'first_stage_model': None,
'alphas_cumprod': None,
'': torch.float16,
}
with sd_disable_initialization.LoadStateDictOnMeta(state_dict, device=model_target_device(sd_model), weight_dtype_conversion=weight_dtype_conversion):
load_model_weights(sd_model, checkpoint_info, state_dict, timer)
timer.record("load weights from state dict")
send_model_to_device(sd_model)
timer.record("move model to device")
sd_hijack.model_hijack.hijack(sd_model)
timer.record("hijack")
sd_model.eval()
model_data.set_sd_model(sd_model)
model_data.was_loaded_at_least_once = True
@@ -723,7 +611,7 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):
timer.record("scripts callbacks")
with devices.autocast(), torch.no_grad():
with torch.no_grad():
sd_model.cond_stage_model_empty_prompt = get_empty_cond(sd_model)
timer.record("calculate empty prompt")
@@ -734,132 +622,14 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):
def reuse_model_from_already_loaded(sd_model, checkpoint_info, timer):
"""
Checks if the desired checkpoint from checkpoint_info is not already loaded in model_data.loaded_sd_models.
If it is loaded, returns that (moving it to GPU if necessary, and moving the currently loadded model to CPU if necessary).
If not, returns the model that can be used to load weights from checkpoint_info's file.
If no such model exists, returns None.
Additionaly deletes loaded models that are over the limit set in settings (sd_checkpoints_limit).
"""
already_loaded = None
for i in reversed(range(len(model_data.loaded_sd_models))):
loaded_model = model_data.loaded_sd_models[i]
if loaded_model.sd_checkpoint_info.filename == checkpoint_info.filename:
already_loaded = loaded_model
continue
if len(model_data.loaded_sd_models) > shared.opts.sd_checkpoints_limit > 0:
print(f"Unloading model {len(model_data.loaded_sd_models)} over the limit of {shared.opts.sd_checkpoints_limit}: {loaded_model.sd_checkpoint_info.title}")
model_data.loaded_sd_models.pop()
send_model_to_trash(loaded_model)
timer.record("send model to trash")
if shared.opts.sd_checkpoints_keep_in_cpu:
send_model_to_cpu(sd_model)
timer.record("send model to cpu")
if already_loaded is not None:
send_model_to_device(already_loaded)
timer.record("send model to device")
model_data.set_sd_model(already_loaded, already_loaded=True)
if not SkipWritingToConfig.skip:
shared.opts.data["sd_model_checkpoint"] = already_loaded.sd_checkpoint_info.title
shared.opts.data["sd_checkpoint_hash"] = already_loaded.sd_checkpoint_info.sha256
print(f"Using already loaded model {already_loaded.sd_checkpoint_info.title}: done in {timer.summary()}")
sd_vae.reload_vae_weights(already_loaded)
return model_data.sd_model
elif shared.opts.sd_checkpoints_limit > 1 and len(model_data.loaded_sd_models) < shared.opts.sd_checkpoints_limit:
print(f"Loading model {checkpoint_info.title} ({len(model_data.loaded_sd_models) + 1} out of {shared.opts.sd_checkpoints_limit})")
model_data.sd_model = None
load_model(checkpoint_info)
return model_data.sd_model
elif len(model_data.loaded_sd_models) > 0:
sd_model = model_data.loaded_sd_models.pop()
model_data.sd_model = sd_model
sd_vae.base_vae = getattr(sd_model, "base_vae", None)
sd_vae.loaded_vae_file = getattr(sd_model, "loaded_vae_file", None)
sd_vae.checkpoint_info = sd_model.sd_checkpoint_info
print(f"Reusing loaded model {sd_model.sd_checkpoint_info.title} to load {checkpoint_info.title}")
return sd_model
else:
return None
pass
def reload_model_weights(sd_model=None, info=None, forced_reload=False):
checkpoint_info = info or select_checkpoint()
timer = Timer()
if not sd_model:
sd_model = model_data.sd_model
if sd_model is None: # previous model load failed
current_checkpoint_info = None
else:
current_checkpoint_info = sd_model.sd_checkpoint_info
if check_fp8(sd_model) != devices.fp8:
# load from state dict again to prevent extra numerical errors
forced_reload = True
elif sd_model.sd_model_checkpoint == checkpoint_info.filename and not forced_reload:
return sd_model
sd_model = reuse_model_from_already_loaded(sd_model, checkpoint_info, timer)
if not forced_reload and sd_model is not None and sd_model.sd_checkpoint_info.filename == checkpoint_info.filename:
return sd_model
if sd_model is not None:
sd_unet.apply_unet("None")
send_model_to_cpu(sd_model)
sd_hijack.model_hijack.undo_hijack(sd_model)
state_dict = get_checkpoint_state_dict(checkpoint_info, timer)
checkpoint_config = sd_models_config.find_checkpoint_config(state_dict, checkpoint_info)
timer.record("find config")
if sd_model is None or checkpoint_config != sd_model.used_config:
if sd_model is not None:
send_model_to_trash(sd_model)
load_model(checkpoint_info, already_loaded_state_dict=state_dict)
return model_data.sd_model
try:
load_model_weights(sd_model, checkpoint_info, state_dict, timer)
except Exception:
print("Failed to load checkpoint, restoring previous")
load_model_weights(sd_model, current_checkpoint_info, None, timer)
raise
finally:
sd_hijack.model_hijack.hijack(sd_model)
timer.record("hijack")
if not sd_model.lowvram:
sd_model.to(devices.device)
timer.record("move model to device")
script_callbacks.model_loaded_callback(sd_model)
timer.record("script callbacks")
print(f"Weights loaded in {timer.summary()}.")
model_data.set_sd_model(sd_model)
sd_unet.apply_unet()
return sd_model
return load_model(info)
def unload_model_weights(sd_model=None, info=None):
send_model_to_cpu(sd_model or shared.sd_model)
return sd_model

View File

@@ -8,8 +8,12 @@ import sgm.modules.diffusionmodules.discretizer
from modules import devices, shared, prompt_parser
from modules import torch_utils
import ldm_patched.modules.model_management as model_management
def get_learned_conditioning(self: sgm.models.diffusion.DiffusionEngine, batch: prompt_parser.SdConditioning | list[str]):
model_management.load_model_gpu(self.forge_objects.clip.patcher)
for embedder in self.conditioner.embedders:
embedder.ucg_rate = 0.0
@@ -18,7 +22,7 @@ def get_learned_conditioning(self: sgm.models.diffusion.DiffusionEngine, batch:
is_negative_prompt = getattr(batch, 'is_negative_prompt', False)
aesthetic_score = shared.opts.sdxl_refiner_low_aesthetic_score if is_negative_prompt else shared.opts.sdxl_refiner_high_aesthetic_score
devices_args = dict(device=devices.device, dtype=devices.dtype)
devices_args = dict(device=self.forge_objects.clip.patcher.current_device, dtype=model_management.text_encoder_dtype())
sdxl_conds = {
"txt": batch,
@@ -34,14 +38,11 @@ def get_learned_conditioning(self: sgm.models.diffusion.DiffusionEngine, batch:
return c
def apply_model(self: sgm.models.diffusion.DiffusionEngine, x, t, cond):
sd = self.model.state_dict()
diffusion_model_input = sd.get('diffusion_model.input_blocks.0.0.weight', None)
if diffusion_model_input is not None:
if diffusion_model_input.shape[1] == 9:
x = torch.cat([x] + cond['c_concat'], dim=1)
def apply_model(self: sgm.models.diffusion.DiffusionEngine, x, t, cond, *args, **kwargs):
if self.model.diffusion_model.in_channels == 9:
x = torch.cat([x] + cond['c_concat'], dim=1)
return self.model(x, t, cond)
return self.model(x, t, cond, *args, **kwargs)
def get_first_stage_encoding(self, x): # SDXL's encode_first_stage does everything so get_first_stage_encoding is just there for compatibility

View File

@@ -6,6 +6,7 @@ import modules.shared as shared
from modules.script_callbacks import CFGDenoiserParams, cfg_denoiser_callback
from modules.script_callbacks import CFGDenoisedParams, cfg_denoised_callback
from modules.script_callbacks import AfterCFGCallbackParams, cfg_after_cfg_callback
from modules_forge import forge_sampler
def catenate_conds(conds):
@@ -66,7 +67,7 @@ class CFGDenoiser(torch.nn.Module):
def inner_model(self):
raise NotImplementedError()
def combine_denoised(self, x_out, conds_list, uncond, cond_scale):
def combine_denoised(self, x_out, conds_list, uncond, cond_scale, timestep, x_in, cond):
denoised_uncond = x_out[-uncond.shape[0]:]
denoised = torch.clone(denoised_uncond)
@@ -152,19 +153,13 @@ class CFGDenoiser(torch.nn.Module):
if state.interrupted or state.skipped:
raise sd_samplers_common.InterruptedException
if sd_samplers_common.apply_refiner(self):
if sd_samplers_common.apply_refiner(self, x):
cond = self.sampler.sampler_extra_args['cond']
uncond = self.sampler.sampler_extra_args['uncond']
# at self.image_cfg_scale == 1.0 produced results for edit model are the same as with normal sampling,
# so is_edit_model is set to False to support AND composition.
is_edit_model = shared.sd_model.cond_stage_key == "edit" and self.image_cfg_scale is not None and self.image_cfg_scale != 1.0
conds_list, tensor = prompt_parser.reconstruct_multicond_batch(cond, self.step)
cond = prompt_parser.reconstruct_multicond_batch(cond, self.step)
uncond = prompt_parser.reconstruct_cond_batch(uncond, self.step)
assert not is_edit_model or all(len(conds) == 1 for conds in conds_list), "AND is not supported for InstructPix2Pix checkpoint (unless using Image CFG scale = 1.0)"
# If we use masks, blending between the denoised and original latent images occurs here.
def apply_blend(current_latent):
blended_latent = current_latent * self.nmask + self.init_latent * self.mask
@@ -181,113 +176,12 @@ class CFGDenoiser(torch.nn.Module):
if self.mask_before_denoising and self.mask is not None:
x = apply_blend(x)
batch_size = len(conds_list)
repeats = [len(conds_list[i]) for i in range(batch_size)]
if shared.sd_model.model.conditioning_key == "crossattn-adm":
image_uncond = torch.zeros_like(image_cond)
make_condition_dict = lambda c_crossattn, c_adm: {"c_crossattn": [c_crossattn], "c_adm": c_adm}
else:
image_uncond = image_cond
if isinstance(uncond, dict):
make_condition_dict = lambda c_crossattn, c_concat: {**c_crossattn, "c_concat": [c_concat]}
else:
make_condition_dict = lambda c_crossattn, c_concat: {"c_crossattn": [c_crossattn], "c_concat": [c_concat]}
if not is_edit_model:
x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x])
sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma])
image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond])
else:
x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x] + [x])
sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma] + [sigma])
image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond] + [torch.zeros_like(self.init_latent)])
denoiser_params = CFGDenoiserParams(x_in, image_cond_in, sigma_in, state.sampling_step, state.sampling_steps, tensor, uncond, self)
denoiser_params = CFGDenoiserParams(x, image_cond, sigma, state.sampling_step, state.sampling_steps, cond, uncond, self)
cfg_denoiser_callback(denoiser_params)
x_in = denoiser_params.x
image_cond_in = denoiser_params.image_cond
sigma_in = denoiser_params.sigma
tensor = denoiser_params.text_cond
uncond = denoiser_params.text_uncond
skip_uncond = False
# alternating uncond allows for higher thresholds without the quality loss normally expected from raising it
if self.step % 2 and s_min_uncond > 0 and sigma[0] < s_min_uncond and not is_edit_model:
skip_uncond = True
x_in = x_in[:-batch_size]
sigma_in = sigma_in[:-batch_size]
self.padded_cond_uncond = False
self.padded_cond_uncond_v0 = False
if shared.opts.pad_cond_uncond and tensor.shape[1] != uncond.shape[1]:
tensor, uncond = self.pad_cond_uncond(tensor, uncond)
elif shared.opts.pad_cond_uncond_v0 and tensor.shape[1] != uncond.shape[1]:
tensor, uncond = self.pad_cond_uncond_v0(tensor, uncond)
if tensor.shape[1] == uncond.shape[1] or skip_uncond:
if is_edit_model:
cond_in = catenate_conds([tensor, uncond, uncond])
elif skip_uncond:
cond_in = tensor
else:
cond_in = catenate_conds([tensor, uncond])
if shared.opts.batch_cond_uncond:
x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
else:
x_out = torch.zeros_like(x_in)
for batch_offset in range(0, x_out.shape[0], batch_size):
a = batch_offset
b = a + batch_size
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(subscript_cond(cond_in, a, b), image_cond_in[a:b]))
else:
x_out = torch.zeros_like(x_in)
batch_size = batch_size*2 if shared.opts.batch_cond_uncond else batch_size
for batch_offset in range(0, tensor.shape[0], batch_size):
a = batch_offset
b = min(a + batch_size, tensor.shape[0])
if not is_edit_model:
c_crossattn = subscript_cond(tensor, a, b)
else:
c_crossattn = torch.cat([tensor[a:b]], uncond)
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
if not skip_uncond:
x_out[-uncond.shape[0]:] = self.inner_model(x_in[-uncond.shape[0]:], sigma_in[-uncond.shape[0]:], cond=make_condition_dict(uncond, image_cond_in[-uncond.shape[0]:]))
denoised_image_indexes = [x[0][0] for x in conds_list]
if skip_uncond:
fake_uncond = torch.cat([x_out[i:i+1] for i in denoised_image_indexes])
x_out = torch.cat([x_out, fake_uncond]) # we skipped uncond denoising, so we put cond-denoised image to where the uncond-denoised image should be
denoised_params = CFGDenoisedParams(x_out, state.sampling_step, state.sampling_steps, self.inner_model)
cfg_denoised_callback(denoised_params)
devices.test_for_nans(x_out, "unet")
if is_edit_model:
denoised = self.combine_denoised_for_edit_model(x_out, cond_scale)
elif skip_uncond:
denoised = self.combine_denoised(x_out, conds_list, uncond, 1.0)
else:
denoised = self.combine_denoised(x_out, conds_list, uncond, cond_scale)
# Blend in the original latents (after)
if not self.mask_before_denoising and self.mask is not None:
denoised = apply_blend(denoised)
self.sampler.last_latent = self.get_pred_x0(torch.cat([x_in[i:i + 1] for i in denoised_image_indexes]), torch.cat([x_out[i:i + 1] for i in denoised_image_indexes]), sigma)
if opts.live_preview_content == "Prompt":
preview = self.sampler.last_latent
elif opts.live_preview_content == "Negative prompt":
preview = self.get_pred_x0(x_in[-uncond.shape[0]:], x_out[-uncond.shape[0]:], sigma)
else:
preview = self.get_pred_x0(torch.cat([x_in[i:i+1] for i in denoised_image_indexes]), torch.cat([denoised[i:i+1] for i in denoised_image_indexes]), sigma)
denoised = forge_sampler.forge_sample(self, denoiser_params=denoiser_params, cond_scale=cond_scale)
preview = self.sampler.last_latent = denoised
sd_samplers_common.store_latent(preview)
after_cfg_callback_params = AfterCFGCallbackParams(denoised, state.sampling_step, state.sampling_steps)

View File

@@ -5,6 +5,7 @@ import torch
from PIL import Image
from modules import devices, images, sd_vae_approx, sd_samplers, sd_vae_taesd, shared, sd_models
from modules.shared import opts, state
from ldm_patched.modules import model_management
import k_diffusion.sampling
@@ -39,9 +40,7 @@ def samples_to_images_tensor(sample, approximation=None, model=None):
if approximation is None or (shared.state.interrupted and opts.live_preview_fast_interrupt):
approximation = approximation_indexes.get(opts.show_progress_type, 0)
from modules import lowvram
if approximation == 0 and lowvram.is_enabled(shared.sd_model) and not shared.opts.live_preview_allow_lowvram_full:
if approximation == 0:
approximation = 1
if approximation == 2:
@@ -54,8 +53,7 @@ def samples_to_images_tensor(sample, approximation=None, model=None):
else:
if model is None:
model = shared.sd_model
with devices.without_autocast(): # fixes an issue with unstable VAEs that are flaky even in fp32
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
x_sample = model.decode_first_stage(sample)
return x_sample
@@ -71,7 +69,6 @@ def single_sample_to_image(sample, approximation=None):
def decode_first_stage(model, x):
x = x.to(devices.dtype_vae)
approx_index = approximation_indexes.get(opts.sd_vae_decode_method, 0)
return samples_to_images_tensor(x, approx_index, model)
@@ -95,7 +92,6 @@ def images_tensor_to_samples(image, approximation=None, model=None):
else:
if model is None:
model = shared.sd_model
model.first_stage_model.to(devices.dtype_vae)
image = image.to(shared.device, dtype=devices.dtype_vae)
image = image * 2 - 1
@@ -155,7 +151,7 @@ def replace_torchsde_browinan():
replace_torchsde_browinan()
def apply_refiner(cfg_denoiser):
def apply_refiner(cfg_denoiser, x):
completed_ratio = cfg_denoiser.step / cfg_denoiser.total_steps
refiner_switch_at = cfg_denoiser.p.refiner_switch_at
refiner_checkpoint_info = cfg_denoiser.p.refiner_checkpoint_info
@@ -184,10 +180,17 @@ def apply_refiner(cfg_denoiser):
with sd_models.SkipWritingToConfig():
sd_models.reload_model_weights(info=refiner_checkpoint_info)
refiner = sd_models.model_data.get_sd_model()
devices.torch_gc()
cfg_denoiser.p.setup_conds()
cfg_denoiser.update_inner_model()
inference_memory = refiner.current_controlnet_required_memory
unet_patcher = refiner.forge_objects.unet
model_management.load_models_gpu(
[unet_patcher],
unet_patcher.memory_required([x.shape[0]] + list(x.shape[1:])) + inference_memory)
return True

View File

@@ -7,6 +7,8 @@ from modules.script_callbacks import ExtraNoiseParams, extra_noise_callback
from modules.shared import opts
import modules.shared as shared
import ldm_patched.modules.model_management
samplers_k_diffusion = [
('DPM++ 2M Karras', 'sample_dpmpp_2m', ['k_dpmpp_2m_ka'], {'scheduler': 'karras'}),
@@ -139,11 +141,21 @@ class KDiffusionSampler(sd_samplers_common.Sampler):
return sigmas
def sample_img2img(self, p, x, noise, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
unet_patcher = self.model_wrap.inner_model.forge_objects.unet
ldm_patched.modules.model_management.load_models_gpu(
[unet_patcher],
unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
self.model_wrap.log_sigmas = self.model_wrap.log_sigmas.to(unet_patcher.current_device)
self.model_wrap.sigmas = self.model_wrap.sigmas.to(unet_patcher.current_device)
steps, t_enc = sd_samplers_common.setup_img2img_steps(p, steps)
sigmas = self.get_sigmas(p, steps)
sigma_sched = sigmas[steps - t_enc - 1:]
x = x.to(noise)
xi = x + noise * sigma_sched[0]
if opts.img2img_extra_noise > 0:
@@ -192,6 +204,15 @@ class KDiffusionSampler(sd_samplers_common.Sampler):
return samples
def sample(self, p, x, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
unet_patcher = self.model_wrap.inner_model.forge_objects.unet
ldm_patched.modules.model_management.load_models_gpu(
[unet_patcher],
unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
self.model_wrap.log_sigmas = self.model_wrap.log_sigmas.to(unet_patcher.current_device)
self.model_wrap.sigmas = self.model_wrap.sigmas.to(unet_patcher.current_device)
steps = steps or p.steps
sigmas = self.get_sigmas(p, steps)

View File

@@ -34,7 +34,7 @@ class LCMCompVisDenoiser(DiscreteEpsDDPMDenoiser):
def sigma_to_t(self, sigma, quantize=None):
log_sigma = sigma.log()
dists = log_sigma - self.log_sigmas[:, None]
dists = log_sigma - self.log_sigmas.to(sigma)[:, None]
return dists.abs().argmin(dim=0).view(sigma.shape) * self.skip_steps + (self.skip_steps - 1)

View File

@@ -7,6 +7,8 @@ from modules.script_callbacks import ExtraNoiseParams, extra_noise_callback
from modules.shared import opts
import modules.shared as shared
import ldm_patched.modules.model_management
samplers_timesteps = [
('DDIM', sd_samplers_timesteps_impl.ddim, ['ddim'], {}),
@@ -54,6 +56,7 @@ class CFGDenoiserTimesteps(CFGDenoiser):
def get_pred_x0(self, x_in, x_out, sigma):
ts = sigma.to(dtype=int)
self.alphas = self.alphas.to(ts.device)
a_t = self.alphas[ts][:, None, None, None]
sqrt_one_minus_at = (1 - a_t).sqrt()
@@ -95,6 +98,14 @@ class CompVisSampler(sd_samplers_common.Sampler):
return timesteps
def sample_img2img(self, p, x, noise, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
unet_patcher = self.model_wrap.inner_model.forge_objects.unet
ldm_patched.modules.model_management.load_models_gpu(
[unet_patcher],
unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
self.model_wrap.inner_model.alphas_cumprod = self.model_wrap.inner_model.alphas_cumprod.to(unet_patcher.current_device)
steps, t_enc = sd_samplers_common.setup_img2img_steps(p, steps)
timesteps = self.get_timesteps(p, steps)
@@ -104,7 +115,7 @@ class CompVisSampler(sd_samplers_common.Sampler):
sqrt_alpha_cumprod = torch.sqrt(alphas_cumprod[timesteps[t_enc]])
sqrt_one_minus_alpha_cumprod = torch.sqrt(1 - alphas_cumprod[timesteps[t_enc]])
xi = x * sqrt_alpha_cumprod + noise * sqrt_one_minus_alpha_cumprod
xi = x.to(noise) * sqrt_alpha_cumprod + noise * sqrt_one_minus_alpha_cumprod
if opts.img2img_extra_noise > 0:
p.extra_generation_params["Extra noise"] = opts.img2img_extra_noise
@@ -138,6 +149,14 @@ class CompVisSampler(sd_samplers_common.Sampler):
return samples
def sample(self, p, x, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
unet_patcher = self.model_wrap.inner_model.forge_objects.unet
ldm_patched.modules.model_management.load_models_gpu(
[unet_patcher],
unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
self.model_wrap.inner_model.alphas_cumprod = self.model_wrap.inner_model.alphas_cumprod.to(unet_patcher.current_device)
steps = steps or p.steps
timesteps = self.get_timesteps(p, steps)

View File

@@ -237,7 +237,6 @@ def load_vae(model, vae_file=None, vae_source="from unknown source"):
# don't call this from outside
def _load_vae_dict(model, vae_dict_1):
model.first_stage_model.load_state_dict(vae_dict_1)
model.first_stage_model.to(devices.dtype_vae)
def clear_loaded_vae():
@@ -263,20 +262,12 @@ def reload_vae_weights(sd_model=None, vae_file=unspecified):
if loaded_vae_file == vae_file:
return
if sd_model.lowvram:
lowvram.send_everything_to_cpu()
else:
sd_model.to(devices.cpu)
sd_hijack.model_hijack.undo_hijack(sd_model)
load_vae(sd_model, vae_file, vae_source)
sd_hijack.model_hijack.hijack(sd_model)
if not sd_model.lowvram:
sd_model.to(devices.device)
script_callbacks.model_loaded_callback(sd_model)
print("VAE weights loaded.")

View File

@@ -24,13 +24,6 @@ def initialize():
pass
from modules import devices
devices.device, devices.device_interrogate, devices.device_gfpgan, devices.device_esrgan, devices.device_codeformer = \
(devices.cpu if any(y in cmd_opts.use_cpu for y in [x, 'all']) else devices.get_optimal_device() for x in ['sd', 'interrogate', 'gfpgan', 'esrgan', 'codeformer'])
devices.dtype = torch.float32 if cmd_opts.no_half else torch.float16
devices.dtype_vae = torch.float32 if cmd_opts.no_half or cmd_opts.no_half_vae else torch.float16
devices.dtype_inference = torch.float32 if cmd_opts.precision == 'full' else devices.dtype
shared.device = devices.device
shared.weight_load_location = None if cmd_opts.lowram else "cpu"

View File

@@ -299,7 +299,7 @@ options_templates.update(options_section(('ui_alternatives', "UI alternatives",
options_templates.update(options_section(('ui', "User interface", "ui"), {
"localization": OptionInfo("None", "Localization", gr.Dropdown, lambda: {"choices": ["None"] + list(localization.localizations.keys())}, refresh=lambda: localization.list_localizations(cmd_opts.localizations_dir)).needs_reload_ui(),
"quicksettings_list": OptionInfo(["sd_model_checkpoint"], "Quicksettings list", ui_components.DropdownMulti, lambda: {"choices": list(shared.opts.data_labels.keys())}).js("info", "settingsHintsShowQuicksettings").info("setting entries that appear at the top of page rather than in settings tab").needs_reload_ui(),
"quicksettings_list": OptionInfo(["sd_model_checkpoint", "sd_vae", "CLIP_stop_at_last_layers"], "Quicksettings list", ui_components.DropdownMulti, lambda: {"choices": list(shared.opts.data_labels.keys())}).js("info", "settingsHintsShowQuicksettings").info("setting entries that appear at the top of page rather than in settings tab").needs_reload_ui(),
"ui_tab_order": OptionInfo([], "UI tab order", ui_components.DropdownMulti, lambda: {"choices": list(shared.tab_names)}).needs_reload_ui(),
"hidden_tabs": OptionInfo([], "Hidden UI tabs", ui_components.DropdownMulti, lambda: {"choices": list(shared.tab_names)}).needs_reload_ui(),
"ui_reorder_list": OptionInfo([], "UI item order for txt2img/img2img tabs", ui_components.DropdownMulti, lambda: {"choices": list(shared_items.ui_reorder_categories())}).info("selected items appear first").needs_reload_ui(),

View File

@@ -167,9 +167,15 @@ def update_token_counter(text, steps, *, is_positive=True):
# messages related to it in console
prompt_schedules = [[[steps, text]]]
try:
cond_stage_model = sd_models.model_data.sd_model.cond_stage_model
assert cond_stage_model is not None
except Exception:
return f"<span class='gr-box gr-text-input'>?/?</span>"
flat_prompts = reduce(lambda list1, list2: list1+list2, prompt_schedules)
prompts = [prompt_text for step, prompt_text in flat_prompts]
token_count, max_length = max([model_hijack.get_prompt_lengths(prompt) for prompt in prompts], key=lambda args: args[0])
token_count, max_length = max([model_hijack.get_prompt_lengths(prompt, cond_stage_model) for prompt in prompts], key=lambda args: args[0])
return f"<span class='gr-box gr-text-input'>{token_count}/{max_length}</span>"

View File

@@ -294,7 +294,6 @@ class UiSettings:
for _i, k, _item in self.quicksettings_list:
component = self.component_dict[k]
info = opts.data_labels[k]
if isinstance(component, gr.Textbox):
methods = [component.submit, component.blur]
@@ -308,7 +307,7 @@ class UiSettings:
fn=lambda value, k=k: self.run_settings_single(value, key=k),
inputs=[component],
outputs=[component, self.text_settings],
show_progress=info.refresh is not None,
show_progress=False,
)
button_set_checkpoint = gr.Button('Change checkpoint', elem_id='change_checkpoint', visible=False)

View File

@@ -6,6 +6,13 @@ from PIL import Image
import modules.shared
from modules import modelloader, shared
from ldm_patched.modules import model_management
def prepare_free_memory():
model_management.free_memory(memory_required=1024*1024*3, device=model_management.get_torch_device())
print('Upscale script freed memory successfully.')
LANCZOS = (Image.Resampling.LANCZOS if hasattr(Image, 'Resampling') else Image.LANCZOS)
NEAREST = (Image.Resampling.NEAREST if hasattr(Image, 'Resampling') else Image.NEAREST)

View File

@@ -0,0 +1,84 @@
from modules.sd_hijack_clip import FrozenCLIPEmbedderWithCustomWords
from ldm_patched.modules import model_management
from modules.shared import opts
class CLIP_SD_15_L(FrozenCLIPEmbedderWithCustomWords):
def encode_with_transformers(self, tokens):
model_management.load_model_gpu(self.forge_objects.clip.patcher)
outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
if opts.CLIP_stop_at_last_layers > 1:
z = outputs.hidden_states[-opts.CLIP_stop_at_last_layers]
z = self.wrapped.transformer.text_model.final_layer_norm(z)
else:
z = outputs.last_hidden_state
return z
class CLIP_SD_21_H(FrozenCLIPEmbedderWithCustomWords):
def __init__(self, wrapped, hijack):
super().__init__(wrapped, hijack)
if self.wrapped.layer == "penultimate":
self.wrapped.layer = "hidden"
self.wrapped.layer_idx = -2
self.id_start = 49406
self.id_end = 49407
self.id_pad = 0
def encode_with_transformers(self, tokens):
model_management.load_model_gpu(self.forge_objects.clip.patcher)
outputs = self.wrapped.transformer(tokens, output_hidden_states=self.wrapped.layer == "hidden")
if self.wrapped.layer == "last":
z = outputs.last_hidden_state
else:
z = outputs.hidden_states[self.wrapped.layer_idx]
z = self.wrapped.transformer.text_model.final_layer_norm(z)
return z
class CLIP_SD_XL_L(FrozenCLIPEmbedderWithCustomWords):
def __init__(self, wrapped, hijack):
super().__init__(wrapped, hijack)
def encode_with_transformers(self, tokens):
outputs = self.wrapped.transformer(tokens, output_hidden_states=self.wrapped.layer == "hidden")
if self.wrapped.layer == "last":
z = outputs.last_hidden_state
else:
z = outputs.hidden_states[self.wrapped.layer_idx]
return z
class CLIP_SD_XL_G(FrozenCLIPEmbedderWithCustomWords):
def __init__(self, wrapped, hijack):
super().__init__(wrapped, hijack)
if self.wrapped.layer == "penultimate":
self.wrapped.layer = "hidden"
self.wrapped.layer_idx = -2
self.id_start = 49406
self.id_end = 49407
self.id_pad = 0
def encode_with_transformers(self, tokens):
outputs = self.wrapped.transformer(tokens, output_hidden_states=self.wrapped.layer == "hidden")
if self.wrapped.layer == "last":
z = outputs.last_hidden_state
else:
z = outputs.hidden_states[self.wrapped.layer_idx]
pooled_output = outputs.pooler_output
text_projection = self.wrapped.text_projection
pooled_output = pooled_output.float().to(text_projection.device) @ text_projection.float()
z.pooled = pooled_output
return z

View File

@@ -0,0 +1,248 @@
import torch
import contextlib
from ldm_patched.modules import model_management
from ldm_patched.modules import model_detection
from ldm_patched.modules.sd import VAE, CLIP, load_model_weights
import ldm_patched.modules.model_patcher
import ldm_patched.modules.utils
import ldm_patched.modules.clip_vision
from omegaconf import OmegaConf
from modules.sd_models_config import find_checkpoint_config
from modules.shared import cmd_opts
from modules import sd_hijack
from modules.sd_models_xl import extend_sdxl
from ldm.util import instantiate_from_config
from modules_forge import forge_clip
import open_clip
from transformers import CLIPTextModel, CLIPTokenizer
class FakeObject:
def __init__(self, *args, **kwargs):
super().__init__()
self.visual = None
return
def eval(self, *args, **kwargs):
return self
def parameters(self, *args, **kwargs):
return []
class ForgeSD:
def __init__(self, unet, clip, vae, clipvision):
self.unet = unet
self.clip = clip
self.vae = vae
self.clipvision = clipvision
def shallow_copy(self):
return ForgeSD(self.unet, self.clip, self.vae, self.clipvision)
@contextlib.contextmanager
def no_clip():
backup_openclip = open_clip.create_model_and_transforms
backup_CLIPTextModel = CLIPTextModel.from_pretrained
backup_CLIPTokenizer = CLIPTokenizer.from_pretrained
try:
open_clip.create_model_and_transforms = lambda *args, **kwargs: (FakeObject(), None, None)
CLIPTextModel.from_pretrained = lambda *args, **kwargs: FakeObject()
CLIPTokenizer.from_pretrained = lambda *args, **kwargs: FakeObject()
yield
finally:
open_clip.create_model_and_transforms = backup_openclip
CLIPTextModel.from_pretrained = backup_CLIPTextModel
CLIPTokenizer.from_pretrained = backup_CLIPTokenizer
return
def load_checkpoint_guess_config(sd, output_vae=True, output_clip=True, output_clipvision=False, embedding_directory=None, output_model=True):
sd_keys = sd.keys()
clip = None
clipvision = None
vae = None
model = None
model_patcher = None
clip_target = None
parameters = ldm_patched.modules.utils.calculate_parameters(sd, "model.diffusion_model.")
unet_dtype = model_management.unet_dtype(model_params=parameters)
load_device = model_management.get_torch_device()
manual_cast_dtype = model_management.unet_manual_cast(unet_dtype, load_device)
class WeightsLoader(torch.nn.Module):
pass
model_config = model_detection.model_config_from_unet(sd, "model.diffusion_model.", unet_dtype)
model_config.set_manual_cast(manual_cast_dtype)
if model_config is None:
raise RuntimeError("ERROR: Could not detect model type")
if model_config.clip_vision_prefix is not None:
if output_clipvision:
clipvision = ldm_patched.modules.clip_vision.load_clipvision_from_sd(sd, model_config.clip_vision_prefix, True)
if output_model:
inital_load_device = model_management.unet_inital_load_device(parameters, unet_dtype)
offload_device = model_management.unet_offload_device()
model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
model.load_model_weights(sd, "model.diffusion_model.")
if output_vae:
vae_sd = ldm_patched.modules.utils.state_dict_prefix_replace(sd, {"first_stage_model.": ""}, filter_keys=True)
vae_sd = model_config.process_vae_state_dict(vae_sd)
vae = VAE(sd=vae_sd)
if output_clip:
w = WeightsLoader()
clip_target = model_config.clip_target()
if clip_target is not None:
clip = CLIP(clip_target, embedding_directory=embedding_directory)
w.cond_stage_model = clip.cond_stage_model
sd = model_config.process_clip_state_dict(sd)
load_model_weights(w, sd)
left_over = sd.keys()
if len(left_over) > 0:
print("left over keys:", left_over)
if output_model:
model_patcher = ldm_patched.modules.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device(), current_device=inital_load_device)
if inital_load_device != torch.device("cpu"):
print("loaded straight to GPU")
model_management.load_model_gpu(model_patcher)
return ForgeSD(model_patcher, clip, vae, clipvision)
def load_model_for_a1111(timer, checkpoint_info=None, state_dict=None):
a1111_config_filename = find_checkpoint_config(state_dict, checkpoint_info)
a1111_config = OmegaConf.load(a1111_config_filename)
timer.record("forge solving config")
if hasattr(a1111_config.model.params, 'network_config'):
a1111_config.model.params.network_config.target = 'modules_forge.forge_loader.FakeObject'
if hasattr(a1111_config.model.params, 'unet_config'):
a1111_config.model.params.unet_config.target = 'modules_forge.forge_loader.FakeObject'
if hasattr(a1111_config.model.params, 'first_stage_config'):
a1111_config.model.params.first_stage_config.target = 'modules_forge.forge_loader.FakeObject'
with no_clip():
sd_model = instantiate_from_config(a1111_config.model)
timer.record("forge instantiate config")
forge_objects = load_checkpoint_guess_config(
state_dict,
output_vae=True,
output_clip=True,
output_clipvision=True,
embedding_directory=cmd_opts.embeddings_dir,
output_model=True
)
sd_model.forge_objects = forge_objects
sd_model.forge_objects_original = forge_objects.shallow_copy()
sd_model.forge_objects_after_applying_lora = forge_objects.shallow_copy()
timer.record("forge load real models")
sd_model.first_stage_model = forge_objects.vae.first_stage_model
sd_model.model.diffusion_model = forge_objects.unet.model.diffusion_model
conditioner = getattr(sd_model, 'conditioner', None)
if conditioner:
text_cond_models = []
for i in range(len(conditioner.embedders)):
embedder = conditioner.embedders[i]
typename = type(embedder).__name__
if typename == 'FrozenCLIPEmbedder': # SDXL Clip L
embedder.tokenizer = forge_objects.clip.tokenizer.clip_l.tokenizer
embedder.transformer = forge_objects.clip.cond_stage_model.clip_l.transformer
model_embeddings = embedder.transformer.text_model.embeddings
model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
model_embeddings.token_embedding, sd_hijack.model_hijack)
embedder = forge_clip.CLIP_SD_XL_L(embedder, sd_hijack.model_hijack)
embedder.forge_objects = forge_objects
conditioner.embedders[i] = embedder
text_cond_models.append(embedder)
elif typename == 'FrozenOpenCLIPEmbedder2': # SDXL Clip G
embedder.tokenizer = forge_objects.clip.tokenizer.clip_g.tokenizer
embedder.transformer = forge_objects.clip.cond_stage_model.clip_g.transformer
embedder.text_projection = forge_objects.clip.cond_stage_model.clip_g.text_projection
model_embeddings = embedder.transformer.text_model.embeddings
model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
model_embeddings.token_embedding, sd_hijack.model_hijack, textual_inversion_key='clip_g')
embedder = forge_clip.CLIP_SD_XL_G(embedder, sd_hijack.model_hijack)
embedder.forge_objects = forge_objects
conditioner.embedders[i] = embedder
text_cond_models.append(embedder)
if len(text_cond_models) == 1:
sd_model.cond_stage_model = text_cond_models[0]
else:
sd_model.cond_stage_model = conditioner
elif type(sd_model.cond_stage_model).__name__ == 'FrozenCLIPEmbedder': # SD15 Clip
sd_model.cond_stage_model.tokenizer = forge_objects.clip.tokenizer.clip_l.tokenizer
sd_model.cond_stage_model.transformer = forge_objects.clip.cond_stage_model.clip_l.transformer
model_embeddings = sd_model.cond_stage_model.transformer.text_model.embeddings
model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
model_embeddings.token_embedding, sd_hijack.model_hijack)
sd_model.cond_stage_model = forge_clip.CLIP_SD_15_L(sd_model.cond_stage_model, sd_hijack.model_hijack)
sd_model.cond_stage_model.forge_objects = forge_objects
elif type(sd_model.cond_stage_model).__name__ == 'FrozenOpenCLIPEmbedder': # SD21 Clip
sd_model.cond_stage_model.tokenizer = forge_objects.clip.tokenizer.clip_h.tokenizer
sd_model.cond_stage_model.transformer = forge_objects.clip.cond_stage_model.clip_h.transformer
model_embeddings = sd_model.cond_stage_model.transformer.text_model.embeddings
model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
model_embeddings.token_embedding, sd_hijack.model_hijack)
sd_model.cond_stage_model = forge_clip.CLIP_SD_21_H(sd_model.cond_stage_model, sd_hijack.model_hijack)
sd_model.cond_stage_model.forge_objects = forge_objects
else:
raise NotImplementedError('Bad Clip Class Name:' + type(sd_model.cond_stage_model).__name__)
timer.record("forge set components")
sd_model_hash = checkpoint_info.calculate_shorthash()
timer.record("calculate hash")
sd_model.is_sdxl = conditioner is not None
sd_model.is_sd2 = not sd_model.is_sdxl and hasattr(sd_model.cond_stage_model, 'model')
sd_model.is_sd1 = not sd_model.is_sdxl and not sd_model.is_sd2
sd_model.is_ssd = sd_model.is_sdxl and 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_q.weight' not in sd_model.state_dict().keys()
if sd_model.is_sdxl:
extend_sdxl(sd_model)
sd_model.sd_model_hash = sd_model_hash
sd_model.sd_model_checkpoint = checkpoint_info.filename
sd_model.sd_checkpoint_info = checkpoint_info
def patched_decode_first_stage(x):
sample = forge_objects.unet.model.model_config.latent_format.process_out(x)
sample = forge_objects.vae.decode(sample).movedim(-1, 1) * 2.0 - 1.0
return sample.to(x)
def patched_encode_first_stage(x):
sample = forge_objects.vae.encode(x.movedim(1, -1) * 0.5 + 0.5)
sample = forge_objects.unet.model.model_config.latent_format.process_in(sample)
return sample.to(x)
sd_model.ema_scope = lambda *args, **kwargs: contextlib.nullcontext()
sd_model.get_first_stage_encoding = lambda x: x
sd_model.decode_first_stage = patched_decode_first_stage
sd_model.encode_first_stage = patched_encode_first_stage
sd_model.clip = sd_model.cond_stage_model
sd_model.current_controlnet_required_memory = 0
timer.record("forge finalize")
sd_model.current_lora_hash = str([])
return sd_model

View File

@@ -0,0 +1,49 @@
import torch
from ldm_patched.modules.conds import CONDRegular, CONDCrossAttn
from ldm_patched.modules.samplers import sampling_function
def cond_from_a1111_to_patched_ldm(cond):
if isinstance(cond, torch.Tensor):
result = dict(
cross_attn=cond,
model_conds=dict(
c_crossattn=CONDCrossAttn(cond),
)
)
return [result, ]
cross_attn = cond['crossattn']
pooled_output = cond['vector']
result = dict(
cross_attn=cross_attn,
pooled_output=pooled_output,
model_conds=dict(
c_crossattn=CONDCrossAttn(cross_attn),
y=CONDRegular(pooled_output)
)
)
return [result, ]
def forge_sample(self, denoiser_params, cond_scale):
model = self.inner_model.inner_model.forge_objects.unet.model
x = denoiser_params.x
timestep = denoiser_params.sigma
uncond = cond_from_a1111_to_patched_ldm(denoiser_params.text_uncond)
cond = cond_from_a1111_to_patched_ldm(denoiser_params.text_cond)
model_options = self.inner_model.inner_model.forge_objects.unet.model_options
seed = self.p.seeds[0]
image_cond_in = denoiser_params.image_cond
if isinstance(image_cond_in, torch.Tensor):
if image_cond_in.shape[0] == x.shape[0] \
and image_cond_in.shape[2] == x.shape[2] \
and image_cond_in.shape[3] == x.shape[3]:
uncond[0]['model_conds']['c_concat'] = CONDRegular(image_cond_in)
cond[0]['model_conds']['c_concat'] = CONDRegular(image_cond_in)
denoised = sampling_function(model, x, timestep, uncond, cond, cond_scale, model_options, seed)
return denoised

View File

@@ -0,0 +1,70 @@
import torch
import numpy as np
import os
import time
import random
import string
def generate_random_filename(extension=".txt"):
timestamp = time.strftime("%Y%m%d-%H%M%S")
random_string = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
filename = f"{timestamp}-{random_string}{extension}"
return filename
@torch.no_grad()
@torch.inference_mode()
def pytorch_to_numpy(x):
return [np.clip(255. * y.cpu().numpy(), 0, 255).astype(np.uint8) for y in x]
@torch.no_grad()
@torch.inference_mode()
def numpy_to_pytorch(x):
y = x.astype(np.float32) / 255.0
y = y[None]
y = np.ascontiguousarray(y.copy())
y = torch.from_numpy(y).float()
return y
def write_images_to_mp4(frame_list: list, filename=None, fps=6):
from modules.paths_internal import default_output_dir
video_folder = os.path.join(default_output_dir, 'svd')
os.makedirs(video_folder, exist_ok=True)
if filename is None:
filename = generate_random_filename('.mp4')
full_path = os.path.join(video_folder, filename)
try:
import av
except ImportError:
from launch import run_pip
run_pip(
"install imageio[pyav]",
"imageio[pyav]",
)
import av
options = {
"crf": str(23)
}
output = av.open(full_path, "w")
stream = output.add_stream('libx264', fps, options=options)
stream.width = frame_list[0].shape[1]
stream.height = frame_list[0].shape[0]
for img in frame_list:
frame = av.VideoFrame.from_ndarray(img)
packet = stream.encode(frame)
output.mux(packet)
packet = stream.encode(None)
output.mux(packet)
output.close()
return full_path

View File

@@ -0,0 +1 @@
version = '0.0.1'

View File

@@ -0,0 +1,54 @@
def gradio_compile(items, prefix):
names = []
for k, v in items["required"].items():
t = v[0]
d = v[1] if len(v) > 1 else None
if prefix != '':
name = (prefix + '_' + k).replace(' ', '_').lower()
else:
name = k.replace(' ', '_').lower()
title = name.replace('_', ' ').title()
if t == 'INT':
default = int(d['default'])
min = int(d['min'])
max = int(d['max'])
step = int(d.get('step', 1))
print(f'{name} = gr.Slider(label=\'{title}\', minimum={min}, maximum={max}, step={step}, value={default})')
names.append(name)
elif t == 'FLOAT':
default = float(d['default'])
min = float(d['min'])
max = float(d['max'])
step = float(d.get('step', 0.001))
print(f'{name} = gr.Slider(label=\'{title}\', minimum={min}, maximum={max}, step={step}, value={default})')
names.append(name)
elif isinstance(t, list):
print(f'{name} = gr.Radio(label=\'{title}\', choices={str(t)}, value=\'{t[0]}\')')
names.append(name)
elif t == 'MODEL':
pass
elif t == 'CONDITIONING':
pass
elif t == 'LATENT':
pass
elif t == 'CLIP_VISION':
pass
elif t == 'IMAGE':
pass
elif t == 'VAE':
pass
else:
print('error ' + str(t))
return names
# from modules_forge.gradio_compile import gradio_compile
# ps = []
# ps += gradio_compile(SVD_img2vid_Conditioning.INPUT_TYPES(), prefix='')
# ps += gradio_compile(KSampler.INPUT_TYPES(), prefix='sampling')
# ps += gradio_compile(VideoLinearCFGGuidance.INPUT_TYPES(), prefix='guidance')
# print(', '.join(ps))

View File

@@ -0,0 +1,24 @@
def initialize_forge():
from ldm_patched.modules import args_parser
args_parser.args, _ = args_parser.parser.parse_known_args()
import ldm_patched.modules.model_management as model_management
import torch
device = model_management.get_torch_device()
torch.zeros((1, 1)).to(device, torch.float32)
model_management.soft_empty_cache()
import modules_forge.patch_clip
modules_forge.patch_clip.patch_all_clip()
import modules_forge.patch_precision
modules_forge.patch_precision.patch_all_precision()
if model_management.directml_enabled:
model_management.lowvram_available = True
model_management.OOM_EXCEPTION = Exception
return

19
modules_forge/ops.py Normal file
View File

@@ -0,0 +1,19 @@
import torch
import contextlib
@contextlib.contextmanager
def use_patched_ops(operations):
op_names = ['Linear', 'Conv2d', 'Conv3d', 'GroupNorm', 'LayerNorm']
backups = {op_name: getattr(torch.nn, op_name) for op_name in op_names}
try:
for op_name in op_names:
setattr(torch.nn, op_name, getattr(operations, op_name))
yield
finally:
for op_name in op_names:
setattr(torch.nn, op_name, backups[op_name])
return

112
modules_forge/patch_clip.py Normal file
View File

@@ -0,0 +1,112 @@
# Consistent with Kohya/A1111 to reduce differences between model training and inference.
import os
import torch
import ldm_patched.controlnet.cldm
import ldm_patched.k_diffusion.sampling
import ldm_patched.ldm.modules.attention
import ldm_patched.ldm.modules.diffusionmodules.model
import ldm_patched.ldm.modules.diffusionmodules.openaimodel
import ldm_patched.ldm.modules.diffusionmodules.openaimodel
import ldm_patched.modules.args_parser
import ldm_patched.modules.model_base
import ldm_patched.modules.model_management
import ldm_patched.modules.model_patcher
import ldm_patched.modules.samplers
import ldm_patched.modules.sd
import ldm_patched.modules.sd1_clip
import ldm_patched.modules.clip_vision
import ldm_patched.modules.ops as ops
from modules_forge.ops import use_patched_ops
from transformers import CLIPTextModel, CLIPTextConfig, modeling_utils
def patched_SDClipModel__init__(self, max_length=77, freeze=True, layer="last", layer_idx=None,
textmodel_json_config=None, dtype=None, special_tokens=None,
layer_norm_hidden_state=True, **kwargs):
torch.nn.Module.__init__(self)
assert layer in self.LAYERS
if special_tokens is None:
special_tokens = {"start": 49406, "end": 49407, "pad": 49407}
if textmodel_json_config is None:
textmodel_json_config = os.path.join(os.path.dirname(os.path.realpath(ldm_patched.modules.sd1_clip.__file__)),
"sd1_clip_config.json")
config = CLIPTextConfig.from_json_file(textmodel_json_config)
self.num_layers = config.num_hidden_layers
with use_patched_ops(ops.manual_cast):
with modeling_utils.no_init_weights():
self.transformer = CLIPTextModel(config)
if dtype is not None:
self.transformer.to(dtype)
self.transformer.text_model.embeddings.to(torch.float32)
if freeze:
self.freeze()
self.max_length = max_length
self.layer = layer
self.layer_idx = None
self.special_tokens = special_tokens
self.text_projection = torch.nn.Parameter(torch.eye(self.transformer.get_input_embeddings().weight.shape[1]))
self.logit_scale = torch.nn.Parameter(torch.tensor(4.6055))
self.enable_attention_masks = False
self.layer_norm_hidden_state = layer_norm_hidden_state
if layer == "hidden":
assert layer_idx is not None
assert abs(layer_idx) < self.num_layers
self.clip_layer(layer_idx)
self.layer_default = (self.layer, self.layer_idx)
def patched_SDClipModel_forward(self, tokens):
backup_embeds = self.transformer.get_input_embeddings()
device = backup_embeds.weight.device
tokens = self.set_up_textual_embeddings(tokens, backup_embeds)
tokens = torch.LongTensor(tokens).to(device)
attention_mask = None
if self.enable_attention_masks:
attention_mask = torch.zeros_like(tokens)
max_token = self.transformer.get_input_embeddings().weight.shape[0] - 1
for x in range(attention_mask.shape[0]):
for y in range(attention_mask.shape[1]):
attention_mask[x, y] = 1
if tokens[x, y] == max_token:
break
outputs = self.transformer(input_ids=tokens, attention_mask=attention_mask,
output_hidden_states=self.layer == "hidden")
self.transformer.set_input_embeddings(backup_embeds)
if self.layer == "last":
z = outputs.last_hidden_state
elif self.layer == "pooled":
z = outputs.pooler_output[:, None, :]
else:
z = outputs.hidden_states[self.layer_idx]
if self.layer_norm_hidden_state:
z = self.transformer.text_model.final_layer_norm(z)
if hasattr(outputs, "pooler_output"):
pooled_output = outputs.pooler_output.float()
else:
pooled_output = None
if self.text_projection is not None and pooled_output is not None:
pooled_output = pooled_output.float().to(self.text_projection.device) @ self.text_projection.float()
return z.float(), pooled_output
def patch_all_clip():
ldm_patched.modules.sd1_clip.SDClipModel.__init__ = patched_SDClipModel__init__
ldm_patched.modules.sd1_clip.SDClipModel.forward = patched_SDClipModel_forward
return

View File

@@ -0,0 +1,60 @@
# Consistent with Kohya to reduce differences between model training and inference.
import torch
import math
import einops
import numpy as np
import ldm_patched.ldm.modules.diffusionmodules.openaimodel
import ldm_patched.modules.model_sampling
import ldm_patched.modules.sd1_clip
from ldm_patched.ldm.modules.diffusionmodules.util import make_beta_schedule
def patched_timestep_embedding(timesteps, dim, max_period=10000, repeat_only=False):
# Consistent with Kohya to reduce differences between model training and inference.
if not repeat_only:
half = dim // 2
freqs = torch.exp(
-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half
).to(device=timesteps.device)
args = timesteps[:, None].float() * freqs[None]
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
if dim % 2:
embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
else:
embedding = einops.repeat(timesteps, 'b -> b d', d=dim)
return embedding
def patched_register_schedule(self, given_betas=None, beta_schedule="linear", timesteps=1000,
linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):
# Consistent with Kohya to reduce differences between model training and inference.
if given_betas is not None:
betas = given_betas
else:
betas = make_beta_schedule(
beta_schedule,
timesteps,
linear_start=linear_start,
linear_end=linear_end,
cosine_s=cosine_s)
alphas = 1. - betas
alphas_cumprod = np.cumprod(alphas, axis=0)
timesteps, = betas.shape
self.num_timesteps = int(timesteps)
self.linear_start = linear_start
self.linear_end = linear_end
sigmas = torch.tensor(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5, dtype=torch.float32)
self.set_sigmas(sigmas)
return
def patch_all_precision():
ldm_patched.ldm.modules.diffusionmodules.openaimodel.timestep_embedding = patched_timestep_embedding
ldm_patched.modules.model_sampling.ModelSamplingDiscrete._register_schedule = patched_register_schedule
return

View File

@@ -5,4 +5,14 @@ set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=
@REM Uncomment following code to reference an existing A1111 checkout.
@REM set A1111_HOME=Your A1111 checkout dir
@REM
@REM set VENV_DIR=%A1111_HOME%/venv
@REM set COMMANDLINE_ARGS=%COMMANDLINE_ARGS%^
@REM --ckpt-dir %A1111_HOME%/models/Stable-diffusion^
@REM --hypernetwork-dir %A1111_HOME%/models/hypernetworks^
@REM --embeddings-dir %A1111_HOME%/models/embeddings^
@REM --lora-dir %A1111_HOME%/models/Lora
call webui.bat