Merge branch 'main' into dev

2026-03-13 00:49:48 +00:00 · 2024-01-27 15:48:30 -08:00
parent 67100a3dfe f78e5f783f
commit 40518feb6f
66 changed files with 2746 additions and 2892 deletions
--- a/13
+++ b/13
@@ -1,12 +1 @@
-*       @AUTOMATIC1111
-
-# if you were managing a localization and were removed from this file, this is because
-# the intended way to do localizations now is via extensions. See:
-# https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Developing-extensions
-# Make a repo with your localization and since you are still listed as a collaborator
-# you can add it to the wiki page yourself. This change is because some people complained
-# the git commit log is cluttered with things unrelated to almost everyone and
-# because I believe this is the best overall for the project to handle localizations almost
-# entirely without my oversight.
-
-
+*       @lllyasviel
--- a/README.md
+++ b/README.md
@@ -1,182 +1,370 @@
-# Stable Diffusion web UI
-A web interface for Stable Diffusion, implemented using Gradio library.
-
-![](screenshot.png)
-
-## Features
-[Detailed feature showcase with images](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features):
- Original txt2img and img2img modes
- One click install and run script (but you still must install python and git)
- Outpainting
- Inpainting
- Color Sketch
- Prompt Matrix
- Stable Diffusion Upscale
- Attention, specify parts of text that the model should pay more attention to
-    - a man in a `((tuxedo))` - will pay more attention to tuxedo
-    - a man in a `(tuxedo:1.21)` - alternative syntax
-    - select text and press `Ctrl+Up` or `Ctrl+Down` (or `Command+Up` or `Command+Down` if you're on a MacOS) to automatically adjust attention to selected text (code contributed by anonymous user)
- Loopback, run img2img processing multiple times
- X/Y/Z plot, a way to draw a 3 dimensional plot of images with different parameters
- Textual Inversion
-    - have as many embeddings as you want and use any names you like for them
-    - use multiple embeddings with different numbers of vectors per token
-    - works with half precision floating point numbers
-    - train embeddings on 8GB (also reports of 6GB working)
- Extras tab with:
-    - GFPGAN, neural network that fixes faces
-    - CodeFormer, face restoration tool as an alternative to GFPGAN
-    - RealESRGAN, neural network upscaler
-    - ESRGAN, neural network upscaler with a lot of third party models
-    - SwinIR and Swin2SR ([see here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2092)), neural network upscalers
-    - LDSR, Latent diffusion super resolution upscaling
- Resizing aspect ratio options
- Sampling method selection
-    - Adjust sampler eta values (noise multiplier)
-    - More advanced noise setting options
- Interrupt processing at any time
- 4GB video card support (also reports of 2GB working)
- Correct seeds for batches
- Live prompt token length validation
- Generation parameters
-     - parameters you used to generate images are saved with that image
-     - in PNG chunks for PNG, in EXIF for JPEG
-     - can drag the image to PNG info tab to restore generation parameters and automatically copy them into UI
-     - can be disabled in settings
-     - drag and drop an image/text-parameters to promptbox
- Read Generation Parameters Button, loads parameters in promptbox to UI
- Settings page
- Running arbitrary python code from UI (must run with `--allow-code` to enable)
- Mouseover hints for most UI elements
- Possible to change defaults/mix/max/step values for UI elements via text config
- Tiling support, a checkbox to create images that can be tiled like textures
- Progress bar and live image generation preview
-    - Can use a separate neural network to produce previews with almost none VRAM or compute requirement
- Negative prompt, an extra text field that allows you to list what you don't want to see in generated image
- Styles, a way to save part of prompt and easily apply them via dropdown later
- Variations, a way to generate same image but with tiny differences
- Seed resizing, a way to generate same image but at slightly different resolution
- CLIP interrogator, a button that tries to guess prompt from an image
- Prompt Editing, a way to change prompt mid-generation, say to start making a watermelon and switch to anime girl midway
- Batch Processing, process a group of files using img2img
- Img2img Alternative, reverse Euler method of cross attention control
- Highres Fix, a convenience option to produce high resolution pictures in one click without usual distortions
- Reloading checkpoints on the fly
- Checkpoint Merger, a tab that allows you to merge up to 3 checkpoints into one
- [Custom scripts](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Custom-Scripts) with many extensions from community
- [Composable-Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/), a way to use multiple prompts at once
-     - separate prompts using uppercase `AND`
-     - also supports weights for prompts: `a cat :1.2 AND a dog AND a penguin :2.2`
- No token limit for prompts (original stable diffusion lets you use up to 75 tokens)
- DeepDanbooru integration, creates danbooru style tags for anime prompts
- [xformers](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Xformers), major speed increase for select cards: (add `--xformers` to commandline args)
- via extension: [History tab](https://github.com/yfszzx/stable-diffusion-webui-images-browser): view, direct and delete images conveniently within the UI
- Generate forever option
- Training tab
-     - hypernetworks and embeddings options
-     - Preprocessing images: cropping, mirroring, autotagging using BLIP or deepdanbooru (for anime)
- Clip skip
- Hypernetworks
- Loras (same as Hypernetworks but more pretty)
- A separate UI where you can choose, with preview, which embeddings, hypernetworks or Loras to add to your prompt 
- Can select to load a different VAE from settings screen
- Estimated completion time in progress bar
- API
- Support for dedicated [inpainting model](https://github.com/runwayml/stable-diffusion#inpainting-with-stable-diffusion) by RunwayML
- via extension: [Aesthetic Gradients](https://github.com/AUTOMATIC1111/stable-diffusion-webui-aesthetic-gradients), a way to generate images with a specific aesthetic by using clip images embeds (implementation of [https://github.com/vicgalle/stable-diffusion-aesthetic-gradients](https://github.com/vicgalle/stable-diffusion-aesthetic-gradients))
- [Stable Diffusion 2.0](https://github.com/Stability-AI/stablediffusion) support - see [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#stable-diffusion-20) for instructions
- [Alt-Diffusion](https://arxiv.org/abs/2211.06679) support - see [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#alt-diffusion) for instructions
- Now without any bad letters!
- Load checkpoints in safetensors format
- Eased resolution restriction: generated image's dimensions must be a multiple of 8 rather than 64
- Now with a license!
- Reorder elements in the UI from settings screen
- [Segmind Stable Diffusion](https://huggingface.co/segmind/SSD-1B) support
-
-## Installation and Running
-Make sure the required [dependencies](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Dependencies) are met and follow the instructions available for:
- [NVidia](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs) (recommended)
- [AMD](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs) GPUs.
- [Intel CPUs, Intel GPUs (both integrated and discrete)](https://github.com/openvinotoolkit/stable-diffusion-webui/wiki/Installation-on-Intel-Silicon) (external wiki page)
-
-Alternatively, use online services (like Google Colab):
-
- [List of Online Services](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Online-Services)
-
-### Installation on Windows 10/11 with NVidia-GPUs using release package
-1. Download `sd.webui.zip` from [v1.0.0-pre](https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.0.0-pre) and extract its contents.
-2. Run `update.bat`.
-3. Run `run.bat`.
-> For more details see [Install-and-Run-on-NVidia-GPUs](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs)
-
-### Automatic Installation on Windows
-1. Install [Python 3.10.6](https://www.python.org/downloads/release/python-3106/) (Newer version of Python does not support torch), checking "Add Python to PATH".
-2. Install [git](https://git-scm.com/download/win).
-3. Download the stable-diffusion-webui repository, for example by running `git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git`.
-4. Run `webui-user.bat` from Windows Explorer as normal, non-administrator, user.
-
-### Automatic Installation on Linux
-1. Install the dependencies:
-```bash
-# Debian-based:
-sudo apt install wget git python3 python3-venv libgl1 libglib2.0-0
-# Red Hat-based:
-sudo dnf install wget git python3 gperftools-libs libglvnd-glx 
-# openSUSE-based:
-sudo zypper install wget git python3 libtcmalloc4 libglvnd
-# Arch-based:
-sudo pacman -S wget git python3
-```
-2. Navigate to the directory you would like the webui to be installed and execute the following command:
-```bash
-wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
-```
-3. Run `webui.sh`.
-4. Check `webui-user.sh` for options.
-### Installation on Apple Silicon
-
-Find the instructions [here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon).
-
-## Contributing
-Here's how to add code to this repo: [Contributing](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Contributing)
-
-## Documentation
-
-The documentation was moved from this README over to the project's [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki).
-
-For the purposes of getting Google and other search engines to crawl the wiki, here's a link to the (not for humans) [crawlable wiki](https://github-wiki-see.page/m/AUTOMATIC1111/stable-diffusion-webui/wiki).
-
-## Credits
-Licenses for borrowed code can be found in `Settings -> Licenses` screen, and also in `html/licenses.html` file.
-
- Stable Diffusion - https://github.com/Stability-AI/stablediffusion, https://github.com/CompVis/taming-transformers
- k-diffusion - https://github.com/crowsonkb/k-diffusion.git
- Spandrel - https://github.com/chaiNNer-org/spandrel implementing
-  - GFPGAN - https://github.com/TencentARC/GFPGAN.git
-  - CodeFormer - https://github.com/sczhou/CodeFormer
-  - ESRGAN - https://github.com/xinntao/ESRGAN
-  - SwinIR - https://github.com/JingyunLiang/SwinIR
-  - Swin2SR - https://github.com/mv-lab/swin2sr
- LDSR - https://github.com/Hafiidz/latent-diffusion
- MiDaS - https://github.com/isl-org/MiDaS
- Ideas for optimizations - https://github.com/basujindal/stable-diffusion
- Cross Attention layer optimization - Doggettx - https://github.com/Doggettx/stable-diffusion, original idea for prompt editing.
- Cross Attention layer optimization - InvokeAI, lstein - https://github.com/invoke-ai/InvokeAI (originally http://github.com/lstein/stable-diffusion)
- Sub-quadratic Cross Attention layer optimization - Alex Birch (https://github.com/Birch-san/diffusers/pull/1), Amin Rezaei (https://github.com/AminRezaei0x443/memory-efficient-attention)
- Textual Inversion - Rinon Gal - https://github.com/rinongal/textual_inversion (we're not using his code, but we are using his ideas).
- Idea for SD upscale - https://github.com/jquesnelle/txt2imghd
- Noise generation for outpainting mk2 - https://github.com/parlance-zz/g-diffuser-bot
- CLIP interrogator idea and borrowing some code - https://github.com/pharmapsychotic/clip-interrogator
- Idea for Composable Diffusion - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
- xformers - https://github.com/facebookresearch/xformers
- DeepDanbooru - interrogator for anime diffusers https://github.com/KichangKim/DeepDanbooru
- Sampling in float32 precision from a float16 UNet - marunine for the idea, Birch-san for the example Diffusers implementation (https://github.com/Birch-san/diffusers-play/tree/92feee6)
- Instruct pix2pix - Tim Brooks (star), Aleksander Holynski (star), Alexei A. Efros (no star) - https://github.com/timothybrooks/instruct-pix2pix
- Security advice - RyotaK
- UniPC sampler - Wenliang Zhao - https://github.com/wl-zhao/UniPC
- TAESD - Ollin Boer Bohan - https://github.com/madebyollin/taesd
- LyCORIS - KohakuBlueleaf
- Restart sampling - lambertae - https://github.com/Newbeeer/diffusion_restart_sampling
- Hypertile - tfernd - https://github.com/tfernd/HyperTile
- Initial Gradio script - posted on 4chan by an Anonymous user. Thank you Anonymous user.
- (You)
+# This is a Private Project
+
+Currently, we are only sending invitations to people who may be interested in development of this project.
+
+Please do not share code or information from this project publicly.
+
+If you see this, please join our private Discord server for discussion: https://discord.gg/2uvDhfAZ
+
+# Stable Diffusion Web UI Forge
+
+Stable Diffusion Web UI Forge is a platform built on top of [Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to make development easier and to optimize speed and resource consumption.
+
+The name "Forge" is inspired by "Minecraft Forge". This project will become SD WebUI's Forge.
+
+Forge will give you:
+
+1. Improved optimization. (Fastest speed and minimal memory use among all alternative software.)
+2. Patchable UNet and CLIP objects. (Developer-friendly platform.)
+
+# Improved Optimization
+
+I tested on several devices, and this is a typical result from a laptop 3070 Ti with 8GB VRAM running SDXL.
+
+**This is WebUI:**
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/c32baedd-500b-408f-8cfb-ed4570c883bd)
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/cb6098de-f2d4-4b25-9566-df4302dda396)
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/5447e8b7-f3ca-4003-9961-02027c8181e8)
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/f3cb57d9-ac7a-4667-8b3f-303139e38afa)
+
+(average about 7.4GB/8GB, peak at about 7.9GB/8GB)
+
+**This is WebUI Forge:**
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/0c45cd98-0b14-42c3-9556-28e48d4d5fa0)
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/3a71f5d4-39e5-4ab1-81cf-8eaa790a2dc8)
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/65fbb4a5-ee73-4bb9-9c5f-8a958cd9674d)
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/76f181a1-c5fb-4323-a6cc-b6308a45587e)
+
+(average and peak are both 6.3GB/8GB)
+
+Also, you can see that Forge does not change WebUI results. Installing Forge is not a seed-breaking change.
+
+We do not change the UI, but you will see the Forge version here:
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/93fdbccf-2f9b-4d45-ad81-c7c4106a357b)
+
+"f0.0.1v1.7.0" means WebUI 1.7.0 with Forge 0.0.1
+
+### Changes
+
+Forge removes all of WebUI's code related to speed and memory optimization and reworks everything. All previous cmd flags such as medvram, lowvram, medvram-sdxl, precision full, no half, no half vae, attention_xxx, upcast unet, ... are REMOVED. Adding these flags will not cause an error, but they no longer do anything. **We highly encourage Forge users to remove all cmd flags and let Forge decide how to load models.**
+
+Without any cmd flag, Forge can run SDXL with 4GB VRAM and SD1.5 with 2GB VRAM.
+
+**The only flag that you may still need** is `--always-offload-from-vram` (this flag will make things **slower**). This option makes Forge always unload models from VRAM. This can be useful if you run multiple programs together and want Forge to use less VRAM and leave some for other software, or when you are using old extensions that compete for VRAM with the main UI, or (very rarely) when you get OOM.
+
+If you really want to play with cmd flags, you can additionally control the GPU with:
+
+(extreme VRAM cases)
+
+    --always-gpu
+    --always-cpu
+
+(rare attention cases)
+
+    --attention-split
+    --attention-quad
+    --attention-pytorch
+    --disable-xformers
+    --disable-attention-upcast
+
+(floating point type)
+
+    --all-in-fp32
+    --all-in-fp16
+    --unet-in-bf16
+    --unet-in-fp16
+    --unet-in-fp8-e4m3fn
+    --unet-in-fp8-e5m2
+    --vae-in-fp16
+    --vae-in-fp32
+    --vae-in-bf16
+    --clip-in-fp8-e4m3fn
+    --clip-in-fp8-e5m2
+    --clip-in-fp16
+    --clip-in-fp32
+
+(rare platforms)
+
+    --directml
+    --disable-ipex-hijack
+    --pytorch-deterministic
+
+Again, Forge does not recommend using any cmd flags unless you are very sure that you really need them.
+
+# Patchable UNet
+
+Now developing an extension is super simple. We finally have patchable UNet.
+
+Below, a single file with 80 lines of code adds support for FreeU:
+
+`extensions-builtin/sd_forge_freeu/scripts/forge_freeu.py`
+
+```python
+import torch
+import gradio as gr
+from modules import scripts
+
+
+def Fourier_filter(x, threshold, scale):
+    x_freq = torch.fft.fftn(x.float(), dim=(-2, -1))
+    x_freq = torch.fft.fftshift(x_freq, dim=(-2, -1))
+    B, C, H, W = x_freq.shape
+    mask = torch.ones((B, C, H, W), device=x.device)
+    crow, ccol = H // 2, W // 2
+    mask[..., crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
+    x_freq = x_freq * mask
+    x_freq = torch.fft.ifftshift(x_freq, dim=(-2, -1))
+    x_filtered = torch.fft.ifftn(x_freq, dim=(-2, -1)).real
+    return x_filtered.to(x.dtype)
+
+
+def set_freeu_v2_patch(model, b1, b2, s1, s2):
+    model_channels = model.model.model_config.unet_config["model_channels"]
+    scale_dict = {model_channels * 4: (b1, s1), model_channels * 2: (b2, s2)}
+
+    def output_block_patch(h, hsp, *args, **kwargs):
+        scale = scale_dict.get(h.shape[1], None)
+        if scale is not None:
+            hidden_mean = h.mean(1).unsqueeze(1)
+            B = hidden_mean.shape[0]
+            hidden_max, _ = torch.max(hidden_mean.view(B, -1), dim=-1, keepdim=True)
+            hidden_min, _ = torch.min(hidden_mean.view(B, -1), dim=-1, keepdim=True)
+            hidden_mean = (hidden_mean - hidden_min.unsqueeze(2).unsqueeze(3)) / \
+                          (hidden_max - hidden_min).unsqueeze(2).unsqueeze(3)
+            h[:, :h.shape[1] // 2] = h[:, :h.shape[1] // 2] * ((scale[0] - 1) * hidden_mean + 1)
+            hsp = Fourier_filter(hsp, threshold=1, scale=scale[1])
+        return h, hsp
+
+    m = model.clone()
+    m.set_model_output_block_patch(output_block_patch)
+    return m
+
+
+class FreeUForForge(scripts.Script):
+    def title(self):
+        return "FreeU Integrated"
+
+    def show(self, is_img2img):
+        # make this extension visible in both txt2img and img2img tab.
+        return scripts.AlwaysVisible
+
+    def ui(self, *args, **kwargs):
+        with gr.Accordion(open=False, label=self.title()):
+            freeu_enabled = gr.Checkbox(label='Enabled', value=False)
+            freeu_b1 = gr.Slider(label='B1', minimum=0, maximum=2, step=0.01, value=1.01)
+            freeu_b2 = gr.Slider(label='B2', minimum=0, maximum=2, step=0.01, value=1.02)
+            freeu_s1 = gr.Slider(label='S1', minimum=0, maximum=4, step=0.01, value=0.99)
+            freeu_s2 = gr.Slider(label='S2', minimum=0, maximum=4, step=0.01, value=0.95)
+
+        return freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2
+
+    def process_batch(self, p, *script_args, **kwargs):
+        freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2 = script_args
+
+        if not freeu_enabled:
+            return
+
+        unet = p.sd_model.forge_objects.unet
+
+        unet = set_freeu_v2_patch(unet, freeu_b1, freeu_b2, freeu_s1, freeu_s2)
+
+        p.sd_model.forge_objects.unet = unet
+
+        # The code below adds some logs to the text below the image outputs in the UI.
+        # The extra_generation_params does not influence results.
+        p.extra_generation_params.update(dict(
+            freeu_enabled=freeu_enabled,
+            freeu_b1=freeu_b1,
+            freeu_b2=freeu_b2,
+            freeu_s1=freeu_s1,
+            freeu_s2=freeu_s2,
+        ))
+
+        return
+```
+
+It looks like this:
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/a7798cf2-057c-43e0-883a-5f8643af8529)
+
+Similar components such as HyperTile, KohyaHighResFix, and SAG can all be implemented within 100 lines of code (see also the code).
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/e2fc1b73-e6ee-405e-864c-c67afd92a1db)
+
+ControlNets can finally be called by different extensions. (80% of ControlNet's code can be removed now; this work will start soon.)
+
+Implementing Stable Video Diffusion and Zero123 is also super simple now (see also the code).
+
+*Stable Video Diffusion:*
+
+`extensions-builtin/sd_forge_svd/scripts/forge_svd.py`
+
+```python
+import torch
+import gradio as gr
+import os
+import pathlib
+
+from modules import script_callbacks
+from modules.paths import models_path
+from modules.ui_common import ToolButton, refresh_symbol
+from modules import shared
+
+from modules_forge.forge_util import numpy_to_pytorch, pytorch_to_numpy
+from ldm_patched.modules.sd import load_checkpoint_guess_config
+from ldm_patched.contrib.external_video_model import VideoLinearCFGGuidance, SVD_img2vid_Conditioning
+from ldm_patched.contrib.external import KSampler, VAEDecode
+
+
+opVideoLinearCFGGuidance = VideoLinearCFGGuidance()
+opSVD_img2vid_Conditioning = SVD_img2vid_Conditioning()
+opKSampler = KSampler()
+opVAEDecode = VAEDecode()
+
+svd_root = os.path.join(models_path, 'svd')
+os.makedirs(svd_root, exist_ok=True)
+svd_filenames = []
+
+
+def update_svd_filenames():
+    global svd_filenames
+    svd_filenames = [
+        pathlib.Path(x).name for x in
+        shared.walk_files(svd_root, allowed_extensions=[".pt", ".ckpt", ".safetensors"])
+    ]
+    return svd_filenames
+
+
+@torch.inference_mode()
+@torch.no_grad()
+def predict(filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
+            sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
+            sampling_denoise, guidance_min_cfg, input_image):
+    filename = os.path.join(svd_root, filename)
+    model_raw, _, vae, clip_vision = \
+        load_checkpoint_guess_config(filename, output_vae=True, output_clip=False, output_clipvision=True)
+    model = opVideoLinearCFGGuidance.patch(model_raw, guidance_min_cfg)[0]
+    init_image = numpy_to_pytorch(input_image)
+    positive, negative, latent_image = opSVD_img2vid_Conditioning.encode(
+        clip_vision, init_image, vae, width, height, video_frames, motion_bucket_id, fps, augmentation_level)
+    output_latent = opKSampler.sample(model, sampling_seed, sampling_steps, sampling_cfg,
+                                      sampling_sampler_name, sampling_scheduler, positive,
+                                      negative, latent_image, sampling_denoise)[0]
+    output_pixels = opVAEDecode.decode(vae, output_latent)[0]
+    outputs = pytorch_to_numpy(output_pixels)
+    return outputs
+
+
+def on_ui_tabs():
+    with gr.Blocks() as svd_block:
+        with gr.Row():
+            with gr.Column():
+                input_image = gr.Image(label='Input Image', source='upload', type='numpy', height=400)
+
+                with gr.Row():
+                    filename = gr.Dropdown(label="SVD Checkpoint Filename",
+                                           choices=svd_filenames,
+                                           value=svd_filenames[0] if len(svd_filenames) > 0 else None)
+                    refresh_button = ToolButton(value=refresh_symbol, tooltip="Refresh")
+                    refresh_button.click(
+                        fn=lambda: gr.update(choices=update_svd_filenames()),
+                        inputs=[], outputs=filename)
+
+                width = gr.Slider(label='Width', minimum=16, maximum=8192, step=8, value=1024)
+                height = gr.Slider(label='Height', minimum=16, maximum=8192, step=8, value=576)
+                video_frames = gr.Slider(label='Video Frames', minimum=1, maximum=4096, step=1, value=14)
+                motion_bucket_id = gr.Slider(label='Motion Bucket Id', minimum=1, maximum=1023, step=1, value=127)
+                fps = gr.Slider(label='Fps', minimum=1, maximum=1024, step=1, value=6)
+                augmentation_level = gr.Slider(label='Augmentation Level', minimum=0.0, maximum=10.0, step=0.01,
+                                               value=0.0)
+                sampling_steps = gr.Slider(label='Sampling Steps', minimum=1, maximum=200, step=1, value=20)
+                sampling_cfg = gr.Slider(label='CFG Scale', minimum=0.0, maximum=50.0, step=0.1, value=2.5)
+                sampling_denoise = gr.Slider(label='Sampling Denoise', minimum=0.0, maximum=1.0, step=0.01, value=1.0)
+                guidance_min_cfg = gr.Slider(label='Guidance Min Cfg', minimum=0.0, maximum=100.0, step=0.5, value=1.0)
+                sampling_sampler_name = gr.Radio(label='Sampler Name',
+                                                 choices=['euler', 'euler_ancestral', 'heun', 'heunpp2', 'dpm_2',
+                                                          'dpm_2_ancestral', 'lms', 'dpm_fast', 'dpm_adaptive',
+                                                          'dpmpp_2s_ancestral', 'dpmpp_sde', 'dpmpp_sde_gpu',
+                                                          'dpmpp_2m', 'dpmpp_2m_sde', 'dpmpp_2m_sde_gpu',
+                                                          'dpmpp_3m_sde', 'dpmpp_3m_sde_gpu', 'ddpm', 'lcm', 'ddim',
+                                                          'uni_pc', 'uni_pc_bh2'], value='euler')
+                sampling_scheduler = gr.Radio(label='Scheduler',
+                                              choices=['normal', 'karras', 'exponential', 'sgm_uniform', 'simple',
+                                                       'ddim_uniform'], value='karras')
+                sampling_seed = gr.Number(label='Seed', value=12345, precision=0)
+
+                generate_button = gr.Button(value="Generate")
+
+                ctrls = [filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
+                         sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
+                         sampling_denoise, guidance_min_cfg, input_image]
+
+            with gr.Column():
+                output_gallery = gr.Gallery(label='Gallery', show_label=False, object_fit='contain',
+                                            visible=True, height=1024, columns=4)
+
+        generate_button.click(predict, inputs=ctrls, outputs=[output_gallery])
+    return [(svd_block, "SVD", "svd")]
+
+
+update_svd_filenames()
+script_callbacks.on_ui_tabs(on_ui_tabs)
+```
+
+Note that although the code above looks independent, it will automatically offload/unload any other models. For example, below I open the webui, load SDXL, generate an image, then go to SVD and generate image frames. You can see that GPU memory is well managed: SDXL is moved to RAM and then SVD is moved to the GPU.
+
+Note that this management is fully automatic. This makes writing extensions super simple.
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/ac7ed152-cd33-4645-94af-4c43bb8c3d88)
+
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/cdcb23ad-02dc-4e39-be74-98e927550ef6)
+
+
+Similarly, Zero123:
+
+![image](https://github.com/lllyasviel/stable-diffusion-webui-forge/assets/19834515/d1a4a17d-f382-442d-91f2-fc5b6c10737f)
+
+
+# About Extensions
+
+All UI related extensions should work without problems, like:
+
+    canvas-zoom
+    different translations
+    etc
+
+The extensions below have been tested and work well:
+
+    Dynamic Prompts
+    Adetailer
+    Ultimate SD Upscale
+    Reactor
+
+The extensions below will be tested soon:
+
+    Regional Prompter (I have not figured out how to use that UI yet; will test later)
+
+The extensions below will no longer be supported, but they may still work:
+
+    MultiDiffusion / Tiled Diffusion
+    Deforum
+    Roop
+
+(Tiled diffusion is now integrated, so there is no need to install extra extensions. Also, the current smart UNet offload is much better than multi-diffusion, and people can directly generate 4k images without multi-diffusion because the UNet is automatically offloaded to RAM. For images bigger than 4k, use Ultimate SD Upscale.)
+(If you want to use special MultiDiffusion features such as inversion or region prompt, you can probably still use it, but such cases should be very rare.)
+
+The extensions below will be reworked soon:
+
+    sd-webui-controlnet
+
+ControlNet will not be replaced by another extension. We will have 10+ extensions for different preprocessors and 10+ extensions for different control models. Each extension will have fewer than 100 lines of code. Everyone will be able to add a preprocessor or control model by adding a new extension, and adding those extensions will be super easy. These extensions will share a UI managed by Forge.
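
Note on the FreeU listing in the README hunk above: `Fourier_filter` shifts the 2D spectrum so the zero frequency sits at the center, multiplies a small centered block by `scale`, and transforms back. The sketch below is a minimal pure-Python illustration of that idea on a nested list. It assumes a naive DFT in place of `torch.fft`, and the names `dft2` and `fourier_filter` are hypothetical and not part of Forge.

```python
import cmath


def dft2(x, inverse=False):
    """Naive 2D discrete Fourier transform of a nested list (O(n^4), demo only)."""
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for r in range(h):
                for c in range(w):
                    angle = sign * 2 * cmath.pi * (u * r / h + v * c / w)
                    total += x[r][c] * cmath.exp(1j * angle)
            out[u][v] = total / (h * w) if inverse else total
    return out


def fourier_filter(x, threshold, scale):
    """Scale the low-frequency block around the center of the shifted spectrum."""
    h, w = len(x), len(x[0])
    freq = dft2(x)
    crow, ccol = h // 2, w // 2
    for u in range(h):
        for v in range(w):
            # Position this frequency would occupy after an fftshift.
            su, sv = (u + crow) % h, (v + ccol) % w
            in_rows = crow - threshold <= su < crow + threshold
            in_cols = ccol - threshold <= sv < ccol + threshold
            if in_rows and in_cols:
                freq[u][v] *= scale
    restored = dft2(freq, inverse=True)
    return [[value.real for value in row] for row in restored]


flat = [[2.0] * 4 for _ in range(4)]
halved = fourier_filter(flat, threshold=1, scale=0.5)
```

A constant image has energy only at the zero frequency, which lies inside the scaled block, so `halved` is the input multiplied by `scale`; with `scale=1.0` the filter returns its input unchanged up to rounding.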
--- a/extensions-builtin/LDSR/scripts/ldsr_model.py
+++ b/extensions-builtin/LDSR/scripts/ldsr_model.py
@@ -1,7 +1,7 @@
 import os

 from modules.modelloader import load_file_from_url
-from modules.upscaler import Upscaler, UpscalerData
+from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
 from ldsr_model_arch import LDSR
 from modules import shared, script_callbacks, errors
 import sd_hijack_autoencoder  # noqa: F401
@@ -49,6 +49,7 @@ class UpscalerLDSR(Upscaler):
        return LDSR(model, yaml)

    def do_upscale(self, img, path):
+        prepare_free_memory()
        try:
            ldsr = self.load_model(path)
        except Exception:
--- a/extensions-builtin/Lora/lora_patches.py
+++ b/extensions-builtin/Lora/lora_patches.py
@@ -1,31 +1,6 @@
-import torch
-
-import networks
-from modules import patches
-
-
 class LoraPatches:
    def __init__(self):
-        self.Linear_forward = patches.patch(__name__, torch.nn.Linear, 'forward', networks.network_Linear_forward)
-        self.Linear_load_state_dict = patches.patch(__name__, torch.nn.Linear, '_load_from_state_dict', networks.network_Linear_load_state_dict)
-        self.Conv2d_forward = patches.patch(__name__, torch.nn.Conv2d, 'forward', networks.network_Conv2d_forward)
-        self.Conv2d_load_state_dict = patches.patch(__name__, torch.nn.Conv2d, '_load_from_state_dict', networks.network_Conv2d_load_state_dict)
-        self.GroupNorm_forward = patches.patch(__name__, torch.nn.GroupNorm, 'forward', networks.network_GroupNorm_forward)
-        self.GroupNorm_load_state_dict = patches.patch(__name__, torch.nn.GroupNorm, '_load_from_state_dict', networks.network_GroupNorm_load_state_dict)
-        self.LayerNorm_forward = patches.patch(__name__, torch.nn.LayerNorm, 'forward', networks.network_LayerNorm_forward)
-        self.LayerNorm_load_state_dict = patches.patch(__name__, torch.nn.LayerNorm, '_load_from_state_dict', networks.network_LayerNorm_load_state_dict)
-        self.MultiheadAttention_forward = patches.patch(__name__, torch.nn.MultiheadAttention, 'forward', networks.network_MultiheadAttention_forward)
-        self.MultiheadAttention_load_state_dict = patches.patch(__name__, torch.nn.MultiheadAttention, '_load_from_state_dict', networks.network_MultiheadAttention_load_state_dict)
+        pass

    def undo(self):
-        self.Linear_forward = patches.undo(__name__, torch.nn.Linear, 'forward')
-        self.Linear_load_state_dict = patches.undo(__name__, torch.nn.Linear, '_load_from_state_dict')
-        self.Conv2d_forward = patches.undo(__name__, torch.nn.Conv2d, 'forward')
-        self.Conv2d_load_state_dict = patches.undo(__name__, torch.nn.Conv2d, '_load_from_state_dict')
-        self.GroupNorm_forward = patches.undo(__name__, torch.nn.GroupNorm, 'forward')
-        self.GroupNorm_load_state_dict = patches.undo(__name__, torch.nn.GroupNorm, '_load_from_state_dict')
-        self.LayerNorm_forward = patches.undo(__name__, torch.nn.LayerNorm, 'forward')
-        self.LayerNorm_load_state_dict = patches.undo(__name__, torch.nn.LayerNorm, '_load_from_state_dict')
-        self.MultiheadAttention_forward = patches.undo(__name__, torch.nn.MultiheadAttention, 'forward')
-        self.MultiheadAttention_load_state_dict = patches.undo(__name__, torch.nn.MultiheadAttention, '_load_from_state_dict')
-
+        pass
--- a/extensions-builtin/Lora/lyco_helpers.py
+++ b/extensions-builtin/Lora/lyco_helpers.py
@@ -1,68 +0,0 @@
-import torch
-
-
-def make_weight_cp(t, wa, wb):
-    temp = torch.einsum('i j k l, j r -> i r k l', t, wb)
-    return torch.einsum('i j k l, i r -> r j k l', temp, wa)
-
-
-def rebuild_conventional(up, down, shape, dyn_dim=None):
-    up = up.reshape(up.size(0), -1)
-    down = down.reshape(down.size(0), -1)
-    if dyn_dim is not None:
-        up = up[:, :dyn_dim]
-        down = down[:dyn_dim, :]
-    return (up @ down).reshape(shape)
-
-
-def rebuild_cp_decomposition(up, down, mid):
-    up = up.reshape(up.size(0), -1)
-    down = down.reshape(down.size(0), -1)
-    return torch.einsum('n m k l, i n, m j -> i j k l', mid, up, down)
-
-
-# copied from https://github.com/KohakuBlueleaf/LyCORIS/blob/dev/lycoris/modules/lokr.py
-def factorization(dimension: int, factor:int=-1) -> tuple[int, int]:
-    '''
-    return a tuple of two value of input dimension decomposed by the number closest to factor
-    second value is higher or equal than first value.
-
-    In LoRA with Kroneckor Product, first value is a value for weight scale.
-    secon value is a value for weight.
-
-    Becuase of non-commutative property, A⊗B ≠ B⊗A. Meaning of two matrices is slightly different.
-
-    examples)
-    factor
-        -1               2                4               8               16               ...
-    127 -> 1, 127   127 -> 1, 127    127 -> 1, 127   127 -> 1, 127   127 -> 1, 127
-    128 -> 8, 16    128 -> 2, 64     128 -> 4, 32    128 -> 8, 16    128 -> 8, 16
-    250 -> 10, 25   250 -> 2, 125    250 -> 2, 125   250 -> 5, 50    250 -> 10, 25
-    360 -> 8, 45    360 -> 2, 180    360 -> 4, 90    360 -> 8, 45    360 -> 12, 30
-    512 -> 16, 32   512 -> 2, 256    512 -> 4, 128   512 -> 8, 64    512 -> 16, 32
-    1024 -> 32, 32  1024 -> 2, 512   1024 -> 4, 256  1024 -> 8, 128  1024 -> 16, 64
-    '''
-
-    if factor > 0 and (dimension % factor) == 0:
-        m = factor
-        n = dimension // factor
-        if m > n:
-            n, m = m, n
-        return m, n
-    if factor < 0:
-        factor = dimension
-    m, n = 1, dimension
-    length = m + n
-    while m<n:
-        new_m = m + 1
-        while dimension%new_m != 0:
-            new_m += 1
-        new_n = dimension // new_m
-        if new_m + new_n > length or new_m>factor:
-            break
-        else:
-            m, n = new_m, new_n
-    if m > n:
-        n, m = m, n
-    return m, n
-
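Note on the removed `factorization` helper: the block below is a dependency-free sketch that mirrors the control flow of the deleted lines, so the rows in its docstring can be checked by hand. Only the compact return expressions are a restyling; the logic is assumed to be otherwise unchanged. Only rows that were verified by stepping through the loop are exercised here.

```python
from __future__ import annotations


def factorization(dimension: int, factor: int = -1) -> tuple[int, int]:
    """Split `dimension` into (m, n) with m * n == dimension and m <= n."""
    # Fast path: the requested factor divides the dimension exactly.
    if factor > 0 and dimension % factor == 0:
        m, n = factor, dimension // factor
        return (m, n) if m <= n else (n, m)
    # A negative factor means "no upper bound on the smaller value".
    if factor < 0:
        factor = dimension
    m, n = 1, dimension
    length = m + n  # never updated inside the loop, as in the removed code
    while m < n:
        # Advance to the next divisor of `dimension`.
        new_m = m + 1
        while dimension % new_m != 0:
            new_m += 1
        new_n = dimension // new_m
        if new_m + new_n > length or new_m > factor:
            break
        m, n = new_m, new_n
    return (m, n) if m <= n else (n, m)


if __name__ == "__main__":
    for dim, fac in [(127, -1), (128, -1), (128, 2), (250, 4), (512, -1)]:
        print(dim, fac, factorization(dim, fac))
```

In `network_oft.py` the result is unpacked as `block_size, num_blocks`, so the smaller value becomes the block size for the LyCORIS-style weights.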
--- a/extensions-builtin/Lora/network.py
+++ b/extensions-builtin/Lora/network.py
@@ -1,190 +1,190 @@
-from __future__ import annotations
-import os
-from collections import namedtuple
-import enum
-
-import torch.nn as nn
-import torch.nn.functional as F
-
-from modules import sd_models, cache, errors, hashes, shared
-
-NetworkWeights = namedtuple('NetworkWeights', ['network_key', 'sd_key', 'w', 'sd_module'])
-
-metadata_tags_order = {"ss_sd_model_name": 1, "ss_resolution": 2, "ss_clip_skip": 3, "ss_num_train_images": 10, "ss_tag_frequency": 20}
-
-
-class SdVersion(enum.Enum):
-    Unknown = 1
-    SD1 = 2
-    SD2 = 3
-    SDXL = 4
-
-
-class NetworkOnDisk:
-    def __init__(self, name, filename):
-        self.name = name
-        self.filename = filename
-        self.metadata = {}
-        self.is_safetensors = os.path.splitext(filename)[1].lower() == ".safetensors"
-
-        def read_metadata():
-            metadata = sd_models.read_metadata_from_safetensors(filename)
-            metadata.pop('ssmd_cover_images', None)  # those are cover images, and they are too big to display in UI as text
-
-            return metadata
-
-        if self.is_safetensors:
-            try:
-                self.metadata = cache.cached_data_for_file('safetensors-metadata', "lora/" + self.name, filename, read_metadata)
-            except Exception as e:
-                errors.display(e, f"reading lora {filename}")
-
-        if self.metadata:
-            m = {}
-            for k, v in sorted(self.metadata.items(), key=lambda x: metadata_tags_order.get(x[0], 999)):
-                m[k] = v
-
-            self.metadata = m
-
-        self.alias = self.metadata.get('ss_output_name', self.name)
-
-        self.hash = None
-        self.shorthash = None
-        self.set_hash(
-            self.metadata.get('sshs_model_hash') or
-            hashes.sha256_from_cache(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or
-            ''
-        )
-
-        self.sd_version = self.detect_version()
-
-    def detect_version(self):
-        if str(self.metadata.get('ss_base_model_version', "")).startswith("sdxl_"):
-            return SdVersion.SDXL
-        elif str(self.metadata.get('ss_v2', "")) == "True":
-            return SdVersion.SD2
-        elif len(self.metadata):
-            return SdVersion.SD1
-
-        return SdVersion.Unknown
-
-    def set_hash(self, v):
-        self.hash = v
-        self.shorthash = self.hash[0:12]
-
-        if self.shorthash:
-            import networks
-            networks.available_network_hash_lookup[self.shorthash] = self
-
-    def read_hash(self):
-        if not self.hash:
-            self.set_hash(hashes.sha256(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or '')
-
-    def get_alias(self):
-        import networks
-        if shared.opts.lora_preferred_name == "Filename" or self.alias.lower() in networks.forbidden_network_aliases:
-            return self.name
-        else:
-            return self.alias
-
-
-class Network:  # LoraModule
-    def __init__(self, name, network_on_disk: NetworkOnDisk):
-        self.name = name
-        self.network_on_disk = network_on_disk
-        self.te_multiplier = 1.0
-        self.unet_multiplier = 1.0
-        self.dyn_dim = None
-        self.modules = {}
-        self.bundle_embeddings = {}
-        self.mtime = None
-
-        self.mentioned_name = None
-        """the text that was used to add the network to prompt - can be either name or an alias"""
-
-
-class ModuleType:
-    def create_module(self, net: Network, weights: NetworkWeights) -> Network | None:
-        return None
-
-
-class NetworkModule:
-    def __init__(self, net: Network, weights: NetworkWeights):
-        self.network = net
-        self.network_key = weights.network_key
-        self.sd_key = weights.sd_key
-        self.sd_module = weights.sd_module
-
-        if hasattr(self.sd_module, 'weight'):
-            self.shape = self.sd_module.weight.shape
-
-        self.ops = None
-        self.extra_kwargs = {}
-        if isinstance(self.sd_module, nn.Conv2d):
-            self.ops = F.conv2d
-            self.extra_kwargs = {
-                'stride': self.sd_module.stride,
-                'padding': self.sd_module.padding
-            }
-        elif isinstance(self.sd_module, nn.Linear):
-            self.ops = F.linear
-        elif isinstance(self.sd_module, nn.LayerNorm):
-            self.ops = F.layer_norm
-            self.extra_kwargs = {
-                'normalized_shape': self.sd_module.normalized_shape,
-                'eps': self.sd_module.eps
-            }
-        elif isinstance(self.sd_module, nn.GroupNorm):
-            self.ops = F.group_norm
-            self.extra_kwargs = {
-                'num_groups': self.sd_module.num_groups,
-                'eps': self.sd_module.eps
-            }
-
-        self.dim = None
-        self.bias = weights.w.get("bias")
-        self.alpha = weights.w["alpha"].item() if "alpha" in weights.w else None
-        self.scale = weights.w["scale"].item() if "scale" in weights.w else None
-
-    def multiplier(self):
-        if 'transformer' in self.sd_key[:20]:
-            return self.network.te_multiplier
-        else:
-            return self.network.unet_multiplier
-
-    def calc_scale(self):
-        if self.scale is not None:
-            return self.scale
-        if self.dim is not None and self.alpha is not None:
-            return self.alpha / self.dim
-
-        return 1.0
-
-    def finalize_updown(self, updown, orig_weight, output_shape, ex_bias=None):
-        if self.bias is not None:
-            updown = updown.reshape(self.bias.shape)
-            updown += self.bias.to(orig_weight.device, dtype=updown.dtype)
-            updown = updown.reshape(output_shape)
-
-        if len(output_shape) == 4:
-            updown = updown.reshape(output_shape)
-
-        if orig_weight.size().numel() == updown.size().numel():
-            updown = updown.reshape(orig_weight.shape)
-
-        if ex_bias is not None:
-            ex_bias = ex_bias * self.multiplier()
-
-        return updown * self.calc_scale() * self.multiplier(), ex_bias
-
-    def calc_updown(self, target):
-        raise NotImplementedError()
-
-    def forward(self, x, y):
-        """A general forward implementation for all modules"""
-        if self.ops is None:
-            raise NotImplementedError()
-        else:
-            updown, ex_bias = self.calc_updown(self.sd_module.weight)
-            return y + self.ops(x, weight=updown, bias=ex_bias, **self.extra_kwargs)
-
+from __future__ import annotations
+import os
+from collections import namedtuple
+import enum
+
+import torch.nn as nn
+import torch.nn.functional as F
+
+from modules import sd_models, cache, errors, hashes, shared
+
+NetworkWeights = namedtuple('NetworkWeights', ['network_key', 'sd_key', 'w', 'sd_module'])
+
+metadata_tags_order = {"ss_sd_model_name": 1, "ss_resolution": 2, "ss_clip_skip": 3, "ss_num_train_images": 10, "ss_tag_frequency": 20}
+
+
+class SdVersion(enum.Enum):
+    Unknown = 1
+    SD1 = 2
+    SD2 = 3
+    SDXL = 4
+
+
+class NetworkOnDisk:
+    def __init__(self, name, filename):
+        self.name = name
+        self.filename = filename
+        self.metadata = {}
+        self.is_safetensors = os.path.splitext(filename)[1].lower() == ".safetensors"
+
+        def read_metadata():
+            metadata = sd_models.read_metadata_from_safetensors(filename)
+            metadata.pop('ssmd_cover_images', None)  # those are cover images, and they are too big to display in UI as text
+
+            return metadata
+
+        if self.is_safetensors:
+            try:
+                self.metadata = cache.cached_data_for_file('safetensors-metadata', "lora/" + self.name, filename, read_metadata)
+            except Exception as e:
+                errors.display(e, f"reading lora {filename}")
+
+        if self.metadata:
+            m = {}
+            for k, v in sorted(self.metadata.items(), key=lambda x: metadata_tags_order.get(x[0], 999)):
+                m[k] = v
+
+            self.metadata = m
+
+        self.alias = self.metadata.get('ss_output_name', self.name)
+
+        self.hash = None
+        self.shorthash = None
+        self.set_hash(
+            self.metadata.get('sshs_model_hash') or
+            hashes.sha256_from_cache(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or
+            ''
+        )
+
+        self.sd_version = self.detect_version()
+
+    def detect_version(self):
+        if str(self.metadata.get('ss_base_model_version', "")).startswith("sdxl_"):
+            return SdVersion.SDXL
+        elif str(self.metadata.get('ss_v2', "")) == "True":
+            return SdVersion.SD2
+        elif len(self.metadata):
+            return SdVersion.SD1
+
+        return SdVersion.Unknown
+
+    def set_hash(self, v):
+        self.hash = v
+        self.shorthash = self.hash[0:12]
+
+        if self.shorthash:
+            import networks
+            networks.available_network_hash_lookup[self.shorthash] = self
+
+    def read_hash(self):
+        if not self.hash:
+            self.set_hash(hashes.sha256(self.filename, "lora/" + self.name, use_addnet_hash=self.is_safetensors) or '')
+
+    def get_alias(self):
+        import networks
+        if shared.opts.lora_preferred_name == "Filename" or self.alias.lower() in networks.forbidden_network_aliases:
+            return self.name
+        else:
+            return self.alias
+
+
+class Network:  # LoraModule
+    def __init__(self, name, network_on_disk: NetworkOnDisk):
+        self.name = name
+        self.network_on_disk = network_on_disk
+        self.te_multiplier = 1.0
+        self.unet_multiplier = 1.0
+        self.dyn_dim = None
+        self.modules = {}
+        self.bundle_embeddings = {}
+        self.mtime = None
+
+        self.mentioned_name = None
+        """the text that was used to add the network to prompt - can be either name or an alias"""
+
+
+class ModuleType:
+    def create_module(self, net: Network, weights: NetworkWeights) -> Network | None:
+        return None
+
+
+class NetworkModule:
+    def __init__(self, net: Network, weights: NetworkWeights):
+        self.network = net
+        self.network_key = weights.network_key
+        self.sd_key = weights.sd_key
+        self.sd_module = weights.sd_module
+
+        if hasattr(self.sd_module, 'weight'):
+            self.shape = self.sd_module.weight.shape
+
+        self.ops = None
+        self.extra_kwargs = {}
+        if isinstance(self.sd_module, nn.Conv2d):
+            self.ops = F.conv2d
+            self.extra_kwargs = {
+                'stride': self.sd_module.stride,
+                'padding': self.sd_module.padding
+            }
+        elif isinstance(self.sd_module, nn.Linear):
+            self.ops = F.linear
+        elif isinstance(self.sd_module, nn.LayerNorm):
+            self.ops = F.layer_norm
+            self.extra_kwargs = {
+                'normalized_shape': self.sd_module.normalized_shape,
+                'eps': self.sd_module.eps
+            }
+        elif isinstance(self.sd_module, nn.GroupNorm):
+            self.ops = F.group_norm
+            self.extra_kwargs = {
+                'num_groups': self.sd_module.num_groups,
+                'eps': self.sd_module.eps
+            }
+
+        self.dim = None
+        self.bias = weights.w.get("bias")
+        self.alpha = weights.w["alpha"].item() if "alpha" in weights.w else None
+        self.scale = weights.w["scale"].item() if "scale" in weights.w else None
+
+    def multiplier(self):
+        if 'transformer' in self.sd_key[:20]:
+            return self.network.te_multiplier
+        else:
+            return self.network.unet_multiplier
+
+    def calc_scale(self):
+        if self.scale is not None:
+            return self.scale
+        if self.dim is not None and self.alpha is not None:
+            return self.alpha / self.dim
+
+        return 1.0
+
+    def finalize_updown(self, updown, orig_weight, output_shape, ex_bias=None):
+        if self.bias is not None:
+            updown = updown.reshape(self.bias.shape)
+            updown += self.bias.to(orig_weight.device, dtype=updown.dtype)
+            updown = updown.reshape(output_shape)
+
+        if len(output_shape) == 4:
+            updown = updown.reshape(output_shape)
+
+        if orig_weight.size().numel() == updown.size().numel():
+            updown = updown.reshape(orig_weight.shape)
+
+        if ex_bias is not None:
+            ex_bias = ex_bias * self.multiplier()
+
+        return updown * self.calc_scale() * self.multiplier(), ex_bias
+
+    def calc_updown(self, target):
+        raise NotImplementedError()
+
+    def forward(self, x, y):
+        """A general forward implementation for all modules"""
+        if self.ops is None:
+            raise NotImplementedError()
+        else:
+            updown, ex_bias = self.calc_updown(self.sd_module.weight)
+            return y + self.ops(x, weight=updown, bias=ex_bias, **self.extra_kwargs)
+
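Note on `NetworkModule.calc_scale` and `NetworkModule.multiplier` above: the precedence they implement can be shown without the class. This is a minimal sketch; the free functions `calc_scale` and `pick_multiplier` and the two example keys are hypothetical stand-ins introduced for illustration, not names from the repository.

```python
from typing import Optional


def calc_scale(scale: Optional[float], alpha: Optional[float], dim: Optional[int]) -> float:
    """An explicit scale wins; otherwise alpha / dim; otherwise 1.0."""
    if scale is not None:
        return scale
    if dim is not None and alpha is not None:
        return alpha / dim
    return 1.0


def pick_multiplier(sd_key: str, te_multiplier: float, unet_multiplier: float) -> float:
    """Keys with 'transformer' in their first 20 characters use the text-encoder multiplier."""
    return te_multiplier if "transformer" in sd_key[:20] else unet_multiplier


if __name__ == "__main__":
    # Hypothetical keys, chosen only to hit both branches.
    print(calc_scale(None, 16.0, 32))
    print(pick_multiplier("transformer_text_model_layer_0", 0.5, 1.0))
    print(pick_multiplier("diffusion_model_input_blocks_1", 0.5, 1.0))
```

`finalize_updown` multiplies the weight delta by both values, so a network with `alpha` equal to half of `dim` contributes at half strength before the user multiplier is applied.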
--- a/extensions-builtin/Lora/network_full.py
+++ b/extensions-builtin/Lora/network_full.py
@@ -1,27 +0,0 @@
-import network
-
-
-class ModuleTypeFull(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        if all(x in weights.w for x in ["diff"]):
-            return NetworkModuleFull(net, weights)
-
-        return None
-
-
-class NetworkModuleFull(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-        super().__init__(net, weights)
-
-        self.weight = weights.w.get("diff")
-        self.ex_bias = weights.w.get("diff_b")
-
-    def calc_updown(self, orig_weight):
-        output_shape = self.weight.shape
-        updown = self.weight.to(orig_weight.device)
-        if self.ex_bias is not None:
-            ex_bias = self.ex_bias.to(orig_weight.device)
-        else:
-            ex_bias = None
-
-        return self.finalize_updown(updown, orig_weight, output_shape, ex_bias)
--- a/extensions-builtin/Lora/network_glora.py
+++ b/extensions-builtin/Lora/network_glora.py
@@ -1,33 +0,0 @@
-
-import network
-
-class ModuleTypeGLora(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        if all(x in weights.w for x in ["a1.weight", "a2.weight", "alpha", "b1.weight", "b2.weight"]):
-            return NetworkModuleGLora(net, weights)
-
-        return None
-
-# adapted from https://github.com/KohakuBlueleaf/LyCORIS
-class NetworkModuleGLora(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-        super().__init__(net, weights)
-
-        if hasattr(self.sd_module, 'weight'):
-            self.shape = self.sd_module.weight.shape
-
-        self.w1a = weights.w["a1.weight"]
-        self.w1b = weights.w["b1.weight"]
-        self.w2a = weights.w["a2.weight"]
-        self.w2b = weights.w["b2.weight"]
-
-    def calc_updown(self, orig_weight):
-        w1a = self.w1a.to(orig_weight.device)
-        w1b = self.w1b.to(orig_weight.device)
-        w2a = self.w2a.to(orig_weight.device)
-        w2b = self.w2b.to(orig_weight.device)
-
-        output_shape = [w1a.size(0), w1b.size(1)]
-        updown = ((w2b @ w1b) + ((orig_weight.to(dtype = w1a.dtype) @ w2a) @ w1a))
-
-        return self.finalize_updown(updown, orig_weight, output_shape)
--- a/extensions-builtin/Lora/network_hada.py
+++ b/extensions-builtin/Lora/network_hada.py
@@ -1,55 +0,0 @@
-import lyco_helpers
-import network
-
-
-class ModuleTypeHada(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        if all(x in weights.w for x in ["hada_w1_a", "hada_w1_b", "hada_w2_a", "hada_w2_b"]):
-            return NetworkModuleHada(net, weights)
-
-        return None
-
-
-class NetworkModuleHada(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-        super().__init__(net, weights)
-
-        if hasattr(self.sd_module, 'weight'):
-            self.shape = self.sd_module.weight.shape
-
-        self.w1a = weights.w["hada_w1_a"]
-        self.w1b = weights.w["hada_w1_b"]
-        self.dim = self.w1b.shape[0]
-        self.w2a = weights.w["hada_w2_a"]
-        self.w2b = weights.w["hada_w2_b"]
-
-        self.t1 = weights.w.get("hada_t1")
-        self.t2 = weights.w.get("hada_t2")
-
-    def calc_updown(self, orig_weight):
-        w1a = self.w1a.to(orig_weight.device)
-        w1b = self.w1b.to(orig_weight.device)
-        w2a = self.w2a.to(orig_weight.device)
-        w2b = self.w2b.to(orig_weight.device)
-
-        output_shape = [w1a.size(0), w1b.size(1)]
-
-        if self.t1 is not None:
-            output_shape = [w1a.size(1), w1b.size(1)]
-            t1 = self.t1.to(orig_weight.device)
-            updown1 = lyco_helpers.make_weight_cp(t1, w1a, w1b)
-            output_shape += t1.shape[2:]
-        else:
-            if len(w1b.shape) == 4:
-                output_shape += w1b.shape[2:]
-            updown1 = lyco_helpers.rebuild_conventional(w1a, w1b, output_shape)
-
-        if self.t2 is not None:
-            t2 = self.t2.to(orig_weight.device)
-            updown2 = lyco_helpers.make_weight_cp(t2, w2a, w2b)
-        else:
-            updown2 = lyco_helpers.rebuild_conventional(w2a, w2b, output_shape)
-
-        updown = updown1 * updown2
-
-        return self.finalize_updown(updown, orig_weight, output_shape)
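Note on the removed `NetworkModuleHada.calc_updown`: when the optional `hada_t1` and `hada_t2` tensors are absent, the update is the element-wise (Hadamard) product of two low-rank products. The sketch below uses plain nested lists instead of tensors and skips device handling and reshaping; the helper names are illustrative assumptions.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hadamard(a, b):
    """Element-wise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def hada_updown(w1a, w1b, w2a, w2b):
    # (w1a @ w1b) * (w2a @ w2b), as in the non-CP branch of the removed code.
    return hadamard(matmul(w1a, w1b), matmul(w2a, w2b))


if __name__ == "__main__":
    w1a, w1b = [[1], [2]], [[1, 2]]   # rank-1 factors of a 2x2 matrix
    w2a, w2b = [[1], [1]], [[3, 4]]
    print(hada_updown(w1a, w1b, w2a, w2b))
```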
--- a/extensions-builtin/Lora/network_ia3.py
+++ b/extensions-builtin/Lora/network_ia3.py
@@ -1,30 +0,0 @@
-import network
-
-
-class ModuleTypeIa3(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        if all(x in weights.w for x in ["weight"]):
-            return NetworkModuleIa3(net, weights)
-
-        return None
-
-
-class NetworkModuleIa3(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-        super().__init__(net, weights)
-
-        self.w = weights.w["weight"]
-        self.on_input = weights.w["on_input"].item()
-
-    def calc_updown(self, orig_weight):
-        w = self.w.to(orig_weight.device)
-
-        output_shape = [w.size(0), orig_weight.size(1)]
-        if self.on_input:
-            output_shape.reverse()
-        else:
-            w = w.reshape(-1, 1)
-
-        updown = orig_weight * w
-
-        return self.finalize_updown(updown, orig_weight, output_shape)
--- a/extensions-builtin/Lora/network_lokr.py
+++ b/extensions-builtin/Lora/network_lokr.py
@@ -1,64 +0,0 @@
-import torch
-
-import lyco_helpers
-import network
-
-
-class ModuleTypeLokr(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        has_1 = "lokr_w1" in weights.w or ("lokr_w1_a" in weights.w and "lokr_w1_b" in weights.w)
-        has_2 = "lokr_w2" in weights.w or ("lokr_w2_a" in weights.w and "lokr_w2_b" in weights.w)
-        if has_1 and has_2:
-            return NetworkModuleLokr(net, weights)
-
-        return None
-
-
-def make_kron(orig_shape, w1, w2):
-    if len(w2.shape) == 4:
-        w1 = w1.unsqueeze(2).unsqueeze(2)
-    w2 = w2.contiguous()
-    return torch.kron(w1, w2).reshape(orig_shape)
-
-
-class NetworkModuleLokr(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-        super().__init__(net, weights)
-
-        self.w1 = weights.w.get("lokr_w1")
-        self.w1a = weights.w.get("lokr_w1_a")
-        self.w1b = weights.w.get("lokr_w1_b")
-        self.dim = self.w1b.shape[0] if self.w1b is not None else self.dim
-        self.w2 = weights.w.get("lokr_w2")
-        self.w2a = weights.w.get("lokr_w2_a")
-        self.w2b = weights.w.get("lokr_w2_b")
-        self.dim = self.w2b.shape[0] if self.w2b is not None else self.dim
-        self.t2 = weights.w.get("lokr_t2")
-
-    def calc_updown(self, orig_weight):
-        if self.w1 is not None:
-            w1 = self.w1.to(orig_weight.device)
-        else:
-            w1a = self.w1a.to(orig_weight.device)
-            w1b = self.w1b.to(orig_weight.device)
-            w1 = w1a @ w1b
-
-        if self.w2 is not None:
-            w2 = self.w2.to(orig_weight.device)
-        elif self.t2 is None:
-            w2a = self.w2a.to(orig_weight.device)
-            w2b = self.w2b.to(orig_weight.device)
-            w2 = w2a @ w2b
-        else:
-            t2 = self.t2.to(orig_weight.device)
-            w2a = self.w2a.to(orig_weight.device)
-            w2b = self.w2b.to(orig_weight.device)
-            w2 = lyco_helpers.make_weight_cp(t2, w2a, w2b)
-
-        output_shape = [w1.size(0) * w2.size(0), w1.size(1) * w2.size(1)]
-        if len(orig_weight.shape) == 4:
-            output_shape = orig_weight.shape
-
-        updown = make_kron(output_shape, w1, w2)
-
-        return self.finalize_updown(updown, orig_weight, output_shape)
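Note on the removed `make_kron` and the `output_shape` computed in `NetworkModuleLokr.calc_updown`: the Kronecker product of an (a x b) and a (c x d) matrix has shape (a*c x b*d), which is where the multiplied sizes come from. The sketch below is a list-based stand-in for `torch.kron` on 2-D inputs only; the 4-D convolution case is not covered.

```python
def kron(a, b):
    """Kronecker product of two nested-list matrices."""
    rows_b, cols_b = len(b), len(b[0])
    out = []
    for row_a in a:
        for k in range(rows_b):
            # Each entry of `a` scales one full row of `b`.
            out.append([x * b[k][l] for x in row_a for l in range(cols_b)])
    return out


if __name__ == "__main__":
    w1 = [[1, 2], [3, 4]]
    w2 = [[0, 1], [1, 0]]
    for row in kron(w1, w2):
        print(row)
    # The product is not commutative, which is why the two factors play different roles.
    print(kron(w1, w2) != kron(w2, w1))
```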
--- a/extensions-builtin/Lora/network_lora.py
+++ b/extensions-builtin/Lora/network_lora.py
@@ -1,86 +0,0 @@
-import torch
-
-import lyco_helpers
-import network
-from modules import devices
-
-
-class ModuleTypeLora(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        if all(x in weights.w for x in ["lora_up.weight", "lora_down.weight"]):
-            return NetworkModuleLora(net, weights)
-
-        return None
-
-
-class NetworkModuleLora(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-        super().__init__(net, weights)
-
-        self.up_model = self.create_module(weights.w, "lora_up.weight")
-        self.down_model = self.create_module(weights.w, "lora_down.weight")
-        self.mid_model = self.create_module(weights.w, "lora_mid.weight", none_ok=True)
-
-        self.dim = weights.w["lora_down.weight"].shape[0]
-
-    def create_module(self, weights, key, none_ok=False):
-        weight = weights.get(key)
-
-        if weight is None and none_ok:
-            return None
-
-        is_linear = type(self.sd_module) in [torch.nn.Linear, torch.nn.modules.linear.NonDynamicallyQuantizableLinear, torch.nn.MultiheadAttention]
-        is_conv = type(self.sd_module) in [torch.nn.Conv2d]
-
-        if is_linear:
-            weight = weight.reshape(weight.shape[0], -1)
-            module = torch.nn.Linear(weight.shape[1], weight.shape[0], bias=False)
-        elif is_conv and key == "lora_down.weight" or key == "dyn_up":
-            if len(weight.shape) == 2:
-                weight = weight.reshape(weight.shape[0], -1, 1, 1)
-
-            if weight.shape[2] != 1 or weight.shape[3] != 1:
-                module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], self.sd_module.kernel_size, self.sd_module.stride, self.sd_module.padding, bias=False)
-            else:
-                module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], (1, 1), bias=False)
-        elif is_conv and key == "lora_mid.weight":
-            module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], self.sd_module.kernel_size, self.sd_module.stride, self.sd_module.padding, bias=False)
-        elif is_conv and key == "lora_up.weight" or key == "dyn_down":
-            module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], (1, 1), bias=False)
-        else:
-            raise AssertionError(f'Lora layer {self.network_key} matched a layer with unsupported type: {type(self.sd_module).__name__}')
-
-        with torch.no_grad():
-            if weight.shape != module.weight.shape:
-                weight = weight.reshape(module.weight.shape)
-            module.weight.copy_(weight)
-
-        module.to(device=devices.cpu, dtype=devices.dtype)
-        module.weight.requires_grad_(False)
-
-        return module
-
-    def calc_updown(self, orig_weight):
-        up = self.up_model.weight.to(orig_weight.device)
-        down = self.down_model.weight.to(orig_weight.device)
-
-        output_shape = [up.size(0), down.size(1)]
-        if self.mid_model is not None:
-            # cp-decomposition
-            mid = self.mid_model.weight.to(orig_weight.device)
-            updown = lyco_helpers.rebuild_cp_decomposition(up, down, mid)
-            output_shape += mid.shape[2:]
-        else:
-            if len(down.shape) == 4:
-                output_shape += down.shape[2:]
-            updown = lyco_helpers.rebuild_conventional(up, down, output_shape, self.network.dyn_dim)
-
-        return self.finalize_updown(updown, orig_weight, output_shape)
-
-    def forward(self, x, y):
-        self.up_model.to(device=devices.device)
-        self.down_model.to(device=devices.device)
-
-        return y + self.up_model(self.down_model(x)) * self.multiplier() * self.calc_scale()
-
-
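Note on the removed `NetworkModuleLora.calc_updown`: without a `lora_mid.weight`, the weight delta is the product of the up and down matrices, later scaled by `alpha / dim` and the user multiplier in `finalize_updown`. The sketch below reproduces that arithmetic on nested lists; `lora_delta` is a hypothetical helper, and the `dyn_dim` slicing and reshaping steps are left out.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_delta(up, down, alpha, multiplier=1.0):
    """Return (up @ down) * (alpha / dim) * multiplier, with dim = rows of `down`."""
    dim = len(down)  # the rank, matching weights.w["lora_down.weight"].shape[0]
    scale = alpha / dim * multiplier
    return [[value * scale for value in row] for row in matmul(up, down)]


if __name__ == "__main__":
    up = [[1.0], [2.0]]          # out_features x rank
    down = [[3.0, 4.0, 5.0]]     # rank x in_features
    print(lora_delta(up, down, alpha=0.5))
```

The delta has the shape of the original weight (out_features x in_features) even though only `rank * (out_features + in_features)` values are stored.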
--- a/extensions-builtin/Lora/network_norm.py
+++ b/extensions-builtin/Lora/network_norm.py
@@ -1,28 +0,0 @@
-import network
-
-
-class ModuleTypeNorm(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        if all(x in weights.w for x in ["w_norm", "b_norm"]):
-            return NetworkModuleNorm(net, weights)
-
-        return None
-
-
-class NetworkModuleNorm(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-        super().__init__(net, weights)
-
-        self.w_norm = weights.w.get("w_norm")
-        self.b_norm = weights.w.get("b_norm")
-
-    def calc_updown(self, orig_weight):
-        output_shape = self.w_norm.shape
-        updown = self.w_norm.to(orig_weight.device)
-
-        if self.b_norm is not None:
-            ex_bias = self.b_norm.to(orig_weight.device)
-        else:
-            ex_bias = None
-
-        return self.finalize_updown(updown, orig_weight, output_shape, ex_bias)
--- a/extensions-builtin/Lora/network_oft.py
+++ b/extensions-builtin/Lora/network_oft.py
@@ -1,82 +0,0 @@
-import torch
-import network
-from lyco_helpers import factorization
-from einops import rearrange
-
-
-class ModuleTypeOFT(network.ModuleType):
-    def create_module(self, net: network.Network, weights: network.NetworkWeights):
-        if all(x in weights.w for x in ["oft_blocks"]) or all(x in weights.w for x in ["oft_diag"]):
-            return NetworkModuleOFT(net, weights)
-
-        return None
-
-# Supports both kohya-ss' implementation of COFT  https://github.com/kohya-ss/sd-scripts/blob/main/networks/oft.py
-# and KohakuBlueleaf's implementation of OFT/COFT https://github.com/KohakuBlueleaf/LyCORIS/blob/dev/lycoris/modules/diag_oft.py
-class NetworkModuleOFT(network.NetworkModule):
-    def __init__(self,  net: network.Network, weights: network.NetworkWeights):
-
-        super().__init__(net, weights)
-
-        self.lin_module = None
-        self.org_module: list[torch.Module] = [self.sd_module]
-
-        self.scale = 1.0
-
-        # kohya-ss
-        if "oft_blocks" in weights.w.keys():
-            self.is_kohya = True
-            self.oft_blocks = weights.w["oft_blocks"] # (num_blocks, block_size, block_size)
-            self.alpha = weights.w["alpha"] # alpha is constraint
-            self.dim = self.oft_blocks.shape[0] # lora dim
-        # LyCORIS
-        elif "oft_diag" in weights.w.keys():
-            self.is_kohya = False
-            self.oft_blocks = weights.w["oft_diag"]
-            # self.alpha is unused
-            self.dim = self.oft_blocks.shape[1] # (num_blocks, block_size, block_size)
-
-        is_linear = type(self.sd_module) in [torch.nn.Linear, torch.nn.modules.linear.NonDynamicallyQuantizableLinear]
-        is_conv = type(self.sd_module) in [torch.nn.Conv2d]
-        is_other_linear = type(self.sd_module) in [torch.nn.MultiheadAttention] # unsupported
-
-        if is_linear:
-            self.out_dim = self.sd_module.out_features
-        elif is_conv:
-            self.out_dim = self.sd_module.out_channels
-        elif is_other_linear:
-            self.out_dim = self.sd_module.embed_dim
-
-        if self.is_kohya:
-            self.constraint = self.alpha * self.out_dim
-            self.num_blocks = self.dim
-            self.block_size = self.out_dim // self.dim
-        else:
-            self.constraint = None
-            self.block_size, self.num_blocks = factorization(self.out_dim, self.dim)
-
-    def calc_updown(self, orig_weight):
-        oft_blocks = self.oft_blocks.to(orig_weight.device)
-        eye = torch.eye(self.block_size, device=oft_blocks.device)
-
-        if self.is_kohya:
-            block_Q = oft_blocks - oft_blocks.transpose(1, 2) # make block_Q skew-symmetric; the Cayley transform below turns it into an orthogonal matrix
-            norm_Q = torch.norm(block_Q.flatten())
-            new_norm_Q = torch.clamp(norm_Q, max=self.constraint.to(oft_blocks.device))
-            block_Q = block_Q * ((new_norm_Q + 1e-8) / (norm_Q + 1e-8))
-            oft_blocks = torch.matmul(eye + block_Q, (eye - block_Q).float().inverse())
-
-        R = oft_blocks.to(orig_weight.device)
-
-        # This errors out for MultiheadAttention, might need to be handled up-stream
-        merged_weight = rearrange(orig_weight, '(k n) ... -> k n ...', k=self.num_blocks, n=self.block_size)
-        merged_weight = torch.einsum(
-            'k n m, k n ... -> k m ...',
-            R,
-            merged_weight
-        )
-        merged_weight = rearrange(merged_weight, 'k m ... -> (k m) ...')
-
-        updown = merged_weight.to(orig_weight.device) - orig_weight.to(merged_weight.dtype)
-        output_shape = orig_weight.shape
-        return self.finalize_updown(updown, orig_weight, output_shape)
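Note on the removed `calc_updown` above: in the kohya-ss branch, each block is made skew-symmetric (`block_Q = W - W^T`) and then mapped to an orthogonal matrix with the Cayley transform `R = (I + Q)(I - Q)^-1`. The sketch below illustrates only that step for a single 2x2 block, in plain Python. The helper names `matmul` and `cayley_2x2` are hypothetical, and the norm clamp, batching, and device handling of the original are left out.

```python
def matmul(a, b):
    """Multiply two small matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cayley_2x2(w):
    """Map a raw 2x2 block w to an orthogonal matrix via the Cayley transform.

    Hypothetical helper for illustration; not part of the removed module.
    """
    # Q = w - w^T is skew-symmetric, so a 2x2 block has one free entry q:
    # Q = [[0, q], [-q, 0]].
    q = w[0][1] - w[1][0]
    i_plus_q = [[1.0, q], [-q, 1.0]]
    # (I - Q) = [[1, -q], [q, 1]] has determinant 1 + q^2, which is never zero,
    # so the inverse always exists.
    det = 1.0 + q * q
    inv_i_minus_q = [[1.0 / det, q / det], [-q / det, 1.0 / det]]
    return matmul(i_plus_q, inv_i_minus_q)


block = [[0.3, 0.7], [0.2, -0.1]]
R = cayley_2x2(block)
R_T = [[R[j][i] for j in range(2)] for i in range(2)]
# For an orthogonal matrix, R @ R^T is the identity (up to rounding).
RRT = matmul(R, R_T)
```

Because `R` is orthogonal, multiplying the weight blocks by it rotates them without changing their norms, which is why the removed code reports `updown` as the difference between the rotated and original weights.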
--- a/extensions-builtin/Lora/networks.py
+++ b/extensions-builtin/Lora/networks.py
@@ -1,566 +1,127 @@
-import gradio as gr
-import logging
 import os
 import re

 import lora_patches
+import functools
 import network
-import network_lora
-import network_glora
-import network_hada
-import network_ia3
-import network_lokr
-import network_full
-import network_norm
-import network_oft

 import torch
 from typing import Union

-from modules import shared, devices, sd_models, errors, scripts, sd_hijack
-import modules.textual_inversion.textual_inversion as textual_inversion
-
-from lora_logger import logger
-
-module_types = [
-    network_lora.ModuleTypeLora(),
-    network_hada.ModuleTypeHada(),
-    network_ia3.ModuleTypeIa3(),
-    network_lokr.ModuleTypeLokr(),
-    network_full.ModuleTypeFull(),
-    network_norm.ModuleTypeNorm(),
-    network_glora.ModuleTypeGLora(),
-    network_oft.ModuleTypeOFT(),
-]
+from modules import shared, sd_models, errors, scripts
+from ldm_patched.modules.utils import load_torch_file
+from ldm_patched.modules.sd import load_lora_for_models


-re_digits = re.compile(r"\d+")
-re_x_proj = re.compile(r"(.*)_([qkv]_proj)$")
-re_compiled = {}
-
-suffix_conversion = {
-    "attentions": {},
-    "resnets": {
-        "conv1": "in_layers_2",
-        "conv2": "out_layers_3",
-        "norm1": "in_layers_0",
-        "norm2": "out_layers_0",
-        "time_emb_proj": "emb_layers_1",
-        "conv_shortcut": "skip_connection",
-    }
-}
+@functools.lru_cache(maxsize=5)
+def load_lora_state_dict(filename):
+    return load_torch_file(filename, safe_load=True)


 def convert_diffusers_name_to_compvis(key, is_sd2):
-    def match(match_list, regex_text):
-        regex = re_compiled.get(regex_text)
-        if regex is None:
-            regex = re.compile(regex_text)
-            re_compiled[regex_text] = regex
-
-        r = re.match(regex, key)
-        if not r:
-            return False
-
-        match_list.clear()
-        match_list.extend([int(x) if re.match(re_digits, x) else x for x in r.groups()])
-        return True
-
-    m = []
-
-    if match(m, r"lora_unet_conv_in(.*)"):
-        return f'diffusion_model_input_blocks_0_0{m[0]}'
-
-    if match(m, r"lora_unet_conv_out(.*)"):
-        return f'diffusion_model_out_2{m[0]}'
-
-    if match(m, r"lora_unet_time_embedding_linear_(\d+)(.*)"):
-        return f"diffusion_model_time_embed_{m[0] * 2 - 2}{m[1]}"
-
-    if match(m, r"lora_unet_down_blocks_(\d+)_(attentions|resnets)_(\d+)_(.+)"):
-        suffix = suffix_conversion.get(m[1], {}).get(m[3], m[3])
-        return f"diffusion_model_input_blocks_{1 + m[0] * 3 + m[2]}_{1 if m[1] == 'attentions' else 0}_{suffix}"
-
-    if match(m, r"lora_unet_mid_block_(attentions|resnets)_(\d+)_(.+)"):
-        suffix = suffix_conversion.get(m[0], {}).get(m[2], m[2])
-        return f"diffusion_model_middle_block_{1 if m[0] == 'attentions' else m[1] * 2}_{suffix}"
-
-    if match(m, r"lora_unet_up_blocks_(\d+)_(attentions|resnets)_(\d+)_(.+)"):
-        suffix = suffix_conversion.get(m[1], {}).get(m[3], m[3])
-        return f"diffusion_model_output_blocks_{m[0] * 3 + m[2]}_{1 if m[1] == 'attentions' else 0}_{suffix}"
-
-    if match(m, r"lora_unet_down_blocks_(\d+)_downsamplers_0_conv"):
-        return f"diffusion_model_input_blocks_{3 + m[0] * 3}_0_op"
-
-    if match(m, r"lora_unet_up_blocks_(\d+)_upsamplers_0_conv"):
-        return f"diffusion_model_output_blocks_{2 + m[0] * 3}_{2 if m[0]>0 else 1}_conv"
-
-    if match(m, r"lora_te_text_model_encoder_layers_(\d+)_(.+)"):
-        if is_sd2:
-            if 'mlp_fc1' in m[1]:
-                return f"model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc1', 'mlp_c_fc')}"
-            elif 'mlp_fc2' in m[1]:
-                return f"model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc2', 'mlp_c_proj')}"
-            else:
-                return f"model_transformer_resblocks_{m[0]}_{m[1].replace('self_attn', 'attn')}"
-
-        return f"transformer_text_model_encoder_layers_{m[0]}_{m[1]}"
-
-    if match(m, r"lora_te2_text_model_encoder_layers_(\d+)_(.+)"):
-        if 'mlp_fc1' in m[1]:
-            return f"1_model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc1', 'mlp_c_fc')}"
-        elif 'mlp_fc2' in m[1]:
-            return f"1_model_transformer_resblocks_{m[0]}_{m[1].replace('mlp_fc2', 'mlp_c_proj')}"
-        else:
-            return f"1_model_transformer_resblocks_{m[0]}_{m[1].replace('self_attn', 'attn')}"
-
-    return key
+    pass


 def assign_network_names_to_compvis_modules(sd_model):
-    network_layer_mapping = {}
-
-    if shared.sd_model.is_sdxl:
-        for i, embedder in enumerate(shared.sd_model.conditioner.embedders):
-            if not hasattr(embedder, 'wrapped'):
-                continue
-
-            for name, module in embedder.wrapped.named_modules():
-                network_name = f'{i}_{name.replace(".", "_")}'
-                network_layer_mapping[network_name] = module
-                module.network_layer_name = network_name
-    else:
-        for name, module in shared.sd_model.cond_stage_model.wrapped.named_modules():
-            network_name = name.replace(".", "_")
-            network_layer_mapping[network_name] = module
-            module.network_layer_name = network_name
-
-    for name, module in shared.sd_model.model.named_modules():
-        network_name = name.replace(".", "_")
-        network_layer_mapping[network_name] = module
-        module.network_layer_name = network_name
-
-    sd_model.network_layer_mapping = network_layer_mapping
+    pass


 def load_network(name, network_on_disk):
-    net = network.Network(name, network_on_disk)
-    net.mtime = os.path.getmtime(network_on_disk.filename)
-
-    sd = sd_models.read_state_dict(network_on_disk.filename)
-
-    # this should not be needed but is here as an emergency fix for an unknown error people are experiencing in 1.2.0
-    if not hasattr(shared.sd_model, 'network_layer_mapping'):
-        assign_network_names_to_compvis_modules(shared.sd_model)
-
-    keys_failed_to_match = {}
-    is_sd2 = 'model_transformer_resblocks' in shared.sd_model.network_layer_mapping
-
-    matched_networks = {}
-    bundle_embeddings = {}
-
-    for key_network, weight in sd.items():
-        key_network_without_network_parts, _, network_part = key_network.partition(".")
-
-        if key_network_without_network_parts == "bundle_emb":
-            emb_name, vec_name = network_part.split(".", 1)
-            emb_dict = bundle_embeddings.get(emb_name, {})
-            if vec_name.split('.')[0] == 'string_to_param':
-                _, k2 = vec_name.split('.', 1)
-                emb_dict['string_to_param'] = {k2: weight}
-            else:
-                emb_dict[vec_name] = weight
-            bundle_embeddings[emb_name] = emb_dict
-
-        key = convert_diffusers_name_to_compvis(key_network_without_network_parts, is_sd2)
-        sd_module = shared.sd_model.network_layer_mapping.get(key, None)
-
-        if sd_module is None:
-            m = re_x_proj.match(key)
-            if m:
-                sd_module = shared.sd_model.network_layer_mapping.get(m.group(1), None)
-
-        # SDXL loras seem to already have correct compvis keys, so only need to replace "lora_unet" with "diffusion_model"
-        if sd_module is None and "lora_unet" in key_network_without_network_parts:
-            key = key_network_without_network_parts.replace("lora_unet", "diffusion_model")
-            sd_module = shared.sd_model.network_layer_mapping.get(key, None)
-        elif sd_module is None and "lora_te1_text_model" in key_network_without_network_parts:
-            key = key_network_without_network_parts.replace("lora_te1_text_model", "0_transformer_text_model")
-            sd_module = shared.sd_model.network_layer_mapping.get(key, None)
-
-            # some SD1 Loras also have correct compvis keys
-            if sd_module is None:
-                key = key_network_without_network_parts.replace("lora_te1_text_model", "transformer_text_model")
-                sd_module = shared.sd_model.network_layer_mapping.get(key, None)
-
-        # kohya_ss OFT module
-        elif sd_module is None and "oft_unet" in key_network_without_network_parts:
-            key = key_network_without_network_parts.replace("oft_unet", "diffusion_model")
-            sd_module = shared.sd_model.network_layer_mapping.get(key, None)
-
-        # KohakuBlueLeaf OFT module
-        if sd_module is None and "oft_diag" in key:
-            key = key_network_without_network_parts.replace("lora_unet", "diffusion_model")
-            key = key_network_without_network_parts.replace("lora_te1_text_model", "0_transformer_text_model")
-            sd_module = shared.sd_model.network_layer_mapping.get(key, None)
-
-        if sd_module is None:
-            keys_failed_to_match[key_network] = key
-            continue
-
-        if key not in matched_networks:
-            matched_networks[key] = network.NetworkWeights(network_key=key_network, sd_key=key, w={}, sd_module=sd_module)
-
-        matched_networks[key].w[network_part] = weight
-
-    for key, weights in matched_networks.items():
-        net_module = None
-        for nettype in module_types:
-            net_module = nettype.create_module(net, weights)
-            if net_module is not None:
-                break
-
-        if net_module is None:
-            raise AssertionError(f"Could not find a module type (out of {', '.join([x.__class__.__name__ for x in module_types])}) that would accept those keys: {', '.join(weights.w)}")
-
-        net.modules[key] = net_module
-
-    embeddings = {}
-    for emb_name, data in bundle_embeddings.items():
-        embedding = textual_inversion.create_embedding_from_data(data, emb_name, filename=network_on_disk.filename + "/" + emb_name)
-        embedding.loaded = None
-        embeddings[emb_name] = embedding
-
-    net.bundle_embeddings = embeddings
-
-    if keys_failed_to_match:
-        logging.debug(f"Network {network_on_disk.filename} didn't match keys: {keys_failed_to_match}")
-
-    return net
+    pass


 def purge_networks_from_memory():
-    while len(networks_in_memory) > shared.opts.lora_in_memory_limit and len(networks_in_memory) > 0:
-        name = next(iter(networks_in_memory))
-        networks_in_memory.pop(name, None)
-
-    devices.torch_gc()
+    pass


 def load_networks(names, te_multipliers=None, unet_multipliers=None, dyn_dims=None):
-    emb_db = sd_hijack.model_hijack.embedding_db
-    already_loaded = {}
+    global lora_state_dict_cache

-    for net in loaded_networks:
-        if net.name in names:
-            already_loaded[net.name] = net
-        for emb_name, embedding in net.bundle_embeddings.items():
-            if embedding.loaded:
-                emb_db.register_embedding_by_name(None, shared.sd_model, emb_name)
-
-    loaded_networks.clear()
+    current_sd = sd_models.model_data.get_sd_model()
+    if current_sd is None:
+        return

    networks_on_disk = [available_networks.get(name, None) if name.lower() in forbidden_network_aliases else available_network_aliases.get(name, None) for name in names]
    if any(x is None for x in networks_on_disk):
        list_available_networks()
-
        networks_on_disk = [available_networks.get(name, None) if name.lower() in forbidden_network_aliases else available_network_aliases.get(name, None) for name in names]

-    failed_to_load_networks = []
+    compiled_lora_targets = []
+    for a, b, c in zip(networks_on_disk, unet_multipliers, te_multipliers):
+        compiled_lora_targets.append([a.filename, b, c])

-    for i, (network_on_disk, name) in enumerate(zip(networks_on_disk, names)):
-        net = already_loaded.get(name, None)
+    compiled_lora_targets_hash = str(compiled_lora_targets)

-        if network_on_disk is not None:
-            if net is None:
-                net = networks_in_memory.get(name)
+    if current_sd.current_lora_hash == compiled_lora_targets_hash:
+        return

-            if net is None or os.path.getmtime(network_on_disk.filename) > net.mtime:
-                try:
-                    net = load_network(name, network_on_disk)
+    current_sd.current_lora_hash = compiled_lora_targets_hash
+    current_sd.forge_objects.unet = current_sd.forge_objects_original.unet
+    current_sd.forge_objects.clip = current_sd.forge_objects_original.clip

-                    networks_in_memory.pop(name, None)
-                    networks_in_memory[name] = net
-                except Exception as e:
-                    errors.display(e, f"loading network {network_on_disk.filename}")
-                    continue
+    for filename, strength_model, strength_clip in compiled_lora_targets:
+        lora_sd = load_lora_state_dict(filename)
+        current_sd.forge_objects.unet, current_sd.forge_objects.clip = load_lora_for_models(
+            current_sd.forge_objects.unet, current_sd.forge_objects.clip, lora_sd, strength_model, strength_clip)

-            net.mentioned_name = name
-
-            network_on_disk.read_hash()
-
-        if net is None:
-            failed_to_load_networks.append(name)
-            logging.info(f"Couldn't find network with name {name}")
-            continue
-
-        net.te_multiplier = te_multipliers[i] if te_multipliers else 1.0
-        net.unet_multiplier = unet_multipliers[i] if unet_multipliers else 1.0
-        net.dyn_dim = dyn_dims[i] if dyn_dims else 1.0
-        loaded_networks.append(net)
-
-        for emb_name, embedding in net.bundle_embeddings.items():
-            if embedding.loaded is None and emb_name in emb_db.word_embeddings:
-                logger.warning(
-                    f'Skip bundle embedding: "{emb_name}"'
-                    ' as it was already loaded from embeddings folder'
-                )
-                continue
-
-            embedding.loaded = False
-            if emb_db.expected_shape == -1 or emb_db.expected_shape == embedding.shape:
-                embedding.loaded = True
-                emb_db.register_embedding(embedding, shared.sd_model)
-            else:
-                emb_db.skipped_embeddings[name] = embedding
-
-    if failed_to_load_networks:
-        lora_not_found_message = f'Lora not found: {", ".join(failed_to_load_networks)}'
-        sd_hijack.model_hijack.comments.append(lora_not_found_message)
-        if shared.opts.lora_not_found_warning_console:
-            print(f'\n{lora_not_found_message}\n')
-        if shared.opts.lora_not_found_gradio_warning:
-            gr.Warning(lora_not_found_message)
-
-    purge_networks_from_memory()
+    current_sd.forge_objects_after_applying_lora = current_sd.forge_objects.shallow_copy()
+    return


 def network_restore_weights_from_backup(self: Union[torch.nn.Conv2d, torch.nn.Linear, torch.nn.GroupNorm, torch.nn.LayerNorm, torch.nn.MultiheadAttention]):
-    weights_backup = getattr(self, "network_weights_backup", None)
-    bias_backup = getattr(self, "network_bias_backup", None)
-
-    if weights_backup is None and bias_backup is None:
-        return
-
-    if weights_backup is not None:
-        if isinstance(self, torch.nn.MultiheadAttention):
-            self.in_proj_weight.copy_(weights_backup[0])
-            self.out_proj.weight.copy_(weights_backup[1])
-        else:
-            self.weight.copy_(weights_backup)
-
-    if bias_backup is not None:
-        if isinstance(self, torch.nn.MultiheadAttention):
-            self.out_proj.bias.copy_(bias_backup)
-        else:
-            self.bias.copy_(bias_backup)
-    else:
-        if isinstance(self, torch.nn.MultiheadAttention):
-            self.out_proj.bias = None
-        else:
-            self.bias = None
+    pass


 def network_apply_weights(self: Union[torch.nn.Conv2d, torch.nn.Linear, torch.nn.GroupNorm, torch.nn.LayerNorm, torch.nn.MultiheadAttention]):
-    """
-    Applies the currently selected set of networks to the weights of torch layer self.
-    If weights already have this particular set of networks applied, does nothing.
-    If not, restores original weights from backup and alters weights according to networks.
-    """
-
-    network_layer_name = getattr(self, 'network_layer_name', None)
-    if network_layer_name is None:
-        return
-
-    current_names = getattr(self, "network_current_names", ())
-    wanted_names = tuple((x.name, x.te_multiplier, x.unet_multiplier, x.dyn_dim) for x in loaded_networks)
-
-    weights_backup = getattr(self, "network_weights_backup", None)
-    if weights_backup is None and wanted_names != ():
-        if current_names != ():
-            raise RuntimeError("no backup weights found and current weights are not unchanged")
-
-        if isinstance(self, torch.nn.MultiheadAttention):
-            weights_backup = (self.in_proj_weight.to(devices.cpu, copy=True), self.out_proj.weight.to(devices.cpu, copy=True))
-        else:
-            weights_backup = self.weight.to(devices.cpu, copy=True)
-
-        self.network_weights_backup = weights_backup
-
-    bias_backup = getattr(self, "network_bias_backup", None)
-    if bias_backup is None:
-        if isinstance(self, torch.nn.MultiheadAttention) and self.out_proj.bias is not None:
-            bias_backup = self.out_proj.bias.to(devices.cpu, copy=True)
-        elif getattr(self, 'bias', None) is not None:
-            bias_backup = self.bias.to(devices.cpu, copy=True)
-        else:
-            bias_backup = None
-        self.network_bias_backup = bias_backup
-
-    if current_names != wanted_names:
-        network_restore_weights_from_backup(self)
-
-        for net in loaded_networks:
-            module = net.modules.get(network_layer_name, None)
-            if module is not None and hasattr(self, 'weight'):
-                try:
-                    with torch.no_grad():
-                        if getattr(self, 'fp16_weight', None) is None:
-                            weight = self.weight
-                            bias = self.bias
-                        else:
-                            weight = self.fp16_weight.clone().to(self.weight.device)
-                            bias = getattr(self, 'fp16_bias', None)
-                            if bias is not None:
-                                bias = bias.clone().to(self.bias.device)
-                        updown, ex_bias = module.calc_updown(weight)
-
-                        if len(weight.shape) == 4 and weight.shape[1] == 9:
-                            # inpainting model. zero pad updown to make channel[1]  4 to 9
-                            updown = torch.nn.functional.pad(updown, (0, 0, 0, 0, 0, 5))
-
-                        self.weight.copy_((weight.to(dtype=updown.dtype) + updown).to(dtype=self.weight.dtype))
-                        if ex_bias is not None and hasattr(self, 'bias'):
-                            if self.bias is None:
-                                self.bias = torch.nn.Parameter(ex_bias).to(self.weight.dtype)
-                            else:
-                                self.bias.copy_((bias + ex_bias).to(dtype=self.bias.dtype))
-                except RuntimeError as e:
-                    logging.debug(f"Network {net.name} layer {network_layer_name}: {e}")
-                    extra_network_lora.errors[net.name] = extra_network_lora.errors.get(net.name, 0) + 1
-
-                continue
-
-            module_q = net.modules.get(network_layer_name + "_q_proj", None)
-            module_k = net.modules.get(network_layer_name + "_k_proj", None)
-            module_v = net.modules.get(network_layer_name + "_v_proj", None)
-            module_out = net.modules.get(network_layer_name + "_out_proj", None)
-
-            if isinstance(self, torch.nn.MultiheadAttention) and module_q and module_k and module_v and module_out:
-                try:
-                    with torch.no_grad():
-                        updown_q, _ = module_q.calc_updown(self.in_proj_weight)
-                        updown_k, _ = module_k.calc_updown(self.in_proj_weight)
-                        updown_v, _ = module_v.calc_updown(self.in_proj_weight)
-                        updown_qkv = torch.vstack([updown_q, updown_k, updown_v])
-                        updown_out, ex_bias = module_out.calc_updown(self.out_proj.weight)
-
-                        self.in_proj_weight += updown_qkv
-                        self.out_proj.weight += updown_out
-                    if ex_bias is not None:
-                        if self.out_proj.bias is None:
-                            self.out_proj.bias = torch.nn.Parameter(ex_bias)
-                        else:
-                            self.out_proj.bias += ex_bias
-
-                except RuntimeError as e:
-                    logging.debug(f"Network {net.name} layer {network_layer_name}: {e}")
-                    extra_network_lora.errors[net.name] = extra_network_lora.errors.get(net.name, 0) + 1
-
-                continue
-
-            if module is None:
-                continue
-
-            logging.debug(f"Network {net.name} layer {network_layer_name}: couldn't find supported operation")
-            extra_network_lora.errors[net.name] = extra_network_lora.errors.get(net.name, 0) + 1
-
-        self.network_current_names = wanted_names
+    pass


 def network_forward(org_module, input, original_forward):
-    """
-    Old way of applying Lora by executing operations during layer's forward.
-    Stacking many loras this way results in big performance degradation.
-    """
-
-    if len(loaded_networks) == 0:
-        return original_forward(org_module, input)
-
-    input = devices.cond_cast_unet(input)
-
-    network_restore_weights_from_backup(org_module)
-    network_reset_cached_weight(org_module)
-
-    y = original_forward(org_module, input)
-
-    network_layer_name = getattr(org_module, 'network_layer_name', None)
-    for lora in loaded_networks:
-        module = lora.modules.get(network_layer_name, None)
-        if module is None:
-            continue
-
-        y = module.forward(input, y)
-
-    return y
+    pass


 def network_reset_cached_weight(self: Union[torch.nn.Conv2d, torch.nn.Linear]):
-    self.network_current_names = ()
-    self.network_weights_backup = None
-    self.network_bias_backup = None
+    pass


 def network_Linear_forward(self, input):
-    if shared.opts.lora_functional:
-        return network_forward(self, input, originals.Linear_forward)
-
-    network_apply_weights(self)
-
-    return originals.Linear_forward(self, input)
+    pass


 def network_Linear_load_state_dict(self, *args, **kwargs):
-    network_reset_cached_weight(self)
-
-    return originals.Linear_load_state_dict(self, *args, **kwargs)
+    pass


 def network_Conv2d_forward(self, input):
-    if shared.opts.lora_functional:
-        return network_forward(self, input, originals.Conv2d_forward)
-
-    network_apply_weights(self)
-
-    return originals.Conv2d_forward(self, input)
+    pass


 def network_Conv2d_load_state_dict(self, *args, **kwargs):
-    network_reset_cached_weight(self)
-
-    return originals.Conv2d_load_state_dict(self, *args, **kwargs)
+    pass


 def network_GroupNorm_forward(self, input):
-    if shared.opts.lora_functional:
-        return network_forward(self, input, originals.GroupNorm_forward)
-
-    network_apply_weights(self)
-
-    return originals.GroupNorm_forward(self, input)
+    pass


 def network_GroupNorm_load_state_dict(self, *args, **kwargs):
-    network_reset_cached_weight(self)
-
-    return originals.GroupNorm_load_state_dict(self, *args, **kwargs)
+    pass


 def network_LayerNorm_forward(self, input):
-    if shared.opts.lora_functional:
-        return network_forward(self, input, originals.LayerNorm_forward)
-
-    network_apply_weights(self)
-
-    return originals.LayerNorm_forward(self, input)
+    pass


 def network_LayerNorm_load_state_dict(self, *args, **kwargs):
-    network_reset_cached_weight(self)
-
-    return originals.LayerNorm_load_state_dict(self, *args, **kwargs)
+    pass


 def network_MultiheadAttention_forward(self, *args, **kwargs):
-    network_apply_weights(self)
-
-    return originals.MultiheadAttention_forward(self, *args, **kwargs)
+    pass


 def network_MultiheadAttention_load_state_dict(self, *args, **kwargs):
-    network_reset_cached_weight(self)
-
-    return originals.MultiheadAttention_load_state_dict(self, *args, **kwargs)
+    pass


 def list_available_networks():
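Note on the new `load_networks` in this file: it combines two caches. `functools.lru_cache` keeps recently read LoRA state dicts in memory, and a string signature of the `[filename, unet strength, text-encoder strength]` list lets the function return early when the requested set is unchanged. The sketch below imitates that control flow in plain Python under stated assumptions: `FakeModel`, `fake_load_state_dict`, and `apply_loras` are hypothetical stand-ins, and no real weights are loaded or patched.

```python
import functools

load_calls = []  # records which files were actually "read", for illustration only


@functools.lru_cache(maxsize=5)
def fake_load_state_dict(filename):
    """Stand-in for an expensive file read; hypothetical, not the real loader."""
    load_calls.append(filename)
    return {"file": filename}


class FakeModel:
    """Hypothetical model object that remembers which LoRA set is applied."""

    def __init__(self):
        self.current_lora_hash = str([])
        self.applied = []


def apply_loras(model, targets):
    """Apply targets (lists of [filename, unet_strength, te_strength]).

    Returns False when the requested set matches what is already applied.
    """
    signature = str(targets)
    if model.current_lora_hash == signature:
        return False  # nothing changed, skip the rebuild

    model.current_lora_hash = signature
    # Rebuild from scratch; repeated filenames are served from the LRU cache.
    model.applied = [
        (fake_load_state_dict(filename), unet_strength, te_strength)
        for filename, unet_strength, te_strength in targets
    ]
    return True


model = FakeModel()
first = apply_loras(model, [["a.safetensors", 1.0, 1.0]])
second = apply_loras(model, [["a.safetensors", 1.0, 1.0]])  # same set, skipped
third = apply_loras(model, [["a.safetensors", 0.5, 1.0]])   # strength changed, rebuilt
```

One consequence of keying the LRU cache on the filename alone is that a file modified on disk is not re-read until its entry is evicted, whereas the removed code compared modification times before reusing a loaded network.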
--- a/extensions-builtin/SwinIR/scripts/swinir_model.py
+++ b/extensions-builtin/SwinIR/scripts/swinir_model.py
@@ -5,7 +5,7 @@ import torch
 from PIL import Image

 from modules import devices, modelloader, script_callbacks, shared, upscaler_utils
-from modules.upscaler import Upscaler, UpscalerData
+from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory

 SWINIR_MODEL_URL = "https://github.com/JingyunLiang/SwinIR/releases/download/v0.0/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth"

@@ -33,6 +33,8 @@ class UpscalerSwinIR(Upscaler):
        self.scalers = scalers

    def do_upscale(self, img: Image.Image, model_file: str) -> Image.Image:
+        prepare_free_memory()
+
        current_config = (model_file, shared.opts.SWIN_tile)

        if self._cached_model_config == current_config:
--- a/extensions-builtin/hypertile/hypertile.py
+++ b/extensions-builtin/hypertile/hypertile.py
@@ -1,351 +0,0 @@
-"""
-Hypertile module for splitting attention layers in SD-1.5 U-Net and SD-1.5 VAE
-Warn: The patch works well only if the input image has a width and height that are multiples of 128
-Original author: @tfernd Github: https://github.com/tfernd/HyperTile
-"""
-
-from __future__ import annotations
-
-from dataclasses import dataclass
-from typing import Callable
-
-from functools import wraps, cache
-
-import math
-import torch.nn as nn
-import random
-
-from einops import rearrange
-
-
-@dataclass
-class HypertileParams:
-    depth = 0
-    layer_name = ""
-    tile_size: int = 0
-    swap_size: int = 0
-    aspect_ratio: float = 1.0
-    forward = None
-    enabled = False
-
-
-
-# TODO add SD-XL layers
-DEPTH_LAYERS = {
-    0: [
-        # SD 1.5 U-Net (diffusers)
-        "down_blocks.0.attentions.0.transformer_blocks.0.attn1",
-        "down_blocks.0.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.3.attentions.0.transformer_blocks.0.attn1",
-        "up_blocks.3.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.3.attentions.2.transformer_blocks.0.attn1",
-        # SD 1.5 U-Net (ldm)
-        "input_blocks.1.1.transformer_blocks.0.attn1",
-        "input_blocks.2.1.transformer_blocks.0.attn1",
-        "output_blocks.9.1.transformer_blocks.0.attn1",
-        "output_blocks.10.1.transformer_blocks.0.attn1",
-        "output_blocks.11.1.transformer_blocks.0.attn1",
-        # SD 1.5 VAE
-        "decoder.mid_block.attentions.0",
-        "decoder.mid.attn_1",
-    ],
-    1: [
-        # SD 1.5 U-Net (diffusers)
-        "down_blocks.1.attentions.0.transformer_blocks.0.attn1",
-        "down_blocks.1.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.2.attentions.0.transformer_blocks.0.attn1",
-        "up_blocks.2.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.2.attentions.2.transformer_blocks.0.attn1",
-        # SD 1.5 U-Net (ldm)
-        "input_blocks.4.1.transformer_blocks.0.attn1",
-        "input_blocks.5.1.transformer_blocks.0.attn1",
-        "output_blocks.6.1.transformer_blocks.0.attn1",
-        "output_blocks.7.1.transformer_blocks.0.attn1",
-        "output_blocks.8.1.transformer_blocks.0.attn1",
-    ],
-    2: [
-        # SD 1.5 U-Net (diffusers)
-        "down_blocks.2.attentions.0.transformer_blocks.0.attn1",
-        "down_blocks.2.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.1.attentions.0.transformer_blocks.0.attn1",
-        "up_blocks.1.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.1.attentions.2.transformer_blocks.0.attn1",
-        # SD 1.5 U-Net (ldm)
-        "input_blocks.7.1.transformer_blocks.0.attn1",
-        "input_blocks.8.1.transformer_blocks.0.attn1",
-        "output_blocks.3.1.transformer_blocks.0.attn1",
-        "output_blocks.4.1.transformer_blocks.0.attn1",
-        "output_blocks.5.1.transformer_blocks.0.attn1",
-    ],
-    3: [
-        # SD 1.5 U-Net (diffusers)
-        "mid_block.attentions.0.transformer_blocks.0.attn1",
-        # SD 1.5 U-Net (ldm)
-        "middle_block.1.transformer_blocks.0.attn1",
-    ],
-}
-# XL layers, thanks to GitHub@gel-crabs for the help
-DEPTH_LAYERS_XL = {
-    0: [
-        # SD 1.5 U-Net (diffusers)
-        "down_blocks.0.attentions.0.transformer_blocks.0.attn1",
-        "down_blocks.0.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.3.attentions.0.transformer_blocks.0.attn1",
-        "up_blocks.3.attentions.1.transformer_blocks.0.attn1",
-        "up_blocks.3.attentions.2.transformer_blocks.0.attn1",
-        # SD 1.5 U-Net (ldm)
-        "input_blocks.4.1.transformer_blocks.0.attn1",
-        "input_blocks.5.1.transformer_blocks.0.attn1",
-        "output_blocks.3.1.transformer_blocks.0.attn1",
-        "output_blocks.4.1.transformer_blocks.0.attn1",
-        "output_blocks.5.1.transformer_blocks.0.attn1",
-        # SD 1.5 VAE
-        "decoder.mid_block.attentions.0",
-        "decoder.mid.attn_1",
-    ],
-    1: [
-        # SD 1.5 U-Net (diffusers)
-        #"down_blocks.1.attentions.0.transformer_blocks.0.attn1",
-        #"down_blocks.1.attentions.1.transformer_blocks.0.attn1",
-        #"up_blocks.2.attentions.0.transformer_blocks.0.attn1",
-        #"up_blocks.2.attentions.1.transformer_blocks.0.attn1",
-        #"up_blocks.2.attentions.2.transformer_blocks.0.attn1",
-        # SD 1.5 U-Net (ldm)
-        "input_blocks.4.1.transformer_blocks.1.attn1",
-        "input_blocks.5.1.transformer_blocks.1.attn1",
-        "output_blocks.3.1.transformer_blocks.1.attn1",
-        "output_blocks.4.1.transformer_blocks.1.attn1",
-        "output_blocks.5.1.transformer_blocks.1.attn1",
-        "input_blocks.7.1.transformer_blocks.0.attn1",
-        "input_blocks.8.1.transformer_blocks.0.attn1",
-        "output_blocks.0.1.transformer_blocks.0.attn1",
-        "output_blocks.1.1.transformer_blocks.0.attn1",
-        "output_blocks.2.1.transformer_blocks.0.attn1",
-        "input_blocks.7.1.transformer_blocks.1.attn1",
-        "input_blocks.8.1.transformer_blocks.1.attn1",
-        "output_blocks.0.1.transformer_blocks.1.attn1",
-        "output_blocks.1.1.transformer_blocks.1.attn1",
-        "output_blocks.2.1.transformer_blocks.1.attn1",
-        "input_blocks.7.1.transformer_blocks.2.attn1",
-        "input_blocks.8.1.transformer_blocks.2.attn1",
-        "output_blocks.0.1.transformer_blocks.2.attn1",
-        "output_blocks.1.1.transformer_blocks.2.attn1",
-        "output_blocks.2.1.transformer_blocks.2.attn1",
-        "input_blocks.7.1.transformer_blocks.3.attn1",
-        "input_blocks.8.1.transformer_blocks.3.attn1",
-        "output_blocks.0.1.transformer_blocks.3.attn1",
-        "output_blocks.1.1.transformer_blocks.3.attn1",
-        "output_blocks.2.1.transformer_blocks.3.attn1",
-        "input_blocks.7.1.transformer_blocks.4.attn1",
-        "input_blocks.8.1.transformer_blocks.4.attn1",
-        "output_blocks.0.1.transformer_blocks.4.attn1",
-        "output_blocks.1.1.transformer_blocks.4.attn1",
-        "output_blocks.2.1.transformer_blocks.4.attn1",
-        "input_blocks.7.1.transformer_blocks.5.attn1",
-        "input_blocks.8.1.transformer_blocks.5.attn1",
-        "output_blocks.0.1.transformer_blocks.5.attn1",
-        "output_blocks.1.1.transformer_blocks.5.attn1",
-        "output_blocks.2.1.transformer_blocks.5.attn1",
-        "input_blocks.7.1.transformer_blocks.6.attn1",
-        "input_blocks.8.1.transformer_blocks.6.attn1",
-        "output_blocks.0.1.transformer_blocks.6.attn1",
-        "output_blocks.1.1.transformer_blocks.6.attn1",
-        "output_blocks.2.1.transformer_blocks.6.attn1",
-        "input_blocks.7.1.transformer_blocks.7.attn1",
-        "input_blocks.8.1.transformer_blocks.7.attn1",
-        "output_blocks.0.1.transformer_blocks.7.attn1",
-        "output_blocks.1.1.transformer_blocks.7.attn1",
-        "output_blocks.2.1.transformer_blocks.7.attn1",
-        "input_blocks.7.1.transformer_blocks.8.attn1",
-        "input_blocks.8.1.transformer_blocks.8.attn1",
-        "output_blocks.0.1.transformer_blocks.8.attn1",
-        "output_blocks.1.1.transformer_blocks.8.attn1",
-        "output_blocks.2.1.transformer_blocks.8.attn1",
-        "input_blocks.7.1.transformer_blocks.9.attn1",
-        "input_blocks.8.1.transformer_blocks.9.attn1",
-        "output_blocks.0.1.transformer_blocks.9.attn1",
-        "output_blocks.1.1.transformer_blocks.9.attn1",
-        "output_blocks.2.1.transformer_blocks.9.attn1",
-    ],
-    2: [
-        # SD 1.5 U-Net (diffusers)
-        "mid_block.attentions.0.transformer_blocks.0.attn1",
-        # SD 1.5 U-Net (ldm)
-        "middle_block.1.transformer_blocks.0.attn1",
-        "middle_block.1.transformer_blocks.1.attn1",
-        "middle_block.1.transformer_blocks.2.attn1",
-        "middle_block.1.transformer_blocks.3.attn1",
-        "middle_block.1.transformer_blocks.4.attn1",
-        "middle_block.1.transformer_blocks.5.attn1",
-        "middle_block.1.transformer_blocks.6.attn1",
-        "middle_block.1.transformer_blocks.7.attn1",
-        "middle_block.1.transformer_blocks.8.attn1",
-        "middle_block.1.transformer_blocks.9.attn1",
-    ],
-    3 : [] # TODO - separate layers for SD-XL
-}
-
-
-RNG_INSTANCE = random.Random()
-
-@cache
-def get_divisors(value: int, min_value: int, /, max_options: int = 1) -> list[int]:
-    """
-    Returns divisors of value that
-        x * min_value <= value
-    in big -> small order, amount of divisors is limited by max_options
-    """
-    max_options = max(1, max_options) # at least 1 option should be returned
-    min_value = min(min_value, value)
-    divisors = [i for i in range(min_value, value + 1) if value % i == 0] # divisors in small -> big order
-    ns = [value // i for i in divisors[:max_options]]  # has at least 1 element # big -> small order
-    return ns
-
-
-def random_divisor(value: int, min_value: int, /, max_options: int = 1) -> int:
-    """
-    Returns a random divisor of value that
-        x * min_value <= value
-    if max_options is 1, the behavior is deterministic
-    """
-    ns = get_divisors(value, min_value, max_options=max_options) # get cached divisors
-    idx = RNG_INSTANCE.randint(0, len(ns) - 1)
-
-    return ns[idx]
-
-
-def set_hypertile_seed(seed: int) -> None:
-    RNG_INSTANCE.seed(seed)
-
-
-@cache
-def largest_tile_size_available(width: int, height: int) -> int:
-    """
-    Calculates the largest tile size available for a given width and height
-    Tile size is always a power of 2
-    """
-    gcd = math.gcd(width, height)
-    largest_tile_size_available = 1
-    while gcd % (largest_tile_size_available * 2) == 0:
-        largest_tile_size_available *= 2
-    return largest_tile_size_available
-
-
-def iterative_closest_divisors(hw:int, aspect_ratio:float) -> tuple[int, int]:
-    """
-    Finds h and w such that h*w = hw and h/w = aspect_ratio
-    We check all possible divisors of hw and return the closest to the aspect ratio
-    """
-    divisors = [i for i in range(2, hw + 1) if hw % i == 0] # all divisors of hw
-    pairs = [(i, hw // i) for i in divisors] # all pairs of divisors of hw
-    ratios = [w/h for h, w in pairs] # all ratios of pairs of divisors of hw
-    closest_ratio = min(ratios, key=lambda x: abs(x - aspect_ratio)) # closest ratio to aspect_ratio
-    closest_pair = pairs[ratios.index(closest_ratio)] # closest pair of divisors to aspect_ratio
-    return closest_pair
-
-
-@cache
-def find_hw_candidates(hw:int, aspect_ratio:float) -> tuple[int, int]:
-    """
-    Finds h and w such that h*w = hw and h/w = aspect_ratio
-    """
-    h, w = round(math.sqrt(hw * aspect_ratio)), round(math.sqrt(hw / aspect_ratio))
-    # find h and w such that h*w = hw and h/w = aspect_ratio
-    if h * w != hw:
-        w_candidate = hw / h
-        # check if w is an integer
-        if not w_candidate.is_integer():
-            h_candidate = hw / w
-            # check if h is an integer
-            if not h_candidate.is_integer():
-                return iterative_closest_divisors(hw, aspect_ratio)
-            else:
-                h = int(h_candidate)
-        else:
-            w = int(w_candidate)
-    return h, w
-
-
-def self_attn_forward(params: HypertileParams, scale_depth=True) -> Callable:
-
-    @wraps(params.forward)
-    def wrapper(*args, **kwargs):
-        if not params.enabled:
-            return params.forward(*args, **kwargs)
-
-        latent_tile_size = max(128, params.tile_size) // 8
-        x = args[0]
-
-        # VAE
-        if x.ndim == 4:
-            b, c, h, w = x.shape
-
-            nh = random_divisor(h, latent_tile_size, params.swap_size)
-            nw = random_divisor(w, latent_tile_size, params.swap_size)
-
-            if nh * nw > 1:
-                x = rearrange(x, "b c (nh h) (nw w) -> (b nh nw) c h w", nh=nh, nw=nw)  # split into nh * nw tiles
-
-            out = params.forward(x, *args[1:], **kwargs)
-
-            if nh * nw > 1:
-                out = rearrange(out, "(b nh nw) c h w -> b c (nh h) (nw w)", nh=nh, nw=nw)
-
-        # U-Net
-        else:
-            hw: int = x.size(1)
-            h, w = find_hw_candidates(hw, params.aspect_ratio)
-            assert h * w == hw, f"Invalid aspect ratio {params.aspect_ratio} for input of shape {x.shape}, hw={hw}, h={h}, w={w}"
-
-            factor = 2 ** params.depth if scale_depth else 1
-            nh = random_divisor(h, latent_tile_size * factor, params.swap_size)
-            nw = random_divisor(w, latent_tile_size * factor, params.swap_size)
-
-            if nh * nw > 1:
-                x = rearrange(x, "b (nh h nw w) c -> (b nh nw) (h w) c", h=h // nh, w=w // nw, nh=nh, nw=nw)
-
-            out = params.forward(x, *args[1:], **kwargs)
-
-            if nh * nw > 1:
-                out = rearrange(out, "(b nh nw) hw c -> b nh nw hw c", nh=nh, nw=nw)
-                out = rearrange(out, "b nh nw (h w) c -> b (nh h nw w) c", h=h // nh, w=w // nw)
-
-        return out
-
-    return wrapper
-
-
-def hypertile_hook_model(model: nn.Module, width, height, *, enable=False, tile_size_max=128, swap_size=1, max_depth=3, is_sdxl=False):
-    hypertile_layers = getattr(model, "__webui_hypertile_layers", None)
-    if hypertile_layers is None:
-        if not enable:
-            return
-
-        hypertile_layers = {}
-        layers = DEPTH_LAYERS_XL if is_sdxl else DEPTH_LAYERS
-
-        for depth in range(4):
-            for layer_name, module in model.named_modules():
-                if any(layer_name.endswith(try_name) for try_name in layers[depth]):
-                    params = HypertileParams()
-                    module.__webui_hypertile_params = params
-                    params.forward = module.forward
-                    params.depth = depth
-                    params.layer_name = layer_name
-                    module.forward = self_attn_forward(params)
-
-                    hypertile_layers[layer_name] = 1
-
-        model.__webui_hypertile_layers = hypertile_layers
-
-    aspect_ratio = width / height
-    tile_size = min(largest_tile_size_available(width, height), tile_size_max)
-
-    for layer_name, module in model.named_modules():
-        if layer_name in hypertile_layers:
-            params = module.__webui_hypertile_params
-
-            params.tile_size = tile_size
-            params.swap_size = swap_size
-            params.aspect_ratio = aspect_ratio
-            params.enabled = enable and params.depth <= max_depth
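
Note on the removed `hypertile.py`: the tile-count selection in `get_divisors` and `largest_tile_size_available` can be reproduced without torch or einops. The block below is a minimal sketch, assuming positive integer dimensions. The names `tile_counts` and `largest_pow2_tile` are hypothetical and stand in for the deleted functions; the logic follows the deleted lines but is not the extension's code.

```python
import math
from functools import cache


@cache
def tile_counts(value: int, min_value: int, max_options: int = 1) -> list[int]:
    """Return tile counts n with value % n == 0, largest count first.

    Each count corresponds to a tile size (value // n) that is at least
    min_value, after min_value has been clamped to value.
    """
    max_options = max(1, max_options)      # always return at least one option
    min_value = min(min_value, value)      # clamp so that `value` itself qualifies
    tile_sizes = [i for i in range(min_value, value + 1) if value % i == 0]
    return [value // size for size in tile_sizes[:max_options]]


def largest_pow2_tile(width: int, height: int) -> int:
    """Largest power of two dividing both width and height (positive inputs assumed)."""
    gcd = math.gcd(width, height)
    size = 1
    while gcd % (size * 2) == 0:
        size *= 2
    return size


print(tile_counts(64, 16, 3))       # [4, 2, 1]
print(tile_counts(96, 32, 2))       # [3, 2]
print(largest_pow2_tile(512, 768))  # 256
print(largest_pow2_tile(640, 480))  # 32
```

With `max_options=1` only the largest tile count is returned, which is why the deleted `random_divisor` docstring describes that case as deterministic.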
--- a/extensions-builtin/hypertile/scripts/hypertile_script.py
+++ b/extensions-builtin/hypertile/scripts/hypertile_script.py
@@ -1,109 +0,0 @@
-import hypertile
-from modules import scripts, script_callbacks, shared
-from scripts.hypertile_xyz import add_axis_options
-
-
-class ScriptHypertile(scripts.Script):
-    name = "Hypertile"
-
-    def title(self):
-        return self.name
-
-    def show(self, is_img2img):
-        return scripts.AlwaysVisible
-
-    def process(self, p, *args):
-        hypertile.set_hypertile_seed(p.all_seeds[0])
-
-        configure_hypertile(p.width, p.height, enable_unet=shared.opts.hypertile_enable_unet)
-
-        self.add_infotext(p)
-
-    def before_hr(self, p, *args):
-
-        enable = shared.opts.hypertile_enable_unet_secondpass or shared.opts.hypertile_enable_unet
-
-        # exclusive hypertile seed for the second pass
-        if enable:
-            hypertile.set_hypertile_seed(p.all_seeds[0])
-
-        configure_hypertile(p.hr_upscale_to_x, p.hr_upscale_to_y, enable_unet=enable)
-
-        if enable and not shared.opts.hypertile_enable_unet:
-            p.extra_generation_params["Hypertile U-Net second pass"] = True
-
-            self.add_infotext(p, add_unet_params=True)
-
-    def add_infotext(self, p, add_unet_params=False):
-        def option(name):
-            value = getattr(shared.opts, name)
-            default_value = shared.opts.get_default(name)
-            return None if value == default_value else value
-
-        if shared.opts.hypertile_enable_unet:
-            p.extra_generation_params["Hypertile U-Net"] = True
-
-        if shared.opts.hypertile_enable_unet or add_unet_params:
-            p.extra_generation_params["Hypertile U-Net max depth"] = option('hypertile_max_depth_unet')
-            p.extra_generation_params["Hypertile U-Net max tile size"] = option('hypertile_max_tile_unet')
-            p.extra_generation_params["Hypertile U-Net swap size"] = option('hypertile_swap_size_unet')
-
-        if shared.opts.hypertile_enable_vae:
-            p.extra_generation_params["Hypertile VAE"] = True
-            p.extra_generation_params["Hypertile VAE max depth"] = option('hypertile_max_depth_vae')
-            p.extra_generation_params["Hypertile VAE max tile size"] = option('hypertile_max_tile_vae')
-            p.extra_generation_params["Hypertile VAE swap size"] = option('hypertile_swap_size_vae')
-
-
-def configure_hypertile(width, height, enable_unet=True):
-    hypertile.hypertile_hook_model(
-        shared.sd_model.first_stage_model,
-        width,
-        height,
-        swap_size=shared.opts.hypertile_swap_size_vae,
-        max_depth=shared.opts.hypertile_max_depth_vae,
-        tile_size_max=shared.opts.hypertile_max_tile_vae,
-        enable=shared.opts.hypertile_enable_vae,
-    )
-
-    hypertile.hypertile_hook_model(
-        shared.sd_model.model,
-        width,
-        height,
-        swap_size=shared.opts.hypertile_swap_size_unet,
-        max_depth=shared.opts.hypertile_max_depth_unet,
-        tile_size_max=shared.opts.hypertile_max_tile_unet,
-        enable=enable_unet,
-        is_sdxl=shared.sd_model.is_sdxl
-    )
-
-
-def on_ui_settings():
-    import gradio as gr
-
-    options = {
-        "hypertile_explanation": shared.OptionHTML("""
-    <a href='https://github.com/tfernd/HyperTile'>Hypertile</a> optimizes the self-attention layer within U-Net and VAE models,
-    resulting in a reduction in computation time ranging from 1 to 4 times. The larger the generated image is, the greater the
-    benefit.
-    """),
-
-        "hypertile_enable_unet": shared.OptionInfo(False, "Enable Hypertile U-Net", infotext="Hypertile U-Net").info("enables hypertile for all modes, including hires fix second pass; noticeable change in details of the generated picture"),
-        "hypertile_enable_unet_secondpass": shared.OptionInfo(False, "Enable Hypertile U-Net for hires fix second pass", infotext="Hypertile U-Net second pass").info("enables hypertile just for hires fix second pass - regardless of whether the above setting is enabled"),
-        "hypertile_max_depth_unet": shared.OptionInfo(3, "Hypertile U-Net max depth", gr.Slider, {"minimum": 0, "maximum": 3, "step": 1}, infotext="Hypertile U-Net max depth").info("larger = more neural network layers affected; minor effect on performance"),
-        "hypertile_max_tile_unet": shared.OptionInfo(256, "Hypertile U-Net max tile size", gr.Slider, {"minimum": 0, "maximum": 512, "step": 16}, infotext="Hypertile U-Net max tile size").info("larger = worse performance"),
-        "hypertile_swap_size_unet": shared.OptionInfo(3, "Hypertile U-Net swap size", gr.Slider, {"minimum": 0, "maximum": 64, "step": 1}, infotext="Hypertile U-Net swap size"),
-
-        "hypertile_enable_vae": shared.OptionInfo(False, "Enable Hypertile VAE", infotext="Hypertile VAE").info("minimal change in the generated picture"),
-        "hypertile_max_depth_vae": shared.OptionInfo(3, "Hypertile VAE max depth", gr.Slider, {"minimum": 0, "maximum": 3, "step": 1}, infotext="Hypertile VAE max depth"),
-        "hypertile_max_tile_vae": shared.OptionInfo(128, "Hypertile VAE max tile size", gr.Slider, {"minimum": 0, "maximum": 512, "step": 16}, infotext="Hypertile VAE max tile size"),
-        "hypertile_swap_size_vae": shared.OptionInfo(3, "Hypertile VAE swap size ", gr.Slider, {"minimum": 0, "maximum": 64, "step": 1}, infotext="Hypertile VAE swap size"),
-    }
-
-    for name, opt in options.items():
-        opt.section = ('hypertile', "Hypertile")
-        shared.opts.add_option(name, opt)
-
-
-script_callbacks.on_ui_settings(on_ui_settings)
-script_callbacks.on_before_ui(add_axis_options)
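
Note on the removed `hypertile_script.py`: the nested `option()` helper in `add_infotext` returns `None` whenever a setting still has its default value. The sketch below illustrates that pattern with plain dictionaries. `DEFAULTS`, `LABELS`, `non_default`, and `infotext_params` are hypothetical names, and dropping the `None` entries at the end is an assumption about how such values are meant to be consumed, not something shown in the diff.

```python
# Defaults mirror the U-Net values in the removed on_ui_settings().
DEFAULTS = {
    "hypertile_max_depth_unet": 3,
    "hypertile_max_tile_unet": 256,
    "hypertile_swap_size_unet": 3,
}

# Infotext labels as they appear in the removed add_infotext().
LABELS = {
    "hypertile_max_depth_unet": "Hypertile U-Net max depth",
    "hypertile_max_tile_unet": "Hypertile U-Net max tile size",
    "hypertile_swap_size_unet": "Hypertile U-Net swap size",
}


def non_default(settings: dict, name: str):
    """Return the setting's value, or None if it equals the default."""
    value = settings[name]
    return None if value == DEFAULTS[name] else value


def infotext_params(settings: dict) -> dict:
    """Collect only the settings that differ from their defaults."""
    params = {LABELS[name]: non_default(settings, name) for name in DEFAULTS}
    # Assumption: entries set to None are meant to be left out of the infotext.
    return {label: value for label, value in params.items() if value is not None}


current = dict(DEFAULTS, hypertile_max_tile_unet=128)
print(infotext_params(current))  # {'Hypertile U-Net max tile size': 128}
```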
--- a/extensions-builtin/hypertile/scripts/hypertile_xyz.py
+++ b/extensions-builtin/hypertile/scripts/hypertile_xyz.py
@@ -1,51 +0,0 @@
-from modules import scripts
-from modules.shared import opts
-
-xyz_grid = [x for x in scripts.scripts_data if x.script_class.__module__ == "xyz_grid.py"][0].module
-
-def int_applier(value_name:str, min_range:int = -1, max_range:int = -1):
-    """
-    Returns a function that applies the given value to the given value_name in opts.data.
-    """
-    def validate(value_name:str, value:str):
-        value = int(value)
-        # validate value
-        if not min_range == -1:
-            assert value >= min_range, f"Value {value} for {value_name} must be greater than or equal to {min_range}"
-        if not max_range == -1:
-            assert value <= max_range, f"Value {value} for {value_name} must be less than or equal to {max_range}"
-    def apply_int(p, x, xs):
-        validate(value_name, x)
-        opts.data[value_name] = int(x)
-    return apply_int
-
-def bool_applier(value_name:str):
-    """
-    Returns a function that applies the given value to the given value_name in opts.data.
-    """
-    def validate(value_name:str, value:str):
-        assert value.lower() in ["true", "false"], f"Value {value} for {value_name} must be either true or false"
-    def apply_bool(p, x, xs):
-        validate(value_name, x)
-        value_boolean = x.lower() == "true"
-        opts.data[value_name] = value_boolean
-    return apply_bool
-
-def add_axis_options():
-    extra_axis_options = [
-        xyz_grid.AxisOption("[Hypertile] Unet First pass Enabled", str, bool_applier("hypertile_enable_unet"), choices=xyz_grid.boolean_choice(reverse=True)),
-        xyz_grid.AxisOption("[Hypertile] Unet Second pass Enabled", str, bool_applier("hypertile_enable_unet_secondpass"), choices=xyz_grid.boolean_choice(reverse=True)),
-        xyz_grid.AxisOption("[Hypertile] Unet Max Depth", int, int_applier("hypertile_max_depth_unet", 0, 3), choices=lambda: [str(x) for x in range(4)]),
-        xyz_grid.AxisOption("[Hypertile] Unet Max Tile Size", int, int_applier("hypertile_max_tile_unet", 0, 512)),
-        xyz_grid.AxisOption("[Hypertile] Unet Swap Size", int, int_applier("hypertile_swap_size_unet", 0, 64)),
-        xyz_grid.AxisOption("[Hypertile] VAE Enabled", str, bool_applier("hypertile_enable_vae"), choices=xyz_grid.boolean_choice(reverse=True)),
-        xyz_grid.AxisOption("[Hypertile] VAE Max Depth", int, int_applier("hypertile_max_depth_vae", 0, 3), choices=lambda: [str(x) for x in range(4)]),
-        xyz_grid.AxisOption("[Hypertile] VAE Max Tile Size", int, int_applier("hypertile_max_tile_vae", 0, 512)),
-        xyz_grid.AxisOption("[Hypertile] VAE Swap Size", int, int_applier("hypertile_swap_size_vae", 0, 64)),
-    ]
-    set_a = {opt.label for opt in xyz_grid.axis_options}
-    set_b = {opt.label for opt in extra_axis_options}
-    if set_a.intersection(set_b):
-        return
-
-    xyz_grid.axis_options.extend(extra_axis_options)
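
Note on the removed `hypertile_xyz.py`: `int_applier` and `bool_applier` are closure factories that validate an axis value and then write it into `opts.data`. The sketch below shows the same pattern against a plain dictionary. It is a variation rather than the deleted code: `make_int_applier` and `make_bool_applier` are hypothetical names, `None` replaces the `-1` sentinel for "no bound", and `ValueError` replaces `assert` so that validation still runs under `python -O`.

```python
from typing import Callable, Optional


def make_int_applier(store: dict, name: str,
                     lo: Optional[int] = None, hi: Optional[int] = None) -> Callable:
    """Return apply(p, x, xs) that range-checks x and stores it under `name`."""
    def apply(p, x, xs):
        value = int(x)
        if lo is not None and value < lo:
            raise ValueError(f"Value {value} for {name} must be >= {lo}")
        if hi is not None and value > hi:
            raise ValueError(f"Value {value} for {name} must be <= {hi}")
        store[name] = value
    return apply


def make_bool_applier(store: dict, name: str) -> Callable:
    """Return apply(p, x, xs) that accepts 'true'/'false' in any letter case."""
    def apply(p, x, xs):
        text = str(x).lower()
        if text not in ("true", "false"):
            raise ValueError(f"Value {x} for {name} must be either true or false")
        store[name] = text == "true"
    return apply


options = {}  # stands in for opts.data
make_int_applier(options, "hypertile_max_depth_unet", 0, 3)(None, "2", [])
make_bool_applier(options, "hypertile_enable_vae")(None, "True", [])
print(options)  # {'hypertile_max_depth_unet': 2, 'hypertile_enable_vae': True}
```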
--- a/extensions-builtin/sd_forge_controlnet/LICENSE
+++ b/extensions-builtin/sd_forge_controlnet/LICENSE
@@ -0,0 +1,674 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.  We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors.  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights.  Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received.  You must make sure that they, too, receive
+or can get the source code.  And you must show them these terms so they
+know their rights.
+
+  Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+  For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software.  For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+  Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so.  This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software.  The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable.  Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products.  If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+  Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary.  To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Use with the GNU Affero General Public License.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<https://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<https://www.gnu.org/licenses/why-not-lgpl.html>.
--- a/extensions-builtin/sd_forge_freeu/scripts/forge_freeu.py
+++ b/extensions-builtin/sd_forge_freeu/scripts/forge_freeu.py
@@ -0,0 +1,86 @@
+import gradio as gr
+
+from modules import scripts
+from ldm_patched.contrib.external_freelunch import FreeU_V2
+
+
+opFreeU_V2 = FreeU_V2()
+
+
+# def Fourier_filter(x, threshold, scale):
+#     x_freq = torch.fft.fftn(x.float(), dim=(-2, -1))
+#     x_freq = torch.fft.fftshift(x_freq, dim=(-2, -1))
+#     B, C, H, W = x_freq.shape
+#     mask = torch.ones((B, C, H, W), device=x.device)
+#     crow, ccol = H // 2, W // 2
+#     mask[..., crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
+#     x_freq = x_freq * mask
+#     x_freq = torch.fft.ifftshift(x_freq, dim=(-2, -1))
+#     x_filtered = torch.fft.ifftn(x_freq, dim=(-2, -1)).real
+#     return x_filtered.to(x.dtype)
+#
+#
+# def set_freeu_v2_patch(model, b1, b2, s1, s2):
+#     model_channels = model.model.model_config.unet_config["model_channels"]
+#     scale_dict = {model_channels * 4: (b1, s1), model_channels * 2: (b2, s2)}
+#
+#     def output_block_patch(h, hsp, *args, **kwargs):
+#         scale = scale_dict.get(h.shape[1], None)
+#         if scale is not None:
+#             hidden_mean = h.mean(1).unsqueeze(1)
+#             B = hidden_mean.shape[0]
+#             hidden_max, _ = torch.max(hidden_mean.view(B, -1), dim=-1, keepdim=True)
+#             hidden_min, _ = torch.min(hidden_mean.view(B, -1), dim=-1, keepdim=True)
+#             hidden_mean = (hidden_mean - hidden_min.unsqueeze(2).unsqueeze(3)) / \
+#                           (hidden_max - hidden_min).unsqueeze(2).unsqueeze(3)
+#             h[:, :h.shape[1] // 2] = h[:, :h.shape[1] // 2] * ((scale[0] - 1) * hidden_mean + 1)
+#             hsp = Fourier_filter(hsp, threshold=1, scale=scale[1])
+#         return h, hsp
+#
+#     m = model.clone()
+#     m.set_model_output_block_patch(output_block_patch)
+#     return m
+
+
+class FreeUForForge(scripts.Script):
+    def title(self):
+        return "FreeU Integrated"
+
+    def show(self, is_img2img):
+        # Make this extension visible in both the txt2img and img2img tabs.
+        return scripts.AlwaysVisible
+
+    def ui(self, *args, **kwargs):
+        with gr.Accordion(open=False, label=self.title()):
+            freeu_enabled = gr.Checkbox(label='Enabled', value=False)
+            freeu_b1 = gr.Slider(label='B1', minimum=0, maximum=2, step=0.01, value=1.01)
+            freeu_b2 = gr.Slider(label='B2', minimum=0, maximum=2, step=0.01, value=1.02)
+            freeu_s1 = gr.Slider(label='S1', minimum=0, maximum=4, step=0.01, value=0.99)
+            freeu_s2 = gr.Slider(label='S2', minimum=0, maximum=4, step=0.01, value=0.95)
+
+        return freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2
+
+    def process_batch(self, p, *script_args, **kwargs):
+        freeu_enabled, freeu_b1, freeu_b2, freeu_s1, freeu_s2 = script_args
+
+        if not freeu_enabled:
+            return
+
+        unet = p.sd_model.forge_objects.unet
+
+        # unet = set_freeu_v2_patch(unet, freeu_b1, freeu_b2, freeu_s1, freeu_s2)
+        unet = opFreeU_V2.patch(unet, freeu_b1, freeu_b2, freeu_s1, freeu_s2)[0]
+
+        p.sd_model.forge_objects.unet = unet
+
+        # The code below adds these values to the info text shown under the output images in the UI.
+        # extra_generation_params does not influence the generated results.
+        p.extra_generation_params.update(dict(
+            freeu_enabled=freeu_enabled,
+            freeu_b1=freeu_b1,
+            freeu_b2=freeu_b2,
+            freeu_s1=freeu_s1,
+            freeu_s2=freeu_s2,
+        ))
+
+        return
--- a/extensions-builtin/sd_forge_hypertile/scripts/forge_hypertile.py
+++ b/extensions-builtin/sd_forge_hypertile/scripts/forge_hypertile.py
@@ -0,0 +1,48 @@
+import gradio as gr
+
+from modules import scripts
+from ldm_patched.contrib.external_hypertile import HyperTile
+
+
+opHyperTile = HyperTile()
+
+
+class HyperTileForForge(scripts.Script):
+    def title(self):
+        return "HyperTile Integrated"
+
+    def show(self, is_img2img):
+        return scripts.AlwaysVisible
+
+    def ui(self, *args, **kwargs):
+        with gr.Accordion(open=False, label=self.title()):
+            enabled = gr.Checkbox(label='Enabled', value=False)
+            tile_size = gr.Slider(label='Tile Size', minimum=1, maximum=2048, step=1, value=256)
+            swap_size = gr.Slider(label='Swap Size', minimum=1, maximum=128, step=1, value=2)
+            max_depth = gr.Slider(label='Max Depth', minimum=0, maximum=10, step=1, value=0)
+            scale_depth = gr.Checkbox(label='Scale Depth', value=False)
+
+        return enabled, tile_size, swap_size, max_depth, scale_depth
+
+    def process_batch(self, p, *script_args, **kwargs):
+        enabled, tile_size, swap_size, max_depth, scale_depth = script_args
+        tile_size, swap_size, max_depth = int(tile_size), int(swap_size), int(max_depth)
+
+        if not enabled:
+            return
+
+        unet = p.sd_model.forge_objects.unet
+
+        unet = opHyperTile.patch(unet, tile_size, swap_size, max_depth, scale_depth)[0]
+
+        p.sd_model.forge_objects.unet = unet
+
+        p.extra_generation_params.update(dict(
+            HyperTile_enabled=enabled,
+            HyperTile_tile_size=tile_size,
+            HyperTile_swap_size=swap_size,
+            HyperTile_max_depth=max_depth,
+            HyperTile_scale_depth=scale_depth,
+        ))
+
+        return
--- a/extensions-builtin/sd_forge_kohya_hrfix/scripts/kohya_hrfix.py
+++ b/extensions-builtin/sd_forge_kohya_hrfix/scripts/kohya_hrfix.py
@@ -0,0 +1,55 @@
+import gradio as gr
+
+from modules import scripts
+from ldm_patched.contrib.external_model_downscale import PatchModelAddDownscale
+
+
+opPatchModelAddDownscale = PatchModelAddDownscale()
+
+
+class KohyaHRFixForForge(scripts.Script):
+    def title(self):
+        return "Kohya HRFix Integrated"
+
+    def show(self, is_img2img):
+        return scripts.AlwaysVisible
+
+    def ui(self, *args, **kwargs):
+        upscale_methods = ["bicubic", "nearest-exact", "bilinear", "area", "bislerp"]
+        with gr.Accordion(open=False, label=self.title()):
+            enabled = gr.Checkbox(label='Enabled', value=False)
+            block_number = gr.Slider(label='Block Number', value=3, minimum=1, maximum=32, step=1)
+            downscale_factor = gr.Slider(label='Downscale Factor', value=2.0, minimum=0.1, maximum=9.0, step=0.001)
+            start_percent = gr.Slider(label='Start Percent', value=0.0, minimum=0.0, maximum=1.0, step=0.001)
+            end_percent = gr.Slider(label='End Percent', value=0.35, minimum=0.0, maximum=1.0, step=0.001)
+            downscale_after_skip = gr.Checkbox(label='Downscale After Skip', value=True)
+            downscale_method = gr.Radio(label='Downscale Method', choices=upscale_methods, value=upscale_methods[0])
+            upscale_method = gr.Radio(label='Upscale Method', choices=upscale_methods, value=upscale_methods[0])
+
+        return enabled, block_number, downscale_factor, start_percent, end_percent, downscale_after_skip, downscale_method, upscale_method
+
+    def process_batch(self, p, *script_args, **kwargs):
+        enabled, block_number, downscale_factor, start_percent, end_percent, downscale_after_skip, downscale_method, upscale_method = script_args
+        block_number = int(block_number)
+
+        if not enabled:
+            return
+
+        unet = p.sd_model.forge_objects.unet
+
+        unet = opPatchModelAddDownscale.patch(unet, block_number, downscale_factor, start_percent, end_percent, downscale_after_skip, downscale_method, upscale_method)[0]
+
+        p.sd_model.forge_objects.unet = unet
+
+        p.extra_generation_params.update(dict(
+            kohya_hrfix_enabled=enabled,
+            kohya_hrfix_block_number=block_number,
+            kohya_hrfix_downscale_factor=downscale_factor,
+            kohya_hrfix_start_percent=start_percent,
+            kohya_hrfix_end_percent=end_percent,
+            kohya_hrfix_downscale_after_skip=downscale_after_skip,
+            kohya_hrfix_downscale_method=downscale_method,
+            kohya_hrfix_upscale_method=upscale_method,
+        ))
+
+        return
--- a/extensions-builtin/sd_forge_sag/scripts/forge_sag.py
+++ b/extensions-builtin/sd_forge_sag/scripts/forge_sag.py
@@ -0,0 +1,43 @@
+import gradio as gr
+
+from modules import scripts
+from ldm_patched.contrib.external_sag import SelfAttentionGuidance
+
+
+opSelfAttentionGuidance = SelfAttentionGuidance()
+
+
+class SAGForForge(scripts.Script):
+    def title(self):
+        return "SelfAttentionGuidance Integrated"
+
+    def show(self, is_img2img):
+        return scripts.AlwaysVisible
+
+    def ui(self, *args, **kwargs):
+        with gr.Accordion(open=False, label=self.title()):
+            enabled = gr.Checkbox(label='Enabled', value=False)
+            scale = gr.Slider(label='Scale', minimum=-2.0, maximum=5.0, step=0.01, value=0.5)
+            blur_sigma = gr.Slider(label='Blur Sigma', minimum=0.0, maximum=10.0, step=0.01, value=2.0)
+
+        return enabled, scale, blur_sigma
+
+    def process_batch(self, p, *script_args, **kwargs):
+        enabled, scale, blur_sigma = script_args
+
+        if not enabled:
+            return
+
+        unet = p.sd_model.forge_objects.unet
+
+        unet = opSelfAttentionGuidance.patch(unet, scale, blur_sigma)[0]
+
+        p.sd_model.forge_objects.unet = unet
+
+        p.extra_generation_params.update(dict(
+            sag_enabled=enabled,
+            sag_scale=scale,
+            sag_blur_sigma=blur_sigma
+        ))
+
+        return
--- a/extensions-builtin/sd_forge_svd/scripts/forge_svd.py
+++ b/extensions-builtin/sd_forge_svd/scripts/forge_svd.py
@@ -0,0 +1,113 @@
+import torch
+import gradio as gr
+import os
+import pathlib
+
+from modules import script_callbacks
+from modules.paths import models_path
+from modules.ui_common import ToolButton, refresh_symbol
+from modules import shared
+
+from modules_forge.forge_util import numpy_to_pytorch, pytorch_to_numpy, write_images_to_mp4
+from ldm_patched.modules.sd import load_checkpoint_guess_config
+from ldm_patched.contrib.external_video_model import VideoLinearCFGGuidance, SVD_img2vid_Conditioning
+from ldm_patched.contrib.external import KSampler, VAEDecode
+
+
+opVideoLinearCFGGuidance = VideoLinearCFGGuidance()
+opSVD_img2vid_Conditioning = SVD_img2vid_Conditioning()
+opKSampler = KSampler()
+opVAEDecode = VAEDecode()
+
+svd_root = os.path.join(models_path, 'svd')
+os.makedirs(svd_root, exist_ok=True)
+svd_filenames = []
+
+
+def update_svd_filenames():
+    global svd_filenames
+    svd_filenames = [
+        pathlib.Path(x).name for x in
+        shared.walk_files(svd_root, allowed_extensions=[".pt", ".ckpt", ".safetensors"])
+    ]
+    return svd_filenames
+
+
+@torch.inference_mode()
+@torch.no_grad()
+def predict(filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
+            sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
+            sampling_denoise, guidance_min_cfg, input_image):
+    filename = os.path.join(svd_root, filename)
+    model_raw, _, vae, clip_vision = \
+        load_checkpoint_guess_config(filename, output_vae=True, output_clip=False, output_clipvision=True)
+    model = opVideoLinearCFGGuidance.patch(model_raw, guidance_min_cfg)[0]
+    init_image = numpy_to_pytorch(input_image)
+    positive, negative, latent_image = opSVD_img2vid_Conditioning.encode(
+        clip_vision, init_image, vae, width, height, video_frames, motion_bucket_id, fps, augmentation_level)
+    output_latent = opKSampler.sample(model, sampling_seed, sampling_steps, sampling_cfg,
+                                      sampling_sampler_name, sampling_scheduler, positive,
+                                      negative, latent_image, sampling_denoise)[0]
+    output_pixels = opVAEDecode.decode(vae, output_latent)[0]
+    outputs = pytorch_to_numpy(output_pixels)
+
+    video_filename = write_images_to_mp4(outputs, fps=fps)
+
+    return outputs, video_filename
+
+
+def on_ui_tabs():
+    with gr.Blocks() as svd_block:
+        with gr.Row():
+            with gr.Column():
+                input_image = gr.Image(label='Input Image', source='upload', type='numpy', height=400)
+
+                with gr.Row():
+                    filename = gr.Dropdown(label="SVD Checkpoint Filename",
+                                           choices=svd_filenames,
+                                           value=svd_filenames[0] if len(svd_filenames) > 0 else None)
+                    refresh_button = ToolButton(value=refresh_symbol, tooltip="Refresh")
+                    refresh_button.click(
+                        fn=lambda: gr.update(choices=update_svd_filenames()),
+                        inputs=[], outputs=filename)
+
+                width = gr.Slider(label='Width', minimum=16, maximum=8192, step=8, value=1024)
+                height = gr.Slider(label='Height', minimum=16, maximum=8192, step=8, value=576)
+                video_frames = gr.Slider(label='Video Frames', minimum=1, maximum=4096, step=1, value=14)
+                motion_bucket_id = gr.Slider(label='Motion Bucket Id', minimum=1, maximum=1023, step=1, value=127)
+                fps = gr.Slider(label='Fps', minimum=1, maximum=1024, step=1, value=6)
+                augmentation_level = gr.Slider(label='Augmentation Level', minimum=0.0, maximum=10.0, step=0.01,
+                                               value=0.0)
+                sampling_steps = gr.Slider(label='Sampling Steps', minimum=1, maximum=200, step=1, value=20)
+                sampling_cfg = gr.Slider(label='CFG Scale', minimum=0.0, maximum=50.0, step=0.1, value=2.5)
+                sampling_denoise = gr.Slider(label='Sampling Denoise', minimum=0.0, maximum=1.0, step=0.01, value=1.0)
+                guidance_min_cfg = gr.Slider(label='Guidance Min Cfg', minimum=0.0, maximum=100.0, step=0.5, value=1.0)
+                sampling_sampler_name = gr.Radio(label='Sampler Name',
+                                                 choices=['euler', 'euler_ancestral', 'heun', 'heunpp2', 'dpm_2',
+                                                          'dpm_2_ancestral', 'lms', 'dpm_fast', 'dpm_adaptive',
+                                                          'dpmpp_2s_ancestral', 'dpmpp_sde', 'dpmpp_sde_gpu',
+                                                          'dpmpp_2m', 'dpmpp_2m_sde', 'dpmpp_2m_sde_gpu',
+                                                          'dpmpp_3m_sde', 'dpmpp_3m_sde_gpu', 'ddpm', 'lcm', 'ddim',
+                                                          'uni_pc', 'uni_pc_bh2'], value='euler')
+                sampling_scheduler = gr.Radio(label='Scheduler',
+                                              choices=['normal', 'karras', 'exponential', 'sgm_uniform', 'simple',
+                                                       'ddim_uniform'], value='karras')
+                sampling_seed = gr.Number(label='Seed', value=12345, precision=0)
+
+                generate_button = gr.Button(value="Generate")
+
+                ctrls = [filename, width, height, video_frames, motion_bucket_id, fps, augmentation_level,
+                         sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler,
+                         sampling_denoise, guidance_min_cfg, input_image]
+
+            with gr.Column():
+                output_video = gr.Video(autoplay=True)
+                output_gallery = gr.Gallery(label='Gallery', show_label=False, object_fit='contain',
+                                            visible=True, height=1024, columns=4)
+
+        generate_button.click(predict, inputs=ctrls, outputs=[output_gallery, output_video])
+    return [(svd_block, "SVD", "svd")]
+
+
+update_svd_filenames()
+script_callbacks.on_ui_tabs(on_ui_tabs)
--- a/extensions-builtin/sd_forge_z123/scripts/forge_z123.py
+++ b/extensions-builtin/sd_forge_z123/scripts/forge_z123.py
@@ -0,0 +1,100 @@
+import torch
+import gradio as gr
+import os
+import pathlib
+
+from modules import script_callbacks
+from modules.paths import models_path
+from modules.ui_common import ToolButton, refresh_symbol
+from modules import shared
+
+from modules_forge.forge_util import numpy_to_pytorch, pytorch_to_numpy
+from ldm_patched.modules.sd import load_checkpoint_guess_config
+from ldm_patched.contrib.external_stable3d import StableZero123_Conditioning
+from ldm_patched.contrib.external import KSampler, VAEDecode
+
+
+opStableZero123_Conditioning = StableZero123_Conditioning()
+opKSampler = KSampler()
+opVAEDecode = VAEDecode()
+
+model_root = os.path.join(models_path, 'z123')
+os.makedirs(model_root, exist_ok=True)
+model_filenames = []
+
+
+def update_model_filenames():
+    global model_filenames
+    model_filenames = [
+        pathlib.Path(x).name for x in
+        shared.walk_files(model_root, allowed_extensions=[".pt", ".ckpt", ".safetensors"])
+    ]
+    return model_filenames
+
+
+@torch.inference_mode()
+@torch.no_grad()
+def predict(filename, width, height, batch_size, elevation, azimuth,
+            sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler, sampling_denoise, input_image):
+    filename = os.path.join(model_root, filename)
+    model, _, vae, clip_vision = \
+        load_checkpoint_guess_config(filename, output_vae=True, output_clip=False, output_clipvision=True)
+    init_image = numpy_to_pytorch(input_image)
+    positive, negative, latent_image = opStableZero123_Conditioning.encode(
+        clip_vision, init_image, vae, width, height, batch_size, elevation, azimuth)
+    output_latent = opKSampler.sample(model, sampling_seed, sampling_steps, sampling_cfg,
+                                      sampling_sampler_name, sampling_scheduler, positive,
+                                      negative, latent_image, sampling_denoise)[0]
+    output_pixels = opVAEDecode.decode(vae, output_latent)[0]
+    outputs = pytorch_to_numpy(output_pixels)
+    return outputs
+
+
+def on_ui_tabs():
+    with gr.Blocks() as model_block:
+        with gr.Row():
+            with gr.Column():
+                input_image = gr.Image(label='Input Image', source='upload', type='numpy', height=400)
+
+                with gr.Row():
+                    filename = gr.Dropdown(label="Zero123 Checkpoint Filename",
+                                           choices=model_filenames,
+                                           value=model_filenames[0] if len(model_filenames) > 0 else None)
+                    refresh_button = ToolButton(value=refresh_symbol, tooltip="Refresh")
+                    refresh_button.click(
+                        fn=lambda: gr.update(choices=update_model_filenames()),
+                        inputs=[], outputs=filename)
+
+                width = gr.Slider(label='Width', minimum=16, maximum=8192, step=8, value=256)
+                height = gr.Slider(label='Height', minimum=16, maximum=8192, step=8, value=256)
+                batch_size = gr.Slider(label='Batch Size', minimum=1, maximum=4096, step=1, value=4)
+                elevation = gr.Slider(label='Elevation', minimum=-180.0, maximum=180.0, step=0.001, value=10.0)
+                azimuth = gr.Slider(label='Azimuth', minimum=-180.0, maximum=180.0, step=0.001, value=142.0)
+                sampling_denoise = gr.Slider(label='Sampling Denoise', minimum=0.0, maximum=1.0, step=0.01, value=1.0)
+                sampling_steps = gr.Slider(label='Sampling Steps', minimum=1, maximum=10000, step=1, value=20)
+                sampling_cfg = gr.Slider(label='CFG Scale', minimum=0.0, maximum=100.0, step=0.1, value=5.0)
+                sampling_sampler_name = gr.Radio(label='Sampler Name',
+                                                 choices=['euler', 'euler_ancestral', 'heun', 'heunpp2', 'dpm_2',
+                                                          'dpm_2_ancestral', 'lms', 'dpm_fast', 'dpm_adaptive',
+                                                          'dpmpp_2s_ancestral', 'dpmpp_sde', 'dpmpp_sde_gpu',
+                                                          'dpmpp_2m', 'dpmpp_2m_sde', 'dpmpp_2m_sde_gpu',
+                                                          'dpmpp_3m_sde', 'dpmpp_3m_sde_gpu', 'ddpm', 'lcm', 'ddim',
+                                                          'uni_pc', 'uni_pc_bh2'], value='euler')
+                sampling_scheduler = gr.Radio(label='Sampling Scheduler',
+                                              choices=['normal', 'karras', 'exponential', 'sgm_uniform', 'simple',
+                                                       'ddim_uniform'], value='sgm_uniform')
+                sampling_seed = gr.Number(label='Seed', value=12345, precision=0)
+                generate_button = gr.Button(value="Generate")
+
+                ctrls = [filename, width, height, batch_size, elevation, azimuth, sampling_seed, sampling_steps, sampling_cfg, sampling_sampler_name, sampling_scheduler, sampling_denoise, input_image]
+
+            with gr.Column():
+                output_gallery = gr.Gallery(label='Gallery', show_label=False, object_fit='contain',
+                                            visible=True, height=1024, columns=4)
+
+        generate_button.click(predict, inputs=ctrls, outputs=[output_gallery])
+    return [(model_block, "Z123", "z123")]
+
+
+update_model_filenames()
+script_callbacks.on_ui_tabs(on_ui_tabs)
--- a/models/svd/put
+++ b/models/svd/put
--- a/models/z123/put
+++ b/models/z123/put
--- a/modules/cmd_args.py
+++ b/modules/cmd_args.py
@@ -2,8 +2,9 @@ import argparse
 import json
 import os
 from modules.paths_internal import models_path, script_path, data_path, extensions_dir, extensions_builtin_dir, sd_default_config, sd_model_file  # noqa: F401
+from ldm_patched.modules import args_parser

-parser = argparse.ArgumentParser()
+parser = args_parser.parser

 parser.add_argument("-f", action='store_true', help=argparse.SUPPRESS)  # allows running as root; implemented outside of webui
 parser.add_argument("--update-all-extensions", action='store_true', help="launch.py argument: download updates for all extensions when starting the program")
--- a/modules/dat_model.py
+++ b/modules/dat_model.py
@@ -2,7 +2,7 @@ import os

 from modules import modelloader, errors
 from modules.shared import cmd_opts, opts
-from modules.upscaler import Upscaler, UpscalerData
+from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
 from modules.upscaler_utils import upscale_with_model


@@ -23,6 +23,7 @@ class UpscalerDAT(Upscaler):
                self.scalers.append(model)

    def do_upscale(self, img, path):
+        prepare_free_memory()
        try:
            info = self.load_model(path)
        except Exception:
--- a/modules/deepbooru.py
+++ b/modules/deepbooru.py
@@ -4,7 +4,10 @@ import re
 import torch
 import numpy as np

-from modules import modelloader, paths, deepbooru_model, devices, images, shared
+from modules import modelloader, paths, deepbooru_model, images, shared
+from ldm_patched.modules import model_management
+from ldm_patched.modules.model_patcher import ModelPatcher
+

 re_special = re.compile(r'([\\()])')

@@ -12,6 +15,14 @@ re_special = re.compile(r'([\\()])')
 class DeepDanbooru:
    def __init__(self):
        self.model = None
+        self.load_device = model_management.text_encoder_device()
+        self.offload_device = model_management.text_encoder_offload_device()
+        self.dtype = torch.float32
+
+        if model_management.should_use_fp16(device=self.load_device):
+            self.dtype = torch.float16
+
+        self.patcher = None

    def load(self):
        if self.model is not None:
@@ -28,16 +39,16 @@ class DeepDanbooru:
        self.model.load_state_dict(torch.load(files[0], map_location="cpu"))

        self.model.eval()
-        self.model.to(devices.cpu, devices.dtype)
+        self.model.to(self.offload_device, self.dtype)
+
+        self.patcher = ModelPatcher(self.model, load_device=self.load_device, offload_device=self.offload_device)

    def start(self):
        self.load()
-        self.model.to(devices.device)
+        model_management.load_models_gpu([self.patcher])

    def stop(self):
-        if not shared.opts.interrogate_keep_models_in_memory:
-            self.model.to(devices.cpu)
-            devices.torch_gc()
+        pass

    def tag(self, pil_image):
        self.start()
@@ -56,8 +67,8 @@ class DeepDanbooru:
        pic = images.resize_image(2, pil_image.convert("RGB"), 512, 512)
        a = np.expand_dims(np.array(pic, dtype=np.float32), 0) / 255

-        with torch.no_grad(), devices.autocast():
-            x = torch.from_numpy(a).to(devices.device)
+        with torch.no_grad():
+            x = torch.from_numpy(a).to(self.load_device, self.dtype)
            y = self.model(x)[0].detach().cpu().numpy()

        probability_dict = {}
--- a/modules/devices.py
+++ b/modules/devices.py
@@ -1,211 +1,89 @@
-import sys
 import contextlib
-from functools import lru_cache
-
 import torch
-from modules import errors, shared
-from modules import torch_utils
-
-if sys.platform == "darwin":
-    from modules import mac_specific
-
-if shared.cmd_opts.use_ipex:
-    from modules import xpu_specific
+import ldm_patched.modules.model_management as model_management


 def has_xpu() -> bool:
-    return shared.cmd_opts.use_ipex and xpu_specific.has_xpu
+    return model_management.xpu_available


 def has_mps() -> bool:
-    if sys.platform != "darwin":
-        return False
-    else:
-        return mac_specific.has_mps
+    return model_management.mps_mode()


 def cuda_no_autocast(device_id=None) -> bool:
-    if device_id is None:
-        device_id = get_cuda_device_id()
-    return (
-        torch.cuda.get_device_capability(device_id) == (7, 5)
-        and torch.cuda.get_device_name(device_id).startswith("NVIDIA GeForce GTX 16")
-    )
+    return False


 def get_cuda_device_id():
-    return (
-        int(shared.cmd_opts.device_id)
-        if shared.cmd_opts.device_id is not None and shared.cmd_opts.device_id.isdigit()
-        else 0
-    ) or torch.cuda.current_device()
+    return model_management.get_torch_device().index


 def get_cuda_device_string():
-    if shared.cmd_opts.device_id is not None:
-        return f"cuda:{shared.cmd_opts.device_id}"
-
-    return "cuda"
+    return str(model_management.get_torch_device())


 def get_optimal_device_name():
-    if torch.cuda.is_available():
-        return get_cuda_device_string()
-
-    if has_mps():
-        return "mps"
-
-    if has_xpu():
-        return xpu_specific.get_xpu_device_string()
-
-    return "cpu"
+    return model_management.get_torch_device().type


 def get_optimal_device():
-    return torch.device(get_optimal_device_name())
+    return model_management.get_torch_device()


 def get_device_for(task):
-    if task in shared.cmd_opts.use_cpu or "all" in shared.cmd_opts.use_cpu:
-        return cpu
-
    return get_optimal_device()


 def torch_gc():
-
-    if torch.cuda.is_available():
-        with torch.cuda.device(get_cuda_device_string()):
-            torch.cuda.empty_cache()
-            torch.cuda.ipc_collect()
-
-    if has_mps():
-        mac_specific.torch_mps_gc()
-
-    if has_xpu():
-        xpu_specific.torch_xpu_gc()
+    model_management.soft_empty_cache()


 def enable_tf32():
-    if torch.cuda.is_available():
+    return

-        # enabling benchmark option seems to enable a range of cards to do fp16 when they otherwise can't
-        # see https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4407
-        if cuda_no_autocast():
-            torch.backends.cudnn.benchmark = True
-
-        torch.backends.cuda.matmul.allow_tf32 = True
-        torch.backends.cudnn.allow_tf32 = True
-
-
-errors.run(enable_tf32, "Enabling TF32")

 cpu: torch.device = torch.device("cpu")
 fp8: bool = False
-device: torch.device = None
-device_interrogate: torch.device = None
-device_gfpgan: torch.device = None
-device_esrgan: torch.device = None
-device_codeformer: torch.device = None
-dtype: torch.dtype = torch.float16
-dtype_vae: torch.dtype = torch.float16
-dtype_unet: torch.dtype = torch.float16
-dtype_inference: torch.dtype = torch.float16
+device: torch.device = model_management.get_torch_device()
+device_interrogate: torch.device = cpu  # not used
+device_gfpgan: torch.device = cpu
+device_esrgan: torch.device = model_management.get_torch_device()  # will be managed in special way
+device_codeformer: torch.device = cpu
+dtype: torch.dtype = model_management.unet_dtype()
+dtype_vae: torch.dtype = model_management.vae_dtype()
+dtype_unet: torch.dtype = model_management.unet_dtype()
+dtype_inference: torch.dtype = model_management.unet_dtype()
 unet_needs_upcast = False


 def cond_cast_unet(input):
-    return input.to(dtype_unet) if unet_needs_upcast else input
+    return input


 def cond_cast_float(input):
-    return input.float() if unet_needs_upcast else input
+    return input


 nv_rng = None
-patch_module_list = [
-    torch.nn.Linear,
-    torch.nn.Conv2d,
-    torch.nn.MultiheadAttention,
-    torch.nn.GroupNorm,
-    torch.nn.LayerNorm,
-]
+patch_module_list = []


 def manual_cast_forward(target_dtype):
-    def forward_wrapper(self, *args, **kwargs):
-        if any(
-            isinstance(arg, torch.Tensor) and arg.dtype != target_dtype
-            for arg in args
-        ):
-            args = [arg.to(target_dtype) if isinstance(arg, torch.Tensor) else arg for arg in args]
-            kwargs = {k: v.to(target_dtype) if isinstance(v, torch.Tensor) else v for k, v in kwargs.items()}
-
-        org_dtype = torch_utils.get_param(self).dtype
-        if org_dtype != target_dtype:
-            self.to(target_dtype)
-        result = self.org_forward(*args, **kwargs)
-        if org_dtype != target_dtype:
-            self.to(org_dtype)
-
-        if target_dtype != dtype_inference:
-            if isinstance(result, tuple):
-                result = tuple(
-                    i.to(dtype_inference)
-                    if isinstance(i, torch.Tensor)
-                    else i
-                    for i in result
-                )
-            elif isinstance(result, torch.Tensor):
-                result = result.to(dtype_inference)
-        return result
-    return forward_wrapper
+    return


@contextlib.contextmanager
 def manual_cast(target_dtype):
-    applied = False
-    for module_type in patch_module_list:
-        if hasattr(module_type, "org_forward"):
-            continue
-        applied = True
-        org_forward = module_type.forward
-        if module_type == torch.nn.MultiheadAttention and has_xpu():
-            module_type.forward = manual_cast_forward(torch.float32)
-        else:
-            module_type.forward = manual_cast_forward(target_dtype)
-        module_type.org_forward = org_forward
-    try:
-        yield None
-    finally:
-        if applied:
-            for module_type in patch_module_list:
-                if hasattr(module_type, "org_forward"):
-                    module_type.forward = module_type.org_forward
-                    delattr(module_type, "org_forward")
+    yield  # no-op; a @contextlib.contextmanager function must be a generator that yields once


 def autocast(disable=False):
-    if disable:
-        return contextlib.nullcontext()
-
-    if fp8 and device==cpu:
-        return torch.autocast("cpu", dtype=torch.bfloat16, enabled=True)
-
-    if fp8 and dtype_inference == torch.float32:
-        return manual_cast(dtype)
-
-    if dtype == torch.float32 or dtype_inference == torch.float32:
-        return contextlib.nullcontext()
-
-    if has_xpu() or has_mps() or cuda_no_autocast():
-        return manual_cast(dtype)
-
-    return torch.autocast("cuda")
+    return contextlib.nullcontext()


 def without_autocast(disable=False):
-    return torch.autocast("cuda", enabled=False) if torch.is_autocast_enabled() and not disable else contextlib.nullcontext()
+    return contextlib.nullcontext()


 class NansException(Exception):
@@ -213,43 +91,9 @@ class NansException(Exception):


 def test_for_nans(x, where):
-    if shared.cmd_opts.disable_nan_check:
-        return
-
-    if not torch.all(torch.isnan(x)).item():
-        return
-
-    if where == "unet":
-        message = "A tensor with all NaNs was produced in Unet."
-
-        if not shared.cmd_opts.no_half:
-            message += " This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the \"Upcast cross attention layer to float32\" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this."
-
-    elif where == "vae":
-        message = "A tensor with all NaNs was produced in VAE."
-
-        if not shared.cmd_opts.no_half and not shared.cmd_opts.no_half_vae:
-            message += " This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this."
-    else:
-        message = "A tensor with all NaNs was produced."
-
-    message += " Use --disable-nan-check commandline argument to disable this check."
-
-    raise NansException(message)
+    return


-@lru_cache
 def first_time_calculation():
-    """
-    just do any calculation with pytorch layers - the first time this is done it allocaltes about 700MB of memory and
-    spends about 2.7 seconds doing that, at least wih NVidia.
-    """
-
-    x = torch.zeros((1, 1)).to(device, dtype)
-    linear = torch.nn.Linear(1, 1).to(device, dtype)
-    linear(x)
-
-    x = torch.zeros((1, 1, 3, 3)).to(device, dtype)
-    conv2d = torch.nn.Conv2d(1, 1, (3, 3)).to(device, dtype)
-    conv2d(x)
+    return

--- a/modules/esrgan_model.py
+++ b/modules/esrgan_model.py
@@ -1,6 +1,6 @@
 from modules import modelloader, devices, errors
 from modules.shared import opts
-from modules.upscaler import Upscaler, UpscalerData
+from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
 from modules.upscaler_utils import upscale_with_model


@@ -27,6 +27,7 @@ class UpscalerESRGAN(Upscaler):
            self.scalers.append(scaler_data)

    def do_upscale(self, img, selected_model):
+        prepare_free_memory()
        try:
            model = self.load_model(selected_model)
        except Exception:
--- a/modules/hat_model.py
+++ b/modules/hat_model.py
@@ -3,7 +3,7 @@ import sys

 from modules import modelloader, devices
 from modules.shared import opts
-from modules.upscaler import Upscaler, UpscalerData
+from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
 from modules.upscaler_utils import upscale_with_model


@@ -20,6 +20,7 @@ class UpscalerHAT(Upscaler):
            self.scalers.append(scaler_data)

    def do_upscale(self, img, selected_model):
+        prepare_free_memory()
        try:
            model = self.load_model(selected_model)
        except Exception as e:
--- a/modules/initialize.py
+++ b/modules/initialize.py
@@ -12,6 +12,10 @@ def imports():
    logging.getLogger("torch.distributed.nn").setLevel(logging.ERROR)  # sshh...
    logging.getLogger("xformers").addFilter(lambda record: 'A matching Triton is not available' not in record.getMessage())

+    from modules_forge.initialization import initialize_forge
+    initialize_forge()
+    startup_timer.record("initialize forge")
+
    import torch  # noqa: F401
    startup_timer.record("import torch")
    import pytorch_lightning  # noqa: F401
--- a/modules/interrogate.py
+++ b/modules/interrogate.py
@@ -10,7 +10,10 @@ import torch.hub
 from torchvision import transforms
 from torchvision.transforms.functional import InterpolationMode

-from modules import devices, paths, shared, lowvram, modelloader, errors, torch_utils
+from modules import devices, paths, shared, modelloader, errors
+from ldm_patched.modules import model_management
+from ldm_patched.modules.model_patcher import ModelPatcher
+

 blip_image_eval_size = 384
 clip_model_name = 'ViT-L/14'
@@ -53,7 +56,16 @@ class InterrogateModels:
        self.loaded_categories = None
        self.skip_categories = []
        self.content_dir = content_dir
-        self.running_on_cpu = devices.device_interrogate == torch.device("cpu")
+
+        self.load_device = model_management.text_encoder_device()
+        self.offload_device = model_management.text_encoder_offload_device()
+        self.dtype = torch.float32
+
+        if model_management.should_use_fp16(device=self.load_device):
+            self.dtype = torch.float16
+
+        self.blip_patcher = None
+        self.clip_patcher = None

    def categories(self):
        if not os.path.exists(self.content_dir):
@@ -105,49 +117,37 @@ class InterrogateModels:

    def load_clip_model(self):
        import clip
+        import clip.model

-        if self.running_on_cpu:
-            model, preprocess = clip.load(clip_model_name, device="cpu", download_root=shared.cmd_opts.clip_models_path)
-        else:
-            model, preprocess = clip.load(clip_model_name, download_root=shared.cmd_opts.clip_models_path)
+        clip.model.LayerNorm = torch.nn.LayerNorm

+        model, preprocess = clip.load(clip_model_name, device="cpu", download_root=shared.cmd_opts.clip_models_path)
        model.eval()
-        model = model.to(devices.device_interrogate)

        return model, preprocess

    def load(self):
        if self.blip_model is None:
            self.blip_model = self.load_blip_model()
-            if not shared.cmd_opts.no_half and not self.running_on_cpu:
-                self.blip_model = self.blip_model.half()
-
-        self.blip_model = self.blip_model.to(devices.device_interrogate)
+            self.blip_model = self.blip_model.to(device=self.offload_device, dtype=self.dtype)
+            self.blip_patcher = ModelPatcher(self.blip_model, load_device=self.load_device, offload_device=self.offload_device)

        if self.clip_model is None:
            self.clip_model, self.clip_preprocess = self.load_clip_model()
-            if not shared.cmd_opts.no_half and not self.running_on_cpu:
-                self.clip_model = self.clip_model.half()
+            self.clip_model = self.clip_model.to(device=self.offload_device, dtype=self.dtype)
+            self.clip_patcher = ModelPatcher(self.clip_model, load_device=self.load_device, offload_device=self.offload_device)

-        self.clip_model = self.clip_model.to(devices.device_interrogate)
-
-        self.dtype = torch_utils.get_param(self.clip_model).dtype
+        model_management.load_models_gpu([self.blip_patcher, self.clip_patcher])
+        return

    def send_clip_to_ram(self):
-        if not shared.opts.interrogate_keep_models_in_memory:
-            if self.clip_model is not None:
-                self.clip_model = self.clip_model.to(devices.cpu)
+        pass

    def send_blip_to_ram(self):
-        if not shared.opts.interrogate_keep_models_in_memory:
-            if self.blip_model is not None:
-                self.blip_model = self.blip_model.to(devices.cpu)
+        pass

    def unload(self):
-        self.send_clip_to_ram()
-        self.send_blip_to_ram()
-
-        devices.torch_gc()
+        pass

    def rank(self, image_features, text_array, top_count=1):
        import clip
@@ -158,11 +158,11 @@ class InterrogateModels:
            text_array = text_array[0:int(shared.opts.interrogate_clip_dict_limit)]

        top_count = min(top_count, len(text_array))
-        text_tokens = clip.tokenize(list(text_array), truncate=True).to(devices.device_interrogate)
+        text_tokens = clip.tokenize(list(text_array), truncate=True).to(self.load_device)
        text_features = self.clip_model.encode_text(text_tokens).type(self.dtype)
        text_features /= text_features.norm(dim=-1, keepdim=True)

-        similarity = torch.zeros((1, len(text_array))).to(devices.device_interrogate)
+        similarity = torch.zeros((1, len(text_array))).to(self.load_device)
        for i in range(image_features.shape[0]):
            similarity += (100.0 * image_features[i].unsqueeze(0) @ text_features.T).softmax(dim=-1)
        similarity /= image_features.shape[0]
@@ -175,7 +175,7 @@ class InterrogateModels:
            transforms.Resize((blip_image_eval_size, blip_image_eval_size), interpolation=InterpolationMode.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711))
-        ])(pil_image).unsqueeze(0).type(self.dtype).to(devices.device_interrogate)
+        ])(pil_image).unsqueeze(0).type(self.dtype).to(self.load_device)

        with torch.no_grad():
            caption = self.blip_model.generate(gpu_image, sample=False, num_beams=shared.opts.interrogate_clip_num_beams, min_length=shared.opts.interrogate_clip_min_length, max_length=shared.opts.interrogate_clip_max_length)
@@ -186,9 +186,6 @@ class InterrogateModels:
        res = ""
        shared.state.begin(job="interrogate")
        try:
-            lowvram.send_everything_to_cpu()
-            devices.torch_gc()
-
            self.load()

            caption = self.generate_caption(pil_image)
@@ -197,7 +194,7 @@ class InterrogateModels:

            res = caption

-            clip_image = self.clip_preprocess(pil_image).unsqueeze(0).type(self.dtype).to(devices.device_interrogate)
+            clip_image = self.clip_preprocess(pil_image).unsqueeze(0).type(self.dtype).to(self.load_device)

            with torch.no_grad(), devices.autocast():
                image_features = self.clip_model.encode_image(clip_image).type(self.dtype)
--- a/modules/launch_utils.py
+++ b/modules/launch_utils.py
@@ -15,6 +15,7 @@ from modules import cmd_args, errors
 from modules.paths_internal import script_path, extensions_dir
 from modules.timer import startup_timer
 from modules import logging_config
+from modules_forge import forge_version

 args, _ = cmd_args.parser.parse_known_args()
 logging_config.setup_logging(args.loglevel)
@@ -70,7 +71,7 @@ def commit_hash():


@lru_cache()
-def git_tag():
+def git_tag_a1111():
    try:
        return subprocess.check_output([git, "-C", script_path, "describe", "--tags"], shell=False, encoding='utf8').strip()
    except Exception:
@@ -85,6 +86,10 @@ def git_tag():
            return "<none>"


+def git_tag():
+    return 'f' + forge_version.version + '-' + git_tag_a1111()
+
+
 def run(command, desc=None, errdesc=None, custom_env=None, live: bool = default_command_live) -> str:
    if desc is not None:
        print(desc)
--- a/modules/lowvram.py
+++ b/modules/lowvram.py
@@ -6,142 +6,20 @@ cpu = torch.device("cpu")


 def send_everything_to_cpu():
-    global module_in_gpu
-
-    if module_in_gpu is not None:
-        module_in_gpu.to(cpu)
-
-    module_in_gpu = None
+    return


 def is_needed(sd_model):
-    return shared.cmd_opts.lowvram or shared.cmd_opts.medvram or shared.cmd_opts.medvram_sdxl and hasattr(sd_model, 'conditioner')
+    return False


 def apply(sd_model):
-    enable = is_needed(sd_model)
-    shared.parallel_processing_allowed = not enable
-
-    if enable:
-        setup_for_low_vram(sd_model, not shared.cmd_opts.lowvram)
-    else:
-        sd_model.lowvram = False
+    return


 def setup_for_low_vram(sd_model, use_medvram):
-    if getattr(sd_model, 'lowvram', False):
-        return
-
-    sd_model.lowvram = True
-
-    parents = {}
-
-    def send_me_to_gpu(module, _):
-        """send this module to GPU; send whatever tracked module was previous in GPU to CPU;
-        we add this as forward_pre_hook to a lot of modules and this way all but one of them will
-        be in CPU
-        """
-        global module_in_gpu
-
-        module = parents.get(module, module)
-
-        if module_in_gpu == module:
-            return
-
-        if module_in_gpu is not None:
-            module_in_gpu.to(cpu)
-
-        module.to(devices.device)
-        module_in_gpu = module
-
-    # see below for register_forward_pre_hook;
-    # first_stage_model does not use forward(), it uses encode/decode, so register_forward_pre_hook is
-    # useless here, and we just replace those methods
-
-    first_stage_model = sd_model.first_stage_model
-    first_stage_model_encode = sd_model.first_stage_model.encode
-    first_stage_model_decode = sd_model.first_stage_model.decode
-
-    def first_stage_model_encode_wrap(x):
-        send_me_to_gpu(first_stage_model, None)
-        return first_stage_model_encode(x)
-
-    def first_stage_model_decode_wrap(z):
-        send_me_to_gpu(first_stage_model, None)
-        return first_stage_model_decode(z)
-
-    to_remain_in_cpu = [
-        (sd_model, 'first_stage_model'),
-        (sd_model, 'depth_model'),
-        (sd_model, 'embedder'),
-        (sd_model, 'model'),
-        (sd_model, 'embedder'),
-    ]
-
-    is_sdxl = hasattr(sd_model, 'conditioner')
-    is_sd2 = not is_sdxl and hasattr(sd_model.cond_stage_model, 'model')
-
-    if is_sdxl:
-        to_remain_in_cpu.append((sd_model, 'conditioner'))
-    elif is_sd2:
-        to_remain_in_cpu.append((sd_model.cond_stage_model, 'model'))
-    else:
-        to_remain_in_cpu.append((sd_model.cond_stage_model, 'transformer'))
-
-    # remove several big modules: cond, first_stage, depth/embedder (if applicable), and unet from the model
-    stored = []
-    for obj, field in to_remain_in_cpu:
-        module = getattr(obj, field, None)
-        stored.append(module)
-        setattr(obj, field, None)
-
-    # send the model to GPU.
-    sd_model.to(devices.device)
-
-    # put modules back. the modules will be in CPU.
-    for (obj, field), module in zip(to_remain_in_cpu, stored):
-        setattr(obj, field, module)
-
-    # register hooks for those the first three models
-    if is_sdxl:
-        sd_model.conditioner.register_forward_pre_hook(send_me_to_gpu)
-    elif is_sd2:
-        sd_model.cond_stage_model.model.register_forward_pre_hook(send_me_to_gpu)
-        sd_model.cond_stage_model.model.token_embedding.register_forward_pre_hook(send_me_to_gpu)
-        parents[sd_model.cond_stage_model.model] = sd_model.cond_stage_model
-        parents[sd_model.cond_stage_model.model.token_embedding] = sd_model.cond_stage_model
-    else:
-        sd_model.cond_stage_model.transformer.register_forward_pre_hook(send_me_to_gpu)
-        parents[sd_model.cond_stage_model.transformer] = sd_model.cond_stage_model
-
-    sd_model.first_stage_model.register_forward_pre_hook(send_me_to_gpu)
-    sd_model.first_stage_model.encode = first_stage_model_encode_wrap
-    sd_model.first_stage_model.decode = first_stage_model_decode_wrap
-    if sd_model.depth_model:
-        sd_model.depth_model.register_forward_pre_hook(send_me_to_gpu)
-    if sd_model.embedder:
-        sd_model.embedder.register_forward_pre_hook(send_me_to_gpu)
-
-    if use_medvram:
-        sd_model.model.register_forward_pre_hook(send_me_to_gpu)
-    else:
-        diff_model = sd_model.model.diffusion_model
-
-        # the third remaining model is still too big for 4 GB, so we also do the same for its submodules
-        # so that only one of them is in GPU at a time
-        stored = diff_model.input_blocks, diff_model.middle_block, diff_model.output_blocks, diff_model.time_embed
-        diff_model.input_blocks, diff_model.middle_block, diff_model.output_blocks, diff_model.time_embed = None, None, None, None
-        sd_model.model.to(devices.device)
-        diff_model.input_blocks, diff_model.middle_block, diff_model.output_blocks, diff_model.time_embed = stored
-
-        # install hooks for bits of third model
-        diff_model.time_embed.register_forward_pre_hook(send_me_to_gpu)
-        for block in diff_model.input_blocks:
-            block.register_forward_pre_hook(send_me_to_gpu)
-        diff_model.middle_block.register_forward_pre_hook(send_me_to_gpu)
-        for block in diff_model.output_blocks:
-            block.register_forward_pre_hook(send_me_to_gpu)
+    return


 def is_enabled(sd_model):
-    return sd_model.lowvram
+    return False
--- a/modules/processing.py
+++ b/modules/processing.py
@@ -627,44 +627,7 @@ def decode_latent_batch(model, batch, target_device=None, check_for_nans=False):

    for i in range(batch.shape[0]):
        sample = decode_first_stage(model, batch[i:i + 1])[0]
-
-        if check_for_nans:
-
-            try:
-                devices.test_for_nans(sample, "vae")
-            except devices.NansException as e:
-                if shared.opts.auto_vae_precision_bfloat16:
-                    autofix_dtype = torch.bfloat16
-                    autofix_dtype_text = "bfloat16"
-                    autofix_dtype_setting = "Automatically convert VAE to bfloat16"
-                    autofix_dtype_comment = ""
-                elif shared.opts.auto_vae_precision:
-                    autofix_dtype = torch.float32
-                    autofix_dtype_text = "32-bit float"
-                    autofix_dtype_setting = "Automatically revert VAE to 32-bit floats"
-                    autofix_dtype_comment = "\nTo always start with 32-bit VAE, use --no-half-vae commandline flag."
-                else:
-                    raise e
-
-                if devices.dtype_vae == autofix_dtype:
-                    raise e
-
-                errors.print_error_explanation(
-                    "A tensor with all NaNs was produced in VAE.\n"
-                    f"Web UI will now convert VAE into {autofix_dtype_text} and retry.\n"
-                    f"To disable this behavior, disable the '{autofix_dtype_setting}' setting.{autofix_dtype_comment}"
-                )
-
-                devices.dtype_vae = autofix_dtype
-                model.first_stage_model.to(devices.dtype_vae)
-                batch = batch.to(devices.dtype_vae)
-
-                sample = decode_first_stage(model, batch[i:i + 1])[0]
-
-        if target_device is not None:
-            sample = sample.to(target_device)
-
-        samples.append(sample)
+        samples.append(sample.to(target_device))

    return samples

@@ -847,7 +810,7 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:

    infotexts = []
    output_images = []
-    with torch.no_grad(), p.sd_model.ema_scope():
+    with torch.no_grad():
        with devices.autocast():
            p.init(p.all_prompts, p.all_seeds, p.all_subseeds)

@@ -871,6 +834,7 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:

            sd_models.reload_model_weights()  # model can be changed for example by refiner

+            p.sd_model.forge_objects = p.sd_model.forge_objects_original.shallow_copy()
            p.prompts = p.all_prompts[n * p.batch_size:(n + 1) * p.batch_size]
            p.negative_prompts = p.all_negative_prompts[n * p.batch_size:(n + 1) * p.batch_size]
            p.seeds = p.all_seeds[n * p.batch_size:(n + 1) * p.batch_size]
@@ -887,8 +851,9 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
            p.parse_extra_network_prompts()

            if not p.disable_extra_networks:
-                with devices.autocast():
-                    extra_networks.activate(p, p.extra_network_data)
+                extra_networks.activate(p, p.extra_network_data)
+
+            p.sd_model.forge_objects = p.sd_model.forge_objects_after_applying_lora.shallow_copy()

            if p.scripts is not None:
                p.scripts.process_batch(p, batch_number=n, prompts=p.prompts, seeds=p.seeds, subseeds=p.subseeds)
@@ -940,8 +905,7 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
                    p.extra_generation_params['Noise Schedule'] = opts.sd_noise_schedule
                    p.sd_model.alphas_cumprod = rescale_zero_terminal_snr_abar(p.sd_model.alphas_cumprod).to(shared.device)

-            with devices.without_autocast() if devices.unet_needs_upcast else devices.autocast():
-                samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
+            samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)

            if p.scripts is not None:
                ps = scripts.PostSampleArgs(samples_ddim)
@@ -960,9 +924,6 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:

            del samples_ddim

-            if lowvram.is_enabled(shared.sd_model):
-                lowvram.send_everything_to_cpu()
-
            devices.torch_gc()

            state.nextjob()
@@ -1269,7 +1230,7 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
                image = np.array(self.firstpass_image).astype(np.float32) / 255.0
                image = np.moveaxis(image, 2, 0)
                image = torch.from_numpy(np.expand_dims(image, axis=0))
-                image = image.to(shared.device, dtype=devices.dtype_vae)
+                image = image.to(shared.device, dtype=torch.float32)

                if opts.sd_vae_encode_method != 'Full':
                    self.extra_generation_params['VAE Encoder'] = opts.sd_vae_encode_method
@@ -1353,7 +1314,7 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
                batch_images.append(image)

            decoded_samples = torch.from_numpy(np.array(batch_images))
-            decoded_samples = decoded_samples.to(shared.device, dtype=devices.dtype_vae)
+            decoded_samples = decoded_samples.to(shared.device, dtype=torch.float32)

            if opts.sd_vae_encode_method != 'Full':
                self.extra_generation_params['VAE Encoder'] = opts.sd_vae_encode_method
@@ -1458,7 +1419,7 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
            if shared.opts.hires_fix_use_firstpass_conds:
                self.calculate_hr_conds()

-            elif lowvram.is_enabled(shared.sd_model) and shared.sd_model.sd_checkpoint_info == sd_models.select_checkpoint():  # if in lowvram mode, we need to calculate conds right away, before the cond NN is unloaded
+            elif shared.sd_model.sd_checkpoint_info == sd_models.select_checkpoint():  # if in lowvram mode, we need to calculate conds right away, before the cond NN is unloaded
                with devices.autocast():
                    extra_networks.activate(self, self.hr_extra_network_data)

@@ -1645,7 +1606,7 @@ class StableDiffusionProcessingImg2Img(StableDiffusionProcessing):
            raise RuntimeError(f"bad number of images passed: {len(imgs)}; expecting {self.batch_size} or less")

        image = torch.from_numpy(batch_images)
-        image = image.to(shared.device, dtype=devices.dtype_vae)
+        image = image.to(shared.device, dtype=torch.float32)

        if opts.sd_vae_encode_method != 'Full':
            self.extra_generation_params['VAE Encoder'] = opts.sd_vae_encode_method
--- a/modules/prompt_parser.py
+++ b/modules/prompt_parser.py
@@ -276,6 +276,12 @@ class DictWithShape(dict):
    def shape(self):
        return self["crossattn"].shape

+    def to(self, *args, **kwargs):
+        for k in self.keys():
+            if isinstance(self[k], torch.Tensor):
+                self[k] = self[k].to(*args, **kwargs)
+        return self
+

 def reconstruct_cond_batch(c: list[list[ScheduledPromptConditioning]], current_step):
    param = c[0][0].cond
@@ -317,15 +323,32 @@ def stack_conds(tensors):
    return torch.stack(tensors)


+def stack_conds_alter(tensors, weights):
+    token_count = max([x.shape[0] for x in tensors])
+    for i in range(len(tensors)):
+        if tensors[i].shape[0] != token_count:
+            last_vector = tensors[i][-1:]
+            last_vector_repeated = last_vector.repeat([token_count - tensors[i].shape[0], 1])
+            tensors[i] = torch.vstack([tensors[i], last_vector_repeated])
+
+    result = 0
+    full_weights = 0
+    for x, w in zip(tensors, weights):
+        result = result + x * float(w)
+        full_weights = full_weights + float(w)
+    result = result / full_weights
+
+    return result
+

 def reconstruct_multicond_batch(c: MulticondLearnedConditioning, current_step):
    param = c.batch[0][0].schedules[0].cond

-    tensors = []
-    conds_list = []
+    results = []

    for composable_prompts in c.batch:
-        conds_for_batch = []
+        tensors = []
+        weights = []

        for composable_prompt in composable_prompts:
            target_index = 0
@@ -334,19 +357,24 @@ def reconstruct_multicond_batch(c: MulticondLearnedConditioning, current_step):
                    target_index = current
                    break

-            conds_for_batch.append((len(tensors), composable_prompt.weight))
+            weights.append(composable_prompt.weight)
            tensors.append(composable_prompt.schedules[target_index].cond)

-        conds_list.append(conds_for_batch)
+        if isinstance(tensors[0], dict):
+            weighted = {k: stack_conds_alter([x[k] for x in tensors], weights) for k in tensors[0].keys()}
+        else:
+            weighted = stack_conds_alter(tensors, weights)

-    if isinstance(tensors[0], dict):
-        keys = list(tensors[0].keys())
-        stacked = {k: stack_conds([x[k] for x in tensors]) for k in keys}
-        stacked = DictWithShape(stacked, stacked['crossattn'].shape)
+        results.append(weighted)
+
+    if isinstance(results[0], dict):
+        results = {k: torch.stack([x[k] for x in results])
+                   for k in results[0].keys()}
+        results = DictWithShape(results, results['crossattn'].shape)
    else:
-        stacked = stack_conds(tensors).to(device=param.device, dtype=param.dtype)
+        results = torch.stack(results).to(device=param.device, dtype=param.dtype)

-    return conds_list, stacked
+    return results


 re_attention = re.compile(r"""
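
Review note on the `modules/prompt_parser.py` hunks above: `stack_conds_alter` appears to pad every cond to the longest token count by repeating its last vector, then collapse the composable prompts into a single weighted average. As a result, `reconstruct_multicond_batch` now returns one value instead of the old `(conds_list, stacked)` pair. The following is a minimal sketch of that arithmetic using plain Python lists instead of tensors. The helper names `pad_by_repeating_last` and `weighted_average_conds` are hypothetical and do not exist in the repository; the sketch only mirrors the logic as read from the diff.

```python
def pad_by_repeating_last(rows, target_len):
    """Pad a list of vectors to target_len by repeating its last vector."""
    return rows + [list(rows[-1]) for _ in range(target_len - len(rows))]


def weighted_average_conds(conds, weights):
    """Weighted average of token-by-channel matrices given as nested lists.

    Mirrors the reading of stack_conds_alter: pad to the longest token
    count, sum each cond scaled by its weight, divide by the weight total.
    """
    token_count = max(len(c) for c in conds)
    padded = [pad_by_repeating_last(c, token_count) for c in conds]
    total = sum(float(w) for w in weights)
    channels = len(padded[0][0])
    return [
        [
            sum(float(w) * c[t][ch] for c, w in zip(padded, weights)) / total
            for ch in range(channels)
        ]
        for t in range(token_count)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]  # two tokens, two channels
b = [[5.0, 6.0]]              # one token; padded to [[5, 6], [5, 6]]
mixed = weighted_average_conds([a, b], [1.0, 3.0])
print(mixed)  # [[4.0, 5.0], [4.5, 5.5]]
```

If this reading is right, callers that previously unpacked two return values from `reconstruct_multicond_batch` would need updating, and weights that sum to zero would divide by zero in both the sketch and the diffed function.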
--- a/modules/realesrgan_model.py
+++ b/modules/realesrgan_model.py
@@ -2,7 +2,7 @@ import os

 from modules import modelloader, errors
 from modules.shared import cmd_opts, opts
-from modules.upscaler import Upscaler, UpscalerData
+from modules.upscaler import Upscaler, UpscalerData, prepare_free_memory
 from modules.upscaler_utils import upscale_with_model


@@ -27,6 +27,8 @@ class UpscalerRealESRGAN(Upscaler):
                self.scalers.append(scaler)

    def do_upscale(self, img, path):
+        prepare_free_memory()
+
        if not self.enable:
            return img

--- a/modules/sd_hijack.py
+++ b/modules/sd_hijack.py
@@ -57,57 +57,11 @@ def list_optimizers():


 def apply_optimizations(option=None):
-    global current_optimizer
-
-    undo_optimizations()
-
-    if len(optimizers) == 0:
-        # a script can access the model very early, and optimizations would not be filled by then
-        current_optimizer = None
-        return ''
-
-    ldm.modules.diffusionmodules.model.nonlinearity = silu
-    ldm.modules.diffusionmodules.openaimodel.th = sd_hijack_unet.th
-
-    sgm.modules.diffusionmodules.model.nonlinearity = silu
-    sgm.modules.diffusionmodules.openaimodel.th = sd_hijack_unet.th
-
-    if current_optimizer is not None:
-        current_optimizer.undo()
-        current_optimizer = None
-
-    selection = option or shared.opts.cross_attention_optimization
-    if selection == "Automatic" and len(optimizers) > 0:
-        matching_optimizer = next(iter([x for x in optimizers if x.cmd_opt and getattr(shared.cmd_opts, x.cmd_opt, False)]), optimizers[0])
-    else:
-        matching_optimizer = next(iter([x for x in optimizers if x.title() == selection]), None)
-
-    if selection == "None":
-        matching_optimizer = None
-    elif selection == "Automatic" and shared.cmd_opts.disable_opt_split_attention:
-        matching_optimizer = None
-    elif matching_optimizer is None:
-        matching_optimizer = optimizers[0]
-
-    if matching_optimizer is not None:
-        print(f"Applying attention optimization: {matching_optimizer.name}... ", end='')
-        matching_optimizer.apply()
-        print("done.")
-        current_optimizer = matching_optimizer
-        return current_optimizer.name
-    else:
-        print("Disabling attention optimization")
-        return ''
+    return


 def undo_optimizations():
-    ldm.modules.diffusionmodules.model.nonlinearity = diffusionmodules_model_nonlinearity
-    ldm.modules.attention.CrossAttention.forward = hypernetwork.attention_CrossAttention_forward
-    ldm.modules.diffusionmodules.model.AttnBlock.forward = diffusionmodules_model_AttnBlock_forward
-
-    sgm.modules.diffusionmodules.model.nonlinearity = diffusionmodules_model_nonlinearity
-    sgm.modules.attention.CrossAttention.forward = hypernetwork.attention_CrossAttention_forward
-    sgm.modules.diffusionmodules.model.AttnBlock.forward = diffusionmodules_model_AttnBlock_forward
+    return


 def fix_checkpoint():
@@ -182,131 +136,16 @@ class StableDiffusionModelHijack:
        self.embedding_db.add_embedding_dir(cmd_opts.embeddings_dir)

    def apply_optimizations(self, option=None):
-        try:
-            self.optimization_method = apply_optimizations(option)
-        except Exception as e:
-            errors.display(e, "applying cross attention optimization")
-            undo_optimizations()
+        pass

    def convert_sdxl_to_ssd(self, m):
-        """Converts an SDXL model to a Segmind Stable Diffusion model (see https://huggingface.co/segmind/SSD-1B)"""
-
-        delattr(m.model.diffusion_model.middle_block, '1')
-        delattr(m.model.diffusion_model.middle_block, '2')
-        for i in ['9', '8', '7', '6', '5', '4']:
-            delattr(m.model.diffusion_model.input_blocks[7][1].transformer_blocks, i)
-            delattr(m.model.diffusion_model.input_blocks[8][1].transformer_blocks, i)
-            delattr(m.model.diffusion_model.output_blocks[0][1].transformer_blocks, i)
-            delattr(m.model.diffusion_model.output_blocks[1][1].transformer_blocks, i)
-        delattr(m.model.diffusion_model.output_blocks[4][1].transformer_blocks, '1')
-        delattr(m.model.diffusion_model.output_blocks[5][1].transformer_blocks, '1')
-        devices.torch_gc()
+        pass

    def hijack(self, m):
-        conditioner = getattr(m, 'conditioner', None)
-        if conditioner:
-            text_cond_models = []
-
-            for i in range(len(conditioner.embedders)):
-                embedder = conditioner.embedders[i]
-                typename = type(embedder).__name__
-                if typename == 'FrozenOpenCLIPEmbedder':
-                    embedder.model.token_embedding = EmbeddingsWithFixes(embedder.model.token_embedding, self)
-                    conditioner.embedders[i] = sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords(embedder, self)
-                    text_cond_models.append(conditioner.embedders[i])
-                if typename == 'FrozenCLIPEmbedder':
-                    model_embeddings = embedder.transformer.text_model.embeddings
-                    model_embeddings.token_embedding = EmbeddingsWithFixes(model_embeddings.token_embedding, self)
-                    conditioner.embedders[i] = sd_hijack_clip.FrozenCLIPEmbedderForSDXLWithCustomWords(embedder, self)
-                    text_cond_models.append(conditioner.embedders[i])
-                if typename == 'FrozenOpenCLIPEmbedder2':
-                    embedder.model.token_embedding = EmbeddingsWithFixes(embedder.model.token_embedding, self, textual_inversion_key='clip_g')
-                    conditioner.embedders[i] = sd_hijack_open_clip.FrozenOpenCLIPEmbedder2WithCustomWords(embedder, self)
-                    text_cond_models.append(conditioner.embedders[i])
-
-            if len(text_cond_models) == 1:
-                m.cond_stage_model = text_cond_models[0]
-            else:
-                m.cond_stage_model = conditioner
-
-        if type(m.cond_stage_model) == xlmr.BertSeriesModelWithTransformation or type(m.cond_stage_model) == xlmr_m18.BertSeriesModelWithTransformation:
-            model_embeddings = m.cond_stage_model.roberta.embeddings
-            model_embeddings.token_embedding = EmbeddingsWithFixes(model_embeddings.word_embeddings, self)
-            m.cond_stage_model = sd_hijack_xlmr.FrozenXLMREmbedderWithCustomWords(m.cond_stage_model, self)
-
-        elif type(m.cond_stage_model) == ldm.modules.encoders.modules.FrozenCLIPEmbedder:
-            model_embeddings = m.cond_stage_model.transformer.text_model.embeddings
-            model_embeddings.token_embedding = EmbeddingsWithFixes(model_embeddings.token_embedding, self)
-            m.cond_stage_model = sd_hijack_clip.FrozenCLIPEmbedderWithCustomWords(m.cond_stage_model, self)
-
-        elif type(m.cond_stage_model) == ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder:
-            m.cond_stage_model.model.token_embedding = EmbeddingsWithFixes(m.cond_stage_model.model.token_embedding, self)
-            m.cond_stage_model = sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords(m.cond_stage_model, self)
-
-        apply_weighted_forward(m)
-        if m.cond_stage_key == "edit":
-            sd_hijack_unet.hijack_ddpm_edit()
-
-        self.apply_optimizations()
-
-        self.clip = m.cond_stage_model
-
-        def flatten(el):
-            flattened = [flatten(children) for children in el.children()]
-            res = [el]
-            for c in flattened:
-                res += c
-            return res
-
-        self.layers = flatten(m)
-
-        import modules.models.diffusion.ddpm_edit
-
-        if isinstance(m, ldm.models.diffusion.ddpm.LatentDiffusion):
-            sd_unet.original_forward = ldm_original_forward
-        elif isinstance(m, modules.models.diffusion.ddpm_edit.LatentDiffusion):
-            sd_unet.original_forward = ldm_original_forward
-        elif isinstance(m, sgm.models.diffusion.DiffusionEngine):
-            sd_unet.original_forward = sgm_original_forward
-        else:
-            sd_unet.original_forward = None
-
+        pass

    def undo_hijack(self, m):
-        conditioner = getattr(m, 'conditioner', None)
-        if conditioner:
-            for i in range(len(conditioner.embedders)):
-                embedder = conditioner.embedders[i]
-                if isinstance(embedder, (sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords, sd_hijack_open_clip.FrozenOpenCLIPEmbedder2WithCustomWords)):
-                    embedder.wrapped.model.token_embedding = embedder.wrapped.model.token_embedding.wrapped
-                    conditioner.embedders[i] = embedder.wrapped
-                if isinstance(embedder, sd_hijack_clip.FrozenCLIPEmbedderForSDXLWithCustomWords):
-                    embedder.wrapped.transformer.text_model.embeddings.token_embedding = embedder.wrapped.transformer.text_model.embeddings.token_embedding.wrapped
-                    conditioner.embedders[i] = embedder.wrapped
-
-            if hasattr(m, 'cond_stage_model'):
-                delattr(m, 'cond_stage_model')
-
-        elif type(m.cond_stage_model) == sd_hijack_xlmr.FrozenXLMREmbedderWithCustomWords:
-            m.cond_stage_model = m.cond_stage_model.wrapped
-
-        elif type(m.cond_stage_model) == sd_hijack_clip.FrozenCLIPEmbedderWithCustomWords:
-            m.cond_stage_model = m.cond_stage_model.wrapped
-
-            model_embeddings = m.cond_stage_model.transformer.text_model.embeddings
-            if type(model_embeddings.token_embedding) == EmbeddingsWithFixes:
-                model_embeddings.token_embedding = model_embeddings.token_embedding.wrapped
-        elif type(m.cond_stage_model) == sd_hijack_open_clip.FrozenOpenCLIPEmbedderWithCustomWords:
-            m.cond_stage_model.wrapped.model.token_embedding = m.cond_stage_model.wrapped.model.token_embedding.wrapped
-            m.cond_stage_model = m.cond_stage_model.wrapped
-
-        undo_optimizations()
-        undo_weighted_forward(m)
-
-        self.apply_circular(False)
-        self.layers = None
-        self.clip = None
-
+        pass

    def apply_circular(self, enable):
        if self.circular_enabled == enable:
@@ -321,17 +160,12 @@ class StableDiffusionModelHijack:
        self.comments = []
        self.extra_generation_params = {}

-    def get_prompt_lengths(self, text):
-        if self.clip is None:
-            return "-", "-"
-
-        _, token_count = self.clip.process_texts([text])
-
-        return token_count, self.clip.get_target_prompt_token_count(token_count)
+    def get_prompt_lengths(self, text, cond_stage_model):
+        _, token_count = cond_stage_model.process_texts([text])
+        return token_count, cond_stage_model.get_target_prompt_token_count(token_count)

    def redo_hijack(self, m):
-        self.undo_hijack(m)
-        self.hijack(m)
+        pass


 class EmbeddingsWithFixes(torch.nn.Module):
--- a/modules/sd_models.py
+++ b/modules/sd_models.py
@@ -10,6 +10,7 @@ from omegaconf import OmegaConf, ListConfig
 from os import mkdir
 from urllib import request
 import ldm.modules.midas as midas
+import gc

 from ldm.util import instantiate_from_config

@@ -17,6 +18,12 @@ from modules import paths, shared, modelloader, devices, script_callbacks, sd_va
 from modules.timer import Timer
 import tomesd
 import numpy as np
+from modules_forge import forge_loader
+import modules_forge.ops as forge_ops
+from ldm_patched.modules.ops import manual_cast
+from ldm_patched.modules import model_management as model_management
+import ldm_patched.modules.model_patcher
+

 model_dir = "Stable-diffusion"
 model_path = os.path.abspath(os.path.join(paths.models_path, model_dir))
@@ -366,26 +373,12 @@ def load_model_weights(model, checkpoint_info: CheckpointInfo, state_dict, timer
    sd_model_hash = checkpoint_info.calculate_shorthash()
    timer.record("calculate hash")

-    if devices.fp8:
-        # prevent model to load state dict in fp8
-        model.half()
-
    if not SkipWritingToConfig.skip:
        shared.opts.data["sd_model_checkpoint"] = checkpoint_info.title

    if state_dict is None:
        state_dict = get_checkpoint_state_dict(checkpoint_info, timer)

-    model.is_sdxl = hasattr(model, 'conditioner')
-    model.is_sd2 = not model.is_sdxl and hasattr(model.cond_stage_model, 'model')
-    model.is_sd1 = not model.is_sdxl and not model.is_sd2
-    model.is_ssd = model.is_sdxl and 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_q.weight' not in state_dict.keys()
-    if model.is_sdxl:
-        sd_models_xl.extend_sdxl(model)
-
-    if model.is_ssd:
-        sd_hijack.model_hijack.convert_sdxl_to_ssd(model)
-
    if shared.opts.sd_checkpoint_cache > 0:
        # cache newly loaded model
        checkpoints_loaded[checkpoint_info] = state_dict.copy()
@@ -395,65 +388,6 @@ def load_model_weights(model, checkpoint_info: CheckpointInfo, state_dict, timer

    del state_dict

-    if shared.cmd_opts.opt_channelslast:
-        model.to(memory_format=torch.channels_last)
-        timer.record("apply channels_last")
-
-    if shared.cmd_opts.no_half:
-        model.float()
-        model.alphas_cumprod_original = model.alphas_cumprod
-        devices.dtype_unet = torch.float32
-        timer.record("apply float()")
-    else:
-        vae = model.first_stage_model
-        depth_model = getattr(model, 'depth_model', None)
-
-        # with --no-half-vae, remove VAE from model when doing half() to prevent its weights from being converted to float16
-        if shared.cmd_opts.no_half_vae:
-            model.first_stage_model = None
-        # with --upcast-sampling, don't convert the depth model weights to float16
-        if shared.cmd_opts.upcast_sampling and depth_model:
-            model.depth_model = None
-
-        alphas_cumprod = model.alphas_cumprod
-        model.alphas_cumprod = None
-        model.half()
-        model.alphas_cumprod = alphas_cumprod
-        model.alphas_cumprod_original = alphas_cumprod
-        model.first_stage_model = vae
-        if depth_model:
-            model.depth_model = depth_model
-
-        devices.dtype_unet = torch.float16
-        timer.record("apply half()")
-
-    for module in model.modules():
-        if hasattr(module, 'fp16_weight'):
-            del module.fp16_weight
-        if hasattr(module, 'fp16_bias'):
-            del module.fp16_bias
-
-    if check_fp8(model):
-        devices.fp8 = True
-        first_stage = model.first_stage_model
-        model.first_stage_model = None
-        for module in model.modules():
-            if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
-                if shared.opts.cache_fp16_weight:
-                    module.fp16_weight = module.weight.data.clone().cpu().half()
-                    if module.bias is not None:
-                        module.fp16_bias = module.bias.data.clone().cpu().half()
-                module.to(torch.float8_e4m3fn)
-        model.first_stage_model = first_stage
-        timer.record("apply fp8")
-    else:
-        devices.fp8 = False
-
-    devices.unet_needs_upcast = shared.cmd_opts.upcast_sampling and devices.dtype == torch.float16 and devices.dtype_unet == torch.float16
-
-    model.first_stage_model.to(devices.dtype_vae)
-    timer.record("apply dtype to VAE")
-
    # clean up cache if limit is reached
    while len(checkpoints_loaded) > shared.opts.sd_checkpoint_cache:
        checkpoints_loaded.popitem(last=False)
@@ -590,14 +524,6 @@ class SdModelData:
            sd_vae.loaded_vae_file = getattr(v, "loaded_vae_file", None)
            sd_vae.checkpoint_info = v.sd_checkpoint_info

-        try:
-            self.loaded_sd_models.remove(v)
-        except ValueError:
-            pass
-
-        if v is not None:
-            self.loaded_sd_models.insert(0, v)
-

 model_data = SdModelData()

@@ -615,31 +541,19 @@ def get_empty_cond(sd_model):


 def send_model_to_cpu(m):
-    if m.lowvram:
-        lowvram.send_everything_to_cpu()
-    else:
-        m.to(devices.cpu)
-
-    devices.torch_gc()
+    pass


 def model_target_device(m):
-    if lowvram.is_needed(m):
-        return devices.cpu
-    else:
-        return devices.device
+    return devices.device


 def send_model_to_device(m):
-    lowvram.apply(m)
-
-    if not m.lowvram:
-        m.to(shared.device)
+    pass


 def send_model_to_trash(m):
-    m.to(device="meta")
-    devices.torch_gc()
+    pass


 def load_model(checkpoint_info=None, already_loaded_state_dict=None):
@@ -649,9 +563,14 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):
    timer = Timer()

    if model_data.sd_model:
-        send_model_to_trash(model_data.sd_model)
+        if model_data.sd_model.filename == checkpoint_info.filename:
+            return model_data.sd_model
+
        model_data.sd_model = None
-        devices.torch_gc()
+        model_data.loaded_sd_models = []
+        model_management.unload_all_models()
+        model_management.soft_empty_cache()
+        gc.collect()

    timer.record("unload existing model")

@@ -660,58 +579,27 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):
    else:
        state_dict = get_checkpoint_state_dict(checkpoint_info, timer)

-    checkpoint_config = sd_models_config.find_checkpoint_config(state_dict, checkpoint_info)
-    clip_is_included_into_sd = any(x for x in [sd1_clip_weight, sd2_clip_weight, sdxl_clip_weight, sdxl_refiner_clip_weight] if x in state_dict)
+    if shared.opts.sd_checkpoint_cache > 0:
+        # cache newly loaded model
+        checkpoints_loaded[checkpoint_info] = state_dict.copy()

-    timer.record("find config")
+    sd_model = forge_loader.load_model_for_a1111(timer=timer, checkpoint_info=checkpoint_info, state_dict=state_dict)
+    sd_model.filename = checkpoint_info.filename

-    sd_config = OmegaConf.load(checkpoint_config)
-    repair_config(sd_config)
+    del state_dict

-    timer.record("load config")
+    # clean up cache if limit is reached
+    while len(checkpoints_loaded) > shared.opts.sd_checkpoint_cache:
+        checkpoints_loaded.popitem(last=False)

-    print(f"Creating model from config: {checkpoint_config}")
+    shared.opts.data["sd_checkpoint_hash"] = checkpoint_info.sha256

-    sd_model = None
-    try:
-        with sd_disable_initialization.DisableInitialization(disable_clip=clip_is_included_into_sd or shared.cmd_opts.do_not_download_clip):
-            with sd_disable_initialization.InitializeOnMeta():
-                sd_model = instantiate_from_config(sd_config.model)
+    sd_vae.delete_base_vae()
+    sd_vae.clear_loaded_vae()
+    vae_file, vae_source = sd_vae.resolve_vae(checkpoint_info.filename).tuple()
+    sd_vae.load_vae(sd_model, vae_file, vae_source)
+    timer.record("load VAE")

-    except Exception as e:
-        errors.display(e, "creating model quickly", full_traceback=True)
-
-    if sd_model is None:
-        print('Failed to create model quickly; will retry using slow method.', file=sys.stderr)
-
-        with sd_disable_initialization.InitializeOnMeta():
-            sd_model = instantiate_from_config(sd_config.model)
-
-    sd_model.used_config = checkpoint_config
-
-    timer.record("create model")
-
-    if shared.cmd_opts.no_half:
-        weight_dtype_conversion = None
-    else:
-        weight_dtype_conversion = {
-            'first_stage_model': None,
-            'alphas_cumprod': None,
-            '': torch.float16,
-        }
-
-    with sd_disable_initialization.LoadStateDictOnMeta(state_dict, device=model_target_device(sd_model), weight_dtype_conversion=weight_dtype_conversion):
-        load_model_weights(sd_model, checkpoint_info, state_dict, timer)
-    timer.record("load weights from state dict")
-
-    send_model_to_device(sd_model)
-    timer.record("move model to device")
-
-    sd_hijack.model_hijack.hijack(sd_model)
-
-    timer.record("hijack")
-
-    sd_model.eval()
    model_data.set_sd_model(sd_model)
    model_data.was_loaded_at_least_once = True

@@ -723,7 +611,7 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):

    timer.record("scripts callbacks")

-    with devices.autocast(), torch.no_grad():
+    with torch.no_grad():
        sd_model.cond_stage_model_empty_prompt = get_empty_cond(sd_model)

    timer.record("calculate empty prompt")
@@ -734,132 +622,14 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None):


 def reuse_model_from_already_loaded(sd_model, checkpoint_info, timer):
-    """
-    Checks if the desired checkpoint from checkpoint_info is not already loaded in model_data.loaded_sd_models.
-    If it is loaded, returns that (moving it to GPU if necessary, and moving the currently loadded model to CPU if necessary).
-    If not, returns the model that can be used to load weights from checkpoint_info's file.
-    If no such model exists, returns None.
-    Additionaly deletes loaded models that are over the limit set in settings (sd_checkpoints_limit).
-    """
-
-    already_loaded = None
-    for i in reversed(range(len(model_data.loaded_sd_models))):
-        loaded_model = model_data.loaded_sd_models[i]
-        if loaded_model.sd_checkpoint_info.filename == checkpoint_info.filename:
-            already_loaded = loaded_model
-            continue
-
-        if len(model_data.loaded_sd_models) > shared.opts.sd_checkpoints_limit > 0:
-            print(f"Unloading model {len(model_data.loaded_sd_models)} over the limit of {shared.opts.sd_checkpoints_limit}: {loaded_model.sd_checkpoint_info.title}")
-            model_data.loaded_sd_models.pop()
-            send_model_to_trash(loaded_model)
-            timer.record("send model to trash")
-
-        if shared.opts.sd_checkpoints_keep_in_cpu:
-            send_model_to_cpu(sd_model)
-            timer.record("send model to cpu")
-
-    if already_loaded is not None:
-        send_model_to_device(already_loaded)
-        timer.record("send model to device")
-
-        model_data.set_sd_model(already_loaded, already_loaded=True)
-
-        if not SkipWritingToConfig.skip:
-            shared.opts.data["sd_model_checkpoint"] = already_loaded.sd_checkpoint_info.title
-            shared.opts.data["sd_checkpoint_hash"] = already_loaded.sd_checkpoint_info.sha256
-
-        print(f"Using already loaded model {already_loaded.sd_checkpoint_info.title}: done in {timer.summary()}")
-        sd_vae.reload_vae_weights(already_loaded)
-        return model_data.sd_model
-    elif shared.opts.sd_checkpoints_limit > 1 and len(model_data.loaded_sd_models) < shared.opts.sd_checkpoints_limit:
-        print(f"Loading model {checkpoint_info.title} ({len(model_data.loaded_sd_models) + 1} out of {shared.opts.sd_checkpoints_limit})")
-
-        model_data.sd_model = None
-        load_model(checkpoint_info)
-        return model_data.sd_model
-    elif len(model_data.loaded_sd_models) > 0:
-        sd_model = model_data.loaded_sd_models.pop()
-        model_data.sd_model = sd_model
-
-        sd_vae.base_vae = getattr(sd_model, "base_vae", None)
-        sd_vae.loaded_vae_file = getattr(sd_model, "loaded_vae_file", None)
-        sd_vae.checkpoint_info = sd_model.sd_checkpoint_info
-
-        print(f"Reusing loaded model {sd_model.sd_checkpoint_info.title} to load {checkpoint_info.title}")
-        return sd_model
-    else:
-        return None
+    pass


 def reload_model_weights(sd_model=None, info=None, forced_reload=False):
-    checkpoint_info = info or select_checkpoint()
-
-    timer = Timer()
-
-    if not sd_model:
-        sd_model = model_data.sd_model
-
-    if sd_model is None:  # previous model load failed
-        current_checkpoint_info = None
-    else:
-        current_checkpoint_info = sd_model.sd_checkpoint_info
-        if check_fp8(sd_model) != devices.fp8:
-            # load from state dict again to prevent extra numerical errors
-            forced_reload = True
-        elif sd_model.sd_model_checkpoint == checkpoint_info.filename and not forced_reload:
-            return sd_model
-
-    sd_model = reuse_model_from_already_loaded(sd_model, checkpoint_info, timer)
-    if not forced_reload and sd_model is not None and sd_model.sd_checkpoint_info.filename == checkpoint_info.filename:
-        return sd_model
-
-    if sd_model is not None:
-        sd_unet.apply_unet("None")
-        send_model_to_cpu(sd_model)
-        sd_hijack.model_hijack.undo_hijack(sd_model)
-
-    state_dict = get_checkpoint_state_dict(checkpoint_info, timer)
-
-    checkpoint_config = sd_models_config.find_checkpoint_config(state_dict, checkpoint_info)
-
-    timer.record("find config")
-
-    if sd_model is None or checkpoint_config != sd_model.used_config:
-        if sd_model is not None:
-            send_model_to_trash(sd_model)
-
-        load_model(checkpoint_info, already_loaded_state_dict=state_dict)
-        return model_data.sd_model
-
-    try:
-        load_model_weights(sd_model, checkpoint_info, state_dict, timer)
-    except Exception:
-        print("Failed to load checkpoint, restoring previous")
-        load_model_weights(sd_model, current_checkpoint_info, None, timer)
-        raise
-    finally:
-        sd_hijack.model_hijack.hijack(sd_model)
-        timer.record("hijack")
-
-        if not sd_model.lowvram:
-            sd_model.to(devices.device)
-            timer.record("move model to device")
-
-        script_callbacks.model_loaded_callback(sd_model)
-        timer.record("script callbacks")
-
-    print(f"Weights loaded in {timer.summary()}.")
-
-    model_data.set_sd_model(sd_model)
-    sd_unet.apply_unet()
-
-    return sd_model
+    return load_model(info)


 def unload_model_weights(sd_model=None, info=None):
-    send_model_to_cpu(sd_model or shared.sd_model)
-
    return sd_model
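
Review note on the checkpoint cache in `load_model`: the `popitem(last=False)` call exists only on `collections.OrderedDict`, so `checkpoints_loaded` is presumably one, and the trimming loop evicts from the front of its ordering. A minimal standalone sketch of the two steps follows. `cache_state_dict` and its parameters are hypothetical names used for illustration, not functions from this repository.

```python
from collections import OrderedDict


def cache_state_dict(cache, key, state_dict, limit):
    """Store a shallow copy of state_dict, then evict entries beyond limit.

    Hypothetical helper mirroring the diff; `limit` stands in for
    shared.opts.sd_checkpoint_cache.
    """
    if limit > 0:
        # Shallow copy: the cached dict is a new object, but its values
        # are shared with the original.
        cache[key] = state_dict.copy()
    # popitem(last=False) removes the entry at the front of the ordering.
    while len(cache) > limit:
        cache.popitem(last=False)
    return cache


cache = OrderedDict()
for name in ("a.safetensors", "b.safetensors", "c.safetensors"):
    cache_state_dict(cache, name, {"weight": name}, limit=2)

print(list(cache))  # "a.safetensors" was inserted first and has been evicted
```

With `limit` set to 0 nothing is stored and the loop drains whatever was cached earlier, which matches the `> 0` guard in the diff.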


--- a/modules/sd_models_xl.py
+++ b/modules/sd_models_xl.py
@@ -8,8 +8,12 @@ import sgm.modules.diffusionmodules.discretizer
 from modules import devices, shared, prompt_parser
 from modules import torch_utils

+import ldm_patched.modules.model_management as model_management
+

 def get_learned_conditioning(self: sgm.models.diffusion.DiffusionEngine, batch: prompt_parser.SdConditioning | list[str]):
+    model_management.load_model_gpu(self.forge_objects.clip.patcher)
+
    for embedder in self.conditioner.embedders:
        embedder.ucg_rate = 0.0

@@ -18,7 +22,7 @@ def get_learned_conditioning(self: sgm.models.diffusion.DiffusionEngine, batch:
    is_negative_prompt = getattr(batch, 'is_negative_prompt', False)
    aesthetic_score = shared.opts.sdxl_refiner_low_aesthetic_score if is_negative_prompt else shared.opts.sdxl_refiner_high_aesthetic_score

-    devices_args = dict(device=devices.device, dtype=devices.dtype)
+    devices_args = dict(device=self.forge_objects.clip.patcher.current_device, dtype=model_management.text_encoder_dtype())

    sdxl_conds = {
        "txt": batch,
@@ -34,14 +38,11 @@ def get_learned_conditioning(self: sgm.models.diffusion.DiffusionEngine, batch:
    return c


-def apply_model(self: sgm.models.diffusion.DiffusionEngine, x, t, cond):
-    sd = self.model.state_dict()
-    diffusion_model_input = sd.get('diffusion_model.input_blocks.0.0.weight', None)
-    if diffusion_model_input is not None:
-        if diffusion_model_input.shape[1] == 9:
-            x = torch.cat([x] + cond['c_concat'], dim=1)
+def apply_model(self: sgm.models.diffusion.DiffusionEngine, x, t, cond, *args, **kwargs):
+    if self.model.diffusion_model.in_channels == 9:
+        x = torch.cat([x] + cond['c_concat'], dim=1)

-    return self.model(x, t, cond)
+    return self.model(x, t, cond, *args, **kwargs)


 def get_first_stage_encoding(self, x):  # SDXL's encode_first_stage does everything so get_first_stage_encoding is just there for compatibility
--- a/modules/sd_samplers_cfg_denoiser.py
+++ b/modules/sd_samplers_cfg_denoiser.py
@@ -6,6 +6,7 @@ import modules.shared as shared
 from modules.script_callbacks import CFGDenoiserParams, cfg_denoiser_callback
 from modules.script_callbacks import CFGDenoisedParams, cfg_denoised_callback
 from modules.script_callbacks import AfterCFGCallbackParams, cfg_after_cfg_callback
+from modules_forge import forge_sampler


 def catenate_conds(conds):
@@ -66,7 +67,7 @@ class CFGDenoiser(torch.nn.Module):
    def inner_model(self):
        raise NotImplementedError()

-    def combine_denoised(self, x_out, conds_list, uncond, cond_scale):
+    def combine_denoised(self, x_out, conds_list, uncond, cond_scale, timestep, x_in, cond):
        denoised_uncond = x_out[-uncond.shape[0]:]
        denoised = torch.clone(denoised_uncond)

@@ -152,19 +153,13 @@ class CFGDenoiser(torch.nn.Module):
        if state.interrupted or state.skipped:
            raise sd_samplers_common.InterruptedException

-        if sd_samplers_common.apply_refiner(self):
+        if sd_samplers_common.apply_refiner(self, x):
            cond = self.sampler.sampler_extra_args['cond']
            uncond = self.sampler.sampler_extra_args['uncond']

-        # at self.image_cfg_scale == 1.0 produced results for edit model are the same as with normal sampling,
-        # so is_edit_model is set to False to support AND composition.
-        is_edit_model = shared.sd_model.cond_stage_key == "edit" and self.image_cfg_scale is not None and self.image_cfg_scale != 1.0
-
-        conds_list, tensor = prompt_parser.reconstruct_multicond_batch(cond, self.step)
+        cond = prompt_parser.reconstruct_multicond_batch(cond, self.step)
        uncond = prompt_parser.reconstruct_cond_batch(uncond, self.step)

-        assert not is_edit_model or all(len(conds) == 1 for conds in conds_list), "AND is not supported for InstructPix2Pix checkpoint (unless using Image CFG scale = 1.0)"
-
        # If we use masks, blending between the denoised and original latent images occurs here.
        def apply_blend(current_latent):
            blended_latent = current_latent * self.nmask + self.init_latent * self.mask
@@ -181,113 +176,12 @@ class CFGDenoiser(torch.nn.Module):
        if self.mask_before_denoising and self.mask is not None:
            x = apply_blend(x)

-        batch_size = len(conds_list)
-        repeats = [len(conds_list[i]) for i in range(batch_size)]
-
-        if shared.sd_model.model.conditioning_key == "crossattn-adm":
-            image_uncond = torch.zeros_like(image_cond)
-            make_condition_dict = lambda c_crossattn, c_adm: {"c_crossattn": [c_crossattn], "c_adm": c_adm}
-        else:
-            image_uncond = image_cond
-            if isinstance(uncond, dict):
-                make_condition_dict = lambda c_crossattn, c_concat: {**c_crossattn, "c_concat": [c_concat]}
-            else:
-                make_condition_dict = lambda c_crossattn, c_concat: {"c_crossattn": [c_crossattn], "c_concat": [c_concat]}
-
-        if not is_edit_model:
-            x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x])
-            sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma])
-            image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond])
-        else:
-            x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x] + [x])
-            sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma] + [sigma])
-            image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond] + [torch.zeros_like(self.init_latent)])
-
-        denoiser_params = CFGDenoiserParams(x_in, image_cond_in, sigma_in, state.sampling_step, state.sampling_steps, tensor, uncond, self)
+        denoiser_params = CFGDenoiserParams(x, image_cond, sigma, state.sampling_step, state.sampling_steps, cond, uncond, self)
        cfg_denoiser_callback(denoiser_params)
-        x_in = denoiser_params.x
-        image_cond_in = denoiser_params.image_cond
-        sigma_in = denoiser_params.sigma
-        tensor = denoiser_params.text_cond
-        uncond = denoiser_params.text_uncond
-        skip_uncond = False

-        # alternating uncond allows for higher thresholds without the quality loss normally expected from raising it
-        if self.step % 2 and s_min_uncond > 0 and sigma[0] < s_min_uncond and not is_edit_model:
-            skip_uncond = True
-            x_in = x_in[:-batch_size]
-            sigma_in = sigma_in[:-batch_size]
-
-        self.padded_cond_uncond = False
-        self.padded_cond_uncond_v0 = False
-        if shared.opts.pad_cond_uncond and tensor.shape[1] != uncond.shape[1]:
-            tensor, uncond = self.pad_cond_uncond(tensor, uncond)
-        elif shared.opts.pad_cond_uncond_v0 and tensor.shape[1] != uncond.shape[1]:
-            tensor, uncond = self.pad_cond_uncond_v0(tensor, uncond)
-
-        if tensor.shape[1] == uncond.shape[1] or skip_uncond:
-            if is_edit_model:
-                cond_in = catenate_conds([tensor, uncond, uncond])
-            elif skip_uncond:
-                cond_in = tensor
-            else:
-                cond_in = catenate_conds([tensor, uncond])
-
-            if shared.opts.batch_cond_uncond:
-                x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
-            else:
-                x_out = torch.zeros_like(x_in)
-                for batch_offset in range(0, x_out.shape[0], batch_size):
-                    a = batch_offset
-                    b = a + batch_size
-                    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(subscript_cond(cond_in, a, b), image_cond_in[a:b]))
-        else:
-            x_out = torch.zeros_like(x_in)
-            batch_size = batch_size*2 if shared.opts.batch_cond_uncond else batch_size
-            for batch_offset in range(0, tensor.shape[0], batch_size):
-                a = batch_offset
-                b = min(a + batch_size, tensor.shape[0])
-
-                if not is_edit_model:
-                    c_crossattn = subscript_cond(tensor, a, b)
-                else:
-                    c_crossattn = torch.cat([tensor[a:b]], uncond)
-
-                x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
-
-            if not skip_uncond:
-                x_out[-uncond.shape[0]:] = self.inner_model(x_in[-uncond.shape[0]:], sigma_in[-uncond.shape[0]:], cond=make_condition_dict(uncond, image_cond_in[-uncond.shape[0]:]))
-
-        denoised_image_indexes = [x[0][0] for x in conds_list]
-        if skip_uncond:
-            fake_uncond = torch.cat([x_out[i:i+1] for i in denoised_image_indexes])
-            x_out = torch.cat([x_out, fake_uncond])  # we skipped uncond denoising, so we put cond-denoised image to where the uncond-denoised image should be
-
-        denoised_params = CFGDenoisedParams(x_out, state.sampling_step, state.sampling_steps, self.inner_model)
-        cfg_denoised_callback(denoised_params)
-
-        devices.test_for_nans(x_out, "unet")
-
-        if is_edit_model:
-            denoised = self.combine_denoised_for_edit_model(x_out, cond_scale)
-        elif skip_uncond:
-            denoised = self.combine_denoised(x_out, conds_list, uncond, 1.0)
-        else:
-            denoised = self.combine_denoised(x_out, conds_list, uncond, cond_scale)
-
-        # Blend in the original latents (after)
-        if not self.mask_before_denoising and self.mask is not None:
-            denoised = apply_blend(denoised)
-
-        self.sampler.last_latent = self.get_pred_x0(torch.cat([x_in[i:i + 1] for i in denoised_image_indexes]), torch.cat([x_out[i:i + 1] for i in denoised_image_indexes]), sigma)
-
-        if opts.live_preview_content == "Prompt":
-            preview = self.sampler.last_latent
-        elif opts.live_preview_content == "Negative prompt":
-            preview = self.get_pred_x0(x_in[-uncond.shape[0]:], x_out[-uncond.shape[0]:], sigma)
-        else:
-            preview = self.get_pred_x0(torch.cat([x_in[i:i+1] for i in denoised_image_indexes]), torch.cat([denoised[i:i+1] for i in denoised_image_indexes]), sigma)
+        denoised = forge_sampler.forge_sample(self, denoiser_params=denoiser_params, cond_scale=cond_scale)

+        preview = self.sampler.last_latent = denoised
        sd_samplers_common.store_latent(preview)

        after_cfg_callback_params = AfterCFGCallbackParams(denoised, state.sampling_step, state.sampling_steps)
--- a/modules/sd_samplers_common.py
+++ b/modules/sd_samplers_common.py
@@ -5,6 +5,7 @@ import torch
 from PIL import Image
 from modules import devices, images, sd_vae_approx, sd_samplers, sd_vae_taesd, shared, sd_models
 from modules.shared import opts, state
+from ldm_patched.modules import model_management
 import k_diffusion.sampling


@@ -39,9 +40,7 @@ def samples_to_images_tensor(sample, approximation=None, model=None):

    if approximation is None or (shared.state.interrupted and opts.live_preview_fast_interrupt):
        approximation = approximation_indexes.get(opts.show_progress_type, 0)
-
-        from modules import lowvram
-        if approximation == 0 and lowvram.is_enabled(shared.sd_model) and not shared.opts.live_preview_allow_lowvram_full:
+        if approximation == 0:
            approximation = 1

    if approximation == 2:
@@ -54,8 +53,7 @@ def samples_to_images_tensor(sample, approximation=None, model=None):
    else:
        if model is None:
            model = shared.sd_model
-        with devices.without_autocast(): # fixes an issue with unstable VAEs that are flaky even in fp32
-            x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
+        x_sample = model.decode_first_stage(sample)

    return x_sample

@@ -71,7 +69,6 @@ def single_sample_to_image(sample, approximation=None):


 def decode_first_stage(model, x):
-    x = x.to(devices.dtype_vae)
    approx_index = approximation_indexes.get(opts.sd_vae_decode_method, 0)
    return samples_to_images_tensor(x, approx_index, model)

@@ -95,7 +92,6 @@ def images_tensor_to_samples(image, approximation=None, model=None):
    else:
        if model is None:
            model = shared.sd_model
-        model.first_stage_model.to(devices.dtype_vae)

        image = image.to(shared.device, dtype=devices.dtype_vae)
        image = image * 2 - 1
@@ -155,7 +151,7 @@ def replace_torchsde_browinan():
 replace_torchsde_browinan()


-def apply_refiner(cfg_denoiser):
+def apply_refiner(cfg_denoiser, x):
    completed_ratio = cfg_denoiser.step / cfg_denoiser.total_steps
    refiner_switch_at = cfg_denoiser.p.refiner_switch_at
    refiner_checkpoint_info = cfg_denoiser.p.refiner_checkpoint_info
@@ -184,10 +180,17 @@ def apply_refiner(cfg_denoiser):
    with sd_models.SkipWritingToConfig():
        sd_models.reload_model_weights(info=refiner_checkpoint_info)

+    refiner = sd_models.model_data.get_sd_model()
+
    devices.torch_gc()
    cfg_denoiser.p.setup_conds()
    cfg_denoiser.update_inner_model()

+    inference_memory = refiner.current_controlnet_required_memory
+    unet_patcher = refiner.forge_objects.unet
+    model_management.load_models_gpu(
+        [unet_patcher],
+        unet_patcher.memory_required([x.shape[0]] + list(x.shape[1:])) + inference_memory)
    return True


--- a/modules/sd_samplers_kdiffusion.py
+++ b/modules/sd_samplers_kdiffusion.py
@@ -7,6 +7,8 @@ from modules.script_callbacks import ExtraNoiseParams, extra_noise_callback

 from modules.shared import opts
 import modules.shared as shared
+import ldm_patched.modules.model_management
+

 samplers_k_diffusion = [
    ('DPM++ 2M Karras', 'sample_dpmpp_2m', ['k_dpmpp_2m_ka'], {'scheduler': 'karras'}),
@@ -139,11 +141,21 @@ class KDiffusionSampler(sd_samplers_common.Sampler):
        return sigmas

    def sample_img2img(self, p, x, noise, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
+        inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
+        unet_patcher = self.model_wrap.inner_model.forge_objects.unet
+        ldm_patched.modules.model_management.load_models_gpu(
+            [unet_patcher],
+            unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
+
+        self.model_wrap.log_sigmas = self.model_wrap.log_sigmas.to(unet_patcher.current_device)
+        self.model_wrap.sigmas = self.model_wrap.sigmas.to(unet_patcher.current_device)
+
        steps, t_enc = sd_samplers_common.setup_img2img_steps(p, steps)

        sigmas = self.get_sigmas(p, steps)
        sigma_sched = sigmas[steps - t_enc - 1:]

+        x = x.to(noise)
        xi = x + noise * sigma_sched[0]

        if opts.img2img_extra_noise > 0:
@@ -192,6 +204,15 @@ class KDiffusionSampler(sd_samplers_common.Sampler):
        return samples

    def sample(self, p, x, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
+        inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
+        unet_patcher = self.model_wrap.inner_model.forge_objects.unet
+        ldm_patched.modules.model_management.load_models_gpu(
+            [unet_patcher],
+            unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
+
+        self.model_wrap.log_sigmas = self.model_wrap.log_sigmas.to(unet_patcher.current_device)
+        self.model_wrap.sigmas = self.model_wrap.sigmas.to(unet_patcher.current_device)
+
        steps = steps or p.steps

        sigmas = self.get_sigmas(p, steps)
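
Review note on the memory estimate added to `sample` and `sample_img2img`: the shape handed to `memory_required` doubles the batch dimension. The sketch below reproduces only that shape arithmetic. `cfg_batch_shape` is a hypothetical name, and reading `* 2` as "one conditional plus one unconditional pass" is an assumption, not something this diff states.

```python
def cfg_batch_shape(latent_shape, copies=2):
    """Return the latent shape with the batch dimension multiplied by `copies`.

    Mirrors `[x.shape[0] * 2] + list(x.shape[1:])` from the diff.
    `copies=2` assumes one conditional and one unconditional evaluation.
    """
    batch, *rest = latent_shape
    return [batch * copies] + list(rest)


# Example input: a batch of 4 latents with 4 channels at 64x64.
print(cfg_batch_shape((4, 4, 64, 64)))
```

`apply_refiner` in `modules/sd_samplers_common.py` passes the shape without the doubling; whether that difference is intentional is not clear from this diff.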
--- a/modules/sd_samplers_lcm.py
+++ b/modules/sd_samplers_lcm.py
@@ -34,7 +34,7 @@ class LCMCompVisDenoiser(DiscreteEpsDDPMDenoiser):

    def sigma_to_t(self, sigma, quantize=None):
        log_sigma = sigma.log()
-        dists = log_sigma - self.log_sigmas[:, None]
+        dists = log_sigma - self.log_sigmas.to(sigma)[:, None]
        return dists.abs().argmin(dim=0).view(sigma.shape) * self.skip_steps + (self.skip_steps - 1)
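
Review note on `sigma_to_t`: the method picks the index of the entry in `log_sigmas` nearest to `log(sigma)` and maps it to a timestep using `skip_steps`. The added `.to(sigma)` only aligns device and dtype before the subtraction. A scalar, pure-Python sketch of the same lookup, with made-up sigma values and a hypothetical function name:

```python
import math


def nearest_sigma_timestep(sigma, sigmas, skip_steps):
    """Map one sigma to a timestep via the nearest entry in log space.

    Scalar stand-in for the tensor code in the diff; `sigmas` plays the
    role of exp(self.log_sigmas).
    """
    log_sigma = math.log(sigma)
    distances = [abs(log_sigma - math.log(s)) for s in sigmas]
    index = distances.index(min(distances))
    return index * skip_steps + (skip_steps - 1)


example_sigmas = [0.1, 1.0, 10.0]  # illustrative values only
print(nearest_sigma_timestep(0.9, example_sigmas, skip_steps=20))
```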


--- a/modules/sd_samplers_timesteps.py
+++ b/modules/sd_samplers_timesteps.py
@@ -7,6 +7,8 @@ from modules.script_callbacks import ExtraNoiseParams, extra_noise_callback

 from modules.shared import opts
 import modules.shared as shared
+import ldm_patched.modules.model_management
+

 samplers_timesteps = [
    ('DDIM', sd_samplers_timesteps_impl.ddim, ['ddim'], {}),
@@ -54,6 +56,7 @@ class CFGDenoiserTimesteps(CFGDenoiser):

    def get_pred_x0(self, x_in, x_out, sigma):
        ts = sigma.to(dtype=int)
+        self.alphas = self.alphas.to(ts.device)

        a_t = self.alphas[ts][:, None, None, None]
        sqrt_one_minus_at = (1 - a_t).sqrt()
@@ -95,6 +98,14 @@ class CompVisSampler(sd_samplers_common.Sampler):
        return timesteps

    def sample_img2img(self, p, x, noise, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
+        inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
+        unet_patcher = self.model_wrap.inner_model.forge_objects.unet
+        ldm_patched.modules.model_management.load_models_gpu(
+            [unet_patcher],
+            unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
+
+        self.model_wrap.inner_model.alphas_cumprod = self.model_wrap.inner_model.alphas_cumprod.to(unet_patcher.current_device)
+
        steps, t_enc = sd_samplers_common.setup_img2img_steps(p, steps)

        timesteps = self.get_timesteps(p, steps)
@@ -104,7 +115,7 @@ class CompVisSampler(sd_samplers_common.Sampler):
        sqrt_alpha_cumprod = torch.sqrt(alphas_cumprod[timesteps[t_enc]])
        sqrt_one_minus_alpha_cumprod = torch.sqrt(1 - alphas_cumprod[timesteps[t_enc]])

-        xi = x * sqrt_alpha_cumprod + noise * sqrt_one_minus_alpha_cumprod
+        xi = x.to(noise) * sqrt_alpha_cumprod + noise * sqrt_one_minus_alpha_cumprod

        if opts.img2img_extra_noise > 0:
            p.extra_generation_params["Extra noise"] = opts.img2img_extra_noise
@@ -138,6 +149,14 @@ class CompVisSampler(sd_samplers_common.Sampler):
        return samples

    def sample(self, p, x, conditioning, unconditional_conditioning, steps=None, image_conditioning=None):
+        inference_memory = self.model_wrap.inner_model.current_controlnet_required_memory
+        unet_patcher = self.model_wrap.inner_model.forge_objects.unet
+        ldm_patched.modules.model_management.load_models_gpu(
+            [unet_patcher],
+            unet_patcher.memory_required([x.shape[0] * 2] + list(x.shape[1:])) + inference_memory)
+
+        self.model_wrap.inner_model.alphas_cumprod = self.model_wrap.inner_model.alphas_cumprod.to(unet_patcher.current_device)
+
        steps = steps or p.steps
        timesteps = self.get_timesteps(p, steps)

--- a/modules/sd_vae.py
+++ b/modules/sd_vae.py
@@ -237,7 +237,6 @@ def load_vae(model, vae_file=None, vae_source="from unknown source"):
 # don't call this from outside
 def _load_vae_dict(model, vae_dict_1):
    model.first_stage_model.load_state_dict(vae_dict_1)
-    model.first_stage_model.to(devices.dtype_vae)


 def clear_loaded_vae():
@@ -263,20 +262,12 @@ def reload_vae_weights(sd_model=None, vae_file=unspecified):
    if loaded_vae_file == vae_file:
        return

-    if sd_model.lowvram:
-        lowvram.send_everything_to_cpu()
-    else:
-        sd_model.to(devices.cpu)
-
    sd_hijack.model_hijack.undo_hijack(sd_model)

    load_vae(sd_model, vae_file, vae_source)

    sd_hijack.model_hijack.hijack(sd_model)

-    if not sd_model.lowvram:
-        sd_model.to(devices.device)
-
    script_callbacks.model_loaded_callback(sd_model)

    print("VAE weights loaded.")
--- a/modules/shared_init.py
+++ b/modules/shared_init.py
@@ -24,13 +24,6 @@ def initialize():
        pass

    from modules import devices
-    devices.device, devices.device_interrogate, devices.device_gfpgan, devices.device_esrgan, devices.device_codeformer = \
-        (devices.cpu if any(y in cmd_opts.use_cpu for y in [x, 'all']) else devices.get_optimal_device() for x in ['sd', 'interrogate', 'gfpgan', 'esrgan', 'codeformer'])
-
-    devices.dtype = torch.float32 if cmd_opts.no_half else torch.float16
-    devices.dtype_vae = torch.float32 if cmd_opts.no_half or cmd_opts.no_half_vae else torch.float16
-    devices.dtype_inference = torch.float32 if cmd_opts.precision == 'full' else devices.dtype
-
    shared.device = devices.device
    shared.weight_load_location = None if cmd_opts.lowram else "cpu"

--- a/modules/shared_options.py
+++ b/modules/shared_options.py
@@ -299,7 +299,7 @@ options_templates.update(options_section(('ui_alternatives', "UI alternatives",

 options_templates.update(options_section(('ui', "User interface", "ui"), {
    "localization": OptionInfo("None", "Localization", gr.Dropdown, lambda: {"choices": ["None"] + list(localization.localizations.keys())}, refresh=lambda: localization.list_localizations(cmd_opts.localizations_dir)).needs_reload_ui(),
-    "quicksettings_list": OptionInfo(["sd_model_checkpoint"], "Quicksettings list", ui_components.DropdownMulti, lambda: {"choices": list(shared.opts.data_labels.keys())}).js("info", "settingsHintsShowQuicksettings").info("setting entries that appear at the top of page rather than in settings tab").needs_reload_ui(),
+    "quicksettings_list": OptionInfo(["sd_model_checkpoint", "sd_vae", "CLIP_stop_at_last_layers"], "Quicksettings list", ui_components.DropdownMulti, lambda: {"choices": list(shared.opts.data_labels.keys())}).js("info", "settingsHintsShowQuicksettings").info("setting entries that appear at the top of page rather than in settings tab").needs_reload_ui(),
    "ui_tab_order": OptionInfo([], "UI tab order", ui_components.DropdownMulti, lambda: {"choices": list(shared.tab_names)}).needs_reload_ui(),
    "hidden_tabs": OptionInfo([], "Hidden UI tabs", ui_components.DropdownMulti, lambda: {"choices": list(shared.tab_names)}).needs_reload_ui(),
    "ui_reorder_list": OptionInfo([], "UI item order for txt2img/img2img tabs", ui_components.DropdownMulti, lambda: {"choices": list(shared_items.ui_reorder_categories())}).info("selected items appear first").needs_reload_ui(),
--- a/modules/ui.py
+++ b/modules/ui.py
@@ -167,9 +167,15 @@ def update_token_counter(text, steps, *, is_positive=True):
        # messages related to it in console
        prompt_schedules = [[[steps, text]]]

+    try:
+        cond_stage_model = sd_models.model_data.sd_model.cond_stage_model
+        assert cond_stage_model is not None
+    except Exception:
+        return "<span class='gr-box gr-text-input'>?/?</span>"
+
    flat_prompts = reduce(lambda list1, list2: list1+list2, prompt_schedules)
    prompts = [prompt_text for step, prompt_text in flat_prompts]
-    token_count, max_length = max([model_hijack.get_prompt_lengths(prompt) for prompt in prompts], key=lambda args: args[0])
+    token_count, max_length = max([model_hijack.get_prompt_lengths(prompt, cond_stage_model) for prompt in prompts], key=lambda args: args[0])
    return f"<span class='gr-box gr-text-input'>{token_count}/{max_length}</span>"


--- a/modules/ui_settings.py
+++ b/modules/ui_settings.py
@@ -294,7 +294,6 @@ class UiSettings:

        for _i, k, _item in self.quicksettings_list:
            component = self.component_dict[k]
-            info = opts.data_labels[k]

            if isinstance(component, gr.Textbox):
                methods = [component.submit, component.blur]
@@ -308,7 +307,7 @@ class UiSettings:
                    fn=lambda value, k=k: self.run_settings_single(value, key=k),
                    inputs=[component],
                    outputs=[component, self.text_settings],
-                    show_progress=info.refresh is not None,
+                    show_progress=False,
                )

        button_set_checkpoint = gr.Button('Change checkpoint', elem_id='change_checkpoint', visible=False)
--- a/modules/upscaler.py
+++ b/modules/upscaler.py
@@ -6,6 +6,13 @@ from PIL import Image

 import modules.shared
 from modules import modelloader, shared
+from ldm_patched.modules import model_management
+
+
+def prepare_free_memory():
+    model_management.free_memory(memory_required=1024*1024*3, device=model_management.get_torch_device())
+    print('Upscale script freed memory successfully.')
+

 LANCZOS = (Image.Resampling.LANCZOS if hasattr(Image, 'Resampling') else Image.LANCZOS)
 NEAREST = (Image.Resampling.NEAREST if hasattr(Image, 'Resampling') else Image.NEAREST)
--- a/modules_forge/forge_clip.py
+++ b/modules_forge/forge_clip.py
@@ -0,0 +1,84 @@
+from modules.sd_hijack_clip import FrozenCLIPEmbedderWithCustomWords
+from ldm_patched.modules import model_management
+from modules.shared import opts
+
+
+class CLIP_SD_15_L(FrozenCLIPEmbedderWithCustomWords):
+    def encode_with_transformers(self, tokens):
+        model_management.load_model_gpu(self.forge_objects.clip.patcher)
+        outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=opts.CLIP_stop_at_last_layers > 1)
+
+        if opts.CLIP_stop_at_last_layers > 1:
+            z = outputs.hidden_states[-opts.CLIP_stop_at_last_layers]
+            z = self.wrapped.transformer.text_model.final_layer_norm(z)
+        else:
+            z = outputs.last_hidden_state
+
+        return z
+
+
+class CLIP_SD_21_H(FrozenCLIPEmbedderWithCustomWords):
+    def __init__(self, wrapped, hijack):
+        super().__init__(wrapped, hijack)
+
+        if self.wrapped.layer == "penultimate":
+            self.wrapped.layer = "hidden"
+            self.wrapped.layer_idx = -2
+
+        self.id_start = 49406
+        self.id_end = 49407
+        self.id_pad = 0
+
+    def encode_with_transformers(self, tokens):
+        model_management.load_model_gpu(self.forge_objects.clip.patcher)
+        outputs = self.wrapped.transformer(tokens, output_hidden_states=self.wrapped.layer == "hidden")
+
+        if self.wrapped.layer == "last":
+            z = outputs.last_hidden_state
+        else:
+            z = outputs.hidden_states[self.wrapped.layer_idx]
+            z = self.wrapped.transformer.text_model.final_layer_norm(z)
+
+        return z
+
+
+class CLIP_SD_XL_L(FrozenCLIPEmbedderWithCustomWords):
+    def __init__(self, wrapped, hijack):
+        super().__init__(wrapped, hijack)
+
+    def encode_with_transformers(self, tokens):
+        outputs = self.wrapped.transformer(tokens, output_hidden_states=self.wrapped.layer == "hidden")
+
+        if self.wrapped.layer == "last":
+            z = outputs.last_hidden_state
+        else:
+            z = outputs.hidden_states[self.wrapped.layer_idx]
+
+        return z
+
+
+class CLIP_SD_XL_G(FrozenCLIPEmbedderWithCustomWords):
+    def __init__(self, wrapped, hijack):
+        super().__init__(wrapped, hijack)
+
+        if self.wrapped.layer == "penultimate":
+            self.wrapped.layer = "hidden"
+            self.wrapped.layer_idx = -2
+
+        self.id_start = 49406
+        self.id_end = 49407
+        self.id_pad = 0
+
+    def encode_with_transformers(self, tokens):
+        outputs = self.wrapped.transformer(tokens, output_hidden_states=self.wrapped.layer == "hidden")
+
+        if self.wrapped.layer == "last":
+            z = outputs.last_hidden_state
+        else:
+            z = outputs.hidden_states[self.wrapped.layer_idx]
+
+        pooled_output = outputs.pooler_output
+        text_projection = self.wrapped.text_projection
+        pooled_output = pooled_output.float().to(text_projection.device) @ text_projection.float()
+        z.pooled = pooled_output
+        return z
--- a/modules_forge/forge_loader.py
+++ b/modules_forge/forge_loader.py
@@ -0,0 +1,248 @@
+import torch
+import contextlib
+
+from ldm_patched.modules import model_management
+from ldm_patched.modules import model_detection
+
+from ldm_patched.modules.sd import VAE, CLIP, load_model_weights
+import ldm_patched.modules.model_patcher
+import ldm_patched.modules.utils
+import ldm_patched.modules.clip_vision
+
+from omegaconf import OmegaConf
+from modules.sd_models_config import find_checkpoint_config
+from modules.shared import cmd_opts
+from modules import sd_hijack
+from modules.sd_models_xl import extend_sdxl
+from ldm.util import instantiate_from_config
+from modules_forge import forge_clip
+
+import open_clip
+from transformers import CLIPTextModel, CLIPTokenizer
+
+
+class FakeObject:
+    def __init__(self, *args, **kwargs):
+        super().__init__()
+        self.visual = None
+        return
+
+    def eval(self, *args, **kwargs):
+        return self
+
+    def parameters(self, *args, **kwargs):
+        return []
+
+
+class ForgeSD:
+    def __init__(self, unet, clip, vae, clipvision):
+        self.unet = unet
+        self.clip = clip
+        self.vae = vae
+        self.clipvision = clipvision
+
+    def shallow_copy(self):
+        return ForgeSD(self.unet, self.clip, self.vae, self.clipvision)
+
+
+@contextlib.contextmanager
+def no_clip():
+    backup_openclip = open_clip.create_model_and_transforms
+    backup_CLIPTextModel = CLIPTextModel.from_pretrained
+    backup_CLIPTokenizer = CLIPTokenizer.from_pretrained
+
+    try:
+        open_clip.create_model_and_transforms = lambda *args, **kwargs: (FakeObject(), None, None)
+        CLIPTextModel.from_pretrained = lambda *args, **kwargs: FakeObject()
+        CLIPTokenizer.from_pretrained = lambda *args, **kwargs: FakeObject()
+        yield
+
+    finally:
+        open_clip.create_model_and_transforms = backup_openclip
+        CLIPTextModel.from_pretrained = backup_CLIPTextModel
+        CLIPTokenizer.from_pretrained = backup_CLIPTokenizer
+    return
+
+
+def load_checkpoint_guess_config(sd, output_vae=True, output_clip=True, output_clipvision=False, embedding_directory=None, output_model=True):
+    sd_keys = sd.keys()
+    clip = None
+    clipvision = None
+    vae = None
+    model = None
+    model_patcher = None
+    clip_target = None
+
+    parameters = ldm_patched.modules.utils.calculate_parameters(sd, "model.diffusion_model.")
+    unet_dtype = model_management.unet_dtype(model_params=parameters)
+    load_device = model_management.get_torch_device()
+    manual_cast_dtype = model_management.unet_manual_cast(unet_dtype, load_device)
+
+    class WeightsLoader(torch.nn.Module):
+        pass
+
+    model_config = model_detection.model_config_from_unet(sd, "model.diffusion_model.", unet_dtype)
+    if model_config is None:
+        raise RuntimeError("ERROR: Could not detect model type")
+
+    model_config.set_manual_cast(manual_cast_dtype)
+
+    if model_config.clip_vision_prefix is not None:
+        if output_clipvision:
+            clipvision = ldm_patched.modules.clip_vision.load_clipvision_from_sd(sd, model_config.clip_vision_prefix, True)
+
+    if output_model:
+        inital_load_device = model_management.unet_inital_load_device(parameters, unet_dtype)
+        offload_device = model_management.unet_offload_device()
+        model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
+        model.load_model_weights(sd, "model.diffusion_model.")
+
+    if output_vae:
+        vae_sd = ldm_patched.modules.utils.state_dict_prefix_replace(sd, {"first_stage_model.": ""}, filter_keys=True)
+        vae_sd = model_config.process_vae_state_dict(vae_sd)
+        vae = VAE(sd=vae_sd)
+
+    if output_clip:
+        w = WeightsLoader()
+        clip_target = model_config.clip_target()
+        if clip_target is not None:
+            clip = CLIP(clip_target, embedding_directory=embedding_directory)
+            w.cond_stage_model = clip.cond_stage_model
+            sd = model_config.process_clip_state_dict(sd)
+            load_model_weights(w, sd)
+
+    left_over = sd.keys()
+    if len(left_over) > 0:
+        print("left over keys:", left_over)
+
+    if output_model:
+        model_patcher = ldm_patched.modules.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device(), current_device=inital_load_device)
+        if inital_load_device != torch.device("cpu"):
+            print("loaded straight to GPU")
+            model_management.load_model_gpu(model_patcher)
+
+    return ForgeSD(model_patcher, clip, vae, clipvision)
+
+
+def load_model_for_a1111(timer, checkpoint_info=None, state_dict=None):
+    a1111_config_filename = find_checkpoint_config(state_dict, checkpoint_info)
+    a1111_config = OmegaConf.load(a1111_config_filename)
+    timer.record("forge solving config")
+
+    if hasattr(a1111_config.model.params, 'network_config'):
+        a1111_config.model.params.network_config.target = 'modules_forge.forge_loader.FakeObject'
+
+    if hasattr(a1111_config.model.params, 'unet_config'):
+        a1111_config.model.params.unet_config.target = 'modules_forge.forge_loader.FakeObject'
+
+    if hasattr(a1111_config.model.params, 'first_stage_config'):
+        a1111_config.model.params.first_stage_config.target = 'modules_forge.forge_loader.FakeObject'
+
+    with no_clip():
+        sd_model = instantiate_from_config(a1111_config.model)
+
+    timer.record("forge instantiate config")
+
+    forge_objects = load_checkpoint_guess_config(
+        state_dict,
+        output_vae=True,
+        output_clip=True,
+        output_clipvision=True,
+        embedding_directory=cmd_opts.embeddings_dir,
+        output_model=True
+    )
+    sd_model.forge_objects = forge_objects
+    sd_model.forge_objects_original = forge_objects.shallow_copy()
+    sd_model.forge_objects_after_applying_lora = forge_objects.shallow_copy()
+    timer.record("forge load real models")
+
+    sd_model.first_stage_model = forge_objects.vae.first_stage_model
+    sd_model.model.diffusion_model = forge_objects.unet.model.diffusion_model
+
+    conditioner = getattr(sd_model, 'conditioner', None)
+    if conditioner:
+        text_cond_models = []
+
+        for i in range(len(conditioner.embedders)):
+            embedder = conditioner.embedders[i]
+            typename = type(embedder).__name__
+            if typename == 'FrozenCLIPEmbedder':  # SDXL Clip L
+                embedder.tokenizer = forge_objects.clip.tokenizer.clip_l.tokenizer
+                embedder.transformer = forge_objects.clip.cond_stage_model.clip_l.transformer
+                model_embeddings = embedder.transformer.text_model.embeddings
+                model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
+                    model_embeddings.token_embedding, sd_hijack.model_hijack)
+                embedder = forge_clip.CLIP_SD_XL_L(embedder, sd_hijack.model_hijack)
+                embedder.forge_objects = forge_objects
+                conditioner.embedders[i] = embedder
+                text_cond_models.append(embedder)
+            elif typename == 'FrozenOpenCLIPEmbedder2':  # SDXL Clip G
+                embedder.tokenizer = forge_objects.clip.tokenizer.clip_g.tokenizer
+                embedder.transformer = forge_objects.clip.cond_stage_model.clip_g.transformer
+                embedder.text_projection = forge_objects.clip.cond_stage_model.clip_g.text_projection
+                model_embeddings = embedder.transformer.text_model.embeddings
+                model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
+                    model_embeddings.token_embedding, sd_hijack.model_hijack, textual_inversion_key='clip_g')
+                embedder = forge_clip.CLIP_SD_XL_G(embedder, sd_hijack.model_hijack)
+                embedder.forge_objects = forge_objects
+                conditioner.embedders[i] = embedder
+                text_cond_models.append(embedder)
+
+        if len(text_cond_models) == 1:
+            sd_model.cond_stage_model = text_cond_models[0]
+        else:
+            sd_model.cond_stage_model = conditioner
+    elif type(sd_model.cond_stage_model).__name__ == 'FrozenCLIPEmbedder':  # SD15 Clip
+        sd_model.cond_stage_model.tokenizer = forge_objects.clip.tokenizer.clip_l.tokenizer
+        sd_model.cond_stage_model.transformer = forge_objects.clip.cond_stage_model.clip_l.transformer
+        model_embeddings = sd_model.cond_stage_model.transformer.text_model.embeddings
+        model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
+            model_embeddings.token_embedding, sd_hijack.model_hijack)
+        sd_model.cond_stage_model = forge_clip.CLIP_SD_15_L(sd_model.cond_stage_model, sd_hijack.model_hijack)
+        sd_model.cond_stage_model.forge_objects = forge_objects
+    elif type(sd_model.cond_stage_model).__name__ == 'FrozenOpenCLIPEmbedder':  # SD21 Clip
+        sd_model.cond_stage_model.tokenizer = forge_objects.clip.tokenizer.clip_h.tokenizer
+        sd_model.cond_stage_model.transformer = forge_objects.clip.cond_stage_model.clip_h.transformer
+        model_embeddings = sd_model.cond_stage_model.transformer.text_model.embeddings
+        model_embeddings.token_embedding = sd_hijack.EmbeddingsWithFixes(
+            model_embeddings.token_embedding, sd_hijack.model_hijack)
+        sd_model.cond_stage_model = forge_clip.CLIP_SD_21_H(sd_model.cond_stage_model, sd_hijack.model_hijack)
+        sd_model.cond_stage_model.forge_objects = forge_objects
+    else:
+        raise NotImplementedError('Bad Clip Class Name: ' + type(sd_model.cond_stage_model).__name__)
+
+    timer.record("forge set components")
+
+    sd_model_hash = checkpoint_info.calculate_shorthash()
+    timer.record("calculate hash")
+
+    sd_model.is_sdxl = conditioner is not None
+    sd_model.is_sd2 = not sd_model.is_sdxl and hasattr(sd_model.cond_stage_model, 'model')
+    sd_model.is_sd1 = not sd_model.is_sdxl and not sd_model.is_sd2
+    sd_model.is_ssd = sd_model.is_sdxl and 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn1.to_q.weight' not in sd_model.state_dict().keys()
+    if sd_model.is_sdxl:
+        extend_sdxl(sd_model)
+    sd_model.sd_model_hash = sd_model_hash
+    sd_model.sd_model_checkpoint = checkpoint_info.filename
+    sd_model.sd_checkpoint_info = checkpoint_info
+
+    def patched_decode_first_stage(x):
+        sample = forge_objects.unet.model.model_config.latent_format.process_out(x)
+        sample = forge_objects.vae.decode(sample).movedim(-1, 1) * 2.0 - 1.0
+        return sample.to(x)
+
+    def patched_encode_first_stage(x):
+        sample = forge_objects.vae.encode(x.movedim(1, -1) * 0.5 + 0.5)
+        sample = forge_objects.unet.model.model_config.latent_format.process_in(sample)
+        return sample.to(x)
+
+    sd_model.ema_scope = lambda *args, **kwargs: contextlib.nullcontext()
+    sd_model.get_first_stage_encoding = lambda x: x
+    sd_model.decode_first_stage = patched_decode_first_stage
+    sd_model.encode_first_stage = patched_encode_first_stage
+    sd_model.clip = sd_model.cond_stage_model
+    sd_model.current_controlnet_required_memory = 0
+    timer.record("forge finalize")
+
+    sd_model.current_lora_hash = str([])
+    return sd_model
--- a/modules_forge/forge_sampler.py
+++ b/modules_forge/forge_sampler.py
@@ -0,0 +1,49 @@
+import torch
+from ldm_patched.modules.conds import CONDRegular, CONDCrossAttn
+from ldm_patched.modules.samplers import sampling_function
+
+
+def cond_from_a1111_to_patched_ldm(cond):
+    if isinstance(cond, torch.Tensor):
+        result = dict(
+            cross_attn=cond,
+            model_conds=dict(
+                c_crossattn=CONDCrossAttn(cond),
+            )
+        )
+        return [result, ]
+
+    cross_attn = cond['crossattn']
+    pooled_output = cond['vector']
+
+    result = dict(
+        cross_attn=cross_attn,
+        pooled_output=pooled_output,
+        model_conds=dict(
+            c_crossattn=CONDCrossAttn(cross_attn),
+            y=CONDRegular(pooled_output)
+        )
+    )
+
+    return [result, ]
+
+
+def forge_sample(self, denoiser_params, cond_scale):
+    model = self.inner_model.inner_model.forge_objects.unet.model
+    x = denoiser_params.x
+    timestep = denoiser_params.sigma
+    uncond = cond_from_a1111_to_patched_ldm(denoiser_params.text_uncond)
+    cond = cond_from_a1111_to_patched_ldm(denoiser_params.text_cond)
+    model_options = self.inner_model.inner_model.forge_objects.unet.model_options
+    seed = self.p.seeds[0]
+
+    image_cond_in = denoiser_params.image_cond
+    if isinstance(image_cond_in, torch.Tensor):
+        if image_cond_in.shape[0] == x.shape[0] \
+                and image_cond_in.shape[2] == x.shape[2] \
+                and image_cond_in.shape[3] == x.shape[3]:
+            uncond[0]['model_conds']['c_concat'] = CONDRegular(image_cond_in)
+            cond[0]['model_conds']['c_concat'] = CONDRegular(image_cond_in)
+
+    denoised = sampling_function(model, x, timestep, uncond, cond, cond_scale, model_options, seed)
+    return denoised
--- a/modules_forge/forge_util.py
+++ b/modules_forge/forge_util.py
@@ -0,0 +1,70 @@
+import torch
+import numpy as np
+import os
+import time
+import random
+import string
+
+
+def generate_random_filename(extension=".txt"):
+    timestamp = time.strftime("%Y%m%d-%H%M%S")
+    random_string = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
+    filename = f"{timestamp}-{random_string}{extension}"
+    return filename
+
+
+@torch.no_grad()
+@torch.inference_mode()
+def pytorch_to_numpy(x):
+    return [np.clip(255. * y.cpu().numpy(), 0, 255).astype(np.uint8) for y in x]
+
+
+@torch.no_grad()
+@torch.inference_mode()
+def numpy_to_pytorch(x):
+    y = x.astype(np.float32) / 255.0
+    y = y[None]
+    y = np.ascontiguousarray(y.copy())
+    y = torch.from_numpy(y).float()
+    return y
+
+
+def write_images_to_mp4(frame_list: list, filename=None, fps=6):
+    from modules.paths_internal import default_output_dir
+
+    video_folder = os.path.join(default_output_dir, 'svd')
+    os.makedirs(video_folder, exist_ok=True)
+
+    if filename is None:
+        filename = generate_random_filename('.mp4')
+
+    full_path = os.path.join(video_folder, filename)
+
+    try:
+        import av
+    except ImportError:
+        from launch import run_pip
+        run_pip(
+            "install imageio[pyav]",
+            "imageio[pyav]",
+        )
+        import av
+
+    options = {
+        "crf": str(23)
+    }
+
+    output = av.open(full_path, "w")
+
+    stream = output.add_stream('libx264', fps, options=options)
+    stream.width = frame_list[0].shape[1]
+    stream.height = frame_list[0].shape[0]
+    for img in frame_list:
+        frame = av.VideoFrame.from_ndarray(img)
+        packet = stream.encode(frame)
+        output.mux(packet)
+    packet = stream.encode(None)
+    output.mux(packet)
+    output.close()
+
+    return full_path
--- a/modules_forge/forge_version.py
+++ b/modules_forge/forge_version.py
@@ -0,0 +1 @@
+version = '0.0.1'
--- a/modules_forge/gradio_compile.py
+++ b/modules_forge/gradio_compile.py
@@ -0,0 +1,54 @@
+
+def gradio_compile(items, prefix):
+    names = []
+    for k, v in items["required"].items():
+        t = v[0]
+        d = v[1] if len(v) > 1 else None
+        if prefix != '':
+            name = (prefix + '_' + k).replace(' ', '_').lower()
+        else:
+            name = k.replace(' ', '_').lower()
+
+        title = name.replace('_', ' ').title()
+
+        if t == 'INT':
+            default = int(d['default'])
+            minimum = int(d['min'])
+            maximum = int(d['max'])
+            step = int(d.get('step', 1))
+            print(f'{name} = gr.Slider(label=\'{title}\', minimum={minimum}, maximum={maximum}, step={step}, value={default})')
+            names.append(name)
+        elif t == 'FLOAT':
+            default = float(d['default'])
+            minimum = float(d['min'])
+            maximum = float(d['max'])
+            step = float(d.get('step', 0.001))
+            print(f'{name} = gr.Slider(label=\'{title}\', minimum={minimum}, maximum={maximum}, step={step}, value={default})')
+            names.append(name)
+        elif isinstance(t, list):
+            print(f'{name} = gr.Radio(label=\'{title}\', choices={str(t)}, value=\'{t[0]}\')')
+            names.append(name)
+        elif t == 'MODEL':
+            pass
+        elif t == 'CONDITIONING':
+            pass
+        elif t == 'LATENT':
+            pass
+        elif t == 'CLIP_VISION':
+            pass
+        elif t == 'IMAGE':
+            pass
+        elif t == 'VAE':
+            pass
+        else:
+            print('error ' + str(t))
+
+    return names
+
+
+# from modules_forge.gradio_compile import gradio_compile
+# ps = []
+# ps += gradio_compile(SVD_img2vid_Conditioning.INPUT_TYPES(), prefix='')
+# ps += gradio_compile(KSampler.INPUT_TYPES(), prefix='sampling')
+# ps += gradio_compile(VideoLinearCFGGuidance.INPUT_TYPES(), prefix='guidance')
+# print(', '.join(ps))
--- a/modules_forge/initialization.py
+++ b/modules_forge/initialization.py
@@ -0,0 +1,24 @@
+
+def initialize_forge():
+    from ldm_patched.modules import args_parser
+
+    args_parser.args, _ = args_parser.parser.parse_known_args()
+
+    import ldm_patched.modules.model_management as model_management
+    import torch
+
+    device = model_management.get_torch_device()
+    torch.zeros((1, 1)).to(device, torch.float32)
+    model_management.soft_empty_cache()
+
+    import modules_forge.patch_clip
+    modules_forge.patch_clip.patch_all_clip()
+
+    import modules_forge.patch_precision
+    modules_forge.patch_precision.patch_all_precision()
+
+    if model_management.directml_enabled:
+        model_management.lowvram_available = True
+        model_management.OOM_EXCEPTION = Exception
+
+    return
--- a/modules_forge/ops.py
+++ b/modules_forge/ops.py
@@ -0,0 +1,19 @@
+import torch
+import contextlib
+
+
+@contextlib.contextmanager
+def use_patched_ops(operations):
+    op_names = ['Linear', 'Conv2d', 'Conv3d', 'GroupNorm', 'LayerNorm']
+    backups = {op_name: getattr(torch.nn, op_name) for op_name in op_names}
+
+    try:
+        for op_name in op_names:
+            setattr(torch.nn, op_name, getattr(operations, op_name))
+
+        yield
+
+    finally:
+        for op_name in op_names:
+            setattr(torch.nn, op_name, backups[op_name])
+    return
--- a/modules_forge/patch_clip.py
+++ b/modules_forge/patch_clip.py
@@ -0,0 +1,112 @@
+# Consistent with Kohya/A1111 to reduce differences between model training and inference.
+
+import os
+import torch
+import ldm_patched.controlnet.cldm
+import ldm_patched.k_diffusion.sampling
+import ldm_patched.ldm.modules.attention
+import ldm_patched.ldm.modules.diffusionmodules.model
+import ldm_patched.ldm.modules.diffusionmodules.openaimodel
+import ldm_patched.ldm.modules.diffusionmodules.openaimodel
+import ldm_patched.modules.args_parser
+import ldm_patched.modules.model_base
+import ldm_patched.modules.model_management
+import ldm_patched.modules.model_patcher
+import ldm_patched.modules.samplers
+import ldm_patched.modules.sd
+import ldm_patched.modules.sd1_clip
+import ldm_patched.modules.clip_vision
+import ldm_patched.modules.ops as ops
+
+from modules_forge.ops import use_patched_ops
+from transformers import CLIPTextModel, CLIPTextConfig, modeling_utils
+
+
+def patched_SDClipModel__init__(self, max_length=77, freeze=True, layer="last", layer_idx=None,
+                                textmodel_json_config=None, dtype=None, special_tokens=None,
+                                layer_norm_hidden_state=True, **kwargs):
+    torch.nn.Module.__init__(self)
+    assert layer in self.LAYERS
+
+    if special_tokens is None:
+        special_tokens = {"start": 49406, "end": 49407, "pad": 49407}
+
+    if textmodel_json_config is None:
+        textmodel_json_config = os.path.join(os.path.dirname(os.path.realpath(ldm_patched.modules.sd1_clip.__file__)),
+                                             "sd1_clip_config.json")
+
+    config = CLIPTextConfig.from_json_file(textmodel_json_config)
+    self.num_layers = config.num_hidden_layers
+
+    with use_patched_ops(ops.manual_cast):
+        with modeling_utils.no_init_weights():
+            self.transformer = CLIPTextModel(config)
+
+    if dtype is not None:
+        self.transformer.to(dtype)
+
+    self.transformer.text_model.embeddings.to(torch.float32)
+
+    if freeze:
+        self.freeze()
+
+    self.max_length = max_length
+    self.layer = layer
+    self.layer_idx = None
+    self.special_tokens = special_tokens
+    self.text_projection = torch.nn.Parameter(torch.eye(self.transformer.get_input_embeddings().weight.shape[1]))
+    self.logit_scale = torch.nn.Parameter(torch.tensor(4.6055))
+    self.enable_attention_masks = False
+
+    self.layer_norm_hidden_state = layer_norm_hidden_state
+    if layer == "hidden":
+        assert layer_idx is not None
+        assert abs(layer_idx) < self.num_layers
+        self.clip_layer(layer_idx)
+    self.layer_default = (self.layer, self.layer_idx)
+
+
+def patched_SDClipModel_forward(self, tokens):
+    backup_embeds = self.transformer.get_input_embeddings()
+    device = backup_embeds.weight.device
+    tokens = self.set_up_textual_embeddings(tokens, backup_embeds)
+    tokens = torch.LongTensor(tokens).to(device)
+
+    attention_mask = None
+    if self.enable_attention_masks:
+        attention_mask = torch.zeros_like(tokens)
+        max_token = self.transformer.get_input_embeddings().weight.shape[0] - 1
+        for x in range(attention_mask.shape[0]):
+            for y in range(attention_mask.shape[1]):
+                attention_mask[x, y] = 1
+                if tokens[x, y] == max_token:
+                    break
+
+    outputs = self.transformer(input_ids=tokens, attention_mask=attention_mask,
+                               output_hidden_states=self.layer == "hidden")
+    self.transformer.set_input_embeddings(backup_embeds)
+
+    if self.layer == "last":
+        z = outputs.last_hidden_state
+    elif self.layer == "pooled":
+        z = outputs.pooler_output[:, None, :]
+    else:
+        z = outputs.hidden_states[self.layer_idx]
+        if self.layer_norm_hidden_state:
+            z = self.transformer.text_model.final_layer_norm(z)
+
+    if hasattr(outputs, "pooler_output"):
+        pooled_output = outputs.pooler_output.float()
+    else:
+        pooled_output = None
+
+    if self.text_projection is not None and pooled_output is not None:
+        pooled_output = pooled_output.float().to(self.text_projection.device) @ self.text_projection.float()
+
+    return z.float(), pooled_output
+
+
+def patch_all_clip():
+    ldm_patched.modules.sd1_clip.SDClipModel.__init__ = patched_SDClipModel__init__
+    ldm_patched.modules.sd1_clip.SDClipModel.forward = patched_SDClipModel_forward
+    return
--- a/modules_forge/patch_precision.py
+++ b/modules_forge/patch_precision.py
@@ -0,0 +1,60 @@
+# Consistent with Kohya to reduce differences between model training and inference.
+
+import torch
+import math
+import einops
+import numpy as np
+
+import ldm_patched.ldm.modules.diffusionmodules.openaimodel
+import ldm_patched.modules.model_sampling
+import ldm_patched.modules.sd1_clip
+
+from ldm_patched.ldm.modules.diffusionmodules.util import make_beta_schedule
+
+
+def patched_timestep_embedding(timesteps, dim, max_period=10000, repeat_only=False):
+    # Consistent with Kohya to reduce differences between model training and inference.
+
+    if not repeat_only:
+        half = dim // 2
+        freqs = torch.exp(
+            -math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half
+        ).to(device=timesteps.device)
+        args = timesteps[:, None].float() * freqs[None]
+        embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
+        if dim % 2:
+            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
+    else:
+        embedding = einops.repeat(timesteps, 'b -> b d', d=dim)
+    return embedding
+
+
+def patched_register_schedule(self, given_betas=None, beta_schedule="linear", timesteps=1000,
+                              linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):
+    # Consistent with Kohya to reduce differences between model training and inference.
+
+    if given_betas is not None:
+        betas = given_betas
+    else:
+        betas = make_beta_schedule(
+            beta_schedule,
+            timesteps,
+            linear_start=linear_start,
+            linear_end=linear_end,
+            cosine_s=cosine_s)
+
+    alphas = 1. - betas
+    alphas_cumprod = np.cumprod(alphas, axis=0)
+    timesteps, = betas.shape
+    self.num_timesteps = int(timesteps)
+    self.linear_start = linear_start
+    self.linear_end = linear_end
+    sigmas = torch.tensor(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5, dtype=torch.float32)
+    self.set_sigmas(sigmas)
+    return
+
+
+def patch_all_precision():
+    ldm_patched.ldm.modules.diffusionmodules.openaimodel.timestep_embedding = patched_timestep_embedding
+    ldm_patched.modules.model_sampling.ModelSamplingDiscrete._register_schedule = patched_register_schedule
+    return
--- a/webui-user.bat
+++ b/webui-user.bat
@@ -5,4 +5,14 @@ set GIT=
 set VENV_DIR=
 set COMMANDLINE_ARGS=

+@REM Uncomment the following lines to reference an existing A1111 checkout.
+@REM set A1111_HOME=Your A1111 checkout dir
+@REM
+@REM set VENV_DIR=%A1111_HOME%/venv
+@REM set COMMANDLINE_ARGS=%COMMANDLINE_ARGS%^
+@REM  --ckpt-dir %A1111_HOME%/models/Stable-diffusion^
+@REM  --hypernetwork-dir %A1111_HOME%/models/hypernetworks^
+@REM  --embeddings-dir %A1111_HOME%/models/embeddings^
+@REM  --lora-dir %A1111_HOME%/models/Lora
+
 call webui.bat