ComfyUI version 0.3.52

Update frontend to v1.25.10 and revert navigation mode override (#9522)
- Update comfyui-frontend-package from 1.25.9 to 1.25.10
- Revert forced legacy navigation mode from PR #9518
- Frontend v1.25.10 includes proper navigation mode fixes and improved display text
2026-02-13 11:40:02 +00:00 · 2025-08-23 18:57:09 -04:00 · 2025-08-23 17:54:01 -04:00 · 2025-08-23 13:56:17 -04:00 · 2025-08-23 01:36:44 -04:00 · 2025-08-22 23:15:44 -04:00
140 changed files with 13742 additions and 2957 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,2 +1,3 @@
 /web/assets/** linguist-generated
 /web/** linguist-vendored
+comfy_api_nodes/apis/__init__.py linguist-generated
--- a/.github/ISSUE_TEMPLATE/bug-report.yml
+++ b/.github/ISSUE_TEMPLATE/bug-report.yml
@@ -22,7 +22,7 @@ body:
      description: Please confirm you have tried to reproduce the issue with all custom nodes disabled.
      options:
        - label: I have tried disabling custom nodes and the issue persists (see [how to disable custom nodes](https://docs.comfy.org/troubleshooting/custom-node-issues#step-1%3A-test-with-all-custom-nodes-disabled) if you need help)
-          required: true
+          required: false
  - type: textarea
    attributes:
      label: Expected Behavior
--- a/.github/ISSUE_TEMPLATE/user-support.yml
+++ b/.github/ISSUE_TEMPLATE/user-support.yml
@@ -18,7 +18,7 @@ body:
        description: Please confirm you have tried to reproduce the issue with all custom nodes disabled.
        options:
          - label: I have tried disabling custom nodes and the issue persists (see [how to disable custom nodes](https://docs.comfy.org/troubleshooting/custom-node-issues#step-1%3A-test-with-all-custom-nodes-disabled) if you need help)
-            required: true
+            required: false
    - type: textarea
      attributes:
            label: Your question
--- a/.github/workflows/stable-release.yml
+++ b/.github/workflows/stable-release.yml
@@ -12,17 +12,17 @@ on:
        description: 'CUDA version'
        required: true
        type: string
-        default: "128"
+        default: "129"
      python_minor:
        description: 'Python minor version'
        required: true
        type: string
-        default: "12"
+        default: "13"
      python_patch:
        description: 'Python patch version'
        required: true
        type: string
-        default: "10"
+        default: "6"


 jobs:
@@ -66,8 +66,13 @@ jobs:
          curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
          ./python.exe get-pip.py
          ./python.exe -s -m pip install ../cu${{ inputs.cu }}_python_deps/*
-            sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
-            cd ..
+          sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
+
+          rm ./Lib/site-packages/torch/lib/dnnl.lib #I don't think this is actually used and I need the space
+          rm ./Lib/site-packages/torch/lib/libprotoc.lib
+          rm ./Lib/site-packages/torch/lib/libprotobuf.lib
+
+          cd ..

          git clone --depth 1 https://github.com/comfyanonymous/taesd
          cp taesd/*.safetensors ./ComfyUI_copy/models/vae_approx/
@@ -85,7 +90,7 @@ jobs:

          cd ..

-          "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=512m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
+          "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=768m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
          mv ComfyUI_windows_portable.7z ComfyUI/ComfyUI_windows_portable_nvidia.7z

          cd ComfyUI_windows_portable
--- a/.github/workflows/windows_release_dependencies.yml
+++ b/.github/workflows/windows_release_dependencies.yml
@@ -17,19 +17,19 @@ on:
        description: 'cuda version'
        required: true
        type: string
-        default: "128"
+        default: "129"

      python_minor:
        description: 'python minor version'
        required: true
        type: string
-        default: "12"
+        default: "13"

      python_patch:
        description: 'python patch version'
        required: true
        type: string
-        default: "10"
+        default: "6"
 #  push:
 #    branches:
 #      - master
--- a/.github/workflows/windows_release_package.yml
+++ b/.github/workflows/windows_release_package.yml
@@ -7,19 +7,19 @@ on:
        description: 'cuda version'
        required: true
        type: string
-        default: "128"
+        default: "129"

      python_minor:
        description: 'python minor version'
        required: true
        type: string
-        default: "12"
+        default: "13"

      python_patch:
        description: 'python patch version'
        required: true
        type: string
-        default: "10"
+        default: "6"
 #  push:
 #    branches:
 #      - master
@@ -64,6 +64,10 @@ jobs:
            ./python.exe get-pip.py
            ./python.exe -s -m pip install ../cu${{ inputs.cu }}_python_deps/*
            sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
+
+            rm ./Lib/site-packages/torch/lib/dnnl.lib #I don't think this is actually used and I need the space
+            rm ./Lib/site-packages/torch/lib/libprotoc.lib
+            rm ./Lib/site-packages/torch/lib/libprotobuf.lib
            cd ..

            git clone --depth 1 https://github.com/comfyanonymous/taesd
@@ -82,7 +86,7 @@ jobs:

            cd ..

-            "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=512m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
+            "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=768m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
            mv ComfyUI_windows_portable.7z ComfyUI/new_ComfyUI_windows_portable_nvidia_cu${{ inputs.cu }}_or_cpu.7z

            cd ComfyUI_windows_portable
--- a/27
+++ b/27
@@ -5,20 +5,21 @@
 # Inlined the team members for now.

 # Maintainers
-*.md @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/tests/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/tests-unit/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/notebooks/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/script_examples/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/.github/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/requirements.txt @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/pyproject.toml @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
+*.md @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/tests/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/tests-unit/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/notebooks/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/script_examples/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/.github/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/requirements.txt @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/pyproject.toml @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill

 # Python web server
-/api_server/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne
-/app/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne
-/utils/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne
+/api_server/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne @guill
+/app/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne @guill
+/utils/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne @guill

 # Node developers
-/comfy_extras/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne
-/comfy/comfy_types/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne
+/comfy_extras/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne @guill
+/comfy/comfy_types/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne @guill
+/comfy_api_nodes/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne @guill
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ ComfyUI lets you design and execute advanced stable diffusion pipelines using a
 ## Get Started

 #### [Desktop Application](https://www.comfy.org/download)
- The easiest way to get started. 
+- The easiest way to get started.
 - Available on Windows & macOS.

 #### [Windows Portable Package](#installing)
@@ -66,10 +66,12 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
   - [Lumina Image 2.0](https://comfyanonymous.github.io/ComfyUI_examples/lumina2/)
   - [HiDream](https://comfyanonymous.github.io/ComfyUI_examples/hidream/)
   - [Cosmos Predict2](https://comfyanonymous.github.io/ComfyUI_examples/cosmos_predict2/)
+   - [Qwen Image](https://comfyanonymous.github.io/ComfyUI_examples/qwen_image/)
 - Image Editing Models
   - [Omnigen 2](https://comfyanonymous.github.io/ComfyUI_examples/omnigen/)
   - [Flux Kontext](https://comfyanonymous.github.io/ComfyUI_examples/flux/#flux-kontext-image-editing-model)
   - [HiDream E1.1](https://comfyanonymous.github.io/ComfyUI_examples/hidream/#hidream-e11)
+   - [Qwen Image Edit](https://comfyanonymous.github.io/ComfyUI_examples/qwen_image/#edit-model)
 - Video Models
   - [Stable Video Diffusion](https://comfyanonymous.github.io/ComfyUI_examples/video/)
   - [Mochi](https://comfyanonymous.github.io/ComfyUI_examples/mochi/)
@@ -111,7 +113,7 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git

 ## Release Process

-ComfyUI follows a weekly release cycle every Friday, with three interconnected repositories:
+ComfyUI follows a weekly release cycle targeting Friday but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:

 1. **[ComfyUI Core](https://github.com/comfyanonymous/ComfyUI)**
   - Releases a new stable version (e.g., v0.7.0)
@@ -190,7 +192,7 @@ comfy install

 ## Manual Install (Windows, Linux)

-python 3.13 is supported but using 3.12 is recommended because some custom nodes and their dependencies might not support it yet.
+Python 3.13 is very well supported. If you have trouble with some custom node dependencies you can try 3.12

 Git clone this repo.

@@ -202,7 +204,7 @@ Put your VAE in: models/vae
 ### AMD GPUs (Linux only)
 AMD users can install rocm and pytorch with pip if you don't have it already installed, this is the command to install the stable version:

-```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3```
+```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4```

 This is the command to install the nightly with ROCm 6.4 which might have some performance improvements:

@@ -210,33 +212,25 @@ This is the command to install the nightly with ROCm 6.4 which might have some p

 ### Intel GPUs (Windows and Linux)

-(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip (currently available in PyTorch nightly builds). More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
-  
-1. To install PyTorch nightly, use the following command:
+(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
+
+1. To install PyTorch xpu, use the following command:
+
+```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu```
+
+This is the command to install the Pytorch xpu nightly which might have some performance improvements:

 ```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu```

-2. Launch ComfyUI by running `python main.py`
-
-
 (Option 2) Alternatively, Intel GPUs supported by Intel Extension for PyTorch (IPEX) can leverage IPEX for improved performance.

-1. For Intel® Arc™ A-Series Graphics utilizing IPEX, create a conda environment and use the commands below:
-
-```
-conda install libuv
-pip install torch==2.3.1.post0+cxx11.abi torchvision==0.18.1.post0+cxx11.abi torchaudio==2.3.1.post0+cxx11.abi intel-extension-for-pytorch==2.3.110.post0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
-```
-
-For other supported Intel GPUs with IPEX, visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.
-
-Additional discussion and help can be found [here](https://github.com/comfyanonymous/ComfyUI/discussions/476).
+1. visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.

 ### NVIDIA

 Nvidia users should install stable pytorch using this command:

-```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128```
+```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu129```

 This is the command to install pytorch nightly instead which might have performance improvements.

@@ -351,7 +345,7 @@ Generate a self-signed certificate (not appropriate for shared/production use) a

 Use `--tls-keyfile key.pem --tls-certfile cert.pem` to enable TLS/SSL, the app will now be accessible with `https://...` instead of `http://...`.

-> Note: Windows users can use [alexisrolland/docker-openssl](https://github.com/alexisrolland/docker-openssl) or one of the [3rd party binary distributions](https://wiki.openssl.org/index.php/Binaries) to run the command example above. 
+> Note: Windows users can use [alexisrolland/docker-openssl](https://github.com/alexisrolland/docker-openssl) or one of the [3rd party binary distributions](https://wiki.openssl.org/index.php/Binaries) to run the command example above.
 <br/><br/>If you use a container, note that the volume mount `-v` can be a relative path so `... -v ".\:/openssl-certs" ...` would create the key & cert files in the current directory of your command prompt or powershell terminal.

 ## Support and dev channel
--- a/app/model_manager.py
+++ b/app/model_manager.py
@@ -130,10 +130,21 @@ class ModelFileManager:

            for file_name in filenames:
                try:
-                    relative_path = os.path.relpath(os.path.join(dirpath, file_name), directory)
-                    result.append(relative_path)
-                except:
-                    logging.warning(f"Warning: Unable to access {file_name}. Skipping this file.")
+                    full_path = os.path.join(dirpath, file_name)
+                    relative_path = os.path.relpath(full_path, directory)
+
+                    # Get file metadata
+                    file_info = {
+                        "name": relative_path,
+                        "pathIndex": pathIndex,
+                        "modified": os.path.getmtime(full_path),  # Add modification time
+                        "created": os.path.getctime(full_path),   # Add creation time
+                        "size": os.path.getsize(full_path)        # Add file size
+                    }
+                    result.append(file_info)
+
+                except Exception as e:
+                    logging.warning(f"Warning: Unable to access {file_name}. Error: {e}. Skipping this file.")
                    continue

            for d in subdirs:
@@ -144,7 +155,7 @@ class ModelFileManager:
                    logging.warning(f"Warning: Unable to access {path}. Skipping this path.")
                    continue

-        return [{"name": f, "pathIndex": pathIndex} for f in result], dirs, time.perf_counter()
+        return result, dirs, time.perf_counter()

    def get_model_previews(self, filepath: str) -> list[str | BytesIO]:
        dirname = os.path.dirname(filepath)
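
Review note: with the hunks above, each entry in the returned file list is a metadata dict rather than a bare relative path that gets wrapped at return time. The following is a minimal standalone sketch of that shape, not the actual `ModelFileManager` code. The helper name `list_files_with_metadata` and its `path_index` parameter are hypothetical and exist only for illustration. One caveat worth keeping in mind: `os.path.getctime` reports creation time on Windows but the last metadata change time on Unix-like systems, so `created` is not portable as a true creation timestamp.

```python
import logging
import os


def list_files_with_metadata(directory: str, path_index: int = 0) -> list[dict]:
    """Walk `directory` and return one metadata dict per readable file.

    Hypothetical helper for illustration only; it mirrors the dict keys
    used in the diff ("name", "pathIndex", "modified", "created", "size").
    """
    result = []
    for dirpath, _subdirs, filenames in os.walk(directory, followlinks=True):
        for file_name in filenames:
            try:
                full_path = os.path.join(dirpath, file_name)
                result.append({
                    "name": os.path.relpath(full_path, directory),
                    "pathIndex": path_index,
                    "modified": os.path.getmtime(full_path),
                    # Creation time on Windows, metadata-change time on Unix.
                    "created": os.path.getctime(full_path),
                    "size": os.path.getsize(full_path),
                })
            except OSError as e:
                # Skip unreadable files (broken symlinks, permission errors).
                logging.warning(f"Unable to access {file_name}. Error: {e}. Skipping this file.")
    return result
```

Callers that previously read plain path strings from this list would need to read `entry["name"]` instead.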
--- a/app/user_manager.py
+++ b/app/user_manager.py
@@ -20,13 +20,15 @@ class FileInfo(TypedDict):
    path: str
    size: int
    modified: int
+    created: int


 def get_file_info(path: str, relative_to: str) -> FileInfo:
    return {
        "path": os.path.relpath(path, relative_to).replace(os.sep, '/'),
        "size": os.path.getsize(path),
-        "modified": os.path.getmtime(path)
+        "modified": os.path.getmtime(path),
+        "created": os.path.getctime(path)
    }


@@ -361,10 +363,17 @@ class UserManager():
            if not overwrite and os.path.exists(path):
                return web.Response(status=409, text="File already exists")

-            body = await request.read()
+            try:
+                body = await request.read()

-            with open(path, "wb") as f:
-                f.write(body)
+                with open(path, "wb") as f:
+                    f.write(body)
+            except OSError as e:
+                logging.warning(f"Error saving file '{path}': {e}")
+                return web.Response(
+                    status=400,
+                    reason="Invalid filename. Please avoid special characters like :\\/*?\"<>|"
+                )

            user_path = self.get_request_user_filepath(request, None)
            if full_info:
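
Review note: the new `try`/`except OSError` block maps any write failure to a 400 response with an "Invalid filename" reason. Below is a small sketch of the same control flow outside aiohttp, assuming a hypothetical `save_body` helper that returns a `(status, reason)` pair. The `200, "OK"` success value is an assumption for the sketch, since the success response is not part of this hunk. Because `OSError` also covers cases such as a missing parent directory or a permission error, those would be reported with the same "Invalid filename" text.

```python
import logging


def save_body(path: str, body: bytes) -> tuple[int, str]:
    """Write `body` to `path` and return an HTTP-style (status, reason) pair.

    Hypothetical helper for illustration; the real handler returns
    aiohttp `web.Response` objects instead of tuples.
    """
    try:
        with open(path, "wb") as f:
            f.write(body)
    except OSError as e:
        # Any OS-level failure lands here, not only invalid characters.
        logging.warning(f"Error saving file '{path}': {e}")
        return 400, "Invalid filename. Please avoid special characters like :\\/*?\"<>|"
    # Assumed success value; not taken from the diff.
    return 200, "OK"
```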
--- a/comfy/cli_args.py
+++ b/comfy/cli_args.py
@@ -132,6 +132,8 @@ parser.add_argument("--reserve-vram", type=float, default=None, help="Set the am

 parser.add_argument("--async-offload", action="store_true", help="Use async weight offloading.")

+parser.add_argument("--force-non-blocking", action="store_true", help="Force ComfyUI to use non-blocking operations for all applicable tensors. This may improve performance on some non-Nvidia systems but can cause issues with some workflows.")
+
 parser.add_argument("--default-hashing-function", type=str, choices=['md5', 'sha1', 'sha256', 'sha512'], default='sha256', help="Allows you to choose the hash function to use for duplicate filename / contents comparison. Default is sha256.")

 parser.add_argument("--disable-smart-memory", action="store_true", help="Force ComfyUI to agressively offload to regular ram instead of keeping models in vram when it can.")
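
Review note: `--force-non-blocking` is a plain `store_true` flag, so it defaults to `False` and argparse exposes it as `force_non_blocking`. Here is a minimal sketch of how such a flag parses, using a fresh parser rather than ComfyUI's own. How the flag is consumed elsewhere in the codebase is not shown in this hunk and is not assumed here.

```python
import argparse

# Standalone parser for illustration; ComfyUI defines its own in comfy/cli_args.py.
parser = argparse.ArgumentParser()
parser.add_argument("--async-offload", action="store_true",
                    help="Use async weight offloading.")
parser.add_argument("--force-non-blocking", action="store_true",
                    help="Force non-blocking operations for all applicable tensors.")

# Dashes in option names become underscores on the parsed namespace.
args = parser.parse_args(["--force-non-blocking"])
defaults = parser.parse_args([])
```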
--- a/comfy/clip_model.py
+++ b/comfy/clip_model.py
@@ -97,7 +97,7 @@ class CLIPTextModel_(torch.nn.Module):
        self.encoder = CLIPEncoder(num_layers, embed_dim, heads, intermediate_size, intermediate_activation, dtype, device, operations)
        self.final_layer_norm = operations.LayerNorm(embed_dim, dtype=dtype, device=device)

-    def forward(self, input_tokens=None, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=torch.float32):
+    def forward(self, input_tokens=None, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=torch.float32, embeds_info=[]):
        if embeds is not None:
            x = embeds + comfy.ops.cast_to(self.embeddings.position_embedding.weight, dtype=dtype, device=embeds.device)
        else:
--- a/comfy/conds.py
+++ b/comfy/conds.py
@@ -1,6 +1,7 @@
 import torch
 import math
 import comfy.utils
+import logging


 class CONDRegular:
@@ -10,12 +11,15 @@ class CONDRegular:
    def _copy_with(self, cond):
        return self.__class__(cond)

-    def process_cond(self, batch_size, device, **kwargs):
-        return self._copy_with(comfy.utils.repeat_to_batch_size(self.cond, batch_size).to(device))
+    def process_cond(self, batch_size, **kwargs):
+        return self._copy_with(comfy.utils.repeat_to_batch_size(self.cond, batch_size))

    def can_concat(self, other):
        if self.cond.shape != other.cond.shape:
            return False
+        if self.cond.device != other.cond.device:
+            logging.warning("WARNING: conds not on same device, skipping concat.")
+            return False
        return True

    def concat(self, others):
@@ -29,14 +33,14 @@ class CONDRegular:


 class CONDNoiseShape(CONDRegular):
-    def process_cond(self, batch_size, device, area, **kwargs):
+    def process_cond(self, batch_size, area, **kwargs):
        data = self.cond
        if area is not None:
            dims = len(area) // 2
            for i in range(dims):
                data = data.narrow(i + 2, area[i + dims], area[i])

-        return self._copy_with(comfy.utils.repeat_to_batch_size(data, batch_size).to(device))
+        return self._copy_with(comfy.utils.repeat_to_batch_size(data, batch_size))


 class CONDCrossAttn(CONDRegular):
@@ -51,6 +55,9 @@ class CONDCrossAttn(CONDRegular):
            diff = mult_min // min(s1[1], s2[1])
            if diff > 4: #arbitrary limit on the padding because it's probably going to impact performance negatively if it's too much
                return False
+        if self.cond.device != other.cond.device:
+            logging.warning("WARNING: conds not on same device: skipping concat.")
+            return False
        return True

    def concat(self, others):
@@ -73,7 +80,7 @@ class CONDConstant(CONDRegular):
    def __init__(self, cond):
        self.cond = cond

-    def process_cond(self, batch_size, device, **kwargs):
+    def process_cond(self, batch_size, **kwargs):
        return self._copy_with(self.cond)

    def can_concat(self, other):
@@ -92,10 +99,10 @@ class CONDList(CONDRegular):
    def __init__(self, cond):
        self.cond = cond

-    def process_cond(self, batch_size, device, **kwargs):
+    def process_cond(self, batch_size, **kwargs):
        out = []
        for c in self.cond:
-            out.append(comfy.utils.repeat_to_batch_size(c, batch_size).to(device))
+            out.append(comfy.utils.repeat_to_batch_size(c, batch_size))

        return self._copy_with(out)

--- a/comfy/context_windows.py
+++ b/comfy/context_windows.py
@@ -0,0 +1,540 @@
+from __future__ import annotations
+from typing import TYPE_CHECKING, Callable
+import torch
+import numpy as np
+import collections
+from dataclasses import dataclass
+from abc import ABC, abstractmethod
+import logging
+import comfy.model_management
+import comfy.patcher_extension
+if TYPE_CHECKING:
+    from comfy.model_base import BaseModel
+    from comfy.model_patcher import ModelPatcher
+    from comfy.controlnet import ControlBase
+
+
+class ContextWindowABC(ABC):
+    def __init__(self):
+        ...
+
+    @abstractmethod
+    def get_tensor(self, full: torch.Tensor) -> torch.Tensor:
+        """
+        Get torch.Tensor applicable to current window.
+        """
+        raise NotImplementedError("Not implemented.")
+
+    @abstractmethod
+    def add_window(self, full: torch.Tensor, to_add: torch.Tensor) -> torch.Tensor:
+        """
+        Apply torch.Tensor of window to the full tensor, in place. Returns reference to updated full tensor, not a copy.
+        """
+        raise NotImplementedError("Not implemented.")
+
+class ContextHandlerABC(ABC):
+    def __init__(self):
+        ...
+
+    @abstractmethod
+    def should_use_context(self, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]) -> bool:
+        raise NotImplementedError("Not implemented.")
+
+    @abstractmethod
+    def get_resized_cond(self, cond_in: list[dict], x_in: torch.Tensor, window: ContextWindowABC, device=None) -> list:
+        raise NotImplementedError("Not implemented.")
+
+    @abstractmethod
+    def execute(self, calc_cond_batch: Callable, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]):
+        raise NotImplementedError("Not implemented.")
+
+
+
+class IndexListContextWindow(ContextWindowABC):
+    def __init__(self, index_list: list[int], dim: int=0):
+        self.index_list = index_list
+        self.context_length = len(index_list)
+        self.dim = dim
+
+    def get_tensor(self, full: torch.Tensor, device=None, dim=None) -> torch.Tensor:
+        if dim is None:
+            dim = self.dim
+        if dim == 0 and full.shape[dim] == 1:
+            return full
+        idx = [slice(None)] * dim + [self.index_list]
+        return full[idx].to(device)
+
+    def add_window(self, full: torch.Tensor, to_add: torch.Tensor, dim=None) -> torch.Tensor:
+        if dim is None:
+            dim = self.dim
+        idx = [slice(None)] * dim + [self.index_list]
+        full[idx] += to_add
+        return full
+
+
+class IndexListCallbacks:
+    EVALUATE_CONTEXT_WINDOWS = "evaluate_context_windows"
+    COMBINE_CONTEXT_WINDOW_RESULTS = "combine_context_window_results"
+    EXECUTE_START = "execute_start"
+    EXECUTE_CLEANUP = "execute_cleanup"
+
+    def init_callbacks(self):
+        return {}
+
+
+@dataclass
+class ContextSchedule:
+    name: str
+    func: Callable
+
+@dataclass
+class ContextFuseMethod:
+    name: str
+    func: Callable
+
+ContextResults = collections.namedtuple("ContextResults", ['window_idx', 'sub_conds_out', 'sub_conds', 'window'])
+class IndexListContextHandler(ContextHandlerABC):
+    def __init__(self, context_schedule: ContextSchedule, fuse_method: ContextFuseMethod, context_length: int=1, context_overlap: int=0, context_stride: int=1, closed_loop=False, dim=0):
+        self.context_schedule = context_schedule
+        self.fuse_method = fuse_method
+        self.context_length = context_length
+        self.context_overlap = context_overlap
+        self.context_stride = context_stride
+        self.closed_loop = closed_loop
+        self.dim = dim
+        self._step = 0
+
+        self.callbacks = {}
+
+    def should_use_context(self, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]) -> bool:
+        # for now, assume first dim is batch - should have stored on BaseModel in actual implementation
+        if x_in.size(self.dim) > self.context_length:
+            logging.info(f"Using context windows {self.context_length} for {x_in.size(self.dim)} frames.")
+            return True
+        return False
+
+    def prepare_control_objects(self, control: ControlBase, device=None) -> ControlBase:
+        if control.previous_controlnet is not None:
+            self.prepare_control_objects(control.previous_controlnet, device)
+        return control
+
+    def get_resized_cond(self, cond_in: list[dict], x_in: torch.Tensor, window: IndexListContextWindow, device=None) -> list:
+        if cond_in is None:
+            return None
+        # reuse or resize cond items to match context requirements
+        resized_cond = []
+        # cond object is a list containing a dict - outer list is irrelevant, so just loop through it
+        for actual_cond in cond_in:
+            resized_actual_cond = actual_cond.copy()
+            # now we are in the inner dict - "pooled_output" is a tensor, "control" is a ControlBase object, "model_conds" is dictionary
+            for key in actual_cond:
+                try:
+                    cond_item = actual_cond[key]
+                    if isinstance(cond_item, torch.Tensor):
+                        # check that tensor is the expected length - x.size(0)
+                        if self.dim < cond_item.ndim and cond_item.size(self.dim) == x_in.size(self.dim):
+                            # if so, it's subsetting time - tell controls the expected indices so they can handle them
+                            actual_cond_item = window.get_tensor(cond_item)
+                            resized_actual_cond[key] = actual_cond_item.to(device)
+                        else:
+                            resized_actual_cond[key] = cond_item.to(device)
+                    # look for control
+                    elif key == "control":
+                        resized_actual_cond[key] = self.prepare_control_objects(cond_item, device)
+                    elif isinstance(cond_item, dict):
+                        new_cond_item = cond_item.copy()
+                        # when in dictionary, look for tensors and CONDCrossAttn [comfy/conds.py] (has cond attr that is a tensor)
+                        for cond_key, cond_value in new_cond_item.items():
+                            if isinstance(cond_value, torch.Tensor):
+                                if cond_value.ndim < self.dim and cond_value.size(0) == x_in.size(self.dim):
+                                    new_cond_item[cond_key] = window.get_tensor(cond_value, device)
+                            # if has cond that is a Tensor, check if needs to be subset
+                            elif hasattr(cond_value, "cond") and isinstance(cond_value.cond, torch.Tensor):
+                                if cond_value.cond.ndim < self.dim and cond_value.cond.size(0) == x_in.size(self.dim):
+                                    new_cond_item[cond_key] = cond_value._copy_with(window.get_tensor(cond_value.cond, device))
+                            elif cond_key == "num_video_frames": # for SVD
+                                new_cond_item[cond_key] = cond_value._copy_with(cond_value.cond)
+                                new_cond_item[cond_key].cond = window.context_length
+                        resized_actual_cond[key] = new_cond_item
+                    else:
+                        resized_actual_cond[key] = cond_item
+                finally:
+                    del cond_item  # just in case to prevent VRAM issues
+            resized_cond.append(resized_actual_cond)
+        return resized_cond
+
+    def set_step(self, timestep: torch.Tensor, model_options: dict[str]):
+        mask = torch.isclose(model_options["transformer_options"]["sample_sigmas"], timestep, rtol=0.0001)
+        matches = torch.nonzero(mask)
+        if torch.numel(matches) == 0:
+            raise Exception("No sample_sigmas matched current timestep; something went wrong.")
+        self._step = int(matches[0].item())
+
+    def get_context_windows(self, model: BaseModel, x_in: torch.Tensor, model_options: dict[str]) -> list[IndexListContextWindow]:
+        full_length = x_in.size(self.dim) # TODO: choose dim based on model
+        context_windows = self.context_schedule.func(full_length, self, model_options)
+        context_windows = [IndexListContextWindow(window, dim=self.dim) for window in context_windows]
+        return context_windows
+
+    def execute(self, calc_cond_batch: Callable, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]):
+        self.set_step(timestep, model_options)
+        context_windows = self.get_context_windows(model, x_in, model_options)
+        enumerated_context_windows = list(enumerate(context_windows))
+
+        conds_final = [torch.zeros_like(x_in) for _ in conds]
+        if self.fuse_method.name == ContextFuseMethods.RELATIVE:
+            counts_final = [torch.ones(get_shape_for_dim(x_in, self.dim), device=x_in.device) for _ in conds]
+        else:
+            counts_final = [torch.zeros(get_shape_for_dim(x_in, self.dim), device=x_in.device) for _ in conds]
+        biases_final = [([0.0] * x_in.shape[self.dim]) for _ in conds]
+
+        for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.EXECUTE_START, self.callbacks):
+            callback(self, model, x_in, conds, timestep, model_options)
+
+        for enum_window in enumerated_context_windows:
+            results = self.evaluate_context_windows(calc_cond_batch, model, x_in, conds, timestep, [enum_window], model_options)
+            for result in results:
+                self.combine_context_window_results(x_in, result.sub_conds_out, result.sub_conds, result.window, result.window_idx, len(enumerated_context_windows), timestep,
+                                            conds_final, counts_final, biases_final)
+        try:
+            # finalize conds
+            if self.fuse_method.name == ContextFuseMethods.RELATIVE:
+                # relative is already normalized, so return as is
+                del counts_final
+                return conds_final
+            else:
+                # normalize conds via division by context usage counts
+                for i in range(len(conds_final)):
+                    conds_final[i] /= counts_final[i]
+                del counts_final
+                return conds_final
+        finally:
+            for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.EXECUTE_CLEANUP, self.callbacks):
+                callback(self, model, x_in, conds, timestep, model_options)
+
+    def evaluate_context_windows(self, calc_cond_batch: Callable, model: BaseModel, x_in: torch.Tensor, conds, timestep: torch.Tensor, enumerated_context_windows: list[tuple[int, IndexListContextWindow]],
+                                model_options, device=None, first_device=None):
+        results: list[ContextResults] = []
+        for window_idx, window in enumerated_context_windows:
+            # allow processing to end between context window executions for faster Cancel
+            comfy.model_management.throw_exception_if_processing_interrupted()
+
+            for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.EVALUATE_CONTEXT_WINDOWS, self.callbacks):
+                callback(self, model, x_in, conds, timestep, model_options, window_idx, window, model_options, device, first_device)
+
+            # update exposed params
+            model_options["transformer_options"]["context_window"] = window
+            # get subsections of x, timestep, conds
+            sub_x = window.get_tensor(x_in, device)
+            sub_timestep = window.get_tensor(timestep, device, dim=0)
+            sub_conds = [self.get_resized_cond(cond, x_in, window, device) for cond in conds]
+
+            sub_conds_out = calc_cond_batch(model, sub_conds, sub_x, sub_timestep, model_options)
+            if device is not None:
+                for i in range(len(sub_conds_out)):
+                    sub_conds_out[i] = sub_conds_out[i].to(x_in.device)
+            results.append(ContextResults(window_idx, sub_conds_out, sub_conds, window))
+        return results
+
+
+    def combine_context_window_results(self, x_in: torch.Tensor, sub_conds_out, sub_conds, window: IndexListContextWindow, window_idx: int, total_windows: int, timestep: torch.Tensor,
+                                    conds_final: list[torch.Tensor], counts_final: list[torch.Tensor], biases_final: list[torch.Tensor]):
+        if self.fuse_method.name == ContextFuseMethods.RELATIVE:
+            for pos, idx in enumerate(window.index_list):
+                # bias is the influence of a specific index in relation to the whole context window
+                bias = 1 - abs(idx - (window.index_list[0] + window.index_list[-1]) / 2) / ((window.index_list[-1] - window.index_list[0] + 1e-2) / 2)
+                bias = max(1e-2, bias)
+                # take weighted average relative to total bias of current idx
+                for i in range(len(sub_conds_out)):
+                    bias_total = biases_final[i][idx]
+                    prev_weight = (bias_total / (bias_total + bias))
+                    new_weight = (bias / (bias_total + bias))
+                    # account for dims of tensors
+                    idx_window = [slice(None)] * self.dim + [idx]
+                    pos_window = [slice(None)] * self.dim + [pos]
+                    # apply new values
+                    conds_final[i][idx_window] = conds_final[i][idx_window] * prev_weight + sub_conds_out[i][pos_window] * new_weight
+                    biases_final[i][idx] = bias_total + bias
+        else:
+            # add conds and counts based on weights of fuse method
+            weights = get_context_weights(window.context_length, x_in.shape[self.dim], window.index_list, self, sigma=timestep)
+            weights_tensor = match_weights_to_dim(weights, x_in, self.dim, device=x_in.device)
+            for i in range(len(sub_conds_out)):
+                window.add_window(conds_final[i], sub_conds_out[i] * weights_tensor)
+                window.add_window(counts_final[i], weights_tensor)
+
+        for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.COMBINE_CONTEXT_WINDOW_RESULTS, self.callbacks):
+            callback(self, x_in, sub_conds_out, sub_conds, window, window_idx, total_windows, timestep, conds_final, counts_final, biases_final)
+
+
+def _prepare_sampling_wrapper(executor, model, noise_shape: torch.Tensor, *args, **kwargs):
+    # limit noise_shape length to context_length for more accurate vram use estimation
+    model_options = kwargs.get("model_options", None)
+    if model_options is None:
+        raise Exception("model_options not found in prepare_sampling_wrapper; this should never happen, something went wrong.")
+    handler: IndexListContextHandler = model_options.get("context_handler", None)
+    if handler is not None:
+        noise_shape = list(noise_shape)
+        noise_shape[handler.dim] = min(noise_shape[handler.dim], handler.context_length)
+    return executor(model, noise_shape, *args, **kwargs)
+
+
+def create_prepare_sampling_wrapper(model: ModelPatcher):
+    model.add_wrapper_with_key(
+        comfy.patcher_extension.WrappersMP.PREPARE_SAMPLING,
+        "ContextWindows_prepare_sampling",
+        _prepare_sampling_wrapper
+    )
+
+
+def match_weights_to_dim(weights: list[float], x_in: torch.Tensor, dim: int, device=None) -> torch.Tensor:
+    total_dims = len(x_in.shape)
+    weights_tensor = torch.Tensor(weights).to(device=device)
+    for _ in range(dim):
+        weights_tensor = weights_tensor.unsqueeze(0)
+    for _ in range(total_dims - dim - 1):
+        weights_tensor = weights_tensor.unsqueeze(-1)
+    return weights_tensor
+
+def get_shape_for_dim(x_in: torch.Tensor, dim: int) -> list[int]:
+    total_dims = len(x_in.shape)
+    shape = []
+    for _ in range(dim):
+        shape.append(1)
+    shape.append(x_in.shape[dim])
+    for _ in range(total_dims - dim - 1):
+        shape.append(1)
+    return shape
+
+class ContextSchedules:
+    UNIFORM_LOOPED = "looped_uniform"
+    UNIFORM_STANDARD = "standard_uniform"
+    STATIC_STANDARD = "standard_static"
+    BATCHED = "batched"
+
+
+# from https://github.com/neggles/animatediff-cli/blob/main/src/animatediff/pipelines/context.py
+def create_windows_uniform_looped(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    windows = []
+    if num_frames < handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+
+    context_stride = min(handler.context_stride, int(np.ceil(np.log2(num_frames / handler.context_length))) + 1)
+    # obtain uniform windows as normal, looping and all
+    for context_step in 1 << np.arange(context_stride):
+        pad = int(round(num_frames * ordered_halving(handler._step)))
+        for j in range(
+            int(ordered_halving(handler._step) * context_step) + pad,
+            num_frames + pad + (0 if handler.closed_loop else -handler.context_overlap),
+            (handler.context_length * context_step - handler.context_overlap),
+        ):
+            windows.append([e % num_frames for e in range(j, j + handler.context_length * context_step, context_step)])
+
+    return windows
+
+def create_windows_uniform_standard(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    # unlike looped, uniform_standard does NOT allow windows that loop back to the beginning;
+    # instead, they get shifted to the corresponding end of the frames.
+    # in the case that a window (shifted or not) is identical to the previous one, it gets skipped.
+    windows = []
+    if num_frames <= handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+
+    context_stride = min(handler.context_stride, int(np.ceil(np.log2(num_frames / handler.context_length))) + 1)
+    # first, obtain uniform windows as normal, looping and all
+    for context_step in 1 << np.arange(context_stride):
+        pad = int(round(num_frames * ordered_halving(handler._step)))
+        for j in range(
+            int(ordered_halving(handler._step) * context_step) + pad,
+            num_frames + pad + (-handler.context_overlap),
+            (handler.context_length * context_step - handler.context_overlap),
+        ):
+            windows.append([e % num_frames for e in range(j, j + handler.context_length * context_step, context_step)])
+
+    # now that windows are created, shift any windows that loop, and delete duplicate windows
+    delete_idxs = []
+    win_i = 0
+    while win_i < len(windows):
+        # if window rolls over itself, need to shift it
+        is_roll, roll_idx = does_window_roll_over(windows[win_i], num_frames)
+        if is_roll:
+            roll_val = windows[win_i][roll_idx]  # roll_val might not be 0 for windows of higher strides
+            shift_window_to_end(windows[win_i], num_frames=num_frames)
+            # check if next window (cyclical) is missing roll_val
+            if roll_val not in windows[(win_i+1) % len(windows)]:
+                # need to insert new window here - just insert window starting at roll_val
+                windows.insert(win_i+1, list(range(roll_val, roll_val + handler.context_length)))
+        # delete window if it's not unique
+        for pre_i in range(0, win_i):
+            if windows[win_i] == windows[pre_i]:
+                delete_idxs.append(win_i)
+                break
+        win_i += 1
+
+    # reverse delete_idxs so that they will be deleted in an order that doesn't break idx correlation
+    delete_idxs.reverse()
+    for i in delete_idxs:
+        windows.pop(i)
+
+    return windows
+
+
+def create_windows_static_standard(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    windows = []
+    if num_frames <= handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+    # always return the same set of windows
+    delta = handler.context_length - handler.context_overlap
+    for start_idx in range(0, num_frames, delta):
+        # if past the end of frames, move start_idx back to allow same context_length
+        ending = start_idx + handler.context_length
+        if ending >= num_frames:
+            final_delta = ending - num_frames
+            final_start_idx = start_idx - final_delta
+            windows.append(list(range(final_start_idx, final_start_idx + handler.context_length)))
+            break
+        windows.append(list(range(start_idx, start_idx + handler.context_length)))
+    return windows
+
+
+def create_windows_batched(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    windows = []
+    if num_frames <= handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+    # always return the same set of windows;
+    # no overlap, just cut up based on context_length;
+    # last window size will be different if num_frames % handler.context_length != 0
+    for start_idx in range(0, num_frames, handler.context_length):
+        windows.append(list(range(start_idx, min(start_idx + handler.context_length, num_frames))))
+    return windows
+
+
+def create_windows_default(num_frames: int, handler: IndexListContextHandler):
+    return [list(range(num_frames))]
+
+
+CONTEXT_MAPPING = {
+    ContextSchedules.UNIFORM_LOOPED: create_windows_uniform_looped,
+    ContextSchedules.UNIFORM_STANDARD: create_windows_uniform_standard,
+    ContextSchedules.STATIC_STANDARD: create_windows_static_standard,
+    ContextSchedules.BATCHED: create_windows_batched,
+}
+
+
+def get_matching_context_schedule(context_schedule: str) -> ContextSchedule:
+    func = CONTEXT_MAPPING.get(context_schedule, None)
+    if func is None:
+        raise ValueError(f"Unknown context_schedule '{context_schedule}'.")
+    return ContextSchedule(context_schedule, func)
+
+
+def get_context_weights(length: int, full_length: int, idxs: list[int], handler: IndexListContextHandler, sigma: torch.Tensor=None):
+    return handler.fuse_method.func(length, sigma=sigma, handler=handler, full_length=full_length, idxs=idxs)
+
+
+def create_weights_flat(length: int, **kwargs) -> list[float]:
+    # weight is the same for all
+    return [1.0] * length
+
+def create_weights_pyramid(length: int, **kwargs) -> list[float]:
+    # weight is based on the distance away from the edge of the context window;
+    # based on weighted average concept in FreeNoise paper
+    if length % 2 == 0:
+        max_weight = length // 2
+        weight_sequence = list(range(1, max_weight + 1, 1)) + list(range(max_weight, 0, -1))
+    else:
+        max_weight = (length + 1) // 2
+        weight_sequence = list(range(1, max_weight, 1)) + [max_weight] + list(range(max_weight - 1, 0, -1))
+    return weight_sequence
+
+def create_weights_overlap_linear(length: int, full_length: int, idxs: list[int], handler: IndexListContextHandler, **kwargs):
+    # based on code in Kijai's WanVideoWrapper: https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/dbb2523b37e4ccdf45127e5ae33e31362f755c8e/nodes.py#L1302
+    # only expected overlap is given different weights
+    weights_torch = torch.ones((length))
+    # blend left-side on all except first window
+    if min(idxs) > 0:
+        ramp_up = torch.linspace(1e-37, 1, handler.context_overlap)
+        weights_torch[:handler.context_overlap] = ramp_up
+    # blend right-side on all except last window
+    if max(idxs) < full_length-1:
+        ramp_down = torch.linspace(1, 1e-37, handler.context_overlap)
+        weights_torch[-handler.context_overlap:] = ramp_down
+    return weights_torch
+
+class ContextFuseMethods:
+    FLAT = "flat"
+    PYRAMID = "pyramid"
+    RELATIVE = "relative"
+    OVERLAP_LINEAR = "overlap-linear"
+
+    LIST = [PYRAMID, FLAT, OVERLAP_LINEAR]
+    LIST_STATIC = [PYRAMID, RELATIVE, FLAT, OVERLAP_LINEAR]
+
+
+FUSE_MAPPING = {
+    ContextFuseMethods.FLAT: create_weights_flat,
+    ContextFuseMethods.PYRAMID: create_weights_pyramid,
+    ContextFuseMethods.RELATIVE: create_weights_pyramid,
+    ContextFuseMethods.OVERLAP_LINEAR: create_weights_overlap_linear,
+}
+
+def get_matching_fuse_method(fuse_method: str) -> ContextFuseMethod:
+    func = FUSE_MAPPING.get(fuse_method, None)
+    if func is None:
+        raise ValueError(f"Unknown fuse_method '{fuse_method}'.")
+    return ContextFuseMethod(fuse_method, func)
+
+# Returns fraction that has denominator that is a power of 2
+def ordered_halving(val):
+    # get binary value, padded with 0s for 64 bits
+    bin_str = f"{val:064b}"
+    # flip binary value, padding included
+    bin_flip = bin_str[::-1]
+    # convert binary to int
+    as_int = int(bin_flip, 2)
+    # divide by 1 << 64, equivalent to 2**64, or 18446744073709551616,
+    # or b10000000000000000000000000000000000000000000000000000000000000000 (1 with 64 zeros)
+    return as_int / (1 << 64)
+
+
+def get_missing_indexes(windows: list[list[int]], num_frames: int) -> list[int]:
+    all_indexes = list(range(num_frames))
+    for w in windows:
+        for val in w:
+            try:
+                all_indexes.remove(val)
+            except ValueError:
+                pass
+    return all_indexes
+
+
+def does_window_roll_over(window: list[int], num_frames: int) -> tuple[bool, int]:
+    prev_val = -1
+    for i, val in enumerate(window):
+        val = val % num_frames
+        if val < prev_val:
+            return True, i
+        prev_val = val
+    return False, -1
+
+
+def shift_window_to_start(window: list[int], num_frames: int):
+    start_val = window[0]
+    for i in range(len(window)):
+        # 1) subtract each element by start_val to move vals relative to the start of all frames
+        # 2) add num_frames and take modulus to get adjusted vals
+        window[i] = ((window[i] - start_val) + num_frames) % num_frames
+
+
+def shift_window_to_end(window: list[int], num_frames: int):
+    # 1) shift window to start
+    shift_window_to_start(window, num_frames)
+    end_val = window[-1]
+    end_delta = num_frames - end_val - 1
+    for i in range(len(window)):
+        # 2) add end_delta to each val to slide windows to end
+        window[i] = window[i] + end_delta
--- a/comfy/controlnet.py
+++ b/comfy/controlnet.py
@@ -28,6 +28,7 @@ import comfy.model_detection
 import comfy.model_patcher
 import comfy.ops
 import comfy.latent_formats
+import comfy.model_base

 import comfy.cldm.cldm
 import comfy.t2i_adapter.adapter
@@ -35,6 +36,7 @@ import comfy.ldm.cascade.controlnet
 import comfy.cldm.mmdit
 import comfy.ldm.hydit.controlnet
 import comfy.ldm.flux.controlnet
+import comfy.ldm.qwen_image.controlnet
 import comfy.cldm.dit_embedder
 from typing import TYPE_CHECKING
 if TYPE_CHECKING:
@@ -43,7 +45,6 @@ if TYPE_CHECKING:

 def broadcast_image_to(tensor, target_batch_size, batched_number):
    current_batch_size = tensor.shape[0]
-    #print(current_batch_size, target_batch_size)
    if current_batch_size == 1:
        return tensor

@@ -236,11 +237,11 @@ class ControlNet(ControlBase):
            self.cond_hint = None
            compression_ratio = self.compression_ratio
            if self.vae is not None:
-                compression_ratio *= self.vae.downscale_ratio
+                compression_ratio *= self.vae.spacial_compression_encode()
            else:
                if self.latent_format is not None:
                    raise ValueError("This Controlnet needs a VAE but none was provided, please use a ControlNetApply node with a VAE input and connect it.")
-            self.cond_hint = comfy.utils.common_upscale(self.cond_hint_original, x_noisy.shape[3] * compression_ratio, x_noisy.shape[2] * compression_ratio, self.upscale_algorithm, "center")
+            self.cond_hint = comfy.utils.common_upscale(self.cond_hint_original, x_noisy.shape[-1] * compression_ratio, x_noisy.shape[-2] * compression_ratio, self.upscale_algorithm, "center")
            self.cond_hint = self.preprocess_image(self.cond_hint)
            if self.vae is not None:
                loaded_models = comfy.model_management.loaded_models(only_currently_used=True)
@@ -265,12 +266,12 @@ class ControlNet(ControlBase):
        for c in self.extra_conds:
            temp = cond.get(c, None)
            if temp is not None:
-                extra[c] = temp.to(dtype)
+                extra[c] = comfy.model_base.convert_tensor(temp, dtype, x_noisy.device)

        timestep = self.model_sampling_current.timestep(t)
        x_noisy = self.model_sampling_current.calculate_input(t, x_noisy)

-        control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=context.to(dtype), **extra)
+        control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=comfy.model_management.cast_to_device(context, x_noisy.device, dtype), **extra)
        return self.control_merge(control, control_prev, output_dtype=None)

    def copy(self):
@@ -582,6 +583,15 @@ def load_controlnet_flux_instantx(sd, model_options={}):
    control = ControlNet(control_model, compression_ratio=1, latent_format=latent_format, concat_mask=concat_mask, load_device=load_device, manual_cast_dtype=manual_cast_dtype, extra_conds=extra_conds)
    return control

+def load_controlnet_qwen_instantx(sd, model_options={}):
+    model_config, operations, load_device, unet_dtype, manual_cast_dtype, offload_device = controlnet_config(sd, model_options=model_options)
+    control_model = comfy.ldm.qwen_image.controlnet.QwenImageControlNetModel(operations=operations, device=offload_device, dtype=unet_dtype, **model_config.unet_config)
+    control_model = controlnet_load_state_dict(control_model, sd)
+    latent_format = comfy.latent_formats.Wan21()
+    extra_conds = []
+    control = ControlNet(control_model, compression_ratio=1, latent_format=latent_format, load_device=load_device, manual_cast_dtype=manual_cast_dtype, extra_conds=extra_conds)
+    return control
+
 def convert_mistoline(sd):
    return comfy.utils.state_dict_prefix_replace(sd, {"single_controlnet_blocks.": "controlnet_single_blocks."})

@@ -655,8 +665,11 @@ def load_controlnet_state_dict(state_dict, model=None, model_options={}):
                return load_controlnet_sd35(controlnet_data, model_options=model_options) #Stability sd3.5 format
            else:
                return load_controlnet_mmdit(controlnet_data, model_options=model_options) #SD3 diffusers controlnet
+        elif "transformer_blocks.0.img_mlp.net.0.proj.weight" in controlnet_data:
+            return load_controlnet_qwen_instantx(controlnet_data, model_options=model_options)
        elif "controlnet_x_embedder.weight" in controlnet_data:
            return load_controlnet_flux_instantx(controlnet_data, model_options=model_options)
+
    elif "controlnet_blocks.0.linear.weight" in controlnet_data: #mistoline flux
        return load_controlnet_flux_xlabs_mistoline(convert_mistoline(controlnet_data), mistoline=True, model_options=model_options)

--- a/comfy/ldm/ace/model.py
+++ b/comfy/ldm/ace/model.py
@@ -19,6 +19,7 @@ import torch
 from torch import nn

 import comfy.model_management
+import comfy.patcher_extension

 from comfy.ldm.lightricks.model import TimestepEmbedding, Timesteps
 from .attention import LinearTransformerBlock, t2i_modulate
@@ -343,7 +344,28 @@ class ACEStepTransformer2DModel(nn.Module):
        output = self.final_layer(hidden_states, embedded_timestep, output_length)
        return output

-    def forward(
+    def forward(self,
+        x,
+        timestep,
+        attention_mask=None,
+        context: Optional[torch.Tensor] = None,
+        text_attention_mask: Optional[torch.LongTensor] = None,
+        speaker_embeds: Optional[torch.FloatTensor] = None,
+        lyric_token_idx: Optional[torch.LongTensor] = None,
+        lyric_mask: Optional[torch.LongTensor] = None,
+        block_controlnet_hidden_states: Optional[Union[List[torch.Tensor], torch.Tensor]] = None,
+        controlnet_scale: Union[float, torch.Tensor] = 1.0,
+        lyrics_strength=1.0,
+        **kwargs
+    ):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, kwargs.get("transformer_options", {}))
+        ).execute(x, timestep, attention_mask, context, text_attention_mask, speaker_embeds, lyric_token_idx, lyric_mask, block_controlnet_hidden_states,
+                  controlnet_scale, lyrics_strength, **kwargs)
+
+    def _forward(
        self,
        x,
        timestep,
--- a/comfy/ldm/aura/mmdit.py
+++ b/comfy/ldm/aura/mmdit.py
@@ -9,6 +9,7 @@ import torch.nn.functional as F

 from comfy.ldm.modules.attention import optimized_attention
 import comfy.ops
+import comfy.patcher_extension
 import comfy.ldm.common_dit

 def modulate(x, shift, scale):
@@ -436,6 +437,13 @@ class MMDiT(nn.Module):
        return x + pos_encoding.reshape(1, -1, self.positional_encoding.shape[-1])

    def forward(self, x, timestep, context, transformer_options={}, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, transformer_options, **kwargs)
+
+    def _forward(self, x, timestep, context, transformer_options={}, **kwargs):
        patches_replace = transformer_options.get("patches_replace", {})
        # patchify x, add PE
        b, c, h, w = x.shape
--- a/comfy/ldm/chroma/model.py
+++ b/comfy/ldm/chroma/model.py
@@ -5,6 +5,7 @@ from dataclasses import dataclass
 import torch
 from torch import Tensor, nn
 from einops import rearrange, repeat
+import comfy.patcher_extension
 import comfy.ldm.common_dit

 from comfy.ldm.flux.layers import (
@@ -253,6 +254,13 @@ class Chroma(nn.Module):
        return img

    def forward(self, x, timestep, context, guidance, control=None, transformer_options={}, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, guidance, control, transformer_options, **kwargs)
+
+    def _forward(self, x, timestep, context, guidance, control=None, transformer_options={}, **kwargs):
        bs, c, h, w = x.shape
        x = comfy.ldm.common_dit.pad_to_patch_size(x, (self.patch_size, self.patch_size))

--- a/comfy/ldm/cosmos/cosmos_tokenizer/utils.py
+++ b/comfy/ldm/cosmos/cosmos_tokenizer/utils.py
@@ -58,7 +58,8 @@ def is_odd(n: int) -> bool:


 def nonlinearity(x):
-    return x * torch.sigmoid(x)
+    # x * sigmoid(x)
+    return torch.nn.functional.silu(x)


 def Normalize(in_channels, num_groups=32):
--- a/comfy/ldm/cosmos/model.py
+++ b/comfy/ldm/cosmos/model.py
@@ -27,6 +27,8 @@ from torchvision import transforms
 from enum import Enum
 import logging

+import comfy.patcher_extension
+
 from .blocks import (
    FinalLayer,
    GeneralDITTransformerBlock,
@@ -435,6 +437,42 @@ class GeneralDIT(nn.Module):
        latent_condition_sigma: Optional[torch.Tensor] = None,
        condition_video_augment_sigma: Optional[torch.Tensor] = None,
        **kwargs,
+    ):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, kwargs.get("transformer_options", {}))
+        ).execute(x,
+                timesteps,
+                context,
+                attention_mask,
+                fps,
+                image_size,
+                padding_mask,
+                scalar_feature,
+                data_type,
+                latent_condition,
+                latent_condition_sigma,
+                condition_video_augment_sigma,
+                **kwargs)
+
+    def _forward(
+        self,
+        x: torch.Tensor,
+        timesteps: torch.Tensor,
+        context: torch.Tensor,
+        attention_mask: Optional[torch.Tensor] = None,
+        # crossattn_emb: torch.Tensor,
+        # crossattn_mask: Optional[torch.Tensor] = None,
+        fps: Optional[torch.Tensor] = None,
+        image_size: Optional[torch.Tensor] = None,
+        padding_mask: Optional[torch.Tensor] = None,
+        scalar_feature: Optional[torch.Tensor] = None,
+        data_type: Optional[DataType] = DataType.VIDEO,
+        latent_condition: Optional[torch.Tensor] = None,
+        latent_condition_sigma: Optional[torch.Tensor] = None,
+        condition_video_augment_sigma: Optional[torch.Tensor] = None,
+        **kwargs,
    ):
        """
        Args:
--- a/comfy/ldm/cosmos/predict2.py
+++ b/comfy/ldm/cosmos/predict2.py
@@ -11,6 +11,7 @@ import math
 from .position_embedding import VideoRopePosition3DEmb, LearnablePosEmbAxis
 from torchvision import transforms

+import comfy.patcher_extension
 from comfy.ldm.modules.attention import optimized_attention

 def apply_rotary_pos_emb(
@@ -805,7 +806,21 @@ class MiniTrainDIT(nn.Module):
        )
        return x_B_C_Tt_Hp_Wp

-    def forward(
+    def forward(self,
+        x: torch.Tensor,
+        timesteps: torch.Tensor,
+        context: torch.Tensor,
+        fps: Optional[torch.Tensor] = None,
+        padding_mask: Optional[torch.Tensor] = None,
+        **kwargs,
+    ):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, kwargs.get("transformer_options", {}))
+        ).execute(x, timesteps, context, fps, padding_mask, **kwargs)
+
+    def _forward(
        self,
        x: torch.Tensor,
        timesteps: torch.Tensor,
--- a/comfy/ldm/flux/model.py
+++ b/comfy/ldm/flux/model.py
@@ -6,6 +6,7 @@ import torch
 from torch import Tensor, nn
 from einops import rearrange, repeat
 import comfy.ldm.common_dit
+import comfy.patcher_extension

 from .layers import (
    DoubleStreamBlock,
@@ -214,6 +215,13 @@ class Flux(nn.Module):
        return img, repeat(img_ids, "h w c -> b (h w) c", b=bs)

    def forward(self, x, timestep, context, y=None, guidance=None, ref_latents=None, control=None, transformer_options={}, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, y, guidance, ref_latents, control, transformer_options, **kwargs)
+
+    def _forward(self, x, timestep, context, y=None, guidance=None, ref_latents=None, control=None, transformer_options={}, **kwargs):
        bs, c, h_orig, w_orig = x.shape
        patch_size = self.patch_size

@@ -224,19 +232,27 @@ class Flux(nn.Module):
        if ref_latents is not None:
            h = 0
            w = 0
+            index = 0
+            index_ref_method = kwargs.get("ref_latents_method", "offset") == "index"
            for ref in ref_latents:
-                h_offset = 0
-                w_offset = 0
-                if ref.shape[-2] + h > ref.shape[-1] + w:
-                    w_offset = w
+                if index_ref_method:
+                    index += 1
+                    h_offset = 0
+                    w_offset = 0
                else:
-                    h_offset = h
+                    index = 1
+                    h_offset = 0
+                    w_offset = 0
+                    if ref.shape[-2] + h > ref.shape[-1] + w:
+                        w_offset = w
+                    else:
+                        h_offset = h
+                    h = max(h, ref.shape[-2] + h_offset)
+                    w = max(w, ref.shape[-1] + w_offset)

-                kontext, kontext_ids = self.process_img(ref, index=1, h_offset=h_offset, w_offset=w_offset)
+                kontext, kontext_ids = self.process_img(ref, index=index, h_offset=h_offset, w_offset=w_offset)
                img = torch.cat([img, kontext], dim=1)
                img_ids = torch.cat([img_ids, kontext_ids], dim=1)
-                h = max(h, ref.shape[-2] + h_offset)
-                w = max(w, ref.shape[-1] + w_offset)

        txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
        out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None))
--- a/comfy/ldm/hidream/model.py
+++ b/comfy/ldm/hidream/model.py
@@ -13,6 +13,7 @@ from comfy.ldm.flux.layers import LastLayer

 from comfy.ldm.modules.attention import optimized_attention
 import comfy.model_management
+import comfy.patcher_extension
 import comfy.ldm.common_dit


@@ -692,7 +693,23 @@ class HiDreamImageTransformer2DModel(nn.Module):
            raise NotImplementedError
        return x, x_masks, img_sizes

-    def forward(
+    def forward(self,
+        x: torch.Tensor,
+        t: torch.Tensor,
+        y: Optional[torch.Tensor] = None,
+        context: Optional[torch.Tensor] = None,
+        encoder_hidden_states_llama3=None,
+        image_cond=None,
+        control = None,
+        transformer_options = {},
+    ):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, t, y, context, encoder_hidden_states_llama3, image_cond, control, transformer_options)
+
+    def _forward(
        self,
        x: torch.Tensor,
        t: torch.Tensor,
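Note on the pattern used in this and the following model files: `forward` becomes a thin entry point that hands the original body (renamed `_forward`) to a wrapper executor, so registered wrappers can run around the whole diffusion-model call. The sketch below is a simplified stand-in for that idea, not the `comfy.patcher_extension` implementation; `run_with_wrappers` and the `(next_fn, *args, **kwargs)` wrapper signature are assumptions made for illustration.

```python
def run_with_wrappers(original, wrappers, *args, **kwargs):
    """Call `original` through a chain of wrappers, outermost first.

    Each wrapper receives the next callable in the chain as its first
    argument and decides how (or whether) to call it.
    """
    def build(i):
        if i == len(wrappers):
            return original

        def call(*a, **kw):
            return wrappers[i](build(i + 1), *a, **kw)

        return call

    return build(0)(*args, **kwargs)


calls = []


def _forward(x, scale=1):
    # Stands in for the renamed model body.
    return x * scale


def log_wrapper(next_fn, x, **kw):
    calls.append(("log", x))
    return next_fn(x, **kw)


def double_wrapper(next_fn, x, **kw):
    return next_fn(x * 2, **kw)


result = run_with_wrappers(_forward, [log_wrapper, double_wrapper], 3, scale=5)
unwrapped = run_with_wrappers(_forward, [], 4)
```

With an empty wrapper list the call reduces to `_forward` itself, which is why the refactor leaves behavior unchanged when no wrappers are registered.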
--- a/comfy/ldm/hunyuan3d/model.py
+++ b/comfy/ldm/hunyuan3d/model.py
@@ -7,6 +7,7 @@ from comfy.ldm.flux.layers import (
    SingleStreamBlock,
    timestep_embedding,
 )
+import comfy.patcher_extension


 class Hunyuan3Dv2(nn.Module):
@@ -67,6 +68,13 @@ class Hunyuan3Dv2(nn.Module):
        self.final_layer = LastLayer(hidden_size, 1, in_channels, dtype=dtype, device=device, operations=operations)

    def forward(self, x, timestep, context, guidance=None, transformer_options={}, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, guidance, transformer_options, **kwargs)
+
+    def _forward(self, x, timestep, context, guidance=None, transformer_options={}, **kwargs):
        x = x.movedim(-1, -2)
        timestep = 1.0 - timestep
        txt = context
--- a/comfy/ldm/hunyuan3d/vae.py
+++ b/comfy/ldm/hunyuan3d/vae.py
@@ -178,7 +178,7 @@ class FourierEmbedder(nn.Module):

 class CrossAttentionProcessor:
    def __call__(self, attn, q, k, v):
-        out = F.scaled_dot_product_attention(q, k, v)
+        out = comfy.ops.scaled_dot_product_attention(q, k, v)
        return out


--- a/comfy/ldm/hunyuan_video/model.py
+++ b/comfy/ldm/hunyuan_video/model.py
@@ -1,6 +1,7 @@
 #Based on Flux code because of weird hunyuan video code license.

 import torch
+import comfy.patcher_extension
 import comfy.ldm.flux.layers
 import comfy.ldm.modules.diffusionmodules.mmdit
 from comfy.ldm.modules.attention import optimized_attention
@@ -348,6 +349,13 @@ class HunyuanVideo(nn.Module):
        return repeat(img_ids, "t h w c -> b (t h w) c", b=bs)

    def forward(self, x, timestep, context, y, guidance=None, attention_mask=None, guiding_frame_index=None, ref_latent=None, control=None, transformer_options={}, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, y, guidance, attention_mask, guiding_frame_index, ref_latent, control, transformer_options, **kwargs)
+
+    def _forward(self, x, timestep, context, y, guidance=None, attention_mask=None, guiding_frame_index=None, ref_latent=None, control=None, transformer_options={}, **kwargs):
        bs, c, t, h, w = x.shape
        img_ids = self.img_ids(x)
        txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
--- a/comfy/ldm/lightricks/model.py
+++ b/comfy/ldm/lightricks/model.py
@@ -1,5 +1,6 @@
 import torch
 from torch import nn
+import comfy.patcher_extension
 import comfy.ldm.modules.attention
 import comfy.ldm.common_dit
 from einops import rearrange
@@ -420,6 +421,13 @@ class LTXVModel(torch.nn.Module):
        self.patchifier = SymmetricPatchifier(1)

    def forward(self, x, timestep, context, attention_mask, frame_rate=25, transformer_options={}, keyframe_idxs=None, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, attention_mask, frame_rate, transformer_options, keyframe_idxs, **kwargs)
+
+    def _forward(self, x, timestep, context, attention_mask, frame_rate=25, transformer_options={}, keyframe_idxs=None, **kwargs):
        patches_replace = transformer_options.get("patches_replace", {})

        orig_shape = list(x.shape)
--- a/comfy/ldm/lumina/model.py
+++ b/comfy/ldm/lumina/model.py
@@ -11,6 +11,7 @@ import comfy.ldm.common_dit
 from comfy.ldm.modules.diffusionmodules.mmdit import TimestepEmbedder
 from comfy.ldm.modules.attention import optimized_attention_masked
 from comfy.ldm.flux.layers import EmbedND
+import comfy.patcher_extension


 def modulate(x, scale):
@@ -590,8 +591,15 @@ class NextDiT(nn.Module):

        return padded_full_embed, mask, img_sizes, l_effective_cap_len, freqs_cis

-    # def forward(self, x, t, cap_feats, cap_mask):
    def forward(self, x, timesteps, context, num_tokens, attention_mask=None, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, kwargs.get("transformer_options", {}))
+        ).execute(x, timesteps, context, num_tokens, attention_mask, **kwargs)
+
+    # def forward(self, x, t, cap_feats, cap_mask):
+    def _forward(self, x, timesteps, context, num_tokens, attention_mask=None, **kwargs):
        t = 1.0 - timesteps
        cap_feats = context
        cap_mask = attention_mask
--- a/comfy/ldm/modules/attention.py
+++ b/comfy/ldm/modules/attention.py
@@ -448,7 +448,7 @@ def attention_pytorch(q, k, v, heads, mask=None, attn_precision=None, skip_resha
            mask = mask.unsqueeze(1)

    if SDP_BATCH_LIMIT >= b:
-        out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
+        out = comfy.ops.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
        if not skip_output_reshape:
            out = (
                out.transpose(1, 2).reshape(b, -1, heads * dim_head)
@@ -461,7 +461,7 @@ def attention_pytorch(q, k, v, heads, mask=None, attn_precision=None, skip_resha
                if mask.shape[0] > 1:
                    m = mask[i : i + SDP_BATCH_LIMIT]

-            out[i : i + SDP_BATCH_LIMIT] = torch.nn.functional.scaled_dot_product_attention(
+            out[i : i + SDP_BATCH_LIMIT] = comfy.ops.scaled_dot_product_attention(
                q[i : i + SDP_BATCH_LIMIT],
                k[i : i + SDP_BATCH_LIMIT],
                v[i : i + SDP_BATCH_LIMIT],
--- a/comfy/ldm/modules/diffusionmodules/mmdit.py
+++ b/comfy/ldm/modules/diffusionmodules/mmdit.py
@@ -109,7 +109,7 @@ class PatchEmbed(nn.Module):
 def modulate(x, shift, scale):
    if shift is None:
        shift = torch.zeros_like(scale)
-    return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
+    return torch.addcmul(shift.unsqueeze(1), x, 1 + scale.unsqueeze(1))


 #################################################################################
@@ -564,10 +564,7 @@ class DismantledBlock(nn.Module):
        assert not self.pre_only
        attn1 = self.attn.post_attention(attn)
        attn2 = self.attn2.post_attention(attn2)
-        out1 = gate_msa.unsqueeze(1) * attn1
-        out2 = gate_msa2.unsqueeze(1) * attn2
-        x = x + out1
-        x = x + out2
+        x = gate_cat(x, gate_msa, gate_msa2, attn1, attn2)
        x = x + gate_mlp.unsqueeze(1) * self.mlp(
            modulate(self.norm2(x), shift_mlp, scale_mlp)
        )
@@ -594,6 +591,11 @@ class DismantledBlock(nn.Module):
            )
            return self.post_attention(attn, *intermediates)

+def gate_cat(x, gate_msa, gate_msa2, attn1, attn2):
+    out1 = gate_msa.unsqueeze(1) * attn1
+    out2 = gate_msa2.unsqueeze(1) * attn2
+    x = torch.stack([x, out1, out2], dim=0).sum(dim=0)
+    return x

 def block_mixing(*args, use_checkpoint=True, **kwargs):
    if use_checkpoint:
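Note on the two mmdit.py rewrites above: both are intended to be numerically equivalent to the lines they replace, up to floating-point rounding. `torch.addcmul(input, tensor1, tensor2)` computes `input + tensor1 * tensor2` elementwise (with the default `value=1`), and summing a stack of three tensors over the stacking dimension is the same as adding them. Written out, with the `unsqueeze(1)` broadcasts omitted and $o_1$, $o_2$, $a_1$, $a_2$, $g$ as shorthand introduced here:

```latex
\begin{aligned}
\texttt{modulate:}\quad
  & \operatorname{addcmul}(\text{shift},\, x,\, 1 + \text{scale})
    = \text{shift} + x \odot (1 + \text{scale})
    = x \odot (1 + \text{scale}) + \text{shift} \\[4pt]
\texttt{gate\_cat:}\quad
  & \sum_{k} \operatorname{stack}([x,\, o_1,\, o_2])_k
    = x + o_1 + o_2
    = (x + o_1) + o_2, \\
  & \text{where } o_1 = g_{\text{msa}} \odot a_1,\quad o_2 = g_{\text{msa2}} \odot a_2 .
\end{aligned}
```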
--- a/comfy/ldm/modules/diffusionmodules/model.py
+++ b/comfy/ldm/modules/diffusionmodules/model.py
@@ -36,7 +36,7 @@ def get_timestep_embedding(timesteps, embedding_dim):

 def nonlinearity(x):
    # swish
-    return x*torch.sigmoid(x)
+    return torch.nn.functional.silu(x)


 def Normalize(in_channels, num_groups=32):
@@ -285,7 +285,7 @@ def pytorch_attention(q, k, v):
    )

    try:
-        out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
+        out = comfy.ops.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
        out = out.transpose(2, 3).reshape(orig_shape)
    except model_management.OOM_EXCEPTION:
        logging.warning("scaled_dot_product_attention OOMed: switched to slice attention")
--- a/comfy/ldm/qwen_image/controlnet.py
+++ b/comfy/ldm/qwen_image/controlnet.py
@@ -0,0 +1,77 @@
+import torch
+import math
+
+from .model import QwenImageTransformer2DModel
+
+
+class QwenImageControlNetModel(QwenImageTransformer2DModel):
+    def __init__(
+        self,
+        extra_condition_channels=0,
+        dtype=None,
+        device=None,
+        operations=None,
+        **kwargs
+    ):
+        super().__init__(final_layer=False, dtype=dtype, device=device, operations=operations, **kwargs)
+        self.main_model_double = 60
+
+        # controlnet_blocks
+        self.controlnet_blocks = torch.nn.ModuleList([])
+        for _ in range(len(self.transformer_blocks)):
+            self.controlnet_blocks.append(operations.Linear(self.inner_dim, self.inner_dim, device=device, dtype=dtype))
+        self.controlnet_x_embedder = operations.Linear(self.in_channels + extra_condition_channels, self.inner_dim, device=device, dtype=dtype)
+
+    def forward(
+        self,
+        x,
+        timesteps,
+        context,
+        attention_mask=None,
+        guidance: torch.Tensor = None,
+        ref_latents=None,
+        hint=None,
+        transformer_options={},
+        **kwargs
+    ):
+        timestep = timesteps
+        encoder_hidden_states = context
+        encoder_hidden_states_mask = attention_mask
+
+        hidden_states, img_ids, orig_shape = self.process_img(x)
+        hint, _, _ = self.process_img(hint)
+
+        txt_start = round(max(((x.shape[-1] + (self.patch_size // 2)) // self.patch_size) // 2, ((x.shape[-2] + (self.patch_size // 2)) // self.patch_size) // 2))
+        txt_ids = torch.arange(txt_start, txt_start + context.shape[1], device=x.device).reshape(1, -1, 1).repeat(x.shape[0], 1, 3)
+        ids = torch.cat((txt_ids, img_ids), dim=1)
+        image_rotary_emb = self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
+        del ids, txt_ids, img_ids
+
+        hidden_states = self.img_in(hidden_states) + self.controlnet_x_embedder(hint)
+        encoder_hidden_states = self.txt_norm(encoder_hidden_states)
+        encoder_hidden_states = self.txt_in(encoder_hidden_states)
+
+        if guidance is not None:
+            guidance = guidance * 1000
+
+        temb = (
+            self.time_text_embed(timestep, hidden_states)
+            if guidance is None
+            else self.time_text_embed(timestep, guidance, hidden_states)
+        )
+
+        repeat = math.ceil(self.main_model_double / len(self.controlnet_blocks))
+
+        controlnet_block_samples = ()
+        for i, block in enumerate(self.transformer_blocks):
+            encoder_hidden_states, hidden_states = block(
+                hidden_states=hidden_states,
+                encoder_hidden_states=encoder_hidden_states,
+                encoder_hidden_states_mask=encoder_hidden_states_mask,
+                temb=temb,
+                image_rotary_emb=image_rotary_emb,
+            )
+
+            controlnet_block_samples = controlnet_block_samples + (self.controlnet_blocks[i](hidden_states),) * repeat
+
+        return {"input": controlnet_block_samples[:self.main_model_double]}
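Note on the return value above: the controlnet may have fewer blocks than the main model's 60 (`main_model_double`), so each block output is repeated `ceil(60 / n)` times and the resulting tuple is cut to 60 entries. The sketch below reproduces only that bookkeeping, with block indices in place of tensors; `spread_control_outputs` is a hypothetical helper name.

```python
import math


def spread_control_outputs(num_control_blocks, num_main_blocks=60):
    """Map controlnet block indices onto main-model block slots."""
    repeat = math.ceil(num_main_blocks / num_control_blocks)
    samples = ()
    for i in range(num_control_blocks):
        # The same output is reused for `repeat` consecutive main blocks.
        samples = samples + (i,) * repeat
    # Drop the overshoot when the block counts do not divide evenly.
    return samples[:num_main_blocks]


even = spread_control_outputs(5)    # 12 main blocks per control block
uneven = spread_control_outputs(7)  # repeat is 9, so 63 entries cut to 60
```

When the counts do not divide evenly, the last controlnet block covers fewer main blocks than the others.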
--- a/comfy/ldm/qwen_image/model.py
+++ b/comfy/ldm/qwen_image/model.py
@@ -0,0 +1,469 @@
+# https://github.com/QwenLM/Qwen-Image (Apache 2.0)
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from typing import Optional, Tuple
+from einops import repeat
+
+from comfy.ldm.lightricks.model import TimestepEmbedding, Timesteps
+from comfy.ldm.modules.attention import optimized_attention_masked
+from comfy.ldm.flux.layers import EmbedND
+import comfy.ldm.common_dit
+import comfy.patcher_extension
+
+class GELU(nn.Module):
+    def __init__(self, dim_in: int, dim_out: int, approximate: str = "none", bias: bool = True, dtype=None, device=None, operations=None):
+        super().__init__()
+        self.proj = operations.Linear(dim_in, dim_out, bias=bias, dtype=dtype, device=device)
+        self.approximate = approximate
+
+    def forward(self, hidden_states):
+        hidden_states = self.proj(hidden_states)
+        hidden_states = F.gelu(hidden_states, approximate=self.approximate)
+        return hidden_states
+
+
+class FeedForward(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        dim_out: Optional[int] = None,
+        mult: int = 4,
+        dropout: float = 0.0,
+        inner_dim=None,
+        bias: bool = True,
+        dtype=None, device=None, operations=None
+    ):
+        super().__init__()
+        if inner_dim is None:
+            inner_dim = int(dim * mult)
+        dim_out = dim_out if dim_out is not None else dim
+
+        self.net = nn.ModuleList([])
+        self.net.append(GELU(dim, inner_dim, approximate="tanh", bias=bias, dtype=dtype, device=device, operations=operations))
+        self.net.append(nn.Dropout(dropout))
+        self.net.append(operations.Linear(inner_dim, dim_out, bias=bias, dtype=dtype, device=device))
+
+    def forward(self, hidden_states: torch.Tensor, *args, **kwargs) -> torch.Tensor:
+        for module in self.net:
+            hidden_states = module(hidden_states)
+        return hidden_states
+
+
+def apply_rotary_emb(x, freqs_cis):
+    if x.shape[1] == 0:
+        return x
+
+    t_ = x.reshape(*x.shape[:-1], -1, 1, 2)
+    t_out = freqs_cis[..., 0] * t_[..., 0] + freqs_cis[..., 1] * t_[..., 1]
+    return t_out.reshape(*x.shape)
+
+
+class QwenTimestepProjEmbeddings(nn.Module):
+    def __init__(self, embedding_dim, pooled_projection_dim, dtype=None, device=None, operations=None):
+        super().__init__()
+        self.time_proj = Timesteps(num_channels=256, flip_sin_to_cos=True, downscale_freq_shift=0, scale=1000)
+        self.timestep_embedder = TimestepEmbedding(
+            in_channels=256,
+            time_embed_dim=embedding_dim,
+            dtype=dtype,
+            device=device,
+            operations=operations
+        )
+
+    def forward(self, timestep, hidden_states):
+        timesteps_proj = self.time_proj(timestep)
+        timesteps_emb = self.timestep_embedder(timesteps_proj.to(dtype=hidden_states.dtype))
+        return timesteps_emb
+
+
+class Attention(nn.Module):
+    def __init__(
+        self,
+        query_dim: int,
+        dim_head: int = 64,
+        heads: int = 8,
+        dropout: float = 0.0,
+        bias: bool = False,
+        eps: float = 1e-5,
+        out_bias: bool = True,
+        out_dim: int = None,
+        out_context_dim: int = None,
+        dtype=None,
+        device=None,
+        operations=None
+    ):
+        super().__init__()
+        self.inner_dim = out_dim if out_dim is not None else dim_head * heads
+        self.inner_kv_dim = self.inner_dim
+        self.heads = heads
+        self.dim_head = dim_head
+        self.out_dim = out_dim if out_dim is not None else query_dim
+        self.out_context_dim = out_context_dim if out_context_dim is not None else query_dim
+        self.dropout = dropout
+
+        # Q/K normalization
+        self.norm_q = operations.RMSNorm(dim_head, eps=eps, elementwise_affine=True, dtype=dtype, device=device)
+        self.norm_k = operations.RMSNorm(dim_head, eps=eps, elementwise_affine=True, dtype=dtype, device=device)
+        self.norm_added_q = operations.RMSNorm(dim_head, eps=eps, dtype=dtype, device=device)
+        self.norm_added_k = operations.RMSNorm(dim_head, eps=eps, dtype=dtype, device=device)
+
+        # Image stream projections
+        self.to_q = operations.Linear(query_dim, self.inner_dim, bias=bias, dtype=dtype, device=device)
+        self.to_k = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+        self.to_v = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+
+        # Text stream projections
+        self.add_q_proj = operations.Linear(query_dim, self.inner_dim, bias=bias, dtype=dtype, device=device)
+        self.add_k_proj = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+        self.add_v_proj = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+
+        # Output projections
+        self.to_out = nn.ModuleList([
+            operations.Linear(self.inner_dim, self.out_dim, bias=out_bias, dtype=dtype, device=device),
+            nn.Dropout(dropout)
+        ])
+        self.to_add_out = operations.Linear(self.inner_dim, self.out_context_dim, bias=out_bias, dtype=dtype, device=device)
+
+    def forward(
+        self,
+        hidden_states: torch.FloatTensor,  # Image stream
+        encoder_hidden_states: torch.FloatTensor = None,  # Text stream
+        encoder_hidden_states_mask: torch.FloatTensor = None,
+        attention_mask: Optional[torch.FloatTensor] = None,
+        image_rotary_emb: Optional[torch.Tensor] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        seq_txt = encoder_hidden_states.shape[1]
+
+        img_query = self.to_q(hidden_states).unflatten(-1, (self.heads, -1))
+        img_key = self.to_k(hidden_states).unflatten(-1, (self.heads, -1))
+        img_value = self.to_v(hidden_states).unflatten(-1, (self.heads, -1))
+
+        txt_query = self.add_q_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_key = self.add_k_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_value = self.add_v_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+
+        img_query = self.norm_q(img_query)
+        img_key = self.norm_k(img_key)
+        txt_query = self.norm_added_q(txt_query)
+        txt_key = self.norm_added_k(txt_key)
+
+        joint_query = torch.cat([txt_query, img_query], dim=1)
+        joint_key = torch.cat([txt_key, img_key], dim=1)
+        joint_value = torch.cat([txt_value, img_value], dim=1)
+
+        joint_query = apply_rotary_emb(joint_query, image_rotary_emb)
+        joint_key = apply_rotary_emb(joint_key, image_rotary_emb)
+
+        joint_query = joint_query.flatten(start_dim=2)
+        joint_key = joint_key.flatten(start_dim=2)
+        joint_value = joint_value.flatten(start_dim=2)
+
+        joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads, attention_mask)
+
+        txt_attn_output = joint_hidden_states[:, :seq_txt, :]
+        img_attn_output = joint_hidden_states[:, seq_txt:, :]
+
+        img_attn_output = self.to_out[0](img_attn_output)
+        img_attn_output = self.to_out[1](img_attn_output)
+        txt_attn_output = self.to_add_out(txt_attn_output)
+
+        return img_attn_output, txt_attn_output
+
+
+class QwenImageTransformerBlock(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        num_attention_heads: int,
+        attention_head_dim: int,
+        eps: float = 1e-6,
+        dtype=None,
+        device=None,
+        operations=None
+    ):
+        super().__init__()
+        self.dim = dim
+        self.num_attention_heads = num_attention_heads
+        self.attention_head_dim = attention_head_dim
+
+        self.img_mod = nn.Sequential(
+            nn.SiLU(),
+            operations.Linear(dim, 6 * dim, bias=True, dtype=dtype, device=device),
+        )
+        self.img_norm1 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.img_norm2 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.img_mlp = FeedForward(dim=dim, dim_out=dim, dtype=dtype, device=device, operations=operations)
+
+        self.txt_mod = nn.Sequential(
+            nn.SiLU(),
+            operations.Linear(dim, 6 * dim, bias=True, dtype=dtype, device=device),
+        )
+        self.txt_norm1 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.txt_norm2 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.txt_mlp = FeedForward(dim=dim, dim_out=dim, dtype=dtype, device=device, operations=operations)
+
+        self.attn = Attention(
+            query_dim=dim,
+            dim_head=attention_head_dim,
+            heads=num_attention_heads,
+            out_dim=dim,
+            bias=True,
+            eps=eps,
+            dtype=dtype,
+            device=device,
+            operations=operations,
+        )
+
+    def _modulate(self, x: torch.Tensor, mod_params: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+        shift, scale, gate = torch.chunk(mod_params, 3, dim=-1)
+        return torch.addcmul(shift.unsqueeze(1), x, 1 + scale.unsqueeze(1)), gate.unsqueeze(1)
+
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        encoder_hidden_states: torch.Tensor,
+        encoder_hidden_states_mask: torch.Tensor,
+        temb: torch.Tensor,
+        image_rotary_emb: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        img_mod_params = self.img_mod(temb)
+        txt_mod_params = self.txt_mod(temb)
+        img_mod1, img_mod2 = img_mod_params.chunk(2, dim=-1)
+        txt_mod1, txt_mod2 = txt_mod_params.chunk(2, dim=-1)
+
+        img_normed = self.img_norm1(hidden_states)
+        img_modulated, img_gate1 = self._modulate(img_normed, img_mod1)
+        txt_normed = self.txt_norm1(encoder_hidden_states)
+        txt_modulated, txt_gate1 = self._modulate(txt_normed, txt_mod1)
+
+        img_attn_output, txt_attn_output = self.attn(
+            hidden_states=img_modulated,
+            encoder_hidden_states=txt_modulated,
+            encoder_hidden_states_mask=encoder_hidden_states_mask,
+            image_rotary_emb=image_rotary_emb,
+        )
+
+        hidden_states = hidden_states + img_gate1 * img_attn_output
+        encoder_hidden_states = encoder_hidden_states + txt_gate1 * txt_attn_output
+
+        img_normed2 = self.img_norm2(hidden_states)
+        img_modulated2, img_gate2 = self._modulate(img_normed2, img_mod2)
+        hidden_states = torch.addcmul(hidden_states, img_gate2, self.img_mlp(img_modulated2))
+
+        txt_normed2 = self.txt_norm2(encoder_hidden_states)
+        txt_modulated2, txt_gate2 = self._modulate(txt_normed2, txt_mod2)
+        encoder_hidden_states = torch.addcmul(encoder_hidden_states, txt_gate2, self.txt_mlp(txt_modulated2))
+
+        return encoder_hidden_states, hidden_states
+
+
+class LastLayer(nn.Module):
+    def __init__(
+        self,
+        embedding_dim: int,
+        conditioning_embedding_dim: int,
+        elementwise_affine=False,
+        eps=1e-6,
+        bias=True,
+        dtype=None, device=None, operations=None
+    ):
+        super().__init__()
+        self.silu = nn.SiLU()
+        self.linear = operations.Linear(conditioning_embedding_dim, embedding_dim * 2, bias=bias, dtype=dtype, device=device)
+        self.norm = operations.LayerNorm(embedding_dim, eps, elementwise_affine=False, bias=bias, dtype=dtype, device=device)
+
+    def forward(self, x: torch.Tensor, conditioning_embedding: torch.Tensor) -> torch.Tensor:
+        emb = self.linear(self.silu(conditioning_embedding))
+        scale, shift = torch.chunk(emb, 2, dim=1)
+        x = torch.addcmul(shift[:, None, :], self.norm(x), (1 + scale)[:, None, :])
+        return x
+
+
+class QwenImageTransformer2DModel(nn.Module):
+    def __init__(
+        self,
+        patch_size: int = 2,
+        in_channels: int = 64,
+        out_channels: Optional[int] = 16,
+        num_layers: int = 60,
+        attention_head_dim: int = 128,
+        num_attention_heads: int = 24,
+        joint_attention_dim: int = 3584,
+        pooled_projection_dim: int = 768,
+        guidance_embeds: bool = False,
+        axes_dims_rope: Tuple[int, int, int] = (16, 56, 56),
+        image_model=None,
+        final_layer=True,
+        dtype=None,
+        device=None,
+        operations=None,
+    ):
+        super().__init__()
+        self.dtype = dtype
+        self.patch_size = patch_size
+        self.in_channels = in_channels
+        self.out_channels = out_channels or in_channels
+        self.inner_dim = num_attention_heads * attention_head_dim
+
+        self.pe_embedder = EmbedND(dim=attention_head_dim, theta=10000, axes_dim=list(axes_dims_rope))
+
+        self.time_text_embed = QwenTimestepProjEmbeddings(
+            embedding_dim=self.inner_dim,
+            pooled_projection_dim=pooled_projection_dim,
+            dtype=dtype,
+            device=device,
+            operations=operations
+        )
+
+        self.txt_norm = operations.RMSNorm(joint_attention_dim, eps=1e-6, dtype=dtype, device=device)
+        self.img_in = operations.Linear(in_channels, self.inner_dim, dtype=dtype, device=device)
+        self.txt_in = operations.Linear(joint_attention_dim, self.inner_dim, dtype=dtype, device=device)
+
+        self.transformer_blocks = nn.ModuleList([
+            QwenImageTransformerBlock(
+                dim=self.inner_dim,
+                num_attention_heads=num_attention_heads,
+                attention_head_dim=attention_head_dim,
+                dtype=dtype,
+                device=device,
+                operations=operations
+            )
+            for _ in range(num_layers)
+        ])
+
+        if final_layer:
+            self.norm_out = LastLayer(self.inner_dim, self.inner_dim, dtype=dtype, device=device, operations=operations)
+            self.proj_out = operations.Linear(self.inner_dim, patch_size * patch_size * self.out_channels, bias=True, dtype=dtype, device=device)
+
+    def process_img(self, x, index=0, h_offset=0, w_offset=0):
+        bs, c, t, h, w = x.shape
+        patch_size = self.patch_size
+        hidden_states = comfy.ldm.common_dit.pad_to_patch_size(x, (1, self.patch_size, self.patch_size))
+        orig_shape = hidden_states.shape
+        hidden_states = hidden_states.view(orig_shape[0], orig_shape[1], orig_shape[-2] // 2, 2, orig_shape[-1] // 2, 2)
+        hidden_states = hidden_states.permute(0, 2, 4, 1, 3, 5)
+        hidden_states = hidden_states.reshape(orig_shape[0], (orig_shape[-2] // 2) * (orig_shape[-1] // 2), orig_shape[1] * 4)
+        h_len = ((h + (patch_size // 2)) // patch_size)
+        w_len = ((w + (patch_size // 2)) // patch_size)
+
+        h_offset = ((h_offset + (patch_size // 2)) // patch_size)
+        w_offset = ((w_offset + (patch_size // 2)) // patch_size)
+
+        img_ids = torch.zeros((h_len, w_len, 3), device=x.device)
+        # img_ids is still all zeros here, so reading channel 1 simply yields `index`.
+        img_ids[:, :, 0] = img_ids[:, :, 1] + index
+        img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(h_offset, h_len - 1 + h_offset, steps=h_len, device=x.device, dtype=x.dtype).unsqueeze(1) - (h_len // 2)
+        img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(w_offset, w_len - 1 + w_offset, steps=w_len, device=x.device, dtype=x.dtype).unsqueeze(0) - (w_len // 2)
+        return hidden_states, repeat(img_ids, "h w c -> b (h w) c", b=bs), orig_shape
+
+    def forward(self, x, timestep, context, attention_mask=None, guidance=None, ref_latents=None, transformer_options={}, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, attention_mask, guidance, ref_latents, transformer_options, **kwargs)
+
+    def _forward(
+        self,
+        x,
+        timesteps,
+        context,
+        attention_mask=None,
+        guidance: torch.Tensor = None,
+        ref_latents=None,
+        transformer_options={},
+        control=None,
+        **kwargs
+    ):
+        timestep = timesteps
+        encoder_hidden_states = context
+        encoder_hidden_states_mask = attention_mask
+
+        hidden_states, img_ids, orig_shape = self.process_img(x)
+        num_embeds = hidden_states.shape[1]
+
+        if ref_latents is not None:
+            h = 0
+            w = 0
+            index = 0
+            index_ref_method = kwargs.get("ref_latents_method", "index") == "index"
+            for ref in ref_latents:
+                if index_ref_method:
+                    index += 1
+                    h_offset = 0
+                    w_offset = 0
+                else:
+                    index = 1
+                    h_offset = 0
+                    w_offset = 0
+                    if ref.shape[-2] + h > ref.shape[-1] + w:
+                        w_offset = w
+                    else:
+                        h_offset = h
+                    h = max(h, ref.shape[-2] + h_offset)
+                    w = max(w, ref.shape[-1] + w_offset)
+
+                kontext, kontext_ids, _ = self.process_img(ref, index=index, h_offset=h_offset, w_offset=w_offset)
+                hidden_states = torch.cat([hidden_states, kontext], dim=1)
+                img_ids = torch.cat([img_ids, kontext_ids], dim=1)
+
+        txt_start = round(max(((x.shape[-1] + (self.patch_size // 2)) // self.patch_size) // 2, ((x.shape[-2] + (self.patch_size // 2)) // self.patch_size) // 2))
+        txt_ids = torch.arange(txt_start, txt_start + context.shape[1], device=x.device).reshape(1, -1, 1).repeat(x.shape[0], 1, 3)
+        ids = torch.cat((txt_ids, img_ids), dim=1)
+        image_rotary_emb = self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
+        del ids, txt_ids, img_ids
+
+        hidden_states = self.img_in(hidden_states)
+        encoder_hidden_states = self.txt_norm(encoder_hidden_states)
+        encoder_hidden_states = self.txt_in(encoder_hidden_states)
+
+        if guidance is not None:
+            guidance = guidance * 1000
+
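+        # NOTE: QwenTimestepProjEmbeddings.forward() above accepts only (timestep, hidden_states),
+        # so the guidance branch below would raise a TypeError if guidance were ever passed.
+        temb = (
+            self.time_text_embed(timestep, hidden_states)
+            if guidance is None
+            else self.time_text_embed(timestep, guidance, hidden_states)
+        )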
+
+        patches_replace = transformer_options.get("patches_replace", {})
+        patches = transformer_options.get("patches", {})
+        blocks_replace = patches_replace.get("dit", {})
+
+        for i, block in enumerate(self.transformer_blocks):
+            if ("double_block", i) in blocks_replace:
+                def block_wrap(args):
+                    out = {}
+                    out["txt"], out["img"] = block(hidden_states=args["img"], encoder_hidden_states=args["txt"], encoder_hidden_states_mask=encoder_hidden_states_mask, temb=args["vec"], image_rotary_emb=args["pe"])
+                    return out
+                out = blocks_replace[("double_block", i)]({"img": hidden_states, "txt": encoder_hidden_states, "vec": temb, "pe": image_rotary_emb}, {"original_block": block_wrap})
+                hidden_states = out["img"]
+                encoder_hidden_states = out["txt"]
+            else:
+                encoder_hidden_states, hidden_states = block(
+                    hidden_states=hidden_states,
+                    encoder_hidden_states=encoder_hidden_states,
+                    encoder_hidden_states_mask=encoder_hidden_states_mask,
+                    temb=temb,
+                    image_rotary_emb=image_rotary_emb,
+                )
+
+            if "double_block" in patches:
+                for p in patches["double_block"]:
+                    out = p({"img": hidden_states, "txt": encoder_hidden_states, "x": x, "block_index": i})
+                    hidden_states = out["img"]
+                    encoder_hidden_states = out["txt"]
+
+            if control is not None: # Controlnet
+                control_i = control.get("input")
+                if i < len(control_i):
+                    add = control_i[i]
+                    if add is not None:
+                        hidden_states += add
+
+        hidden_states = self.norm_out(hidden_states, temb)
+        hidden_states = self.proj_out(hidden_states)
+
+        hidden_states = hidden_states[:, :num_embeds].view(orig_shape[0], orig_shape[-2] // 2, orig_shape[-1] // 2, orig_shape[1], 2, 2)
+        hidden_states = hidden_states.permute(0, 3, 1, 4, 2, 5)
+        return hidden_states.reshape(orig_shape)[:, :, :, :x.shape[-2], :x.shape[-1]]
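Note on the text position ids in the hunk above: the `txt_start` expression picks the larger of the two half-grid sizes (in patches), and the text tokens are then numbered consecutively from that offset with the same value on all three axes. A minimal pure-Python sketch of that arithmetic follows. The function names are hypothetical, and `patch_size=2` is an assumption based on the `2, 2` factors in the final `view` call.

```python
def text_start_index(height, width, patch_size=2):
    """Offset at which text position ids begin (mirrors the txt_start expression)."""
    h_len = (height + patch_size // 2) // patch_size  # latent rows, in patches
    w_len = (width + patch_size // 2) // patch_size   # latent columns, in patches
    return max(h_len // 2, w_len // 2)


def text_position_ids(height, width, num_tokens, patch_size=2):
    """One (t, h, w) id triple per text token, identical on all three axes."""
    start = text_start_index(height, width, patch_size)
    return [(i, i, i) for i in range(start, start + num_tokens)]


ids = text_position_ids(128, 96, 3)
```

For a 128x96 latent the grid is 64x48 patches, so the text ids start at 32.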
--- a/comfy/ldm/wan/model.py
+++ b/comfy/ldm/wan/model.py
@@ -11,6 +11,7 @@ from comfy.ldm.flux.layers import EmbedND
 from comfy.ldm.flux.math import apply_rope
 import comfy.ldm.common_dit
 import comfy.model_management
+import comfy.patcher_extension


 def sinusoidal_embedding_1d(dim, position):
@@ -146,6 +147,15 @@ WAN_CROSSATTENTION_CLASSES = {
 }


+def repeat_e(e, x):
+    repeats = 1
+    if e.size(1) > 1:
+        repeats = x.size(1) // e.size(1)
+    if repeats == 1:
+        return e
+    return torch.repeat_interleave(e, repeats, dim=1)
+
+
 class WanAttentionBlock(nn.Module):

    def __init__(self,
@@ -201,6 +211,7 @@ class WanAttentionBlock(nn.Module):
            freqs(Tensor): Rope freqs, shape [1024, C / num_heads / 2]
        """
        # assert e.dtype == torch.float32
+
        if e.ndim < 4:
            e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e).chunk(6, dim=1)
        else:
@@ -209,15 +220,15 @@ class WanAttentionBlock(nn.Module):

        # self-attention
        y = self.self_attn(
-            self.norm1(x) * (1 + e[1]) + e[0],
+            torch.addcmul(repeat_e(e[0], x), self.norm1(x), 1 + repeat_e(e[1], x)),
            freqs)

-        x = x + y * e[2]
+        x = torch.addcmul(x, y, repeat_e(e[2], x))

        # cross-attention & ffn
        x = x + self.cross_attn(self.norm3(x), context, context_img_len=context_img_len)
-        y = self.ffn(self.norm2(x) * (1 + e[4]) + e[3])
-        x = x + y * e[5]
+        y = self.ffn(torch.addcmul(repeat_e(e[3], x), self.norm2(x), 1 + repeat_e(e[4], x)))
+        x = torch.addcmul(x, y, repeat_e(e[5], x))
        return x


@@ -331,7 +342,8 @@ class Head(nn.Module):
            e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e.unsqueeze(1)).chunk(2, dim=1)
        else:
            e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device).unsqueeze(0) + e.unsqueeze(2)).unbind(2)
-        x = (self.head(self.norm(x) * (1 + e[1]) + e[0]))
+
+        x = (self.head(torch.addcmul(repeat_e(e[0], x), self.norm(x), 1 + repeat_e(e[1], x))))
        return x


@@ -380,6 +392,7 @@ class WanModel(torch.nn.Module):
                 cross_attn_norm=True,
                 eps=1e-6,
                 flf_pos_embed_token_number=None,
+                 in_dim_ref_conv=None,
                 image_model=None,
                 device=None,
                 dtype=None,
@@ -473,6 +486,11 @@ class WanModel(torch.nn.Module):
        else:
            self.img_emb = None

+        if in_dim_ref_conv is not None:
+            self.ref_conv = operations.Conv2d(in_dim_ref_conv, dim, kernel_size=patch_size[1:], stride=patch_size[1:], device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
+        else:
+            self.ref_conv = None
+
    def forward_orig(
        self,
        x,
@@ -515,6 +533,13 @@ class WanModel(torch.nn.Module):
        e = e.reshape(t.shape[0], -1, e.shape[-1])
        e0 = self.time_projection(e).unflatten(2, (6, self.dim))

+        full_ref = None
+        if self.ref_conv is not None:
+            full_ref = kwargs.get("reference_latent", None)
+            if full_ref is not None:
+                full_ref = self.ref_conv(full_ref).flatten(2).transpose(1, 2)
+                x = torch.concat((full_ref, x), dim=1)
+
        # context
        context = self.text_embedding(context)

@@ -541,11 +566,21 @@ class WanModel(torch.nn.Module):
        # head
        x = self.head(x, e)

+        if full_ref is not None:
+            x = x[:, full_ref.shape[1]:]
+
        # unpatchify
        x = self.unpatchify(x, grid_sizes)
        return x

    def forward(self, x, timestep, context, clip_fea=None, time_dim_concat=None, transformer_options={}, **kwargs):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self._forward,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
+        ).execute(x, timestep, context, clip_fea, time_dim_concat, transformer_options, **kwargs)
+
+    def _forward(self, x, timestep, context, clip_fea=None, time_dim_concat=None, transformer_options={}, **kwargs):
        bs, c, t, h, w = x.shape
        x = comfy.ldm.common_dit.pad_to_patch_size(x, self.patch_size)

@@ -559,6 +594,9 @@ class WanModel(torch.nn.Module):
            x = torch.cat([x, time_dim_concat], dim=2)
            t_len = ((x.shape[2] + (patch_size[0] // 2)) // patch_size[0])

+        if self.ref_conv is not None and "reference_latent" in kwargs:
+            t_len += 1
+
        img_ids = torch.zeros((t_len, h_len, w_len, 3), device=x.device, dtype=x.dtype)
        img_ids[:, :, :, 0] = img_ids[:, :, :, 0] + torch.linspace(0, t_len - 1, steps=t_len, device=x.device, dtype=x.dtype).reshape(-1, 1, 1)
        img_ids[:, :, :, 1] = img_ids[:, :, :, 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype).reshape(1, -1, 1)
@@ -738,7 +776,12 @@ class CameraWanModel(WanModel):
                 operations=None,
                 ):

-        super().__init__(model_type='i2v', patch_size=patch_size, text_len=text_len, in_dim=in_dim, dim=dim, ffn_dim=ffn_dim, freq_dim=freq_dim, text_dim=text_dim, out_dim=out_dim, num_heads=num_heads, num_layers=num_layers, window_size=window_size, qk_norm=qk_norm, cross_attn_norm=cross_attn_norm, eps=eps, flf_pos_embed_token_number=flf_pos_embed_token_number, image_model=image_model, device=device, dtype=dtype, operations=operations)
+        if model_type == 'camera':
+            model_type = 'i2v'
+        else:
+            model_type = 't2v'
+
+        super().__init__(model_type=model_type, patch_size=patch_size, text_len=text_len, in_dim=in_dim, dim=dim, ffn_dim=ffn_dim, freq_dim=freq_dim, text_dim=text_dim, out_dim=out_dim, num_heads=num_heads, num_layers=num_layers, window_size=window_size, qk_norm=qk_norm, cross_attn_norm=cross_attn_norm, eps=eps, flf_pos_embed_token_number=flf_pos_embed_token_number, image_model=image_model, device=device, dtype=dtype, operations=operations)
        operation_settings = {"operations": operations, "device": device, "dtype": dtype}

        self.control_adapter = WanCamAdapter(in_dim_control_adapter, dim, kernel_size=patch_size[1:], stride=patch_size[1:], operation_settings=operation_settings)
@@ -758,8 +801,7 @@ class CameraWanModel(WanModel):
        # embeddings
        x = self.patch_embedding(x.float()).to(x.dtype)
        if self.control_adapter is not None and camera_conditions is not None:
-            x_camera = self.control_adapter(camera_conditions).to(x.dtype)
-            x = x + x_camera
+            x = x + self.control_adapter(camera_conditions).to(x.dtype)
        grid_sizes = x.shape[2:]
        x = x.flatten(2).transpose(1, 2)
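Note on `repeat_e` and the `torch.addcmul` rewrites above: `torch.addcmul(input, t1, t2)` computes `input + t1 * t2` elementwise (with the default `value=1`), so `addcmul(e[0], norm(x), 1 + e[1])` is the same shift-and-scale as `norm(x) * (1 + e[1]) + e[0]`. `repeat_e` stretches a per-group modulation along the token axis when it has more than one entry but fewer entries than there are tokens. The sketch below reproduces that logic on plain Python lists. It is a stand-in for illustration only, and `broadcast` and `modulate` are hypothetical helper names.

```python
def repeat_e(e, x_len):
    """List version of repeat_e: e holds per-group values, x_len is the token count."""
    repeats = 1
    if len(e) > 1:
        repeats = x_len // len(e)
    if repeats == 1:
        return e
    # Same ordering as torch.repeat_interleave: each value is repeated in place.
    return [v for v in e for _ in range(repeats)]


def broadcast(values, n):
    """Mimic tensor broadcasting for a length-1 modulation."""
    return values * n if len(values) == 1 else values


def modulate(normed, shift, scale):
    """Elementwise shift + normed * (1 + scale), i.e. addcmul(shift, normed, 1 + scale)."""
    n = len(normed)
    shift = broadcast(repeat_e(shift, n), n)
    scale = broadcast(repeat_e(scale, n), n)
    return [s + x * (1 + c) for x, s, c in zip(normed, shift, scale)]


out = modulate([1.0, 1.0, 1.0, 1.0], [0.0, 10.0], [1.0, 2.0])
```

With four tokens and two modulation groups, each group covers two consecutive tokens.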

--- a/comfy/ldm/wan/vae.py
+++ b/comfy/ldm/wan/vae.py
@@ -24,12 +24,17 @@ class CausalConv3d(ops.Conv3d):
                         self.padding[1], 2 * self.padding[0], 0)
        self.padding = (0, 0, 0)

-    def forward(self, x, cache_x=None):
+    def forward(self, x, cache_x=None, cache_list=None, cache_idx=None):
+        if cache_list is not None:
+            cache_x = cache_list[cache_idx]
+            cache_list[cache_idx] = None
+
        padding = list(self._padding)
        if cache_x is not None and self._padding[4] > 0:
            cache_x = cache_x.to(x.device)
            x = torch.cat([cache_x, x], dim=2)
            padding[4] -= cache_x.shape[2]
+            del cache_x
        x = F.pad(x, padding)

        return super().forward(x)
@@ -166,7 +171,7 @@ class ResidualBlock(nn.Module):
            if in_dim != out_dim else nn.Identity()

    def forward(self, x, feat_cache=None, feat_idx=[0]):
-        h = self.shortcut(x)
+        old_x = x
        for layer in self.residual:
            if isinstance(layer, CausalConv3d) and feat_cache is not None:
                idx = feat_idx[0]
@@ -178,12 +183,12 @@ class ResidualBlock(nn.Module):
                            cache_x.device), cache_x
                    ],
                                        dim=2)
-                x = layer(x, feat_cache[idx])
+                x = layer(x, cache_list=feat_cache, cache_idx=idx)
                feat_cache[idx] = cache_x
                feat_idx[0] += 1
            else:
                x = layer(x)
-        return x + h
+        return x + self.shortcut(old_x)


 class AttentionBlock(nn.Module):
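Note on the `cache_list` / `cache_idx` parameters above: passing the list and an index, rather than the cached tensor itself, lets the callee clear the slot so that no outside reference keeps the old buffer alive while the new activations are computed. That appears to be the intent of the `cache_list[cache_idx] = None` and `del cache_x` lines. The sketch below shows the reference-handoff pattern with a plain object and a weak reference. `Buffer` and `consume` are hypothetical names, not part of the diff.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a cached tensor."""

    def __init__(self, name):
        self.name = name


def consume(cache_list, cache_idx):
    cached = cache_list[cache_idx]
    cache_list[cache_idx] = None  # the list no longer keeps the buffer alive
    name = cached.name            # last use of the cached value
    del cached                    # drop the final reference before further work
    return name


cache = [Buffer("frame-0")]
probe = weakref.ref(cache[0])  # lets us observe when the buffer is released
result = consume(cache, 0)
gc.collect()
released = probe() is None
```

Had the caller passed `cache[0]` directly, the list entry would still hold the buffer for the whole call.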
--- a/comfy/ldm/wan/vae2_2.py
+++ b/comfy/ldm/wan/vae2_2.py
@@ -136,7 +136,7 @@ class ResidualBlock(nn.Module):
            if in_dim != out_dim else nn.Identity())

    def forward(self, x, feat_cache=None, feat_idx=[0]):
-        h = self.shortcut(x)
+        old_x = x
        for layer in self.residual:
            if isinstance(layer, CausalConv3d) and feat_cache is not None:
                idx = feat_idx[0]
@@ -151,12 +151,12 @@ class ResidualBlock(nn.Module):
                        ],
                        dim=2,
                    )
-                x = layer(x, feat_cache[idx])
+                x = layer(x, cache_list=feat_cache, cache_idx=idx)
                feat_cache[idx] = cache_x
                feat_idx[0] += 1
            else:
                x = layer(x)
-        return x + h
+        return x + self.shortcut(old_x)


 def patchify(x, patch_size):
@@ -327,7 +327,7 @@ class Down_ResidualBlock(nn.Module):
        self.downsamples = nn.Sequential(*downsamples)

    def forward(self, x, feat_cache=None, feat_idx=[0]):
-        x_copy = x.clone()
+        x_copy = x
        for module in self.downsamples:
            x = module(x, feat_cache, feat_idx)

@@ -369,7 +369,7 @@ class Up_ResidualBlock(nn.Module):
        self.upsamples = nn.Sequential(*upsamples)

    def forward(self, x, feat_cache=None, feat_idx=[0], first_chunk=False):
-        x_main = x.clone()
+        x_main = x
        for module in self.upsamples:
            x_main = module(x_main, feat_cache, feat_idx)
        if self.avg_shortcut is not None:
--- a/comfy/lora.py
+++ b/comfy/lora.py
@@ -293,6 +293,16 @@ def model_lora_keys_unet(model, key_map={}):
                key_lora = k[len("diffusion_model."):-len(".weight")]
                key_map["{}".format(key_lora)] = k

+    if isinstance(model, comfy.model_base.QwenImage):
+        for k in sdk:
+            if k.startswith("diffusion_model.") and k.endswith(".weight"): #QwenImage lora format
+                key_lora = k[len("diffusion_model."):-len(".weight")]
+                # Direct mapping for transformer_blocks format (QwenImage LoRA format)
+                key_map["{}".format(key_lora)] = k
+                # Support transformer prefix format
+                key_map["transformer.{}".format(key_lora)] = k
+                key_map["lycoris_{}".format(key_lora.replace(".", "_"))] = k #SimpleTuner lycoris format
+
    return key_map
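Note on the QwenImage LoRA key mapping above: each `diffusion_model.*.weight` key is registered under three aliases (bare, `transformer.`-prefixed, and the underscore-joined `lycoris_` form). The standalone sketch below repeats that string handling without the `isinstance` check. The function name and the sample key are made up for illustration.

```python
def qwen_lora_key_map(state_dict_keys):
    """Map the LoRA key spellings from the hunk above to model weight keys."""
    key_map = {}
    for k in state_dict_keys:
        if k.startswith("diffusion_model.") and k.endswith(".weight"):
            key_lora = k[len("diffusion_model."):-len(".weight")]
            key_map[key_lora] = k                                         # direct format
            key_map["transformer.{}".format(key_lora)] = k                # transformer prefix
            key_map["lycoris_{}".format(key_lora.replace(".", "_"))] = k  # lycoris format
    return key_map


# Hypothetical key names, used only to exercise the mapping.
weight = "diffusion_model.transformer_blocks.0.attn.to_q.weight"
mapping = qwen_lora_key_map([weight, "diffusion_model.transformer_blocks.0.attn.to_q.bias"])
```

Keys that do not end in `.weight`, such as the bias above, are skipped.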


--- a/comfy/model_base.py
+++ b/comfy/model_base.py
@@ -42,6 +42,7 @@ import comfy.ldm.hidream.model
 import comfy.ldm.chroma.model
 import comfy.ldm.ace.model
 import comfy.ldm.omnigen.omnigen2
+import comfy.ldm.qwen_image.model

 import comfy.model_management
 import comfy.patcher_extension
@@ -106,10 +107,12 @@ def model_sampling(model_config, model_type):
    return ModelSampling(model_config)


-def convert_tensor(extra, dtype):
+def convert_tensor(extra, dtype, device):
    if hasattr(extra, "dtype"):
        if extra.dtype != torch.int and extra.dtype != torch.long:
-            extra = extra.to(dtype)
+            extra = comfy.model_management.cast_to_device(extra, device, dtype)
+        else:
+            extra = comfy.model_management.cast_to_device(extra, device, None)
    return extra


@@ -160,7 +163,7 @@ class BaseModel(torch.nn.Module):
        xc = self.model_sampling.calculate_input(sigma, x)

        if c_concat is not None:
-            xc = torch.cat([xc] + [c_concat], dim=1)
+            xc = torch.cat([xc] + [comfy.model_management.cast_to_device(c_concat, xc.device, xc.dtype)], dim=1)

        context = c_crossattn
        dtype = self.get_dtype()
@@ -169,20 +172,21 @@ class BaseModel(torch.nn.Module):
            dtype = self.manual_cast_dtype

        xc = xc.to(dtype)
+        device = xc.device
        t = self.model_sampling.timestep(t).float()
        if context is not None:
-            context = context.to(dtype)
+            context = comfy.model_management.cast_to_device(context, device, dtype)

        extra_conds = {}
        for o in kwargs:
            extra = kwargs[o]

            if hasattr(extra, "dtype"):
-                extra = convert_tensor(extra, dtype)
+                extra = convert_tensor(extra, dtype, device)
            elif isinstance(extra, list):
                ex = []
                for ext in extra:
-                    ex.append(convert_tensor(ext, dtype))
+                    ex.append(convert_tensor(ext, dtype, device))
                extra = ex
            extra_conds[o] = extra

@@ -398,7 +402,7 @@ class SD21UNCLIP(BaseModel):
        unclip_conditioning = kwargs.get("unclip_conditioning", None)
        device = kwargs["device"]
        if unclip_conditioning is None:
-            return torch.zeros((1, self.adm_channels))
+            return torch.zeros((1, self.adm_channels), device=device)
        else:
            return unclip_adm(unclip_conditioning, device, self.noise_augmentor, kwargs.get("unclip_noise_augment_merge", 0.05), kwargs.get("seed", 0) - 10)

@@ -612,9 +616,11 @@ class IP2P:

        if image is None:
            image = torch.zeros_like(noise)
+        else:
+            image = image.to(device=device)

        if image.shape[1:] != noise.shape[1:]:
-            image = utils.common_upscale(image.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
+            image = utils.common_upscale(image, noise.shape[-1], noise.shape[-2], "bilinear", "center")

        image = utils.resize_to_batch_size(image, noise.shape[0])
        return self.process_ip2p_image_in(image)
@@ -693,7 +699,7 @@ class StableCascade_B(BaseModel):
        #size of prior doesn't really matter if zeros because it gets resized but I still want it to get batched
        prior = kwargs.get("stable_cascade_prior", torch.zeros((1, 16, (noise.shape[2] * 4) // 42, (noise.shape[3] * 4) // 42), dtype=noise.dtype, layout=noise.layout, device=noise.device))

-        out["effnet"] = comfy.conds.CONDRegular(prior)
+        out["effnet"] = comfy.conds.CONDRegular(prior.to(device=noise.device))
        out["sca"] = comfy.conds.CONDRegular(torch.zeros((1,)))
        return out

@@ -884,6 +890,10 @@ class Flux(BaseModel):
            for lat in ref_latents:
                latents.append(self.process_latent_in(lat))
            out['ref_latents'] = comfy.conds.CONDList(latents)
+
+            ref_latents_method = kwargs.get("reference_latents_method", None)
+            if ref_latents_method is not None:
+                out['ref_latents_method'] = comfy.conds.CONDConstant(ref_latents_method)
        return out

    def extra_conds_shapes(self, **kwargs):
@@ -1118,7 +1128,11 @@ class WAN21(BaseModel):
                mask = mask.repeat(1, 4, 1, 1, 1)
            mask = utils.resize_to_batch_size(mask, noise.shape[0])

-        return torch.cat((mask, image), dim=1)
+        concat_mask_index = kwargs.get("concat_mask_index", 0)
+        if concat_mask_index != 0:
+            return torch.cat((image[:, :concat_mask_index], mask, image[:, concat_mask_index:]), dim=1)
+        else:
+            return torch.cat((mask, image), dim=1)

    def extra_conds(self, **kwargs):
        out = super().extra_conds(**kwargs)
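Note on `concat_mask_index` in the hunk above: a non-zero index splices the mask channels into the image channels at that position, while the default of 0 keeps the previous mask-first layout. The sketch below shows the same slicing on lists of channel labels instead of tensors. The function name is hypothetical.

```python
def concat_mask(image_channels, mask_channels, concat_mask_index=0):
    """List analogue of the torch.cat calls along dim=1."""
    if concat_mask_index != 0:
        return (image_channels[:concat_mask_index]
                + mask_channels
                + image_channels[concat_mask_index:])
    return mask_channels + image_channels


default_layout = concat_mask(["i0", "i1", "i2"], ["m0"])
spliced_layout = concat_mask(["i0", "i1", "i2"], ["m0"], concat_mask_index=2)
```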
@@ -1134,6 +1148,10 @@ class WAN21(BaseModel):
        if time_dim_concat is not None:
            out['time_dim_concat'] = comfy.conds.CONDRegular(self.process_latent_in(time_dim_concat))

+        reference_latents = kwargs.get("reference_latents", None)
+        if reference_latents is not None:
+            out['reference_latent'] = comfy.conds.CONDRegular(self.process_latent_in(reference_latents[-1])[:, :, 0])
+
        return out


@@ -1158,10 +1176,10 @@ class WAN21_Vace(WAN21):

        vace_frames_out = []
        for j in range(len(vace_frames)):
-            vf = vace_frames[j].clone()
+            vf = vace_frames[j].to(device=noise.device, dtype=noise.dtype, copy=True)
            for i in range(0, vf.shape[1], 16):
                vf[:, i:i + 16] = self.process_latent_in(vf[:, i:i + 16])
-            vf = torch.cat([vf, mask[j]], dim=1)
+            vf = torch.cat([vf, mask[j].to(device=noise.device, dtype=noise.dtype)], dim=1)
            vace_frames_out.append(vf)

        vace_frames = torch.stack(vace_frames_out, dim=1)
@@ -1202,7 +1220,7 @@ class WAN22(BaseModel):
    def process_timestep(self, timestep, x, denoise_mask=None, **kwargs):
        if denoise_mask is None:
            return timestep
-        temp_ts = (torch.mean(denoise_mask[:, :, :, ::2, ::2], dim=1, keepdim=True) * timestep.view([timestep.shape[0]] + [1] * (denoise_mask.ndim - 1))).reshape(timestep.shape[0], -1)
+        temp_ts = (torch.mean(denoise_mask[:, :, :, :, :], dim=(1, 3, 4), keepdim=True) * timestep.view([timestep.shape[0]] + [1] * (denoise_mask.ndim - 1))).reshape(timestep.shape[0], -1)
        return temp_ts

    def scale_latent_inpaint(self, sigma, noise, latent_image, **kwargs):
@@ -1303,3 +1321,32 @@ class Omnigen2(BaseModel):
        if ref_latents is not None:
            out['ref_latents'] = list([1, 16, sum(map(lambda a: math.prod(a.size()), ref_latents)) // 16])
        return out
+
+class QwenImage(BaseModel):
+    def __init__(self, model_config, model_type=ModelType.FLUX, device=None):
+        super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.qwen_image.model.QwenImageTransformer2DModel)
+        self.memory_usage_factor_conds = ("ref_latents",)
+
+    def extra_conds(self, **kwargs):
+        out = super().extra_conds(**kwargs)
+        cross_attn = kwargs.get("cross_attn", None)
+        if cross_attn is not None:
+            out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
+        ref_latents = kwargs.get("reference_latents", None)
+        if ref_latents is not None:
+            latents = []
+            for lat in ref_latents:
+                latents.append(self.process_latent_in(lat))
+            out['ref_latents'] = comfy.conds.CONDList(latents)
+
+            ref_latents_method = kwargs.get("reference_latents_method", None)
+            if ref_latents_method is not None:
+                out['ref_latents_method'] = comfy.conds.CONDConstant(ref_latents_method)
+        return out
+
+    def extra_conds_shapes(self, **kwargs):
+        out = {}
+        ref_latents = kwargs.get("reference_latents", None)
+        if ref_latents is not None:
+            out['ref_latents'] = list([1, 16, sum(map(lambda a: math.prod(a.size()), ref_latents)) // 16])
+        return out
--- a/comfy/model_detection.py
+++ b/comfy/model_detection.py
@@ -364,7 +364,10 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
            dit_config["vace_in_dim"] = state_dict['{}vace_patch_embedding.weight'.format(key_prefix)].shape[1]
            dit_config["vace_layers"] = count_blocks(state_dict_keys, '{}vace_blocks.'.format(key_prefix) + '{}.')
        elif '{}control_adapter.conv.weight'.format(key_prefix) in state_dict_keys:
-            dit_config["model_type"] = "camera"
+            if '{}img_emb.proj.0.bias'.format(key_prefix) in state_dict_keys:
+                dit_config["model_type"] = "camera"
+            else:
+                dit_config["model_type"] = "camera_2.2"
        else:
            if '{}img_emb.proj.0.bias'.format(key_prefix) in state_dict_keys:
                dit_config["model_type"] = "i2v"
@@ -373,6 +376,11 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
        flf_weight = state_dict.get('{}img_emb.emb_pos'.format(key_prefix))
        if flf_weight is not None:
            dit_config["flf_pos_embed_token_number"] = flf_weight.shape[1]
+
+        ref_conv_weight = state_dict.get('{}ref_conv.weight'.format(key_prefix))
+        if ref_conv_weight is not None:
+            dit_config["in_dim_ref_conv"] = ref_conv_weight.shape[1]
+
        return dit_config

    if '{}latent_in.weight'.format(key_prefix) in state_dict_keys:  # Hunyuan 3D
@@ -481,6 +489,13 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
        dit_config["timestep_scale"] = 1000.0
        return dit_config

+    if '{}txt_norm.weight'.format(key_prefix) in state_dict_keys:  # Qwen Image
+        dit_config = {}
+        dit_config["image_model"] = "qwen_image"
+        dit_config["in_channels"] = state_dict['{}img_in.weight'.format(key_prefix)].shape[1]
+        dit_config["num_layers"] = count_blocks(state_dict_keys, '{}transformer_blocks.'.format(key_prefix) + '{}.')
+        return dit_config
+
    if '{}input_blocks.0.0.weight'.format(key_prefix) not in state_dict_keys:
        return None

@@ -867,7 +882,7 @@ def convert_diffusers_mmdit(state_dict, output_prefix=""):
        depth_single_blocks = count_blocks(state_dict, 'single_transformer_blocks.{}.')
        hidden_size = state_dict["x_embedder.bias"].shape[0]
        sd_map = comfy.utils.flux_to_diffusers({"depth": depth, "depth_single_blocks": depth_single_blocks, "hidden_size": hidden_size}, output_prefix=output_prefix)
-    elif 'transformer_blocks.0.attn.add_q_proj.weight' in state_dict: #SD3
+    elif 'transformer_blocks.0.attn.add_q_proj.weight' in state_dict and 'pos_embed.proj.weight' in state_dict: #SD3
        num_blocks = count_blocks(state_dict, 'transformer_blocks.{}.')
        depth = state_dict["pos_embed.proj.weight"].shape[0] // 64
        sd_map = comfy.utils.mmdit_to_diffusers({"depth": depth, "num_blocks": num_blocks}, output_prefix=output_prefix)
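Note on the camera detection change above: a checkpoint with `control_adapter.conv.weight` is now classified as `camera` only if it also carries `img_emb.proj.0.bias`, and as `camera_2.2` otherwise. The `CameraWanModel` constructor in `comfy/ldm/wan/model.py` then maps `camera` to an `i2v` base and anything else to `t2v`. The sketch below combines both decisions. The function names are hypothetical, and the VACE branch that precedes this check in the real code is deliberately left out.

```python
def detect_camera_variant(state_dict_keys, key_prefix=""):
    """Return the camera model_type, or None if this is not a camera checkpoint."""
    if "{}control_adapter.conv.weight".format(key_prefix) not in state_dict_keys:
        return None  # other model types are out of scope for this sketch
    if "{}img_emb.proj.0.bias".format(key_prefix) in state_dict_keys:
        return "camera"
    return "camera_2.2"


def base_model_type(camera_variant):
    """Mirror of the model_type remapping in CameraWanModel.__init__."""
    return "i2v" if camera_variant == "camera" else "t2v"


prefix = "model.diffusion_model."
wan21_keys = {prefix + "control_adapter.conv.weight", prefix + "img_emb.proj.0.bias"}
wan22_keys = {prefix + "control_adapter.conv.weight"}
```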
--- a/comfy/model_management.py
+++ b/comfy/model_management.py
@@ -78,7 +78,6 @@ try:
    torch_version = torch.version.__version__
    temp = torch_version.split(".")
    torch_version_numeric = (int(temp[0]), int(temp[1]))
-    xpu_available = (torch_version_numeric[0] < 2 or (torch_version_numeric[0] == 2 and torch_version_numeric[1] <= 4)) and torch.xpu.is_available()
 except:
    pass

@@ -102,10 +101,14 @@ if args.directml is not None:

 try:
    import intel_extension_for_pytorch as ipex  # noqa: F401
-    _ = torch.xpu.device_count()
-    xpu_available = xpu_available or torch.xpu.is_available()
 except:
-    xpu_available = xpu_available or (hasattr(torch, "xpu") and torch.xpu.is_available())
+    pass
+
+try:
+    _ = torch.xpu.device_count()
+    xpu_available = torch.xpu.is_available()
+except:
+    xpu_available = False

 try:
    if torch.backends.mps.is_available():
@@ -321,9 +324,9 @@ try:
            if torch_version_numeric >= (2, 7):  # works on 2.6 but doesn't actually seem to improve much
                if any((a in arch) for a in ["gfx90a", "gfx942", "gfx1100", "gfx1101", "gfx1151"]):  # TODO: more arches, TODO: gfx950
                    ENABLE_PYTORCH_ATTENTION = True
-            if torch_version_numeric >= (2, 8):
-                if any((a in arch) for a in ["gfx1201"]):
-                    ENABLE_PYTORCH_ATTENTION = True
+#            if torch_version_numeric >= (2, 8):
+#                if any((a in arch) for a in ["gfx1201"]):
+#                    ENABLE_PYTORCH_ATTENTION = True
        if torch_version_numeric >= (2, 7) and rocm_version >= (6, 4):
            if any((a in arch) for a in ["gfx1201", "gfx942", "gfx950"]):  # TODO: more arches
                SUPPORT_FP8_OPS = True
@@ -340,7 +343,7 @@ if ENABLE_PYTORCH_ATTENTION:

 PRIORITIZE_FP16 = False  # TODO: remove and replace with something that shows exactly which dtype is faster than the other
 try:
-    if is_nvidia() and PerformanceFeature.Fp16Accumulation in args.fast:
+    if (is_nvidia() or is_amd()) and PerformanceFeature.Fp16Accumulation in args.fast:
        torch.backends.cuda.matmul.allow_fp16_accumulation = True
        PRIORITIZE_FP16 = True  # TODO: limit to cards where it actually boosts performance
        logging.info("Enabled fp16 accumulation.")
@@ -529,6 +532,8 @@ WINDOWS = any(platform.win32_ver())
 EXTRA_RESERVED_VRAM = 400 * 1024 * 1024
 if WINDOWS:
    EXTRA_RESERVED_VRAM = 600 * 1024 * 1024 #Windows is higher because of the shared vram issue
+    if total_vram > (15 * 1024):  # more extra reserved vram on 16GB+ cards
+        EXTRA_RESERVED_VRAM += 100 * 1024 * 1024

 if args.reserve_vram is not None:
    EXTRA_RESERVED_VRAM = args.reserve_vram * 1024 * 1024 * 1024
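Note on the reserved-VRAM change above: the default reserve is 400 MiB, 600 MiB on Windows, and now 700 MiB on Windows cards that report more than 15 * 1024 of total VRAM. An explicit `--reserve-vram` value (in GiB) still overrides all of these. The sketch below restates the arithmetic as a function. The function name is hypothetical, and reading `total_vram` as MiB is an assumption inferred from the `15 * 1024` comparison and its "16GB+" comment.

```python
MIB = 1024 * 1024


def extra_reserved_vram(is_windows, total_vram_mib, reserve_vram_gib=None):
    """Bytes of VRAM held back, following the branches in the hunk above."""
    reserved = 400 * MIB
    if is_windows:
        reserved = 600 * MIB
        if total_vram_mib > 15 * 1024:  # more headroom on 16GB+ cards
            reserved += 100 * MIB
    if reserve_vram_gib is not None:    # explicit user setting wins
        reserved = reserve_vram_gib * 1024 * MIB
    return reserved
```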
@@ -588,7 +593,13 @@ def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimu
    else:
        minimum_memory_required = max(inference_memory, minimum_memory_required + extra_reserved_memory())

-    models = set(models)
+    models_temp = set()
+    for m in models:
+        models_temp.add(m)
+        for mm in m.model_patches_models():
+            models_temp.add(mm)
+
+    models = models_temp

    models_to_load = []

@@ -944,10 +955,12 @@ def pick_weight_dtype(dtype, fallback_dtype, device=None):
    return dtype

 def device_supports_non_blocking(device):
+    if args.force_non_blocking:
+        return True
    if is_device_mps(device):
        return False #pytorch bug? mps doesn't support non blocking
-    if is_intel_xpu():
-        return True
+    if is_intel_xpu(): #xpu does support non blocking but it is slower on iGPUs for some reason so disable by default until situation changes
+        return False
    if args.deterministic: #TODO: figure out why deterministic breaks non blocking from gpu to cpu (previews)
        return False
    if directml_enabled:
@@ -1280,10 +1293,10 @@ def should_use_bf16(device=None, model_params=0, prioritize_performance=True, ma
        return False

    if is_intel_xpu():
-        if torch_version_numeric < (2, 6):
+        if torch_version_numeric < (2, 3):
            return True
        else:
-            return torch.xpu.get_device_capability(device)['has_bfloat16_conversions']
+            return torch.xpu.is_bf16_supported()

    if is_ascend_npu():
        return True
--- a/comfy/model_patcher.py
+++ b/comfy/model_patcher.py
@@ -430,6 +430,9 @@ class ModelPatcher:
    def set_model_forward_timestep_embed_patch(self, patch):
        self.set_model_patch(patch, "forward_timestep_embed_patch")

+    def set_model_double_block_patch(self, patch):
+        self.set_model_patch(patch, "double_block")
+
    def add_object_patch(self, name, obj):
        self.object_patches[name] = obj

@@ -486,6 +489,30 @@ class ModelPatcher:
            if hasattr(wrap_func, "to"):
                self.model_options["model_function_wrapper"] = wrap_func.to(device)

+    def model_patches_models(self):
+        to = self.model_options["transformer_options"]
+        models = []
+        if "patches" in to:
+            patches = to["patches"]
+            for name in patches:
+                patch_list = patches[name]
+                for i in range(len(patch_list)):
+                    if hasattr(patch_list[i], "models"):
+                        models += patch_list[i].models()
+        if "patches_replace" in to:
+            patches = to["patches_replace"]
+            for name in patches:
+                patch_list = patches[name]
+                for k in patch_list:
+                    if hasattr(patch_list[k], "models"):
+                        models += patch_list[k].models()
+        if "model_function_wrapper" in self.model_options:
+            wrap_func = self.model_options["model_function_wrapper"]
+            if hasattr(wrap_func, "models"):
+                models += wrap_func.models()
+
+        return models
+
    def model_dtype(self):
        if hasattr(self.model, "get_dtype"):
            return self.model.get_dtype()
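Note on `model_patches_models` above: it walks `patches` (lists per name), `patches_replace` (dicts per name) and the optional `model_function_wrapper`, and collects whatever each object's `models()` method returns. `load_models_gpu` can then load those auxiliary models alongside the main one. The sketch below reproduces the traversal on plain dictionaries. `FakePatch` and `collect_patch_models` are hypothetical names.

```python
class FakePatch:
    """Stand-in for a patch object that owns extra models."""

    def __init__(self, *models):
        self._models = list(models)

    def models(self):
        return list(self._models)


def collect_patch_models(model_options):
    to = model_options["transformer_options"]
    found = []
    for patch_list in to.get("patches", {}).values():
        for patch in patch_list:
            if hasattr(patch, "models"):   # plain callables are skipped
                found += patch.models()
    for replacements in to.get("patches_replace", {}).values():
        for patch in replacements.values():
            if hasattr(patch, "models"):
                found += patch.models()
    wrapper = model_options.get("model_function_wrapper")
    if hasattr(wrapper, "models"):
        found += wrapper.models()
    return found


options = {
    "transformer_options": {
        "patches": {"double_block": [FakePatch("controlnet"), lambda args: args]},
        "patches_replace": {"dit": {("double_block", 0): FakePatch("adapter")}},
    }
}
collected = collect_patch_models(options)
```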
--- a/comfy/ops.py
+++ b/comfy/ops.py
@@ -24,6 +24,32 @@ import comfy.float
 import comfy.rmsnorm
 import contextlib

+
+def scaled_dot_product_attention(q, k, v, *args, **kwargs):
+    return torch.nn.functional.scaled_dot_product_attention(q, k, v, *args, **kwargs)
+
+
+try:
+    if torch.cuda.is_available():
+        from torch.nn.attention import SDPBackend, sdpa_kernel
+        import inspect
+        if "set_priority" in inspect.signature(sdpa_kernel).parameters:
+            SDPA_BACKEND_PRIORITY = [
+                SDPBackend.FLASH_ATTENTION,
+                SDPBackend.EFFICIENT_ATTENTION,
+                SDPBackend.MATH,
+            ]
+
+            SDPA_BACKEND_PRIORITY.insert(0, SDPBackend.CUDNN_ATTENTION)
+
+            def scaled_dot_product_attention(q, k, v, *args, **kwargs):
+                with sdpa_kernel(SDPA_BACKEND_PRIORITY, set_priority=True):
+                    return torch.nn.functional.scaled_dot_product_attention(q, k, v, *args, **kwargs)
+        else:
+            logging.warning("Torch version too old to set sdpa backend priority.")
+except (ModuleNotFoundError, TypeError):
+    logging.warning("Could not set sdpa backend priority.")
+
 cast_to = comfy.model_management.cast_to #TODO: remove once no more references

 def cast_to_input(weight, input, non_blocking=False, copy=True):
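Note on the `inspect.signature` check above: the priority list is used only if `sdpa_kernel` actually accepts a `set_priority` argument, so older torch builds keep the plain attention function. The sketch below isolates that feature-detection pattern with two stand-in functions, so it runs without torch. `old_kernel`, `new_kernel` and `make_runner` are hypothetical, and the fallback here simply omits the keyword rather than logging a warning.

```python
import inspect


def supports_kwarg(func, name):
    """True if `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


def old_kernel(backends):
    return ("old", backends)


def new_kernel(backends, set_priority=False):
    return ("new", backends, set_priority)


def make_runner(kernel):
    """Pick the call style once, based on what the kernel supports."""
    if supports_kwarg(kernel, "set_priority"):
        return lambda backends: kernel(backends, set_priority=True)
    return lambda backends: kernel(backends)


old_result = make_runner(old_kernel)(["flash"])
new_result = make_runner(new_kernel)(["flash"])
```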
--- a/comfy/patcher_extension.py
+++ b/comfy/patcher_extension.py
@@ -50,6 +50,7 @@ class WrappersMP:
    OUTER_SAMPLE = "outer_sample"
    PREPARE_SAMPLING = "prepare_sampling"
    SAMPLER_SAMPLE = "sampler_sample"
+    PREDICT_NOISE = "predict_noise"
    CALC_COND_BATCH = "calc_cond_batch"
    APPLY_MODEL = "apply_model"
    DIFFUSION_MODEL = "diffusion_model"
--- a/comfy/rmsnorm.py
+++ b/comfy/rmsnorm.py
@@ -1,6 +1,7 @@
 import torch
 import comfy.model_management
 import numbers
+import logging

 RMSNorm = None

@@ -9,6 +10,7 @@ try:
    RMSNorm = torch.nn.RMSNorm
 except:
    rms_norm_torch = None
+    logging.warning("Please update pytorch to use native RMSNorm")


 def rms_norm(x, weight=None, eps=1e-6):
--- a/comfy/sampler_helpers.py
+++ b/comfy/sampler_helpers.py
@@ -149,7 +149,7 @@ def cleanup_models(conds, models):

    cleanup_additional_models(set(control_cleanup))

-def prepare_model_patcher(model: 'ModelPatcher', conds, model_options: dict):
+def prepare_model_patcher(model: ModelPatcher, conds, model_options: dict):
    '''
    Registers hooks from conds.
    '''
@@ -158,8 +158,8 @@ def prepare_model_patcher(model: 'ModelPatcher', conds, model_options: dict):
    for k in conds:
        get_hooks_from_cond(conds[k], hooks)
    # add wrappers and callbacks from ModelPatcher to transformer_options
-    model_options["transformer_options"]["wrappers"] = comfy.patcher_extension.copy_nested_dicts(model.wrappers)
-    model_options["transformer_options"]["callbacks"] = comfy.patcher_extension.copy_nested_dicts(model.callbacks)
+    comfy.patcher_extension.merge_nested_dicts(model_options["transformer_options"].setdefault("wrappers", {}), model.wrappers, copy_dict1=False)
+    comfy.patcher_extension.merge_nested_dicts(model_options["transformer_options"].setdefault("callbacks", {}), model.callbacks, copy_dict1=False)
    # begin registering hooks
    registered = comfy.hooks.HookGroup()
    target_dict = comfy.hooks.create_target_dict(comfy.hooks.EnumWeightTarget.Model)
--- a/comfy/samplers.py
+++ b/comfy/samplers.py
@@ -16,6 +16,8 @@ import comfy.sampler_helpers
 import comfy.model_patcher
 import comfy.patcher_extension
 import comfy.hooks
+import comfy.context_windows
+import comfy.utils
 import scipy.stats
 import numpy

@@ -60,7 +62,7 @@ def get_area_and_mult(conds, x_in, timestep_in):
        if "mask_strength" in conds:
            mask_strength = conds["mask_strength"]
        mask = conds['mask']
-        assert (mask.shape[1:] == x_in.shape[2:])
+        # assert (mask.shape[1:] == x_in.shape[2:])

        mask = mask[:input_x.shape[0]]
        if area is not None:
@@ -68,7 +70,7 @@ def get_area_and_mult(conds, x_in, timestep_in):
                mask = mask.narrow(i + 1, area[len(dims) + i], area[i])

        mask = mask * mask_strength
-        mask = mask.unsqueeze(1).repeat(input_x.shape[0] // mask.shape[0], input_x.shape[1], 1, 1)
+        mask = mask.unsqueeze(1).repeat((input_x.shape[0] // mask.shape[0], input_x.shape[1]) + (1, ) * (mask.ndim - 1))
    else:
        mask = torch.ones_like(input_x)
    mult = mask * strength
@@ -89,7 +91,7 @@ def get_area_and_mult(conds, x_in, timestep_in):
    conditioning = {}
    model_conds = conds["model_conds"]
    for c in model_conds:
-        conditioning[c] = model_conds[c].process_cond(batch_size=x_in.shape[0], device=x_in.device, area=area)
+        conditioning[c] = model_conds[c].process_cond(batch_size=x_in.shape[0], area=area)

    hooks = conds.get('hooks', None)
    control = conds.get('control', None)
@@ -198,14 +200,20 @@ def finalize_default_conds(model: 'BaseModel', hooked_to_run: dict[comfy.hooks.H
            hooked_to_run.setdefault(p.hooks, list())
            hooked_to_run[p.hooks] += [(p, i)]

-def calc_cond_batch(model: 'BaseModel', conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
+def calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options: dict[str]):
+    handler: comfy.context_windows.ContextHandlerABC = model_options.get("context_handler", None)
+    if handler is None or not handler.should_use_context(model, conds, x_in, timestep, model_options):
+        return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
+    return handler.execute(_calc_cond_batch_outer, model, conds, x_in, timestep, model_options)
+
+def _calc_cond_batch_outer(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
    executor = comfy.patcher_extension.WrapperExecutor.new_executor(
        _calc_cond_batch,
        comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.CALC_COND_BATCH, model_options, is_model_options=True)
    )
    return executor.execute(model, conds, x_in, timestep, model_options)

-def _calc_cond_batch(model: 'BaseModel', conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
+def _calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
    out_conds = []
    out_counts = []
    # separate conds by matching hooks
@@ -546,7 +554,10 @@ def resolve_areas_and_cond_masks_multidim(conditions, dims, device):
            if len(mask.shape) == len(dims):
                mask = mask.unsqueeze(0)
            if mask.shape[1:] != dims:
-                mask = torch.nn.functional.interpolate(mask.unsqueeze(1), size=dims, mode='bilinear', align_corners=False).squeeze(1)
+                if mask.ndim < 4:
+                    mask = comfy.utils.common_upscale(mask.unsqueeze(1), dims[-1], dims[-2], 'bilinear', 'none').squeeze(1)
+                else:
+                    mask = comfy.utils.common_upscale(mask, dims[-1], dims[-2], 'bilinear', 'none')

            if modified.get("set_area_to_bounds", False): #TODO: handle dim != 2
                bounds = torch.max(torch.abs(mask),dim=0).values.unsqueeze(0)
@@ -946,7 +957,14 @@ class CFGGuider:
            self.original_conds[k] = comfy.sampler_helpers.convert_cond(conds[k])

    def __call__(self, *args, **kwargs):
-        return self.predict_noise(*args, **kwargs)
+        return self.outer_predict_noise(*args, **kwargs)
+
+    def outer_predict_noise(self, x, timestep, model_options={}, seed=None):
+        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
+            self.predict_noise,
+            self,
+            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.PREDICT_NOISE, self.model_options, is_model_options=True)
+        ).execute(x, timestep, model_options, seed)

    def predict_noise(self, x, timestep, model_options={}, seed=None):
        return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
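Reviewer note on the PREDICT_NOISE wrapper point: the hunk above routes `CFGGuider.__call__` through `outer_predict_noise`, which runs `predict_noise` inside a wrapper executor. A minimal sketch of that chaining idea follows. `MiniExecutor` and the two example wrappers are hypothetical stand-ins written for illustration, not ComfyUI's `WrapperExecutor`. The convention that a wrapper receives the executor as its first argument and calls it to continue the chain is an assumption here.

```python
from typing import Callable, List


class MiniExecutor:
    """Hypothetical stand-in for a wrapper executor; not ComfyUI's class."""

    def __init__(self, original: Callable, wrappers: List[Callable], idx: int = 0):
        self.original = original
        self.wrappers = wrappers
        self.idx = idx

    def __call__(self, *args, **kwargs):
        # No wrappers left: run the wrapped function itself.
        if self.idx >= len(self.wrappers):
            return self.original(*args, **kwargs)
        # Otherwise hand control to the next wrapper, passing an executor
        # that continues the chain from the following position.
        nxt = MiniExecutor(self.original, self.wrappers, self.idx + 1)
        return self.wrappers[self.idx](nxt, *args, **kwargs)

    def execute(self, *args, **kwargs):
        return self(*args, **kwargs)


def predict_noise(x, timestep):
    # Toy replacement for the real prediction function.
    return x * timestep


def double_output(executor, x, timestep):
    # Outer wrapper: post-processes the result of the rest of the chain.
    return 2 * executor(x, timestep)


def shift_input(executor, x, timestep):
    # Inner wrapper: modifies the input before the wrapped function runs.
    return executor(x + 1, timestep)


result = MiniExecutor(predict_noise, [double_output, shift_input]).execute(3, 10)
```

With an empty wrapper list the executor calls the wrapped function directly, so this sketch keeps the unwrapped call path unchanged.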
--- a/comfy/sd.py
+++ b/comfy/sd.py
@@ -47,6 +47,7 @@ import comfy.text_encoders.wan
 import comfy.text_encoders.hidream
 import comfy.text_encoders.ace
 import comfy.text_encoders.omnigen2
+import comfy.text_encoders.qwen_image

 import comfy.model_patcher
 import comfy.lora
@@ -771,6 +772,7 @@ class CLIPType(Enum):
    CHROMA = 15
    ACE = 16
    OMNIGEN2 = 17
+    QWEN_IMAGE = 18


 def load_clip(ckpt_paths, embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
@@ -791,6 +793,7 @@ class TEModel(Enum):
    T5_XXL_OLD = 8
    GEMMA_2_2B = 9
    QWEN25_3B = 10
+    QWEN25_7B = 11

 def detect_te_model(sd):
    if "text_model.encoder.layers.30.mlp.fc1.weight" in sd:
@@ -812,7 +815,11 @@ def detect_te_model(sd):
    if 'model.layers.0.post_feedforward_layernorm.weight' in sd:
        return TEModel.GEMMA_2_2B
    if 'model.layers.0.self_attn.k_proj.bias' in sd:
-        return TEModel.QWEN25_3B
+        weight = sd['model.layers.0.self_attn.k_proj.bias']
+        if weight.shape[0] == 256:
+            return TEModel.QWEN25_3B
+        if weight.shape[0] == 512:
+            return TEModel.QWEN25_7B
    if "model.layers.0.post_attention_layernorm.weight" in sd:
        return TEModel.LLAMA3_8
    return None
@@ -917,6 +924,9 @@ def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip
        elif te_model == TEModel.QWEN25_3B:
            clip_target.clip = comfy.text_encoders.omnigen2.te(**llama_detect(clip_data))
            clip_target.tokenizer = comfy.text_encoders.omnigen2.Omnigen2Tokenizer
+        elif te_model == TEModel.QWEN25_7B:
+            clip_target.clip = comfy.text_encoders.qwen_image.te(**llama_detect(clip_data))
+            clip_target.tokenizer = comfy.text_encoders.qwen_image.QwenImageTokenizer
        else:
            # clip_l
            if clip_type == CLIPType.SD3:
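Reviewer note on the `detect_te_model` change: both Qwen2.5 text encoders carry a `k_proj` bias, so the hunk above tells them apart by the length of that bias (256 or 512). For the 7B config added elsewhere in this diff, 512 matches `num_key_value_heads` (4) times `head_dim` (128). The sketch below restates the branch on plain shape tuples instead of tensors. `detect_qwen_variant` and the `shapes` mapping are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class TEModel(Enum):
    # Only the two members relevant to this sketch, values as in the diff.
    QWEN25_3B = 10
    QWEN25_7B = 11


K_PROJ_BIAS = "model.layers.0.self_attn.k_proj.bias"


def detect_qwen_variant(shapes: Dict[str, Tuple[int, ...]]) -> Optional[TEModel]:
    """Pick a Qwen2.5 variant from the k_proj bias length, or None if unknown."""
    shape = shapes.get(K_PROJ_BIAS)
    if shape is None:
        return None
    if shape[0] == 256:
        return TEModel.QWEN25_3B
    if shape[0] == 512:
        return TEModel.QWEN25_7B
    # Any other size is left undecided, mirroring the fall-through in the diff.
    return None


variant = detect_qwen_variant({K_PROJ_BIAS: (4 * 128,)})
```

In the real function an unmatched size falls through to the later key checks rather than returning immediately. The sketch collapses that into `None`.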
--- a/comfy/sd1_clip.py
+++ b/comfy/sd1_clip.py
@@ -204,17 +204,19 @@ class SDClipModel(torch.nn.Module, ClipTokenWeightEncoder):
            tokens_embed = self.transformer.get_input_embeddings()(tokens_embed, out_dtype=torch.float32)
            index = 0
            pad_extra = 0
+            embeds_info = []
            for o in other_embeds:
                emb = o[1]
                if torch.is_tensor(emb):
                    emb = {"type": "embedding", "data": emb}

+                extra = None
                emb_type = emb.get("type", None)
                if emb_type == "embedding":
                    emb = emb.get("data", None)
                else:
                    if hasattr(self.transformer, "preprocess_embed"):
-                        emb = self.transformer.preprocess_embed(emb, device=device)
+                        emb, extra = self.transformer.preprocess_embed(emb, device=device)
                    else:
                        emb = None

@@ -229,6 +231,7 @@ class SDClipModel(torch.nn.Module, ClipTokenWeightEncoder):
                    tokens_embed = torch.cat([tokens_embed[:, :ind], emb, tokens_embed[:, ind:]], dim=1)
                    attention_mask = attention_mask[:ind] + [1] * emb_shape + attention_mask[ind:]
                    index += emb_shape - 1
+                    embeds_info.append({"type": emb_type, "index": ind, "size": emb_shape, "extra": extra})
                else:
                    index += -1
                    pad_extra += emb_shape
@@ -243,11 +246,11 @@ class SDClipModel(torch.nn.Module, ClipTokenWeightEncoder):
            attention_masks.append(attention_mask)
            num_tokens.append(sum(attention_mask))

-        return torch.cat(embeds_out), torch.tensor(attention_masks, device=device, dtype=torch.long), num_tokens
+        return torch.cat(embeds_out), torch.tensor(attention_masks, device=device, dtype=torch.long), num_tokens, embeds_info

    def forward(self, tokens):
        device = self.transformer.get_input_embeddings().weight.device
-        embeds, attention_mask, num_tokens = self.process_tokens(tokens, device)
+        embeds, attention_mask, num_tokens, embeds_info = self.process_tokens(tokens, device)

        attention_mask_model = None
        if self.enable_attention_masks:
@@ -258,7 +261,7 @@ class SDClipModel(torch.nn.Module, ClipTokenWeightEncoder):
        else:
            intermediate_output = self.layer_idx

-        outputs = self.transformer(None, attention_mask_model, embeds=embeds, num_tokens=num_tokens, intermediate_output=intermediate_output, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32)
+        outputs = self.transformer(None, attention_mask_model, embeds=embeds, num_tokens=num_tokens, intermediate_output=intermediate_output, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32, embeds_info=embeds_info)

        if self.layer == "last":
            z = outputs[0].float()
@@ -531,7 +534,10 @@ class SDTokenizer:
        min_padding = tokenizer_options.get("{}_min_padding".format(self.embedding_key), self.min_padding)

        text = escape_important(text)
-        parsed_weights = token_weights(text, 1.0)
+        if kwargs.get("disable_weights", False):
+            parsed_weights = [(text, 1.0)]
+        else:
+            parsed_weights = token_weights(text, 1.0)

        # tokenize words
        tokens = []
--- a/comfy/supported_models.py
+++ b/comfy/supported_models.py
@@ -19,6 +19,7 @@ import comfy.text_encoders.lumina2
 import comfy.text_encoders.wan
 import comfy.text_encoders.ace
 import comfy.text_encoders.omnigen2
+import comfy.text_encoders.qwen_image

 from . import supported_models_base
 from . import latent_formats
@@ -1045,6 +1046,18 @@ class WAN21_Camera(WAN21_T2V):
    def get_model(self, state_dict, prefix="", device=None):
        out = model_base.WAN21_Camera(self, image_to_video=False, device=device)
        return out
+
+class WAN22_Camera(WAN21_T2V):
+    unet_config = {
+        "image_model": "wan2.1",
+        "model_type": "camera_2.2",
+        "in_dim": 36,
+    }
+
+    def get_model(self, state_dict, prefix="", device=None):
+        out = model_base.WAN21_Camera(self, image_to_video=False, device=device)
+        return out
+
 class WAN21_Vace(WAN21_T2V):
    unet_config = {
        "image_model": "wan2.1",
@@ -1229,7 +1242,36 @@ class Omnigen2(supported_models_base.BASE):
        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_3b.transformer.".format(pref))
        return supported_models_base.ClipTarget(comfy.text_encoders.omnigen2.Omnigen2Tokenizer, comfy.text_encoders.omnigen2.te(**hunyuan_detect))

+class QwenImage(supported_models_base.BASE):
+    unet_config = {
+        "image_model": "qwen_image",
+    }

-models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, Hunyuan3Dv2mini, Hunyuan3Dv2, HiDream, Chroma, ACEStep, Omnigen2]
+    sampling_settings = {
+        "multiplier": 1.0,
+        "shift": 1.15,
+    }
+
+    memory_usage_factor = 1.8 #TODO
+
+    unet_extra_config = {}
+    latent_format = latent_formats.Wan21
+
+    supported_inference_dtypes = [torch.bfloat16, torch.float32]
+
+    vae_key_prefix = ["vae."]
+    text_encoder_key_prefix = ["text_encoders."]
+
+    def get_model(self, state_dict, prefix="", device=None):
+        out = model_base.QwenImage(self, device=device)
+        return out
+
+    def clip_target(self, state_dict={}):
+        pref = self.text_encoder_key_prefix[0]
+        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_7b.transformer.".format(pref))
+        return supported_models_base.ClipTarget(comfy.text_encoders.qwen_image.QwenImageTokenizer, comfy.text_encoders.qwen_image.te(**hunyuan_detect))
+
+
+models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, WAN22_Camera, Hunyuan3Dv2mini, Hunyuan3Dv2, HiDream, Chroma, ACEStep, Omnigen2, QwenImage]

 models += [SVD_img2vid]
--- a/comfy/text_encoders/bert.py
+++ b/comfy/text_encoders/bert.py
@@ -116,7 +116,7 @@ class BertModel_(torch.nn.Module):
        self.embeddings = BertEmbeddings(config_dict["vocab_size"], config_dict["max_position_embeddings"], config_dict["type_vocab_size"], config_dict["pad_token_id"], embed_dim, layer_norm_eps, dtype, device, operations)
        self.encoder = BertEncoder(config_dict["num_hidden_layers"], embed_dim, config_dict["intermediate_size"], config_dict["num_attention_heads"], layer_norm_eps, dtype, device, operations)

-    def forward(self, input_tokens, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None):
+    def forward(self, input_tokens, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None, embeds_info=[]):
        x = self.embeddings(input_tokens, embeds=embeds, dtype=dtype)
        mask = None
        if attention_mask is not None:
--- a/comfy/text_encoders/llama.py
+++ b/comfy/text_encoders/llama.py
@@ -2,12 +2,14 @@ import torch
 import torch.nn as nn
 from dataclasses import dataclass
 from typing import Optional, Any
+import math

 from comfy.ldm.modules.attention import optimized_attention_for_device
 import comfy.model_management
 import comfy.ldm.common_dit

 import comfy.model_management
+from . import qwen_vl

 @dataclass
 class Llama2Config:
@@ -25,6 +27,7 @@ class Llama2Config:
    rms_norm_add = False
    mlp_activation = "silu"
    qkv_bias = False
+    rope_dims = None

 @dataclass
 class Qwen25_3BConfig:
@@ -42,6 +45,25 @@ class Qwen25_3BConfig:
    rms_norm_add = False
    mlp_activation = "silu"
    qkv_bias = True
+    rope_dims = None
+
+@dataclass
+class Qwen25_7BVLI_Config:
+    vocab_size: int = 152064
+    hidden_size: int = 3584
+    intermediate_size: int = 18944
+    num_hidden_layers: int = 28
+    num_attention_heads: int = 28
+    num_key_value_heads: int = 4
+    max_position_embeddings: int = 128000
+    rms_norm_eps: float = 1e-6
+    rope_theta: float = 1000000.0
+    transformer_type: str = "llama"
+    head_dim = 128
+    rms_norm_add = False
+    mlp_activation = "silu"
+    qkv_bias = True
+    rope_dims = [16, 24, 24]

 @dataclass
 class Gemma2_2B_Config:
@@ -59,6 +81,7 @@ class Gemma2_2B_Config:
    rms_norm_add = True
    mlp_activation = "gelu_pytorch_tanh"
    qkv_bias = False
+    rope_dims = None

 class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5, add=False, device=None, dtype=None):
@@ -83,24 +106,30 @@ def rotate_half(x):
    return torch.cat((-x2, x1), dim=-1)


-def precompute_freqs_cis(head_dim, seq_len, theta, device=None):
+def precompute_freqs_cis(head_dim, position_ids, theta, rope_dims=None, device=None):
    theta_numerator = torch.arange(0, head_dim, 2, device=device).float()
    inv_freq = 1.0 / (theta ** (theta_numerator / head_dim))

-    position_ids = torch.arange(0, seq_len, device=device).unsqueeze(0)
-
    inv_freq_expanded = inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    position_ids_expanded = position_ids[:, None, :].float()
    freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
    emb = torch.cat((freqs, freqs), dim=-1)
    cos = emb.cos()
    sin = emb.sin()
+    if rope_dims is not None and position_ids.shape[0] > 1:
+        mrope_section = rope_dims * 2
+        cos = torch.cat([m[i % 3] for i, m in enumerate(cos.split(mrope_section, dim=-1))], dim=-1).unsqueeze(0)
+        sin = torch.cat([m[i % 3] for i, m in enumerate(sin.split(mrope_section, dim=-1))], dim=-1).unsqueeze(0)
+    else:
+        cos = cos.unsqueeze(1)
+        sin = sin.unsqueeze(1)
+
    return (cos, sin)


 def apply_rope(xq, xk, freqs_cis):
-    cos = freqs_cis[0].unsqueeze(1)
-    sin = freqs_cis[1].unsqueeze(1)
+    cos = freqs_cis[0]
+    sin = freqs_cis[1]
    q_embed = (xq * cos) + (rotate_half(xq) * sin)
    k_embed = (xk * cos) + (rotate_half(xk) * sin)
    return q_embed, k_embed
@@ -260,7 +289,7 @@ class Llama2_(nn.Module):
        self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps, add=config.rms_norm_add, device=device, dtype=dtype)
        # self.lm_head = ops.Linear(config.hidden_size, config.vocab_size, bias=False, device=device, dtype=dtype)

-    def forward(self, x, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None):
+    def forward(self, x, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None, position_ids=None, embeds_info=[]):
        if embeds is not None:
            x = embeds
        else:
@@ -269,9 +298,13 @@ class Llama2_(nn.Module):
        if self.normalize_in:
            x *= self.config.hidden_size ** 0.5

+        if position_ids is None:
+            position_ids = torch.arange(0, x.shape[1], device=x.device).unsqueeze(0)
+
        freqs_cis = precompute_freqs_cis(self.config.head_dim,
-                                         x.shape[1],
+                                         position_ids,
                                         self.config.rope_theta,
+                                         self.config.rope_dims,
                                         device=x.device)

        mask = None
@@ -348,6 +381,45 @@ class Qwen25_3B(BaseLlama, torch.nn.Module):
        self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
        self.dtype = dtype

+class Qwen25_7BVLI(BaseLlama, torch.nn.Module):
+    def __init__(self, config_dict, dtype, device, operations):
+        super().__init__()
+        config = Qwen25_7BVLI_Config(**config_dict)
+        self.num_layers = config.num_hidden_layers
+
+        self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
+        self.visual = qwen_vl.Qwen2VLVisionTransformer(hidden_size=1280, output_hidden_size=config.hidden_size, device=device, dtype=dtype, ops=operations)
+        self.dtype = dtype
+
+    def preprocess_embed(self, embed, device):
+        if embed["type"] == "image":
+            image, grid = qwen_vl.process_qwen2vl_images(embed["data"])
+            return self.visual(image.to(device, dtype=torch.float32), grid), grid
+        return None, None
+
+    def forward(self, x, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None, embeds_info=[]):
+        grid = None
+        for e in embeds_info:
+            if e.get("type") == "image":
+                grid = e.get("extra", None)
+                position_ids = torch.zeros((3, embeds.shape[1]), device=embeds.device)
+                start = e.get("index")
+                position_ids[:, :start] = torch.arange(0, start, device=embeds.device)
+                end = e.get("size") + start
+                len_max = int(grid.max()) // 2
+                start_next = len_max + start
+                position_ids[:, end:] = torch.arange(start_next, start_next + (embeds.shape[1] - end), device=embeds.device)
+                position_ids[0, start:end] = start
+                max_d = int(grid[0][1]) // 2
+                position_ids[1, start:end] = torch.arange(start, start + max_d, device=embeds.device).unsqueeze(1).repeat(1, math.ceil((end - start) / max_d)).flatten(0)[:end - start]
+                max_d = int(grid[0][2]) // 2
+                position_ids[2, start:end] = torch.arange(start, start + max_d, device=embeds.device).unsqueeze(0).repeat(math.ceil((end - start) / max_d), 1).flatten(0)[:end - start]
+
+        if grid is None:
+            position_ids = None
+
+        return super().forward(x, attention_mask=attention_mask, embeds=embeds, num_tokens=num_tokens, intermediate_output=intermediate_output, final_layer_norm_intermediate=final_layer_norm_intermediate, dtype=dtype, position_ids=position_ids)
+
 class Gemma2_2B(BaseLlama, torch.nn.Module):
    def __init__(self, config_dict, dtype, device, operations):
        super().__init__()
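Reviewer note on the position ids built in `Qwen25_7BVLI.forward`: text tokens share one index on all three axes, while image tokens keep a fixed first-axis index and take row and column indices on the other two. The pure-Python sketch below reproduces that layout for a single image, using lists instead of tensors. `mrope_position_ids` and its parameters are hypothetical names. `grid_h` and `grid_w` stand for the patch grid before the 2x2 merge, matching the `// 2` in the diff.

```python
import math
from typing import List


def mrope_position_ids(seq_len: int, start: int, size: int,
                       grid_h: int, grid_w: int) -> List[List[int]]:
    """Return three rows of position ids for one image spanning [start, start + size)."""
    end = start + size
    rows = [[0] * seq_len for _ in range(3)]

    # Text after the image resumes at start + max(grid) // 2, not at `end`.
    start_next = start + max(1, grid_h, grid_w) // 2
    for axis in range(3):
        for p in range(start):
            rows[axis][p] = p  # text before the image: plain 0..start-1
        for offset, p in enumerate(range(end, seq_len)):
            rows[axis][p] = start_next + offset

    # First axis: every image token keeps the same index.
    rows[0][start:end] = [start] * size

    # Second axis: each row index repeated across that row's tokens.
    merged_h = grid_h // 2
    reps = math.ceil(size / merged_h)
    rows[1][start:end] = [start + r for r in range(merged_h) for _ in range(reps)][:size]

    # Third axis: column indices cycling once per row.
    merged_w = grid_w // 2
    reps = math.ceil(size / merged_w)
    rows[2][start:end] = [start + c for _ in range(reps) for c in range(merged_w)][:size]
    return rows


# Two text tokens, a 4x6 patch grid (6 merged tokens), then two text tokens.
ids = mrope_position_ids(seq_len=10, start=2, size=6, grid_h=4, grid_w=6)
```

The diff builds these values as float tensors via `torch.zeros`. Integers are used here only to keep the sketch free of dependencies.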
--- a/comfy/text_encoders/qwen_image.py
+++ b/comfy/text_encoders/qwen_image.py
@@ -0,0 +1,85 @@
+from transformers import Qwen2Tokenizer
+from comfy import sd1_clip
+import comfy.text_encoders.llama
+import os
+import torch
+import numbers
+
+class Qwen25_7BVLITokenizer(sd1_clip.SDTokenizer):
+    def __init__(self, embedding_directory=None, tokenizer_data={}):
+        tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "qwen25_tokenizer")
+        super().__init__(tokenizer_path, pad_with_end=False, embedding_size=3584, embedding_key='qwen25_7b', tokenizer_class=Qwen2Tokenizer, has_start_token=False, has_end_token=False, pad_to_max_length=False, max_length=99999999, min_length=1, pad_token=151643, tokenizer_data=tokenizer_data)
+
+
+class QwenImageTokenizer(sd1_clip.SD1Tokenizer):
+    def __init__(self, embedding_directory=None, tokenizer_data={}):
+        super().__init__(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data, name="qwen25_7b", tokenizer=Qwen25_7BVLITokenizer)
+        self.llama_template = "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n"
+        self.llama_template_images = "<|im_start|>system\nDescribe the key features of the input image (color, shape, size, texture, objects, background), then explain how the user's text instruction should alter or modify the image. Generate a new image that meets the user's requirements while maintaining consistency with the original input where appropriate.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>{}<|im_end|>\n<|im_start|>assistant\n"
+
+    def tokenize_with_weights(self, text, return_word_ids=False, llama_template=None, images=[], **kwargs):
+        if llama_template is None:
+            if len(images) > 0:
+                llama_text = self.llama_template_images.format(text)
+            else:
+                llama_text = self.llama_template.format(text)
+        else:
+            llama_text = llama_template.format(text)
+        tokens = super().tokenize_with_weights(llama_text, return_word_ids=return_word_ids, disable_weights=True, **kwargs)
+        key_name = next(iter(tokens))
+        embed_count = 0
+        qwen_tokens = tokens[key_name]
+        for r in qwen_tokens:
+            for i in range(len(r)):
+                if r[i][0] == 151655:
+                    if len(images) > embed_count:
+                        r[i] = ({"type": "image", "data": images[embed_count], "original_type": "image"},) + r[i][1:]
+                        embed_count += 1
+        return tokens
+
+
+class Qwen25_7BVLIModel(sd1_clip.SDClipModel):
+    def __init__(self, device="cpu", layer="last", layer_idx=None, dtype=None, attention_mask=True, model_options={}):
+        super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"pad": 151643}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Qwen25_7BVLI, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
+
+
+class QwenImageTEModel(sd1_clip.SD1ClipModel):
+    def __init__(self, device="cpu", dtype=None, model_options={}):
+        super().__init__(device=device, dtype=dtype, name="qwen25_7b", clip_model=Qwen25_7BVLIModel, model_options=model_options)
+
+    def encode_token_weights(self, token_weight_pairs):
+        out, pooled, extra = super().encode_token_weights(token_weight_pairs)
+        tok_pairs = token_weight_pairs["qwen25_7b"][0]
+        count_im_start = 0
+        for i, v in enumerate(tok_pairs):
+            elem = v[0]
+            if not torch.is_tensor(elem):
+                if isinstance(elem, numbers.Integral):
+                    if elem == 151644 and count_im_start < 2:
+                        template_end = i
+                        count_im_start += 1
+
+        if out.shape[1] > (template_end + 3):
+            if tok_pairs[template_end + 1][0] == 872:
+                if tok_pairs[template_end + 2][0] == 198:
+                    template_end += 3
+
+        out = out[:, template_end:]
+
+        extra["attention_mask"] = extra["attention_mask"][:, template_end:]
+        if extra["attention_mask"].sum() == torch.numel(extra["attention_mask"]):
+            extra.pop("attention_mask")  # attention mask is useless if no masked elements
+
+        return out, pooled, extra
+
+
+def te(dtype_llama=None, llama_scaled_fp8=None):
+    class QwenImageTEModel_(QwenImageTEModel):
+        def __init__(self, device="cpu", dtype=None, model_options={}):
+            if llama_scaled_fp8 is not None and "scaled_fp8" not in model_options:
+                model_options = model_options.copy()
+                model_options["scaled_fp8"] = llama_scaled_fp8
+            if dtype_llama is not None:
+                dtype = dtype_llama
+            super().__init__(device=device, dtype=dtype, model_options=model_options)
+    return QwenImageTEModel_
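Reviewer note on `QwenImageTEModel.encode_token_weights`: the method drops the chat-template prefix from the encoder output by locating the second occurrence of token 151644, then skipping the two tokens that follow it when they are 872 and 198. Given the template strings above, these ids appear to correspond to `<|im_start|>`, `user`, and a newline, but that mapping is an assumption. The sketch below shows the index search on a plain list. `template_end_index` is a hypothetical helper, and initialising the index to 0 is an assumption of this sketch (the diff leaves `template_end` unset when no marker is found).

```python
from typing import List

IM_START = 151644  # id compared against in the diff
USER = 872         # ids checked right after the second marker
NEWLINE = 198


def template_end_index(token_ids: List[object]) -> int:
    """Index of the first token to keep after the chat-template prefix."""
    template_end = 0  # assumption: keep everything if no marker is present
    seen = 0
    for i, tok in enumerate(token_ids):
        # Non-integer entries (for example image embed dicts) are skipped.
        if isinstance(tok, int) and tok == IM_START and seen < 2:
            template_end = i
            seen += 1

    # Skip marker, role token and newline when there is room to do so.
    if len(token_ids) > template_end + 3:
        if token_ids[template_end + 1] == USER and token_ids[template_end + 2] == NEWLINE:
            template_end += 3
    return template_end


tokens = [151644, 1, 2, 151645, 198, 151644, 872, 198, 42, 43, 151645]
kept = tokens[template_end_index(tokens):]
```

Because the counter stops at two, a later marker such as the one opening the assistant turn does not move the cut point.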
--- a/comfy/text_encoders/qwen_vl.py
+++ b/comfy/text_encoders/qwen_vl.py
@@ -0,0 +1,428 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from typing import Optional, Tuple
+import math
+from comfy.ldm.modules.attention import optimized_attention_for_device
+
+
+def process_qwen2vl_images(
+    images: torch.Tensor,
+    min_pixels: int = 3136,
+    max_pixels: int = 12845056,
+    patch_size: int = 14,
+    temporal_patch_size: int = 2,
+    merge_size: int = 2,
+    image_mean: list = None,
+    image_std: list = None,
+):
+    if image_mean is None:
+        image_mean = [0.48145466, 0.4578275, 0.40821073]
+    if image_std is None:
+        image_std = [0.26862954, 0.26130258, 0.27577711]
+
+    batch_size, height, width, channels = images.shape
+    device = images.device
+    # dtype = images.dtype
+
+    images = images.permute(0, 3, 1, 2)
+
+    grid_thw_list = []
+    img = images[0]
+
+    factor = patch_size * merge_size
+
+    h_bar = round(height / factor) * factor
+    w_bar = round(width / factor) * factor
+
+    if h_bar * w_bar > max_pixels:
+        beta = math.sqrt((height * width) / max_pixels)
+        h_bar = max(factor, math.floor(height / beta / factor) * factor)
+        w_bar = max(factor, math.floor(width / beta / factor) * factor)
+    elif h_bar * w_bar < min_pixels:
+        beta = math.sqrt(min_pixels / (height * width))
+        h_bar = math.ceil(height * beta / factor) * factor
+        w_bar = math.ceil(width * beta / factor) * factor
+
+    img_resized = F.interpolate(
+        img.unsqueeze(0),
+        size=(h_bar, w_bar),
+        mode='bilinear',
+        align_corners=False
+    ).squeeze(0)
+
+    normalized = img_resized.clone()
+    for c in range(3):
+        normalized[c] = (img_resized[c] - image_mean[c]) / image_std[c]
+
+    grid_h = h_bar // patch_size
+    grid_w = w_bar // patch_size
+    grid_thw = torch.tensor([1, grid_h, grid_w], device=device, dtype=torch.long)
+
+    pixel_values = normalized
+    grid_thw_list.append(grid_thw)
+    image_grid_thw = torch.stack(grid_thw_list)
+
+    grid_t = 1
+    channel = pixel_values.shape[0]
+    pixel_values = pixel_values.unsqueeze(0).repeat(2, 1, 1, 1)
+
+    patches = pixel_values.reshape(
+        grid_t,
+        temporal_patch_size,
+        channel,
+        grid_h // merge_size,
+        merge_size,
+        patch_size,
+        grid_w // merge_size,
+        merge_size,
+        patch_size,
+    )
+
+    patches = patches.permute(0, 3, 6, 4, 7, 2, 1, 5, 8)
+    flatten_patches = patches.reshape(
+        grid_t * grid_h * grid_w,
+        channel * temporal_patch_size * patch_size * patch_size
+    )
+
+    return flatten_patches, image_grid_thw
+
+
+class VisionPatchEmbed(nn.Module):
+    def __init__(
+        self,
+        patch_size: int = 14,
+        temporal_patch_size: int = 2,
+        in_channels: int = 3,
+        embed_dim: int = 3584,
+        device=None,
+        dtype=None,
+        ops=None,
+    ):
+        super().__init__()
+        self.patch_size = patch_size
+        self.temporal_patch_size = temporal_patch_size
+        self.in_channels = in_channels
+        self.embed_dim = embed_dim
+
+        kernel_size = [temporal_patch_size, patch_size, patch_size]
+        self.proj = ops.Conv3d(
+            in_channels,
+            embed_dim,
+            kernel_size=kernel_size,
+            stride=kernel_size,
+            bias=False,
+            device=device,
+            dtype=dtype
+        )
+
+    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+        hidden_states = hidden_states.view(
+            -1, self.in_channels, self.temporal_patch_size, self.patch_size, self.patch_size
+        )
+        hidden_states = self.proj(hidden_states)
+        return hidden_states.view(-1, self.embed_dim)
+
+
+def rotate_half(x):
+    x1 = x[..., : x.shape[-1] // 2]
+    x2 = x[..., x.shape[-1] // 2 :]
+    return torch.cat((-x2, x1), dim=-1)
+
+
+def apply_rotary_pos_emb_vision(q, k, cos, sin):
+    cos, sin = cos.unsqueeze(-2).float(), sin.unsqueeze(-2).float()
+    q_embed = (q * cos) + (rotate_half(q) * sin)
+    k_embed = (k * cos) + (rotate_half(k) * sin)
+    return q_embed, k_embed
+
+
+class VisionRotaryEmbedding(nn.Module):
+    def __init__(self, dim: int, theta: float = 10000.0):
+        super().__init__()
+        self.dim = dim
+        self.theta = theta
+
+    def forward(self, seqlen: int, device) -> torch.Tensor:
+        inv_freq = 1.0 / (self.theta ** (torch.arange(0, self.dim, 2, dtype=torch.float, device=device) / self.dim))
+        seq = torch.arange(seqlen, device=inv_freq.device, dtype=inv_freq.dtype)
+        freqs = torch.outer(seq, inv_freq)
+        return freqs
+
+
+class PatchMerger(nn.Module):
+    def __init__(self, dim: int, context_dim: int, spatial_merge_size: int = 2, device=None, dtype=None, ops=None):
+        super().__init__()
+        self.hidden_size = context_dim * (spatial_merge_size ** 2)
+        self.ln_q = ops.RMSNorm(context_dim, eps=1e-6, device=device, dtype=dtype)
+        self.mlp = nn.Sequential(
+            ops.Linear(self.hidden_size, self.hidden_size, device=device, dtype=dtype),
+            nn.GELU(),
+            ops.Linear(self.hidden_size, dim, device=device, dtype=dtype),
+        )
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = self.ln_q(x).reshape(-1, self.hidden_size)
+        x = self.mlp(x)
+        return x
+
+
+class VisionAttention(nn.Module):
+    def __init__(self, hidden_size: int, num_heads: int, device=None, dtype=None, ops=None):
+        super().__init__()
+        self.hidden_size = hidden_size
+        self.num_heads = num_heads
+        self.head_dim = hidden_size // num_heads
+        self.scaling = self.head_dim ** -0.5
+
+        self.qkv = ops.Linear(hidden_size, hidden_size * 3, bias=True, device=device, dtype=dtype)
+        self.proj = ops.Linear(hidden_size, hidden_size, bias=True, device=device, dtype=dtype)
+
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
+        cu_seqlens=None,
+        optimized_attention=None,
+    ) -> torch.Tensor:
+        if hidden_states.dim() == 2:
+            seq_length, _ = hidden_states.shape
+            batch_size = 1
+            hidden_states = hidden_states.unsqueeze(0)
+        else:
+            batch_size, seq_length, _ = hidden_states.shape
+
+        qkv = self.qkv(hidden_states)
+        qkv = qkv.reshape(batch_size, seq_length, 3, self.num_heads, self.head_dim)
+        query_states, key_states, value_states = qkv.reshape(seq_length, 3, self.num_heads, -1).permute(1, 0, 2, 3).unbind(0)
+
+        if position_embeddings is not None:
+            cos, sin = position_embeddings
+            query_states, key_states = apply_rotary_pos_emb_vision(query_states, key_states, cos, sin)
+
+        query_states = query_states.transpose(0, 1).unsqueeze(0)
+        key_states = key_states.transpose(0, 1).unsqueeze(0)
+        value_states = value_states.transpose(0, 1).unsqueeze(0)
+
+        lengths = cu_seqlens[1:] - cu_seqlens[:-1]
+        splits = [
+            torch.split(tensor, lengths.tolist(), dim=2) for tensor in (query_states, key_states, value_states)
+        ]
+
+        attn_outputs = [
+            optimized_attention(q, k, v, self.num_heads, skip_reshape=True)
+            for q, k, v in zip(*splits)
+        ]
+        attn_output = torch.cat(attn_outputs, dim=1)
+        attn_output = attn_output.reshape(seq_length, -1)
+        attn_output = self.proj(attn_output)
+
+        return attn_output
+
+
+class VisionMLP(nn.Module):
+    def __init__(self, hidden_size: int, intermediate_size: int, device=None, dtype=None, ops=None):
+        super().__init__()
+        self.gate_proj = ops.Linear(hidden_size, intermediate_size, bias=True, device=device, dtype=dtype)
+        self.up_proj = ops.Linear(hidden_size, intermediate_size, bias=True, device=device, dtype=dtype)
+        self.down_proj = ops.Linear(intermediate_size, hidden_size, bias=True, device=device, dtype=dtype)
+        self.act_fn = nn.SiLU()
+
+    def forward(self, hidden_state):
+        return self.down_proj(self.act_fn(self.gate_proj(hidden_state)) * self.up_proj(hidden_state))
+
+
+class VisionBlock(nn.Module):
+    def __init__(self, hidden_size: int, intermediate_size: int, num_heads: int, device=None, dtype=None, ops=None):
+        super().__init__()
+        self.norm1 = ops.RMSNorm(hidden_size, eps=1e-6, device=device, dtype=dtype)
+        self.norm2 = ops.RMSNorm(hidden_size, eps=1e-6, device=device, dtype=dtype)
+        self.attn = VisionAttention(hidden_size, num_heads, device=device, dtype=dtype, ops=ops)
+        self.mlp = VisionMLP(hidden_size, intermediate_size, device=device, dtype=dtype, ops=ops)
+
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
+        cu_seqlens=None,
+        optimized_attention=None,
+    ) -> torch.Tensor:
+        residual = hidden_states
+        hidden_states = self.norm1(hidden_states)
+        hidden_states = self.attn(hidden_states, position_embeddings, cu_seqlens, optimized_attention)
+        hidden_states = residual + hidden_states
+
+        residual = hidden_states
+        hidden_states = self.norm2(hidden_states)
+        hidden_states = self.mlp(hidden_states)
+        hidden_states = residual + hidden_states
+
+        return hidden_states
+
+
+class Qwen2VLVisionTransformer(nn.Module):
+    def __init__(
+        self,
+        hidden_size: int = 3584,
+        output_hidden_size: int = 3584,
+        intermediate_size: int = 3420,
+        num_heads: int = 16,
+        num_layers: int = 32,
+        patch_size: int = 14,
+        temporal_patch_size: int = 2,
+        spatial_merge_size: int = 2,
+        window_size: int = 112,
+        device=None,
+        dtype=None,
+        ops=None
+    ):
+        super().__init__()
+        self.hidden_size = hidden_size
+        self.patch_size = patch_size
+        self.spatial_merge_size = spatial_merge_size
+        self.window_size = window_size
+        self.fullatt_block_indexes = [7, 15, 23, 31]
+
+        self.patch_embed = VisionPatchEmbed(
+            patch_size=patch_size,
+            temporal_patch_size=temporal_patch_size,
+            in_channels=3,
+            embed_dim=hidden_size,
+            device=device,
+            dtype=dtype,
+            ops=ops,
+        )
+
+        head_dim = hidden_size // num_heads
+        self.rotary_pos_emb = VisionRotaryEmbedding(head_dim // 2)
+
+        self.blocks = nn.ModuleList([
+            VisionBlock(hidden_size, intermediate_size, num_heads, device, dtype, ops)
+            for _ in range(num_layers)
+        ])
+
+        self.merger = PatchMerger(
+            dim=output_hidden_size,
+            context_dim=hidden_size,
+            spatial_merge_size=spatial_merge_size,
+            device=device,
+            dtype=dtype,
+            ops=ops,
+        )
+
+    def get_window_index(self, grid_thw):
+        window_index = []
+        cu_window_seqlens = [0]
+        window_index_id = 0
+        vit_merger_window_size = self.window_size // self.spatial_merge_size // self.patch_size
+
+        for grid_t, grid_h, grid_w in grid_thw:
+            llm_grid_h = grid_h // self.spatial_merge_size
+            llm_grid_w = grid_w // self.spatial_merge_size
+
+            index = torch.arange(grid_t * llm_grid_h * llm_grid_w).reshape(grid_t, llm_grid_h, llm_grid_w)
+
+            pad_h = vit_merger_window_size - llm_grid_h % vit_merger_window_size
+            pad_w = vit_merger_window_size - llm_grid_w % vit_merger_window_size
+            num_windows_h = (llm_grid_h + pad_h) // vit_merger_window_size
+            num_windows_w = (llm_grid_w + pad_w) // vit_merger_window_size
+
+            index_padded = F.pad(index, (0, pad_w, 0, pad_h), "constant", -100)
+            index_padded = index_padded.reshape(
+                grid_t,
+                num_windows_h,
+                vit_merger_window_size,
+                num_windows_w,
+                vit_merger_window_size,
+            )
+            index_padded = index_padded.permute(0, 1, 3, 2, 4).reshape(
+                grid_t,
+                num_windows_h * num_windows_w,
+                vit_merger_window_size,
+                vit_merger_window_size,
+            )
+
+            seqlens = (index_padded != -100).sum([2, 3]).reshape(-1)
+            index_padded = index_padded.reshape(-1)
+            index_new = index_padded[index_padded != -100]
+            window_index.append(index_new + window_index_id)
+
+            cu_seqlens_tmp = seqlens.cumsum(0) * self.spatial_merge_size * self.spatial_merge_size + cu_window_seqlens[-1]
+            cu_window_seqlens.extend(cu_seqlens_tmp.tolist())
+            window_index_id += (grid_t * llm_grid_h * llm_grid_w).item()
+
+        window_index = torch.cat(window_index, dim=0)
+        return window_index, cu_window_seqlens
+
+    def get_position_embeddings(self, grid_thw, device):
+        pos_ids = []
+
+        for t, h, w in grid_thw:
+            hpos_ids = torch.arange(h, device=device).unsqueeze(1).expand(-1, w)
+            hpos_ids = hpos_ids.reshape(
+                h // self.spatial_merge_size,
+                self.spatial_merge_size,
+                w // self.spatial_merge_size,
+                self.spatial_merge_size,
+            )
+            hpos_ids = hpos_ids.permute(0, 2, 1, 3).flatten()
+
+            wpos_ids = torch.arange(w, device=device).unsqueeze(0).expand(h, -1)
+            wpos_ids = wpos_ids.reshape(
+                h // self.spatial_merge_size,
+                self.spatial_merge_size,
+                w // self.spatial_merge_size,
+                self.spatial_merge_size,
+            )
+            wpos_ids = wpos_ids.permute(0, 2, 1, 3).flatten()
+
+            pos_ids.append(torch.stack([hpos_ids, wpos_ids], dim=-1).repeat(t, 1))
+
+        pos_ids = torch.cat(pos_ids, dim=0)
+        max_grid_size = grid_thw[:, 1:].max()
+        rotary_pos_emb_full = self.rotary_pos_emb(max_grid_size, device)
+        return rotary_pos_emb_full[pos_ids].flatten(1)
+
+    def forward(
+        self,
+        pixel_values: torch.Tensor,
+        image_grid_thw: Optional[torch.Tensor] = None,
+    ) -> torch.Tensor:
+        optimized_attention = optimized_attention_for_device(pixel_values.device, mask=False, small_input=True)
+
+        hidden_states = self.patch_embed(pixel_values)
+
+        window_index, cu_window_seqlens = self.get_window_index(image_grid_thw)
+        cu_window_seqlens = torch.tensor(cu_window_seqlens, device=hidden_states.device)
+        cu_window_seqlens = torch.unique_consecutive(cu_window_seqlens)
+
+        position_embeddings = self.get_position_embeddings(image_grid_thw, hidden_states.device)
+
+        seq_len, _ = hidden_states.size()
+        spatial_merge_unit = self.spatial_merge_size * self.spatial_merge_size
+
+        hidden_states = hidden_states.reshape(seq_len // spatial_merge_unit, spatial_merge_unit, -1)
+        hidden_states = hidden_states[window_index, :, :]
+        hidden_states = hidden_states.reshape(seq_len, -1)
+
+        position_embeddings = position_embeddings.reshape(seq_len // spatial_merge_unit, spatial_merge_unit, -1)
+        position_embeddings = position_embeddings[window_index, :, :]
+        position_embeddings = position_embeddings.reshape(seq_len, -1)
+        position_embeddings = torch.cat((position_embeddings, position_embeddings), dim=-1)
+        position_embeddings = (position_embeddings.cos(), position_embeddings.sin())
+
+        cu_seqlens = torch.repeat_interleave(image_grid_thw[:, 1] * image_grid_thw[:, 2], image_grid_thw[:, 0]).cumsum(
+            dim=0,
+            dtype=torch.int32,
+        )
+        cu_seqlens = F.pad(cu_seqlens, (1, 0), value=0)
+
+        for i, block in enumerate(self.blocks):
+            if i in self.fullatt_block_indexes:
+                cu_seqlens_now = cu_seqlens
+            else:
+                cu_seqlens_now = cu_window_seqlens
+            hidden_states = block(hidden_states, position_embeddings, cu_seqlens_now, optimized_attention=optimized_attention)
+
+        hidden_states = self.merger(hidden_states)
+        return hidden_states
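Review note on `get_window_index`: the padding `vit_merger_window_size - llm_grid_h % vit_merger_window_size` adds a full extra window when the grid is already a multiple of the window size. Those windows hold only `-100` entries, so they contribute zero-length segments, which the later `torch.unique_consecutive` call drops. The sketch below reproduces that bookkeeping in plain Python for a single frame (`grid_t == 1`). The helper names `window_seqlens` and `cumulative` are hypothetical and not part of the commit.

```python
from itertools import accumulate


def window_seqlens(llm_h: int, llm_w: int, win: int) -> list[int]:
    """Count the valid (non-padded) merged tokens in each window, row-major."""
    pad_h = win - llm_h % win  # same formula as the diff: a full window if divisible
    pad_w = win - llm_w % win
    n_h = (llm_h + pad_h) // win
    n_w = (llm_w + pad_w) // win
    counts = []
    for wy in range(n_h):
        for wx in range(n_w):
            rows = max(0, min(win, llm_h - wy * win))
            cols = max(0, min(win, llm_w - wx * win))
            counts.append(rows * cols)
    return counts


def cumulative(counts: list[int], merge_unit: int = 4) -> list[int]:
    """Cumulative boundaries in patch units, with consecutive duplicates removed."""
    cu = [0] + [c * merge_unit for c in accumulate(counts)]
    deduped = [cu[0]]
    for value in cu[1:]:
        if value != deduped[-1]:  # plays the role of torch.unique_consecutive
            deduped.append(value)
    return deduped


# A 4x4 merged grid with window size 4 yields one real window and three empty ones.
print(window_seqlens(4, 4, 4))              # [16, 0, 0, 0]
print(cumulative(window_seqlens(4, 4, 4)))  # [0, 64]
```

With the defaults in the diff (`window_size=112`, `spatial_merge_size=2`, `patch_size=14`) the window size in merged tokens is `112 // 2 // 14 = 4`, which is the value used above.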
--- a/comfy/text_encoders/t5.py
+++ b/comfy/text_encoders/t5.py
@@ -199,7 +199,7 @@ class T5Stack(torch.nn.Module):
        self.final_layer_norm = T5LayerNorm(model_dim, dtype=dtype, device=device, operations=operations)
        # self.dropout = nn.Dropout(config.dropout_rate)

-    def forward(self, x, attention_mask=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None):
+    def forward(self, x, attention_mask=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None, embeds_info=[]):
        mask = None
        if attention_mask is not None:
            mask = 1.0 - attention_mask.to(x.dtype).reshape((attention_mask.shape[0], 1, -1, attention_mask.shape[-1])).expand(attention_mask.shape[0], 1, attention_mask.shape[-1], attention_mask.shape[-1])
--- a/comfy/weight_adapter/lora.py
+++ b/comfy/weight_adapter/lora.py
@@ -96,6 +96,7 @@ class LoRAAdapter(WeightAdapterBase):
        diffusers3_lora = "{}.lora.up.weight".format(x)
        mochi_lora = "{}.lora_B".format(x)
        transformers_lora = "{}.lora_linear_layer.up.weight".format(x)
+        qwen_default_lora = "{}.lora_B.default.weight".format(x)
        A_name = None

        if regular_lora in lora.keys():
@@ -122,6 +123,10 @@ class LoRAAdapter(WeightAdapterBase):
            A_name = transformers_lora
            B_name = "{}.lora_linear_layer.down.weight".format(x)
            mid_name = None
+        elif qwen_default_lora in lora.keys():
+            A_name = qwen_default_lora
+            B_name = "{}.lora_A.default.weight".format(x)
+            mid_name = None

        if A_name is not None:
            mid = None
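Review note on the new `qwen_default_lora` branch: `A_name` holds the "up" weight (`lora_B`) and `B_name` the "down" weight (`lora_A`), which is the same convention as the branches above it. A minimal sketch of the lookup, limited to two of the naming schemes visible in this hunk. The function `find_lora_pair` is hypothetical and only illustrates the key matching, not the adapter loading itself.

```python
from typing import Optional, Tuple


def find_lora_pair(lora: dict, x: str) -> Optional[Tuple[str, str]]:
    """Return (up_key, down_key) for prefix x, or None if no known scheme matches."""
    schemes = [
        # (up-weight key template, down-weight key template)
        ("{}.lora_linear_layer.up.weight", "{}.lora_linear_layer.down.weight"),
        ("{}.lora_B.default.weight", "{}.lora_A.default.weight"),
    ]
    for up_template, down_template in schemes:
        up_key = up_template.format(x)
        if up_key in lora:
            return up_key, down_template.format(x)
    return None


# Example state dict using the ".default.weight" naming handled by the new branch.
state = {
    "blk.lora_A.default.weight": "down",
    "blk.lora_B.default.weight": "up",
}
print(find_lora_pair(state, "blk"))
```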
--- a/comfy_api/generate_api_stubs.py
+++ b/comfy_api/generate_api_stubs.py
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+"""
+Script to generate .pyi stub files for the synchronous API wrappers.
+This allows generating stubs without running the full ComfyUI application.
+"""
+
+import os
+import sys
+import logging
+import importlib
+
+# Add ComfyUI to path so we can import modules
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from comfy_api.internal.async_to_sync import AsyncToSyncConverter
+from comfy_api.version_list import supported_versions
+
+
+def generate_stubs_for_module(module_name: str) -> None:
+    """Generate stub files for a specific module that exports ComfyAPI and ComfyAPISync."""
+    try:
+        # Import the module
+        module = importlib.import_module(module_name)
+
+        # Check if module has ComfyAPISync (the sync wrapper)
+        if hasattr(module, "ComfyAPISync"):
+            # Module already has a sync class
+            api_class = getattr(module, "ComfyAPI", None)
+            sync_class = getattr(module, "ComfyAPISync")
+
+            if api_class:
+                # Generate the stub file
+                AsyncToSyncConverter.generate_stub_file(api_class, sync_class)
+                logging.info(f"Generated stub file for {module_name}")
+            else:
+                logging.warning(
+                    f"Module {module_name} has ComfyAPISync but no ComfyAPI"
+                )
+
+        elif hasattr(module, "ComfyAPI"):
+            # Module only has async API, need to create sync wrapper first
+            from comfy_api.internal.async_to_sync import create_sync_class
+
+            api_class = getattr(module, "ComfyAPI")
+            sync_class = create_sync_class(api_class)
+
+            # Generate the stub file
+            AsyncToSyncConverter.generate_stub_file(api_class, sync_class)
+            logging.info(f"Generated stub file for {module_name}")
+        else:
+            logging.warning(
+                f"Module {module_name} does not export ComfyAPI or ComfyAPISync"
+            )
+
+    except Exception as e:
+        logging.error(f"Failed to generate stub for {module_name}: {e}")
+        import traceback
+
+        traceback.print_exc()
+
+
+def main():
+    """Main function to generate all API stub files."""
+    logging.basicConfig(level=logging.INFO)
+
+    logging.info("Starting stub generation...")
+
+    # Dynamically get module names from supported_versions
+    api_modules = []
+    for api_class in supported_versions:
+        # Extract module name from the class
+        module_name = api_class.__module__
+        if module_name not in api_modules:
+            api_modules.append(module_name)
+
+    logging.info(f"Found {len(api_modules)} API modules: {api_modules}")
+
+    # Generate stubs for each module
+    for module_name in api_modules:
+        generate_stubs_for_module(module_name)
+
+    logging.info("Stub generation complete!")
+
+
+if __name__ == "__main__":
+    main()
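Review note on `main()`: the loop that collects `api_modules` is an order-preserving de-duplication of `api_class.__module__`. One possible compact equivalent, shown as a sketch, uses `dict.fromkeys`. The classes below are stand-ins built with `type()`, not the real entries of `supported_versions`.

```python
def unique_modules(classes) -> list[str]:
    """Return each class's module name once, in first-seen order."""
    return list(dict.fromkeys(cls.__module__ for cls in classes))


# Stand-in classes with explicit __module__ values (hypothetical module names).
ApiV1 = type("ApiV1", (), {"__module__": "pkg.v1"})
ApiV1Alt = type("ApiV1Alt", (), {"__module__": "pkg.v1"})
ApiV2 = type("ApiV2", (), {"__module__": "pkg.v2"})

print(unique_modules([ApiV1, ApiV2, ApiV1Alt]))  # ['pkg.v1', 'pkg.v2']
```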
--- a/comfy_api/input/__init__.py
+++ b/comfy_api/input/__init__.py
@@ -1,8 +1,16 @@
-from .basic_types import ImageInput, AudioInput
-from .video_types import VideoInput
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input import (
+    ImageInput,
+    AudioInput,
+    MaskInput,
+    LatentInput,
+    VideoInput,
+)

 __all__ = [
    "ImageInput",
    "AudioInput",
+    "MaskInput",
+    "LatentInput",
    "VideoInput",
 ]
--- a/comfy_api/input/basic_types.py
+++ b/comfy_api/input/basic_types.py
@@ -1,20 +1,14 @@
-import torch
-from typing import TypedDict
-
-ImageInput = torch.Tensor
-"""
-An image in format [B, H, W, C] where B is the batch size, C is the number of channels,
-"""
-
-class AudioInput(TypedDict):
-    """
-    TypedDict representing audio input.
-    """
-
-    waveform: torch.Tensor
-    """
-    Tensor in the format [B, C, T] where B is the batch size, C is the number of channels,
-    """
-
-    sample_rate: int
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input.basic_types import (
+    ImageInput,
+    AudioInput,
+    MaskInput,
+    LatentInput,
+)

+__all__ = [
+    "ImageInput",
+    "AudioInput",
+    "MaskInput",
+    "LatentInput",
+]
--- a/comfy_api/input/video_types.py
+++ b/comfy_api/input/video_types.py
@@ -1,85 +1,6 @@
-from __future__ import annotations
-from abc import ABC, abstractmethod
-from typing import Optional, Union
-import io
-import av
-from comfy_api.util import VideoContainer, VideoCodec, VideoComponents
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input.video_types import VideoInput

-class VideoInput(ABC):
-    """
-    Abstract base class for video input types.
-    """
-
-    @abstractmethod
-    def get_components(self) -> VideoComponents:
-        """
-        Abstract method to get the video components (images, audio, and frame rate).
-
-        Returns:
-            VideoComponents containing images, audio, and frame rate
-        """
-        pass
-
-    @abstractmethod
-    def save_to(
-        self,
-        path: str,
-        format: VideoContainer = VideoContainer.AUTO,
-        codec: VideoCodec = VideoCodec.AUTO,
-        metadata: Optional[dict] = None
-    ):
-        """
-        Abstract method to save the video input to a file.
-        """
-        pass
-
-    def get_stream_source(self) -> Union[str, io.BytesIO]:
-        """
-        Get a streamable source for the video. This allows processing without
-        loading the entire video into memory.
-
-        Returns:
-            Either a file path (str) or a BytesIO object that can be opened with av.
-
-        Default implementation creates a BytesIO buffer, but subclasses should
-        override this for better performance when possible.
-        """
-        buffer = io.BytesIO()
-        self.save_to(buffer)
-        buffer.seek(0)
-        return buffer
-
-    # Provide a default implementation, but subclasses can provide optimized versions
-    # if possible.
-    def get_dimensions(self) -> tuple[int, int]:
-        """
-        Returns the dimensions of the video input.
-
-        Returns:
-            Tuple of (width, height)
-        """
-        components = self.get_components()
-        return components.images.shape[2], components.images.shape[1]
-
-    def get_duration(self) -> float:
-        """
-        Returns the duration of the video in seconds.
-
-        Returns:
-            Duration in seconds
-        """
-        components = self.get_components()
-        frame_count = components.images.shape[0]
-        return float(frame_count / components.frame_rate)
-
-    def get_container_format(self) -> str:
-        """
-        Returns the container format of the video (e.g., 'mp4', 'mov', 'avi').
-
-        Returns:
-            Container format as string
-        """
-        # Default implementation - subclasses should override for better performance
-        source = self.get_stream_source()
-        with av.open(source, mode="r") as container:
-            return container.format.name
+__all__ = [
+    "VideoInput",
+]
--- a/comfy_api/input_impl/__init__.py
+++ b/comfy_api/input_impl/__init__.py
@@ -1,7 +1,7 @@
-from .video_types import VideoFromFile, VideoFromComponents
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input_impl import VideoFromFile, VideoFromComponents

 __all__ = [
-    # Implementations
    "VideoFromFile",
    "VideoFromComponents",
 ]
--- a/comfy_api/input_impl/video_types.py
+++ b/comfy_api/input_impl/video_types.py
@@ -1,324 +1,2 @@
-from __future__ import annotations
-from av.container import InputContainer
-from av.subtitles.stream import SubtitleStream
-from fractions import Fraction
-from typing import Optional
-from comfy_api.input import AudioInput
-import av
-import io
-import json
-import numpy as np
-import torch
-from comfy_api.input import VideoInput
-from comfy_api.util import VideoContainer, VideoCodec, VideoComponents
-
-
-def container_to_output_format(container_format: str | None) -> str | None:
-    """
-    A container's `format` may be a comma-separated list of formats.
-    E.g., iso container's `format` may be `mov,mp4,m4a,3gp,3g2,mj2`.
-    However, writing to a file/stream with `av.open` requires a single format,
-    or `None` to auto-detect.
-    """
-    if not container_format:
-        return None  # Auto-detect
-
-    if "," not in container_format:
-        return container_format
-
-    formats = container_format.split(",")
-    return formats[0]
-
-
-def get_open_write_kwargs(
-    dest: str | io.BytesIO, container_format: str, to_format: str | None
-) -> dict:
-    """Get kwargs for writing a `VideoFromFile` to a file/stream with `av.open`"""
-    open_kwargs = {
-        "mode": "w",
-        # If isobmff, preserve custom metadata tags (workflow, prompt, extra_pnginfo)
-        "options": {"movflags": "use_metadata_tags"},
-    }
-
-    is_write_to_buffer = isinstance(dest, io.BytesIO)
-    if is_write_to_buffer:
-        # Set output format explicitly, since it cannot be inferred from file extension
-        if to_format == VideoContainer.AUTO:
-            to_format = container_format.lower()
-        elif isinstance(to_format, str):
-            to_format = to_format.lower()
-        open_kwargs["format"] = container_to_output_format(to_format)
-
-    return open_kwargs
-
-
-class VideoFromFile(VideoInput):
-    """
-    Class representing video input from a file.
-    """
-
-    def __init__(self, file: str | io.BytesIO):
-        """
-        Initialize the VideoFromFile object based off of either a path on disk or a BytesIO object
-        containing the file contents.
-        """
-        self.__file = file
-
-    def get_stream_source(self) -> str | io.BytesIO:
-        """
-        Return the underlying file source for efficient streaming.
-        This avoids unnecessary memory copies when the source is already a file path.
-        """
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)
-        return self.__file
-
-    def get_dimensions(self) -> tuple[int, int]:
-        """
-        Returns the dimensions of the video input.
-
-        Returns:
-            Tuple of (width, height)
-        """
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)  # Reset the BytesIO object to the beginning
-        with av.open(self.__file, mode='r') as container:
-            for stream in container.streams:
-                if stream.type == 'video':
-                    assert isinstance(stream, av.VideoStream)
-                    return stream.width, stream.height
-        raise ValueError(f"No video stream found in file '{self.__file}'")
-
-    def get_duration(self) -> float:
-        """
-        Returns the duration of the video in seconds.
-
-        Returns:
-            Duration in seconds
-        """
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)
-        with av.open(self.__file, mode="r") as container:
-            if container.duration is not None:
-                return float(container.duration / av.time_base)
-
-            # Fallback: calculate from frame count and frame rate
-            video_stream = next(
-                (s for s in container.streams if s.type == "video"), None
-            )
-            if video_stream and video_stream.frames and video_stream.average_rate:
-                return float(video_stream.frames / video_stream.average_rate)
-
-            # Last resort: decode frames to count them
-            if video_stream and video_stream.average_rate:
-                frame_count = 0
-                container.seek(0)
-                for packet in container.demux(video_stream):
-                    for _ in packet.decode():
-                        frame_count += 1
-                if frame_count > 0:
-                    return float(frame_count / video_stream.average_rate)
-
-        raise ValueError(f"Could not determine duration for file '{self.__file}'")
-
-    def get_container_format(self) -> str:
-        """
-        Returns the container format of the video (e.g., 'mp4', 'mov', 'avi').
-
-        Returns:
-            Container format as string
-        """
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)
-        with av.open(self.__file, mode='r') as container:
-            return container.format.name
-
-    def get_components_internal(self, container: InputContainer) -> VideoComponents:
-        # Get video frames
-        frames = []
-        for frame in container.decode(video=0):
-            img = frame.to_ndarray(format='rgb24')  # shape: (H, W, 3)
-            img = torch.from_numpy(img) / 255.0  # shape: (H, W, 3)
-            frames.append(img)
-
-        images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 3, 0, 0)
-
-        # Get frame rate
-        video_stream = next(s for s in container.streams if s.type == 'video')
-        frame_rate = Fraction(video_stream.average_rate) if video_stream and video_stream.average_rate else Fraction(1)
-
-        # Get audio if available
-        audio = None
-        try:
-            container.seek(0)  # Reset the container to the beginning
-            for stream in container.streams:
-                if stream.type != 'audio':
-                    continue
-                assert isinstance(stream, av.AudioStream)
-                audio_frames = []
-                for packet in container.demux(stream):
-                    for frame in packet.decode():
-                        assert isinstance(frame, av.AudioFrame)
-                        audio_frames.append(frame.to_ndarray())  # shape: (channels, samples)
-                if len(audio_frames) > 0:
-                    audio_data = np.concatenate(audio_frames, axis=1)  # shape: (channels, total_samples)
-                    audio_tensor = torch.from_numpy(audio_data).unsqueeze(0)  # shape: (1, channels, total_samples)
-                    audio = AudioInput({
-                        "waveform": audio_tensor,
-                        "sample_rate": int(stream.sample_rate) if stream.sample_rate else 1,
-                    })
-        except StopIteration:
-            pass  # No audio stream
-
-        metadata = container.metadata
-        return VideoComponents(images=images, audio=audio, frame_rate=frame_rate, metadata=metadata)
-
-    def get_components(self) -> VideoComponents:
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)  # Reset the BytesIO object to the beginning
-        with av.open(self.__file, mode='r') as container:
-            return self.get_components_internal(container)
-        raise ValueError(f"No video stream found in file '{self.__file}'")
-
-    def save_to(
-        self,
-        path: str | io.BytesIO,
-        format: VideoContainer = VideoContainer.AUTO,
-        codec: VideoCodec = VideoCodec.AUTO,
-        metadata: Optional[dict] = None
-    ):
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)  # Reset the BytesIO object to the beginning
-        with av.open(self.__file, mode='r') as container:
-            container_format = container.format.name
-            video_encoding = container.streams.video[0].codec.name if len(container.streams.video) > 0 else None
-            reuse_streams = True
-            if format != VideoContainer.AUTO and format not in container_format.split(","):
-                reuse_streams = False
-            if codec != VideoCodec.AUTO and codec != video_encoding and video_encoding is not None:
-                reuse_streams = False
-
-            if not reuse_streams:
-                components = self.get_components_internal(container)
-                video = VideoFromComponents(components)
-                return video.save_to(
-                    path,
-                    format=format,
-                    codec=codec,
-                    metadata=metadata
-                )
-
-            streams = container.streams
-
-            open_kwargs = get_open_write_kwargs(path, container_format, format)
-            with av.open(path, **open_kwargs) as output_container:
-                # Copy over the original metadata
-                for key, value in container.metadata.items():
-                    if metadata is None or key not in metadata:
-                        output_container.metadata[key] = value
-
-                # Add our new metadata
-                if metadata is not None:
-                    for key, value in metadata.items():
-                        if isinstance(value, str):
-                            output_container.metadata[key] = value
-                        else:
-                            output_container.metadata[key] = json.dumps(value)
-
-                # Add streams to the new container
-                stream_map = {}
-                for stream in streams:
-                    if isinstance(stream, (av.VideoStream, av.AudioStream, SubtitleStream)):
-                        out_stream = output_container.add_stream_from_template(template=stream, opaque=True)
-                        stream_map[stream] = out_stream
-
-                # Write packets to the new container
-                for packet in container.demux():
-                    if packet.stream in stream_map and packet.dts is not None:
-                        packet.stream = stream_map[packet.stream]
-                        output_container.mux(packet)
-
-class VideoFromComponents(VideoInput):
-    """
-    Class representing video input from tensors.
-    """
-
-    def __init__(self, components: VideoComponents):
-        self.__components = components
-
-    def get_components(self) -> VideoComponents:
-        return VideoComponents(
-            images=self.__components.images,
-            audio=self.__components.audio,
-            frame_rate=self.__components.frame_rate
-        )
-
-    def save_to(
-        self,
-        path: str,
-        format: VideoContainer = VideoContainer.AUTO,
-        codec: VideoCodec = VideoCodec.AUTO,
-        metadata: Optional[dict] = None
-    ):
-        if format != VideoContainer.AUTO and format != VideoContainer.MP4:
-            raise ValueError("Only MP4 format is supported for now")
-        if codec != VideoCodec.AUTO and codec != VideoCodec.H264:
-            raise ValueError("Only H264 codec is supported for now")
-        with av.open(path, mode='w', options={'movflags': 'use_metadata_tags'}) as output:
-            # Add metadata before writing any streams
-            if metadata is not None:
-                for key, value in metadata.items():
-                    output.metadata[key] = json.dumps(value)
-
-            frame_rate = Fraction(round(self.__components.frame_rate * 1000), 1000)
-            # Create a video stream
-            video_stream = output.add_stream('h264', rate=frame_rate)
-            video_stream.width = self.__components.images.shape[2]
-            video_stream.height = self.__components.images.shape[1]
-            video_stream.pix_fmt = 'yuv420p'
-
-            # Create an audio stream
-            audio_sample_rate = 1
-            audio_stream: Optional[av.AudioStream] = None
-            if self.__components.audio:
-                audio_sample_rate = int(self.__components.audio['sample_rate'])
-                audio_stream = output.add_stream('aac', rate=audio_sample_rate)
-                audio_stream.sample_rate = audio_sample_rate
-                audio_stream.format = 'fltp'
-
-            # Encode video
-            for i, frame in enumerate(self.__components.images):
-                img = (frame * 255).clamp(0, 255).byte().cpu().numpy() # shape: (H, W, 3)
-                frame = av.VideoFrame.from_ndarray(img, format='rgb24')
-                frame = frame.reformat(format='yuv420p')  # Convert to YUV420P as required by h264
-                packet = video_stream.encode(frame)
-                output.mux(packet)
-
-            # Flush video
-            packet = video_stream.encode(None)
-            output.mux(packet)
-
-            if audio_stream and self.__components.audio:
-                # Encode audio
-                samples_per_frame = int(audio_sample_rate / frame_rate)
-                num_frames = self.__components.audio['waveform'].shape[2] // samples_per_frame
-                for i in range(num_frames):
-                    start = i * samples_per_frame
-                    end = start + samples_per_frame
-                    # TODO(Feature) - Add support for stereo audio
-                    chunk = (
-                        self.__components.audio["waveform"][0, 0, start:end]
-                        .unsqueeze(0)
-                        .contiguous()
-                        .numpy()
-                    )
-                    audio_frame = av.AudioFrame.from_ndarray(chunk, format='fltp', layout='mono')
-                    audio_frame.sample_rate = audio_sample_rate
-                    audio_frame.pts = i * samples_per_frame
-                    for packet in audio_stream.encode(audio_frame):
-                        output.mux(packet)
-
-                # Flush audio
-                for packet in audio_stream.encode(None):
-                    output.mux(packet)
-
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input_impl.video_types import *  # noqa: F403
--- a/comfy_api/internal/__init__.py
+++ b/comfy_api/internal/__init__.py
@@ -0,0 +1,150 @@
+# Internal infrastructure for ComfyAPI
+from .api_registry import (
+    ComfyAPIBase as ComfyAPIBase,
+    ComfyAPIWithVersion as ComfyAPIWithVersion,
+    register_versions as register_versions,
+    get_all_versions as get_all_versions,
+)
+
+import asyncio
+from dataclasses import asdict
+from typing import Callable, Optional
+
+
+def first_real_override(cls: type, name: str, *, base: Optional[type] = None) -> Optional[Callable]:
+    """Return the *callable* override of `name` visible on `cls`, or None if every
+    implementation up to (and including) `base` is the placeholder defined on `base`.
+
+    If `base` is not provided, `cls` is assumed to define GET_BASE_CLASS().
+    """
+    if base is None:
+        if not hasattr(cls, "GET_BASE_CLASS"):
+            raise ValueError("base is required if cls does not have a GET_BASE_CLASS; is this a valid ComfyNode subclass?")
+        base = cls.GET_BASE_CLASS()
+    base_attr = getattr(base, name, None)
+    if base_attr is None:
+        return None
+    base_func = base_attr.__func__
+    for c in cls.mro():                       # NodeB, NodeA, ComfyNode, object …
+        if c is base:                         # reached the placeholder – we're done
+            break
+        if name in c.__dict__:                # first class that *defines* the attr
+            func = getattr(c, name).__func__
+            if func is not base_func:         # real override
+                return getattr(cls, name)     # bound to *cls*
+    return None
+
+
+class _ComfyNodeInternal:
+    """Class that all V3-based APIs inherit from for ComfyNode.
+
+    This is intended to only be referenced within execution.py, as it has to handle all V3 APIs going forward."""
+    @classmethod
+    def GET_NODE_INFO_V1(cls):
+        ...
+
+
+class _NodeOutputInternal:
+    """Class that all V3-based APIs inherit from for NodeOutput.
+
+    This is intended to only be referenced within execution.py, as it has to handle all V3 APIs going forward."""
+    ...
+
+
+def as_pruned_dict(dataclass_obj):
+    '''Return dict of dataclass object with pruned None values.'''
+    return prune_dict(asdict(dataclass_obj))
+
+def prune_dict(d: dict):
+    return {k: v for k,v in d.items() if v is not None}
+
+
+def is_class(obj):
+    '''
+    Returns True if obj is a class (a type object).
+    Returns False if obj is an instance of a class.
+    '''
+    return isinstance(obj, type)
+
+
+def copy_class(cls: type) -> type:
+    '''
+    Copy a class and its attributes.
+    '''
+    if cls is None:
+        return None
+    cls_dict = {
+            k: v for k, v in cls.__dict__.items()
+            if k not in ('__dict__', '__weakref__', '__module__', '__doc__')
+        }
+    # new class
+    new_cls = type(
+        cls.__name__,
+        (cls,),
+        cls_dict
+    )
+    # metadata preservation
+    new_cls.__module__ = cls.__module__
+    new_cls.__doc__ = cls.__doc__
+    return new_cls
+
+
+class classproperty(object):
+    def __init__(self, f):
+        self.f = f
+    def __get__(self, obj, owner):
+        return self.f(owner)
+
+
+# NOTE: this was AI-generated and validated by hand
+def shallow_clone_class(cls, new_name=None):
+    '''
+    Shallow clone a class while preserving super() functionality.
+    '''
+    new_name = new_name or f"{cls.__name__}Clone"
+    # Include the original class in the bases to maintain proper inheritance
+    new_bases = (cls,) + cls.__bases__
+    return type(new_name, new_bases, dict(cls.__dict__))
+
+# NOTE: this was AI-generated and validated by hand
+def lock_class(cls):
+    '''
+    Lock a class so that its top-level attributes cannot be modified.
+    '''
+    # Locked instance __setattr__
+    def locked_instance_setattr(self, name, value):
+        raise AttributeError(
+            f"Cannot set attribute '{name}' on immutable instance of {type(self).__name__}"
+        )
+    # Locked metaclass
+    class LockedMeta(type(cls)):
+        def __setattr__(cls_, name, value):
+            raise AttributeError(
+                f"Cannot modify class attribute '{name}' on locked class '{cls_.__name__}'"
+            )
+    # Rebuild class with locked behavior
+    locked_dict = dict(cls.__dict__)
+    locked_dict['__setattr__'] = locked_instance_setattr
+
+    return LockedMeta(cls.__name__, cls.__bases__, locked_dict)
+
+
+def make_locked_method_func(type_obj, func, class_clone):
+    """
+    Returns a function that, when called with **inputs, will execute:
+    getattr(type_obj, func).__func__(lock_class(class_clone), **inputs)
+
+    Supports both synchronous and asynchronous methods.
+    """
+    locked_class = lock_class(class_clone)
+    method = getattr(type_obj, func).__func__
+
+    # Check if the original method is async
+    if asyncio.iscoroutinefunction(method):
+        async def wrapped_async_func(**inputs):
+            return await method(locked_class, **inputs)
+        return wrapped_async_func
+    else:
+        def wrapped_func(**inputs):
+            return method(locked_class, **inputs)
+        return wrapped_func
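Reviewer note: the MRO walk in `first_real_override` is easier to follow with a concrete hierarchy. The sketch below uses a trimmed copy of the helper (with `base` made a required argument for brevity) and three hypothetical classes, `Base`, `NodeA` and `NodeB`, which are illustrative stand-ins and not part of ComfyUI.

```python
from typing import Callable, Optional


def first_real_override(cls: type, name: str, *, base: type) -> Optional[Callable]:
    # Trimmed copy of the helper in this diff; `base` is required here.
    base_attr = getattr(base, name, None)
    if base_attr is None:
        return None
    base_func = base_attr.__func__
    for c in cls.mro():
        if c is base:  # reached the placeholder, stop searching
            break
        if name in c.__dict__:  # first class that defines the attribute
            func = getattr(c, name).__func__
            if func is not base_func:
                return getattr(cls, name)  # bound to cls
    return None


class Base:  # hypothetical stand-in for the node base class
    @classmethod
    def validate(cls):
        return "placeholder"


class NodeA(Base):  # inherits the placeholder unchanged
    pass


class NodeB(NodeA):  # supplies a real override
    @classmethod
    def validate(cls):
        return "custom"


no_override = first_real_override(NodeA, "validate", base=Base)
override = first_real_override(NodeB, "validate", base=Base)
```

Under these assumptions, `no_override` is `None` because `NodeA` never redefines `validate`, while `override` is the classmethod bound to `NodeB`.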
--- a/comfy_api/internal/api_registry.py
+++ b/comfy_api/internal/api_registry.py
@@ -0,0 +1,39 @@
+from typing import Type, List, NamedTuple
+from comfy_api.internal.singleton import ProxiedSingleton
+from packaging import version as packaging_version
+
+
+class ComfyAPIBase(ProxiedSingleton):
+    def __init__(self):
+        pass
+
+
+class ComfyAPIWithVersion(NamedTuple):
+    version: str
+    api_class: Type[ComfyAPIBase]
+
+
+def parse_version(version_str: str) -> packaging_version.Version:
+    """
+    Parses a version string into a packaging_version.Version object.
+    Raises ValueError if the version string is invalid.
+    """
+    if version_str == "latest":
+        return packaging_version.parse("9999999.9999999.9999999")
+    return packaging_version.parse(version_str)
+
+
+registered_versions: List[ComfyAPIWithVersion] = []
+
+
+def register_versions(versions: List[ComfyAPIWithVersion]):
+    versions.sort(key=lambda x: parse_version(x.version))
+    global registered_versions
+    registered_versions = versions
+
+
+def get_all_versions() -> List[ComfyAPIWithVersion]:
+    """
+    Returns a list of all registered ComfyAPI versions.
+    """
+    return registered_versions
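Reviewer note: the `"latest"` sentinel in `parse_version` exists so that the rolling API always sorts after every numbered release. A minimal sketch of that ordering follows. To stay dependency-free it swaps `packaging.version.parse` for a simplified `sort_key` that only understands dotted integers, and `ApiVersion` is a hypothetical stand-in for `ComfyAPIWithVersion`; both are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class ApiVersion(NamedTuple):  # hypothetical stand-in for ComfyAPIWithVersion
    version: str
    label: str


def sort_key(version_str: str) -> Tuple[int, ...]:
    # Simplified stand-in for packaging.version.parse (dotted integers only).
    if version_str == "latest":
        # Same idea as the diff: map "latest" to a very large version.
        return (9999999, 9999999, 9999999)
    return tuple(int(part) for part in version_str.split("."))


versions: List[ApiVersion] = [
    ApiVersion("latest", "rolling API"),
    ApiVersion("0.0.2", "second release"),
    ApiVersion("0.0.1", "first release"),
]

# sorted() returns a new list, so the caller's list keeps its original order.
ordered = sorted(versions, key=lambda v: sort_key(v.version))
order = [v.version for v in ordered]
```

Note that `register_versions` in the diff calls `versions.sort(...)`, which reorders the caller's list in place; the sketch uses `sorted()` only to keep the input untouched.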
--- a/comfy_api/internal/async_to_sync.py
+++ b/comfy_api/internal/async_to_sync.py
@@ -0,0 +1,987 @@
+import asyncio
+import concurrent.futures
+import contextvars
+import functools
+import inspect
+import logging
+import os
+import textwrap
+import threading
+from enum import Enum
+from typing import Optional, Type, get_origin, get_args
+
+
+class TypeTracker:
+    """Tracks types discovered during stub generation for automatic import generation."""
+
+    def __init__(self):
+        self.discovered_types = {}  # type_name -> (module, qualname)
+        self.builtin_types = {
+            "Any",
+            "Dict",
+            "List",
+            "Optional",
+            "Tuple",
+            "Union",
+            "Set",
+            "Sequence",
+            "cast",
+            "NamedTuple",
+            "str",
+            "int",
+            "float",
+            "bool",
+            "None",
+            "bytes",
+            "object",
+            "type",
+            "dict",
+            "list",
+            "tuple",
+            "set",
+        }
+        self.already_imported = (
+            set()
+        )  # Track types already imported to avoid duplicates
+
+    def track_type(self, annotation):
+        """Track a type annotation and record its module/import info."""
+        if annotation is None or annotation is type(None):
+            return
+
+        # Skip builtins and typing module types we already import
+        type_name = getattr(annotation, "__name__", None)
+        if type_name and (
+            type_name in self.builtin_types or type_name in self.already_imported
+        ):
+            return
+
+        # Get module and qualname
+        module = getattr(annotation, "__module__", None)
+        qualname = getattr(annotation, "__qualname__", type_name or "")
+
+        # Skip types from typing module (they're already imported)
+        if module == "typing":
+            return
+
+        # Skip UnionType and GenericAlias from types module as they're handled specially
+        if module == "types" and type_name in ("UnionType", "GenericAlias"):
+            return
+
+        if module and module not in ["builtins", "__main__"]:
+            # Store the type info
+            if type_name:
+                self.discovered_types[type_name] = (module, qualname)
+
+    def get_imports(self, main_module_name: str) -> list[str]:
+        """Generate import statements for all discovered types."""
+        imports = []
+        imports_by_module = {}
+
+        for type_name, (module, qualname) in sorted(self.discovered_types.items()):
+            # Skip types from the main module (they're already imported)
+            if main_module_name and module == main_module_name:
+                continue
+
+            if module not in imports_by_module:
+                imports_by_module[module] = []
+            if type_name not in imports_by_module[module]:  # Avoid duplicates
+                imports_by_module[module].append(type_name)
+
+        # Generate import statements
+        for module, types in sorted(imports_by_module.items()):
+            if len(types) == 1:
+                imports.append(f"from {module} import {types[0]}")
+            else:
+                imports.append(f"from {module} import {', '.join(sorted(set(types)))}")
+
+        return imports
+
+
+class AsyncToSyncConverter:
+    """
+    Provides utilities to convert async classes to sync classes with proper type hints.
+    """
+
+    _thread_pool: Optional[concurrent.futures.ThreadPoolExecutor] = None
+    _thread_pool_lock = threading.Lock()
+    _thread_pool_initialized = False
+
+    @classmethod
+    def get_thread_pool(cls, max_workers=None) -> concurrent.futures.ThreadPoolExecutor:
+        """Get or create the shared thread pool with proper thread-safe initialization."""
+        # Fast path - check if already initialized without acquiring lock
+        if cls._thread_pool_initialized:
+            assert cls._thread_pool is not None, "Thread pool should be initialized"
+            return cls._thread_pool
+
+        # Slow path - acquire lock and create pool if needed
+        with cls._thread_pool_lock:
+            if not cls._thread_pool_initialized:
+                cls._thread_pool = concurrent.futures.ThreadPoolExecutor(
+                    max_workers=max_workers, thread_name_prefix="async_to_sync_"
+                )
+                cls._thread_pool_initialized = True
+
+        # This should never be None at this point, but add assertion for type checker
+        assert cls._thread_pool is not None
+        return cls._thread_pool
+
+    @classmethod
+    def run_async_in_thread(cls, coro_func, *args, **kwargs):
+        """
+        Run an async function in a separate thread from the thread pool.
+        Blocks until the async function completes.
+        Properly propagates contextvars between threads and manages event loops.
+        """
+        # Capture current context - this includes all context variables
+        context = contextvars.copy_context()
+
+        # Store the result and any exception that occurs
+        result_container: dict = {"result": None, "exception": None}
+
+        # Function that runs in the thread pool
+        def run_in_thread():
+            # Create new event loop for this thread
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+
+            try:
+                # Create the coroutine within the context
+                async def run_with_context():
+                    # The coroutine function might access context variables
+                    return await coro_func(*args, **kwargs)
+
+                # Run the coroutine with the captured context
+                # This ensures all context variables are available in the async function
+                result = context.run(loop.run_until_complete, run_with_context())
+                result_container["result"] = result
+            except Exception as e:
+                # Store the exception to re-raise in the calling thread
+                result_container["exception"] = e
+            finally:
+                # Ensure event loop is properly closed to prevent warnings
+                try:
+                    # Cancel any remaining tasks
+                    pending = asyncio.all_tasks(loop)
+                    for task in pending:
+                        task.cancel()
+
+                    # Run the loop briefly to handle cancellations
+                    if pending:
+                        loop.run_until_complete(
+                            asyncio.gather(*pending, return_exceptions=True)
+                        )
+                except Exception:
+                    pass  # Ignore errors during cleanup
+
+                # Close the event loop
+                loop.close()
+
+                # Clear the event loop from the thread
+                asyncio.set_event_loop(None)
+
+        # Submit to thread pool and wait for result
+        thread_pool = cls.get_thread_pool()
+        future = thread_pool.submit(run_in_thread)
+        future.result()  # Wait for completion
+
+        # Re-raise any exception that occurred in the thread
+        if result_container["exception"] is not None:
+            raise result_container["exception"]
+
+        return result_container["result"]
+
+    @classmethod
+    def create_sync_class(cls, async_class: Type, thread_pool_size=10) -> Type:
+        """
+        Creates a new class with synchronous versions of all async methods.
+
+        Args:
+            async_class: The async class to convert
+            thread_pool_size: Size of the shared thread pool (only applied when the pool is first created)
+
+        Returns:
+            A new class with sync versions of all async methods
+        """
+        sync_class_name = "ComfyAPISyncStub"
+        cls.get_thread_pool(thread_pool_size)
+
+        # Create a proper class with docstrings and proper base classes
+        sync_class_dict = {
+            "__doc__": async_class.__doc__,
+            "__module__": async_class.__module__,
+            "__qualname__": sync_class_name,
+            "__orig_class__": async_class,  # Store original class for typing references
+        }
+
+        # Create __init__ method
+        def __init__(self, *args, **kwargs):
+            self._async_instance = async_class(*args, **kwargs)
+
+            # Handle annotated class attributes (like execution: Execution)
+            # Get all annotations from the class hierarchy
+            all_annotations = {}
+            for base_class in reversed(inspect.getmro(async_class)):
+                if hasattr(base_class, "__annotations__"):
+                    all_annotations.update(base_class.__annotations__)
+
+            # For each annotated attribute, check if it needs to be created or wrapped
+            for attr_name, attr_type in all_annotations.items():
+                if hasattr(self._async_instance, attr_name):
+                    # Attribute exists on the instance
+                    attr = getattr(self._async_instance, attr_name)
+                    # Check if this attribute needs a sync wrapper
+                    if hasattr(attr, "__class__"):
+                        from comfy_api.internal.singleton import ProxiedSingleton
+
+                        if isinstance(attr, ProxiedSingleton):
+                            # Create a sync version of this attribute
+                            try:
+                                sync_attr_class = cls.create_sync_class(attr.__class__)
+                                # Create instance of the sync wrapper with the async instance
+                                sync_attr = object.__new__(sync_attr_class)  # type: ignore
+                                sync_attr._async_instance = attr
+                                setattr(self, attr_name, sync_attr)
+                            except Exception:
+                                # If we can't create a sync version, keep the original
+                                setattr(self, attr_name, attr)
+                        else:
+                            # Not async, just copy the reference
+                            setattr(self, attr_name, attr)
+                else:
+                    # Attribute doesn't exist, but is annotated - create it
+                    # This handles cases like execution: Execution
+                    if isinstance(attr_type, type):
+                        # Check if the type is defined as an inner class
+                        if hasattr(async_class, attr_type.__name__):
+                            inner_class = getattr(async_class, attr_type.__name__)
+                            from comfy_api.internal.singleton import ProxiedSingleton
+
+                            # Create an instance of the inner class
+                            try:
+                                # For ProxiedSingleton classes, get or create the singleton instance
+                                if issubclass(inner_class, ProxiedSingleton):
+                                    async_instance = inner_class.get_instance()
+                                else:
+                                    async_instance = inner_class()
+
+                                # Create sync wrapper
+                                sync_attr_class = cls.create_sync_class(inner_class)
+                                sync_attr = object.__new__(sync_attr_class)  # type: ignore
+                                sync_attr._async_instance = async_instance
+                                setattr(self, attr_name, sync_attr)
+                                # Also set on the async instance for consistency
+                                setattr(self._async_instance, attr_name, async_instance)
+                            except Exception as e:
+                                logging.warning(
+                                    f"Failed to create instance for {attr_name}: {e}"
+                                )
+
+            # Handle other instance attributes that might not be annotated
+            for name, attr in inspect.getmembers(self._async_instance):
+                if name.startswith("_") or hasattr(self, name):
+                    continue
+
+                # If attribute is an instance of a class, and that class is defined in the original class
+                # we need to check if it needs a sync wrapper
+                if isinstance(attr, object) and not isinstance(
+                    attr, (str, int, float, bool, list, dict, tuple)
+                ):
+                    from comfy_api.internal.singleton import ProxiedSingleton
+
+                    if isinstance(attr, ProxiedSingleton):
+                        # Create a sync version of this nested class
+                        try:
+                            sync_attr_class = cls.create_sync_class(attr.__class__)
+                            # Create instance of the sync wrapper with the async instance
+                            sync_attr = object.__new__(sync_attr_class)  # type: ignore
+                            sync_attr._async_instance = attr
+                            setattr(self, name, sync_attr)
+                        except Exception:
+                            # If we can't create a sync version, keep the original
+                            setattr(self, name, attr)
+
+        sync_class_dict["__init__"] = __init__
+
+        # Process methods from the async class
+        for name, method in inspect.getmembers(
+            async_class, predicate=inspect.isfunction
+        ):
+            if name.startswith("_"):
+                continue
+
+            # Wrap coroutine functions so callers can invoke them synchronously
+            if inspect.iscoroutinefunction(method):
+                # Create sync version of async method with proper signature
+                @functools.wraps(method)
+                def sync_method(self, *args, _method_name=name, **kwargs):
+                    async_method = getattr(self._async_instance, _method_name)
+                    return AsyncToSyncConverter.run_async_in_thread(
+                        async_method, *args, **kwargs
+                    )
+
+                # Add to the class dict
+                sync_class_dict[name] = sync_method
+            else:
+                # For regular methods, create a proxy method
+                @functools.wraps(method)
+                def proxy_method(self, *args, _method_name=name, **kwargs):
+                    method = getattr(self._async_instance, _method_name)
+                    return method(*args, **kwargs)
+
+                # Add to the class dict
+                sync_class_dict[name] = proxy_method
+
+        # Handle property access
+        for name, prop in inspect.getmembers(
+            async_class, lambda x: isinstance(x, property)
+        ):
+
+            def make_property(name, prop_obj):
+                def getter(self):
+                    value = getattr(self._async_instance, name)
+                    if inspect.iscoroutinefunction(value):
+
+                        def sync_fn(*args, **kwargs):
+                            return AsyncToSyncConverter.run_async_in_thread(
+                                value, *args, **kwargs
+                            )
+
+                        return sync_fn
+                    return value
+
+                def setter(self, value):
+                    setattr(self._async_instance, name, value)
+
+                return property(getter, setter if prop_obj.fset else None)
+
+            sync_class_dict[name] = make_property(name, prop)
+
+        # Create the class
+        sync_class = type(sync_class_name, (object,), sync_class_dict)
+
+        return sync_class
+
+    @classmethod
+    def _format_type_annotation(
+        cls, annotation, type_tracker: Optional[TypeTracker] = None
+    ) -> str:
+        """Convert a type annotation to its string representation for stub files."""
+        if (
+            annotation is inspect.Parameter.empty
+            or annotation is inspect.Signature.empty
+        ):
+            return "Any"
+
+        # Handle None type
+        if annotation is type(None):
+            return "None"
+
+        # Track the type if we have a tracker
+        if type_tracker:
+            type_tracker.track_type(annotation)
+
+        # Try using typing.get_origin/get_args for Python 3.8+
+        try:
+            origin = get_origin(annotation)
+            args = get_args(annotation)
+
+            if origin is not None:
+                # Track the origin type
+                if type_tracker:
+                    type_tracker.track_type(origin)
+
+                # Get the origin name
+                origin_name = getattr(origin, "__name__", str(origin))
+                if "." in origin_name:
+                    origin_name = origin_name.split(".")[-1]
+
+                # Special handling for types.UnionType (Python 3.10+ pipe operator)
+                # Convert to old-style Union for compatibility
+                if str(origin) == "<class 'types.UnionType'>" or origin_name == "UnionType":
+                    origin_name = "Union"
+
+                # Format arguments recursively
+                if args:
+                    formatted_args = []
+                    for arg in args:
+                        # Track each type in the union
+                        if type_tracker:
+                            type_tracker.track_type(arg)
+                        formatted_args.append(cls._format_type_annotation(arg, type_tracker))
+                    return f"{origin_name}[{', '.join(formatted_args)}]"
+                else:
+                    return origin_name
+        except (AttributeError, TypeError):
+            # Fallback for older Python versions or non-generic types
+            pass
+
+        # Handle generic types the old way for compatibility
+        if hasattr(annotation, "__origin__") and hasattr(annotation, "__args__"):
+            origin = annotation.__origin__
+            origin_name = (
+                origin.__name__
+                if hasattr(origin, "__name__")
+                else str(origin).split("'")[1]
+            )
+
+            # Format each type argument
+            args = []
+            for arg in annotation.__args__:
+                args.append(cls._format_type_annotation(arg, type_tracker))
+
+            return f"{origin_name}[{', '.join(args)}]"
+
+        # Handle regular types with __name__
+        if hasattr(annotation, "__name__"):
+            return annotation.__name__
+
+        # Handle special module types (like types from typing module)
+        if hasattr(annotation, "__module__") and hasattr(annotation, "__qualname__"):
+            # For types like typing.Literal, typing.TypedDict, etc.
+            return annotation.__qualname__
+
+        # Last resort: string conversion with cleanup
+        type_str = str(annotation)
+
+        # Clean up common patterns more robustly
+        if type_str.startswith("<class '") and type_str.endswith("'>"):
+            type_str = type_str[8:-2]  # Remove "<class '" and "'>"
+
+        # Remove module prefixes for common modules
+        for prefix in ["typing.", "builtins.", "types."]:
+            if type_str.startswith(prefix):
+                type_str = type_str[len(prefix) :]
+
+        # Handle special cases
+        if type_str in ("_empty", "inspect._empty"):
+            return "None"
+
+        # Fix NoneType (this should rarely be needed now)
+        if type_str == "NoneType":
+            return "None"
+
+        return type_str
+
+    @classmethod
+    def _extract_coroutine_return_type(cls, annotation):
+        """Extract the actual return type from a Coroutine annotation."""
+        if hasattr(annotation, "__args__") and len(annotation.__args__) > 2:
+            # Coroutine[Any, Any, ReturnType] -> extract ReturnType
+            return annotation.__args__[2]
+        return annotation
+
+    @classmethod
+    def _format_parameter_default(cls, default_value) -> str:
+        """Format a parameter's default value for stub files."""
+        if default_value is inspect.Parameter.empty:
+            return ""
+        elif default_value is None:
+            return " = None"
+        elif isinstance(default_value, str):
+            return f" = {default_value!r}"
+        elif default_value == {}:
+            return " = {}"
+        elif default_value == []:
+            return " = []"
+        else:
+            return f" = {default_value}"
+
+    @classmethod
+    def _format_method_parameters(
+        cls,
+        sig: inspect.Signature,
+        skip_self: bool = True,
+        type_hints: Optional[dict] = None,
+        type_tracker: Optional[TypeTracker] = None,
+    ) -> str:
+        """Format method parameters for stub files."""
+        params = []
+        if type_hints is None:
+            type_hints = {}
+
+        for i, (param_name, param) in enumerate(sig.parameters.items()):
+            if i == 0 and param_name == "self" and skip_self:
+                params.append("self")
+            else:
+                # Get type annotation from type hints if available, otherwise from signature
+                annotation = type_hints.get(param_name, param.annotation)
+                type_str = cls._format_type_annotation(annotation, type_tracker)
+
+                # Get default value
+                default_str = cls._format_parameter_default(param.default)
+
+                # Combine parameter parts
+                if annotation is inspect.Parameter.empty:
+                    params.append(f"{param_name}: Any{default_str}")
+                else:
+                    params.append(f"{param_name}: {type_str}{default_str}")
+
+        return ", ".join(params)
+
+    @classmethod
+    def _generate_method_signature(
+        cls,
+        method_name: str,
+        method,
+        is_async: bool = False,
+        type_tracker: Optional[TypeTracker] = None,
+    ) -> str:
+        """Generate a complete method signature for stub files."""
+        sig = inspect.signature(method)
+
+        # Try to get evaluated type hints to resolve string annotations
+        try:
+            from typing import get_type_hints
+            type_hints = get_type_hints(method)
+        except Exception:
+            # Fallback to empty dict if we can't get type hints
+            type_hints = {}
+
+        # For async methods, extract the actual return type
+        return_annotation = type_hints.get('return', sig.return_annotation)
+        if is_async and inspect.iscoroutinefunction(method):
+            return_annotation = cls._extract_coroutine_return_type(return_annotation)
+
+        # Format parameters with type hints
+        params_str = cls._format_method_parameters(sig, type_hints=type_hints, type_tracker=type_tracker)
+
+        # Format return type
+        return_type = cls._format_type_annotation(return_annotation, type_tracker)
+        if return_annotation is inspect.Signature.empty:
+            return_type = "None"
+
+        return f"def {method_name}({params_str}) -> {return_type}: ..."
+
+    @classmethod
+    def _generate_imports(
+        cls, async_class: Type, type_tracker: TypeTracker
+    ) -> list[str]:
+        """Generate import statements for the stub file."""
+        imports = []
+
+        # Add standard typing imports
+        imports.append(
+            "from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple"
+        )
+
+        # Add imports from the original module
+        if async_class.__module__ != "builtins":
+            module = inspect.getmodule(async_class)
+            additional_types = []
+
+            if module:
+                # Check if module has __all__ defined
+                module_all = getattr(module, "__all__", None)
+
+                for name, obj in sorted(inspect.getmembers(module)):
+                    if isinstance(obj, type):
+                        # Skip if __all__ is defined and this name isn't in it
+                        # unless it's already been tracked as used in type annotations
+                        if module_all is not None and name not in module_all:
+                            # Check if this type was actually used in annotations
+                            if name not in type_tracker.discovered_types:
+                                continue
+
+                        # Check for NamedTuple
+                        if issubclass(obj, tuple) and hasattr(obj, "_fields"):
+                            additional_types.append(name)
+                            # Mark as already imported
+                            type_tracker.already_imported.add(name)
+                        # Check for Enum
+                        elif issubclass(obj, Enum) and name != "Enum":
+                            additional_types.append(name)
+                            # Mark as already imported
+                            type_tracker.already_imported.add(name)
+
+            if additional_types:
+                type_imports = ", ".join([async_class.__name__] + additional_types)
+                imports.append(f"from {async_class.__module__} import {type_imports}")
+            else:
+                imports.append(
+                    f"from {async_class.__module__} import {async_class.__name__}"
+                )
+
+        # Add imports for all discovered types
+        # Pass the main module name to avoid duplicate imports
+        imports.extend(
+            type_tracker.get_imports(main_module_name=async_class.__module__)
+        )
+
+        # Add base module import if needed
+        if hasattr(inspect.getmodule(async_class), "__name__"):
+            module_name = inspect.getmodule(async_class).__name__
+            if "." in module_name:
+                base_module = module_name.split(".")[0]
+                # Only add if not already importing from it
+                if not any(imp.startswith(f"from {base_module}") for imp in imports):
+                    imports.append(f"import {base_module}")
+
+        return imports
+
+    @classmethod
+    def _get_class_attributes(cls, async_class: Type) -> list[tuple[str, Type]]:
+        """Extract class attributes that are classes themselves."""
+        class_attributes = []
+
+        # Look for class attributes that are classes
+        for name, attr in sorted(inspect.getmembers(async_class)):
+            if isinstance(attr, type) and not name.startswith("_"):
+                class_attributes.append((name, attr))
+            elif (
+                hasattr(async_class, "__annotations__")
+                and name in async_class.__annotations__
+            ):
+                annotation = async_class.__annotations__[name]
+                if isinstance(annotation, type):
+                    class_attributes.append((name, annotation))
+
+        return class_attributes
+
+    @classmethod
+    def _generate_inner_class_stub(
+        cls,
+        name: str,
+        attr: Type,
+        indent: str = "    ",
+        type_tracker: Optional[TypeTracker] = None,
+    ) -> list[str]:
+        """Generate stub for an inner class."""
+        stub_lines = []
+        stub_lines.append(f"{indent}class {name}Sync:")
+
+        # Add docstring if available
+        if hasattr(attr, "__doc__") and attr.__doc__:
+            stub_lines.extend(
+                cls._format_docstring_for_stub(attr.__doc__, f"{indent}    ")
+            )
+
+        # Add __init__ if it exists
+        if hasattr(attr, "__init__"):
+            try:
+                init_method = getattr(attr, "__init__")
+                init_sig = inspect.signature(init_method)
+
+                # Try to get type hints
+                try:
+                    from typing import get_type_hints
+                    init_hints = get_type_hints(init_method)
+                except Exception:
+                    init_hints = {}
+
+                # Format parameters
+                params_str = cls._format_method_parameters(
+                    init_sig, type_hints=init_hints, type_tracker=type_tracker
+                )
+                # Add __init__ docstring if available (before the method)
+                if hasattr(init_method, "__doc__") and init_method.__doc__:
+                    stub_lines.extend(
+                        cls._format_docstring_for_stub(
+                            init_method.__doc__, f"{indent}    "
+                        )
+                    )
+                stub_lines.append(
+                    f"{indent}    def __init__({params_str}) -> None: ..."
+                )
+            except (ValueError, TypeError):
+                stub_lines.append(
+                    f"{indent}    def __init__(self, *args, **kwargs) -> None: ..."
+                )
+
+        # Add methods to the inner class
+        has_methods = False
+        for method_name, method in sorted(
+            inspect.getmembers(attr, predicate=inspect.isfunction)
+        ):
+            if method_name.startswith("_"):
+                continue
+
+            has_methods = True
+            try:
+                # Add method docstring if available (before the method signature)
+                if method.__doc__:
+                    stub_lines.extend(
+                        cls._format_docstring_for_stub(method.__doc__, f"{indent}    ")
+                    )
+
+                method_sig = cls._generate_method_signature(
+                    method_name, method, is_async=True, type_tracker=type_tracker
+                )
+                stub_lines.append(f"{indent}    {method_sig}")
+            except (ValueError, TypeError):
+                stub_lines.append(
+                    f"{indent}    def {method_name}(self, *args, **kwargs): ..."
+                )
+
+        if not has_methods:
+            stub_lines.append(f"{indent}    pass")
+
+        return stub_lines
+
+    @classmethod
+    def _format_docstring_for_stub(
+        cls, docstring: str, indent: str = "    "
+    ) -> list[str]:
+        """Format a docstring for inclusion in a stub file with proper indentation."""
+        if not docstring:
+            return []
+
+        # First, dedent the docstring to remove any existing indentation
+        dedented = textwrap.dedent(docstring).strip()
+
+        # Split into lines
+        lines = dedented.split("\n")
+
+        # Build the properly indented docstring
+        result = []
+        result.append(f'{indent}"""')
+
+        for line in lines:
+            if line.strip():  # Non-empty line
+                result.append(f"{indent}{line}")
+            else:  # Empty line
+                result.append("")
+
+        result.append(f'{indent}"""')
+        return result
+
+    @classmethod
+    def _post_process_stub_content(cls, stub_content: list[str]) -> list[str]:
+        """Post-process stub content to fix any remaining issues."""
+        processed = []
+
+        for line in stub_content:
+            # Skip processing imports
+            if line.startswith(("from ", "import ")):
+                processed.append(line)
+                continue
+
+            # Fix method signatures missing return types
+            if (
+                line.strip().startswith("def ")
+                and line.strip().endswith(": ...")
+                and ") -> " not in line
+            ):
+                # Add -> None for methods without return annotation
+                line = line.replace(": ...", " -> None: ...")
+
+            processed.append(line)
+
+        return processed
+
+    @classmethod
+    def generate_stub_file(cls, async_class: Type, sync_class: Type) -> None:
+        """
+        Generate a .pyi stub file for the sync class to help IDEs with type checking.
+        """
+        try:
+            # Only generate stub if we can determine module path
+            if async_class.__module__ == "__main__":
+                return
+
+            module = inspect.getmodule(async_class)
+            if not module:
+                return
+
+            module_path = module.__file__
+            if not module_path:
+                return
+
+            # Create stub file path in a 'generated' subdirectory
+            module_dir = os.path.dirname(module_path)
+            stub_dir = os.path.join(module_dir, "generated")
+
+            # Ensure the generated directory exists
+            os.makedirs(stub_dir, exist_ok=True)
+
+            module_name = os.path.basename(module_path)
+            if module_name.endswith(".py"):
+                module_name = module_name[:-3]
+
+            sync_stub_path = os.path.join(stub_dir, f"{sync_class.__name__}.pyi")
+
+            # Create a type tracker for this stub generation
+            type_tracker = TypeTracker()
+
+            stub_content = []
+
+            # We'll generate imports after processing all methods to capture all types
+            # Leave a placeholder for imports
+            imports_placeholder_index = len(stub_content)
+            stub_content.append("")  # Will be replaced with imports later
+
+            # Class definition
+            stub_content.append(f"class {sync_class.__name__}:")
+
+            # Docstring
+            if async_class.__doc__:
+                stub_content.extend(
+                    cls._format_docstring_for_stub(async_class.__doc__, "    ")
+                )
+
+            # Generate __init__
+            try:
+                init_method = async_class.__init__
+                init_signature = inspect.signature(init_method)
+
+                # Try to get type hints for __init__
+                try:
+                    from typing import get_type_hints
+                    init_hints = get_type_hints(init_method)
+                except Exception:
+                    init_hints = {}
+
+                # Format parameters
+                params_str = cls._format_method_parameters(
+                    init_signature, type_hints=init_hints, type_tracker=type_tracker
+                )
+                # Add __init__ docstring if available (before the method)
+                if hasattr(init_method, "__doc__") and init_method.__doc__:
+                    stub_content.extend(
+                        cls._format_docstring_for_stub(init_method.__doc__, "    ")
+                    )
+                stub_content.append(f"    def __init__({params_str}) -> None: ...")
+            except (ValueError, TypeError):
+                stub_content.append(
+                    "    def __init__(self, *args, **kwargs) -> None: ..."
+                )
+
+            stub_content.append("")  # Add newline after __init__
+
+            # Get class attributes
+            class_attributes = cls._get_class_attributes(async_class)
+
+            # Generate inner classes
+            for name, attr in class_attributes:
+                inner_class_stub = cls._generate_inner_class_stub(
+                    name, attr, type_tracker=type_tracker
+                )
+                stub_content.extend(inner_class_stub)
+                stub_content.append("")  # Add newline after the inner class
+
+            # Add methods to the main class
+            processed_methods = set()  # Keep track of methods we've processed
+            for name, method in sorted(
+                inspect.getmembers(async_class, predicate=inspect.isfunction)
+            ):
+                if name.startswith("_") or name in processed_methods:
+                    continue
+
+                processed_methods.add(name)
+
+                try:
+                    method_sig = cls._generate_method_signature(
+                        name, method, is_async=True, type_tracker=type_tracker
+                    )
+
+                    # Add docstring if available (before the method signature for proper formatting)
+                    if method.__doc__:
+                        stub_content.extend(
+                            cls._format_docstring_for_stub(method.__doc__, "    ")
+                        )
+
+                    stub_content.append(f"    {method_sig}")
+
+                    stub_content.append("")  # Add newline after each method
+
+                except (ValueError, TypeError):
+                    # If we can't get the signature, just add a simple stub
+                    stub_content.append(f"    def {name}(self, *args, **kwargs): ...")
+                    stub_content.append("")  # Add newline
+
+            # Add properties
+            for name, prop in sorted(
+                inspect.getmembers(async_class, lambda x: isinstance(x, property))
+            ):
+                stub_content.append("    @property")
+                stub_content.append(f"    def {name}(self) -> Any: ...")
+                if prop.fset:
+                    stub_content.append(f"    @{name}.setter")
+                    stub_content.append(
+                        f"    def {name}(self, value: Any) -> None: ..."
+                    )
+                stub_content.append("")  # Add newline after each property
+
+            # Add placeholders for the nested class instances
+            # Check the actual attribute names from class annotations and attributes
+            attribute_mappings = {}
+
+            # First check annotations for typed attributes (including from parent classes)
+            # Collect all annotations from the class hierarchy
+            all_annotations = {}
+            for base_class in reversed(inspect.getmro(async_class)):
+                if hasattr(base_class, "__annotations__"):
+                    all_annotations.update(base_class.__annotations__)
+
+            for attr_name, attr_type in sorted(all_annotations.items()):
+                for class_name, class_type in class_attributes:
+                    # If the class type matches the annotated type
+                    if (
+                        attr_type == class_type
+                        or (hasattr(attr_type, "__name__") and attr_type.__name__ == class_name)
+                        or (isinstance(attr_type, str) and attr_type == class_name)
+                    ):
+                        attribute_mappings[class_name] = attr_name
+
+            # Remove the extra checking - annotations should be sufficient
+
+            # Add the attribute declarations with proper names
+            for class_name, class_type in class_attributes:
+                # Check if there's a mapping from annotation
+                attr_name = attribute_mappings.get(class_name, class_name)
+                # Use the annotation name if it exists, even if the attribute doesn't exist yet
+                # This is because the attribute might be created at runtime
+                stub_content.append(f"    {attr_name}: {class_name}Sync")
+
+            stub_content.append("")  # Add a final newline
+
+            # Now generate imports with all discovered types
+            imports = cls._generate_imports(async_class, type_tracker)
+
+            # Deduplicate imports while preserving order
+            seen = set()
+            unique_imports = []
+            for imp in imports:
+                if imp not in seen:
+                    seen.add(imp)
+                    unique_imports.append(imp)
+                else:
+                    logging.warning(f"Duplicate import detected: {imp}")
+
+            # Replace the placeholder with actual imports
+            stub_content[imports_placeholder_index : imports_placeholder_index + 1] = (
+                unique_imports
+            )
+
+            # Post-process stub content
+            stub_content = cls._post_process_stub_content(stub_content)
+
+            # Write stub file
+            with open(sync_stub_path, "w") as f:
+                f.write("\n".join(stub_content))
+
+            logging.info(f"Generated stub file: {sync_stub_path}")
+
+        except Exception as e:
+            # If stub generation fails, log the error but don't break the main functionality
+            logging.error(
+                f"Error generating stub file for {sync_class.__name__}: {str(e)}"
+            )
+            import traceback
+
+            logging.error(traceback.format_exc())
+
+
+def create_sync_class(async_class: Type, thread_pool_size=10) -> Type:
+    """
+    Creates a sync version of an async class
+
+    Args:
+        async_class: The async class to convert
+        thread_pool_size: Size of thread pool to use
+
+    Returns:
+        A new class with sync versions of all async methods
+    """
+    return AsyncToSyncConverter.create_sync_class(async_class, thread_pool_size)
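Reviewer note: the docstring handling in `_format_docstring_for_stub` above is easier to follow on a concrete input. The snippet below is a minimal standalone sketch of the same dedent-then-reindent steps. The free function `format_docstring_for_stub` and the sample docstring are illustrative assumptions, not part of this change.

```python
import textwrap


def format_docstring_for_stub(docstring: str, indent: str = "    ") -> list[str]:
    """Standalone copy of the re-indentation logic, for illustration only."""
    if not docstring:
        return []

    # Remove the indentation the docstring had in its source file.
    lines = textwrap.dedent(docstring).strip().split("\n")

    result = [f'{indent}"""']
    for line in lines:
        # Blank lines stay truly empty so no trailing whitespace is emitted.
        result.append(f"{indent}{line}" if line.strip() else "")
    result.append(f'{indent}"""')
    return result


# Hypothetical docstring, indented the way it would be inside a method body.
sample = """
        Update the progress bar.

        Args:
            value: current step
        """

formatted = format_docstring_for_stub(sample)
print("\n".join(formatted))
```

Relative indentation inside the docstring (the `Args:` entries) survives, while the outer indentation is replaced by the one the stub needs.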
--- a/comfy_api/internal/singleton.py
+++ b/comfy_api/internal/singleton.py
@@ -0,0 +1,33 @@
+from typing import Type, TypeVar
+
+class SingletonMetaclass(type):
+    T = TypeVar("T", bound="SingletonMetaclass")
+    _instances = {}
+
+    def __call__(cls, *args, **kwargs):
+        if cls not in cls._instances:
+            cls._instances[cls] = super(SingletonMetaclass, cls).__call__(
+                *args, **kwargs
+            )
+        return cls._instances[cls]
+
+    def inject_instance(cls: Type[T], instance: T) -> None:
+        assert cls not in SingletonMetaclass._instances, (
+            "Cannot inject instance after first instantiation"
+        )
+        SingletonMetaclass._instances[cls] = instance
+
+    def get_instance(cls: Type[T], *args, **kwargs) -> T:
+        """
+        Gets the singleton instance of the class, creating it if it doesn't exist.
+        """
+        if cls not in SingletonMetaclass._instances:
+            SingletonMetaclass._instances[cls] = super(
+                SingletonMetaclass, cls
+            ).__call__(*args, **kwargs)
+        return cls._instances[cls]
+
+
+class ProxiedSingleton(object, metaclass=SingletonMetaclass):
+    def __init__(self):
+        super().__init__()
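Reviewer note: a small sketch of how a metaclass like `SingletonMetaclass` behaves from the caller's side. It reimplements only the `__call__` and `inject_instance` parts under the assumed name `SingletonMeta`. The `Registry` and `Cache` classes are hypothetical and exist only for the demonstration.

```python
class SingletonMeta(type):
    """Simplified stand-in for the metaclass in this diff (illustrative only)."""

    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        # The first call constructs the instance; later calls return the cached one.
        if cls not in SingletonMeta._instances:
            SingletonMeta._instances[cls] = super().__call__(*args, **kwargs)
        return SingletonMeta._instances[cls]

    def inject_instance(cls, instance) -> None:
        # Injection is only allowed before the class has been instantiated.
        assert cls not in SingletonMeta._instances, (
            "Cannot inject instance after first instantiation"
        )
        SingletonMeta._instances[cls] = instance


class Registry(metaclass=SingletonMeta):
    def __init__(self, label: str = "default"):
        self.label = label


class Cache(metaclass=SingletonMeta):
    def __init__(self, label: str = "default"):
        self.label = label


first = Registry("a")
second = Registry("b")  # arguments are ignored: the cached instance is returned

# Build an instance without going through the metaclass, then inject it.
injected = object.__new__(Cache)
injected.label = "injected"
Cache.inject_instance(injected)
resolved = Cache()

print(first is second, second.label, resolved is injected)
```

Note that constructor arguments passed after the first instantiation are silently dropped, which is worth keeping in mind when reviewing callers of `get_instance`.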
--- a/comfy_api/latest/__init__.py
+++ b/comfy_api/latest/__init__.py
@@ -0,0 +1,124 @@
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from typing import Type, TYPE_CHECKING
+from comfy_api.internal import ComfyAPIBase
+from comfy_api.internal.singleton import ProxiedSingleton
+from comfy_api.internal.async_to_sync import create_sync_class
+from comfy_api.latest._input import ImageInput, AudioInput, MaskInput, LatentInput, VideoInput
+from comfy_api.latest._input_impl import VideoFromFile, VideoFromComponents
+from comfy_api.latest._util import VideoCodec, VideoContainer, VideoComponents
+from comfy_api.latest._io import _IO as io  #noqa: F401
+from comfy_api.latest._ui import _UI as ui  #noqa: F401
+# from comfy_api.latest._resources import _RESOURCES as resources  #noqa: F401
+from comfy_execution.utils import get_executing_context
+from comfy_execution.progress import get_progress_state, PreviewImageTuple
+from PIL import Image
+from comfy.cli_args import args
+import numpy as np
+
+
+class ComfyAPI_latest(ComfyAPIBase):
+    VERSION = "latest"
+    STABLE = False
+
+    class Execution(ProxiedSingleton):
+        async def set_progress(
+            self,
+            value: float,
+            max_value: float,
+            node_id: str | None = None,
+            preview_image: Image.Image | ImageInput | None = None,
+            ignore_size_limit: bool = False,
+        ) -> None:
+            """
+            Update the progress bar displayed in the ComfyUI interface.
+
+            This function allows custom nodes and API calls to report their progress
+            back to the user interface, providing visual feedback during long operations.
+
+            Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+            """
+            executing_context = get_executing_context()
+            if node_id is None and executing_context is not None:
+                node_id = executing_context.node_id
+            if node_id is None:
+                raise ValueError("node_id must be provided if not in executing context")
+
+            # Convert preview_image to PreviewImageTuple if needed
+            to_display: PreviewImageTuple | Image.Image | ImageInput | None = preview_image
+            if to_display is not None:
+                # First convert to PIL Image if needed
+                if isinstance(to_display, ImageInput):
+                    # Convert ImageInput (torch.Tensor) to PIL Image
+                    # Handle tensor shape [B, H, W, C] -> get first image if batch
+                    tensor = to_display
+                    if len(tensor.shape) == 4:
+                        tensor = tensor[0]
+
+                    # Convert to numpy array and scale to 0-255
+                    image_np = (tensor.cpu().numpy() * 255).astype(np.uint8)
+                    to_display = Image.fromarray(image_np)
+
+                if isinstance(to_display, Image.Image):
+                    # Detect image format from PIL Image
+                    image_format = to_display.format if to_display.format else "JPEG"
+                    # Use None for preview_size if ignore_size_limit is True
+                    preview_size = None if ignore_size_limit else args.preview_size
+                    to_display = (image_format, to_display, preview_size)
+
+            get_progress_state().update_progress(
+                node_id=node_id,
+                value=value,
+                max_value=max_value,
+                image=to_display,
+            )
+
+    execution: Execution
+
+class ComfyExtension(ABC):
+    async def on_load(self) -> None:
+        """
+        Called when an extension is loaded.
+        This should be used to initialize any global resources needed by the extension.
+        """
+
+    @abstractmethod
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        """
+        Returns a list of nodes that this extension provides.
+        """
+
+class Input:
+    Image = ImageInput
+    Audio = AudioInput
+    Mask = MaskInput
+    Latent = LatentInput
+    Video = VideoInput
+
+class InputImpl:
+    VideoFromFile = VideoFromFile
+    VideoFromComponents = VideoFromComponents
+
+class Types:
+    VideoCodec = VideoCodec
+    VideoContainer = VideoContainer
+    VideoComponents = VideoComponents
+
+ComfyAPI = ComfyAPI_latest
+
+# Create a synchronous version of the API
+if TYPE_CHECKING:
+    import comfy_api.latest.generated.ComfyAPISyncStub  # type: ignore
+
+    ComfyAPISync: Type[comfy_api.latest.generated.ComfyAPISyncStub.ComfyAPISyncStub]
+ComfyAPISync = create_sync_class(ComfyAPI_latest)
+
+__all__ = [
+    "ComfyAPI",
+    "ComfyAPISync",
+    "Input",
+    "InputImpl",
+    "Types",
+    "ComfyExtension",
+]
--- a/comfy_api/latest/_input/__init__.py
+++ b/comfy_api/latest/_input/__init__.py
@@ -0,0 +1,10 @@
+from .basic_types import ImageInput, AudioInput, MaskInput, LatentInput
+from .video_types import VideoInput
+
+__all__ = [
+    "ImageInput",
+    "AudioInput",
+    "VideoInput",
+    "MaskInput",
+    "LatentInput",
+]
--- a/comfy_api/latest/_input/basic_types.py
+++ b/comfy_api/latest/_input/basic_types.py
@@ -0,0 +1,42 @@
+import torch
+from typing import TypedDict, List, Optional
+
+ImageInput = torch.Tensor
+"""
+An image in format [B, H, W, C] where B is the batch size, H is the height, W is the width,
+and C is the number of channels.
+"""
+
+MaskInput = torch.Tensor
+"""
+A mask in format [B, H, W] where B is the batch size, H is the height, and W is the width.
+"""
+
+class AudioInput(TypedDict):
+    """
+    TypedDict representing audio input.
+    """
+
+    waveform: torch.Tensor
+    """
+    Tensor in the format [B, C, T] where B is the batch size, C is the number of channels,
+    and T is the number of samples along the time axis.
+    """
+
+    sample_rate: int
+
+class LatentInput(TypedDict):
+    """
+    TypedDict representing latent input.
+    """
+
+    samples: torch.Tensor
+    """
+    Tensor in the format [B, C, H, W] where B is the batch size, C is the number of channels,
+    H is the height, and W is the width.
+    """
+
+    noise_mask: Optional[MaskInput]
+    """
+    Optional noise mask tensor in the same format as samples.
+    """
+
+    batch_index: Optional[List[int]]
--- a/comfy_api/latest/_input/video_types.py
+++ b/comfy_api/latest/_input/video_types.py
@@ -0,0 +1,85 @@
+from __future__ import annotations
+from abc import ABC, abstractmethod
+from typing import Optional, Union
+import io
+import av
+from comfy_api.util import VideoContainer, VideoCodec, VideoComponents
+
+class VideoInput(ABC):
+    """
+    Abstract base class for video input types.
+    """
+
+    @abstractmethod
+    def get_components(self) -> VideoComponents:
+        """
+        Abstract method to get the video components (images, audio, and frame rate).
+
+        Returns:
+            VideoComponents containing images, audio, and frame rate
+        """
+        pass
+
+    @abstractmethod
+    def save_to(
+        self,
+        path: str,
+        format: VideoContainer = VideoContainer.AUTO,
+        codec: VideoCodec = VideoCodec.AUTO,
+        metadata: Optional[dict] = None
+    ):
+        """
+        Abstract method to save the video input to a file.
+        """
+        pass
+
+    def get_stream_source(self) -> Union[str, io.BytesIO]:
+        """
+        Get a streamable source for the video. This allows processing without
+        loading the entire video into memory.
+
+        Returns:
+            Either a file path (str) or a BytesIO object that can be opened with av.
+
+        Default implementation creates a BytesIO buffer, but subclasses should
+        override this for better performance when possible.
+        """
+        buffer = io.BytesIO()
+        self.save_to(buffer)
+        buffer.seek(0)
+        return buffer
+
+    # Provide a default implementation, but subclasses can provide optimized versions
+    # if possible.
+    def get_dimensions(self) -> tuple[int, int]:
+        """
+        Returns the dimensions of the video input.
+
+        Returns:
+            Tuple of (width, height)
+        """
+        components = self.get_components()
+        return components.images.shape[2], components.images.shape[1]
+
+    def get_duration(self) -> float:
+        """
+        Returns the duration of the video in seconds.
+
+        Returns:
+            Duration in seconds
+        """
+        components = self.get_components()
+        frame_count = components.images.shape[0]
+        return float(frame_count / components.frame_rate)
+
+    def get_container_format(self) -> str:
+        """
+        Returns the container format of the video (e.g., 'mp4', 'mov', 'avi').
+
+        Returns:
+            Container format as string
+        """
+        # Default implementation - subclasses should override for better performance
+        source = self.get_stream_source()
+        with av.open(source, mode="r") as container:
+            return container.format.name
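Reviewer note: the default `get_dimensions` and `get_duration` above rely on the frame tensor being laid out as [frames, H, W, C]. The sketch below checks that arithmetic with plain stand-ins. `FakeImages`, `FakeComponents`, and the use of `Fraction` for the frame rate are assumptions made for illustration, not the real `VideoComponents` type.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class FakeImages:
    # Stand-in for a tensor: only the shape is needed here (frames, H, W, C).
    shape: tuple[int, int, int, int]


@dataclass
class FakeComponents:
    images: FakeImages
    frame_rate: Fraction  # assumed type; any numeric frame rate would work


def dimensions(components: FakeComponents) -> tuple[int, int]:
    # Width is axis 2 and height is axis 1, so the result is (width, height).
    return components.images.shape[2], components.images.shape[1]


def duration(components: FakeComponents) -> float:
    # Duration in seconds is frame count divided by frames per second.
    return float(components.images.shape[0] / components.frame_rate)


# 48 frames of 640x480 RGB at 24 fps.
clip = FakeComponents(FakeImages((48, 480, 640, 3)), Fraction(24))
print(dimensions(clip), duration(clip))  # (640, 480) 2.0
```

Because these defaults call `get_components()`, they decode the whole video, which is why `VideoFromFile` overrides them with container-level lookups.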
--- a/comfy_api/latest/_input_impl/__init__.py
+++ b/comfy_api/latest/_input_impl/__init__.py
@@ -0,0 +1,7 @@
+from .video_types import VideoFromFile, VideoFromComponents
+
+__all__ = [
+    # Implementations
+    "VideoFromFile",
+    "VideoFromComponents",
+]
--- a/comfy_api/latest/_input_impl/video_types.py
+++ b/comfy_api/latest/_input_impl/video_types.py
@@ -0,0 +1,324 @@
+from __future__ import annotations
+from av.container import InputContainer
+from av.subtitles.stream import SubtitleStream
+from fractions import Fraction
+from typing import Optional
+from comfy_api.latest._input import AudioInput, VideoInput
+import av
+import io
+import json
+import numpy as np
+import torch
+from comfy_api.latest._util import VideoContainer, VideoCodec, VideoComponents
+
+
+def container_to_output_format(container_format: str | None) -> str | None:
+    """
+    A container's `format` may be a comma-separated list of formats.
+    E.g., iso container's `format` may be `mov,mp4,m4a,3gp,3g2,mj2`.
+    However, writing to a file/stream with `av.open` requires a single format,
+    or `None` to auto-detect.
+    """
+    if not container_format:
+        return None  # Auto-detect
+
+    if "," not in container_format:
+        return container_format
+
+    formats = container_format.split(",")
+    return formats[0]
+
+
+def get_open_write_kwargs(
+    dest: str | io.BytesIO, container_format: str, to_format: str | None
+) -> dict:
+    """Get kwargs for writing a `VideoFromFile` to a file/stream with `av.open`"""
+    open_kwargs = {
+        "mode": "w",
+        # If isobmff, preserve custom metadata tags (workflow, prompt, extra_pnginfo)
+        "options": {"movflags": "use_metadata_tags"},
+    }
+
+    is_write_to_buffer = isinstance(dest, io.BytesIO)
+    if is_write_to_buffer:
+        # Set output format explicitly, since it cannot be inferred from file extension
+        if to_format == VideoContainer.AUTO:
+            to_format = container_format.lower()
+        elif isinstance(to_format, str):
+            to_format = to_format.lower()
+        open_kwargs["format"] = container_to_output_format(to_format)
+
+    return open_kwargs
+
+
+class VideoFromFile(VideoInput):
+    """
+    Class representing video input from a file.
+    """
+
+    def __init__(self, file: str | io.BytesIO):
+        """
+        Initialize the VideoFromFile object from either a path on disk or a BytesIO object
+        containing the file contents.
+        """
+        self.__file = file
+
+    def get_stream_source(self) -> str | io.BytesIO:
+        """
+        Return the underlying file source for efficient streaming.
+        This avoids unnecessary memory copies when the source is already a file path.
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)
+        return self.__file
+
+    def get_dimensions(self) -> tuple[int, int]:
+        """
+        Returns the dimensions of the video input.
+
+        Returns:
+            Tuple of (width, height)
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)  # Reset the BytesIO object to the beginning
+        with av.open(self.__file, mode='r') as container:
+            for stream in container.streams:
+                if stream.type == 'video':
+                    assert isinstance(stream, av.VideoStream)
+                    return stream.width, stream.height
+        raise ValueError(f"No video stream found in file '{self.__file}'")
+
+    def get_duration(self) -> float:
+        """
+        Returns the duration of the video in seconds.
+
+        Returns:
+            Duration in seconds
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)
+        with av.open(self.__file, mode="r") as container:
+            if container.duration is not None:
+                return float(container.duration / av.time_base)
+
+            # Fallback: calculate from frame count and frame rate
+            video_stream = next(
+                (s for s in container.streams if s.type == "video"), None
+            )
+            if video_stream and video_stream.frames and video_stream.average_rate:
+                return float(video_stream.frames / video_stream.average_rate)
+
+            # Last resort: decode frames to count them
+            if video_stream and video_stream.average_rate:
+                frame_count = 0
+                container.seek(0)
+                for packet in container.demux(video_stream):
+                    for _ in packet.decode():
+                        frame_count += 1
+                if frame_count > 0:
+                    return float(frame_count / video_stream.average_rate)
+
+        raise ValueError(f"Could not determine duration for file '{self.__file}'")
+
+    def get_container_format(self) -> str:
+        """
+        Returns the container format of the video (e.g., 'mp4', 'mov', 'avi').
+
+        Returns:
+            Container format as string
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)
+        with av.open(self.__file, mode='r') as container:
+            return container.format.name
+
+    def get_components_internal(self, container: InputContainer) -> VideoComponents:
+        # Get video frames
+        frames = []
+        for frame in container.decode(video=0):
+            img = frame.to_ndarray(format='rgb24')  # shape: (H, W, 3)
+            img = torch.from_numpy(img) / 255.0  # shape: (H, W, 3)
+            frames.append(img)
+
+        images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)  # empty (N, H, W, C)
+
+        # Get frame rate
+        video_stream = next((s for s in container.streams if s.type == 'video'), None)
+        frame_rate = Fraction(video_stream.average_rate) if video_stream and video_stream.average_rate else Fraction(1)
+
+        # Get audio if available
+        audio = None
+        try:
+            container.seek(0)  # Reset the container to the beginning
+            for stream in container.streams:
+                if stream.type != 'audio':
+                    continue
+                assert isinstance(stream, av.AudioStream)
+                audio_frames = []
+                for packet in container.demux(stream):
+                    for frame in packet.decode():
+                        assert isinstance(frame, av.AudioFrame)
+                        audio_frames.append(frame.to_ndarray())  # shape: (channels, samples)
+                if len(audio_frames) > 0:
+                    audio_data = np.concatenate(audio_frames, axis=1)  # shape: (channels, total_samples)
+                    audio_tensor = torch.from_numpy(audio_data).unsqueeze(0)  # shape: (1, channels, total_samples)
+                    audio = AudioInput({
+                        "waveform": audio_tensor,
+                        "sample_rate": int(stream.sample_rate) if stream.sample_rate else 1,
+                    })
+        except StopIteration:
+            pass  # No audio stream
+
+        metadata = container.metadata
+        return VideoComponents(images=images, audio=audio, frame_rate=frame_rate, metadata=metadata)
+
+    def get_components(self) -> VideoComponents:
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)  # Reset the BytesIO object to the beginning
+        with av.open(self.__file, mode='r') as container:
+            return self.get_components_internal(container)
+        raise ValueError(f"No video stream found in file '{self.__file}'")
+
+    def save_to(
+        self,
+        path: str | io.BytesIO,
+        format: VideoContainer = VideoContainer.AUTO,
+        codec: VideoCodec = VideoCodec.AUTO,
+        metadata: Optional[dict] = None
+    ):
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)  # Reset the BytesIO object to the beginning
+        with av.open(self.__file, mode='r') as container:
+            container_format = container.format.name
+            video_encoding = container.streams.video[0].codec.name if len(container.streams.video) > 0 else None
+            reuse_streams = True
+            if format != VideoContainer.AUTO and format not in container_format.split(","):
+                reuse_streams = False
+            if codec != VideoCodec.AUTO and codec != video_encoding and video_encoding is not None:
+                reuse_streams = False
+
+            if not reuse_streams:
+                components = self.get_components_internal(container)
+                video = VideoFromComponents(components)
+                return video.save_to(
+                    path,
+                    format=format,
+                    codec=codec,
+                    metadata=metadata
+                )
+
+            streams = container.streams
+
+            open_kwargs = get_open_write_kwargs(path, container_format, format)
+            with av.open(path, **open_kwargs) as output_container:
+                # Copy over the original metadata
+                for key, value in container.metadata.items():
+                    if metadata is None or key not in metadata:
+                        output_container.metadata[key] = value
+
+                # Add our new metadata
+                if metadata is not None:
+                    for key, value in metadata.items():
+                        if isinstance(value, str):
+                            output_container.metadata[key] = value
+                        else:
+                            output_container.metadata[key] = json.dumps(value)
+
+                # Add streams to the new container
+                stream_map = {}
+                for stream in streams:
+                    if isinstance(stream, (av.VideoStream, av.AudioStream, SubtitleStream)):
+                        out_stream = output_container.add_stream_from_template(template=stream, opaque=True)
+                        stream_map[stream] = out_stream
+
+                # Write packets to the new container
+                for packet in container.demux():
+                    if packet.stream in stream_map and packet.dts is not None:
+                        packet.stream = stream_map[packet.stream]
+                        output_container.mux(packet)
+
+class VideoFromComponents(VideoInput):
+    """
+    Class representing video input from tensors.
+    """
+
+    def __init__(self, components: VideoComponents):
+        self.__components = components
+
+    def get_components(self) -> VideoComponents:
+        return VideoComponents(
+            images=self.__components.images,
+            audio=self.__components.audio,
+            frame_rate=self.__components.frame_rate
+        )
+
+    def save_to(
+        self,
+        path: str,
+        format: VideoContainer = VideoContainer.AUTO,
+        codec: VideoCodec = VideoCodec.AUTO,
+        metadata: Optional[dict] = None
+    ):
+        if format != VideoContainer.AUTO and format != VideoContainer.MP4:
+            raise ValueError("Only MP4 format is supported for now")
+        if codec != VideoCodec.AUTO and codec != VideoCodec.H264:
+            raise ValueError("Only H264 codec is supported for now")
+        with av.open(path, mode='w', options={'movflags': 'use_metadata_tags'}) as output:
+            # Add metadata before writing any streams
+            if metadata is not None:
+                for key, value in metadata.items():
+                    output.metadata[key] = json.dumps(value)
+
+            frame_rate = Fraction(round(self.__components.frame_rate * 1000), 1000)
+            # Create a video stream
+            video_stream = output.add_stream('h264', rate=frame_rate)
+            video_stream.width = self.__components.images.shape[2]
+            video_stream.height = self.__components.images.shape[1]
+            video_stream.pix_fmt = 'yuv420p'
+
+            # Create an audio stream
+            audio_sample_rate = 1
+            audio_stream: Optional[av.AudioStream] = None
+            if self.__components.audio:
+                audio_sample_rate = int(self.__components.audio['sample_rate'])
+                audio_stream = output.add_stream('aac', rate=audio_sample_rate)
+                audio_stream.sample_rate = audio_sample_rate
+                audio_stream.format = 'fltp'
+
+            # Encode video
+            for i, frame in enumerate(self.__components.images):
+                img = (frame * 255).clamp(0, 255).byte().cpu().numpy() # shape: (H, W, 3)
+                frame = av.VideoFrame.from_ndarray(img, format='rgb24')
+                frame = frame.reformat(format='yuv420p')  # Convert to YUV420P as required by h264
+                packet = video_stream.encode(frame)
+                output.mux(packet)
+
+            # Flush video
+            packet = video_stream.encode(None)
+            output.mux(packet)
+
+            if audio_stream and self.__components.audio:
+                # Encode audio
+                samples_per_frame = int(audio_sample_rate / frame_rate)
+                num_frames = self.__components.audio['waveform'].shape[2] // samples_per_frame
+                for i in range(num_frames):
+                    start = i * samples_per_frame
+                    end = start + samples_per_frame
+                    # TODO(Feature) - Add support for stereo audio
+                    chunk = (
+                        self.__components.audio["waveform"][0, 0, start:end]
+                        .unsqueeze(0)
+                        .contiguous()
+                        .numpy()
+                    )
+                    audio_frame = av.AudioFrame.from_ndarray(chunk, format='fltp', layout='mono')
+                    audio_frame.sample_rate = audio_sample_rate
+                    audio_frame.pts = i * samples_per_frame
+                    for packet in audio_stream.encode(audio_frame):
+                        output.mux(packet)
+
+                # Flush audio
+                for packet in audio_stream.encode(None):
+                    output.mux(packet)
+
+
--- a/comfy_api/latest/_io.py
+++ b/comfy_api/latest/_io.py
--- a/comfy_api/latest/_resources.py
+++ b/comfy_api/latest/_resources.py
@@ -0,0 +1,72 @@
+from __future__ import annotations
+import comfy.utils
+import folder_paths
+import logging
+from abc import ABC, abstractmethod
+from typing import Any
+import torch
+
+class ResourceKey(ABC):
+    Type = Any
+    def __init__(self):
+        ...
+
+class TorchDictFolderFilename(ResourceKey):
+    '''Key for requesting a torch file via file_name from a folder category.'''
+    Type = dict[str, torch.Tensor]
+    def __init__(self, folder_name: str, file_name: str):
+        self.folder_name = folder_name
+        self.file_name = file_name
+
+    def __hash__(self):
+        return hash((self.folder_name, self.file_name))
+
+    def __eq__(self, other: object) -> bool:
+        if not isinstance(other, TorchDictFolderFilename):
+            return False
+        return self.folder_name == other.folder_name and self.file_name == other.file_name
+
+    def __str__(self):
+        return f"{self.folder_name} -> {self.file_name}"
+
+class Resources(ABC):
+    def __init__(self):
+        ...
+
+    @abstractmethod
+    def get(self, key: ResourceKey, default: Any=...) -> Any:
+        pass
+
+class ResourcesLocal(Resources):
+    def __init__(self):
+        super().__init__()
+        self.local_resources: dict[ResourceKey, Any] = {}
+
+    def get(self, key: ResourceKey, default: Any=...) -> Any:
+        cached = self.local_resources.get(key, None)
+        if cached is not None:
+            logging.info(f"Using cached resource '{key}'")
+            return cached
+        logging.info(f"Loading resource '{key}'")
+        to_return = None
+        if isinstance(key, TorchDictFolderFilename):
+            if default is ...:
+                to_return = comfy.utils.load_torch_file(folder_paths.get_full_path_or_raise(key.folder_name, key.file_name), safe_load=True)
+            else:
+                full_path = folder_paths.get_full_path(key.folder_name, key.file_name)
+                if full_path is not None:
+                    to_return = comfy.utils.load_torch_file(full_path, safe_load=True)
+
+        if to_return is not None:
+            self.local_resources[key] = to_return
+            return to_return
+        if default is not ...:
+            return default
+        raise Exception(f"Unsupported resource key type: {type(key)}")
+
+
+class _RESOURCES:
+    ResourceKey = ResourceKey
+    TorchDictFolderFilename = TorchDictFolderFilename
+    Resources = Resources
+    ResourcesLocal = ResourcesLocal
--- a/comfy_api/latest/_ui.py
+++ b/comfy_api/latest/_ui.py
@@ -0,0 +1,463 @@
+from __future__ import annotations
+
+import json
+import os
+import random
+from io import BytesIO
+from typing import Type
+
+import av
+import numpy as np
+import torch
+try:
+    import torchaudio
+    TORCH_AUDIO_AVAILABLE = True
+except Exception:
+    TORCH_AUDIO_AVAILABLE = False
+from PIL import Image as PILImage
+from PIL.PngImagePlugin import PngInfo
+
+import folder_paths
+
+# used for image preview
+from comfy.cli_args import args
+from comfy_api.latest._io import ComfyNode, FolderType, Image, _UIOutput
+
+
+class SavedResult(dict):
+    def __init__(self, filename: str, subfolder: str, type: FolderType):
+        super().__init__(filename=filename, subfolder=subfolder, type=type.value)
+
+    @property
+    def filename(self) -> str:
+        return self["filename"]
+
+    @property
+    def subfolder(self) -> str:
+        return self["subfolder"]
+
+    @property
+    def type(self) -> FolderType:
+        return FolderType(self["type"])
+
+
+class SavedImages(_UIOutput):
+    """A UI output class to represent one or more saved images, potentially animated."""
+    def __init__(self, results: list[SavedResult], is_animated: bool = False):
+        super().__init__()
+        self.results = results
+        self.is_animated = is_animated
+
+    def as_dict(self) -> dict:
+        data = {"images": self.results}
+        if self.is_animated:
+            data["animated"] = (True,)
+        return data
+
+
+class SavedAudios(_UIOutput):
+    """UI wrapper around one or more audio files on disk (FLAC / MP3 / Opus)."""
+    def __init__(self, results: list[SavedResult]):
+        super().__init__()
+        self.results = results
+
+    def as_dict(self) -> dict:
+        return {"audio": self.results}
+
+
+def _get_directory_by_folder_type(folder_type: FolderType) -> str:
+    if folder_type == FolderType.input:
+        return folder_paths.get_input_directory()
+    if folder_type == FolderType.output:
+        return folder_paths.get_output_directory()
+    return folder_paths.get_temp_directory()
+
+
+class ImageSaveHelper:
+    """A helper class with static methods to handle image saving and metadata."""
+
+    @staticmethod
+    def _convert_tensor_to_pil(image_tensor: torch.Tensor) -> PILImage.Image:
+        """Converts a single torch tensor to a PIL Image."""
+        return PILImage.fromarray(np.clip(255.0 * image_tensor.cpu().numpy(), 0, 255).astype(np.uint8))
+
+    @staticmethod
+    def _create_png_metadata(cls: Type[ComfyNode] | None) -> PngInfo | None:
+        """Creates a PngInfo object with prompt and extra_pnginfo."""
+        if args.disable_metadata or cls is None or not cls.hidden:
+            return None
+        metadata = PngInfo()
+        if cls.hidden.prompt:
+            metadata.add_text("prompt", json.dumps(cls.hidden.prompt))
+        if cls.hidden.extra_pnginfo:
+            for x in cls.hidden.extra_pnginfo:
+                metadata.add_text(x, json.dumps(cls.hidden.extra_pnginfo[x]))
+        return metadata
+
+    @staticmethod
+    def _create_animated_png_metadata(cls: Type[ComfyNode] | None) -> PngInfo | None:
+        """Creates a PngInfo object with prompt and extra_pnginfo for animated PNGs (APNG)."""
+        if args.disable_metadata or cls is None or not cls.hidden:
+            return None
+        metadata = PngInfo()
+        if cls.hidden.prompt:
+            metadata.add(
+                b"comf",
+                "prompt".encode("latin-1", "strict")
+                + b"\0"
+                + json.dumps(cls.hidden.prompt).encode("latin-1", "strict"),
+                after_idat=True,
+            )
+        if cls.hidden.extra_pnginfo:
+            for x in cls.hidden.extra_pnginfo:
+                metadata.add(
+                    b"comf",
+                    x.encode("latin-1", "strict")
+                    + b"\0"
+                    + json.dumps(cls.hidden.extra_pnginfo[x]).encode("latin-1", "strict"),
+                    after_idat=True,
+                )
+        return metadata
+
+    @staticmethod
+    def _create_webp_metadata(pil_image: PILImage.Image, cls: Type[ComfyNode] | None) -> PILImage.Exif:
+        """Creates EXIF metadata bytes for WebP images."""
+        exif_data = pil_image.getexif()
+        if args.disable_metadata or cls is None or cls.hidden is None:
+            return exif_data
+        if cls.hidden.prompt is not None:
+            exif_data[0x0110] = "prompt:{}".format(json.dumps(cls.hidden.prompt))  # EXIF 0x0110 = Model
+        if cls.hidden.extra_pnginfo is not None:
+            initial_exif_tag = 0x010F  # EXIF 0x010f = Make
+            for key, value in cls.hidden.extra_pnginfo.items():
+                exif_data[initial_exif_tag] = "{}:{}".format(key, json.dumps(value))
+                initial_exif_tag -= 1
+        return exif_data
+
+    @staticmethod
+    def save_images(
+        images, filename_prefix: str, folder_type: FolderType, cls: Type[ComfyNode] | None, compress_level = 4,
+    ) -> list[SavedResult]:
+        """Saves a batch of images as individual PNG files."""
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type), images[0].shape[1], images[0].shape[0]
+        )
+        results = []
+        metadata = ImageSaveHelper._create_png_metadata(cls)
+        for batch_number, image_tensor in enumerate(images):
+            img = ImageSaveHelper._convert_tensor_to_pil(image_tensor)
+            filename_with_batch_num = filename.replace("%batch_num%", str(batch_number))
+            file = f"{filename_with_batch_num}_{counter:05}_.png"
+            img.save(os.path.join(full_output_folder, file), pnginfo=metadata, compress_level=compress_level)
+            results.append(SavedResult(file, subfolder, folder_type))
+            counter += 1
+        return results
+
+    @staticmethod
+    def get_save_images_ui(images, filename_prefix: str, cls: Type[ComfyNode] | None, compress_level=4) -> SavedImages:
+        """Saves a batch of images and returns a UI object for the node output."""
+        return SavedImages(
+            ImageSaveHelper.save_images(
+                images,
+                filename_prefix=filename_prefix,
+                folder_type=FolderType.output,
+                cls=cls,
+                compress_level=compress_level,
+            )
+        )
+
+    @staticmethod
+    def save_animated_png(
+        images, filename_prefix: str, folder_type: FolderType, cls: Type[ComfyNode] | None, fps: float, compress_level: int
+    ) -> SavedResult:
+        """Saves a batch of images as a single animated PNG."""
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type), images[0].shape[1], images[0].shape[0]
+        )
+        pil_images = [ImageSaveHelper._convert_tensor_to_pil(img) for img in images]
+        metadata = ImageSaveHelper._create_animated_png_metadata(cls)
+        file = f"{filename}_{counter:05}_.png"
+        save_path = os.path.join(full_output_folder, file)
+        pil_images[0].save(
+            save_path,
+            pnginfo=metadata,
+            compress_level=compress_level,
+            save_all=True,
+            duration=int(1000.0 / fps),
+            append_images=pil_images[1:],
+        )
+        return SavedResult(file, subfolder, folder_type)
+
+    @staticmethod
+    def get_save_animated_png_ui(
+        images, filename_prefix: str, cls: Type[ComfyNode] | None, fps: float, compress_level: int
+    ) -> SavedImages:
+        """Saves an animated PNG and returns a UI object for the node output."""
+        result = ImageSaveHelper.save_animated_png(
+            images,
+            filename_prefix=filename_prefix,
+            folder_type=FolderType.output,
+            cls=cls,
+            fps=fps,
+            compress_level=compress_level,
+        )
+        return SavedImages([result], is_animated=len(images) > 1)
+
+    @staticmethod
+    def save_animated_webp(
+        images,
+        filename_prefix: str,
+        folder_type: FolderType,
+        cls: Type[ComfyNode] | None,
+        fps: float,
+        lossless: bool,
+        quality: int,
+        method: int,
+    ) -> SavedResult:
+        """Saves a batch of images as a single animated WebP."""
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type), images[0].shape[1], images[0].shape[0]
+        )
+        pil_images = [ImageSaveHelper._convert_tensor_to_pil(img) for img in images]
+        pil_exif = ImageSaveHelper._create_webp_metadata(pil_images[0], cls)
+        file = f"{filename}_{counter:05}_.webp"
+        pil_images[0].save(
+            os.path.join(full_output_folder, file),
+            save_all=True,
+            duration=int(1000.0 / fps),
+            append_images=pil_images[1:],
+            exif=pil_exif,
+            lossless=lossless,
+            quality=quality,
+            method=method,
+        )
+        return SavedResult(file, subfolder, folder_type)
+
+    @staticmethod
+    def get_save_animated_webp_ui(
+        images,
+        filename_prefix: str,
+        cls: Type[ComfyNode] | None,
+        fps: float,
+        lossless: bool,
+        quality: int,
+        method: int,
+    ) -> SavedImages:
+        """Saves an animated WebP and returns a UI object for the node output."""
+        result = ImageSaveHelper.save_animated_webp(
+            images,
+            filename_prefix=filename_prefix,
+            folder_type=FolderType.output,
+            cls=cls,
+            fps=fps,
+            lossless=lossless,
+            quality=quality,
+            method=method,
+        )
+        return SavedImages([result], is_animated=len(images) > 1)
+
+
+class AudioSaveHelper:
+    """A helper class with static methods to handle audio saving and metadata."""
+    _OPUS_RATES = [8000, 12000, 16000, 24000, 48000]
+
+    @staticmethod
+    def save_audio(
+        audio: dict,
+        filename_prefix: str,
+        folder_type: FolderType,
+        cls: Type[ComfyNode] | None,
+        format: str = "flac",
+        quality: str = "128k",
+    ) -> list[SavedResult]:
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type)
+        )
+
+        metadata = {}
+        if not args.disable_metadata and cls is not None and cls.hidden is not None:
+            if cls.hidden.prompt is not None:
+                metadata["prompt"] = json.dumps(cls.hidden.prompt)
+            if cls.hidden.extra_pnginfo is not None:
+                for x in cls.hidden.extra_pnginfo:
+                    metadata[x] = json.dumps(cls.hidden.extra_pnginfo[x])
+
+        results = []
+        for batch_number, waveform in enumerate(audio["waveform"].cpu()):
+            filename_with_batch_num = filename.replace("%batch_num%", str(batch_number))
+            file = f"{filename_with_batch_num}_{counter:05}_.{format}"
+            output_path = os.path.join(full_output_folder, file)
+
+            # Use original sample rate initially
+            sample_rate = audio["sample_rate"]
+
+            # Handle Opus sample rate requirements
+            if format == "opus":
+                if sample_rate > 48000:
+                    sample_rate = 48000
+                elif sample_rate not in AudioSaveHelper._OPUS_RATES:
+                    # Find the next highest supported rate
+                    for rate in sorted(AudioSaveHelper._OPUS_RATES):
+                        if rate > sample_rate:
+                            sample_rate = rate
+                            break
+                    if sample_rate not in AudioSaveHelper._OPUS_RATES:  # Fallback if still not supported
+                        sample_rate = 48000
+
+                # Resample if necessary
+                if sample_rate != audio["sample_rate"]:
+                    if not TORCH_AUDIO_AVAILABLE:
+                        raise Exception("torchaudio is not available; cannot resample audio.")
+                    waveform = torchaudio.functional.resample(waveform, audio["sample_rate"], sample_rate)
+
+            # Create output with specified format
+            output_buffer = BytesIO()
+            output_container = av.open(output_buffer, mode="w", format=format)
+
+            # Set metadata on the container
+            for key, value in metadata.items():
+                output_container.metadata[key] = value
+
+            # Set up the output stream with appropriate properties
+            if format == "opus":
+                out_stream = output_container.add_stream("libopus", rate=sample_rate)
+                if quality == "64k":
+                    out_stream.bit_rate = 64000
+                elif quality == "96k":
+                    out_stream.bit_rate = 96000
+                elif quality == "128k":
+                    out_stream.bit_rate = 128000
+                elif quality == "192k":
+                    out_stream.bit_rate = 192000
+                elif quality == "320k":
+                    out_stream.bit_rate = 320000
+            elif format == "mp3":
+                out_stream = output_container.add_stream("libmp3lame", rate=sample_rate)
+                if quality == "V0":
+                    # TODO i would really love to support V3 and V5 but there doesn't seem to be a way to set the qscale level, the property below is a bool
+                    out_stream.codec_context.qscale = 1
+                elif quality == "128k":
+                    out_stream.bit_rate = 128000
+                elif quality == "320k":
+                    out_stream.bit_rate = 320000
+            else:  # format == "flac":
+                out_stream = output_container.add_stream("flac", rate=sample_rate)
+
+            frame = av.AudioFrame.from_ndarray(
+                waveform.movedim(0, 1).reshape(1, -1).float().numpy(),
+                format="flt",
+                layout="mono" if waveform.shape[0] == 1 else "stereo",
+            )
+            frame.sample_rate = sample_rate
+            frame.pts = 0
+            output_container.mux(out_stream.encode(frame))
+
+            # Flush encoder
+            output_container.mux(out_stream.encode(None))
+
+            # Close containers
+            output_container.close()
+
+            # Write the output to file
+            output_buffer.seek(0)
+            with open(output_path, "wb") as f:
+                f.write(output_buffer.getbuffer())
+
+            results.append(SavedResult(file, subfolder, folder_type))
+            counter += 1
+
+        return results
+
+    @staticmethod
+    def get_save_audio_ui(
+        audio, filename_prefix: str, cls: Type[ComfyNode] | None, format: str = "flac", quality: str = "128k",
+    ) -> SavedAudios:
+        """Save and instantly wrap for UI."""
+        return SavedAudios(
+            AudioSaveHelper.save_audio(
+                audio,
+                filename_prefix=filename_prefix,
+                folder_type=FolderType.output,
+                cls=cls,
+                format=format,
+                quality=quality,
+            )
+        )
+
+
+class PreviewImage(_UIOutput):
+    def __init__(self, image: Image.Type, animated: bool = False, cls: Type[ComfyNode] = None, **kwargs):
+        self.values = ImageSaveHelper.save_images(
+            image,
+            filename_prefix="ComfyUI_temp_" + ''.join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(5)),
+            folder_type=FolderType.temp,
+            cls=cls,
+            compress_level=1,
+        )
+        self.animated = animated
+
+    def as_dict(self):
+        return {
+            "images": self.values,
+            "animated": (self.animated,)
+        }
+
+
+class PreviewMask(PreviewImage):
+    def __init__(self, mask: torch.Tensor, animated: bool = False, cls: Type[ComfyNode] = None, **kwargs):
+        preview = mask.reshape((-1, 1, mask.shape[-2], mask.shape[-1])).movedim(1, -1).expand(-1, -1, -1, 3)
+        super().__init__(preview, animated, cls, **kwargs)
+
+
+class PreviewAudio(_UIOutput):
+    def __init__(self, audio: dict, cls: Type[ComfyNode] = None, **kwargs):
+        self.values = AudioSaveHelper.save_audio(
+            audio,
+            filename_prefix="ComfyUI_temp_" + "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(5)),
+            folder_type=FolderType.temp,
+            cls=cls,
+            format="flac",
+            quality="128k",
+        )
+
+    def as_dict(self) -> dict:
+        return {"audio": self.values}
+
+
+class PreviewVideo(_UIOutput):
+    def __init__(self, values: list[SavedResult | dict], **kwargs):
+        self.values = values
+
+    def as_dict(self):
+        return {"images": self.values, "animated": (True,)}
+
+
+class PreviewUI3D(_UIOutput):
+    def __init__(self, model_file, camera_info, **kwargs):
+        self.model_file = model_file
+        self.camera_info = camera_info
+
+    def as_dict(self):
+        return {"result": [self.model_file, self.camera_info]}
+
+
+class PreviewText(_UIOutput):
+    def __init__(self, value: str, **kwargs):
+        self.value = value
+
+    def as_dict(self):
+        return {"text": (self.value,)}
+
+
+class _UI:
+    SavedResult = SavedResult
+    SavedImages = SavedImages
+    SavedAudios = SavedAudios
+    ImageSaveHelper = ImageSaveHelper
+    AudioSaveHelper = AudioSaveHelper
+    PreviewImage = PreviewImage
+    PreviewMask = PreviewMask
+    PreviewAudio = PreviewAudio
+    PreviewVideo = PreviewVideo
+    PreviewUI3D = PreviewUI3D
+    PreviewText = PreviewText
--- a/comfy_api/latest/_util/__init__.py
+++ b/comfy_api/latest/_util/__init__.py
@@ -0,0 +1,8 @@
+from .video_types import VideoContainer, VideoCodec, VideoComponents
+
+__all__ = [
+    # Utility Types
+    "VideoContainer",
+    "VideoCodec",
+    "VideoComponents",
+]
--- a/comfy_api/latest/_util/video_types.py
+++ b/comfy_api/latest/_util/video_types.py
@@ -0,0 +1,52 @@
+from __future__ import annotations
+from dataclasses import dataclass
+from enum import Enum
+from fractions import Fraction
+from typing import Optional
+from comfy_api.latest._input import ImageInput, AudioInput
+
+class VideoCodec(str, Enum):
+    AUTO = "auto"
+    H264 = "h264"
+
+    @classmethod
+    def as_input(cls) -> list[str]:
+        """
+        Returns a list of codec names that can be used as node input.
+        """
+        return [member.value for member in cls]
+
+class VideoContainer(str, Enum):
+    AUTO = "auto"
+    MP4 = "mp4"
+
+    @classmethod
+    def as_input(cls) -> list[str]:
+        """
+        Returns a list of container names that can be used as node input.
+        """
+        return [member.value for member in cls]
+
+    @classmethod
+    def get_extension(cls, value) -> str:
+        """
+        Returns the file extension for the container.
+        """
+        if isinstance(value, str):
+            value = cls(value)
+        if value == VideoContainer.MP4 or value == VideoContainer.AUTO:
+            return "mp4"
+        return ""
+
+@dataclass
+class VideoComponents:
+    """
+    Dataclass representing the components of a video.
+    """
+
+    images: ImageInput
+    frame_rate: Fraction
+    audio: Optional[AudioInput] = None
+    metadata: Optional[dict] = None
+
+
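Review note: the sketch below shows how `VideoContainer.get_extension` from the hunk above might be used to build an output filename. The enum is copied locally so the snippet runs without ComfyUI installed; `output_filename` is a hypothetical helper and is not part of this diff.

```python
from enum import Enum


class VideoContainer(str, Enum):
    # Local copy of the enum added in video_types.py, for illustration only.
    AUTO = "auto"
    MP4 = "mp4"

    @classmethod
    def as_input(cls) -> list[str]:
        # Values suitable for a node's dropdown input.
        return [member.value for member in cls]

    @classmethod
    def get_extension(cls, value) -> str:
        # Plain strings are converted to enum members first.
        if isinstance(value, str):
            value = cls(value)
        if value == VideoContainer.MP4 or value == VideoContainer.AUTO:
            return "mp4"
        return ""


def output_filename(stem: str, container) -> str:
    # Hypothetical helper: append the container's extension when there is one.
    ext = VideoContainer.get_extension(container)
    return f"{stem}.{ext}" if ext else stem
```

Because the string is passed through `cls(value)`, an unknown container name such as "webm" raises `ValueError` instead of reaching the `return ""` fallback.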
--- a/comfy_api/latest/generated/ComfyAPISyncStub.pyi
+++ b/comfy_api/latest/generated/ComfyAPISyncStub.pyi
@@ -0,0 +1,20 @@
+from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple
+from comfy_api.latest import ComfyAPI_latest
+from PIL.Image import Image
+from torch import Tensor
+class ComfyAPISyncStub:
+    def __init__(self) -> None: ...
+
+    class ExecutionSync:
+        def __init__(self) -> None: ...
+        """
+        Update the progress bar displayed in the ComfyUI interface.
+
+        This function allows custom nodes and API calls to report their progress
+        back to the user interface, providing visual feedback during long operations.
+
+        Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+        """
+        def set_progress(self, value: float, max_value: float, node_id: Union[str, None] = None, preview_image: Union[Image, Tensor, None] = None, ignore_size_limit: bool = False) -> None: ...
+
+    execution: ExecutionSync
--- a/comfy_api/util.py
+++ b/comfy_api/util.py
@@ -0,0 +1,8 @@
+# This file only exists for backwards compatibility.
+from comfy_api.latest._util import VideoCodec, VideoContainer, VideoComponents
+
+__all__ = [
+    "VideoCodec",
+    "VideoContainer",
+    "VideoComponents",
+]
--- a/comfy_api/util/__init__.py
+++ b/comfy_api/util/__init__.py
@@ -1,7 +1,7 @@
-from .video_types import VideoContainer, VideoCodec, VideoComponents
+# This file only exists for backwards compatibility.
+from comfy_api.latest._util import VideoContainer, VideoCodec, VideoComponents

 __all__ = [
-    # Utility Types
    "VideoContainer",
    "VideoCodec",
    "VideoComponents",
--- a/comfy_api/util/video_types.py
+++ b/comfy_api/util/video_types.py
@@ -1,51 +1,12 @@
-from __future__ import annotations
-from dataclasses import dataclass
-from enum import Enum
-from fractions import Fraction
-from typing import Optional
-from comfy_api.input import ImageInput, AudioInput
-
-class VideoCodec(str, Enum):
-    AUTO = "auto"
-    H264 = "h264"
-
-    @classmethod
-    def as_input(cls) -> list[str]:
-        """
-        Returns a list of codec names that can be used as node input.
-        """
-        return [member.value for member in cls]
-
-class VideoContainer(str, Enum):
-    AUTO = "auto"
-    MP4 = "mp4"
-
-    @classmethod
-    def as_input(cls) -> list[str]:
-        """
-        Returns a list of container names that can be used as node input.
-        """
-        return [member.value for member in cls]
-
-    @classmethod
-    def get_extension(cls, value) -> str:
-        """
-        Returns the file extension for the container.
-        """
-        if isinstance(value, str):
-            value = cls(value)
-        if value == VideoContainer.MP4 or value == VideoContainer.AUTO:
-            return "mp4"
-        return ""
-
-@dataclass
-class VideoComponents:
-    """
-    Dataclass representing the components of a video.
-    """
-
-    images: ImageInput
-    frame_rate: Fraction
-    audio: Optional[AudioInput] = None
-    metadata: Optional[dict] = None
+# This file only exists for backwards compatibility.
+from comfy_api.latest._util.video_types import (
+    VideoContainer,
+    VideoCodec,
+    VideoComponents,
+)

+__all__ = [
+    "VideoContainer",
+    "VideoCodec",
+    "VideoComponents",
+]
--- a/comfy_api/v0_0_1/__init__.py
+++ b/comfy_api/v0_0_1/__init__.py
@@ -0,0 +1,42 @@
+from comfy_api.v0_0_2 import (
+    ComfyAPIAdapter_v0_0_2,
+    Input as Input_v0_0_2,
+    InputImpl as InputImpl_v0_0_2,
+    Types as Types_v0_0_2,
+)
+from typing import Type, TYPE_CHECKING
+from comfy_api.internal.async_to_sync import create_sync_class
+
+
+# This version only exists to serve as a template for future version adapters.
+# There is no reason anyone should ever use it.
+class ComfyAPIAdapter_v0_0_1(ComfyAPIAdapter_v0_0_2):
+    VERSION = "0.0.1"
+    STABLE = True
+
+class Input(Input_v0_0_2):
+    pass
+
+class InputImpl(InputImpl_v0_0_2):
+    pass
+
+class Types(Types_v0_0_2):
+    pass
+
+ComfyAPI = ComfyAPIAdapter_v0_0_1
+
+# Create a synchronous version of the API
+if TYPE_CHECKING:
+    from comfy_api.v0_0_1.generated.ComfyAPISyncStub import ComfyAPISyncStub  # type: ignore
+
+    ComfyAPISync: Type[ComfyAPISyncStub]
+
+ComfyAPISync = create_sync_class(ComfyAPIAdapter_v0_0_1)
+
+__all__ = [
+    "ComfyAPI",
+    "ComfyAPISync",
+    "Input",
+    "InputImpl",
+    "Types",
+]
--- a/comfy_api/v0_0_1/generated/ComfyAPISyncStub.pyi
+++ b/comfy_api/v0_0_1/generated/ComfyAPISyncStub.pyi
@@ -0,0 +1,20 @@
+from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple
+from comfy_api.v0_0_1 import ComfyAPIAdapter_v0_0_1
+from PIL.Image import Image
+from torch import Tensor
+class ComfyAPISyncStub:
+    def __init__(self) -> None: ...
+
+    class ExecutionSync:
+        def __init__(self) -> None: ...
+        """
+        Update the progress bar displayed in the ComfyUI interface.
+
+        This function allows custom nodes and API calls to report their progress
+        back to the user interface, providing visual feedback during long operations.
+
+        Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+        """
+        def set_progress(self, value: float, max_value: float, node_id: Union[str, None] = None, preview_image: Union[Image, Tensor, None] = None, ignore_size_limit: bool = False) -> None: ...
+
+    execution: ExecutionSync
--- a/comfy_api/v0_0_2/__init__.py
+++ b/comfy_api/v0_0_2/__init__.py
@@ -0,0 +1,45 @@
+from comfy_api.latest import (
+    ComfyAPI_latest,
+    Input as Input_latest,
+    InputImpl as InputImpl_latest,
+    Types as Types_latest,
+)
+from typing import Type, TYPE_CHECKING
+from comfy_api.internal.async_to_sync import create_sync_class
+from comfy_api.latest import io, ui, ComfyExtension  #noqa: F401
+
+
+class ComfyAPIAdapter_v0_0_2(ComfyAPI_latest):
+    VERSION = "0.0.2"
+    STABLE = False
+
+
+class Input(Input_latest):
+    pass
+
+
+class InputImpl(InputImpl_latest):
+    pass
+
+
+class Types(Types_latest):
+    pass
+
+
+ComfyAPI = ComfyAPIAdapter_v0_0_2
+
+# Create a synchronous version of the API
+if TYPE_CHECKING:
+    from comfy_api.v0_0_2.generated.ComfyAPISyncStub import ComfyAPISyncStub  # type: ignore
+
+    ComfyAPISync: Type[ComfyAPISyncStub]
+ComfyAPISync = create_sync_class(ComfyAPIAdapter_v0_0_2)
+
+__all__ = [
+    "ComfyAPI",
+    "ComfyAPISync",
+    "Input",
+    "InputImpl",
+    "Types",
+    "ComfyExtension",
+]
--- a/comfy_api/v0_0_2/generated/ComfyAPISyncStub.pyi
+++ b/comfy_api/v0_0_2/generated/ComfyAPISyncStub.pyi
@@ -0,0 +1,20 @@
+from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple
+from comfy_api.v0_0_2 import ComfyAPIAdapter_v0_0_2
+from PIL.Image import Image
+from torch import Tensor
+class ComfyAPISyncStub:
+    def __init__(self) -> None: ...
+
+    class ExecutionSync:
+        def __init__(self) -> None: ...
+        """
+        Update the progress bar displayed in the ComfyUI interface.
+
+        This function allows custom nodes and API calls to report their progress
+        back to the user interface, providing visual feedback during long operations.
+
+        Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+        """
+        def set_progress(self, value: float, max_value: float, node_id: Union[str, None] = None, preview_image: Union[Image, Tensor, None] = None, ignore_size_limit: bool = False) -> None: ...
+
+    execution: ExecutionSync
--- a/comfy_api/version_list.py
+++ b/comfy_api/version_list.py
@@ -0,0 +1,12 @@
+from comfy_api.latest import ComfyAPI_latest
+from comfy_api.v0_0_2 import ComfyAPIAdapter_v0_0_2
+from comfy_api.v0_0_1 import ComfyAPIAdapter_v0_0_1
+from comfy_api.internal import ComfyAPIBase
+from typing import List, Type
+
+supported_versions: List[Type[ComfyAPIBase]] = [
+    ComfyAPI_latest,
+    ComfyAPIAdapter_v0_0_2,
+    ComfyAPIAdapter_v0_0_1,
+]
+
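Review note: a minimal sketch of how a caller might resolve an adapter class from `supported_versions` by its `VERSION` string. The classes here are stand-ins that only mirror the inheritance chain shown in this diff; the `VERSION = "latest"` value on `ComfyAPI_latest` and the `find_version` helper are assumptions and do not appear in the diff.

```python
from typing import List, Optional, Type


class ComfyAPIBase:
    # Stand-in for comfy_api.internal.ComfyAPIBase.
    VERSION = "base"
    STABLE = False


class ComfyAPI_latest(ComfyAPIBase):
    VERSION = "latest"  # assumed value, not shown in this diff
    STABLE = False


class ComfyAPIAdapter_v0_0_2(ComfyAPI_latest):
    VERSION = "0.0.2"
    STABLE = False


class ComfyAPIAdapter_v0_0_1(ComfyAPIAdapter_v0_0_2):
    VERSION = "0.0.1"
    STABLE = True


supported_versions: List[Type[ComfyAPIBase]] = [
    ComfyAPI_latest,
    ComfyAPIAdapter_v0_0_2,
    ComfyAPIAdapter_v0_0_1,
]


def find_version(version: str) -> Optional[Type[ComfyAPIBase]]:
    # Hypothetical lookup: return the first class whose VERSION matches.
    for api_class in supported_versions:
        if api_class.VERSION == version:
            return api_class
    return None
```

Each older adapter subclasses the next newer one, so behavior added in `latest` is inherited by every adapter unless that adapter overrides it.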
--- a/comfy_api_nodes/apinode_utils.py
+++ b/comfy_api_nodes/apinode_utils.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+import aiohttp
 import io
 import logging
 import mimetypes
@@ -21,7 +22,6 @@ from server import PromptServer

 import numpy as np
 from PIL import Image
-import requests
 import torch
 import math
 import base64
@@ -30,7 +30,7 @@ from io import BytesIO
 import av


-def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFromFile:
+async def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFromFile:
    """Downloads a video from a URL and returns a `VIDEO` output.

    Args:
@@ -39,7 +39,7 @@ def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFr
    Returns:
        A Comfy node `VIDEO` output.
    """
-    video_io = download_url_to_bytesio(video_url, timeout)
+    video_io = await download_url_to_bytesio(video_url, timeout)
    if video_io is None:
        error_msg = f"Failed to download video from {video_url}"
        logging.error(error_msg)
@@ -62,7 +62,7 @@ def downscale_image_tensor(image, total_pixels=1536 * 1024) -> torch.Tensor:
    return s


-def validate_and_cast_response(
+async def validate_and_cast_response(
    response, timeout: int = None, node_id: Union[str, None] = None
 ) -> torch.Tensor:
    """Validates and casts a response to a torch.Tensor.
@@ -86,35 +86,24 @@ def validate_and_cast_response(
    image_tensors: list[torch.Tensor] = []

    # Process each image in the data array
-    for image_data in data:
-        image_url = image_data.url
-        b64_data = image_data.b64_json
+    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=timeout)) as session:
+        for img_data in data:
+            img_bytes: bytes
+            if img_data.b64_json:
+                img_bytes = base64.b64decode(img_data.b64_json)
+            elif img_data.url:
+                if node_id:
+                    PromptServer.instance.send_progress_text(f"Result URL: {img_data.url}", node_id)
+                async with session.get(img_data.url) as resp:
+                    if resp.status != 200:
+                        raise ValueError("Failed to download generated image")
+                    img_bytes = await resp.read()
+            else:
+                raise ValueError("Invalid image payload – neither URL nor base64 data present.")

-        if not image_url and not b64_data:
-            raise ValueError("No image was generated in the response")
-
-        if b64_data:
-            img_data = base64.b64decode(b64_data)
-            img = Image.open(io.BytesIO(img_data))
-
-        elif image_url:
-            if node_id:
-                PromptServer.instance.send_progress_text(
-                    f"Result URL: {image_url}", node_id
-                )
-            img_response = requests.get(image_url, timeout=timeout)
-            if img_response.status_code != 200:
-                raise ValueError("Failed to download the image")
-            img = Image.open(io.BytesIO(img_response.content))
-
-        img = img.convert("RGBA")
-
-        # Convert to numpy array, normalize to float32 between 0 and 1
-        img_array = np.array(img).astype(np.float32) / 255.0
-        img_tensor = torch.from_numpy(img_array)
-
-        # Add to list of tensors
-        image_tensors.append(img_tensor)
+            pil_img = Image.open(BytesIO(img_bytes)).convert("RGBA")
+            arr = np.asarray(pil_img).astype(np.float32) / 255.0
+            image_tensors.append(torch.from_numpy(arr))

    return torch.stack(image_tensors, dim=0)

@@ -175,7 +164,7 @@ def mimetype_to_extension(mime_type: str) -> str:
    return mime_type.split("/")[-1].lower()


-def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
+async def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
    """Downloads content from a URL using requests and returns it as BytesIO.

    Args:
@@ -185,9 +174,11 @@ def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
    Returns:
        BytesIO object containing the downloaded content.
    """
-    response = requests.get(url, stream=True, timeout=timeout)
-    response.raise_for_status()  # Raises HTTPError for bad responses (4XX or 5XX)
-    return BytesIO(response.content)
+    timeout_cfg = aiohttp.ClientTimeout(total=timeout) if timeout else None
+    async with aiohttp.ClientSession(timeout=timeout_cfg) as session:
+        async with session.get(url) as resp:
+            resp.raise_for_status()  # Raises aiohttp.ClientResponseError for bad responses (4XX or 5XX)
+            return BytesIO(await resp.read())


 def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch.Tensor:
@@ -210,15 +201,15 @@ def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch
    return torch.from_numpy(image_array).unsqueeze(0)


-def download_url_to_image_tensor(url: str, timeout: int = None) -> torch.Tensor:
+async def download_url_to_image_tensor(url: str, timeout: int = None) -> torch.Tensor:
    """Downloads an image from a URL and returns a [B, H, W, C] tensor."""
-    image_bytesio = download_url_to_bytesio(url, timeout)
+    image_bytesio = await download_url_to_bytesio(url, timeout)
    return bytesio_to_image_tensor(image_bytesio)


-def process_image_response(response: requests.Response) -> torch.Tensor:
+def process_image_response(response_content: bytes | str) -> torch.Tensor:
    """Uses content from a Response object and converts it to a torch.Tensor"""
-    return bytesio_to_image_tensor(BytesIO(response.content))
+    return bytesio_to_image_tensor(BytesIO(response_content))


 def _tensor_to_pil(image: torch.Tensor, total_pixels: int = 2048 * 2048) -> Image.Image:
@@ -336,10 +327,10 @@ def text_filepath_to_data_uri(filepath: str) -> str:
    return f"data:{mime_type};base64,{base64_string}"


-def upload_file_to_comfyapi(
+async def upload_file_to_comfyapi(
    file_bytes_io: BytesIO,
    filename: str,
-    upload_mime_type: str,
+    upload_mime_type: Optional[str],
    auth_kwargs: Optional[dict[str, str]] = None,
 ) -> str:
    """
@@ -354,7 +345,10 @@ def upload_file_to_comfyapi(
    Returns:
        The download URL for the uploaded file.
    """
-    request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
+    if upload_mime_type is None:
+        request_object = UploadRequest(file_name=filename)
+    else:
+        request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
    operation = SynchronousOperation(
        endpoint=ApiEndpoint(
            path="/customers/storage",
@@ -366,12 +360,8 @@ def upload_file_to_comfyapi(
        auth_kwargs=auth_kwargs,
    )

-    response: UploadResponse = operation.execute()
-    upload_response = ApiClient.upload_file(
-        response.upload_url, file_bytes_io, content_type=upload_mime_type
-    )
-    upload_response.raise_for_status()
-
+    response: UploadResponse = await operation.execute()
+    await ApiClient.upload_file(response.upload_url, file_bytes_io, content_type=upload_mime_type)
    return response.download_url


@@ -399,7 +389,7 @@ def video_to_base64_string(
    return base64.b64encode(video_bytes_io.getvalue()).decode("utf-8")


-def upload_video_to_comfyapi(
+async def upload_video_to_comfyapi(
    video: VideoInput,
    auth_kwargs: Optional[dict[str, str]] = None,
    container: VideoContainer = VideoContainer.MP4,
@@ -439,9 +429,7 @@ def upload_video_to_comfyapi(
    video.save_to(video_bytes_io, format=container, codec=codec)
    video_bytes_io.seek(0)

-    return upload_file_to_comfyapi(
-        video_bytes_io, filename, upload_mime_type, auth_kwargs
-    )
+    return await upload_file_to_comfyapi(video_bytes_io, filename, upload_mime_type, auth_kwargs)


 def audio_tensor_to_contiguous_ndarray(waveform: torch.Tensor) -> np.ndarray:
@@ -501,7 +489,7 @@ def audio_ndarray_to_bytesio(
    return audio_bytes_io


-def upload_audio_to_comfyapi(
+async def upload_audio_to_comfyapi(
    audio: AudioInput,
    auth_kwargs: Optional[dict[str, str]] = None,
    container_format: str = "mp4",
@@ -527,7 +515,7 @@ def upload_audio_to_comfyapi(
        audio_data_np, sample_rate, container_format, codec_name
    )

-    return upload_file_to_comfyapi(audio_bytes_io, filename, mime_type, auth_kwargs)
+    return await upload_file_to_comfyapi(audio_bytes_io, filename, mime_type, auth_kwargs)


 def audio_to_base64_string(
@@ -544,7 +532,7 @@ def audio_to_base64_string(
    return base64.b64encode(audio_bytes).decode("utf-8")


-def upload_images_to_comfyapi(
+async def upload_images_to_comfyapi(
    image: torch.Tensor,
    max_images=8,
    auth_kwargs: Optional[dict[str, str]] = None,
@@ -561,55 +549,15 @@ def upload_images_to_comfyapi(
        mime_type: Optional MIME type for the image.
    """
    # if batch, try to upload each file if max_images is greater than 0
-    idx_image = 0
    download_urls: list[str] = []
    is_batch = len(image.shape) > 3
-    batch_length = 1
-    if is_batch:
-        batch_length = image.shape[0]
-    while True:
-        curr_image = image
-        if len(image.shape) > 3:
-            curr_image = image[idx_image]
-        # get BytesIO version of image
-        img_binary = tensor_to_bytesio(curr_image, mime_type=mime_type)
-        # first, request upload/download urls from comfy API
-        if not mime_type:
-            request_object = UploadRequest(file_name=img_binary.name)
-        else:
-            request_object = UploadRequest(
-                file_name=img_binary.name, content_type=mime_type
-            )
-        operation = SynchronousOperation(
-            endpoint=ApiEndpoint(
-                path="/customers/storage",
-                method=HttpMethod.POST,
-                request_model=UploadRequest,
-                response_model=UploadResponse,
-            ),
-            request=request_object,
-            auth_kwargs=auth_kwargs,
-        )
-        response = operation.execute()
+    batch_len = image.shape[0] if is_batch else 1

-        upload_response = ApiClient.upload_file(
-            response.upload_url, img_binary, content_type=mime_type
-        )
-        # verify success
-        try:
-            upload_response.raise_for_status()
-        except requests.exceptions.HTTPError as e:
-            raise ValueError(f"Could not upload one or more images: {e}") from e
-        # add download_url to list
-        download_urls.append(response.download_url)
-
-        idx_image += 1
-        # stop uploading additional files if done
-        if is_batch and max_images > 0:
-            if idx_image >= max_images:
-                break
-            if idx_image >= batch_length:
-                break
+    for idx in range(min(batch_len, max_images)):
+        tensor = image[idx] if is_batch else image
+        img_io = tensor_to_bytesio(tensor, mime_type=mime_type)
+        url = await upload_file_to_comfyapi(img_io, img_io.name, mime_type, auth_kwargs)
+        download_urls.append(url)
    return download_urls


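Review note: the rewritten `upload_images_to_comfyapi` loop reduces to "upload at most `min(batch_len, max_images)` items, one awaited call at a time". The sketch below reproduces that shape with the standard library only; `fake_upload` is a hypothetical stand-in for `upload_file_to_comfyapi` and performs no network I/O.

```python
import asyncio


async def fake_upload(payload: str) -> str:
    # Hypothetical stand-in for upload_file_to_comfyapi; yields control once.
    await asyncio.sleep(0)
    return f"https://example.invalid/{payload}"


async def upload_batch(items: list[str], max_images: int = 8) -> list[str]:
    # Same bound as the new loop: never more than max_images uploads.
    urls: list[str] = []
    for idx in range(min(len(items), max_images)):
        urls.append(await fake_upload(items[idx]))
    return urls


if __name__ == "__main__":
    print(asyncio.run(upload_batch(["a", "b", "c"], max_images=2)))
```

With `max_images=0` the range is empty, so this form uploads nothing and returns an empty list. Uploads run sequentially; running them concurrently would be a separate design decision that this diff does not make.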
--- a/comfy_api_nodes/apis/__init__.py
+++ b/comfy_api_nodes/apis/__init__.py
--- a/comfy_api_nodes/apis/client.py
+++ b/comfy_api_nodes/apis/client.py
--- a/comfy_api_nodes/apis/request_logger.py
+++ b/comfy_api_nodes/apis/request_logger.py
@@ -1,3 +1,5 @@
+from __future__ import annotations
+
 import os
 import datetime
 import json
--- a/comfy_api_nodes/apis/tripo_api.py
+++ b/comfy_api_nodes/apis/tripo_api.py
@@ -127,7 +127,7 @@ class TripoTextToModelRequest(BaseModel):
    type: TripoTaskType = Field(TripoTaskType.TEXT_TO_MODEL, description='Type of task')
    prompt: str = Field(..., description='The text prompt describing the model to generate', max_length=1024)
    negative_prompt: Optional[str] = Field(None, description='The negative text prompt', max_length=1024)
-    model_version: Optional[TripoModelVersion] = TripoModelVersion.V2_5
+    model_version: Optional[TripoModelVersion] = TripoModelVersion.v2_5_20250123
    face_limit: Optional[int] = Field(None, description='The number of faces to limit the generation to')
    texture: Optional[bool] = Field(True, description='Whether to apply texture to the generated model')
    pbr: Optional[bool] = Field(True, description='Whether to apply PBR to the generated model')
--- a/comfy_api_nodes/nodes_bfl.py
+++ b/comfy_api_nodes/nodes_bfl.py
@@ -1,3 +1,4 @@
+import asyncio
 import io
 from inspect import cleandoc
 from typing import Union, Optional
@@ -28,7 +29,7 @@ from comfy_api_nodes.apinode_utils import (

 import numpy as np
 from PIL import Image
-import requests
+import aiohttp
 import torch
 import base64
 import time
@@ -44,18 +45,18 @@ def convert_mask_to_image(mask: torch.Tensor):
    return mask


-def handle_bfl_synchronous_operation(
+async def handle_bfl_synchronous_operation(
    operation: SynchronousOperation,
    timeout_bfl_calls=360,
    node_id: Union[str, None] = None,
 ):
-    response_api: BFLFluxProGenerateResponse = operation.execute()
-    return _poll_until_generated(
+    response_api: BFLFluxProGenerateResponse = await operation.execute()
+    return await _poll_until_generated(
        response_api.polling_url, timeout=timeout_bfl_calls, node_id=node_id
    )


-def _poll_until_generated(
+async def _poll_until_generated(
    polling_url: str, timeout=360, node_id: Union[str, None] = None
 ):
    # used bfl-comfy-nodes to verify code implementation:
@@ -66,55 +67,56 @@ def _poll_until_generated(
    retry_404_seconds = 2
    retry_202_seconds = 2
    retry_pending_seconds = 1
-    request = requests.Request(method=HttpMethod.GET, url=polling_url)
-    # NOTE: should True loop be replaced with checking if workflow has been interrupted?
-    while True:
-        if node_id:
-            time_elapsed = time.time() - start_time
-            PromptServer.instance.send_progress_text(
-                f"Generating ({time_elapsed:.0f}s)", node_id
-            )

-        response = requests.Session().send(request.prepare())
-        if response.status_code == 200:
-            result = response.json()
-            if result["status"] == BFLStatus.ready:
-                img_url = result["result"]["sample"]
-                if node_id:
-                    PromptServer.instance.send_progress_text(
-                        f"Result URL: {img_url}", node_id
-                    )
-                img_response = requests.get(img_url)
-                return process_image_response(img_response)
-            elif result["status"] in [
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-            ]:
-                status = result["status"]
-                raise Exception(
-                    f"BFL API did not return an image due to: {status}."
+    async with aiohttp.ClientSession() as session:
+        # NOTE: should True loop be replaced with checking if workflow has been interrupted?
+        while True:
+            if node_id:
+                time_elapsed = time.time() - start_time
+                PromptServer.instance.send_progress_text(
+                    f"Generating ({time_elapsed:.0f}s)", node_id
                )
-            elif result["status"] == BFLStatus.error:
-                raise Exception(f"BFL API encountered an error: {result}.")
-            elif result["status"] == BFLStatus.pending:
-                time.sleep(retry_pending_seconds)
-                continue
-        elif response.status_code == 404:
-            if retries_404 < max_retries_404:
-                retries_404 += 1
-                time.sleep(retry_404_seconds)
-                continue
-            raise Exception(
-                f"BFL API could not find task after {max_retries_404} tries."
-            )
-        elif response.status_code == 202:
-            time.sleep(retry_202_seconds)
-        elif time.time() - start_time > timeout:
-            raise Exception(
-                f"BFL API experienced a timeout; could not return request under {timeout} seconds."
-            )
-        else:
-            raise Exception(f"BFL API encountered an error: {response.json()}")
+
+            async with session.get(polling_url) as response:
+                if response.status == 200:
+                    result = await response.json()
+                    if result["status"] == BFLStatus.ready:
+                        img_url = result["result"]["sample"]
+                        if node_id:
+                            PromptServer.instance.send_progress_text(
+                                f"Result URL: {img_url}", node_id
+                            )
+                        async with session.get(img_url) as img_resp:
+                            return process_image_response(await img_resp.content.read())
+                    elif result["status"] in [
+                        BFLStatus.request_moderated,
+                        BFLStatus.content_moderated,
+                    ]:
+                        status = result["status"]
+                        raise Exception(
+                            f"BFL API did not return an image due to: {status}."
+                        )
+                    elif result["status"] == BFLStatus.error:
+                        raise Exception(f"BFL API encountered an error: {result}.")
+                    elif result["status"] == BFLStatus.pending:
+                        await asyncio.sleep(retry_pending_seconds)
+                        continue
+                elif response.status == 404:
+                    if retries_404 < max_retries_404:
+                        retries_404 += 1
+                        await asyncio.sleep(retry_404_seconds)
+                        continue
+                    raise Exception(
+                        f"BFL API could not find task after {max_retries_404} tries."
+                    )
+                elif response.status == 202:
+                    await asyncio.sleep(retry_202_seconds)
+                elif time.time() - start_time > timeout:
+                    raise Exception(
+                        f"BFL API experienced a timeout; could not return request under {timeout} seconds."
+                    )
+                else:
+                    raise Exception(f"BFL API encountered an error: {await response.json()}")

 def convert_image_to_base64(image: torch.Tensor):
    scaled_image = downscale_image_tensor(image, total_pixels=2048 * 2048)
@@ -222,7 +224,7 @@ class FluxProUltraImageNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -266,7 +268,7 @@ class FluxProUltraImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -354,7 +356,7 @@ class FluxKontextProImageNode(ComfyNodeABC):

    BFL_PATH = "/proxy/bfl/flux-kontext-pro/generate"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -397,7 +399,7 @@ class FluxKontextProImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -489,7 +491,7 @@ class FluxProImageNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        prompt_upsampling,
@@ -524,7 +526,7 @@ class FluxProImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -632,7 +634,7 @@ class FluxProExpandNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -670,7 +672,7 @@ class FluxProExpandNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -744,7 +746,7 @@ class FluxProFillNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        mask: torch.Tensor,
@@ -780,7 +782,7 @@ class FluxProFillNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -879,7 +881,7 @@ class FluxProCannyNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        control_image: torch.Tensor,
        prompt: str,
@@ -929,7 +931,7 @@ class FluxProCannyNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -1008,7 +1010,7 @@ class FluxProDepthNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        control_image: torch.Tensor,
        prompt: str,
@@ -1045,7 +1047,7 @@ class FluxProDepthNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


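Review note: the change in `_poll_until_generated` replaces `time.sleep` with `await asyncio.sleep`, so the event loop stays free while a task is pending. A minimal sketch of that polling pattern follows. The status strings, `poll_until_ready`, and `make_fake_fetcher` are hypothetical and are not the BFL API or code from this diff.

```python
import asyncio
import time


async def poll_until_ready(fetch_status, timeout: float = 5.0, interval: float = 0.01):
    # Poll an async status source until it reports "Ready" or the timeout passes.
    start = time.monotonic()
    while True:
        status, payload = await fetch_status()
        if status == "Ready":
            return payload
        if status != "Pending":
            raise RuntimeError(f"unexpected status: {status}")
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"no result within {timeout} seconds")
        # Non-blocking wait: other coroutines can run during this pause.
        await asyncio.sleep(interval)


def make_fake_fetcher(pending_rounds: int):
    # Hypothetical status source: "Pending" N times, then "Ready" with bytes.
    calls = {"n": 0}

    async def fetch_status():
        calls["n"] += 1
        if calls["n"] <= pending_rounds:
            return "Pending", None
        return "Ready", b"image-bytes"

    return fetch_status


if __name__ == "__main__":
    print(asyncio.run(poll_until_ready(make_fake_fetcher(2))))
```

Unlike the hunk above, this sketch checks the timeout on every pending round, not only when an unrecognized HTTP status arrives. That is a deliberate simplification for the example, not a description of the diff's behavior.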
--- a/comfy_api_nodes/nodes_gemini.py
+++ b/comfy_api_nodes/nodes_gemini.py
@@ -2,8 +2,13 @@
 API Nodes for Gemini Multimodal LLM Usage via Remote API
 See: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
 """
+from __future__ import annotations

+
+import json
+import time
 import os
+import uuid
 from enum import Enum
 from typing import Optional, Literal

@@ -44,6 +49,8 @@ class GeminiModel(str, Enum):

    gemini_2_5_pro_preview_05_06 = "gemini-2.5-pro-preview-05-06"
    gemini_2_5_flash_preview_04_17 = "gemini-2.5-flash-preview-04-17"
+    gemini_2_5_pro = "gemini-2.5-pro"
+    gemini_2_5_flash = "gemini-2.5-flash"


 def get_gemini_endpoint(
@@ -95,7 +102,7 @@ class GeminiNode(ComfyNodeABC):
                    {
                        "tooltip": "The Gemini model to use for generating responses.",
                        "options": [model.value for model in GeminiModel],
-                        "default": GeminiModel.gemini_2_5_pro_preview_05_06.value,
+                        "default": GeminiModel.gemini_2_5_pro.value,
                    },
                ),
                "seed": (
@@ -301,7 +308,7 @@ class GeminiNode(ComfyNodeABC):
        """
        return GeminiPart(text=text)

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: GeminiModel,
@@ -330,7 +337,7 @@ class GeminiNode(ComfyNodeABC):
            parts.extend(files)

        # Create response
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=get_gemini_endpoint(model),
            request=GeminiGenerateContentRequest(
                contents=[
@@ -346,7 +353,27 @@ class GeminiNode(ComfyNodeABC):
        # Get result output
        output_text = self.get_text_from_response(response)
        if unique_id and output_text:
-            PromptServer.instance.send_progress_text(output_text, node_id=unique_id)
+            # Not a true chat history like the OpenAI Chat node. It is emulated so the frontend can show a copy button.
+            render_spec = {
+                "node_id": unique_id,
+                "component": "ChatHistoryWidget",
+                "props": {
+                    "history": json.dumps(
+                        [
+                            {
+                                "prompt": prompt,
+                                "response": output_text,
+                                "response_id": str(uuid.uuid4()),
+                                "timestamp": time.time(),
+                            }
+                        ]
+                    ),
+                },
+            }
+            PromptServer.instance.send_sync(
+                "display_component",
+                render_spec,
+            )

        return (output_text or "Empty response from Gemini model...",)

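Note on the Gemini hunk above: the node now sends a `display_component` message whose `history` prop is a JSON string holding a single-entry list. The sketch below rebuilds that payload shape outside ComfyUI. `build_chat_history_spec` is a hypothetical helper name introduced for illustration, and the sketch does not call `PromptServer`; only the dict layout is taken from the diff.

```python
import json
import time
import uuid


def build_chat_history_spec(node_id: str, prompt: str, response: str) -> dict:
    """Hypothetical helper mirroring the render_spec dict built in the diff."""
    entry = {
        "prompt": prompt,
        "response": response,
        "response_id": str(uuid.uuid4()),  # fresh ID per response, as in the diff
        "timestamp": time.time(),
    }
    return {
        "node_id": node_id,
        "component": "ChatHistoryWidget",
        # The history is serialized to a JSON string, not passed as a list.
        "props": {"history": json.dumps([entry])},
    }


spec = build_chat_history_spec("12", "Describe the image", "A red fox in snow.")
history = json.loads(spec["props"]["history"])
```

Decoding `spec["props"]["history"]` yields a list with one prompt/response entry, which is what lets the frontend render a copy button even though no real conversation history is kept.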
--- a/comfy_api_nodes/nodes_ideogram.py
+++ b/comfy_api_nodes/nodes_ideogram.py
@@ -1,8 +1,8 @@
-from comfy.comfy_types.node_typing import IO, ComfyNodeABC, InputTypeDict
-from inspect import cleandoc
+from io import BytesIO
+from typing_extensions import override
+from comfy_api.latest import ComfyExtension, io as comfy_io
 from PIL import Image
 import numpy as np
-import io
 import torch
 from comfy_api_nodes.apis import (
    IdeogramGenerateRequest,
@@ -212,7 +212,7 @@ V3_RESOLUTIONS= [
    "1536x640"
 ]

-def download_and_process_images(image_urls):
+async def download_and_process_images(image_urls):
    """Helper function to download and process multiple images from URLs"""

    # Initialize list to store image tensors
@@ -220,7 +220,7 @@ def download_and_process_images(image_urls):

    for image_url in image_urls:
        # Using functions from apinode_utils.py to handle downloading and processing
-        image_bytesio = download_url_to_bytesio(image_url)  # Download image content to BytesIO
+        image_bytesio = await download_url_to_bytesio(image_url)  # Download image content to BytesIO
        img_tensor = bytesio_to_image_tensor(image_bytesio, mode="RGB")  # Convert to torch.Tensor with RGB mode
        image_tensors.append(img_tensor)

@@ -246,90 +246,81 @@ def display_image_urls_on_node(image_urls, node_id):
            PromptServer.instance.send_progress_text(urls_text, node_id)


-class IdeogramV1(ComfyNodeABC):
-    """
-    Generates images using the Ideogram V1 model.
-    """
-
-    def __init__(self):
-        pass
+class IdeogramV1(comfy_io.ComfyNode):

    @classmethod
-    def INPUT_TYPES(cls) -> InputTypeDict:
-        return {
-            "required": {
-                "prompt": (
-                    IO.STRING,
-                    {
-                        "multiline": True,
-                        "default": "",
-                        "tooltip": "Prompt for the image generation",
-                    },
+    def define_schema(cls):
+        return comfy_io.Schema(
+            node_id="IdeogramV1",
+            display_name="Ideogram V1",
+            category="api node/image/Ideogram",
+            description="Generates images using the Ideogram V1 model.",
+            inputs=[
+                comfy_io.String.Input(
+                    "prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Prompt for the image generation",
                ),
-                "turbo": (
-                    IO.BOOLEAN,
-                    {
-                        "default": False,
-                        "tooltip": "Whether to use turbo mode (faster generation, potentially lower quality)",
-                    }
+                comfy_io.Boolean.Input(
+                    "turbo",
+                    default=False,
+                    tooltip="Whether to use turbo mode (faster generation, potentially lower quality)",
                ),
-            },
-            "optional": {
-                "aspect_ratio": (
-                    IO.COMBO,
-                    {
-                        "options": list(V1_V2_RATIO_MAP.keys()),
-                        "default": "1:1",
-                        "tooltip": "The aspect ratio for image generation.",
-                    },
+                comfy_io.Combo.Input(
+                    "aspect_ratio",
+                    options=list(V1_V2_RATIO_MAP.keys()),
+                    default="1:1",
+                    tooltip="The aspect ratio for image generation.",
+                    optional=True,
                ),
-                "magic_prompt_option": (
-                    IO.COMBO,
-                    {
-                        "options": ["AUTO", "ON", "OFF"],
-                        "default": "AUTO",
-                        "tooltip": "Determine if MagicPrompt should be used in generation",
-                    },
+                comfy_io.Combo.Input(
+                    "magic_prompt_option",
+                    options=["AUTO", "ON", "OFF"],
+                    default="AUTO",
+                    tooltip="Determine if MagicPrompt should be used in generation",
+                    optional=True,
                ),
-                "seed": (
-                    IO.INT,
-                    {
-                        "default": 0,
-                        "min": 0,
-                        "max": 2147483647,
-                        "step": 1,
-                        "control_after_generate": True,
-                        "display": "number",
-                    },
+                comfy_io.Int.Input(
+                    "seed",
+                    default=0,
+                    min=0,
+                    max=2147483647,
+                    step=1,
+                    control_after_generate=True,
+                    display_mode=comfy_io.NumberDisplay.number,
+                    optional=True,
                ),
-                "negative_prompt": (
-                    IO.STRING,
-                    {
-                        "multiline": True,
-                        "default": "",
-                        "tooltip": "Description of what to exclude from the image",
-                    },
+                comfy_io.String.Input(
+                    "negative_prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Description of what to exclude from the image",
+                    optional=True,
                ),
-                "num_images": (
-                    IO.INT,
-                    {"default": 1, "min": 1, "max": 8, "step": 1, "display": "number"},
+                comfy_io.Int.Input(
+                    "num_images",
+                    default=1,
+                    min=1,
+                    max=8,
+                    step=1,
+                    display_mode=comfy_io.NumberDisplay.number,
+                    optional=True,
                ),
-            },
-            "hidden": {
-                "auth_token": "AUTH_TOKEN_COMFY_ORG",
-                "comfy_api_key": "API_KEY_COMFY_ORG",
-                "unique_id": "UNIQUE_ID",
-            },
-        }
+            ],
+            outputs=[
+                comfy_io.Image.Output(),
+            ],
+            hidden=[
+                comfy_io.Hidden.auth_token_comfy_org,
+                comfy_io.Hidden.api_key_comfy_org,
+                comfy_io.Hidden.unique_id,
+            ],
+        )

-    RETURN_TYPES = (IO.IMAGE,)
-    FUNCTION = "api_call"
-    CATEGORY = "api node/image/Ideogram"
-    DESCRIPTION = cleandoc(__doc__ or "")
-    API_NODE = True
-
-    def api_call(
-        self,
+    @classmethod
+    async def execute(
+        cls,
        prompt,
        turbo=False,
        aspect_ratio="1:1",
@@ -337,13 +328,15 @@ class IdeogramV1(ComfyNodeABC):
        seed=0,
        negative_prompt="",
        num_images=1,
-        unique_id=None,
-        **kwargs,
    ):
        # Determine the model based on turbo setting
        aspect_ratio = V1_V2_RATIO_MAP.get(aspect_ratio, None)
        model = "V_1_TURBO" if turbo else "V_1"

+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        operation = SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/ideogram/generate",
@@ -364,10 +357,10 @@ class IdeogramV1(ComfyNodeABC):
                    negative_prompt=negative_prompt if negative_prompt else None,
                )
            ),
-            auth_kwargs=kwargs,
+            auth_kwargs=auth,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -377,93 +370,85 @@ class IdeogramV1(ComfyNodeABC):
        if not image_urls:
            raise Exception("No image URLs were generated in the response")

-        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        display_image_urls_on_node(image_urls, cls.hidden.unique_id)
+        return comfy_io.NodeOutput(await download_and_process_images(image_urls))


-class IdeogramV2(ComfyNodeABC):
-    """
-    Generates images using the Ideogram V2 model.
-    """
-
-    def __init__(self):
-        pass
+class IdeogramV2(comfy_io.ComfyNode):

    @classmethod
-    def INPUT_TYPES(cls) -> InputTypeDict:
-        return {
-            "required": {
-                "prompt": (
-                    IO.STRING,
-                    {
-                        "multiline": True,
-                        "default": "",
-                        "tooltip": "Prompt for the image generation",
-                    },
+    def define_schema(cls):
+        return comfy_io.Schema(
+            node_id="IdeogramV2",
+            display_name="Ideogram V2",
+            category="api node/image/Ideogram",
+            description="Generates images using the Ideogram V2 model.",
+            inputs=[
+                comfy_io.String.Input(
+                    "prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Prompt for the image generation",
                ),
-                "turbo": (
-                    IO.BOOLEAN,
-                    {
-                        "default": False,
-                        "tooltip": "Whether to use turbo mode (faster generation, potentially lower quality)",
-                    }
+                comfy_io.Boolean.Input(
+                    "turbo",
+                    default=False,
+                    tooltip="Whether to use turbo mode (faster generation, potentially lower quality)",
                ),
-            },
-            "optional": {
-                "aspect_ratio": (
-                    IO.COMBO,
-                    {
-                        "options": list(V1_V2_RATIO_MAP.keys()),
-                        "default": "1:1",
-                        "tooltip": "The aspect ratio for image generation. Ignored if resolution is not set to AUTO.",
-                    },
+                comfy_io.Combo.Input(
+                    "aspect_ratio",
+                    options=list(V1_V2_RATIO_MAP.keys()),
+                    default="1:1",
+                    tooltip="The aspect ratio for image generation. Ignored if resolution is not set to AUTO.",
+                    optional=True,
                ),
-                "resolution": (
-                    IO.COMBO,
-                    {
-                        "options": list(V1_V1_RES_MAP.keys()),
-                        "default": "Auto",
-                        "tooltip": "The resolution for image generation. If not set to AUTO, this overrides the aspect_ratio setting.",
-                    },
+                comfy_io.Combo.Input(
+                    "resolution",
+                    options=list(V1_V1_RES_MAP.keys()),
+                    default="Auto",
+                    tooltip="The resolution for image generation. "
+                            "If not set to AUTO, this overrides the aspect_ratio setting.",
+                    optional=True,
                ),
-                "magic_prompt_option": (
-                    IO.COMBO,
-                    {
-                        "options": ["AUTO", "ON", "OFF"],
-                        "default": "AUTO",
-                        "tooltip": "Determine if MagicPrompt should be used in generation",
-                    },
+                comfy_io.Combo.Input(
+                    "magic_prompt_option",
+                    options=["AUTO", "ON", "OFF"],
+                    default="AUTO",
+                    tooltip="Determine if MagicPrompt should be used in generation",
+                    optional=True,
                ),
-                "seed": (
-                    IO.INT,
-                    {
-                        "default": 0,
-                        "min": 0,
-                        "max": 2147483647,
-                        "step": 1,
-                        "control_after_generate": True,
-                        "display": "number",
-                    },
+                comfy_io.Int.Input(
+                    "seed",
+                    default=0,
+                    min=0,
+                    max=2147483647,
+                    step=1,
+                    control_after_generate=True,
+                    display_mode=comfy_io.NumberDisplay.number,
+                    optional=True,
                ),
-                "style_type": (
-                    IO.COMBO,
-                    {
-                        "options": ["AUTO", "GENERAL", "REALISTIC", "DESIGN", "RENDER_3D", "ANIME"],
-                        "default": "NONE",
-                        "tooltip": "Style type for generation (V2 only)",
-                    },
+                comfy_io.Combo.Input(
+                    "style_type",
+                    options=["AUTO", "GENERAL", "REALISTIC", "DESIGN", "RENDER_3D", "ANIME"],
+                    default="NONE",
+                    tooltip="Style type for generation (V2 only)",
+                    optional=True,
                ),
-                "negative_prompt": (
-                    IO.STRING,
-                    {
-                        "multiline": True,
-                        "default": "",
-                        "tooltip": "Description of what to exclude from the image",
-                    },
+                comfy_io.String.Input(
+                    "negative_prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Description of what to exclude from the image",
+                    optional=True,
                ),
-                "num_images": (
-                    IO.INT,
-                    {"default": 1, "min": 1, "max": 8, "step": 1, "display": "number"},
+                comfy_io.Int.Input(
+                    "num_images",
+                    default=1,
+                    min=1,
+                    max=8,
+                    step=1,
+                    display_mode=comfy_io.NumberDisplay.number,
+                    optional=True,
                ),
                #"color_palette": (
                #    IO.STRING,
@@ -473,22 +458,20 @@ class IdeogramV2(ComfyNodeABC):
                #        "tooltip": "Color palette preset name or hex colors with weights",
                #    },
                #),
-            },
-            "hidden": {
-                "auth_token": "AUTH_TOKEN_COMFY_ORG",
-                "comfy_api_key": "API_KEY_COMFY_ORG",
-                "unique_id": "UNIQUE_ID",
-            },
-        }
+            ],
+            outputs=[
+                comfy_io.Image.Output(),
+            ],
+            hidden=[
+                comfy_io.Hidden.auth_token_comfy_org,
+                comfy_io.Hidden.api_key_comfy_org,
+                comfy_io.Hidden.unique_id,
+            ],
+        )

-    RETURN_TYPES = (IO.IMAGE,)
-    FUNCTION = "api_call"
-    CATEGORY = "api node/image/Ideogram"
-    DESCRIPTION = cleandoc(__doc__ or "")
-    API_NODE = True
-
-    def api_call(
-        self,
+    @classmethod
+    async def execute(
+        cls,
        prompt,
        turbo=False,
        aspect_ratio="1:1",
@@ -499,8 +482,6 @@ class IdeogramV2(ComfyNodeABC):
        negative_prompt="",
        num_images=1,
        color_palette="",
-        unique_id=None,
-        **kwargs,
    ):
        aspect_ratio = V1_V2_RATIO_MAP.get(aspect_ratio, None)
        resolution = V1_V1_RES_MAP.get(resolution, None)
@@ -517,6 +498,10 @@ class IdeogramV2(ComfyNodeABC):
        else:
            final_aspect_ratio = aspect_ratio if aspect_ratio != "ASPECT_1_1" else None

+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        operation = SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/ideogram/generate",
@@ -540,10 +525,10 @@ class IdeogramV2(ComfyNodeABC):
                    color_palette=color_palette if color_palette else None,
                )
            ),
-            auth_kwargs=kwargs,
+            auth_kwargs=auth,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -553,108 +538,99 @@ class IdeogramV2(ComfyNodeABC):
        if not image_urls:
            raise Exception("No image URLs were generated in the response")

-        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        display_image_urls_on_node(image_urls, cls.hidden.unique_id)
+        return comfy_io.NodeOutput(await download_and_process_images(image_urls))

-class IdeogramV3(ComfyNodeABC):
-    """
-    Generates images using the Ideogram V3 model. Supports both regular image generation from text prompts and image editing with mask.
-    """

-    def __init__(self):
-        pass
+class IdeogramV3(comfy_io.ComfyNode):

    @classmethod
-    def INPUT_TYPES(cls) -> InputTypeDict:
-        return {
-            "required": {
-                "prompt": (
-                    IO.STRING,
-                    {
-                        "multiline": True,
-                        "default": "",
-                        "tooltip": "Prompt for the image generation or editing",
-                    },
+    def define_schema(cls):
+        return comfy_io.Schema(
+            node_id="IdeogramV3",
+            display_name="Ideogram V3",
+            category="api node/image/Ideogram",
+            description="Generates images using the Ideogram V3 model. "
+                        "Supports both regular image generation from text prompts and image editing with mask.",
+            inputs=[
+                comfy_io.String.Input(
+                    "prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Prompt for the image generation or editing",
                ),
-            },
-            "optional": {
-                "image": (
-                    IO.IMAGE,
-                    {
-                        "default": None,
-                        "tooltip": "Optional reference image for image editing.",
-                    },
+                comfy_io.Image.Input(
+                    "image",
+                    tooltip="Optional reference image for image editing.",
+                    optional=True,
                ),
-                "mask": (
-                    IO.MASK,
-                    {
-                        "default": None,
-                        "tooltip": "Optional mask for inpainting (white areas will be replaced)",
-                    },
+                comfy_io.Mask.Input(
+                    "mask",
+                    tooltip="Optional mask for inpainting (white areas will be replaced)",
+                    optional=True,
                ),
-                "aspect_ratio": (
-                    IO.COMBO,
-                    {
-                        "options": list(V3_RATIO_MAP.keys()),
-                        "default": "1:1",
-                        "tooltip": "The aspect ratio for image generation. Ignored if resolution is not set to Auto.",
-                    },
+                comfy_io.Combo.Input(
+                    "aspect_ratio",
+                    options=list(V3_RATIO_MAP.keys()),
+                    default="1:1",
+                    tooltip="The aspect ratio for image generation. Ignored if resolution is not set to Auto.",
+                    optional=True,
                ),
-                "resolution": (
-                    IO.COMBO,
-                    {
-                        "options": V3_RESOLUTIONS,
-                        "default": "Auto",
-                        "tooltip": "The resolution for image generation. If not set to Auto, this overrides the aspect_ratio setting.",
-                    },
+                comfy_io.Combo.Input(
+                    "resolution",
+                    options=V3_RESOLUTIONS,
+                    default="Auto",
+                    tooltip="The resolution for image generation. "
+                            "If not set to Auto, this overrides the aspect_ratio setting.",
+                    optional=True,
                ),
-                "magic_prompt_option": (
-                    IO.COMBO,
-                    {
-                        "options": ["AUTO", "ON", "OFF"],
-                        "default": "AUTO",
-                        "tooltip": "Determine if MagicPrompt should be used in generation",
-                    },
+                comfy_io.Combo.Input(
+                    "magic_prompt_option",
+                    options=["AUTO", "ON", "OFF"],
+                    default="AUTO",
+                    tooltip="Determine if MagicPrompt should be used in generation",
+                    optional=True,
                ),
-                "seed": (
-                    IO.INT,
-                    {
-                        "default": 0,
-                        "min": 0,
-                        "max": 2147483647,
-                        "step": 1,
-                        "control_after_generate": True,
-                        "display": "number",
-                    },
+                comfy_io.Int.Input(
+                    "seed",
+                    default=0,
+                    min=0,
+                    max=2147483647,
+                    step=1,
+                    control_after_generate=True,
+                    display_mode=comfy_io.NumberDisplay.number,
+                    optional=True,
                ),
-                "num_images": (
-                    IO.INT,
-                    {"default": 1, "min": 1, "max": 8, "step": 1, "display": "number"},
+                comfy_io.Int.Input(
+                    "num_images",
+                    default=1,
+                    min=1,
+                    max=8,
+                    step=1,
+                    display_mode=comfy_io.NumberDisplay.number,
+                    optional=True,
                ),
-                "rendering_speed": (
-                    IO.COMBO,
-                    {
-                        "options": ["BALANCED", "TURBO", "QUALITY"],
-                        "default": "BALANCED",
-                        "tooltip": "Controls the trade-off between generation speed and quality",
-                    },
+                comfy_io.Combo.Input(
+                    "rendering_speed",
+                    options=["BALANCED", "TURBO", "QUALITY"],
+                    default="BALANCED",
+                    tooltip="Controls the trade-off between generation speed and quality",
+                    optional=True,
                ),
-            },
-            "hidden": {
-                "auth_token": "AUTH_TOKEN_COMFY_ORG",
-                "comfy_api_key": "API_KEY_COMFY_ORG",
-                "unique_id": "UNIQUE_ID",
-            },
-        }
+            ],
+            outputs=[
+                comfy_io.Image.Output(),
+            ],
+            hidden=[
+                comfy_io.Hidden.auth_token_comfy_org,
+                comfy_io.Hidden.api_key_comfy_org,
+                comfy_io.Hidden.unique_id,
+            ],
+        )

-    RETURN_TYPES = (IO.IMAGE,)
-    FUNCTION = "api_call"
-    CATEGORY = "api node/image/Ideogram"
-    DESCRIPTION = cleandoc(__doc__ or "")
-    API_NODE = True
-
-    def api_call(
-        self,
+    @classmethod
+    async def execute(
+        cls,
        prompt,
        image=None,
        mask=None,
@@ -664,9 +640,11 @@ class IdeogramV3(ComfyNodeABC):
        seed=0,
        num_images=1,
        rendering_speed="BALANCED",
-        unique_id=None,
-        **kwargs,
    ):
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        # Check if both image and mask are provided for editing mode
        if image is not None and mask is not None:
            # Edit mode
@@ -686,7 +664,7 @@ class IdeogramV3(ComfyNodeABC):
            # Process image
            img_np = (input_tensor.numpy() * 255).astype(np.uint8)
            img = Image.fromarray(img_np)
-            img_byte_arr = io.BytesIO()
+            img_byte_arr = BytesIO()
            img.save(img_byte_arr, format="PNG")
            img_byte_arr.seek(0)
            img_binary = img_byte_arr
@@ -695,7 +673,7 @@ class IdeogramV3(ComfyNodeABC):
            # Process mask - white areas will be replaced
            mask_np = (mask.squeeze().cpu().numpy() * 255).astype(np.uint8)
            mask_img = Image.fromarray(mask_np)
-            mask_byte_arr = io.BytesIO()
+            mask_byte_arr = BytesIO()
            mask_img.save(mask_byte_arr, format="PNG")
            mask_byte_arr.seek(0)
            mask_binary = mask_byte_arr
@@ -729,7 +707,7 @@ class IdeogramV3(ComfyNodeABC):
                    "mask": mask_binary,
                },
                content_type="multipart/form-data",
-                auth_kwargs=kwargs,
+                auth_kwargs=auth,
            )

        elif image is not None or mask is not None:
@@ -770,11 +748,11 @@ class IdeogramV3(ComfyNodeABC):
                    response_model=IdeogramGenerateResponse,
                ),
                request=gen_request,
-                auth_kwargs=kwargs,
+                auth_kwargs=auth,
            )

        # Execute the operation and process response
-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -784,18 +762,18 @@ class IdeogramV3(ComfyNodeABC):
        if not image_urls:
            raise Exception("No image URLs were generated in the response")

-        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        display_image_urls_on_node(image_urls, cls.hidden.unique_id)
+        return comfy_io.NodeOutput(await download_and_process_images(image_urls))


-NODE_CLASS_MAPPINGS = {
-    "IdeogramV1": IdeogramV1,
-    "IdeogramV2": IdeogramV2,
-    "IdeogramV3": IdeogramV3,
-}
+class IdeogramExtension(ComfyExtension):
+    @override
+    async def get_node_list(self) -> list[type[comfy_io.ComfyNode]]:
+        return [
+            IdeogramV1,
+            IdeogramV2,
+            IdeogramV3,
+        ]

-NODE_DISPLAY_NAME_MAPPINGS = {
-    "IdeogramV1": "Ideogram V1",
-    "IdeogramV2": "Ideogram V2",
-    "IdeogramV3": "Ideogram V3",
-}
+async def comfy_entrypoint() -> IdeogramExtension:
+    return IdeogramExtension()
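Note on the Ideogram hunks above: `execute` is now an async classmethod that reads auth values from `cls.hidden` instead of `**kwargs`, and image URLs are downloaded by awaiting each one in turn. The sketch below imitates that flow with stand-ins only. `HiddenValues`, `SketchNode` and `fake_download` are hypothetical names, not part of `comfy_api`, and the real hidden-value holder may be populated differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class HiddenValues:
    """Stand-in for the hidden-input holder; not the real comfy_api type."""
    auth_token_comfy_org: Optional[str] = None
    api_key_comfy_org: Optional[str] = None
    unique_id: Optional[str] = None


async def fake_download(url: str) -> bytes:
    """Stand-in for an async download helper; returns the URL as bytes."""
    await asyncio.sleep(0)
    return url.encode()


class SketchNode:
    # Assumption: the framework sets `hidden` before execute() is called.
    hidden = HiddenValues(auth_token_comfy_org="token", unique_id="7")

    @classmethod
    async def execute(cls, image_urls: list[str]):
        # Same key names as the auth dict in the diff.
        auth = {
            "auth_token": cls.hidden.auth_token_comfy_org,
            "comfy_api_key": cls.hidden.api_key_comfy_org,
        }
        images = []
        for url in image_urls:
            # Sequential awaits, matching the loop in download_and_process_images.
            images.append(await fake_download(url))
        return auth, cls.hidden.unique_id, images


result = asyncio.run(SketchNode.execute(["u1", "u2"]))
```

The downloads stay sequential here, as in the diff; running them concurrently would be a separate design choice that this change does not make.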
--- a/comfy_api_nodes/nodes_kling.py
+++ b/comfy_api_nodes/nodes_kling.py
@@ -109,7 +109,7 @@ class KlingApiError(Exception):
    pass


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, R],
    result_url_extractor: Optional[Callable[[R], str]] = None,
@@ -117,7 +117,7 @@ def poll_until_finished(
    node_id: Optional[str] = None,
 ) -> R:
    """Polls the Kling API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            KlingTaskStatus.succeed.value,
@@ -278,18 +278,18 @@ def get_images_urls_from_response(response) -> Optional[str]:
        return None


-def video_result_to_node_output(
+async def video_result_to_node_output(
    video: KlingVideoResult,
 ) -> tuple[VideoFromFile, str, str]:
    """Converts a KlingVideoResult to a tuple of (VideoFromFile, str, str) to be used as a ComfyUI node output."""
    return (
-        download_url_to_video_output(video.url),
+        await download_url_to_video_output(str(video.url)),
        str(video.id),
        str(video.duration),
    )


-def image_result_to_node_output(
+async def image_result_to_node_output(
    images: list[KlingImageResult],
 ) -> torch.Tensor:
    """
@@ -297,9 +297,9 @@ def image_result_to_node_output(
    If multiple images are returned, they will be stacked along the batch dimension.
    """
    if len(images) == 1:
-        return download_url_to_image_tensor(images[0].url)
+        return await download_url_to_image_tensor(str(images[0].url))
    else:
-        return torch.cat([download_url_to_image_tensor(image.url) for image in images])
+        return torch.cat([await download_url_to_image_tensor(str(image.url)) for image in images])


 class KlingNodeBase(ComfyNodeABC):
@@ -421,6 +421,8 @@ class KlingTextToVideoNode(KlingNodeBase):
            "pro mode / 10s duration / kling-v2-master": ("pro", "10", "kling-v2-master"),
            "standard mode / 5s duration / kling-v2-master": ("std", "5", "kling-v2-master"),
            "standard mode / 10s duration / kling-v2-master": ("std", "10", "kling-v2-master"),
+            "pro mode / 5s duration / kling-v2-1-master": ("pro", "5", "kling-v2-1-master"),
+            "pro mode / 10s duration / kling-v2-1-master": ("pro", "10", "kling-v2-1-master"),
        }

    @classmethod
@@ -467,10 +469,10 @@ class KlingTextToVideoNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Text to Video Node"

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingText2VideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_TEXT_TO_VIDEO}/{task_id}",
@@ -483,7 +485,7 @@ class KlingTextToVideoNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -519,17 +521,17 @@ class KlingTextToVideoNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)

        task_id = task_creation_response.data.task_id
-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingCameraControlT2VNode(KlingTextToVideoNode):
@@ -581,7 +583,7 @@ class KlingCameraControlT2VNode(KlingTextToVideoNode):

    DESCRIPTION = "Transform text into cinematic videos with professional camera movements that simulate real-world cinematography. Control virtual camera actions including zoom, rotation, pan, tilt, and first-person view, while maintaining focus on your original text."

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -591,7 +593,7 @@ class KlingCameraControlT2VNode(KlingTextToVideoNode):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            model_name=KlingVideoGenModelName.kling_v1,
            cfg_scale=cfg_scale,
            mode=KlingVideoGenMode.std,
@@ -670,10 +672,10 @@ class KlingImage2VideoNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Image to Video Node"

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingImage2VideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_IMAGE_TO_VIDEO}/{task_id}",
@@ -686,7 +688,7 @@ class KlingImage2VideoNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        prompt: str,
@@ -733,17 +735,17 @@ class KlingImage2VideoNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingCameraControlI2VNode(KlingImage2VideoNode):
@@ -798,7 +800,7 @@ class KlingCameraControlI2VNode(KlingImage2VideoNode):

    DESCRIPTION = "Transform still images into cinematic videos with professional camera movements that simulate real-world cinematography. Control virtual camera actions including zoom, rotation, pan, tilt, and first-person view, while maintaining focus on your original image."

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        prompt: str,
@@ -809,7 +811,7 @@ class KlingCameraControlI2VNode(KlingImage2VideoNode):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            model_name=KlingVideoGenModelName.kling_v1_5,
            start_frame=start_frame,
            cfg_scale=cfg_scale,
@@ -897,7 +899,7 @@ class KlingStartEndFrameNode(KlingImage2VideoNode):

    DESCRIPTION = "Generate a video sequence that transitions between your provided start and end images. The node creates all frames in between, producing a smooth transformation from the first frame to the last."

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        end_frame: torch.Tensor,
@@ -912,7 +914,7 @@ class KlingStartEndFrameNode(KlingImage2VideoNode):
        mode, duration, model_name = KlingStartEndFrameNode.get_mode_string_mapping()[
            mode
        ]
-        return super().api_call(
+        return await super().api_call(
            prompt=prompt,
            negative_prompt=negative_prompt,
            model_name=model_name,
@@ -964,10 +966,10 @@ class KlingVideoExtendNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Video Extend Node. Extend videos made by other Kling nodes. The video_id is created by using other Kling Nodes."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVideoExtendResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIDEO_EXTEND}/{task_id}",
@@ -980,7 +982,7 @@ class KlingVideoExtendNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -1006,17 +1008,17 @@ class KlingVideoExtendNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingVideoEffectsBase(KlingNodeBase):
@@ -1025,10 +1027,10 @@ class KlingVideoEffectsBase(KlingNodeBase):
    RETURN_TYPES = ("VIDEO", "STRING", "STRING")
    RETURN_NAMES = ("VIDEO", "video_id", "duration")

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVideoEffectsResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIDEO_EFFECTS}/{task_id}",
@@ -1041,7 +1043,7 @@ class KlingVideoEffectsBase(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        dual_character: bool,
        effect_scene: KlingDualCharacterEffectsScene | KlingSingleImageEffectsScene,
@@ -1084,17 +1086,17 @@ class KlingVideoEffectsBase(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
@@ -1142,7 +1144,7 @@ class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
    RETURN_TYPES = ("VIDEO", "STRING")
    RETURN_NAMES = ("VIDEO", "duration")

-    def api_call(
+    async def api_call(
        self,
        image_left: torch.Tensor,
        image_right: torch.Tensor,
@@ -1153,7 +1155,7 @@ class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        video, _, duration = super().api_call(
+        video, _, duration = await super().api_call(
            dual_character=True,
            effect_scene=effect_scene,
            model_name=model_name,
@@ -1208,7 +1210,7 @@ class KlingSingleImageVideoEffectNode(KlingVideoEffectsBase):

    DESCRIPTION = "Achieve different special effects when generating a video based on the effect_scene."

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        effect_scene: KlingSingleImageEffectsScene,
@@ -1217,7 +1219,7 @@ class KlingSingleImageVideoEffectNode(KlingVideoEffectsBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            dual_character=False,
            effect_scene=effect_scene,
            model_name=model_name,
@@ -1253,11 +1255,11 @@ class KlingLipSyncBase(KlingNodeBase):
                f"Text is too long. Maximum length is {MAX_PROMPT_LENGTH_LIP_SYNC} characters."
            )

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingLipSyncResponse:
        """Polls the Kling API endpoint until the task reaches a terminal state."""
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_LIP_SYNC}/{task_id}",
@@ -1270,7 +1272,7 @@ class KlingLipSyncBase(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        audio: Optional[AudioInput] = None,
@@ -1287,12 +1289,12 @@ class KlingLipSyncBase(KlingNodeBase):
        self.validate_lip_sync_video(video)

        # Upload video to Comfy API and get download URL
-        video_url = upload_video_to_comfyapi(video, auth_kwargs=kwargs)
+        video_url = await upload_video_to_comfyapi(video, auth_kwargs=kwargs)
        logging.info("Uploaded video to Comfy API. URL: %s", video_url)

        # Upload the audio file to Comfy API and get download URL
        if audio:
-            audio_url = upload_audio_to_comfyapi(audio, auth_kwargs=kwargs)
+            audio_url = await upload_audio_to_comfyapi(audio, auth_kwargs=kwargs)
            logging.info("Uploaded audio to Comfy API. URL: %s", audio_url)
        else:
            audio_url = None
@@ -1319,17 +1321,17 @@ class KlingLipSyncBase(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):
@@ -1357,7 +1359,7 @@ class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):

    DESCRIPTION = "Kling Lip Sync Audio to Video Node. Syncs mouth movements in a video file to the audio content of an audio file. When using, ensure that the audio contains clearly distinguishable vocals and that the video contains a distinct face. The audio file should not be larger than 5MB. The video file should not be larger than 100MB, should have height/width between 720px and 1920px, and should be between 2s and 10s in length."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        audio: AudioInput,
@@ -1365,7 +1367,7 @@ class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            video=video,
            audio=audio,
            voice_language=voice_language,
@@ -1469,7 +1471,7 @@ class KlingLipSyncTextToVideoNode(KlingLipSyncBase):

    DESCRIPTION = "Kling Lip Sync Text to Video Node. Syncs mouth movements in a video file to a text prompt. The video file should not be larger than 100MB, should have height/width between 720px and 1920px, and should be between 2s and 10s in length."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        text: str,
@@ -1479,7 +1481,7 @@ class KlingLipSyncTextToVideoNode(KlingLipSyncBase):
        **kwargs,
    ):
        voice_id, voice_language = KlingLipSyncTextToVideoNode.get_voice_config()[voice]
-        return super().api_call(
+        return await super().api_call(
            video=video,
            text=text,
            voice_language=voice_language,
@@ -1533,10 +1535,10 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):

    DESCRIPTION = "Kling Virtual Try On Node. Input a human image and a cloth image to try on the cloth on the human. You can merge multiple clothing item pictures into one image with a white background."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVirtualTryOnResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIRTUAL_TRY_ON}/{task_id}",
@@ -1549,7 +1551,7 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        human_image: torch.Tensor,
        cloth_image: torch.Tensor,
@@ -1572,17 +1574,17 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_image_result_response(final_response)

        images = get_images_from_response(final_response)
-        return (image_result_to_node_output(images),)
+        return (await image_result_to_node_output(images),)


 class KlingImageGenerationNode(KlingImageGenerationBase):
@@ -1655,13 +1657,13 @@ class KlingImageGenerationNode(KlingImageGenerationBase):

    DESCRIPTION = "Kling Image Generation Node. Generate an image from a text prompt with an optional reference image."

-    def get_response(
+    async def get_response(
        self,
        task_id: str,
        auth_kwargs: Optional[dict[str, str]],
        node_id: Optional[str] = None,
    ) -> KlingImageGenerationsResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_IMAGE_GENERATIONS}/{task_id}",
@@ -1674,7 +1676,7 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        model_name: KlingImageGenModelName,
        prompt: str,
@@ -1690,7 +1692,11 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
    ):
        self.validate_prompt(prompt, negative_prompt)

-        if image is not None:
+        if image is None:
+            image_type = None
+        elif model_name == KlingImageGenModelName.kling_v1:
+            raise ValueError(f"The model {KlingImageGenModelName.kling_v1.value} does not support reference images.")
+        else:
            image = tensor_to_base64_string(image)

        initial_operation = SynchronousOperation(
@@ -1714,17 +1720,17 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_image_result_response(final_response)

        images = get_images_from_response(final_response)
-        return (image_result_to_node_output(images),)
+        return (await image_result_to_node_output(images),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_luma.py
+++ b/comfy_api_nodes/nodes_luma.py
@@ -38,7 +38,7 @@ from comfy_api_nodes.apinode_utils import (
 )
 from server import PromptServer

-import requests
+import aiohttp
 import torch
 from io import BytesIO

@@ -217,7 +217,7 @@ class LumaImageGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -234,19 +234,19 @@ class LumaImageGenerationNode(ComfyNodeABC):
        # handle image_luma_ref
        api_image_ref = None
        if image_luma_ref is not None:
-            api_image_ref = self._convert_luma_refs(
+            api_image_ref = await self._convert_luma_refs(
                image_luma_ref, max_refs=4, auth_kwargs=kwargs,
            )
        # handle style_luma_ref
        api_style_ref = None
        if style_image is not None:
-            api_style_ref = self._convert_style_image(
+            api_style_ref = await self._convert_style_image(
                style_image, weight=style_image_weight, auth_kwargs=kwargs,
            )
        # handle character_ref images
        character_ref = None
        if character_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                character_image, max_images=4, auth_kwargs=kwargs,
            )
            character_ref = LumaCharacterRef(
@@ -270,7 +270,7 @@ class LumaImageGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -286,19 +286,20 @@ class LumaImageGenerationNode(ComfyNodeABC):
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        img_response = requests.get(response_poll.assets.image)
-        img = process_image_response(img_response)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
        return (img,)

-    def _convert_luma_refs(
+    async def _convert_luma_refs(
        self, luma_ref: LumaReferenceChain, max_refs: int, auth_kwargs: Optional[dict[str,str]] = None
    ):
        luma_urls = []
        ref_count = 0
        for ref in luma_ref.refs:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                ref.image, max_images=1, auth_kwargs=auth_kwargs
            )
            luma_urls.append(download_urls[0])
@@ -307,13 +308,13 @@ class LumaImageGenerationNode(ComfyNodeABC):
                break
        return luma_ref.create_api_model(download_urls=luma_urls, max_refs=max_refs)

-    def _convert_style_image(
+    async def _convert_style_image(
        self, style_image: torch.Tensor, weight: float, auth_kwargs: Optional[dict[str,str]] = None
    ):
        chain = LumaReferenceChain(
            first_ref=LumaReference(image=style_image, weight=weight)
        )
-        return self._convert_luma_refs(chain, max_refs=1, auth_kwargs=auth_kwargs)
+        return await self._convert_luma_refs(chain, max_refs=1, auth_kwargs=auth_kwargs)


 class LumaImageModifyNode(ComfyNodeABC):
@@ -370,7 +371,7 @@ class LumaImageModifyNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -381,7 +382,7 @@ class LumaImageModifyNode(ComfyNodeABC):
        **kwargs,
    ):
        # first, upload image
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            image, max_images=1, auth_kwargs=kwargs,
        )
        image_url = download_urls[0]
@@ -402,7 +403,7 @@ class LumaImageModifyNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -418,10 +419,11 @@ class LumaImageModifyNode(ComfyNodeABC):
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        img_response = requests.get(response_poll.assets.image)
-        img = process_image_response(img_response)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
        return (img,)


@@ -494,7 +496,7 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -529,7 +531,7 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        if unique_id:
            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", unique_id)
@@ -549,10 +551,11 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            estimated_duration=LUMA_T2V_AVERAGE_DURATION,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.assets.video)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class LumaImageToVideoGenerationNode(ComfyNodeABC):
@@ -626,7 +629,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -644,7 +647,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            raise Exception(
                "At least one of first_image and last_image requires an input."
            )
-        keyframes = self._convert_to_keyframes(first_image, last_image, auth_kwargs=kwargs)
+        keyframes = await self._convert_to_keyframes(first_image, last_image, auth_kwargs=kwargs)
        duration = duration if model != LumaVideoModel.ray_1_6 else None
        resolution = resolution if model != LumaVideoModel.ray_1_6 else None

@@ -667,7 +670,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        if unique_id:
            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", unique_id)
@@ -687,12 +690,13 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            estimated_duration=LUMA_I2V_AVERAGE_DURATION,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.assets.video)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)

-    def _convert_to_keyframes(
+    async def _convert_to_keyframes(
        self,
        first_image: torch.Tensor = None,
        last_image: torch.Tensor = None,
@@ -703,12 +707,12 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
        frame0 = None
        frame1 = None
        if first_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                first_image, max_images=1, auth_kwargs=auth_kwargs,
            )
            frame0 = LumaImageReference(type="image", url=download_urls[0])
        if last_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                last_image, max_images=1, auth_kwargs=auth_kwargs,
            )
            frame1 = LumaImageReference(type="image", url=download_urls[0])
--- a/comfy_api_nodes/nodes_minimax.py
+++ b/comfy_api_nodes/nodes_minimax.py
@@ -1,3 +1,4 @@
+from inspect import cleandoc
 from typing import Union
 import logging
 import torch
@@ -10,7 +11,7 @@ from comfy_api_nodes.apis import (
    MinimaxFileRetrieveResponse,
    MinimaxTaskResultResponse,
    SubjectReferenceItem,
-    Model
+    MiniMaxModel
 )
 from comfy_api_nodes.apis.client import (
    ApiEndpoint,
@@ -84,9 +85,8 @@ class MinimaxTextToVideoNode:
    FUNCTION = "generate_video"
    CATEGORY = "api node/video/MiniMax"
    API_NODE = True
-    OUTPUT_NODE = True

-    def generate_video(
+    async def generate_video(
        self,
        prompt_text,
        seed=0,
@@ -104,12 +104,12 @@ class MinimaxTextToVideoNode:
        # upload image, if passed in
        image_url = None
        if image is not None:
-            image_url = upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs)[0]
+            image_url = (await upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs))[0]

        # TODO: figure out how to deal with subject properly, API returns invalid params when using S2V-01 model
        subject_reference = None
        if subject is not None:
-            subject_url = upload_images_to_comfyapi(subject, max_images=1, auth_kwargs=kwargs)[0]
+            subject_url = (await upload_images_to_comfyapi(subject, max_images=1, auth_kwargs=kwargs))[0]
            subject_reference = [SubjectReferenceItem(image=subject_url)]


@@ -121,7 +121,7 @@ class MinimaxTextToVideoNode:
                response_model=MinimaxVideoGenerationResponse,
            ),
            request=MinimaxVideoGenerationRequest(
-                model=Model(model),
+                model=MiniMaxModel(model),
                prompt=prompt_text,
                callback_url=None,
                first_frame_image=image_url,
@@ -130,7 +130,7 @@ class MinimaxTextToVideoNode:
            ),
            auth_kwargs=kwargs,
        )
-        response = video_generate_operation.execute()
+        response = await video_generate_operation.execute()

        task_id = response.task_id
        if not task_id:
@@ -151,7 +151,7 @@ class MinimaxTextToVideoNode:
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        task_result = video_generate_operation.execute()
+        task_result = await video_generate_operation.execute()

        file_id = task_result.file_id
        if file_id is None:
@@ -167,7 +167,7 @@ class MinimaxTextToVideoNode:
            request=EmptyRequest(),
            auth_kwargs=kwargs,
        )
-        file_result = file_retrieve_operation.execute()
+        file_result = await file_retrieve_operation.execute()

        file_url = file_result.file.download_url
        if file_url is None:
@@ -182,7 +182,7 @@ class MinimaxTextToVideoNode:
                message = f"Result URL: {file_url}"
            PromptServer.instance.send_progress_text(message, unique_id)

-        video_io = download_url_to_bytesio(file_url)
+        video_io = await download_url_to_bytesio(file_url)
        if video_io is None:
            error_msg = f"Failed to download video from {file_url}"
            logging.error(error_msg)
@@ -251,7 +251,6 @@ class MinimaxImageToVideoNode(MinimaxTextToVideoNode):
    FUNCTION = "generate_video"
    CATEGORY = "api node/video/MiniMax"
    API_NODE = True
-    OUTPUT_NODE = True


 class MinimaxSubjectToVideoNode(MinimaxTextToVideoNode):
@@ -313,7 +312,181 @@ class MinimaxSubjectToVideoNode(MinimaxTextToVideoNode):
    FUNCTION = "generate_video"
    CATEGORY = "api node/video/MiniMax"
    API_NODE = True
-    OUTPUT_NODE = True
+
+
+class MinimaxHailuoVideoNode:
+    """Generates videos from prompt, with optional start frame using the new MiniMax Hailuo-02 model."""
+
+    @classmethod
+    def INPUT_TYPES(s):
+        return {
+            "required": {
+                "prompt_text": (
+                    "STRING",
+                    {
+                        "multiline": True,
+                        "default": "",
+                        "tooltip": "Text prompt to guide the video generation.",
+                    },
+                ),
+            },
+            "optional": {
+                "seed": (
+                    IO.INT,
+                    {
+                        "default": 0,
+                        "min": 0,
+                        "max": 0xFFFFFFFFFFFFFFFF,
+                        "control_after_generate": True,
+                        "tooltip": "The random seed used for creating the noise.",
+                    },
+                ),
+                "first_frame_image": (
+                    IO.IMAGE,
+                    {
+                        "tooltip": "Optional image to use as the first frame to generate a video."
+                    },
+                ),
+                "prompt_optimizer": (
+                    IO.BOOLEAN,
+                    {
+                        "tooltip": "Optimize prompt to improve generation quality when needed.",
+                        "default": True,
+                    },
+                ),
+                "duration": (
+                    IO.COMBO,
+                    {
+                        "tooltip": "The length of the output video in seconds.",
+                        "default": 6,
+                        "options": [6, 10],
+                    },
+                ),
+                "resolution": (
+                    IO.COMBO,
+                    {
+                        "tooltip": "The dimensions of the video display. "
+                                   "1080p corresponds to 1920 x 1080 pixels, 768p corresponds to 1366 x 768 pixels.",
+                        "default": "768P",
+                        "options": ["768P", "1080P"],
+                    },
+                ),
+            },
+            "hidden": {
+                "auth_token": "AUTH_TOKEN_COMFY_ORG",
+                "comfy_api_key": "API_KEY_COMFY_ORG",
+                "unique_id": "UNIQUE_ID",
+            },
+        }
+
+    RETURN_TYPES = ("VIDEO",)
+    DESCRIPTION = cleandoc(__doc__ or "")
+    FUNCTION = "generate_video"
+    CATEGORY = "api node/video/MiniMax"
+    API_NODE = True
+
+    async def generate_video(
+        self,
+        prompt_text,
+        seed=0,
+        first_frame_image: torch.Tensor=None, # used for ImageToVideo
+        prompt_optimizer=True,
+        duration=6,
+        resolution="768P",
+        model="MiniMax-Hailuo-02",
+        unique_id: Union[str, None]=None,
+        **kwargs,
+    ):
+        if first_frame_image is None:
+            validate_string(prompt_text, field_name="prompt_text")
+
+        if model == "MiniMax-Hailuo-02" and resolution.upper() == "1080P" and duration != 6:
+            raise Exception(
+                "When model is MiniMax-Hailuo-02 and resolution is 1080P, duration is limited to 6 seconds."
+            )
+
+        # upload image, if passed in
+        image_url = None
+        if first_frame_image is not None:
+            image_url = (await upload_images_to_comfyapi(first_frame_image, max_images=1, auth_kwargs=kwargs))[0]
+
+        video_generate_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/minimax/video_generation",
+                method=HttpMethod.POST,
+                request_model=MinimaxVideoGenerationRequest,
+                response_model=MinimaxVideoGenerationResponse,
+            ),
+            request=MinimaxVideoGenerationRequest(
+                model=MiniMaxModel(model),
+                prompt=prompt_text,
+                callback_url=None,
+                first_frame_image=image_url,
+                prompt_optimizer=prompt_optimizer,
+                duration=duration,
+                resolution=resolution,
+            ),
+            auth_kwargs=kwargs,
+        )
+        response = await video_generate_operation.execute()
+
+        task_id = response.task_id
+        if not task_id:
+            raise Exception(f"MiniMax generation failed: {response.base_resp}")
+
+        average_duration = 120 if resolution == "768P" else 240
+        video_generate_operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path="/proxy/minimax/query/video_generation",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=MinimaxTaskResultResponse,
+                query_params={"task_id": task_id},
+            ),
+            completed_statuses=["Success"],
+            failed_statuses=["Fail"],
+            status_extractor=lambda x: x.status.value,
+            estimated_duration=average_duration,
+            node_id=unique_id,
+            auth_kwargs=kwargs,
+        )
+        task_result = await video_generate_operation.execute()
+
+        file_id = task_result.file_id
+        if file_id is None:
+            raise Exception("Request was not successful. Missing file ID.")
+        file_retrieve_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/minimax/files/retrieve",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=MinimaxFileRetrieveResponse,
+                query_params={"file_id": int(file_id)},
+            ),
+            request=EmptyRequest(),
+            auth_kwargs=kwargs,
+        )
+        file_result = await file_retrieve_operation.execute()
+
+        file_url = file_result.file.download_url
+        if file_url is None:
+            raise Exception(
+                f"No video was found in the response. Full response: {file_result.model_dump()}"
+            )
+        logging.info(f"Generated video URL: {file_url}")
+        if unique_id:
+            if hasattr(file_result.file, "backup_download_url"):
+                message = f"Result URL: {file_url}\nBackup URL: {file_result.file.backup_download_url}"
+            else:
+                message = f"Result URL: {file_url}"
+            PromptServer.instance.send_progress_text(message, unique_id)
+
+        video_io = await download_url_to_bytesio(file_url)
+        if video_io is None:
+            error_msg = f"Failed to download video from {file_url}"
+            logging.error(error_msg)
+            raise Exception(error_msg)
+        return (VideoFromFile(video_io),)


 # A dictionary that contains all nodes you want to export with their names
@@ -322,6 +495,7 @@ NODE_CLASS_MAPPINGS = {
    "MinimaxTextToVideoNode": MinimaxTextToVideoNode,
    "MinimaxImageToVideoNode": MinimaxImageToVideoNode,
    # "MinimaxSubjectToVideoNode": MinimaxSubjectToVideoNode,
+    "MinimaxHailuoVideoNode": MinimaxHailuoVideoNode,
 }

 # A dictionary that contains the friendly/humanly readable titles for the nodes
@@ -329,4 +503,5 @@ NODE_DISPLAY_NAME_MAPPINGS = {
    "MinimaxTextToVideoNode": "MiniMax Text to Video",
    "MinimaxImageToVideoNode": "MiniMax Image to Video",
    "MinimaxSubjectToVideoNode": "MiniMax Subject to Video",
+    "MinimaxHailuoVideoNode": "MiniMax Hailuo Video",
 }
--- a/comfy_api_nodes/nodes_moonvalley.py
+++ b/comfy_api_nodes/nodes_moonvalley.py
@@ -1,6 +1,5 @@
 import logging
 from typing import Any, Callable, Optional, TypeVar
-import random
 import torch
 from comfy_api_nodes.util.validation_utils import (
    get_image_dimensions,
@@ -95,14 +94,14 @@ def get_video_url_from_response(response) -> Optional[str]:
        return None


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, R],
    result_url_extractor: Optional[Callable[[R], str]] = None,
    node_id: Optional[str] = None,
 ) -> R:
    """Polls the Moonvalley API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            "completed",
@@ -208,20 +207,29 @@ def _get_video_dimensions(video: VideoInput) -> tuple[int, int]:
 def _validate_video_dimensions(width: int, height: int) -> None:
    """Validates video dimensions meet Moonvalley V2V requirements."""
    supported_resolutions = {
-        (1920, 1080), (1080, 1920), (1152, 1152),
-        (1536, 1152), (1152, 1536)
+        (1920, 1080),
+        (1080, 1920),
+        (1152, 1152),
+        (1536, 1152),
+        (1152, 1536),
    }

    if (width, height) not in supported_resolutions:
-        supported_list = ', '.join([f'{w}x{h}' for w, h in sorted(supported_resolutions)])
-        raise ValueError(f"Resolution {width}x{height} not supported. Supported: {supported_list}")
+        supported_list = ", ".join(
+            [f"{w}x{h}" for w, h in sorted(supported_resolutions)]
+        )
+        raise ValueError(
+            f"Resolution {width}x{height} not supported. Supported: {supported_list}"
+        )


 def _validate_container_format(video: VideoInput) -> None:
    """Validates video container format is MP4."""
    container_format = video.get_container_format()
-    if container_format not in ['mp4', 'mov,mp4,m4a,3gp,3g2,mj2']:
-        raise ValueError(f"Only MP4 container format supported. Got: {container_format}")
+    if container_format not in ["mp4", "mov,mp4,m4a,3gp,3g2,mj2"]:
+        raise ValueError(
+            f"Only MP4 container format supported. Got: {container_format}"
+        )


 def _validate_and_trim_duration(video: VideoInput) -> VideoInput:
@@ -244,7 +252,6 @@ def _trim_if_too_long(video: VideoInput, duration: float) -> VideoInput:
    return video


-
 def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
    """
    Returns a new VideoInput object trimmed from the beginning to the specified duration,
@@ -302,7 +309,9 @@ def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
        # Calculate target frame count that's divisible by 16
        fps = input_container.streams.video[0].average_rate
        estimated_frames = int(duration_sec * fps)
-        target_frames = (estimated_frames // 16) * 16  # Round down to nearest multiple of 16
+        target_frames = (
+            estimated_frames // 16
+        ) * 16  # Round down to nearest multiple of 16

        if target_frames == 0:
            raise ValueError("Video too short: need at least 16 frames for Moonvalley")
@@ -394,10 +403,10 @@ class BaseMoonvalleyVideoNode:
        else:
            return control_map["Motion Transfer"]

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> MoonvalleyPromptResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{API_PROMPTS_ENDPOINT}/{task_id}",
@@ -424,7 +433,7 @@ class BaseMoonvalleyVideoNode:
                    MoonvalleyTextToVideoInferenceParams,
                    "negative_prompt",
                    multiline=True,
-                    default="low-poly, flat shader, bad rigging, stiff animation, uncanny eyes, low-quality textures, looping glitch, cheap effect, overbloom, bloom spam, default lighting, game asset, stiff face, ugly specular, AI artifacts",
+                    default="<synthetic> <scene cut> gopro, bright, contrast, static, overexposed, vignette, artifacts, still, noise, texture, scanlines, videogame, 360 camera, VR, transition, flare, saturation, distorted, warped, wide angle, saturated, vibrant, glowing, cross dissolve, cheesy, ugly hands, mutated hands, mutant, disfigured, extra fingers, blown out, horrible, blurry, worst quality, bad, dissolve, melt, fade in, fade out, wobbly, weird, low quality, plastic, stock footage, video camera, boring",
                ),
                "resolution": (
                    IO.COMBO,
@@ -441,12 +450,11 @@ class BaseMoonvalleyVideoNode:
                        "tooltip": "Resolution of the output video",
                    },
                ),
-                # "length": (IO.COMBO,{"options":['5s','10s'], "default": '5s'}),
                "prompt_adherence": model_field_to_node_input(
                    IO.FLOAT,
                    MoonvalleyTextToVideoInferenceParams,
                    "guidance_scale",
-                    default=7.0,
+                    default=10.0,
                    step=1,
                    min=1,
                    max=20,
@@ -455,13 +463,12 @@ class BaseMoonvalleyVideoNode:
                    IO.INT,
                    MoonvalleyTextToVideoInferenceParams,
                    "seed",
-                    default=random.randint(0, 2**32 - 1),
+                    default=9,
                    min=0,
                    max=4294967295,
                    step=1,
                    display="number",
                    tooltip="Random seed value",
-                    control_after_generate=True,
                ),
                "steps": model_field_to_node_input(
                    IO.INT,
@@ -507,7 +514,7 @@ class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):
    RETURN_NAMES = ("video",)
    DESCRIPTION = "Moonvalley Marey Image to Video Node"

-    def generate(
+    async def generate(
        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
    ):
        image = kwargs.get("image", None)
@@ -532,8 +539,10 @@ class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):
        # Get MIME type from tensor - assuming PNG format for image tensors
        mime_type = "image/png"

-        image_url = upload_images_to_comfyapi(
-            image, max_images=1, auth_kwargs=kwargs, mime_type=mime_type
+        image_url = (
+            await upload_images_to_comfyapi(
+                image, max_images=1, auth_kwargs=kwargs, mime_type=mime_type
+            )
        )[0]

        request = MoonvalleyTextToVideoRequest(
@@ -549,14 +558,14 @@ class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
-        video = download_url_to_video_output(final_response.output_url)
+        video = await download_url_to_video_output(final_response.output_url)
        return (video,)


@@ -570,17 +579,39 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
        return {
            "required": {
                "prompt": model_field_to_node_input(
-                    IO.STRING, MoonvalleyVideoToVideoRequest, "prompt_text",
-                    multiline=True
+                    IO.STRING,
+                    MoonvalleyVideoToVideoRequest,
+                    "prompt_text",
+                    multiline=True,
                ),
                "negative_prompt": model_field_to_node_input(
                    IO.STRING,
                    MoonvalleyVideoToVideoInferenceParams,
                    "negative_prompt",
                    multiline=True,
-                    default="low-poly, flat shader, bad rigging, stiff animation, uncanny eyes, low-quality textures, looping glitch, cheap effect, overbloom, bloom spam, default lighting, game asset, stiff face, ugly specular, AI artifacts"
+                    default="<synthetic> <scene cut> gopro, bright, contrast, static, overexposed, vignette, artifacts, still, noise, texture, scanlines, videogame, 360 camera, VR, transition, flare, saturation, distorted, warped, wide angle, saturated, vibrant, glowing, cross dissolve, cheesy, ugly hands, mutated hands, mutant, disfigured, extra fingers, blown out, horrible, blurry, worst quality, bad, dissolve, melt, fade in, fade out, wobbly, weird, low quality, plastic, stock footage, video camera, boring",
+                ),
+                "seed": model_field_to_node_input(
+                    IO.INT,
+                    MoonvalleyVideoToVideoInferenceParams,
+                    "seed",
+                    default=9,
+                    min=0,
+                    max=4294967295,
+                    step=1,
+                    display="number",
+                    tooltip="Random seed value",
+                    control_after_generate=False,
+                ),
+                "prompt_adherence": model_field_to_node_input(
+                    IO.FLOAT,
+                    MoonvalleyVideoToVideoInferenceParams,
+                    "guidance_scale",
+                    default=10.0,
+                    step=1,
+                    min=1,
+                    max=20,
                ),
-                "seed": model_field_to_node_input(IO.INT,MoonvalleyVideoToVideoInferenceParams, "seed", default=random.randint(0, 2**32 - 1), min=0, max=4294967295, step=1, display="number", tooltip="Random seed value", control_after_generate=True),
            },
            "hidden": {
                "auth_token": "AUTH_TOKEN_COMFY_ORG",
@@ -588,7 +619,14 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
                "unique_id": "UNIQUE_ID",
            },
            "optional": {
-                "video": (IO.VIDEO, {"default": "", "multiline": False, "tooltip": "The reference video used to generate the output video. Must be at least 5 seconds long. Videos longer than 5s will be automatically trimmed. Only MP4 format supported."}),
+                "video": (
+                    IO.VIDEO,
+                    {
+                        "default": "",
+                        "multiline": False,
+                        "tooltip": "The reference video used to generate the output video. Must be at least 5 seconds long. Videos longer than 5s will be automatically trimmed. Only MP4 format supported.",
+                    },
+                ),
                "control_type": (
                    ["Motion Transfer", "Pose Transfer"],
                    {"default": "Motion Transfer"},
@@ -602,17 +640,24 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
                        "max": 100,
                        "tooltip": "Only used if control_type is 'Motion Transfer'",
                    },
-                )
-            }
+                ),
+                "image": model_field_to_node_input(
+                    IO.IMAGE,
+                    MoonvalleyTextToVideoRequest,
+                    "image_url",
+                    tooltip="The reference image used to generate the video",
+                ),
+            },
        }

    RETURN_TYPES = ("VIDEO",)
    RETURN_NAMES = ("video",)

-    def generate(
+    async def generate(
        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
    ):
        video = kwargs.get("video")
+        image = kwargs.get("image", None)

        if not video:
            raise MoonvalleyApiError("video is required")
@@ -620,8 +665,16 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
        video_url = ""
        if video:
            validated_video = validate_video_to_video_input(video)
-            video_url = upload_video_to_comfyapi(validated_video, auth_kwargs=kwargs)
+            video_url = await upload_video_to_comfyapi(
+                validated_video, auth_kwargs=kwargs
+            )
+        mime_type = "image/png"

+        if image is not None:
+            validate_input_image(image, with_frame_conditioning=True)
+            image_url = (await upload_images_to_comfyapi(
+                image=image, auth_kwargs=kwargs, max_images=1, mime_type=mime_type
+            ))[0]  # the upload helper returns a list of URLs; keep the first, as the other nodes do
        control_type = kwargs.get("control_type")
        motion_intensity = kwargs.get("motion_intensity")

@@ -631,12 +684,12 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
        # Only include motion_intensity for Motion Transfer
        control_params = {}
        if control_type == "Motion Transfer" and motion_intensity is not None:
-            control_params['motion_intensity'] = motion_intensity
+            control_params["motion_intensity"] = motion_intensity

-        inference_params=MoonvalleyVideoToVideoInferenceParams(
+        inference_params = MoonvalleyVideoToVideoInferenceParams(
            negative_prompt=negative_prompt,
            seed=kwargs.get("seed"),
-            control_params=control_params
+            control_params=control_params,
        )

        control = self.parseControlParameter(control_type)
@@ -647,6 +700,7 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
            prompt_text=prompt,
            inference_params=inference_params,
        )
+        request.image_url = image_url if image is not None else None

        initial_operation = SynchronousOperation(
            endpoint=ApiEndpoint(
@@ -658,15 +712,15 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )

-        video = download_url_to_video_output(final_response.output_url)
+        video = await download_url_to_video_output(final_response.output_url)

        return (video,)

@@ -688,21 +742,21 @@ class MoonvalleyTxt2VideoNode(BaseMoonvalleyVideoNode):
                del input_types["optional"][param]
        return input_types

-    def generate(
+    async def generate(
        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
    ):
        validate_prompts(prompt, negative_prompt, MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
        width_height = self.parseWidthHeightFromRes(kwargs.get("resolution"))

-        inference_params=MoonvalleyTextToVideoInferenceParams(
-                    negative_prompt=negative_prompt,
-                    steps=kwargs.get("steps"),
-                    seed=kwargs.get("seed"),
-                    guidance_scale=kwargs.get("prompt_adherence"),
-                    num_frames=128,
-                    width=width_height.get("width"),
-                    height=width_height.get("height"),
-                )
+        inference_params = MoonvalleyTextToVideoInferenceParams(
+            negative_prompt=negative_prompt,
+            steps=kwargs.get("steps"),
+            seed=kwargs.get("seed"),
+            guidance_scale=kwargs.get("prompt_adherence"),
+            num_frames=128,
+            width=width_height.get("width"),
+            height=width_height.get("height"),
+        )
        request = MoonvalleyTextToVideoRequest(
            prompt_text=prompt, inference_params=inference_params
        )
@@ -717,15 +771,15 @@ class MoonvalleyTxt2VideoNode(BaseMoonvalleyVideoNode):
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )

-        video = download_url_to_video_output(final_response.output_url)
+        video = await download_url_to_video_output(final_response.output_url)
        return (video,)


--- a/comfy_api_nodes/nodes_openai.py
+++ b/comfy_api_nodes/nodes_openai.py
@@ -80,6 +80,9 @@ class SupportedOpenAIModel(str, Enum):
    gpt_4_1 = "gpt-4.1"
    gpt_4_1_mini = "gpt-4.1-mini"
    gpt_4_1_nano = "gpt-4.1-nano"
+    gpt_5 = "gpt-5"
+    gpt_5_mini = "gpt-5-mini"
+    gpt_5_nano = "gpt-5-nano"


 class OpenAIDalle2(ComfyNodeABC):
@@ -163,7 +166,7 @@ class OpenAIDalle2(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -233,9 +236,9 @@ class OpenAIDalle2(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -311,7 +314,7 @@ class OpenAIDalle3(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -343,9 +346,9 @@ class OpenAIDalle3(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -446,7 +449,7 @@ class OpenAIGPTImage1(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -464,8 +467,6 @@ class OpenAIGPTImage1(ComfyNodeABC):
        path = "/proxy/openai/images/generations"
        content_type = "application/json"
        request_class = OpenAIImageGenerationRequest
-        img_binaries = []
-        mask_binary = None
        files = []

        if image is not None:
@@ -484,14 +485,11 @@ class OpenAIGPTImage1(ComfyNodeABC):
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format="PNG")
                img_byte_arr.seek(0)
-                img_binary = img_byte_arr
-                img_binary.name = f"image_{i}.png"

-                img_binaries.append(img_binary)
                if batch_size == 1:
-                    files.append(("image", img_binary))
+                    files.append(("image", (f"image_{i}.png", img_byte_arr, "image/png")))
                else:
-                    files.append(("image[]", img_binary))
+                    files.append(("image[]", (f"image_{i}.png", img_byte_arr, "image/png")))

        if mask is not None:
            if image is None:
@@ -511,9 +509,7 @@ class OpenAIGPTImage1(ComfyNodeABC):
            mask_img_byte_arr = io.BytesIO()
            mask_img.save(mask_img_byte_arr, format="PNG")
            mask_img_byte_arr.seek(0)
-            mask_binary = mask_img_byte_arr
-            mask_binary.name = "mask.png"
-            files.append(("mask", mask_binary))
+            files.append(("mask", ("mask.png", mask_img_byte_arr, "image/png")))

        # Build the operation
        operation = SynchronousOperation(
@@ -537,9 +533,9 @@ class OpenAIGPTImage1(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -623,7 +619,7 @@ class OpenAIChatNode(OpenAITextNode):

    DESCRIPTION = "Generate text responses from an OpenAI model."

-    def get_result_response(
+    async def get_result_response(
        self,
        response_id: str,
        include: Optional[list[Includable]] = None,
@@ -639,7 +635,7 @@ class OpenAIChatNode(OpenAITextNode):
                creation above for more information.

        """
-        return PollingOperation(
+        return await PollingOperation(
            poll_endpoint=ApiEndpoint(
                path=f"{RESPONSES_ENDPOINT}/{response_id}",
                method=HttpMethod.GET,
@@ -784,7 +780,7 @@ class OpenAIChatNode(OpenAITextNode):

        self.history[session_id] = new_history

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        persist_context: bool,
@@ -815,7 +811,7 @@ class OpenAIChatNode(OpenAITextNode):
            previous_response_id = None

        # Create response
-        create_response = SynchronousOperation(
+        create_response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path=RESPONSES_ENDPOINT,
                method=HttpMethod.POST,
@@ -848,7 +844,7 @@ class OpenAIChatNode(OpenAITextNode):
        response_id = create_response.id

        # Get result output
-        result_response = self.get_result_response(response_id, auth_kwargs=kwargs)
+        result_response = await self.get_result_response(response_id, auth_kwargs=kwargs)
        output_text = self.parse_output_text_from_response(result_response)

        # Update history
@@ -1002,7 +998,7 @@ NODE_DISPLAY_NAME_MAPPINGS = {
    "OpenAIDalle2": "OpenAI DALL·E 2",
    "OpenAIDalle3": "OpenAI DALL·E 3",
    "OpenAIGPTImage1": "OpenAI GPT Image 1",
-    "OpenAIChatNode": "OpenAI Chat",
-    "OpenAIInputFiles": "OpenAI Chat Input Files",
-    "OpenAIChatConfig": "OpenAI Chat Advanced Options",
+    "OpenAIChatNode": "OpenAI ChatGPT",
+    "OpenAIInputFiles": "OpenAI ChatGPT Input Files",
+    "OpenAIChatConfig": "OpenAI ChatGPT Advanced Options",
 }
--- a/comfy_api_nodes/nodes_pika.py
+++ b/comfy_api_nodes/nodes_pika.py
@@ -122,7 +122,7 @@ class PikaNodeBase(ComfyNodeABC):
    FUNCTION = "api_call"
    RETURN_TYPES = ("VIDEO",)

-    def poll_for_task_status(
+    async def poll_for_task_status(
        self,
        task_id: str,
        auth_kwargs: Optional[dict[str, str]] = None,
@@ -152,9 +152,9 @@ class PikaNodeBase(ComfyNodeABC):
            node_id=node_id,
            estimated_duration=60
        )
-        return polling_operation.execute()
+        return await polling_operation.execute()

-    def execute_task(
+    async def execute_task(
        self,
        initial_operation: SynchronousOperation[R, PikaGenerateResponse],
        auth_kwargs: Optional[dict[str, str]] = None,
@@ -169,14 +169,14 @@ class PikaNodeBase(ComfyNodeABC):
        Returns:
            A tuple containing the video file as a VIDEO output.
        """
-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        if not is_valid_initial_response(initial_response):
            error_msg = f"Pika initial request failed. Code: {initial_response.code}, Message: {initial_response.message}, Data: {initial_response.data}"
            logging.error(error_msg)
            raise PikaApiError(error_msg)

        task_id = initial_response.video_id
-        final_response = self.poll_for_task_status(task_id, auth_kwargs)
+        final_response = await self.poll_for_task_status(task_id, auth_kwargs)
        if not is_valid_video_response(final_response):
            error_msg = (
                f"Pika task {task_id} succeeded but no video data found in response."
@@ -187,7 +187,7 @@ class PikaNodeBase(ComfyNodeABC):
        video_url = str(final_response.url)
        logging.info("Pika task %s succeeded. Video URL: %s", task_id, video_url)

-        return (download_url_to_video_output(video_url),)
+        return (await download_url_to_video_output(video_url),)


 class PikaImageToVideoV2_2(PikaNodeBase):
@@ -212,7 +212,7 @@ class PikaImageToVideoV2_2(PikaNodeBase):

    DESCRIPTION = "Sends an image and prompt to the Pika API v2.2 to generate a video."

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt_text: str,
@@ -251,7 +251,7 @@ class PikaImageToVideoV2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaTextToVideoNodeV2_2(PikaNodeBase):
@@ -281,7 +281,7 @@ class PikaTextToVideoNodeV2_2(PikaNodeBase):

    DESCRIPTION = "Sends a text prompt to the Pika API v2.2 to generate a video."

-    def api_call(
+    async def api_call(
        self,
        prompt_text: str,
        negative_prompt: str,
@@ -311,7 +311,7 @@ class PikaTextToVideoNodeV2_2(PikaNodeBase):
            content_type="application/x-www-form-urlencoded",
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaScenesV2_2(PikaNodeBase):
@@ -361,7 +361,7 @@ class PikaScenesV2_2(PikaNodeBase):

    DESCRIPTION = "Combine your images to create a video with the objects in them. Upload multiple images as ingredients and generate a high-quality video that incorporates all of them."

-    def api_call(
+    async def api_call(
        self,
        prompt_text: str,
        negative_prompt: str,
@@ -420,7 +420,7 @@ class PikaScenesV2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikAdditionsNode(PikaNodeBase):
@@ -462,7 +462,7 @@ class PikAdditionsNode(PikaNodeBase):

    DESCRIPTION = "Add any object or image into your video. Upload a video and specify what you'd like to add to create a seamlessly integrated result."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        image: torch.Tensor,
@@ -481,10 +481,10 @@ class PikAdditionsNode(PikaNodeBase):
        image_bytes_io = tensor_to_bytesio(image)
        image_bytes_io.seek(0)

-        pika_files = [
-            ("video", ("video.mp4", video_bytes_io, "video/mp4")),
-            ("image", ("image.png", image_bytes_io, "image/png")),
-        ]
+        pika_files = {
+            "video": ("video.mp4", video_bytes_io, "video/mp4"),
+            "image": ("image.png", image_bytes_io, "image/png"),
+        }

        # Prepare non-file data
        pika_request_data = PikaBodyGeneratePikadditionsGeneratePikadditionsPost(
@@ -506,7 +506,7 @@ class PikAdditionsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaSwapsNode(PikaNodeBase):
@@ -558,7 +558,7 @@ class PikaSwapsNode(PikaNodeBase):
    DESCRIPTION = "Swap out any object or region of your video with a new image or object. Define areas to replace either with a mask or coordinates."
    RETURN_TYPES = ("VIDEO",)

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        image: torch.Tensor,
@@ -587,11 +587,11 @@ class PikaSwapsNode(PikaNodeBase):
        image_bytes_io = tensor_to_bytesio(image)
        image_bytes_io.seek(0)

-        pika_files = [
-            ("video", ("video.mp4", video_bytes_io, "video/mp4")),
-            ("image", ("image.png", image_bytes_io, "image/png")),
-            ("modifyRegionMask", ("mask.png", mask_bytes_io, "image/png")),
-        ]
+        pika_files = {
+            "video": ("video.mp4", video_bytes_io, "video/mp4"),
+            "image": ("image.png", image_bytes_io, "image/png"),
+            "modifyRegionMask": ("mask.png", mask_bytes_io, "image/png"),
+        }

        # Prepare non-file data
        pika_request_data = PikaBodyGeneratePikaswapsGeneratePikaswapsPost(
@@ -613,7 +613,7 @@ class PikaSwapsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaffectsNode(PikaNodeBase):
@@ -664,7 +664,7 @@ class PikaffectsNode(PikaNodeBase):

    DESCRIPTION = "Generate a video with a specific Pikaffect. Supported Pikaffects: Cake-ify, Crumble, Crush, Decapitate, Deflate, Dissolve, Explode, Eye-pop, Inflate, Levitate, Melt, Peel, Poke, Squish, Ta-da, Tear"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        pikaffect: str,
@@ -693,7 +693,7 @@ class PikaffectsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaStartEndFrameNode2_2(PikaNodeBase):
@@ -718,7 +718,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):

    DESCRIPTION = "Generate a video by combining your first and last frame. Upload two images to define the start and end points, and let the AI create a smooth transition between them."

-    def api_call(
+    async def api_call(
        self,
        image_start: torch.Tensor,
        image_end: torch.Tensor,
@@ -732,10 +732,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):
    ) -> tuple[VideoFromFile]:

        pika_files = [
-            (
-                "keyFrames",
-                ("image_start.png", tensor_to_bytesio(image_start), "image/png"),
-            ),
+            ("keyFrames", ("image_start.png", tensor_to_bytesio(image_start), "image/png")),
            ("keyFrames", ("image_end.png", tensor_to_bytesio(image_end), "image/png")),
        ]

@@ -758,7 +755,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_pixverse.py
+++ b/comfy_api_nodes/nodes_pixverse.py
@@ -30,7 +30,7 @@ from comfy.comfy_types.node_typing import IO, ComfyNodeABC
 from comfy_api.input_impl import VideoFromFile

 import torch
-import requests
+import aiohttp
 from io import BytesIO


@@ -47,7 +47,7 @@ def get_video_url_from_response(
    return str(response.Resp.url)


-def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
+async def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
    # first, upload image to Pixverse and get image id to use in actual generation call
    files = {"image": tensor_to_bytesio(image)}
    operation = SynchronousOperation(
@@ -62,7 +62,7 @@ def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
        content_type="multipart/form-data",
        auth_kwargs=auth_kwargs,
    )
-    response_upload: PixverseImageUploadResponse = operation.execute()
+    response_upload: PixverseImageUploadResponse = await operation.execute()

    if response_upload.Resp is None:
        raise Exception(
@@ -164,7 +164,7 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -205,7 +205,7 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -229,11 +229,11 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class PixverseImageToVideoNode(ComfyNodeABC):
@@ -302,7 +302,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -316,7 +316,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
        **kwargs,
    ):
        validate_string(prompt, strip_whitespace=False)
-        img_id = upload_image_to_pixverse(image, auth_kwargs=kwargs)
+        img_id = await upload_image_to_pixverse(image, auth_kwargs=kwargs)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -345,7 +345,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -369,10 +369,11 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_I2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class PixverseTransitionVideoNode(ComfyNodeABC):
@@ -436,7 +437,7 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        first_frame: torch.Tensor,
        last_frame: torch.Tensor,
@@ -450,8 +451,8 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
        **kwargs,
    ):
        validate_string(prompt, strip_whitespace=False)
-        first_frame_id = upload_image_to_pixverse(first_frame, auth_kwargs=kwargs)
-        last_frame_id = upload_image_to_pixverse(last_frame, auth_kwargs=kwargs)
+        first_frame_id = await upload_image_to_pixverse(first_frame, auth_kwargs=kwargs)
+        last_frame_id = await upload_image_to_pixverse(last_frame, auth_kwargs=kwargs)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -480,7 +481,7 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -504,10 +505,11 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 NODE_CLASS_MAPPINGS = {
Author	SHA1	Message	Date
comfyanonymous	71ed4a399e	ComfyUI version 0.3.52	2025-08-23 18:57:09 -04:00
Christian Byrne	3e316c6338	Update frontend to v1.25.10 and revert navigation mode override (#9522 ) - Update comfyui-frontend-package from 1.25.9 to 1.25.10 - Revert forced legacy navigation mode from PR #9518 - Frontend v1.25.10 includes proper navigation mode fixes and improved display text	2025-08-23 17:54:01 -04:00
comfyanonymous	8be0d22ab7	Don't use the annoying new navigation mode by default. (#9518 )	2025-08-23 13:56:17 -04:00
comfyanonymous	59eddda900	Python 3.13 is well supported. (#9511 )	2025-08-23 01:36:44 -04:00
comfyanonymous	41048c69b4	Fix Conditioning masks on 3d latents. (#9506 )	2025-08-22 23:15:44 -04:00
Jedrzej Kosinski	fc247150fe	Implement EasyCache and Invent LazyCache (#9496 ) * Attempting a universal implementation of EasyCache, starting with flux as test; I screwed up the math a bit, but when I set it just right it works. * Fixed math to make threshold work as expected, refactored code to use EasyCacheHolder instead of a dict wrapped by object * Use sigmas from transformer_options instead of timesteps to be compatible with a greater amount of models, make end_percent work * Make log statement when not skipping useful, preparing for per-cond caching * Added DIFFUSION_MODEL wrapper around forward function for wan model * Add subsampling for heuristic inputs * Add subsampling to output_prev (output_prev_subsampled now) * Properly consider conds in EasyCache logic * Created SuperEasyCache to test what happens if caching and reuse is moved outside the scope of conds, added PREDICT_NOISE wrapper to facilitate this test * Change max reuse_threshold to 3.0 * Mark EasyCache/SuperEasyCache as experimental (beta) * Make Lumina2 compatible with EasyCache * Add EasyCache support for Qwen Image * Fix missing comma, curse you Cursor * Add EasyCache support to AceStep * Add EasyCache support to Chroma * Added EasyCache support to Cosmos Predict t2i * Make EasyCache not crash with Cosmos Predict ImagToVideo latents, but does not work well at all * Add EasyCache support to hidream * Added EasyCache support to hunyuan video * Added EasyCache support to hunyuan3d * Added EasyCache support to LTXV (not very good, but does not crash) * Implemented EasyCache for aura_flow * Renamed SuperEasyCache to LazyCache, hardcoded subsample_factor to 8 on nodes * Eatra logging when verbose is true for EasyCache	2025-08-22 22:41:08 -04:00
contentis	fe31ad0276	Add elementwise fusions (#9495 ) * Add elementwise fusions * Add addcmul pattern to Qwen	2025-08-22 19:39:15 -04:00
ComfyUI Wiki	ca4e96a8ae	Update template to 0.1.65 (#9501 )	2025-08-22 17:40:18 -04:00
Alexander Piskun	050c67323c	feat(api-nodes): add copy button to Gemini Chat node (#9440 )	2025-08-22 10:51:14 -07:00
Alexander Piskun	497d41fb50	feat(api-nodes): change "OpenAI Chat" display name to "OpenAI ChatGPT" (#9443 )	2025-08-22 10:50:35 -07:00
comfyanonymous	ff57793659	Support InstantX Qwen controlnet. (#9488 )	2025-08-22 00:53:11 -04:00
comfyanonymous	f7bd5e58dd	Make it easier to implement future qwen controlnets. (#9485 )	2025-08-21 23:18:04 -04:00
Alexander Piskun	7ed73d12d1	[V3] convert Ideogram API nodes to the V3 schema (#9278 ) * convert Ideogram API nodes to the V3 schema * use auth_kwargs instead of auth_token/comfy_api_key	2025-08-21 22:06:51 -04:00
Alexander Piskun	eb39019daa	[V3] convert Google Veo API node to the V3 schema (#9272 ) * convert Google Veo API node to the V3 schema * use own full io.Schema for Veo3VideoGenerationNode * fixed typo * use auth_kwargs instead of auth_token/comfy_api_key	2025-08-21 22:06:13 -04:00
Alexander Piskun	bab08f40d1	v3 nodes (part a) (#9149 )	2025-08-21 22:05:36 -04:00
Alexander Piskun	bc49106837	convert String nodes to V3 schema (#9370 )	2025-08-21 22:03:57 -04:00
comfyanonymous	1b2de2642d	Support diffsynth inpaint controlnet (model patch). (#9471 )	2025-08-21 00:33:49 -04:00
comfyanonymous	9fa1036f60	Forgot this. (#9470 )	2025-08-20 23:09:35 -04:00
saurabh-pingale	0737b7e0d2	fix(userdata): catch invalid workflow filenames (#9434 ) (#9445 )	2025-08-20 22:27:57 -04:00
comfyanonymous	0963493a9c	Support for Qwen Diffsynth Controlnets canny and depth. (#9465 ) These are not real controlnets but actually a patch on the model so they will be treated as such. Put them in the models/model_patches/ folder. Use the new ModelPatchLoader and QwenImageDiffsynthControlnet nodes.	2025-08-20 22:26:37 -04:00
comfyanonymous	e73a9dbe30	Add that qwen edit model is supported to readme. (#9463 )	2025-08-20 17:34:13 -04:00
Harel Cain	fe01885acf	LTXV: fix key frame noise mask dimensions for when real noise mask exists (#9425 )	2025-08-20 03:33:10 -04:00
comfyanonymous	7139d6d93f	ComfyUI version 0.3.51	2025-08-20 03:15:30 -04:00
ComfyUI Wiki	2f52e8f05f	Bump template to 0.1.62 (#9419 ) * Bump template to 0.1.61 * Bump template to 0.1.62	2025-08-20 03:15:09 -04:00
comfyanonymous	8d38ea3bbf	Fix bf16 precision issue with qwen image embeddings. (#9441 )	2025-08-20 02:58:54 -04:00
comfyanonymous	5a8f502db5	Disable prompt weights for qwen. (#9438 )	2025-08-20 01:08:11 -04:00
comfyanonymous	7cd2c4bd6a	Qwen rotary embeddings should now match reference code. (#9437 )	2025-08-20 00:45:27 -04:00
comfyanonymous	dfa791eb4b	Rope fix for qwen vl. (#9435 )	2025-08-19 20:47:42 -04:00
comfyanonymous	bddd69618b	Change the TextEncodeQwenImageEdit node to use logic closer to reference. (#9432 )	2025-08-19 16:49:01 -04:00
Alexander Piskun	54d8fdbed0	feat(api-nodes): add Vidu Video nodes (#9368 )	2025-08-19 16:30:06 -04:00
Alexander Piskun	d844d8b13b	api_nodes: added release version of google's models (#9304 )	2025-08-19 16:29:24 -04:00
Alexander Piskun	07a927517c	api_nodes: add GPT-5 series models (#9325 )	2025-08-19 16:29:01 -04:00
Alexander Piskun	f16a70ba67	api_nodes: add MinimaxHailuoVideoNode node (#9262 )	2025-08-19 16:28:27 -04:00
Alexander Piskun	36b5127fd3	api_nodes: add kling-v2-1 and v2-1-master (#9257 )	2025-08-19 16:28:07 -04:00
comfyanonymous	4977f203fa	P2 of qwen edit model. (#9412 ) * P2 of qwen edit model. * Typo. * Fix normal qwen. * Fix. * Make the TextEncodeQwenImageEdit also set the ref latent. If you don't want it to set the ref latent and want to use the ReferenceLatent node with your custom latent instead just disconnect the VAE.	2025-08-18 22:38:34 -04:00
Alexander Piskun	bd2ab73976	fix(WAN-nodes): invalid nodeid for WanTrackToVideo (#9396 )	2025-08-18 03:26:55 -04:00
Christian Byrne	da2efeaec6	Bump frontend to 1.25.9 (#9394 )	2025-08-17 20:21:02 -07:00
Jedrzej Kosinski	7f3b9b16c6	Make step index detection much more robust (#9392 )	2025-08-17 18:54:07 -04:00
ComfyUI Wiki	d4e353a94e	Update template to 0.1.60 (#9377 )	2025-08-17 17:38:40 -04:00
comfyanonymous	ed43784b0d	WIP Qwen edit model: The diffusion model part. (#9383 )	2025-08-17 16:45:39 -04:00
comfyanonymous	0f2b8525bc	Qwen image model refactor. (#9375 )	2025-08-16 17:51:28 -04:00
Terry Jia	20a84166d0	record audio node (#8716 ) * record audio node * sf	2025-08-16 02:07:12 -04:00
Christian Byrne	ed2e33c69a	bump frontend version to 1.25.8 (#9361 )	2025-08-15 23:32:58 -04:00
comfyanonymous	1702e6df16	Implement wan2.2 camera model. (#9357 ) Use the old WanCameraImageToVideo node.	2025-08-15 17:29:58 -04:00
comfyanonymous	c308a8840a	Add FluxKontextMultiReferenceLatentMethod node. (#9356 ) This node is only useful if someone trains the kontext model to properly use multiple reference images via the index method. The default is the offset method which feeds the multiple images like if they were stitched together as one. This method works with the current flux kontext model.	2025-08-15 15:50:39 -04:00
Alexander Piskun	027c63f63a	fix(OpenAIGPTImage1): set correct MIME type for multipart uploads to OpenAI edits (#9348 )	2025-08-15 14:57:47 -04:00
comfyanonymous	e08ecfbd8a	Add warning when using old pytorch. (#9347 )	2025-08-15 00:22:26 -04:00
comfyanonymous	4e5c230f6a	Fix last commit not working on older pytorch. (#9346 )	2025-08-14 23:44:02 -04:00
Xiangxi Guo (Ryan)	f0d5d0111f	Avoid torch compile graphbreak for older pytorch versions (#9344 ) Turns out torch.compile has some gaps in context manager decorator syntax support. I've sent patches to fix that in PyTorch, but it won't be available for all the folks running older versions of PyTorch, hence this trivial patch.	2025-08-14 23:41:37 -04:00
comfyanonymous	ad19a069f6	Make SLG nodes work on Qwen Image model. (#9345 )	2025-08-14 23:16:01 -04:00
Alexander Piskun	5d65d6753b	convert WAN nodes to V3 schema (#9201 )	2025-08-14 21:48:41 -04:00
guill	deebee4ff6	Update default parameters for Moonvalley video nodes (#9290 ) * Update default parameters for Moonvalley video nodes - Changed default negative prompts to a more extensive list for both BaseMoonvalleyVideoNode and MoonvalleyVideo2VideoNode. - Updated default guidance scale values for both nodes to enhance prompt adherence. - Set a fixed default seed value for consistency in video generation. * no message * ruff fix --------- Co-authored-by: thorsten <thorsten@tripod-digital.co.nz>	2025-08-14 21:46:55 -04:00
Yoland Yan	fa570cbf59	Update CODEOWNERS (#9343 )	2025-08-14 19:44:22 -04:00
filtered	644b23ac0b	Make custom node testing checkbox optional in issue templates (#9342 ) The checkbox for confirming custom node testing is now optional in both bug report and user support templates. This allows users to submit issues even if they haven't been able to test with custom nodes disabled, making the reporting process more accessible.	2025-08-14 17:36:53 -04:00
comfyanonymous	72fd4d22b6	av is an essential dependency. (#9341 )	2025-08-14 16:03:21 -04:00
Jedrzej Kosinski	e4f7ea105f	Added context window support to core sampling code (#9238 ) * Added initial support for basic context windows - in progress * Add prepare_sampling wrapper for context window to more accurately estimate latent memory requirements, fixed merging wrappers/callbacks dicts in prepare_model_patcher * Made context windows compatible with different dimensions; works for WAN, but results are bad * Fix comfy.patcher_extension.merge_nested_dicts calls in prepare_model_patcher in sampler_helpers.py * Considering adding some callbacks to context window code to allow extensions of behavior without the need to rewrite code * Made dim slicing cleaner * Add Wan Context WIndows node for testing * Made context schedule and fuse method functions be stored on the handler instead of needing to be registered in core code to be found * Moved some code around between node_context_windows.py and context_windows.py * Change manual context window nodes names/ids * Added callbacks to IndexListContexHandler * Adjusted default values for context_length and context_overlap, made schema.inputs definition for WAN Context Windows less annoying * Make get_resized_cond more robust for various dim sizes * Fix typo * Another small fix	2025-08-13 21:33:05 -04:00
Simon Lui	c991a5da65	Fix XPU iGPU regressions (#9322 ) * Change bf16 check and switch non-blocking to off default with option to force to regain speed on certain classes of iGPUs and refactor xpu check. * Turn non_blocking off by default for xpu. * Update README.md for Intel GPUs.	2025-08-13 19:13:35 -04:00
comfyanonymous	9df8792d4b	Make last PR not crash comfy on old pytorch. (#9324 )	2025-08-13 15:12:41 -04:00
contentis	3da5a07510	SDPA backend priority (#9299 )	2025-08-13 14:53:27 -04:00
comfyanonymous	afa0a45206	Reduce portable size again. (#9323 ) * compress more * test * not needed	2025-08-13 14:42:08 -04:00
comfyanonymous	615eb52049	Put back frontend version. (#9317 )	2025-08-13 03:48:06 -04:00
comfyanonymous	d5c1954d5c	ComfyUI version 0.3.50	2025-08-13 03:46:38 -04:00
comfyanonymous	e400f26c8f	Downgrade frontend for release. (#9316 )	2025-08-13 03:44:54 -04:00
comfyanonymous	5ca8e2fac3	Update release workflow to python3.13 pytorch cu129 (#9315 ) * Try to reduce size of portable even more. * Update stable release workflow to python 3.13 cu129 * Update dependencies workflow to python3.13 cu129	2025-08-13 03:01:12 -04:00
ComfyUI Wiki	3294782d19	Update template to 0.1.59 (#9313 )	2025-08-13 02:50:50 -04:00
Jedrzej Kosinski	898d88e10e	Make torchaudio exception catching less specific (#9309 )	2025-08-12 23:34:58 -04:00
comfyanonymous	560d38f34c	Wan2.2 fun control support. (#9292 )	2025-08-12 23:26:33 -04:00
comfyanonymous	e1d4f36d8d	Update test release package workflow with python 3.13 cu129. (#9306 )	2025-08-12 20:13:04 -04:00
ComfyUI Wiki	1e3ae1eed8	Update template to 0.1.58 (#9302 )	2025-08-12 17:14:27 -04:00
Alexander Piskun	f4231a80b1	fix(Kling Image API Node): do not pass "image_type" when no image (#9271 ) * fix(Kling Image API Node): do not pass "image_type" when no image * fix(Kling Image API Node): raise client-side error when kling_v1 is used with reference image	2025-08-11 17:15:14 -04:00
PsychoLogicAu	2208aa616d	Support SimpleTuner lycoris lora for Qwen-Image (#9280 )	2025-08-11 16:56:16 -04:00
ComfyUI Wiki	629b173837	Update template & embedded docs (#9283 ) * Update template & embedded docs * Update embedded docs to 0.2.6	2025-08-11 16:52:12 -04:00
Alexander Piskun	fa340add55	remove creation of non-used asyncio_loop (#9284 )	2025-08-11 16:48:17 -04:00
comfyanonymous	966f3a5206	Only show feature flags log when verbose. (#9281 )	2025-08-11 05:53:01 -04:00
comfyanonymous	0552de7c7d	Bump pytorch cuda and rocm versions in readme instructions. (#9273 )	2025-08-10 05:03:47 -04:00
comfyanonymous	5828607ccf	Not sure if AMD actually support fp16 acc but it doesn't crash. (#9258 )	2025-08-09 12:49:25 -04:00
comfyanonymous	735bb4bdb1	Users report gfx1201 is buggy on flux with pytorch attention. (#9244 )	2025-08-08 04:21:00 -04:00
Alexander Piskun	bf2a1b5b1e	async API nodes (#9129 ) * converted API nodes to async * converted BFL API nodes to async * fixed client bug; converted gemini, ideogram, minimax * fixed client bug; converted openai nodes * fixed client bug; converted moonvalley, pika nodes * fixed client bug; converted kling, luma nodes * converted pixverse, rodin nodes * converted tripo, veo2 * converted recraft nodes * add lost log_request_response call	2025-08-07 23:37:50 -04:00
Jedrzej Kosinski	42974a448c	_ui.py import torchaudio safety check (#9234 ) * Added safety around torchaudio import in _ui.py * Trusted cursor too much, fixed torchaudio bool	2025-08-07 17:54:09 -04:00
comfyanonymous	05df2df489	Fix RepeatLatentBatch not working on multi dim latents. (#9227 )	2025-08-07 11:20:40 -04:00
Christian Byrne	37d620a6b8	Update frontend to v1.24.3 (#9175 )	2025-08-06 19:52:39 -04:00
ComfyUI Wiki	32691b16f4	Update template to 0.1.52 (#9206 )	2025-08-06 13:26:29 -04:00
flybirdxx	4c3e57b0ae	Fixed an issue where qwenLora could not be loaded properly. (#9208 )	2025-08-06 13:23:11 -04:00
comfyanonymous	9126c0cfe4	Qwen Image model merging node. (#9202 )	2025-08-06 04:07:04 -04:00
comfyanonymous	d8c51ba15a	Add Qwen Image model to readme. (#9191 )	2025-08-05 07:41:18 -04:00
comfyanonymous	32a95bba8a	ComfyUI version 0.3.49	2025-08-05 07:33:02 -04:00
ComfyUI Wiki	da1ad9b516	Update template to 0.1.51 (#9187 )	2025-08-05 07:24:12 -04:00
comfyanonymous	d044a24398	Fix default shift and any latent size for qwen image model. (#9186 )	2025-08-05 06:12:27 -04:00
ComfyUI Wiki	5be6fd09ff	Update template to 0.1.48 (#9182 )	2025-08-05 03:48:56 -04:00
Christian Byrne	f69609bbd6	Add Veo3 video generation node with audio support (#9110 ) - Create new Veo3VideoGenerationNode that extends VeoVideoGenerationNode - Add support for generateAudio parameter (only for Veo3 models) - Support new Veo3 models: veo-3.0-generate-001, veo-3.0-fast-generate-001 - Fix Veo3 duration constraint to 8 seconds only - Update original node to be clearly Veo 2 only - Update API paths to use model parameter: /proxy/veo/{model}/generate - Regenerate API types from staging to include generateAudio parameter - Fix TripoModelVersion enum reference after regeneration - Mark generated API types file in .gitattributes	2025-08-05 01:52:25 -04:00
comfyanonymous	c012400240	Initial support for qwen image model. (#9179 )	2025-08-04 22:53:25 -04:00
comfyanonymous	03895dea7c	Fix another issue with the PR. (#9170 )	2025-08-04 04:33:04 -04:00
comfyanonymous	84f9759424	Add some warnings and prevent crash when cond devices don't match. (#9169 )	2025-08-04 04:20:12 -04:00
comfyanonymous	7991341e89	Various fixes for broken things from earlier PR. (#9168 )	2025-08-04 04:02:40 -04:00
comfyanonymous	140ffc7fdc	Fix broken controlnet from last PR. (#9167 )	2025-08-04 03:28:12 -04:00
comfyanonymous	182f90b5ec	Lower cond vram use by casting at the same time as device transfer. (#9159 )	2025-08-04 03:11:53 -04:00
comfyanonymous	aebac22193	Cleanup. (#9160 )	2025-08-03 07:08:11 -04:00
comfyanonymous	13aaa66ec2	Make sure context is on the right device. (#9154 )	2025-08-02 15:09:23 -04:00
comfyanonymous	5f582a9757	Make sure all the conds are on the right device. (#9151 )	2025-08-02 15:00:13 -04:00
ComfyUI Wiki	fbcc23945d	Update template to 0.1.47 (#9153 )	2025-08-02 14:15:29 -04:00
Johnpaul Chiwetelu	3dfefc88d0	API for Recently Used Items (#8792 ) * feat: add file creation time to model file metadata and user file info * fix linting	2025-08-01 22:02:06 -04:00
comfyanonymous	bff60b5cfc	ComfyUI version 0.3.48	2025-08-01 20:03:22 -04:00
comfyanonymous	1e638a140b	Tiny wan vae optimizations. (#9136 )	2025-08-01 05:25:38 -04:00
ComfyUI Wiki	4696d74305	update template to 0.1.45 (#9135 )	2025-08-01 03:06:18 -04:00
comfyanonymous	5ee381c058	Fix WanFirstLastFrameToVideo node when no clip vision. (#9134 )	2025-07-31 23:33:27 -04:00
Jedrzej Kosinski	4887743a2a	V3 Node Schema Definition - initial (#8656 )	2025-07-31 18:02:12 -04:00
comfyanonymous	97b8a2c26a	More accurate explanation of release process. (#9126 )	2025-07-31 05:46:23 -04:00
guill	97eb256a35	Add support for partial execution in backend (#9123 ) When a prompt is submitted, it can optionally include `partial_execution_targets` as a list of ids. If it does, rather than adding all outputs to the execution list, we add only those in the list.	2025-07-30 22:55:28 -04:00
chaObserv	61b08d4ba6	Replace manual x * sigmoid(x) with torch silu in VAE nonlinearity (#9057 )	2025-07-30 19:25:56 -04:00
comfyanonymous	da9dab7edd	Small wan camera memory optimization. (#9111 )	2025-07-30 05:55:26 -04:00
ComfyUI Wiki	d2aaef029c	Update template to 0.1.44 (#9104 )	2025-07-29 22:50:49 -04:00
guill	0a3d062e06	ComfyAPI Core v0.0.2 (#8962 ) * ComfyAPI Core v0.0.2 * Respond to PR feedback * Fix Python 3.9 errors * Fix missing backward compatibility proxy * Reorganize types a bit The input types, input impls, and utility types are now all available in the versioned API. See the change in `comfy_extras/nodes_video.py` for an example of their usage. * Remove the need for `--generate-api-stubs` * Fix generated stubs differing by Python version * Fix ruff formatting issues	2025-07-29 22:17:22 -04:00
comfyanonymous	2f74e17975	ComfyUI version 0.3.47	2025-07-29 20:08:25 -04:00
comfyanonymous	dca6bdd4fa	Make wan2.2 5B i2v take a lot less memory. (#9102 )	2025-07-29 19:44:18 -04:00
comfyanonymous	7d593baf91	Extra reserved vram on large cards on windows. (#9093 )	2025-07-29 04:07:45 -04:00
comfyanonymous	c60dc4177c	Remove unecessary clones in the wan2.2 VAE. (#9083 )	2025-07-28 14:48:19 -04:00