Change node id to reflect node name

Make sure models_memory_reserve is considered with inference_memory as well in max func calls
Renamed to Reserve Additional Memory
2026-02-12 11:10:03 +00:00 · 2025-08-18 15:39:16 -07:00 · 2025-08-18 15:34:53 -07:00 · 2025-08-18 15:04:49 -07:00 · 2025-08-18 14:51:11 -07:00 · 2025-08-18 14:45:21 -07:00
67 changed files with 5612 additions and 1569 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,2 +1,3 @@
 /web/assets/** linguist-generated
 /web/** linguist-vendored
+comfy_api_nodes/apis/__init__.py linguist-generated
--- a/.github/ISSUE_TEMPLATE/bug-report.yml
+++ b/.github/ISSUE_TEMPLATE/bug-report.yml
@@ -22,7 +22,7 @@ body:
      description: Please confirm you have tried to reproduce the issue with all custom nodes disabled.
      options:
        - label: I have tried disabling custom nodes and the issue persists (see [how to disable custom nodes](https://docs.comfy.org/troubleshooting/custom-node-issues#step-1%3A-test-with-all-custom-nodes-disabled) if you need help)
-          required: true
+          required: false
  - type: textarea
    attributes:
      label: Expected Behavior
--- a/.github/ISSUE_TEMPLATE/user-support.yml
+++ b/.github/ISSUE_TEMPLATE/user-support.yml
@@ -18,7 +18,7 @@ body:
        description: Please confirm you have tried to reproduce the issue with all custom nodes disabled.
        options:
          - label: I have tried disabling custom nodes and the issue persists (see [how to disable custom nodes](https://docs.comfy.org/troubleshooting/custom-node-issues#step-1%3A-test-with-all-custom-nodes-disabled) if you need help)
-            required: true
+            required: false
    - type: textarea
      attributes:
            label: Your question
--- a/.github/workflows/stable-release.yml
+++ b/.github/workflows/stable-release.yml
@@ -12,17 +12,17 @@ on:
        description: 'CUDA version'
        required: true
        type: string
-        default: "128"
+        default: "129"
      python_minor:
        description: 'Python minor version'
        required: true
        type: string
-        default: "12"
+        default: "13"
      python_patch:
        description: 'Python patch version'
        required: true
        type: string
-        default: "10"
+        default: "6"


 jobs:
@@ -66,8 +66,13 @@ jobs:
          curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
          ./python.exe get-pip.py
          ./python.exe -s -m pip install ../cu${{ inputs.cu }}_python_deps/*
-            sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
-            cd ..
+          sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
+
+          rm ./Lib/site-packages/torch/lib/dnnl.lib #I don't think this is actually used and I need the space
+          rm ./Lib/site-packages/torch/lib/libprotoc.lib
+          rm ./Lib/site-packages/torch/lib/libprotobuf.lib
+
+          cd ..

          git clone --depth 1 https://github.com/comfyanonymous/taesd
          cp taesd/*.safetensors ./ComfyUI_copy/models/vae_approx/
@@ -85,7 +90,7 @@ jobs:

          cd ..

-          "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=512m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
+          "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=768m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
          mv ComfyUI_windows_portable.7z ComfyUI/ComfyUI_windows_portable_nvidia.7z

          cd ComfyUI_windows_portable
--- a/.github/workflows/windows_release_dependencies.yml
+++ b/.github/workflows/windows_release_dependencies.yml
@@ -17,19 +17,19 @@ on:
        description: 'cuda version'
        required: true
        type: string
-        default: "128"
+        default: "129"

      python_minor:
        description: 'python minor version'
        required: true
        type: string
-        default: "12"
+        default: "13"

      python_patch:
        description: 'python patch version'
        required: true
        type: string
-        default: "10"
+        default: "6"
 #  push:
 #    branches:
 #      - master
--- a/.github/workflows/windows_release_package.yml
+++ b/.github/workflows/windows_release_package.yml
@@ -7,19 +7,19 @@ on:
        description: 'cuda version'
        required: true
        type: string
-        default: "128"
+        default: "129"

      python_minor:
        description: 'python minor version'
        required: true
        type: string
-        default: "12"
+        default: "13"

      python_patch:
        description: 'python patch version'
        required: true
        type: string
-        default: "10"
+        default: "6"
 #  push:
 #    branches:
 #      - master
@@ -64,6 +64,10 @@ jobs:
            ./python.exe get-pip.py
            ./python.exe -s -m pip install ../cu${{ inputs.cu }}_python_deps/*
            sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
+
+            rm ./Lib/site-packages/torch/lib/dnnl.lib #I don't think this is actually used and I need the space
+            rm ./Lib/site-packages/torch/lib/libprotoc.lib
+            rm ./Lib/site-packages/torch/lib/libprotobuf.lib
            cd ..

            git clone --depth 1 https://github.com/comfyanonymous/taesd
@@ -82,7 +86,7 @@ jobs:

            cd ..

-            "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=512m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
+            "C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=9 -mfb=128 -md=768m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
            mv ComfyUI_windows_portable.7z ComfyUI/new_ComfyUI_windows_portable_nvidia_cu${{ inputs.cu }}_or_cpu.7z

            cd ComfyUI_windows_portable
--- a/27
+++ b/27
@@ -5,20 +5,21 @@
 # Inlined the team members for now.

 # Maintainers
-*.md @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/tests/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/tests-unit/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/notebooks/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/script_examples/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/.github/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/requirements.txt @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
-/pyproject.toml @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne
+*.md @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/tests/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/tests-unit/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/notebooks/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/script_examples/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/.github/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/requirements.txt @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill
+/pyproject.toml @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @Kosinkadink @christian-byrne @guill

 # Python web server
-/api_server/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne
-/app/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne
-/utils/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne
+/api_server/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne @guill
+/app/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne @guill
+/utils/ @yoland68 @robinjhuang @webfiltered @pythongosssss @ltdrdata @christian-byrne @guill

 # Node developers
-/comfy_extras/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne
-/comfy/comfy_types/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne
+/comfy_extras/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne @guill
+/comfy/comfy_types/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne @guill
+/comfy_api_nodes/ @yoland68 @robinjhuang @pythongosssss @ltdrdata @Kosinkadink @webfiltered @christian-byrne @guill
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ ComfyUI lets you design and execute advanced stable diffusion pipelines using a
 ## Get Started

 #### [Desktop Application](https://www.comfy.org/download)
- The easiest way to get started. 
+- The easiest way to get started.
 - Available on Windows & macOS.

 #### [Windows Portable Package](#installing)
@@ -66,6 +66,7 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
   - [Lumina Image 2.0](https://comfyanonymous.github.io/ComfyUI_examples/lumina2/)
   - [HiDream](https://comfyanonymous.github.io/ComfyUI_examples/hidream/)
   - [Cosmos Predict2](https://comfyanonymous.github.io/ComfyUI_examples/cosmos_predict2/)
+   - [Qwen Image](https://comfyanonymous.github.io/ComfyUI_examples/qwen_image/)
 - Image Editing Models
   - [Omnigen 2](https://comfyanonymous.github.io/ComfyUI_examples/omnigen/)
   - [Flux Kontext](https://comfyanonymous.github.io/ComfyUI_examples/flux/#flux-kontext-image-editing-model)
@@ -202,7 +203,7 @@ Put your VAE in: models/vae
 ### AMD GPUs (Linux only)
 AMD users can install rocm and pytorch with pip if you don't have it already installed, this is the command to install the stable version:

-```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3```
+```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4```

 This is the command to install the nightly with ROCm 6.4 which might have some performance improvements:

@@ -210,33 +211,25 @@ This is the command to install the nightly with ROCm 6.4 which might have some p

 ### Intel GPUs (Windows and Linux)

-(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip (currently available in PyTorch nightly builds). More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
-  
-1. To install PyTorch nightly, use the following command:
+(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
+
+1. To install PyTorch xpu, use the following command:
+
+```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu```
+
+This is the command to install the Pytorch xpu nightly which might have some performance improvements:

 ```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu```

-2. Launch ComfyUI by running `python main.py`
-
-
 (Option 2) Alternatively, Intel GPUs supported by Intel Extension for PyTorch (IPEX) can leverage IPEX for improved performance.

-1. For Intel® Arc™ A-Series Graphics utilizing IPEX, create a conda environment and use the commands below:
-
-```
-conda install libuv
-pip install torch==2.3.1.post0+cxx11.abi torchvision==0.18.1.post0+cxx11.abi torchaudio==2.3.1.post0+cxx11.abi intel-extension-for-pytorch==2.3.110.post0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
-```
-
-For other supported Intel GPUs with IPEX, visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.
-
-Additional discussion and help can be found [here](https://github.com/comfyanonymous/ComfyUI/discussions/476).
+1. visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.

 ### NVIDIA

 Nvidia users should install stable pytorch using this command:

-```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128```
+```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu129```

 This is the command to install pytorch nightly instead which might have performance improvements.

@@ -351,7 +344,7 @@ Generate a self-signed certificate (not appropriate for shared/production use) a

 Use `--tls-keyfile key.pem --tls-certfile cert.pem` to enable TLS/SSL, the app will now be accessible with `https://...` instead of `http://...`.

-> Note: Windows users can use [alexisrolland/docker-openssl](https://github.com/alexisrolland/docker-openssl) or one of the [3rd party binary distributions](https://wiki.openssl.org/index.php/Binaries) to run the command example above. 
+> Note: Windows users can use [alexisrolland/docker-openssl](https://github.com/alexisrolland/docker-openssl) or one of the [3rd party binary distributions](https://wiki.openssl.org/index.php/Binaries) to run the command example above.
 <br/><br/>If you use a container, note that the volume mount `-v` can be a relative path so `... -v ".\:/openssl-certs" ...` would create the key & cert files in the current directory of your command prompt or powershell terminal.

 ## Support and dev channel
--- a/app/model_manager.py
+++ b/app/model_manager.py
@@ -130,10 +130,21 @@ class ModelFileManager:

            for file_name in filenames:
                try:
-                    relative_path = os.path.relpath(os.path.join(dirpath, file_name), directory)
-                    result.append(relative_path)
-                except:
-                    logging.warning(f"Warning: Unable to access {file_name}. Skipping this file.")
+                    full_path = os.path.join(dirpath, file_name)
+                    relative_path = os.path.relpath(full_path, directory)
+
+                    # Get file metadata
+                    file_info = {
+                        "name": relative_path,
+                        "pathIndex": pathIndex,
+                        "modified": os.path.getmtime(full_path),  # Add modification time
+                        "created": os.path.getctime(full_path),   # Add creation time
+                        "size": os.path.getsize(full_path)        # Add file size
+                    }
+                    result.append(file_info)
+
+                except Exception as e:
+                    logging.warning(f"Warning: Unable to access {file_name}. Error: {e}. Skipping this file.")
                    continue

            for d in subdirs:
@@ -144,7 +155,7 @@ class ModelFileManager:
                    logging.warning(f"Warning: Unable to access {path}. Skipping this path.")
                    continue

-        return [{"name": f, "pathIndex": pathIndex} for f in result], dirs, time.perf_counter()
+        return result, dirs, time.perf_counter()

    def get_model_previews(self, filepath: str) -> list[str | BytesIO]:
        dirname = os.path.dirname(filepath)
--- a/app/user_manager.py
+++ b/app/user_manager.py
@@ -20,13 +20,15 @@ class FileInfo(TypedDict):
    path: str
    size: int
    modified: int
+    created: int


 def get_file_info(path: str, relative_to: str) -> FileInfo:
    return {
        "path": os.path.relpath(path, relative_to).replace(os.sep, '/'),
        "size": os.path.getsize(path),
-        "modified": os.path.getmtime(path)
+        "modified": os.path.getmtime(path),
+        "created": os.path.getctime(path)
    }


--- a/comfy/cli_args.py
+++ b/comfy/cli_args.py
@@ -132,6 +132,8 @@ parser.add_argument("--reserve-vram", type=float, default=None, help="Set the am

 parser.add_argument("--async-offload", action="store_true", help="Use async weight offloading.")

+parser.add_argument("--force-non-blocking", action="store_true", help="Force ComfyUI to use non-blocking operations for all applicable tensors. This may improve performance on some non-Nvidia systems but can cause issues with some workflows.")
+
 parser.add_argument("--default-hashing-function", type=str, choices=['md5', 'sha1', 'sha256', 'sha512'], default='sha256', help="Allows you to choose the hash function to use for duplicate filename / contents comparison. Default is sha256.")

 parser.add_argument("--disable-smart-memory", action="store_true", help="Force ComfyUI to agressively offload to regular ram instead of keeping models in vram when it can.")
--- a/comfy/conds.py
+++ b/comfy/conds.py
@@ -1,6 +1,7 @@
 import torch
 import math
 import comfy.utils
+import logging


 class CONDRegular:
@@ -10,12 +11,15 @@ class CONDRegular:
    def _copy_with(self, cond):
        return self.__class__(cond)

-    def process_cond(self, batch_size, device, **kwargs):
-        return self._copy_with(comfy.utils.repeat_to_batch_size(self.cond, batch_size).to(device))
+    def process_cond(self, batch_size, **kwargs):
+        return self._copy_with(comfy.utils.repeat_to_batch_size(self.cond, batch_size))

    def can_concat(self, other):
        if self.cond.shape != other.cond.shape:
            return False
+        if self.cond.device != other.cond.device:
+            logging.warning("WARNING: conds not on same device, skipping concat.")
+            return False
        return True

    def concat(self, others):
@@ -29,14 +33,14 @@ class CONDRegular:


 class CONDNoiseShape(CONDRegular):
-    def process_cond(self, batch_size, device, area, **kwargs):
+    def process_cond(self, batch_size, area, **kwargs):
        data = self.cond
        if area is not None:
            dims = len(area) // 2
            for i in range(dims):
                data = data.narrow(i + 2, area[i + dims], area[i])

-        return self._copy_with(comfy.utils.repeat_to_batch_size(data, batch_size).to(device))
+        return self._copy_with(comfy.utils.repeat_to_batch_size(data, batch_size))


 class CONDCrossAttn(CONDRegular):
@@ -51,6 +55,9 @@ class CONDCrossAttn(CONDRegular):
            diff = mult_min // min(s1[1], s2[1])
            if diff > 4: #arbitrary limit on the padding because it's probably going to impact performance negatively if it's too much
                return False
+        if self.cond.device != other.cond.device:
+            logging.warning("WARNING: conds not on same device: skipping concat.")
+            return False
        return True

    def concat(self, others):
@@ -73,7 +80,7 @@ class CONDConstant(CONDRegular):
    def __init__(self, cond):
        self.cond = cond

-    def process_cond(self, batch_size, device, **kwargs):
+    def process_cond(self, batch_size, **kwargs):
        return self._copy_with(self.cond)

    def can_concat(self, other):
@@ -92,10 +99,10 @@ class CONDList(CONDRegular):
    def __init__(self, cond):
        self.cond = cond

-    def process_cond(self, batch_size, device, **kwargs):
+    def process_cond(self, batch_size, **kwargs):
        out = []
        for c in self.cond:
-            out.append(comfy.utils.repeat_to_batch_size(c, batch_size).to(device))
+            out.append(comfy.utils.repeat_to_batch_size(c, batch_size))

        return self._copy_with(out)

--- a/comfy/context_windows.py
+++ b/comfy/context_windows.py
@@ -0,0 +1,540 @@
+from __future__ import annotations
+from typing import TYPE_CHECKING, Callable
+import torch
+import numpy as np
+import collections
+from dataclasses import dataclass
+from abc import ABC, abstractmethod
+import logging
+import comfy.model_management
+import comfy.patcher_extension
+if TYPE_CHECKING:
+    from comfy.model_base import BaseModel
+    from comfy.model_patcher import ModelPatcher
+    from comfy.controlnet import ControlBase
+
+
+class ContextWindowABC(ABC):
+    def __init__(self):
+        ...
+
+    @abstractmethod
+    def get_tensor(self, full: torch.Tensor) -> torch.Tensor:
+        """
+        Get torch.Tensor applicable to current window.
+        """
+        raise NotImplementedError("Not implemented.")
+
+    @abstractmethod
+    def add_window(self, full: torch.Tensor, to_add: torch.Tensor) -> torch.Tensor:
+        """
+        Apply torch.Tensor of window to the full tensor, in place. Returns reference to updated full tensor, not a copy.
+        """
+        raise NotImplementedError("Not implemented.")
+
+class ContextHandlerABC(ABC):
+    def __init__(self):
+        ...
+
+    @abstractmethod
+    def should_use_context(self, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]) -> bool:
+        raise NotImplementedError("Not implemented.")
+
+    @abstractmethod
+    def get_resized_cond(self, cond_in: list[dict], x_in: torch.Tensor, window: ContextWindowABC, device=None) -> list:
+        raise NotImplementedError("Not implemented.")
+
+    @abstractmethod
+    def execute(self, calc_cond_batch: Callable, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]):
+        raise NotImplementedError("Not implemented.")
+
+
+
+class IndexListContextWindow(ContextWindowABC):
+    def __init__(self, index_list: list[int], dim: int=0):
+        self.index_list = index_list
+        self.context_length = len(index_list)
+        self.dim = dim
+
+    def get_tensor(self, full: torch.Tensor, device=None, dim=None) -> torch.Tensor:
+        if dim is None:
+            dim = self.dim
+        if dim == 0 and full.shape[dim] == 1:
+            return full
+        idx = [slice(None)] * dim + [self.index_list]
+        return full[idx].to(device)
+
+    def add_window(self, full: torch.Tensor, to_add: torch.Tensor, dim=None) -> torch.Tensor:
+        if dim is None:
+            dim = self.dim
+        idx = [slice(None)] * dim + [self.index_list]
+        full[idx] += to_add
+        return full
+
+
+class IndexListCallbacks:
+    EVALUATE_CONTEXT_WINDOWS = "evaluate_context_windows"
+    COMBINE_CONTEXT_WINDOW_RESULTS = "combine_context_window_results"
+    EXECUTE_START = "execute_start"
+    EXECUTE_CLEANUP = "execute_cleanup"
+
+    def init_callbacks(self):
+        return {}
+
+
+@dataclass
+class ContextSchedule:
+    name: str
+    func: Callable
+
+@dataclass
+class ContextFuseMethod:
+    name: str
+    func: Callable
+
+ContextResults = collections.namedtuple("ContextResults", ['window_idx', 'sub_conds_out', 'sub_conds', 'window'])
+class IndexListContextHandler(ContextHandlerABC):
+    def __init__(self, context_schedule: ContextSchedule, fuse_method: ContextFuseMethod, context_length: int=1, context_overlap: int=0, context_stride: int=1, closed_loop=False, dim=0):
+        self.context_schedule = context_schedule
+        self.fuse_method = fuse_method
+        self.context_length = context_length
+        self.context_overlap = context_overlap
+        self.context_stride = context_stride
+        self.closed_loop = closed_loop
+        self.dim = dim
+        self._step = 0
+
+        self.callbacks = {}
+
+    def should_use_context(self, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]) -> bool:
+        # for now, assume first dim is batch - should have stored on BaseModel in actual implementation
+        if x_in.size(self.dim) > self.context_length:
+            logging.info(f"Using context windows {self.context_length} for {x_in.size(self.dim)} frames.")
+            return True
+        return False
+
+    def prepare_control_objects(self, control: ControlBase, device=None) -> ControlBase:
+        if control.previous_controlnet is not None:
+            self.prepare_control_objects(control.previous_controlnet, device)
+        return control
+
+    def get_resized_cond(self, cond_in: list[dict], x_in: torch.Tensor, window: IndexListContextWindow, device=None) -> list:
+        if cond_in is None:
+            return None
+        # reuse or resize cond items to match context requirements
+        resized_cond = []
+        # cond object is a list containing a dict - outer list is irrelevant, so just loop through it
+        for actual_cond in cond_in:
+            resized_actual_cond = actual_cond.copy()
+            # now we are in the inner dict - "pooled_output" is a tensor, "control" is a ControlBase object, "model_conds" is dictionary
+            for key in actual_cond:
+                try:
+                    cond_item = actual_cond[key]
+                    if isinstance(cond_item, torch.Tensor):
+                        # check that tensor is the expected length - x.size(0)
+                        if self.dim < cond_item.ndim and cond_item.size(self.dim) == x_in.size(self.dim):
+                            # if so, it's subsetting time - tell controls the expected indeces so they can handle them
+                            actual_cond_item = window.get_tensor(cond_item)
+                            resized_actual_cond[key] = actual_cond_item.to(device)
+                        else:
+                            resized_actual_cond[key] = cond_item.to(device)
+                    # look for control
+                    elif key == "control":
+                        resized_actual_cond[key] = self.prepare_control_objects(cond_item, device)
+                    elif isinstance(cond_item, dict):
+                        new_cond_item = cond_item.copy()
+                        # when in dictionary, look for tensors and CONDCrossAttn [comfy/conds.py] (has cond attr that is a tensor)
+                        for cond_key, cond_value in new_cond_item.items():
+                            if isinstance(cond_value, torch.Tensor):
+                                if cond_value.ndim < self.dim and cond_value.size(0) == x_in.size(self.dim):
+                                    new_cond_item[cond_key] = window.get_tensor(cond_value, device)
+                            # if has cond that is a Tensor, check if needs to be subset
+                            elif hasattr(cond_value, "cond") and isinstance(cond_value.cond, torch.Tensor):
+                                if cond_value.cond.ndim < self.dim and cond_value.cond.size(0) == x_in.size(self.dim):
+                                    new_cond_item[cond_key] = cond_value._copy_with(window.get_tensor(cond_value.cond, device))
+                            elif cond_key == "num_video_frames": # for SVD
+                                new_cond_item[cond_key] = cond_value._copy_with(cond_value.cond)
+                                new_cond_item[cond_key].cond = window.context_length
+                        resized_actual_cond[key] = new_cond_item
+                    else:
+                        resized_actual_cond[key] = cond_item
+                finally:
+                    del cond_item  # just in case to prevent VRAM issues
+            resized_cond.append(resized_actual_cond)
+        return resized_cond
+
+    def set_step(self, timestep: torch.Tensor, model_options: dict[str]):
+        mask = torch.isclose(model_options["transformer_options"]["sample_sigmas"], timestep, rtol=0.0001)
+        matches = torch.nonzero(mask)
+        if torch.numel(matches) == 0:
+            raise Exception("No sample_sigmas matched current timestep; something went wrong.")
+        self._step = int(matches[0].item())
+
+    def get_context_windows(self, model: BaseModel, x_in: torch.Tensor, model_options: dict[str]) -> list[IndexListContextWindow]:
+        full_length = x_in.size(self.dim) # TODO: choose dim based on model
+        context_windows = self.context_schedule.func(full_length, self, model_options)
+        context_windows = [IndexListContextWindow(window, dim=self.dim) for window in context_windows]
+        return context_windows
+
+    def execute(self, calc_cond_batch: Callable, model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]):
+        self.set_step(timestep, model_options)
+        context_windows = self.get_context_windows(model, x_in, model_options)
+        enumerated_context_windows = list(enumerate(context_windows))
+
+        conds_final = [torch.zeros_like(x_in) for _ in conds]
+        if self.fuse_method.name == ContextFuseMethods.RELATIVE:
+            counts_final = [torch.ones(get_shape_for_dim(x_in, self.dim), device=x_in.device) for _ in conds]
+        else:
+            counts_final = [torch.zeros(get_shape_for_dim(x_in, self.dim), device=x_in.device) for _ in conds]
+        biases_final = [([0.0] * x_in.shape[self.dim]) for _ in conds]
+
+        for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.EXECUTE_START, self.callbacks):
+            callback(self, model, x_in, conds, timestep, model_options)
+
+        for enum_window in enumerated_context_windows:
+            results = self.evaluate_context_windows(calc_cond_batch, model, x_in, conds, timestep, [enum_window], model_options)
+            for result in results:
+                self.combine_context_window_results(x_in, result.sub_conds_out, result.sub_conds, result.window, result.window_idx, len(enumerated_context_windows), timestep,
+                                            conds_final, counts_final, biases_final)
+        try:
+            # finalize conds
+            if self.fuse_method.name == ContextFuseMethods.RELATIVE:
+                # relative is already normalized, so return as is
+                del counts_final
+                return conds_final
+            else:
+                # normalize conds via division by context usage counts
+                for i in range(len(conds_final)):
+                    conds_final[i] /= counts_final[i]
+                del counts_final
+                return conds_final
+        finally:
+            for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.EXECUTE_CLEANUP, self.callbacks):
+                callback(self, model, x_in, conds, timestep, model_options)
+
+    def evaluate_context_windows(self, calc_cond_batch: Callable, model: BaseModel, x_in: torch.Tensor, conds, timestep: torch.Tensor, enumerated_context_windows: list[tuple[int, IndexListContextWindow]],
+                                model_options, device=None, first_device=None):
+        results: list[ContextResults] = []
+        for window_idx, window in enumerated_context_windows:
+            # allow processing to end between context window executions for faster Cancel
+            comfy.model_management.throw_exception_if_processing_interrupted()
+
+            for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.EVALUATE_CONTEXT_WINDOWS, self.callbacks):
+                callback(self, model, x_in, conds, timestep, model_options, window_idx, window, model_options, device, first_device)
+
+            # update exposed params
+            model_options["transformer_options"]["context_window"] = window
+            # get subsections of x, timestep, conds
+            sub_x = window.get_tensor(x_in, device)
+            sub_timestep = window.get_tensor(timestep, device, dim=0)
+            sub_conds = [self.get_resized_cond(cond, x_in, window, device) for cond in conds]
+
+            sub_conds_out = calc_cond_batch(model, sub_conds, sub_x, sub_timestep, model_options)
+            if device is not None:
+                for i in range(len(sub_conds_out)):
+                    sub_conds_out[i] = sub_conds_out[i].to(x_in.device)
+            results.append(ContextResults(window_idx, sub_conds_out, sub_conds, window))
+        return results
+
+
+    def combine_context_window_results(self, x_in: torch.Tensor, sub_conds_out, sub_conds, window: IndexListContextWindow, window_idx: int, total_windows: int, timestep: torch.Tensor,
+                                    conds_final: list[torch.Tensor], counts_final: list[torch.Tensor], biases_final: list[torch.Tensor]):
+        if self.fuse_method.name == ContextFuseMethods.RELATIVE:
+            for pos, idx in enumerate(window.index_list):
+                # bias is the influence of a specific index in relation to the whole context window
+                bias = 1 - abs(idx - (window.index_list[0] + window.index_list[-1]) / 2) / ((window.index_list[-1] - window.index_list[0] + 1e-2) / 2)
+                bias = max(1e-2, bias)
+                # take weighted average relative to total bias of current idx
+                for i in range(len(sub_conds_out)):
+                    bias_total = biases_final[i][idx]
+                    prev_weight = (bias_total / (bias_total + bias))
+                    new_weight = (bias / (bias_total + bias))
+                    # account for dims of tensors
+                    idx_window = [slice(None)] * self.dim + [idx]
+                    pos_window = [slice(None)] * self.dim + [pos]
+                    # apply new values
+                    conds_final[i][idx_window] = conds_final[i][idx_window] * prev_weight + sub_conds_out[i][pos_window] * new_weight
+                    biases_final[i][idx] = bias_total + bias
+        else:
+            # add conds and counts based on weights of fuse method
+            weights = get_context_weights(window.context_length, x_in.shape[self.dim], window.index_list, self, sigma=timestep)
+            weights_tensor = match_weights_to_dim(weights, x_in, self.dim, device=x_in.device)
+            for i in range(len(sub_conds_out)):
+                window.add_window(conds_final[i], sub_conds_out[i] * weights_tensor)
+                window.add_window(counts_final[i], weights_tensor)
+
+        for callback in comfy.patcher_extension.get_all_callbacks(IndexListCallbacks.COMBINE_CONTEXT_WINDOW_RESULTS, self.callbacks):
+            callback(self, x_in, sub_conds_out, sub_conds, window, window_idx, total_windows, timestep, conds_final, counts_final, biases_final)
+
+
+def _prepare_sampling_wrapper(executor, model, noise_shape: torch.Tensor, *args, **kwargs):
+    # limit noise_shape length to context_length for more accurate vram use estimation
+    model_options = kwargs.get("model_options", None)
+    if model_options is None:
+        raise Exception("model_options not found in prepare_sampling_wrapper; this should never happen, something went wrong.")
+    handler: IndexListContextHandler = model_options.get("context_handler", None)
+    if handler is not None:
+        noise_shape = list(noise_shape)
+        noise_shape[handler.dim] = min(noise_shape[handler.dim], handler.context_length)
+    return executor(model, noise_shape, *args, **kwargs)
+
+
+def create_prepare_sampling_wrapper(model: ModelPatcher):
+    model.add_wrapper_with_key(
+        comfy.patcher_extension.WrappersMP.PREPARE_SAMPLING,
+        "ContextWindows_prepare_sampling",
+        _prepare_sampling_wrapper
+    )
+
+
+def match_weights_to_dim(weights: list[float], x_in: torch.Tensor, dim: int, device=None) -> torch.Tensor:
+    total_dims = len(x_in.shape)
+    weights_tensor = torch.Tensor(weights).to(device=device)
+    for _ in range(dim):
+        weights_tensor = weights_tensor.unsqueeze(0)
+    for _ in range(total_dims - dim - 1):
+        weights_tensor = weights_tensor.unsqueeze(-1)
+    return weights_tensor
+
+def get_shape_for_dim(x_in: torch.Tensor, dim: int) -> list[int]:
+    total_dims = len(x_in.shape)
+    shape = []
+    for _ in range(dim):
+        shape.append(1)
+    shape.append(x_in.shape[dim])
+    for _ in range(total_dims - dim - 1):
+        shape.append(1)
+    return shape
+
+class ContextSchedules:
+    UNIFORM_LOOPED = "looped_uniform"
+    UNIFORM_STANDARD = "standard_uniform"
+    STATIC_STANDARD = "standard_static"
+    BATCHED = "batched"
+
+
+# from https://github.com/neggles/animatediff-cli/blob/main/src/animatediff/pipelines/context.py
+def create_windows_uniform_looped(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    windows = []
+    if num_frames < handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+
+    context_stride = min(handler.context_stride, int(np.ceil(np.log2(num_frames / handler.context_length))) + 1)
+    # obtain uniform windows as normal, looping and all
+    for context_step in 1 << np.arange(context_stride):
+        pad = int(round(num_frames * ordered_halving(handler._step)))
+        for j in range(
+            int(ordered_halving(handler._step) * context_step) + pad,
+            num_frames + pad + (0 if handler.closed_loop else -handler.context_overlap),
+            (handler.context_length * context_step - handler.context_overlap),
+        ):
+            windows.append([e % num_frames for e in range(j, j + handler.context_length * context_step, context_step)])
+
+    return windows
+
+def create_windows_uniform_standard(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    # unlike looped, uniform_straight does NOT allow windows that loop back to the beginning;
+    # instead, they get shifted to the corresponding end of the frames.
+    # in the case that a window (shifted or not) is identical to the previous one, it gets skipped.
+    windows = []
+    if num_frames <= handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+
+    context_stride = min(handler.context_stride, int(np.ceil(np.log2(num_frames / handler.context_length))) + 1)
+    # first, obtain uniform windows as normal, looping and all
+    for context_step in 1 << np.arange(context_stride):
+        pad = int(round(num_frames * ordered_halving(handler._step)))
+        for j in range(
+            int(ordered_halving(handler._step) * context_step) + pad,
+            num_frames + pad + (-handler.context_overlap),
+            (handler.context_length * context_step - handler.context_overlap),
+        ):
+            windows.append([e % num_frames for e in range(j, j + handler.context_length * context_step, context_step)])
+
+    # now that windows are created, shift any windows that loop, and delete duplicate windows
+    delete_idxs = []
+    win_i = 0
+    while win_i < len(windows):
+        # if window is rolls over itself, need to shift it
+        is_roll, roll_idx = does_window_roll_over(windows[win_i], num_frames)
+        if is_roll:
+            roll_val = windows[win_i][roll_idx]  # roll_val might not be 0 for windows of higher strides
+            shift_window_to_end(windows[win_i], num_frames=num_frames)
+            # check if next window (cyclical) is missing roll_val
+            if roll_val not in windows[(win_i+1) % len(windows)]:
+                # need to insert new window here - just insert window starting at roll_val
+                windows.insert(win_i+1, list(range(roll_val, roll_val + handler.context_length)))
+        # delete window if it's not unique
+        for pre_i in range(0, win_i):
+            if windows[win_i] == windows[pre_i]:
+                delete_idxs.append(win_i)
+                break
+        win_i += 1
+
+    # reverse delete_idxs so that they will be deleted in an order that doesn't break idx correlation
+    delete_idxs.reverse()
+    for i in delete_idxs:
+        windows.pop(i)
+
+    return windows
+
+
+def create_windows_static_standard(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    windows = []
+    if num_frames <= handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+    # always return the same set of windows
+    delta = handler.context_length - handler.context_overlap
+    for start_idx in range(0, num_frames, delta):
+        # if past the end of frames, move start_idx back to allow same context_length
+        ending = start_idx + handler.context_length
+        if ending >= num_frames:
+            final_delta = ending - num_frames
+            final_start_idx = start_idx - final_delta
+            windows.append(list(range(final_start_idx, final_start_idx + handler.context_length)))
+            break
+        windows.append(list(range(start_idx, start_idx + handler.context_length)))
+    return windows
+
+
+def create_windows_batched(num_frames: int, handler: IndexListContextHandler, model_options: dict[str]):
+    windows = []
+    if num_frames <= handler.context_length:
+        windows.append(list(range(num_frames)))
+        return windows
+    # always return the same set of windows;
+    # no overlap, just cut up based on context_length;
+    # last window size will be different if num_frames % opts.context_length != 0
+    for start_idx in range(0, num_frames, handler.context_length):
+        windows.append(list(range(start_idx, min(start_idx + handler.context_length, num_frames))))
+    return windows
+
+
+def create_windows_default(num_frames: int, handler: IndexListContextHandler):
+    return [list(range(num_frames))]
+
+
+CONTEXT_MAPPING = {
+    ContextSchedules.UNIFORM_LOOPED: create_windows_uniform_looped,
+    ContextSchedules.UNIFORM_STANDARD: create_windows_uniform_standard,
+    ContextSchedules.STATIC_STANDARD: create_windows_static_standard,
+    ContextSchedules.BATCHED: create_windows_batched,
+}
+
+
+def get_matching_context_schedule(context_schedule: str) -> ContextSchedule:
+    func = CONTEXT_MAPPING.get(context_schedule, None)
+    if func is None:
+        raise ValueError(f"Unknown context_schedule '{context_schedule}'.")
+    return ContextSchedule(context_schedule, func)
+
+
+def get_context_weights(length: int, full_length: int, idxs: list[int], handler: IndexListContextHandler, sigma: torch.Tensor=None):
+    return handler.fuse_method.func(length, sigma=sigma, handler=handler, full_length=full_length, idxs=idxs)
+
+
+def create_weights_flat(length: int, **kwargs) -> list[float]:
+    # weight is the same for all
+    return [1.0] * length
+
+def create_weights_pyramid(length: int, **kwargs) -> list[float]:
+    # weight is based on the distance away from the edge of the context window;
+    # based on weighted average concept in FreeNoise paper
+    if length % 2 == 0:
+        max_weight = length // 2
+        weight_sequence = list(range(1, max_weight + 1, 1)) + list(range(max_weight, 0, -1))
+    else:
+        max_weight = (length + 1) // 2
+        weight_sequence = list(range(1, max_weight, 1)) + [max_weight] + list(range(max_weight - 1, 0, -1))
+    return weight_sequence
+
+def create_weights_overlap_linear(length: int, full_length: int, idxs: list[int], handler: IndexListContextHandler, **kwargs):
+    # based on code in Kijai's WanVideoWrapper: https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/dbb2523b37e4ccdf45127e5ae33e31362f755c8e/nodes.py#L1302
+    # only expected overlap is given different weights
+    weights_torch = torch.ones((length))
+    # blend left-side on all except first window
+    if min(idxs) > 0:
+        ramp_up = torch.linspace(1e-37, 1, handler.context_overlap)
+        weights_torch[:handler.context_overlap] = ramp_up
+    # blend right-side on all except last window
+    if max(idxs) < full_length-1:
+        ramp_down = torch.linspace(1, 1e-37, handler.context_overlap)
+        weights_torch[-handler.context_overlap:] = ramp_down
+    return weights_torch
+
+class ContextFuseMethods:
+    FLAT = "flat"
+    PYRAMID = "pyramid"
+    RELATIVE = "relative"
+    OVERLAP_LINEAR = "overlap-linear"
+
+    LIST = [PYRAMID, FLAT, OVERLAP_LINEAR]
+    LIST_STATIC = [PYRAMID, RELATIVE, FLAT, OVERLAP_LINEAR]
+
+
+FUSE_MAPPING = {
+    ContextFuseMethods.FLAT: create_weights_flat,
+    ContextFuseMethods.PYRAMID: create_weights_pyramid,
+    ContextFuseMethods.RELATIVE: create_weights_pyramid,
+    ContextFuseMethods.OVERLAP_LINEAR: create_weights_overlap_linear,
+}
+
+def get_matching_fuse_method(fuse_method: str) -> ContextFuseMethod:
+    func = FUSE_MAPPING.get(fuse_method, None)
+    if func is None:
+        raise ValueError(f"Unknown fuse_method '{fuse_method}'.")
+    return ContextFuseMethod(fuse_method, func)
+
+# Returns fraction that has denominator that is a power of 2
+def ordered_halving(val):
+    # get binary value, padded with 0s for 64 bits
+    bin_str = f"{val:064b}"
+    # flip binary value, padding included
+    bin_flip = bin_str[::-1]
+    # convert binary to int
+    as_int = int(bin_flip, 2)
+    # divide by 1 << 64, equivalent to 2**64, or 18446744073709551616,
+    # or b10000000000000000000000000000000000000000000000000000000000000000 (1 with 64 zero's)
+    return as_int / (1 << 64)
+
+
+def get_missing_indexes(windows: list[list[int]], num_frames: int) -> list[int]:
+    all_indexes = list(range(num_frames))
+    for w in windows:
+        for val in w:
+            try:
+                all_indexes.remove(val)
+            except ValueError:
+                pass
+    return all_indexes
+
+
+def does_window_roll_over(window: list[int], num_frames: int) -> tuple[bool, int]:
+    prev_val = -1
+    for i, val in enumerate(window):
+        val = val % num_frames
+        if val < prev_val:
+            return True, i
+        prev_val = val
+    return False, -1
+
+
+def shift_window_to_start(window: list[int], num_frames: int):
+    start_val = window[0]
+    for i in range(len(window)):
+        # 1) subtract each element by start_val to move vals relative to the start of all frames
+        # 2) add num_frames and take modulus to get adjusted vals
+        window[i] = ((window[i] - start_val) + num_frames) % num_frames
+
+
+def shift_window_to_end(window: list[int], num_frames: int):
+    # 1) shift window to start
+    shift_window_to_start(window, num_frames)
+    end_val = window[-1]
+    end_delta = num_frames - end_val - 1
+    for i in range(len(window)):
+        # 2) add end_delta to each val to slide windows to end
+        window[i] = window[i] + end_delta
--- a/comfy/controlnet.py
+++ b/comfy/controlnet.py
@@ -28,6 +28,7 @@ import comfy.model_detection
 import comfy.model_patcher
 import comfy.ops
 import comfy.latent_formats
+import comfy.model_base

 import comfy.cldm.cldm
 import comfy.t2i_adapter.adapter
@@ -43,7 +44,6 @@ if TYPE_CHECKING:

 def broadcast_image_to(tensor, target_batch_size, batched_number):
    current_batch_size = tensor.shape[0]
-    #print(current_batch_size, target_batch_size)
    if current_batch_size == 1:
        return tensor

@@ -265,12 +265,12 @@ class ControlNet(ControlBase):
        for c in self.extra_conds:
            temp = cond.get(c, None)
            if temp is not None:
-                extra[c] = temp.to(dtype)
+                extra[c] = comfy.model_base.convert_tensor(temp, dtype, x_noisy.device)

        timestep = self.model_sampling_current.timestep(t)
        x_noisy = self.model_sampling_current.calculate_input(t, x_noisy)

-        control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=context.to(dtype), **extra)
+        control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=comfy.model_management.cast_to_device(context, x_noisy.device, dtype), **extra)
        return self.control_merge(control, control_prev, output_dtype=None)

    def copy(self):
--- a/comfy/ldm/flux/model.py
+++ b/comfy/ldm/flux/model.py
@@ -224,19 +224,27 @@ class Flux(nn.Module):
        if ref_latents is not None:
            h = 0
            w = 0
+            index = 0
+            index_ref_method = kwargs.get("ref_latents_method", "offset") == "index"
            for ref in ref_latents:
-                h_offset = 0
-                w_offset = 0
-                if ref.shape[-2] + h > ref.shape[-1] + w:
-                    w_offset = w
+                if index_ref_method:
+                    index += 1
+                    h_offset = 0
+                    w_offset = 0
                else:
-                    h_offset = h
+                    index = 1
+                    h_offset = 0
+                    w_offset = 0
+                    if ref.shape[-2] + h > ref.shape[-1] + w:
+                        w_offset = w
+                    else:
+                        h_offset = h
+                    h = max(h, ref.shape[-2] + h_offset)
+                    w = max(w, ref.shape[-1] + w_offset)

-                kontext, kontext_ids = self.process_img(ref, index=1, h_offset=h_offset, w_offset=w_offset)
+                kontext, kontext_ids = self.process_img(ref, index=index, h_offset=h_offset, w_offset=w_offset)
                img = torch.cat([img, kontext], dim=1)
                img_ids = torch.cat([img_ids, kontext_ids], dim=1)
-                h = max(h, ref.shape[-2] + h_offset)
-                w = max(w, ref.shape[-1] + w_offset)

        txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
        out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None))
--- a/comfy/ldm/hunyuan3d/vae.py
+++ b/comfy/ldm/hunyuan3d/vae.py
@@ -178,7 +178,7 @@ class FourierEmbedder(nn.Module):

 class CrossAttentionProcessor:
    def __call__(self, attn, q, k, v):
-        out = F.scaled_dot_product_attention(q, k, v)
+        out = comfy.ops.scaled_dot_product_attention(q, k, v)
        return out


--- a/comfy/ldm/modules/attention.py
+++ b/comfy/ldm/modules/attention.py
@@ -448,7 +448,7 @@ def attention_pytorch(q, k, v, heads, mask=None, attn_precision=None, skip_resha
            mask = mask.unsqueeze(1)

    if SDP_BATCH_LIMIT >= b:
-        out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
+        out = comfy.ops.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
        if not skip_output_reshape:
            out = (
                out.transpose(1, 2).reshape(b, -1, heads * dim_head)
@@ -461,7 +461,7 @@ def attention_pytorch(q, k, v, heads, mask=None, attn_precision=None, skip_resha
                if mask.shape[0] > 1:
                    m = mask[i : i + SDP_BATCH_LIMIT]

-            out[i : i + SDP_BATCH_LIMIT] = torch.nn.functional.scaled_dot_product_attention(
+            out[i : i + SDP_BATCH_LIMIT] = comfy.ops.scaled_dot_product_attention(
                q[i : i + SDP_BATCH_LIMIT],
                k[i : i + SDP_BATCH_LIMIT],
                v[i : i + SDP_BATCH_LIMIT],
--- a/comfy/ldm/modules/diffusionmodules/model.py
+++ b/comfy/ldm/modules/diffusionmodules/model.py
@@ -285,7 +285,7 @@ def pytorch_attention(q, k, v):
    )

    try:
-        out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
+        out = comfy.ops.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
        out = out.transpose(2, 3).reshape(orig_shape)
    except model_management.OOM_EXCEPTION:
        logging.warning("scaled_dot_product_attention OOMed: switched to slice attention")
--- a/comfy/ldm/qwen_image/model.py
+++ b/comfy/ldm/qwen_image/model.py
@@ -0,0 +1,443 @@
+# https://github.com/QwenLM/Qwen-Image (Apache 2.0)
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from typing import Optional, Tuple
+from einops import repeat
+
+from comfy.ldm.lightricks.model import TimestepEmbedding, Timesteps
+from comfy.ldm.modules.attention import optimized_attention_masked
+from comfy.ldm.flux.layers import EmbedND
+import comfy.ldm.common_dit
+
+class GELU(nn.Module):
+    def __init__(self, dim_in: int, dim_out: int, approximate: str = "none", bias: bool = True, dtype=None, device=None, operations=None):
+        super().__init__()
+        self.proj = operations.Linear(dim_in, dim_out, bias=bias, dtype=dtype, device=device)
+        self.approximate = approximate
+
+    def forward(self, hidden_states):
+        hidden_states = self.proj(hidden_states)
+        hidden_states = F.gelu(hidden_states, approximate=self.approximate)
+        return hidden_states
+
+
+class FeedForward(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        dim_out: Optional[int] = None,
+        mult: int = 4,
+        dropout: float = 0.0,
+        inner_dim=None,
+        bias: bool = True,
+        dtype=None, device=None, operations=None
+    ):
+        super().__init__()
+        if inner_dim is None:
+            inner_dim = int(dim * mult)
+        dim_out = dim_out if dim_out is not None else dim
+
+        self.net = nn.ModuleList([])
+        self.net.append(GELU(dim, inner_dim, approximate="tanh", bias=bias, dtype=dtype, device=device, operations=operations))
+        self.net.append(nn.Dropout(dropout))
+        self.net.append(operations.Linear(inner_dim, dim_out, bias=bias, dtype=dtype, device=device))
+
+    def forward(self, hidden_states: torch.Tensor, *args, **kwargs) -> torch.Tensor:
+        for module in self.net:
+            hidden_states = module(hidden_states)
+        return hidden_states
+
+
+def apply_rotary_emb(x, freqs_cis):
+    if x.shape[1] == 0:
+        return x
+
+    t_ = x.reshape(*x.shape[:-1], -1, 1, 2)
+    t_out = freqs_cis[..., 0] * t_[..., 0] + freqs_cis[..., 1] * t_[..., 1]
+    return t_out.reshape(*x.shape)
+
+
+class QwenTimestepProjEmbeddings(nn.Module):
+    def __init__(self, embedding_dim, pooled_projection_dim, dtype=None, device=None, operations=None):
+        super().__init__()
+        self.time_proj = Timesteps(num_channels=256, flip_sin_to_cos=True, downscale_freq_shift=0, scale=1000)
+        self.timestep_embedder = TimestepEmbedding(
+            in_channels=256,
+            time_embed_dim=embedding_dim,
+            dtype=dtype,
+            device=device,
+            operations=operations
+        )
+
+    def forward(self, timestep, hidden_states):
+        timesteps_proj = self.time_proj(timestep)
+        timesteps_emb = self.timestep_embedder(timesteps_proj.to(dtype=hidden_states.dtype))
+        return timesteps_emb
+
+
+class Attention(nn.Module):
+    def __init__(
+        self,
+        query_dim: int,
+        dim_head: int = 64,
+        heads: int = 8,
+        dropout: float = 0.0,
+        bias: bool = False,
+        eps: float = 1e-5,
+        out_bias: bool = True,
+        out_dim: int = None,
+        out_context_dim: int = None,
+        dtype=None,
+        device=None,
+        operations=None
+    ):
+        super().__init__()
+        self.inner_dim = out_dim if out_dim is not None else dim_head * heads
+        self.inner_kv_dim = self.inner_dim
+        self.heads = heads
+        self.dim_head = dim_head
+        self.out_dim = out_dim if out_dim is not None else query_dim
+        self.out_context_dim = out_context_dim if out_context_dim is not None else query_dim
+        self.dropout = dropout
+
+        # Q/K normalization
+        self.norm_q = operations.RMSNorm(dim_head, eps=eps, elementwise_affine=True, dtype=dtype, device=device)
+        self.norm_k = operations.RMSNorm(dim_head, eps=eps, elementwise_affine=True, dtype=dtype, device=device)
+        self.norm_added_q = operations.RMSNorm(dim_head, eps=eps, dtype=dtype, device=device)
+        self.norm_added_k = operations.RMSNorm(dim_head, eps=eps, dtype=dtype, device=device)
+
+        # Image stream projections
+        self.to_q = operations.Linear(query_dim, self.inner_dim, bias=bias, dtype=dtype, device=device)
+        self.to_k = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+        self.to_v = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+
+        # Text stream projections
+        self.add_q_proj = operations.Linear(query_dim, self.inner_dim, bias=bias, dtype=dtype, device=device)
+        self.add_k_proj = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+        self.add_v_proj = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+
+        # Output projections
+        self.to_out = nn.ModuleList([
+            operations.Linear(self.inner_dim, self.out_dim, bias=out_bias, dtype=dtype, device=device),
+            nn.Dropout(dropout)
+        ])
+        self.to_add_out = operations.Linear(self.inner_dim, self.out_context_dim, bias=out_bias, dtype=dtype, device=device)
+
+    def forward(
+        self,
+        hidden_states: torch.FloatTensor,  # Image stream
+        encoder_hidden_states: torch.FloatTensor = None,  # Text stream
+        encoder_hidden_states_mask: torch.FloatTensor = None,
+        attention_mask: Optional[torch.FloatTensor] = None,
+        image_rotary_emb: Optional[torch.Tensor] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        seq_txt = encoder_hidden_states.shape[1]
+
+        img_query = self.to_q(hidden_states).unflatten(-1, (self.heads, -1))
+        img_key = self.to_k(hidden_states).unflatten(-1, (self.heads, -1))
+        img_value = self.to_v(hidden_states).unflatten(-1, (self.heads, -1))
+
+        txt_query = self.add_q_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_key = self.add_k_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_value = self.add_v_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+
+        img_query = self.norm_q(img_query)
+        img_key = self.norm_k(img_key)
+        txt_query = self.norm_added_q(txt_query)
+        txt_key = self.norm_added_k(txt_key)
+
+        joint_query = torch.cat([txt_query, img_query], dim=1)
+        joint_key = torch.cat([txt_key, img_key], dim=1)
+        joint_value = torch.cat([txt_value, img_value], dim=1)
+
+        joint_query = apply_rotary_emb(joint_query, image_rotary_emb)
+        joint_key = apply_rotary_emb(joint_key, image_rotary_emb)
+
+        joint_query = joint_query.flatten(start_dim=2)
+        joint_key = joint_key.flatten(start_dim=2)
+        joint_value = joint_value.flatten(start_dim=2)
+
+        joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads, attention_mask)
+
+        txt_attn_output = joint_hidden_states[:, :seq_txt, :]
+        img_attn_output = joint_hidden_states[:, seq_txt:, :]
+
+        img_attn_output = self.to_out[0](img_attn_output)
+        img_attn_output = self.to_out[1](img_attn_output)
+        txt_attn_output = self.to_add_out(txt_attn_output)
+
+        return img_attn_output, txt_attn_output
+
+
+class QwenImageTransformerBlock(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        num_attention_heads: int,
+        attention_head_dim: int,
+        eps: float = 1e-6,
+        dtype=None,
+        device=None,
+        operations=None
+    ):
+        super().__init__()
+        self.dim = dim
+        self.num_attention_heads = num_attention_heads
+        self.attention_head_dim = attention_head_dim
+
+        self.img_mod = nn.Sequential(
+            nn.SiLU(),
+            operations.Linear(dim, 6 * dim, bias=True, dtype=dtype, device=device),
+        )
+        self.img_norm1 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.img_norm2 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.img_mlp = FeedForward(dim=dim, dim_out=dim, dtype=dtype, device=device, operations=operations)
+
+        self.txt_mod = nn.Sequential(
+            nn.SiLU(),
+            operations.Linear(dim, 6 * dim, bias=True, dtype=dtype, device=device),
+        )
+        self.txt_norm1 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.txt_norm2 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.txt_mlp = FeedForward(dim=dim, dim_out=dim, dtype=dtype, device=device, operations=operations)
+
+        self.attn = Attention(
+            query_dim=dim,
+            dim_head=attention_head_dim,
+            heads=num_attention_heads,
+            out_dim=dim,
+            bias=True,
+            eps=eps,
+            dtype=dtype,
+            device=device,
+            operations=operations,
+        )
+
+    def _modulate(self, x, mod_params):
+        shift, scale, gate = mod_params.chunk(3, dim=-1)
+        return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1), gate.unsqueeze(1)
+
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        encoder_hidden_states: torch.Tensor,
+        encoder_hidden_states_mask: torch.Tensor,
+        temb: torch.Tensor,
+        image_rotary_emb: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        img_mod_params = self.img_mod(temb)
+        txt_mod_params = self.txt_mod(temb)
+        img_mod1, img_mod2 = img_mod_params.chunk(2, dim=-1)
+        txt_mod1, txt_mod2 = txt_mod_params.chunk(2, dim=-1)
+
+        img_normed = self.img_norm1(hidden_states)
+        img_modulated, img_gate1 = self._modulate(img_normed, img_mod1)
+        txt_normed = self.txt_norm1(encoder_hidden_states)
+        txt_modulated, txt_gate1 = self._modulate(txt_normed, txt_mod1)
+
+        img_attn_output, txt_attn_output = self.attn(
+            hidden_states=img_modulated,
+            encoder_hidden_states=txt_modulated,
+            encoder_hidden_states_mask=encoder_hidden_states_mask,
+            image_rotary_emb=image_rotary_emb,
+        )
+
+        hidden_states = hidden_states + img_gate1 * img_attn_output
+        encoder_hidden_states = encoder_hidden_states + txt_gate1 * txt_attn_output
+
+        img_normed2 = self.img_norm2(hidden_states)
+        img_modulated2, img_gate2 = self._modulate(img_normed2, img_mod2)
+        hidden_states = hidden_states + img_gate2 * self.img_mlp(img_modulated2)
+
+        txt_normed2 = self.txt_norm2(encoder_hidden_states)
+        txt_modulated2, txt_gate2 = self._modulate(txt_normed2, txt_mod2)
+        encoder_hidden_states = encoder_hidden_states + txt_gate2 * self.txt_mlp(txt_modulated2)
+
+        return encoder_hidden_states, hidden_states
+
+
+class LastLayer(nn.Module):
+    def __init__(
+        self,
+        embedding_dim: int,
+        conditioning_embedding_dim: int,
+        elementwise_affine=False,
+        eps=1e-6,
+        bias=True,
+        dtype=None, device=None, operations=None
+    ):
+        super().__init__()
+        self.silu = nn.SiLU()
+        self.linear = operations.Linear(conditioning_embedding_dim, embedding_dim * 2, bias=bias, dtype=dtype, device=device)
+        self.norm = operations.LayerNorm(embedding_dim, eps, elementwise_affine=False, bias=bias, dtype=dtype, device=device)
+
+    def forward(self, x: torch.Tensor, conditioning_embedding: torch.Tensor) -> torch.Tensor:
+        emb = self.linear(self.silu(conditioning_embedding))
+        scale, shift = torch.chunk(emb, 2, dim=1)
+        x = self.norm(x) * (1 + scale)[:, None, :] + shift[:, None, :]
+        return x
+
+
+class QwenImageTransformer2DModel(nn.Module):
+    def __init__(
+        self,
+        patch_size: int = 2,
+        in_channels: int = 64,
+        out_channels: Optional[int] = 16,
+        num_layers: int = 60,
+        attention_head_dim: int = 128,
+        num_attention_heads: int = 24,
+        joint_attention_dim: int = 3584,
+        pooled_projection_dim: int = 768,
+        guidance_embeds: bool = False,
+        axes_dims_rope: Tuple[int, int, int] = (16, 56, 56),
+        image_model=None,
+        dtype=None,
+        device=None,
+        operations=None,
+    ):
+        super().__init__()
+        self.dtype = dtype
+        self.patch_size = patch_size
+        self.out_channels = out_channels or in_channels
+        self.inner_dim = num_attention_heads * attention_head_dim
+
+        self.pe_embedder = EmbedND(dim=attention_head_dim, theta=10000, axes_dim=list(axes_dims_rope))
+
+        self.time_text_embed = QwenTimestepProjEmbeddings(
+            embedding_dim=self.inner_dim,
+            pooled_projection_dim=pooled_projection_dim,
+            dtype=dtype,
+            device=device,
+            operations=operations
+        )
+
+        self.txt_norm = operations.RMSNorm(joint_attention_dim, eps=1e-6, dtype=dtype, device=device)
+        self.img_in = operations.Linear(in_channels, self.inner_dim, dtype=dtype, device=device)
+        self.txt_in = operations.Linear(joint_attention_dim, self.inner_dim, dtype=dtype, device=device)
+
+        self.transformer_blocks = nn.ModuleList([
+            QwenImageTransformerBlock(
+                dim=self.inner_dim,
+                num_attention_heads=num_attention_heads,
+                attention_head_dim=attention_head_dim,
+                dtype=dtype,
+                device=device,
+                operations=operations
+            )
+            for _ in range(num_layers)
+        ])
+
+        self.norm_out = LastLayer(self.inner_dim, self.inner_dim, dtype=dtype, device=device, operations=operations)
+        self.proj_out = operations.Linear(self.inner_dim, patch_size * patch_size * self.out_channels, bias=True, dtype=dtype, device=device)
+        self.gradient_checkpointing = False
+
+    def process_img(self, x, index=0, h_offset=0, w_offset=0):
+        bs, c, t, h, w = x.shape
+        patch_size = self.patch_size
+        hidden_states = comfy.ldm.common_dit.pad_to_patch_size(x, (1, self.patch_size, self.patch_size))
+        orig_shape = hidden_states.shape
+        hidden_states = hidden_states.view(orig_shape[0], orig_shape[1], orig_shape[-2] // 2, 2, orig_shape[-1] // 2, 2)
+        hidden_states = hidden_states.permute(0, 2, 4, 1, 3, 5)
+        hidden_states = hidden_states.reshape(orig_shape[0], (orig_shape[-2] // 2) * (orig_shape[-1] // 2), orig_shape[1] * 4)
+        h_len = ((h + (patch_size // 2)) // patch_size)
+        w_len = ((w + (patch_size // 2)) // patch_size)
+
+        h_offset = ((h_offset + (patch_size // 2)) // patch_size)
+        w_offset = ((w_offset + (patch_size // 2)) // patch_size)
+
+        img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
+        img_ids[:, :, 0] = img_ids[:, :, 1] + index
+        img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(h_offset, h_len - 1 + h_offset, steps=h_len, device=x.device, dtype=x.dtype).unsqueeze(1)
+        img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(w_offset, w_len - 1 + w_offset, steps=w_len, device=x.device, dtype=x.dtype).unsqueeze(0)
+        return hidden_states, repeat(img_ids, "h w c -> b (h w) c", b=bs), orig_shape
+
+    def forward(
+        self,
+        x,
+        timesteps,
+        context,
+        attention_mask=None,
+        guidance: torch.Tensor = None,
+        ref_latents=None,
+        transformer_options={},
+        **kwargs
+    ):
+        timestep = timesteps
+        encoder_hidden_states = context
+        encoder_hidden_states_mask = attention_mask
+
+        hidden_states, img_ids, orig_shape = self.process_img(x)
+        num_embeds = hidden_states.shape[1]
+
+        if ref_latents is not None:
+            h = 0
+            w = 0
+            index = 0
+            index_ref_method = kwargs.get("ref_latents_method", "index") == "index"
+            for ref in ref_latents:
+                if index_ref_method:
+                    index += 1
+                    h_offset = 0
+                    w_offset = 0
+                else:
+                    index = 1
+                    h_offset = 0
+                    w_offset = 0
+                    if ref.shape[-2] + h > ref.shape[-1] + w:
+                        w_offset = w
+                    else:
+                        h_offset = h
+                    h = max(h, ref.shape[-2] + h_offset)
+                    w = max(w, ref.shape[-1] + w_offset)
+
+                kontext, kontext_ids, _ = self.process_img(ref, index=index, h_offset=h_offset, w_offset=w_offset)
+                hidden_states = torch.cat([hidden_states, kontext], dim=1)
+                img_ids = torch.cat([img_ids, kontext_ids], dim=1)
+
+        txt_start = round(max(((x.shape[-1] + (self.patch_size // 2)) // self.patch_size), ((x.shape[-2] + (self.patch_size // 2)) // self.patch_size)))
+        txt_ids = torch.linspace(txt_start, txt_start + context.shape[1], steps=context.shape[1], device=x.device, dtype=x.dtype).reshape(1, -1, 1).repeat(x.shape[0], 1, 3)
+        ids = torch.cat((txt_ids, img_ids), dim=1)
+        image_rotary_emb = self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
+
+        hidden_states = self.img_in(hidden_states)
+        encoder_hidden_states = self.txt_norm(encoder_hidden_states)
+        encoder_hidden_states = self.txt_in(encoder_hidden_states)
+
+        if guidance is not None:
+            guidance = guidance * 1000
+
+        temb = (
+            self.time_text_embed(timestep, hidden_states)
+            if guidance is None
+            else self.time_text_embed(timestep, guidance, hidden_states)
+        )
+
+        patches_replace = transformer_options.get("patches_replace", {})
+        blocks_replace = patches_replace.get("dit", {})
+
+        for i, block in enumerate(self.transformer_blocks):
+            if ("double_block", i) in blocks_replace:
+                def block_wrap(args):
+                    out = {}
+                    out["txt"], out["img"] = block(hidden_states=args["img"], encoder_hidden_states=args["txt"], encoder_hidden_states_mask=encoder_hidden_states_mask, temb=args["vec"], image_rotary_emb=args["pe"])
+                    return out
+                out = blocks_replace[("double_block", i)]({"img": hidden_states, "txt": encoder_hidden_states, "vec": temb, "pe": image_rotary_emb}, {"original_block": block_wrap})
+                hidden_states = out["img"]
+                encoder_hidden_states = out["txt"]
+            else:
+                encoder_hidden_states, hidden_states = block(
+                    hidden_states=hidden_states,
+                    encoder_hidden_states=encoder_hidden_states,
+                    encoder_hidden_states_mask=encoder_hidden_states_mask,
+                    temb=temb,
+                    image_rotary_emb=image_rotary_emb,
+                )
+
+        hidden_states = self.norm_out(hidden_states, temb)
+        hidden_states = self.proj_out(hidden_states)
+
+        hidden_states = hidden_states[:, :num_embeds].view(orig_shape[0], orig_shape[-2] // 2, orig_shape[-1] // 2, orig_shape[1], 2, 2)
+        hidden_states = hidden_states.permute(0, 3, 1, 4, 2, 5)
+        return hidden_states.reshape(orig_shape)[:, :, :, :x.shape[-2], :x.shape[-1]]
--- a/comfy/ldm/wan/model.py
+++ b/comfy/ldm/wan/model.py
@@ -391,6 +391,7 @@ class WanModel(torch.nn.Module):
                 cross_attn_norm=True,
                 eps=1e-6,
                 flf_pos_embed_token_number=None,
+                 in_dim_ref_conv=None,
                 image_model=None,
                 device=None,
                 dtype=None,
@@ -484,6 +485,11 @@ class WanModel(torch.nn.Module):
        else:
            self.img_emb = None

+        if in_dim_ref_conv is not None:
+            self.ref_conv = operations.Conv2d(in_dim_ref_conv, dim, kernel_size=patch_size[1:], stride=patch_size[1:], device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
+        else:
+            self.ref_conv = None
+
    def forward_orig(
        self,
        x,
@@ -526,6 +532,13 @@ class WanModel(torch.nn.Module):
        e = e.reshape(t.shape[0], -1, e.shape[-1])
        e0 = self.time_projection(e).unflatten(2, (6, self.dim))

+        full_ref = None
+        if self.ref_conv is not None:
+            full_ref = kwargs.get("reference_latent", None)
+            if full_ref is not None:
+                full_ref = self.ref_conv(full_ref).flatten(2).transpose(1, 2)
+                x = torch.concat((full_ref, x), dim=1)
+
        # context
        context = self.text_embedding(context)

@@ -552,6 +565,9 @@ class WanModel(torch.nn.Module):
        # head
        x = self.head(x, e)

+        if full_ref is not None:
+            x = x[:, full_ref.shape[1]:]
+
        # unpatchify
        x = self.unpatchify(x, grid_sizes)
        return x
@@ -570,6 +586,9 @@ class WanModel(torch.nn.Module):
            x = torch.cat([x, time_dim_concat], dim=2)
            t_len = ((x.shape[2] + (patch_size[0] // 2)) // patch_size[0])

+        if self.ref_conv is not None and "reference_latent" in kwargs:
+            t_len += 1
+
        img_ids = torch.zeros((t_len, h_len, w_len, 3), device=x.device, dtype=x.dtype)
        img_ids[:, :, :, 0] = img_ids[:, :, :, 0] + torch.linspace(0, t_len - 1, steps=t_len, device=x.device, dtype=x.dtype).reshape(-1, 1, 1)
        img_ids[:, :, :, 1] = img_ids[:, :, :, 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype).reshape(1, -1, 1)
@@ -749,7 +768,12 @@ class CameraWanModel(WanModel):
                 operations=None,
                 ):

-        super().__init__(model_type='i2v', patch_size=patch_size, text_len=text_len, in_dim=in_dim, dim=dim, ffn_dim=ffn_dim, freq_dim=freq_dim, text_dim=text_dim, out_dim=out_dim, num_heads=num_heads, num_layers=num_layers, window_size=window_size, qk_norm=qk_norm, cross_attn_norm=cross_attn_norm, eps=eps, flf_pos_embed_token_number=flf_pos_embed_token_number, image_model=image_model, device=device, dtype=dtype, operations=operations)
+        if model_type == 'camera':
+            model_type = 'i2v'
+        else:
+            model_type = 't2v'
+
+        super().__init__(model_type=model_type, patch_size=patch_size, text_len=text_len, in_dim=in_dim, dim=dim, ffn_dim=ffn_dim, freq_dim=freq_dim, text_dim=text_dim, out_dim=out_dim, num_heads=num_heads, num_layers=num_layers, window_size=window_size, qk_norm=qk_norm, cross_attn_norm=cross_attn_norm, eps=eps, flf_pos_embed_token_number=flf_pos_embed_token_number, image_model=image_model, device=device, dtype=dtype, operations=operations)
        operation_settings = {"operations": operations, "device": device, "dtype": dtype}

        self.control_adapter = WanCamAdapter(in_dim_control_adapter, dim, kernel_size=patch_size[1:], stride=patch_size[1:], operation_settings=operation_settings)
--- a/comfy/lora.py
+++ b/comfy/lora.py
@@ -293,6 +293,16 @@ def model_lora_keys_unet(model, key_map={}):
                key_lora = k[len("diffusion_model."):-len(".weight")]
                key_map["{}".format(key_lora)] = k

+    if isinstance(model, comfy.model_base.QwenImage):
+        for k in sdk:
+            if k.startswith("diffusion_model.") and k.endswith(".weight"): #QwenImage lora format
+                key_lora = k[len("diffusion_model."):-len(".weight")]
+                # Direct mapping for transformer_blocks format (QwenImage LoRA format)
+                key_map["{}".format(key_lora)] = k
+                # Support transformer prefix format
+                key_map["transformer.{}".format(key_lora)] = k
+                key_map["lycoris_{}".format(key_lora.replace(".", "_"))] = k #SimpleTuner lycoris format
+
    return key_map


--- a/comfy/model_base.py
+++ b/comfy/model_base.py
@@ -42,6 +42,7 @@ import comfy.ldm.hidream.model
 import comfy.ldm.chroma.model
 import comfy.ldm.ace.model
 import comfy.ldm.omnigen.omnigen2
+import comfy.ldm.qwen_image.model

 import comfy.model_management
 import comfy.patcher_extension
@@ -106,10 +107,12 @@ def model_sampling(model_config, model_type):
    return ModelSampling(model_config)


-def convert_tensor(extra, dtype):
+def convert_tensor(extra, dtype, device):
    if hasattr(extra, "dtype"):
        if extra.dtype != torch.int and extra.dtype != torch.long:
-            extra = extra.to(dtype)
+            extra = comfy.model_management.cast_to_device(extra, device, dtype)
+        else:
+            extra = comfy.model_management.cast_to_device(extra, device, None)
    return extra


@@ -160,7 +163,7 @@ class BaseModel(torch.nn.Module):
        xc = self.model_sampling.calculate_input(sigma, x)

        if c_concat is not None:
-            xc = torch.cat([xc] + [c_concat], dim=1)
+            xc = torch.cat([xc] + [comfy.model_management.cast_to_device(c_concat, xc.device, xc.dtype)], dim=1)

        context = c_crossattn
        dtype = self.get_dtype()
@@ -169,20 +172,21 @@ class BaseModel(torch.nn.Module):
            dtype = self.manual_cast_dtype

        xc = xc.to(dtype)
+        device = xc.device
        t = self.model_sampling.timestep(t).float()
        if context is not None:
-            context = context.to(dtype)
+            context = comfy.model_management.cast_to_device(context, device, dtype)

        extra_conds = {}
        for o in kwargs:
            extra = kwargs[o]

            if hasattr(extra, "dtype"):
-                extra = convert_tensor(extra, dtype)
+                extra = convert_tensor(extra, dtype, device)
            elif isinstance(extra, list):
                ex = []
                for ext in extra:
-                    ex.append(convert_tensor(ext, dtype))
+                    ex.append(convert_tensor(ext, dtype, device))
                extra = ex
            extra_conds[o] = extra

@@ -398,7 +402,7 @@ class SD21UNCLIP(BaseModel):
        unclip_conditioning = kwargs.get("unclip_conditioning", None)
        device = kwargs["device"]
        if unclip_conditioning is None:
-            return torch.zeros((1, self.adm_channels))
+            return torch.zeros((1, self.adm_channels), device=device)
        else:
            return unclip_adm(unclip_conditioning, device, self.noise_augmentor, kwargs.get("unclip_noise_augment_merge", 0.05), kwargs.get("seed", 0) - 10)

@@ -612,9 +616,11 @@ class IP2P:

        if image is None:
            image = torch.zeros_like(noise)
+        else:
+            image = image.to(device=device)

        if image.shape[1:] != noise.shape[1:]:
-            image = utils.common_upscale(image.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
+            image = utils.common_upscale(image, noise.shape[-1], noise.shape[-2], "bilinear", "center")

        image = utils.resize_to_batch_size(image, noise.shape[0])
        return self.process_ip2p_image_in(image)
@@ -693,7 +699,7 @@ class StableCascade_B(BaseModel):
        #size of prior doesn't really matter if zeros because it gets resized but I still want it to get batched
        prior = kwargs.get("stable_cascade_prior", torch.zeros((1, 16, (noise.shape[2] * 4) // 42, (noise.shape[3] * 4) // 42), dtype=noise.dtype, layout=noise.layout, device=noise.device))

-        out["effnet"] = comfy.conds.CONDRegular(prior)
+        out["effnet"] = comfy.conds.CONDRegular(prior.to(device=noise.device))
        out["sca"] = comfy.conds.CONDRegular(torch.zeros((1,)))
        return out

@@ -884,6 +890,10 @@ class Flux(BaseModel):
            for lat in ref_latents:
                latents.append(self.process_latent_in(lat))
            out['ref_latents'] = comfy.conds.CONDList(latents)
+
+            ref_latents_method = kwargs.get("reference_latents_method", None)
+            if ref_latents_method is not None:
+                out['ref_latents_method'] = comfy.conds.CONDConstant(ref_latents_method)
        return out

    def extra_conds_shapes(self, **kwargs):
@@ -1118,7 +1128,11 @@ class WAN21(BaseModel):
                mask = mask.repeat(1, 4, 1, 1, 1)
            mask = utils.resize_to_batch_size(mask, noise.shape[0])

-        return torch.cat((mask, image), dim=1)
+        concat_mask_index = kwargs.get("concat_mask_index", 0)
+        if concat_mask_index != 0:
+            return torch.cat((image[:, :concat_mask_index], mask, image[:, concat_mask_index:]), dim=1)
+        else:
+            return torch.cat((mask, image), dim=1)

    def extra_conds(self, **kwargs):
        out = super().extra_conds(**kwargs)
@@ -1134,6 +1148,10 @@ class WAN21(BaseModel):
        if time_dim_concat is not None:
            out['time_dim_concat'] = comfy.conds.CONDRegular(self.process_latent_in(time_dim_concat))

+        reference_latents = kwargs.get("reference_latents", None)
+        if reference_latents is not None:
+            out['reference_latent'] = comfy.conds.CONDRegular(self.process_latent_in(reference_latents[-1])[:, :, 0])
+
        return out


@@ -1158,10 +1176,10 @@ class WAN21_Vace(WAN21):

        vace_frames_out = []
        for j in range(len(vace_frames)):
-            vf = vace_frames[j].clone()
+            vf = vace_frames[j].to(device=noise.device, dtype=noise.dtype, copy=True)
            for i in range(0, vf.shape[1], 16):
                vf[:, i:i + 16] = self.process_latent_in(vf[:, i:i + 16])
-            vf = torch.cat([vf, mask[j]], dim=1)
+            vf = torch.cat([vf, mask[j].to(device=noise.device, dtype=noise.dtype)], dim=1)
            vace_frames_out.append(vf)

        vace_frames = torch.stack(vace_frames_out, dim=1)
@@ -1303,3 +1321,24 @@ class Omnigen2(BaseModel):
        if ref_latents is not None:
            out['ref_latents'] = list([1, 16, sum(map(lambda a: math.prod(a.size()), ref_latents)) // 16])
        return out
+
+class QwenImage(BaseModel):
+    def __init__(self, model_config, model_type=ModelType.FLUX, device=None):
+        super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.qwen_image.model.QwenImageTransformer2DModel)
+
+    def extra_conds(self, **kwargs):
+        out = super().extra_conds(**kwargs)
+        cross_attn = kwargs.get("cross_attn", None)
+        if cross_attn is not None:
+            out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
+        ref_latents = kwargs.get("reference_latents", None)
+        if ref_latents is not None:
+            latents = []
+            for lat in ref_latents:
+                latents.append(self.process_latent_in(lat))
+            out['ref_latents'] = comfy.conds.CONDList(latents)
+
+            ref_latents_method = kwargs.get("reference_latents_method", None)
+            if ref_latents_method is not None:
+                out['ref_latents_method'] = comfy.conds.CONDConstant(ref_latents_method)
+        return out
--- a/comfy/model_detection.py
+++ b/comfy/model_detection.py
@@ -364,7 +364,10 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
            dit_config["vace_in_dim"] = state_dict['{}vace_patch_embedding.weight'.format(key_prefix)].shape[1]
            dit_config["vace_layers"] = count_blocks(state_dict_keys, '{}vace_blocks.'.format(key_prefix) + '{}.')
        elif '{}control_adapter.conv.weight'.format(key_prefix) in state_dict_keys:
-            dit_config["model_type"] = "camera"
+            if '{}img_emb.proj.0.bias'.format(key_prefix) in state_dict_keys:
+                dit_config["model_type"] = "camera"
+            else:
+                dit_config["model_type"] = "camera_2.2"
        else:
            if '{}img_emb.proj.0.bias'.format(key_prefix) in state_dict_keys:
                dit_config["model_type"] = "i2v"
@@ -373,6 +376,11 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
        flf_weight = state_dict.get('{}img_emb.emb_pos'.format(key_prefix))
        if flf_weight is not None:
            dit_config["flf_pos_embed_token_number"] = flf_weight.shape[1]
+
+        ref_conv_weight = state_dict.get('{}ref_conv.weight'.format(key_prefix))
+        if ref_conv_weight is not None:
+            dit_config["in_dim_ref_conv"] = ref_conv_weight.shape[1]
+
        return dit_config

    if '{}latent_in.weight'.format(key_prefix) in state_dict_keys:  # Hunyuan 3D
@@ -481,6 +489,11 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
        dit_config["timestep_scale"] = 1000.0
        return dit_config

+    if '{}txt_norm.weight'.format(key_prefix) in state_dict_keys:  # Qwen Image
+        dit_config = {}
+        dit_config["image_model"] = "qwen_image"
+        return dit_config
+
    if '{}input_blocks.0.0.weight'.format(key_prefix) not in state_dict_keys:
        return None

@@ -867,7 +880,7 @@ def convert_diffusers_mmdit(state_dict, output_prefix=""):
        depth_single_blocks = count_blocks(state_dict, 'single_transformer_blocks.{}.')
        hidden_size = state_dict["x_embedder.bias"].shape[0]
        sd_map = comfy.utils.flux_to_diffusers({"depth": depth, "depth_single_blocks": depth_single_blocks, "hidden_size": hidden_size}, output_prefix=output_prefix)
-    elif 'transformer_blocks.0.attn.add_q_proj.weight' in state_dict: #SD3
+    elif 'transformer_blocks.0.attn.add_q_proj.weight' in state_dict and 'pos_embed.proj.weight' in state_dict: #SD3
        num_blocks = count_blocks(state_dict, 'transformer_blocks.{}.')
        depth = state_dict["pos_embed.proj.weight"].shape[0] // 64
        sd_map = comfy.utils.mmdit_to_diffusers({"depth": depth, "num_blocks": num_blocks}, output_prefix=output_prefix)
--- a/comfy/model_management.py
+++ b/comfy/model_management.py
@@ -78,7 +78,6 @@ try:
    torch_version = torch.version.__version__
    temp = torch_version.split(".")
    torch_version_numeric = (int(temp[0]), int(temp[1]))
-    xpu_available = (torch_version_numeric[0] < 2 or (torch_version_numeric[0] == 2 and torch_version_numeric[1] <= 4)) and torch.xpu.is_available()
 except:
    pass

@@ -102,10 +101,14 @@ if args.directml is not None:

 try:
    import intel_extension_for_pytorch as ipex  # noqa: F401
-    _ = torch.xpu.device_count()
-    xpu_available = xpu_available or torch.xpu.is_available()
 except:
-    xpu_available = xpu_available or (hasattr(torch, "xpu") and torch.xpu.is_available())
+    pass
+
+try:
+    _ = torch.xpu.device_count()
+    xpu_available = torch.xpu.is_available()
+except:
+    xpu_available = False

 try:
    if torch.backends.mps.is_available():
@@ -321,9 +324,9 @@ try:
            if torch_version_numeric >= (2, 7):  # works on 2.6 but doesn't actually seem to improve much
                if any((a in arch) for a in ["gfx90a", "gfx942", "gfx1100", "gfx1101", "gfx1151"]):  # TODO: more arches, TODO: gfx950
                    ENABLE_PYTORCH_ATTENTION = True
-            if torch_version_numeric >= (2, 8):
-                if any((a in arch) for a in ["gfx1201"]):
-                    ENABLE_PYTORCH_ATTENTION = True
+#            if torch_version_numeric >= (2, 8):
+#                if any((a in arch) for a in ["gfx1201"]):
+#                    ENABLE_PYTORCH_ATTENTION = True
        if torch_version_numeric >= (2, 7) and rocm_version >= (6, 4):
            if any((a in arch) for a in ["gfx1201", "gfx942", "gfx950"]):  # TODO: more arches
                SUPPORT_FP8_OPS = True
@@ -340,7 +343,7 @@ if ENABLE_PYTORCH_ATTENTION:

 PRIORITIZE_FP16 = False  # TODO: remove and replace with something that shows exactly which dtype is faster than the other
 try:
-    if is_nvidia() and PerformanceFeature.Fp16Accumulation in args.fast:
+    if (is_nvidia() or is_amd()) and PerformanceFeature.Fp16Accumulation in args.fast:
        torch.backends.cuda.matmul.allow_fp16_accumulation = True
        PRIORITIZE_FP16 = True  # TODO: limit to cards where it actually boosts performance
        logging.info("Enabled fp16 accumulation.")
@@ -579,16 +582,23 @@ def free_memory(memory_required, device, keep_loaded=[]):
                soft_empty_cache()
    return unloaded_models

+def get_models_memory_reserve(models):
+    total_reserve = 0
+    for model in models:
+        total_reserve += model.get_model_memory_reserve(convert_to_bytes=True)
+    return total_reserve
+
 def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimum_memory_required=None, force_full_load=False):
    cleanup_models_gc()
    global vram_state

    inference_memory = minimum_inference_memory()
-    extra_mem = max(inference_memory, memory_required + extra_reserved_memory())
+    models_memory_reserve = get_models_memory_reserve(models)
+    extra_mem = max(inference_memory + models_memory_reserve, memory_required + extra_reserved_memory() + models_memory_reserve)
    if minimum_memory_required is None:
        minimum_memory_required = extra_mem
    else:
-        minimum_memory_required = max(inference_memory, minimum_memory_required + extra_reserved_memory())
+        minimum_memory_required = max(inference_memory + models_memory_reserve, minimum_memory_required + extra_reserved_memory() + models_memory_reserve)

    models = set(models)

@@ -946,10 +956,12 @@ def pick_weight_dtype(dtype, fallback_dtype, device=None):
    return dtype

 def device_supports_non_blocking(device):
+    if args.force_non_blocking:
+        return True
    if is_device_mps(device):
        return False #pytorch bug? mps doesn't support non blocking
-    if is_intel_xpu():
-        return True
+    if is_intel_xpu(): #xpu does support non blocking but it is slower on iGPUs for some reason so disable by default until situation changes
+        return False
    if args.deterministic: #TODO: figure out why deterministic breaks non blocking from gpu to cpu (previews)
        return False
    if directml_enabled:
@@ -1282,10 +1294,10 @@ def should_use_bf16(device=None, model_params=0, prioritize_performance=True, ma
        return False

    if is_intel_xpu():
-        if torch_version_numeric < (2, 6):
+        if torch_version_numeric < (2, 3):
            return True
        else:
-            return torch.xpu.get_device_capability(device)['has_bfloat16_conversions']
+            return torch.xpu.is_bf16_supported()

    if is_ascend_npu():
        return True
--- a/comfy/model_patcher.py
+++ b/comfy/model_patcher.py
@@ -24,7 +24,7 @@ import inspect
 import logging
 import math
 import uuid
-from typing import Callable, Optional
+from typing import Callable, Optional, Union

 import torch

@@ -84,6 +84,12 @@ def set_model_options_pre_cfg_function(model_options, pre_cfg_function, disable_
        model_options["disable_cfg1_optimization"] = True
    return model_options

+def add_model_options_memory_reserve(model_options, memory_reserve_gb: float):
+    if "model_memory_reserve" not in model_options:
+        model_options["model_memory_reserve"] = []
+    model_options["model_memory_reserve"].append(memory_reserve_gb)
+    return model_options
+
 def create_model_options_clone(orig_model_options: dict):
    return comfy.patcher_extension.copy_nested_dicts(orig_model_options)

@@ -439,6 +445,17 @@ class ModelPatcher:
            self.force_cast_weights = True
        self.patches_uuid = uuid.uuid4() #TODO: optimize by preventing a full model reload for this

+    def add_model_memory_reserve(self, memory_reserve_gb: float):
+        """Adds additional expected memory usage for the model, in gigabytes."""
+        self.model_options = add_model_options_memory_reserve(self.model_options, memory_reserve_gb)
+
+    def get_model_memory_reserve(self, convert_to_bytes: bool = False) -> Union[float, int]:
+        """Returns the total expected memory usage for the model in gigabytes, or bytes if convert_to_bytes is True."""
+        total_reserve = sum(self.model_options.get("model_memory_reserve", []))
+        if convert_to_bytes:
+            return total_reserve * 1024 * 1024 * 1024
+        return total_reserve
+
    def add_weight_wrapper(self, name, function):
        self.weight_wrapper_patches[name] = self.weight_wrapper_patches.get(name, []) + [function]
        self.patches_uuid = uuid.uuid4()
--- a/comfy/ops.py
+++ b/comfy/ops.py
@@ -24,6 +24,32 @@ import comfy.float
 import comfy.rmsnorm
 import contextlib

+
+def scaled_dot_product_attention(q, k, v, *args, **kwargs):
+    return torch.nn.functional.scaled_dot_product_attention(q, k, v, *args, **kwargs)
+
+
+try:
+    if torch.cuda.is_available():
+        from torch.nn.attention import SDPBackend, sdpa_kernel
+        import inspect
+        if "set_priority" in inspect.signature(sdpa_kernel).parameters:
+            SDPA_BACKEND_PRIORITY = [
+                SDPBackend.FLASH_ATTENTION,
+                SDPBackend.EFFICIENT_ATTENTION,
+                SDPBackend.MATH,
+            ]
+
+            SDPA_BACKEND_PRIORITY.insert(0, SDPBackend.CUDNN_ATTENTION)
+
+            def scaled_dot_product_attention(q, k, v, *args, **kwargs):
+                with sdpa_kernel(SDPA_BACKEND_PRIORITY, set_priority=True):
+                    return torch.nn.functional.scaled_dot_product_attention(q, k, v, *args, **kwargs)
+        else:
+            logging.warning("Torch version too old to set sdpa backend priority.")
+except (ModuleNotFoundError, TypeError):
+    logging.warning("Could not set sdpa backend priority.")
+
 cast_to = comfy.model_management.cast_to #TODO: remove once no more references

 def cast_to_input(weight, input, non_blocking=False, copy=True):
--- a/comfy/rmsnorm.py
+++ b/comfy/rmsnorm.py
@@ -1,6 +1,7 @@
 import torch
 import comfy.model_management
 import numbers
+import logging

 RMSNorm = None

@@ -9,6 +10,7 @@ try:
    RMSNorm = torch.nn.RMSNorm
 except:
    rms_norm_torch = None
+    logging.warning("Please update pytorch to use native RMSNorm")


 def rms_norm(x, weight=None, eps=1e-6):
--- a/comfy/sampler_helpers.py
+++ b/comfy/sampler_helpers.py
@@ -149,7 +149,7 @@ def cleanup_models(conds, models):

    cleanup_additional_models(set(control_cleanup))

-def prepare_model_patcher(model: 'ModelPatcher', conds, model_options: dict):
+def prepare_model_patcher(model: ModelPatcher, conds, model_options: dict):
    '''
    Registers hooks from conds.
    '''
@@ -158,8 +158,8 @@ def prepare_model_patcher(model: 'ModelPatcher', conds, model_options: dict):
    for k in conds:
        get_hooks_from_cond(conds[k], hooks)
    # add wrappers and callbacks from ModelPatcher to transformer_options
-    model_options["transformer_options"]["wrappers"] = comfy.patcher_extension.copy_nested_dicts(model.wrappers)
-    model_options["transformer_options"]["callbacks"] = comfy.patcher_extension.copy_nested_dicts(model.callbacks)
+    comfy.patcher_extension.merge_nested_dicts(model_options["transformer_options"].setdefault("wrappers", {}), model.wrappers, copy_dict1=False)
+    comfy.patcher_extension.merge_nested_dicts(model_options["transformer_options"].setdefault("callbacks", {}), model.callbacks, copy_dict1=False)
    # begin registering hooks
    registered = comfy.hooks.HookGroup()
    target_dict = comfy.hooks.create_target_dict(comfy.hooks.EnumWeightTarget.Model)
--- a/comfy/samplers.py
+++ b/comfy/samplers.py
@@ -16,6 +16,7 @@ import comfy.sampler_helpers
 import comfy.model_patcher
 import comfy.patcher_extension
 import comfy.hooks
+import comfy.context_windows
 import scipy.stats
 import numpy

@@ -89,7 +90,7 @@ def get_area_and_mult(conds, x_in, timestep_in):
    conditioning = {}
    model_conds = conds["model_conds"]
    for c in model_conds:
-        conditioning[c] = model_conds[c].process_cond(batch_size=x_in.shape[0], device=x_in.device, area=area)
+        conditioning[c] = model_conds[c].process_cond(batch_size=x_in.shape[0], area=area)

    hooks = conds.get('hooks', None)
    control = conds.get('control', None)
@@ -198,14 +199,20 @@ def finalize_default_conds(model: 'BaseModel', hooked_to_run: dict[comfy.hooks.H
            hooked_to_run.setdefault(p.hooks, list())
            hooked_to_run[p.hooks] += [(p, i)]

-def calc_cond_batch(model: 'BaseModel', conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
+def calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options: dict[str]):
+    handler: comfy.context_windows.ContextHandlerABC = model_options.get("context_handler", None)
+    if handler is None or not handler.should_use_context(model, conds, x_in, timestep, model_options):
+        return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
+    return handler.execute(_calc_cond_batch_outer, model, conds, x_in, timestep, model_options)
+
+def _calc_cond_batch_outer(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
    executor = comfy.patcher_extension.WrapperExecutor.new_executor(
        _calc_cond_batch,
        comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.CALC_COND_BATCH, model_options, is_model_options=True)
    )
    return executor.execute(model, conds, x_in, timestep, model_options)

-def _calc_cond_batch(model: 'BaseModel', conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
+def _calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
    out_conds = []
    out_counts = []
    # separate conds by matching hooks
--- a/comfy/sd.py
+++ b/comfy/sd.py
@@ -47,6 +47,7 @@ import comfy.text_encoders.wan
 import comfy.text_encoders.hidream
 import comfy.text_encoders.ace
 import comfy.text_encoders.omnigen2
+import comfy.text_encoders.qwen_image

 import comfy.model_patcher
 import comfy.lora
@@ -771,6 +772,7 @@ class CLIPType(Enum):
    CHROMA = 15
    ACE = 16
    OMNIGEN2 = 17
+    QWEN_IMAGE = 18


 def load_clip(ckpt_paths, embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
@@ -791,6 +793,7 @@ class TEModel(Enum):
    T5_XXL_OLD = 8
    GEMMA_2_2B = 9
    QWEN25_3B = 10
+    QWEN25_7B = 11

 def detect_te_model(sd):
    if "text_model.encoder.layers.30.mlp.fc1.weight" in sd:
@@ -812,7 +815,11 @@ def detect_te_model(sd):
    if 'model.layers.0.post_feedforward_layernorm.weight' in sd:
        return TEModel.GEMMA_2_2B
    if 'model.layers.0.self_attn.k_proj.bias' in sd:
-        return TEModel.QWEN25_3B
+        weight = sd['model.layers.0.self_attn.k_proj.bias']
+        if weight.shape[0] == 256:
+            return TEModel.QWEN25_3B
+        if weight.shape[0] == 512:
+            return TEModel.QWEN25_7B
    if "model.layers.0.post_attention_layernorm.weight" in sd:
        return TEModel.LLAMA3_8
    return None
@@ -917,6 +924,9 @@ def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip
        elif te_model == TEModel.QWEN25_3B:
            clip_target.clip = comfy.text_encoders.omnigen2.te(**llama_detect(clip_data))
            clip_target.tokenizer = comfy.text_encoders.omnigen2.Omnigen2Tokenizer
+        elif te_model == TEModel.QWEN25_7B:
+            clip_target.clip = comfy.text_encoders.qwen_image.te(**llama_detect(clip_data))
+            clip_target.tokenizer = comfy.text_encoders.qwen_image.QwenImageTokenizer
        else:
            # clip_l
            if clip_type == CLIPType.SD3:
--- a/comfy/supported_models.py
+++ b/comfy/supported_models.py
@@ -19,6 +19,7 @@ import comfy.text_encoders.lumina2
 import comfy.text_encoders.wan
 import comfy.text_encoders.ace
 import comfy.text_encoders.omnigen2
+import comfy.text_encoders.qwen_image

 from . import supported_models_base
 from . import latent_formats
@@ -1045,6 +1046,18 @@ class WAN21_Camera(WAN21_T2V):
    def get_model(self, state_dict, prefix="", device=None):
        out = model_base.WAN21_Camera(self, image_to_video=False, device=device)
        return out
+
+class WAN22_Camera(WAN21_T2V):
+    unet_config = {
+        "image_model": "wan2.1",
+        "model_type": "camera_2.2",
+        "in_dim": 36,
+    }
+
+    def get_model(self, state_dict, prefix="", device=None):
+        out = model_base.WAN21_Camera(self, image_to_video=False, device=device)
+        return out
+
 class WAN21_Vace(WAN21_T2V):
    unet_config = {
        "image_model": "wan2.1",
@@ -1229,7 +1242,36 @@ class Omnigen2(supported_models_base.BASE):
        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_3b.transformer.".format(pref))
        return supported_models_base.ClipTarget(comfy.text_encoders.omnigen2.Omnigen2Tokenizer, comfy.text_encoders.omnigen2.te(**hunyuan_detect))

+class QwenImage(supported_models_base.BASE):
+    unet_config = {
+        "image_model": "qwen_image",
+    }

-models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, Hunyuan3Dv2mini, Hunyuan3Dv2, HiDream, Chroma, ACEStep, Omnigen2]
+    sampling_settings = {
+        "multiplier": 1.0,
+        "shift": 1.15,
+    }
+
+    memory_usage_factor = 1.8 #TODO
+
+    unet_extra_config = {}
+    latent_format = latent_formats.Wan21
+
+    supported_inference_dtypes = [torch.bfloat16, torch.float32]
+
+    vae_key_prefix = ["vae."]
+    text_encoder_key_prefix = ["text_encoders."]
+
+    def get_model(self, state_dict, prefix="", device=None):
+        out = model_base.QwenImage(self, device=device)
+        return out
+
+    def clip_target(self, state_dict={}):
+        pref = self.text_encoder_key_prefix[0]
+        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_7b.transformer.".format(pref))
+        return supported_models_base.ClipTarget(comfy.text_encoders.qwen_image.QwenImageTokenizer, comfy.text_encoders.qwen_image.te(**hunyuan_detect))
+
+
+models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, WAN22_Camera, Hunyuan3Dv2mini, Hunyuan3Dv2, HiDream, Chroma, ACEStep, Omnigen2, QwenImage]

 models += [SVD_img2vid]
--- a/comfy/text_encoders/llama.py
+++ b/comfy/text_encoders/llama.py
@@ -43,6 +43,23 @@ class Qwen25_3BConfig:
    mlp_activation = "silu"
    qkv_bias = True

+@dataclass
+class Qwen25_7BVLI_Config:
+    vocab_size: int = 152064
+    hidden_size: int = 3584
+    intermediate_size: int = 18944
+    num_hidden_layers: int = 28
+    num_attention_heads: int = 28
+    num_key_value_heads: int = 4
+    max_position_embeddings: int = 128000
+    rms_norm_eps: float = 1e-6
+    rope_theta: float = 1000000.0
+    transformer_type: str = "llama"
+    head_dim = 128
+    rms_norm_add = False
+    mlp_activation = "silu"
+    qkv_bias = True
+
@dataclass
 class Gemma2_2B_Config:
    vocab_size: int = 256000
@@ -348,6 +365,15 @@ class Qwen25_3B(BaseLlama, torch.nn.Module):
        self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
        self.dtype = dtype

+class Qwen25_7BVLI(BaseLlama, torch.nn.Module):
+    def __init__(self, config_dict, dtype, device, operations):
+        super().__init__()
+        config = Qwen25_7BVLI_Config(**config_dict)
+        self.num_layers = config.num_hidden_layers
+
+        self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
+        self.dtype = dtype
+
 class Gemma2_2B(BaseLlama, torch.nn.Module):
    def __init__(self, config_dict, dtype, device, operations):
        super().__init__()
--- a/comfy/text_encoders/qwen_image.py
+++ b/comfy/text_encoders/qwen_image.py
@@ -0,0 +1,71 @@
+from transformers import Qwen2Tokenizer
+from comfy import sd1_clip
+import comfy.text_encoders.llama
+import os
+import torch
+import numbers
+
+class Qwen25_7BVLITokenizer(sd1_clip.SDTokenizer):
+    def __init__(self, embedding_directory=None, tokenizer_data={}):
+        tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "qwen25_tokenizer")
+        super().__init__(tokenizer_path, pad_with_end=False, embedding_size=3584, embedding_key='qwen25_7b', tokenizer_class=Qwen2Tokenizer, has_start_token=False, has_end_token=False, pad_to_max_length=False, max_length=99999999, min_length=1, pad_token=151643, tokenizer_data=tokenizer_data)
+
+
+class QwenImageTokenizer(sd1_clip.SD1Tokenizer):
+    def __init__(self, embedding_directory=None, tokenizer_data={}):
+        super().__init__(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data, name="qwen25_7b", tokenizer=Qwen25_7BVLITokenizer)
+        self.llama_template = "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n"
+
+    def tokenize_with_weights(self, text, return_word_ids=False, llama_template=None,**kwargs):
+        if llama_template is None:
+            llama_text = self.llama_template.format(text)
+        else:
+            llama_text = llama_template.format(text)
+        return super().tokenize_with_weights(llama_text, return_word_ids=return_word_ids, **kwargs)
+
+
+class Qwen25_7BVLIModel(sd1_clip.SDClipModel):
+    def __init__(self, device="cpu", layer="last", layer_idx=None, dtype=None, attention_mask=True, model_options={}):
+        super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"pad": 151643}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Qwen25_7BVLI, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
+
+
+class QwenImageTEModel(sd1_clip.SD1ClipModel):
+    def __init__(self, device="cpu", dtype=None, model_options={}):
+        super().__init__(device=device, dtype=dtype, name="qwen25_7b", clip_model=Qwen25_7BVLIModel, model_options=model_options)
+
+    def encode_token_weights(self, token_weight_pairs):
+        out, pooled, extra = super().encode_token_weights(token_weight_pairs)
+        tok_pairs = token_weight_pairs["qwen25_7b"][0]
+        count_im_start = 0
+        for i, v in enumerate(tok_pairs):
+            elem = v[0]
+            if not torch.is_tensor(elem):
+                if isinstance(elem, numbers.Integral):
+                    if elem == 151644 and count_im_start < 2:
+                        template_end = i
+                        count_im_start += 1
+
+        if out.shape[1] > (template_end + 3):
+            if tok_pairs[template_end + 1][0] == 872:
+                if tok_pairs[template_end + 2][0] == 198:
+                    template_end += 3
+
+        out = out[:, template_end:]
+
+        extra["attention_mask"] = extra["attention_mask"][:, template_end:]
+        if extra["attention_mask"].sum() == torch.numel(extra["attention_mask"]):
+            extra.pop("attention_mask")  # attention mask is useless if no masked elements
+
+        return out, pooled, extra
+
+
+def te(dtype_llama=None, llama_scaled_fp8=None):
+    class QwenImageTEModel_(QwenImageTEModel):
+        def __init__(self, device="cpu", dtype=None, model_options={}):
+            if llama_scaled_fp8 is not None and "scaled_fp8" not in model_options:
+                model_options = model_options.copy()
+                model_options["scaled_fp8"] = llama_scaled_fp8
+            if dtype_llama is not None:
+                dtype = dtype_llama
+            super().__init__(device=device, dtype=dtype, model_options=model_options)
+    return QwenImageTEModel_
--- a/comfy/weight_adapter/lora.py
+++ b/comfy/weight_adapter/lora.py
@@ -96,6 +96,7 @@ class LoRAAdapter(WeightAdapterBase):
        diffusers3_lora = "{}.lora.up.weight".format(x)
        mochi_lora = "{}.lora_B".format(x)
        transformers_lora = "{}.lora_linear_layer.up.weight".format(x)
+        qwen_default_lora = "{}.lora_B.default.weight".format(x)
        A_name = None

        if regular_lora in lora.keys():
@@ -122,6 +123,10 @@ class LoRAAdapter(WeightAdapterBase):
            A_name = transformers_lora
            B_name = "{}.lora_linear_layer.down.weight".format(x)
            mid_name = None
+        elif qwen_default_lora in lora.keys():
+            A_name = qwen_default_lora
+            B_name = "{}.lora_A.default.weight".format(x)
+            mid_name = None

        if A_name is not None:
            mid = None
--- a/comfy_api/latest/_ui.py
+++ b/comfy_api/latest/_ui.py
@@ -9,7 +9,11 @@ from typing import Type
 import av
 import numpy as np
 import torch
-import torchaudio
+try:
+    import torchaudio
+    TORCH_AUDIO_AVAILABLE = True
+except:
+    TORCH_AUDIO_AVAILABLE = False
 from PIL import Image as PILImage
 from PIL.PngImagePlugin import PngInfo

@@ -302,6 +306,8 @@ class AudioSaveHelper:

                # Resample if necessary
                if sample_rate != audio["sample_rate"]:
+                    if not TORCH_AUDIO_AVAILABLE:
+                        raise Exception("torchaudio is not available; cannot resample audio.")
                    waveform = torchaudio.functional.resample(waveform, audio["sample_rate"], sample_rate)

            # Create output with specified format
--- a/comfy_api_nodes/apinode_utils.py
+++ b/comfy_api_nodes/apinode_utils.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+import aiohttp
 import io
 import logging
 import mimetypes
@@ -21,7 +22,6 @@ from server import PromptServer

 import numpy as np
 from PIL import Image
-import requests
 import torch
 import math
 import base64
@@ -30,7 +30,7 @@ from io import BytesIO
 import av


-def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFromFile:
+async def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFromFile:
    """Downloads a video from a URL and returns a `VIDEO` output.

    Args:
@@ -39,7 +39,7 @@ def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFr
    Returns:
        A Comfy node `VIDEO` output.
    """
-    video_io = download_url_to_bytesio(video_url, timeout)
+    video_io = await download_url_to_bytesio(video_url, timeout)
    if video_io is None:
        error_msg = f"Failed to download video from {video_url}"
        logging.error(error_msg)
@@ -62,7 +62,7 @@ def downscale_image_tensor(image, total_pixels=1536 * 1024) -> torch.Tensor:
    return s


-def validate_and_cast_response(
+async def validate_and_cast_response(
    response, timeout: int = None, node_id: Union[str, None] = None
 ) -> torch.Tensor:
    """Validates and casts a response to a torch.Tensor.
@@ -86,35 +86,24 @@ def validate_and_cast_response(
    image_tensors: list[torch.Tensor] = []

    # Process each image in the data array
-    for image_data in data:
-        image_url = image_data.url
-        b64_data = image_data.b64_json
+    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=timeout)) as session:
+        for img_data in data:
+            img_bytes: bytes
+            if img_data.b64_json:
+                img_bytes = base64.b64decode(img_data.b64_json)
+            elif img_data.url:
+                if node_id:
+                    PromptServer.instance.send_progress_text(f"Result URL: {img_data.url}", node_id)
+                async with session.get(img_data.url) as resp:
+                    if resp.status != 200:
+                        raise ValueError("Failed to download generated image")
+                    img_bytes = await resp.read()
+            else:
+                raise ValueError("Invalid image payload – neither URL nor base64 data present.")

-        if not image_url and not b64_data:
-            raise ValueError("No image was generated in the response")
-
-        if b64_data:
-            img_data = base64.b64decode(b64_data)
-            img = Image.open(io.BytesIO(img_data))
-
-        elif image_url:
-            if node_id:
-                PromptServer.instance.send_progress_text(
-                    f"Result URL: {image_url}", node_id
-                )
-            img_response = requests.get(image_url, timeout=timeout)
-            if img_response.status_code != 200:
-                raise ValueError("Failed to download the image")
-            img = Image.open(io.BytesIO(img_response.content))
-
-        img = img.convert("RGBA")
-
-        # Convert to numpy array, normalize to float32 between 0 and 1
-        img_array = np.array(img).astype(np.float32) / 255.0
-        img_tensor = torch.from_numpy(img_array)
-
-        # Add to list of tensors
-        image_tensors.append(img_tensor)
+            pil_img = Image.open(BytesIO(img_bytes)).convert("RGBA")
+            arr = np.asarray(pil_img).astype(np.float32) / 255.0
+            image_tensors.append(torch.from_numpy(arr))

    return torch.stack(image_tensors, dim=0)

@@ -175,7 +164,7 @@ def mimetype_to_extension(mime_type: str) -> str:
    return mime_type.split("/")[-1].lower()


-def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
+async def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
    """Downloads content from a URL using requests and returns it as BytesIO.

    Args:
@@ -185,9 +174,11 @@ def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
    Returns:
        BytesIO object containing the downloaded content.
    """
-    response = requests.get(url, stream=True, timeout=timeout)
-    response.raise_for_status()  # Raises HTTPError for bad responses (4XX or 5XX)
-    return BytesIO(response.content)
+    timeout_cfg = aiohttp.ClientTimeout(total=timeout) if timeout else None
+    async with aiohttp.ClientSession(timeout=timeout_cfg) as session:
+        async with session.get(url) as resp:
+            resp.raise_for_status()  # Raises HTTPError for bad responses (4XX or 5XX)
+            return BytesIO(await resp.read())


 def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch.Tensor:
@@ -210,15 +201,15 @@ def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch
    return torch.from_numpy(image_array).unsqueeze(0)


-def download_url_to_image_tensor(url: str, timeout: int = None) -> torch.Tensor:
+async def download_url_to_image_tensor(url: str, timeout: int = None) -> torch.Tensor:
    """Downloads an image from a URL and returns a [B, H, W, C] tensor."""
-    image_bytesio = download_url_to_bytesio(url, timeout)
+    image_bytesio = await download_url_to_bytesio(url, timeout)
    return bytesio_to_image_tensor(image_bytesio)


-def process_image_response(response: requests.Response) -> torch.Tensor:
+def process_image_response(response_content: bytes | str) -> torch.Tensor:
    """Uses content from a Response object and converts it to a torch.Tensor"""
-    return bytesio_to_image_tensor(BytesIO(response.content))
+    return bytesio_to_image_tensor(BytesIO(response_content))


 def _tensor_to_pil(image: torch.Tensor, total_pixels: int = 2048 * 2048) -> Image.Image:
@@ -336,10 +327,10 @@ def text_filepath_to_data_uri(filepath: str) -> str:
    return f"data:{mime_type};base64,{base64_string}"


-def upload_file_to_comfyapi(
+async def upload_file_to_comfyapi(
    file_bytes_io: BytesIO,
    filename: str,
-    upload_mime_type: str,
+    upload_mime_type: Optional[str],
    auth_kwargs: Optional[dict[str, str]] = None,
 ) -> str:
    """
@@ -354,7 +345,10 @@ def upload_file_to_comfyapi(
    Returns:
        The download URL for the uploaded file.
    """
-    request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
+    if upload_mime_type is None:
+        request_object = UploadRequest(file_name=filename)
+    else:
+        request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
    operation = SynchronousOperation(
        endpoint=ApiEndpoint(
            path="/customers/storage",
@@ -366,12 +360,8 @@ def upload_file_to_comfyapi(
        auth_kwargs=auth_kwargs,
    )

-    response: UploadResponse = operation.execute()
-    upload_response = ApiClient.upload_file(
-        response.upload_url, file_bytes_io, content_type=upload_mime_type
-    )
-    upload_response.raise_for_status()
-
+    response: UploadResponse = await operation.execute()
+    await ApiClient.upload_file(response.upload_url, file_bytes_io, content_type=upload_mime_type)
    return response.download_url


@@ -399,7 +389,7 @@ def video_to_base64_string(
    return base64.b64encode(video_bytes_io.getvalue()).decode("utf-8")


-def upload_video_to_comfyapi(
+async def upload_video_to_comfyapi(
    video: VideoInput,
    auth_kwargs: Optional[dict[str, str]] = None,
    container: VideoContainer = VideoContainer.MP4,
@@ -439,9 +429,7 @@ def upload_video_to_comfyapi(
    video.save_to(video_bytes_io, format=container, codec=codec)
    video_bytes_io.seek(0)

-    return upload_file_to_comfyapi(
-        video_bytes_io, filename, upload_mime_type, auth_kwargs
-    )
+    return await upload_file_to_comfyapi(video_bytes_io, filename, upload_mime_type, auth_kwargs)


 def audio_tensor_to_contiguous_ndarray(waveform: torch.Tensor) -> np.ndarray:
@@ -501,7 +489,7 @@ def audio_ndarray_to_bytesio(
    return audio_bytes_io


-def upload_audio_to_comfyapi(
+async def upload_audio_to_comfyapi(
    audio: AudioInput,
    auth_kwargs: Optional[dict[str, str]] = None,
    container_format: str = "mp4",
@@ -527,7 +515,7 @@ def upload_audio_to_comfyapi(
        audio_data_np, sample_rate, container_format, codec_name
    )

-    return upload_file_to_comfyapi(audio_bytes_io, filename, mime_type, auth_kwargs)
+    return await upload_file_to_comfyapi(audio_bytes_io, filename, mime_type, auth_kwargs)


 def audio_to_base64_string(
@@ -544,7 +532,7 @@ def audio_to_base64_string(
    return base64.b64encode(audio_bytes).decode("utf-8")


-def upload_images_to_comfyapi(
+async def upload_images_to_comfyapi(
    image: torch.Tensor,
    max_images=8,
    auth_kwargs: Optional[dict[str, str]] = None,
@@ -561,55 +549,15 @@ def upload_images_to_comfyapi(
        mime_type: Optional MIME type for the image.
    """
    # if batch, try to upload each file if max_images is greater than 0
-    idx_image = 0
    download_urls: list[str] = []
    is_batch = len(image.shape) > 3
-    batch_length = 1
-    if is_batch:
-        batch_length = image.shape[0]
-    while True:
-        curr_image = image
-        if len(image.shape) > 3:
-            curr_image = image[idx_image]
-        # get BytesIO version of image
-        img_binary = tensor_to_bytesio(curr_image, mime_type=mime_type)
-        # first, request upload/download urls from comfy API
-        if not mime_type:
-            request_object = UploadRequest(file_name=img_binary.name)
-        else:
-            request_object = UploadRequest(
-                file_name=img_binary.name, content_type=mime_type
-            )
-        operation = SynchronousOperation(
-            endpoint=ApiEndpoint(
-                path="/customers/storage",
-                method=HttpMethod.POST,
-                request_model=UploadRequest,
-                response_model=UploadResponse,
-            ),
-            request=request_object,
-            auth_kwargs=auth_kwargs,
-        )
-        response = operation.execute()
+    batch_len = image.shape[0] if is_batch else 1

-        upload_response = ApiClient.upload_file(
-            response.upload_url, img_binary, content_type=mime_type
-        )
-        # verify success
-        try:
-            upload_response.raise_for_status()
-        except requests.exceptions.HTTPError as e:
-            raise ValueError(f"Could not upload one or more images: {e}") from e
-        # add download_url to list
-        download_urls.append(response.download_url)
-
-        idx_image += 1
-        # stop uploading additional files if done
-        if is_batch and max_images > 0:
-            if idx_image >= max_images:
-                break
-            if idx_image >= batch_length:
-                break
+    for idx in range(min(batch_len, max_images)):
+        tensor = image[idx] if is_batch else image
+        img_io = tensor_to_bytesio(tensor, mime_type=mime_type)
+        url = await upload_file_to_comfyapi(img_io, img_io.name, mime_type, auth_kwargs)
+        download_urls.append(url)
    return download_urls


--- a/comfy_api_nodes/apis/init.py
+++ b/comfy_api_nodes/apis/init.py
--- a/comfy_api_nodes/apis/client.py
+++ b/comfy_api_nodes/apis/client.py
--- a/comfy_api_nodes/apis/tripo_api.py
+++ b/comfy_api_nodes/apis/tripo_api.py
@@ -127,7 +127,7 @@ class TripoTextToModelRequest(BaseModel):
    type: TripoTaskType = Field(TripoTaskType.TEXT_TO_MODEL, description='Type of task')
    prompt: str = Field(..., description='The text prompt describing the model to generate', max_length=1024)
    negative_prompt: Optional[str] = Field(None, description='The negative text prompt', max_length=1024)
-    model_version: Optional[TripoModelVersion] = TripoModelVersion.V2_5
+    model_version: Optional[TripoModelVersion] = TripoModelVersion.v2_5_20250123
    face_limit: Optional[int] = Field(None, description='The number of faces to limit the generation to')
    texture: Optional[bool] = Field(True, description='Whether to apply texture to the generated model')
    pbr: Optional[bool] = Field(True, description='Whether to apply PBR to the generated model')
--- a/comfy_api_nodes/nodes_bfl.py
+++ b/comfy_api_nodes/nodes_bfl.py
@@ -1,3 +1,4 @@
+import asyncio
 import io
 from inspect import cleandoc
 from typing import Union, Optional
@@ -28,7 +29,7 @@ from comfy_api_nodes.apinode_utils import (

 import numpy as np
 from PIL import Image
-import requests
+import aiohttp
 import torch
 import base64
 import time
@@ -44,18 +45,18 @@ def convert_mask_to_image(mask: torch.Tensor):
    return mask


-def handle_bfl_synchronous_operation(
+async def handle_bfl_synchronous_operation(
    operation: SynchronousOperation,
    timeout_bfl_calls=360,
    node_id: Union[str, None] = None,
 ):
-    response_api: BFLFluxProGenerateResponse = operation.execute()
-    return _poll_until_generated(
+    response_api: BFLFluxProGenerateResponse = await operation.execute()
+    return await _poll_until_generated(
        response_api.polling_url, timeout=timeout_bfl_calls, node_id=node_id
    )


-def _poll_until_generated(
+async def _poll_until_generated(
    polling_url: str, timeout=360, node_id: Union[str, None] = None
 ):
    # used bfl-comfy-nodes to verify code implementation:
@@ -66,55 +67,56 @@ def _poll_until_generated(
    retry_404_seconds = 2
    retry_202_seconds = 2
    retry_pending_seconds = 1
-    request = requests.Request(method=HttpMethod.GET, url=polling_url)
-    # NOTE: should True loop be replaced with checking if workflow has been interrupted?
-    while True:
-        if node_id:
-            time_elapsed = time.time() - start_time
-            PromptServer.instance.send_progress_text(
-                f"Generating ({time_elapsed:.0f}s)", node_id
-            )

-        response = requests.Session().send(request.prepare())
-        if response.status_code == 200:
-            result = response.json()
-            if result["status"] == BFLStatus.ready:
-                img_url = result["result"]["sample"]
-                if node_id:
-                    PromptServer.instance.send_progress_text(
-                        f"Result URL: {img_url}", node_id
-                    )
-                img_response = requests.get(img_url)
-                return process_image_response(img_response)
-            elif result["status"] in [
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-            ]:
-                status = result["status"]
-                raise Exception(
-                    f"BFL API did not return an image due to: {status}."
+    async with aiohttp.ClientSession() as session:
+        # NOTE: should True loop be replaced with checking if workflow has been interrupted?
+        while True:
+            if node_id:
+                time_elapsed = time.time() - start_time
+                PromptServer.instance.send_progress_text(
+                    f"Generating ({time_elapsed:.0f}s)", node_id
                )
-            elif result["status"] == BFLStatus.error:
-                raise Exception(f"BFL API encountered an error: {result}.")
-            elif result["status"] == BFLStatus.pending:
-                time.sleep(retry_pending_seconds)
-                continue
-        elif response.status_code == 404:
-            if retries_404 < max_retries_404:
-                retries_404 += 1
-                time.sleep(retry_404_seconds)
-                continue
-            raise Exception(
-                f"BFL API could not find task after {max_retries_404} tries."
-            )
-        elif response.status_code == 202:
-            time.sleep(retry_202_seconds)
-        elif time.time() - start_time > timeout:
-            raise Exception(
-                f"BFL API experienced a timeout; could not return request under {timeout} seconds."
-            )
-        else:
-            raise Exception(f"BFL API encountered an error: {response.json()}")
+
+            async with session.get(polling_url) as response:
+                if response.status == 200:
+                    result = await response.json()
+                    if result["status"] == BFLStatus.ready:
+                        img_url = result["result"]["sample"]
+                        if node_id:
+                            PromptServer.instance.send_progress_text(
+                                f"Result URL: {img_url}", node_id
+                            )
+                        async with session.get(img_url) as img_resp:
+                            return process_image_response(await img_resp.content.read())
+                    elif result["status"] in [
+                        BFLStatus.request_moderated,
+                        BFLStatus.content_moderated,
+                    ]:
+                        status = result["status"]
+                        raise Exception(
+                            f"BFL API did not return an image due to: {status}."
+                        )
+                    elif result["status"] == BFLStatus.error:
+                        raise Exception(f"BFL API encountered an error: {result}.")
+                    elif result["status"] == BFLStatus.pending:
+                        await asyncio.sleep(retry_pending_seconds)
+                        continue
+                elif response.status == 404:
+                    if retries_404 < max_retries_404:
+                        retries_404 += 1
+                        await asyncio.sleep(retry_404_seconds)
+                        continue
+                    raise Exception(
+                        f"BFL API could not find task after {max_retries_404} tries."
+                    )
+                elif response.status == 202:
+                    await asyncio.sleep(retry_202_seconds)
+                elif time.time() - start_time > timeout:
+                    raise Exception(
+                        f"BFL API experienced a timeout; could not return request under {timeout} seconds."
+                    )
+                else:
+                    raise Exception(f"BFL API encountered an error: {response.json()}")

 def convert_image_to_base64(image: torch.Tensor):
    scaled_image = downscale_image_tensor(image, total_pixels=2048 * 2048)
@@ -222,7 +224,7 @@ class FluxProUltraImageNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -266,7 +268,7 @@ class FluxProUltraImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -354,7 +356,7 @@ class FluxKontextProImageNode(ComfyNodeABC):

    BFL_PATH = "/proxy/bfl/flux-kontext-pro/generate"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -397,7 +399,7 @@ class FluxKontextProImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -489,7 +491,7 @@ class FluxProImageNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        prompt_upsampling,
@@ -524,7 +526,7 @@ class FluxProImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -632,7 +634,7 @@ class FluxProExpandNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -670,7 +672,7 @@ class FluxProExpandNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -744,7 +746,7 @@ class FluxProFillNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        mask: torch.Tensor,
@@ -780,7 +782,7 @@ class FluxProFillNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -879,7 +881,7 @@ class FluxProCannyNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        control_image: torch.Tensor,
        prompt: str,
@@ -929,7 +931,7 @@ class FluxProCannyNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -1008,7 +1010,7 @@ class FluxProDepthNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        control_image: torch.Tensor,
        prompt: str,
@@ -1045,7 +1047,7 @@ class FluxProDepthNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


--- a/comfy_api_nodes/nodes_gemini.py
+++ b/comfy_api_nodes/nodes_gemini.py
@@ -303,7 +303,7 @@ class GeminiNode(ComfyNodeABC):
        """
        return GeminiPart(text=text)

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: GeminiModel,
@@ -332,7 +332,7 @@ class GeminiNode(ComfyNodeABC):
            parts.extend(files)

        # Create response
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=get_gemini_endpoint(model),
            request=GeminiGenerateContentRequest(
                contents=[
--- a/comfy_api_nodes/nodes_ideogram.py
+++ b/comfy_api_nodes/nodes_ideogram.py
@@ -212,7 +212,7 @@ V3_RESOLUTIONS= [
    "1536x640"
 ]

-def download_and_process_images(image_urls):
+async def download_and_process_images(image_urls):
    """Helper function to download and process multiple images from URLs"""

    # Initialize list to store image tensors
@@ -220,7 +220,7 @@ def download_and_process_images(image_urls):

    for image_url in image_urls:
        # Using functions from apinode_utils.py to handle downloading and processing
-        image_bytesio = download_url_to_bytesio(image_url)  # Download image content to BytesIO
+        image_bytesio = await download_url_to_bytesio(image_url)  # Download image content to BytesIO
        img_tensor = bytesio_to_image_tensor(image_bytesio, mode="RGB")  # Convert to torch.Tensor with RGB mode
        image_tensors.append(img_tensor)

@@ -328,7 +328,7 @@ class IdeogramV1(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        turbo=False,
@@ -367,7 +367,7 @@ class IdeogramV1(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -378,7 +378,7 @@ class IdeogramV1(ComfyNodeABC):
            raise Exception("No image URLs were generated in the response")

        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        return (await download_and_process_images(image_urls),)


 class IdeogramV2(ComfyNodeABC):
@@ -487,7 +487,7 @@ class IdeogramV2(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        turbo=False,
@@ -543,7 +543,7 @@ class IdeogramV2(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -554,7 +554,7 @@ class IdeogramV2(ComfyNodeABC):
            raise Exception("No image URLs were generated in the response")

        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        return (await download_and_process_images(image_urls),)

 class IdeogramV3(ComfyNodeABC):
    """
@@ -653,7 +653,7 @@ class IdeogramV3(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        image=None,
@@ -774,7 +774,7 @@ class IdeogramV3(ComfyNodeABC):
            )

        # Execute the operation and process response
-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -785,7 +785,7 @@ class IdeogramV3(ComfyNodeABC):
            raise Exception("No image URLs were generated in the response")

        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        return (await download_and_process_images(image_urls),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_kling.py
+++ b/comfy_api_nodes/nodes_kling.py
@@ -109,7 +109,7 @@ class KlingApiError(Exception):
    pass


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, R],
    result_url_extractor: Optional[Callable[[R], str]] = None,
@@ -117,7 +117,7 @@ def poll_until_finished(
    node_id: Optional[str] = None,
 ) -> R:
    """Polls the Kling API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            KlingTaskStatus.succeed.value,
@@ -278,18 +278,18 @@ def get_images_urls_from_response(response) -> Optional[str]:
        return None


-def video_result_to_node_output(
+async def video_result_to_node_output(
    video: KlingVideoResult,
 ) -> tuple[VideoFromFile, str, str]:
    """Converts a KlingVideoResult to a tuple of (VideoFromFile, str, str) to be used as a ComfyUI node output."""
    return (
-        download_url_to_video_output(video.url),
+        await download_url_to_video_output(str(video.url)),
        str(video.id),
        str(video.duration),
    )


-def image_result_to_node_output(
+async def image_result_to_node_output(
    images: list[KlingImageResult],
 ) -> torch.Tensor:
    """
@@ -297,9 +297,9 @@ def image_result_to_node_output(
    If multiple images are returned, they will be stacked along the batch dimension.
    """
    if len(images) == 1:
-        return download_url_to_image_tensor(images[0].url)
+        return await download_url_to_image_tensor(str(images[0].url))
    else:
-        return torch.cat([download_url_to_image_tensor(image.url) for image in images])
+        return torch.cat([await download_url_to_image_tensor(str(image.url)) for image in images])


 class KlingNodeBase(ComfyNodeABC):
@@ -467,10 +467,10 @@ class KlingTextToVideoNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Text to Video Node"

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingText2VideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_TEXT_TO_VIDEO}/{task_id}",
@@ -483,7 +483,7 @@ class KlingTextToVideoNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -519,17 +519,17 @@ class KlingTextToVideoNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)

        task_id = task_creation_response.data.task_id
-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingCameraControlT2VNode(KlingTextToVideoNode):
@@ -581,7 +581,7 @@ class KlingCameraControlT2VNode(KlingTextToVideoNode):

    DESCRIPTION = "Transform text into cinematic videos with professional camera movements that simulate real-world cinematography. Control virtual camera actions including zoom, rotation, pan, tilt, and first-person view, while maintaining focus on your original text."

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -591,7 +591,7 @@ class KlingCameraControlT2VNode(KlingTextToVideoNode):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            model_name=KlingVideoGenModelName.kling_v1,
            cfg_scale=cfg_scale,
            mode=KlingVideoGenMode.std,
@@ -670,10 +670,10 @@ class KlingImage2VideoNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Image to Video Node"

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingImage2VideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_IMAGE_TO_VIDEO}/{task_id}",
@@ -686,7 +686,7 @@ class KlingImage2VideoNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        prompt: str,
@@ -733,17 +733,17 @@ class KlingImage2VideoNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingCameraControlI2VNode(KlingImage2VideoNode):
@@ -798,7 +798,7 @@ class KlingCameraControlI2VNode(KlingImage2VideoNode):

    DESCRIPTION = "Transform still images into cinematic videos with professional camera movements that simulate real-world cinematography. Control virtual camera actions including zoom, rotation, pan, tilt, and first-person view, while maintaining focus on your original image."

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        prompt: str,
@@ -809,7 +809,7 @@ class KlingCameraControlI2VNode(KlingImage2VideoNode):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            model_name=KlingVideoGenModelName.kling_v1_5,
            start_frame=start_frame,
            cfg_scale=cfg_scale,
@@ -897,7 +897,7 @@ class KlingStartEndFrameNode(KlingImage2VideoNode):

    DESCRIPTION = "Generate a video sequence that transitions between your provided start and end images. The node creates all frames in between, producing a smooth transformation from the first frame to the last."

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        end_frame: torch.Tensor,
@@ -912,7 +912,7 @@ class KlingStartEndFrameNode(KlingImage2VideoNode):
        mode, duration, model_name = KlingStartEndFrameNode.get_mode_string_mapping()[
            mode
        ]
-        return super().api_call(
+        return await super().api_call(
            prompt=prompt,
            negative_prompt=negative_prompt,
            model_name=model_name,
@@ -964,10 +964,10 @@ class KlingVideoExtendNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Video Extend Node. Extend videos made by other Kling nodes. The video_id is created by using other Kling Nodes."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVideoExtendResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIDEO_EXTEND}/{task_id}",
@@ -980,7 +980,7 @@ class KlingVideoExtendNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -1006,17 +1006,17 @@ class KlingVideoExtendNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingVideoEffectsBase(KlingNodeBase):
@@ -1025,10 +1025,10 @@ class KlingVideoEffectsBase(KlingNodeBase):
    RETURN_TYPES = ("VIDEO", "STRING", "STRING")
    RETURN_NAMES = ("VIDEO", "video_id", "duration")

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVideoEffectsResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIDEO_EFFECTS}/{task_id}",
@@ -1041,7 +1041,7 @@ class KlingVideoEffectsBase(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        dual_character: bool,
        effect_scene: KlingDualCharacterEffectsScene | KlingSingleImageEffectsScene,
@@ -1084,17 +1084,17 @@ class KlingVideoEffectsBase(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
@@ -1142,7 +1142,7 @@ class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
    RETURN_TYPES = ("VIDEO", "STRING")
    RETURN_NAMES = ("VIDEO", "duration")

-    def api_call(
+    async def api_call(
        self,
        image_left: torch.Tensor,
        image_right: torch.Tensor,
@@ -1153,7 +1153,7 @@ class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        video, _, duration = super().api_call(
+        video, _, duration = await super().api_call(
            dual_character=True,
            effect_scene=effect_scene,
            model_name=model_name,
@@ -1208,7 +1208,7 @@ class KlingSingleImageVideoEffectNode(KlingVideoEffectsBase):

    DESCRIPTION = "Achieve different special effects when generating a video based on the effect_scene."

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        effect_scene: KlingSingleImageEffectsScene,
@@ -1217,7 +1217,7 @@ class KlingSingleImageVideoEffectNode(KlingVideoEffectsBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            dual_character=False,
            effect_scene=effect_scene,
            model_name=model_name,
@@ -1253,11 +1253,11 @@ class KlingLipSyncBase(KlingNodeBase):
                f"Text is too long. Maximum length is {MAX_PROMPT_LENGTH_LIP_SYNC} characters."
            )

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingLipSyncResponse:
        """Polls the Kling API endpoint until the task reaches a terminal state."""
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_LIP_SYNC}/{task_id}",
@@ -1270,7 +1270,7 @@ class KlingLipSyncBase(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        audio: Optional[AudioInput] = None,
@@ -1287,12 +1287,12 @@ class KlingLipSyncBase(KlingNodeBase):
        self.validate_lip_sync_video(video)

        # Upload video to Comfy API and get download URL
-        video_url = upload_video_to_comfyapi(video, auth_kwargs=kwargs)
+        video_url = await upload_video_to_comfyapi(video, auth_kwargs=kwargs)
        logging.info("Uploaded video to Comfy API. URL: %s", video_url)

        # Upload the audio file to Comfy API and get download URL
        if audio:
-            audio_url = upload_audio_to_comfyapi(audio, auth_kwargs=kwargs)
+            audio_url = await upload_audio_to_comfyapi(audio, auth_kwargs=kwargs)
            logging.info("Uploaded audio to Comfy API. URL: %s", audio_url)
        else:
            audio_url = None
@@ -1319,17 +1319,17 @@ class KlingLipSyncBase(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):
@@ -1357,7 +1357,7 @@ class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):

    DESCRIPTION = "Kling Lip Sync Audio to Video Node. Syncs mouth movements in a video file to the audio content of an audio file. When using, ensure that the audio contains clearly distinguishable vocals and that the video contains a distinct face. The audio file should not be larger than 5MB. The video file should not be larger than 100MB, should have height/width between 720px and 1920px, and should be between 2s and 10s in length."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        audio: AudioInput,
@@ -1365,7 +1365,7 @@ class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            video=video,
            audio=audio,
            voice_language=voice_language,
@@ -1469,7 +1469,7 @@ class KlingLipSyncTextToVideoNode(KlingLipSyncBase):

    DESCRIPTION = "Kling Lip Sync Text to Video Node. Syncs mouth movements in a video file to a text prompt. The video file should not be larger than 100MB, should have height/width between 720px and 1920px, and should be between 2s and 10s in length."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        text: str,
@@ -1479,7 +1479,7 @@ class KlingLipSyncTextToVideoNode(KlingLipSyncBase):
        **kwargs,
    ):
        voice_id, voice_language = KlingLipSyncTextToVideoNode.get_voice_config()[voice]
-        return super().api_call(
+        return await super().api_call(
            video=video,
            text=text,
            voice_language=voice_language,
@@ -1533,10 +1533,10 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):

    DESCRIPTION = "Kling Virtual Try On Node. Input a human image and a cloth image to try on the cloth on the human. You can merge multiple clothing item pictures into one image with a white background."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVirtualTryOnResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIRTUAL_TRY_ON}/{task_id}",
@@ -1549,7 +1549,7 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        human_image: torch.Tensor,
        cloth_image: torch.Tensor,
@@ -1572,17 +1572,17 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_image_result_response(final_response)

        images = get_images_from_response(final_response)
-        return (image_result_to_node_output(images),)
+        return (await image_result_to_node_output(images),)


 class KlingImageGenerationNode(KlingImageGenerationBase):
@@ -1655,13 +1655,13 @@ class KlingImageGenerationNode(KlingImageGenerationBase):

    DESCRIPTION = "Kling Image Generation Node. Generate an image from a text prompt with an optional reference image."

-    def get_response(
+    async def get_response(
        self,
        task_id: str,
        auth_kwargs: Optional[dict[str, str]],
        node_id: Optional[str] = None,
    ) -> KlingImageGenerationsResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_IMAGE_GENERATIONS}/{task_id}",
@@ -1674,7 +1674,7 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        model_name: KlingImageGenModelName,
        prompt: str,
@@ -1690,7 +1690,11 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
    ):
        self.validate_prompt(prompt, negative_prompt)

-        if image is not None:
+        if image is None:
+            image_type = None
+        elif model_name == KlingImageGenModelName.kling_v1:
+            raise ValueError(f"The model {KlingImageGenModelName.kling_v1.value} does not support reference images.")
+        else:
            image = tensor_to_base64_string(image)

        initial_operation = SynchronousOperation(
@@ -1714,17 +1718,17 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_image_result_response(final_response)

        images = get_images_from_response(final_response)
-        return (image_result_to_node_output(images),)
+        return (await image_result_to_node_output(images),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_luma.py
+++ b/comfy_api_nodes/nodes_luma.py
@@ -38,7 +38,7 @@ from comfy_api_nodes.apinode_utils import (
 )
 from server import PromptServer

-import requests
+import aiohttp
 import torch
 from io import BytesIO

@@ -217,7 +217,7 @@ class LumaImageGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -234,19 +234,19 @@ class LumaImageGenerationNode(ComfyNodeABC):
        # handle image_luma_ref
        api_image_ref = None
        if image_luma_ref is not None:
-            api_image_ref = self._convert_luma_refs(
+            api_image_ref = await self._convert_luma_refs(
                image_luma_ref, max_refs=4, auth_kwargs=kwargs,
            )
        # handle style_luma_ref
        api_style_ref = None
        if style_image is not None:
-            api_style_ref = self._convert_style_image(
+            api_style_ref = await self._convert_style_image(
                style_image, weight=style_image_weight, auth_kwargs=kwargs,
            )
        # handle character_ref images
        character_ref = None
        if character_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                character_image, max_images=4, auth_kwargs=kwargs,
            )
            character_ref = LumaCharacterRef(
@@ -270,7 +270,7 @@ class LumaImageGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -286,19 +286,20 @@ class LumaImageGenerationNode(ComfyNodeABC):
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        img_response = requests.get(response_poll.assets.image)
-        img = process_image_response(img_response)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
        return (img,)

-    def _convert_luma_refs(
+    async def _convert_luma_refs(
        self, luma_ref: LumaReferenceChain, max_refs: int, auth_kwargs: Optional[dict[str,str]] = None
    ):
        luma_urls = []
        ref_count = 0
        for ref in luma_ref.refs:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                ref.image, max_images=1, auth_kwargs=auth_kwargs
            )
            luma_urls.append(download_urls[0])
@@ -307,13 +308,13 @@ class LumaImageGenerationNode(ComfyNodeABC):
                break
        return luma_ref.create_api_model(download_urls=luma_urls, max_refs=max_refs)

-    def _convert_style_image(
+    async def _convert_style_image(
        self, style_image: torch.Tensor, weight: float, auth_kwargs: Optional[dict[str,str]] = None
    ):
        chain = LumaReferenceChain(
            first_ref=LumaReference(image=style_image, weight=weight)
        )
-        return self._convert_luma_refs(chain, max_refs=1, auth_kwargs=auth_kwargs)
+        return await self._convert_luma_refs(chain, max_refs=1, auth_kwargs=auth_kwargs)


 class LumaImageModifyNode(ComfyNodeABC):
@@ -370,7 +371,7 @@ class LumaImageModifyNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -381,7 +382,7 @@ class LumaImageModifyNode(ComfyNodeABC):
        **kwargs,
    ):
        # first, upload image
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            image, max_images=1, auth_kwargs=kwargs,
        )
        image_url = download_urls[0]
@@ -402,7 +403,7 @@ class LumaImageModifyNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -418,10 +419,11 @@ class LumaImageModifyNode(ComfyNodeABC):
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        img_response = requests.get(response_poll.assets.image)
-        img = process_image_response(img_response)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
        return (img,)


@@ -494,7 +496,7 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -529,7 +531,7 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        if unique_id:
            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", unique_id)
@@ -549,10 +551,11 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            estimated_duration=LUMA_T2V_AVERAGE_DURATION,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.assets.video)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class LumaImageToVideoGenerationNode(ComfyNodeABC):
@@ -626,7 +629,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -644,7 +647,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            raise Exception(
                "At least one of first_image and last_image requires an input."
            )
-        keyframes = self._convert_to_keyframes(first_image, last_image, auth_kwargs=kwargs)
+        keyframes = await self._convert_to_keyframes(first_image, last_image, auth_kwargs=kwargs)
        duration = duration if model != LumaVideoModel.ray_1_6 else None
        resolution = resolution if model != LumaVideoModel.ray_1_6 else None

@@ -667,7 +670,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        if unique_id:
            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", unique_id)
@@ -687,12 +690,13 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            estimated_duration=LUMA_I2V_AVERAGE_DURATION,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.assets.video)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)

-    def _convert_to_keyframes(
+    async def _convert_to_keyframes(
        self,
        first_image: torch.Tensor = None,
        last_image: torch.Tensor = None,
@@ -703,12 +707,12 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
        frame0 = None
        frame1 = None
        if first_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                first_image, max_images=1, auth_kwargs=auth_kwargs,
            )
            frame0 = LumaImageReference(type="image", url=download_urls[0])
        if last_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                last_image, max_images=1, auth_kwargs=auth_kwargs,
            )
            frame1 = LumaImageReference(type="image", url=download_urls[0])
--- a/comfy_api_nodes/nodes_minimax.py
+++ b/comfy_api_nodes/nodes_minimax.py
@@ -86,7 +86,7 @@ class MinimaxTextToVideoNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_video(
+    async def generate_video(
        self,
        prompt_text,
        seed=0,
@@ -104,12 +104,12 @@ class MinimaxTextToVideoNode:
        # upload image, if passed in
        image_url = None
        if image is not None:
-            image_url = upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs)[0]
+            image_url = (await upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs))[0]

        # TODO: figure out how to deal with subject properly, API returns invalid params when using S2V-01 model
        subject_reference = None
        if subject is not None:
-            subject_url = upload_images_to_comfyapi(subject, max_images=1, auth_kwargs=kwargs)[0]
+            subject_url = (await upload_images_to_comfyapi(subject, max_images=1, auth_kwargs=kwargs))[0]
            subject_reference = [SubjectReferenceItem(image=subject_url)]


@@ -130,7 +130,7 @@ class MinimaxTextToVideoNode:
            ),
            auth_kwargs=kwargs,
        )
-        response = video_generate_operation.execute()
+        response = await video_generate_operation.execute()

        task_id = response.task_id
        if not task_id:
@@ -151,7 +151,7 @@ class MinimaxTextToVideoNode:
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        task_result = video_generate_operation.execute()
+        task_result = await video_generate_operation.execute()

        file_id = task_result.file_id
        if file_id is None:
@@ -167,7 +167,7 @@ class MinimaxTextToVideoNode:
            request=EmptyRequest(),
            auth_kwargs=kwargs,
        )
-        file_result = file_retrieve_operation.execute()
+        file_result = await file_retrieve_operation.execute()

        file_url = file_result.file.download_url
        if file_url is None:
@@ -182,7 +182,7 @@ class MinimaxTextToVideoNode:
                message = f"Result URL: {file_url}"
            PromptServer.instance.send_progress_text(message, unique_id)

-        video_io = download_url_to_bytesio(file_url)
+        video_io = await download_url_to_bytesio(file_url)
        if video_io is None:
            error_msg = f"Failed to download video from {file_url}"
            logging.error(error_msg)
--- a/comfy_api_nodes/nodes_moonvalley.py
+++ b/comfy_api_nodes/nodes_moonvalley.py
@@ -1,6 +1,5 @@
 import logging
 from typing import Any, Callable, Optional, TypeVar
-import random
 import torch
 from comfy_api_nodes.util.validation_utils import (
    get_image_dimensions,
@@ -95,14 +94,14 @@ def get_video_url_from_response(response) -> Optional[str]:
        return None


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, R],
    result_url_extractor: Optional[Callable[[R], str]] = None,
    node_id: Optional[str] = None,
 ) -> R:
    """Polls the Moonvalley API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            "completed",
@@ -208,20 +207,29 @@ def _get_video_dimensions(video: VideoInput) -> tuple[int, int]:
 def _validate_video_dimensions(width: int, height: int) -> None:
    """Validates video dimensions meet Moonvalley V2V requirements."""
    supported_resolutions = {
-        (1920, 1080), (1080, 1920), (1152, 1152),
-        (1536, 1152), (1152, 1536)
+        (1920, 1080),
+        (1080, 1920),
+        (1152, 1152),
+        (1536, 1152),
+        (1152, 1536),
    }

    if (width, height) not in supported_resolutions:
-        supported_list = ', '.join([f'{w}x{h}' for w, h in sorted(supported_resolutions)])
-        raise ValueError(f"Resolution {width}x{height} not supported. Supported: {supported_list}")
+        supported_list = ", ".join(
+            [f"{w}x{h}" for w, h in sorted(supported_resolutions)]
+        )
+        raise ValueError(
+            f"Resolution {width}x{height} not supported. Supported: {supported_list}"
+        )


 def _validate_container_format(video: VideoInput) -> None:
    """Validates video container format is MP4."""
    container_format = video.get_container_format()
-    if container_format not in ['mp4', 'mov,mp4,m4a,3gp,3g2,mj2']:
-        raise ValueError(f"Only MP4 container format supported. Got: {container_format}")
+    if container_format not in ["mp4", "mov,mp4,m4a,3gp,3g2,mj2"]:
+        raise ValueError(
+            f"Only MP4 container format supported. Got: {container_format}"
+        )


 def _validate_and_trim_duration(video: VideoInput) -> VideoInput:
@@ -244,7 +252,6 @@ def _trim_if_too_long(video: VideoInput, duration: float) -> VideoInput:
    return video


-
 def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
    """
    Returns a new VideoInput object trimmed from the beginning to the specified duration,
@@ -302,7 +309,9 @@ def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
        # Calculate target frame count that's divisible by 16
        fps = input_container.streams.video[0].average_rate
        estimated_frames = int(duration_sec * fps)
-        target_frames = (estimated_frames // 16) * 16  # Round down to nearest multiple of 16
+        target_frames = (
+            estimated_frames // 16
+        ) * 16  # Round down to nearest multiple of 16

        if target_frames == 0:
            raise ValueError("Video too short: need at least 16 frames for Moonvalley")
@@ -394,10 +403,10 @@ class BaseMoonvalleyVideoNode:
        else:
            return control_map["Motion Transfer"]

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> MoonvalleyPromptResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{API_PROMPTS_ENDPOINT}/{task_id}",
@@ -424,7 +433,7 @@ class BaseMoonvalleyVideoNode:
                    MoonvalleyTextToVideoInferenceParams,
                    "negative_prompt",
                    multiline=True,
-                    default="low-poly, flat shader, bad rigging, stiff animation, uncanny eyes, low-quality textures, looping glitch, cheap effect, overbloom, bloom spam, default lighting, game asset, stiff face, ugly specular, AI artifacts",
+                    default="<synthetic> <scene cut> gopro, bright, contrast, static, overexposed, vignette, artifacts, still, noise, texture, scanlines, videogame, 360 camera, VR, transition, flare, saturation, distorted, warped, wide angle, saturated, vibrant, glowing, cross dissolve, cheesy, ugly hands, mutated hands, mutant, disfigured, extra fingers, blown out, horrible, blurry, worst quality, bad, dissolve, melt, fade in, fade out, wobbly, weird, low quality, plastic, stock footage, video camera, boring",
                ),
                "resolution": (
                    IO.COMBO,
@@ -441,12 +450,11 @@ class BaseMoonvalleyVideoNode:
                        "tooltip": "Resolution of the output video",
                    },
                ),
-                # "length": (IO.COMBO,{"options":['5s','10s'], "default": '5s'}),
                "prompt_adherence": model_field_to_node_input(
                    IO.FLOAT,
                    MoonvalleyTextToVideoInferenceParams,
                    "guidance_scale",
-                    default=7.0,
+                    default=10.0,
                    step=1,
                    min=1,
                    max=20,
@@ -455,13 +463,12 @@ class BaseMoonvalleyVideoNode:
                    IO.INT,
                    MoonvalleyTextToVideoInferenceParams,
                    "seed",
-                    default=random.randint(0, 2**32 - 1),
+                    default=9,
                    min=0,
                    max=4294967295,
                    step=1,
                    display="number",
                    tooltip="Random seed value",
-                    control_after_generate=True,
                ),
                "steps": model_field_to_node_input(
                    IO.INT,
@@ -507,7 +514,7 @@ class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):
    RETURN_NAMES = ("video",)
    DESCRIPTION = "Moonvalley Marey Image to Video Node"

-    def generate(
+    async def generate(
        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
    ):
        image = kwargs.get("image", None)
@@ -532,8 +539,10 @@ class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):
        # Get MIME type from tensor - assuming PNG format for image tensors
        mime_type = "image/png"

-        image_url = upload_images_to_comfyapi(
-            image, max_images=1, auth_kwargs=kwargs, mime_type=mime_type
+        image_url = (
+            await upload_images_to_comfyapi(
+                image, max_images=1, auth_kwargs=kwargs, mime_type=mime_type
+            )
        )[0]

        request = MoonvalleyTextToVideoRequest(
@@ -549,14 +558,14 @@ class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
-        video = download_url_to_video_output(final_response.output_url)
+        video = await download_url_to_video_output(final_response.output_url)
        return (video,)


@@ -570,17 +579,39 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
        return {
            "required": {
                "prompt": model_field_to_node_input(
-                    IO.STRING, MoonvalleyVideoToVideoRequest, "prompt_text",
-                    multiline=True
+                    IO.STRING,
+                    MoonvalleyVideoToVideoRequest,
+                    "prompt_text",
+                    multiline=True,
                ),
                "negative_prompt": model_field_to_node_input(
                    IO.STRING,
                    MoonvalleyVideoToVideoInferenceParams,
                    "negative_prompt",
                    multiline=True,
-                    default="low-poly, flat shader, bad rigging, stiff animation, uncanny eyes, low-quality textures, looping glitch, cheap effect, overbloom, bloom spam, default lighting, game asset, stiff face, ugly specular, AI artifacts"
+                    default="<synthetic> <scene cut> gopro, bright, contrast, static, overexposed, vignette, artifacts, still, noise, texture, scanlines, videogame, 360 camera, VR, transition, flare, saturation, distorted, warped, wide angle, saturated, vibrant, glowing, cross dissolve, cheesy, ugly hands, mutated hands, mutant, disfigured, extra fingers, blown out, horrible, blurry, worst quality, bad, dissolve, melt, fade in, fade out, wobbly, weird, low quality, plastic, stock footage, video camera, boring",
+                ),
+                "seed": model_field_to_node_input(
+                    IO.INT,
+                    MoonvalleyVideoToVideoInferenceParams,
+                    "seed",
+                    default=9,
+                    min=0,
+                    max=4294967295,
+                    step=1,
+                    display="number",
+                    tooltip="Random seed value",
+                    control_after_generate=False,
+                ),
+                "prompt_adherence": model_field_to_node_input(
+                    IO.FLOAT,
+                    MoonvalleyVideoToVideoInferenceParams,
+                    "guidance_scale",
+                    default=10.0,
+                    step=1,
+                    min=1,
+                    max=20,
                ),
-                "seed": model_field_to_node_input(IO.INT,MoonvalleyVideoToVideoInferenceParams, "seed", default=random.randint(0, 2**32 - 1), min=0, max=4294967295, step=1, display="number", tooltip="Random seed value", control_after_generate=True),
            },
            "hidden": {
                "auth_token": "AUTH_TOKEN_COMFY_ORG",
@@ -588,7 +619,14 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
                "unique_id": "UNIQUE_ID",
            },
            "optional": {
-                "video": (IO.VIDEO, {"default": "", "multiline": False, "tooltip": "The reference video used to generate the output video. Must be at least 5 seconds long. Videos longer than 5s will be automatically trimmed. Only MP4 format supported."}),
+                "video": (
+                    IO.VIDEO,
+                    {
+                        "default": "",
+                        "multiline": False,
+                        "tooltip": "The reference video used to generate the output video. Must be at least 5 seconds long. Videos longer than 5s will be automatically trimmed. Only MP4 format supported.",
+                    },
+                ),
                "control_type": (
                    ["Motion Transfer", "Pose Transfer"],
                    {"default": "Motion Transfer"},
@@ -602,17 +640,24 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
                        "max": 100,
                        "tooltip": "Only used if control_type is 'Motion Transfer'",
                    },
-                )
-            }
+                ),
+                "image": model_field_to_node_input(
+                    IO.IMAGE,
+                    MoonvalleyTextToVideoRequest,
+                    "image_url",
+                    tooltip="The reference image used to generate the video",
+                ),
+            },
        }

    RETURN_TYPES = ("VIDEO",)
    RETURN_NAMES = ("video",)

-    def generate(
+    async def generate(
        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
    ):
        video = kwargs.get("video")
+        image = kwargs.get("image", None)

        if not video:
            raise MoonvalleyApiError("video is required")
@@ -620,8 +665,16 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
        video_url = ""
        if video:
            validated_video = validate_video_to_video_input(video)
-            video_url = upload_video_to_comfyapi(validated_video, auth_kwargs=kwargs)
+            video_url = await upload_video_to_comfyapi(
+                validated_video, auth_kwargs=kwargs
+            )
+        mime_type = "image/png"

+        if not image is None:
+            validate_input_image(image, with_frame_conditioning=True)
+            image_url = await upload_images_to_comfyapi(
+                image=image, auth_kwargs=kwargs, max_images=1, mime_type=mime_type
+            )
        control_type = kwargs.get("control_type")
        motion_intensity = kwargs.get("motion_intensity")

@@ -631,12 +684,12 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
        # Only include motion_intensity for Motion Transfer
        control_params = {}
        if control_type == "Motion Transfer" and motion_intensity is not None:
-            control_params['motion_intensity'] = motion_intensity
+            control_params["motion_intensity"] = motion_intensity

-        inference_params=MoonvalleyVideoToVideoInferenceParams(
+        inference_params = MoonvalleyVideoToVideoInferenceParams(
            negative_prompt=negative_prompt,
            seed=kwargs.get("seed"),
-            control_params=control_params
+            control_params=control_params,
        )

        control = self.parseControlParameter(control_type)
@@ -647,6 +700,7 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
            prompt_text=prompt,
            inference_params=inference_params,
        )
+        request.image_url = image_url if not image is None else None

        initial_operation = SynchronousOperation(
            endpoint=ApiEndpoint(
@@ -658,15 +712,15 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )

-        video = download_url_to_video_output(final_response.output_url)
+        video = await download_url_to_video_output(final_response.output_url)

        return (video,)

@@ -688,21 +742,21 @@ class MoonvalleyTxt2VideoNode(BaseMoonvalleyVideoNode):
                del input_types["optional"][param]
        return input_types

-    def generate(
+    async def generate(
        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
    ):
        validate_prompts(prompt, negative_prompt, MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
        width_height = self.parseWidthHeightFromRes(kwargs.get("resolution"))

-        inference_params=MoonvalleyTextToVideoInferenceParams(
-                    negative_prompt=negative_prompt,
-                    steps=kwargs.get("steps"),
-                    seed=kwargs.get("seed"),
-                    guidance_scale=kwargs.get("prompt_adherence"),
-                    num_frames=128,
-                    width=width_height.get("width"),
-                    height=width_height.get("height"),
-                )
+        inference_params = MoonvalleyTextToVideoInferenceParams(
+            negative_prompt=negative_prompt,
+            steps=kwargs.get("steps"),
+            seed=kwargs.get("seed"),
+            guidance_scale=kwargs.get("prompt_adherence"),
+            num_frames=128,
+            width=width_height.get("width"),
+            height=width_height.get("height"),
+        )
        request = MoonvalleyTextToVideoRequest(
            prompt_text=prompt, inference_params=inference_params
        )
@@ -717,15 +771,15 @@ class MoonvalleyTxt2VideoNode(BaseMoonvalleyVideoNode):
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )

-        video = download_url_to_video_output(final_response.output_url)
+        video = await download_url_to_video_output(final_response.output_url)
        return (video,)


--- a/comfy_api_nodes/nodes_openai.py
+++ b/comfy_api_nodes/nodes_openai.py
@@ -163,7 +163,7 @@ class OpenAIDalle2(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -233,9 +233,9 @@ class OpenAIDalle2(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -311,7 +311,7 @@ class OpenAIDalle3(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -343,9 +343,9 @@ class OpenAIDalle3(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -446,7 +446,7 @@ class OpenAIGPTImage1(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -464,8 +464,6 @@ class OpenAIGPTImage1(ComfyNodeABC):
        path = "/proxy/openai/images/generations"
        content_type = "application/json"
        request_class = OpenAIImageGenerationRequest
-        img_binaries = []
-        mask_binary = None
        files = []

        if image is not None:
@@ -484,14 +482,11 @@ class OpenAIGPTImage1(ComfyNodeABC):
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format="PNG")
                img_byte_arr.seek(0)
-                img_binary = img_byte_arr
-                img_binary.name = f"image_{i}.png"

-                img_binaries.append(img_binary)
                if batch_size == 1:
-                    files.append(("image", img_binary))
+                    files.append(("image", (f"image_{i}.png", img_byte_arr, "image/png")))
                else:
-                    files.append(("image[]", img_binary))
+                    files.append(("image[]", (f"image_{i}.png", img_byte_arr, "image/png")))

        if mask is not None:
            if image is None:
@@ -511,9 +506,7 @@ class OpenAIGPTImage1(ComfyNodeABC):
            mask_img_byte_arr = io.BytesIO()
            mask_img.save(mask_img_byte_arr, format="PNG")
            mask_img_byte_arr.seek(0)
-            mask_binary = mask_img_byte_arr
-            mask_binary.name = "mask.png"
-            files.append(("mask", mask_binary))
+            files.append(("mask", ("mask.png", mask_img_byte_arr, "image/png")))

        # Build the operation
        operation = SynchronousOperation(
@@ -537,9 +530,9 @@ class OpenAIGPTImage1(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -623,7 +616,7 @@ class OpenAIChatNode(OpenAITextNode):

    DESCRIPTION = "Generate text responses from an OpenAI model."

-    def get_result_response(
+    async def get_result_response(
        self,
        response_id: str,
        include: Optional[list[Includable]] = None,
@@ -639,7 +632,7 @@ class OpenAIChatNode(OpenAITextNode):
                creation above for more information.

        """
-        return PollingOperation(
+        return await PollingOperation(
            poll_endpoint=ApiEndpoint(
                path=f"{RESPONSES_ENDPOINT}/{response_id}",
                method=HttpMethod.GET,
@@ -784,7 +777,7 @@ class OpenAIChatNode(OpenAITextNode):

        self.history[session_id] = new_history

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        persist_context: bool,
@@ -815,7 +808,7 @@ class OpenAIChatNode(OpenAITextNode):
            previous_response_id = None

        # Create response
-        create_response = SynchronousOperation(
+        create_response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path=RESPONSES_ENDPOINT,
                method=HttpMethod.POST,
@@ -848,7 +841,7 @@ class OpenAIChatNode(OpenAITextNode):
        response_id = create_response.id

        # Get result output
-        result_response = self.get_result_response(response_id, auth_kwargs=kwargs)
+        result_response = await self.get_result_response(response_id, auth_kwargs=kwargs)
        output_text = self.parse_output_text_from_response(result_response)

        # Update history
--- a/comfy_api_nodes/nodes_pika.py
+++ b/comfy_api_nodes/nodes_pika.py
@@ -122,7 +122,7 @@ class PikaNodeBase(ComfyNodeABC):
    FUNCTION = "api_call"
    RETURN_TYPES = ("VIDEO",)

-    def poll_for_task_status(
+    async def poll_for_task_status(
        self,
        task_id: str,
        auth_kwargs: Optional[dict[str, str]] = None,
@@ -152,9 +152,9 @@ class PikaNodeBase(ComfyNodeABC):
            node_id=node_id,
            estimated_duration=60
        )
-        return polling_operation.execute()
+        return await polling_operation.execute()

-    def execute_task(
+    async def execute_task(
        self,
        initial_operation: SynchronousOperation[R, PikaGenerateResponse],
        auth_kwargs: Optional[dict[str, str]] = None,
@@ -169,14 +169,14 @@ class PikaNodeBase(ComfyNodeABC):
        Returns:
            A tuple containing the video file as a VIDEO output.
        """
-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        if not is_valid_initial_response(initial_response):
            error_msg = f"Pika initial request failed. Code: {initial_response.code}, Message: {initial_response.message}, Data: {initial_response.data}"
            logging.error(error_msg)
            raise PikaApiError(error_msg)

        task_id = initial_response.video_id
-        final_response = self.poll_for_task_status(task_id, auth_kwargs)
+        final_response = await self.poll_for_task_status(task_id, auth_kwargs)
        if not is_valid_video_response(final_response):
            error_msg = (
                f"Pika task {task_id} succeeded but no video data found in response."
@@ -187,7 +187,7 @@ class PikaNodeBase(ComfyNodeABC):
        video_url = str(final_response.url)
        logging.info("Pika task %s succeeded. Video URL: %s", task_id, video_url)

-        return (download_url_to_video_output(video_url),)
+        return (await download_url_to_video_output(video_url),)


 class PikaImageToVideoV2_2(PikaNodeBase):
@@ -212,7 +212,7 @@ class PikaImageToVideoV2_2(PikaNodeBase):

    DESCRIPTION = "Sends an image and prompt to the Pika API v2.2 to generate a video."

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt_text: str,
@@ -251,7 +251,7 @@ class PikaImageToVideoV2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaTextToVideoNodeV2_2(PikaNodeBase):
@@ -281,7 +281,7 @@ class PikaTextToVideoNodeV2_2(PikaNodeBase):

    DESCRIPTION = "Sends a text prompt to the Pika API v2.2 to generate a video."

-    def api_call(
+    async def api_call(
        self,
        prompt_text: str,
        negative_prompt: str,
@@ -311,7 +311,7 @@ class PikaTextToVideoNodeV2_2(PikaNodeBase):
            content_type="application/x-www-form-urlencoded",
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaScenesV2_2(PikaNodeBase):
@@ -361,7 +361,7 @@ class PikaScenesV2_2(PikaNodeBase):

    DESCRIPTION = "Combine your images to create a video with the objects in them. Upload multiple images as ingredients and generate a high-quality video that incorporates all of them."

-    def api_call(
+    async def api_call(
        self,
        prompt_text: str,
        negative_prompt: str,
@@ -420,7 +420,7 @@ class PikaScenesV2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikAdditionsNode(PikaNodeBase):
@@ -462,7 +462,7 @@ class PikAdditionsNode(PikaNodeBase):

    DESCRIPTION = "Add any object or image into your video. Upload a video and specify what you'd like to add to create a seamlessly integrated result."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        image: torch.Tensor,
@@ -481,10 +481,10 @@ class PikAdditionsNode(PikaNodeBase):
        image_bytes_io = tensor_to_bytesio(image)
        image_bytes_io.seek(0)

-        pika_files = [
-            ("video", ("video.mp4", video_bytes_io, "video/mp4")),
-            ("image", ("image.png", image_bytes_io, "image/png")),
-        ]
+        pika_files = {
+            "video": ("video.mp4", video_bytes_io, "video/mp4"),
+            "image": ("image.png", image_bytes_io, "image/png"),
+        }

        # Prepare non-file data
        pika_request_data = PikaBodyGeneratePikadditionsGeneratePikadditionsPost(
@@ -506,7 +506,7 @@ class PikAdditionsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaSwapsNode(PikaNodeBase):
@@ -558,7 +558,7 @@ class PikaSwapsNode(PikaNodeBase):
    DESCRIPTION = "Swap out any object or region of your video with a new image or object. Define areas to replace either with a mask or coordinates."
    RETURN_TYPES = ("VIDEO",)

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        image: torch.Tensor,
@@ -587,11 +587,11 @@ class PikaSwapsNode(PikaNodeBase):
        image_bytes_io = tensor_to_bytesio(image)
        image_bytes_io.seek(0)

-        pika_files = [
-            ("video", ("video.mp4", video_bytes_io, "video/mp4")),
-            ("image", ("image.png", image_bytes_io, "image/png")),
-            ("modifyRegionMask", ("mask.png", mask_bytes_io, "image/png")),
-        ]
+        pika_files = {
+            "video": ("video.mp4", video_bytes_io, "video/mp4"),
+            "image": ("image.png", image_bytes_io, "image/png"),
+            "modifyRegionMask": ("mask.png", mask_bytes_io, "image/png"),
+        }

        # Prepare non-file data
        pika_request_data = PikaBodyGeneratePikaswapsGeneratePikaswapsPost(
@@ -613,7 +613,7 @@ class PikaSwapsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaffectsNode(PikaNodeBase):
@@ -664,7 +664,7 @@ class PikaffectsNode(PikaNodeBase):

    DESCRIPTION = "Generate a video with a specific Pikaffect. Supported Pikaffects: Cake-ify, Crumble, Crush, Decapitate, Deflate, Dissolve, Explode, Eye-pop, Inflate, Levitate, Melt, Peel, Poke, Squish, Ta-da, Tear"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        pikaffect: str,
@@ -693,7 +693,7 @@ class PikaffectsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaStartEndFrameNode2_2(PikaNodeBase):
@@ -718,7 +718,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):

    DESCRIPTION = "Generate a video by combining your first and last frame. Upload two images to define the start and end points, and let the AI create a smooth transition between them."

-    def api_call(
+    async def api_call(
        self,
        image_start: torch.Tensor,
        image_end: torch.Tensor,
@@ -732,10 +732,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):
    ) -> tuple[VideoFromFile]:

        pika_files = [
-            (
-                "keyFrames",
-                ("image_start.png", tensor_to_bytesio(image_start), "image/png"),
-            ),
+            ("keyFrames", ("image_start.png", tensor_to_bytesio(image_start), "image/png")),
            ("keyFrames", ("image_end.png", tensor_to_bytesio(image_end), "image/png")),
        ]

@@ -758,7 +755,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_pixverse.py
+++ b/comfy_api_nodes/nodes_pixverse.py
@@ -30,7 +30,7 @@ from comfy.comfy_types.node_typing import IO, ComfyNodeABC
 from comfy_api.input_impl import VideoFromFile

 import torch
-import requests
+import aiohttp
 from io import BytesIO


@@ -47,7 +47,7 @@ def get_video_url_from_response(
    return str(response.Resp.url)


-def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
+async def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
    # first, upload image to Pixverse and get image id to use in actual generation call
    files = {"image": tensor_to_bytesio(image)}
    operation = SynchronousOperation(
@@ -62,7 +62,7 @@ def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
        content_type="multipart/form-data",
        auth_kwargs=auth_kwargs,
    )
-    response_upload: PixverseImageUploadResponse = operation.execute()
+    response_upload: PixverseImageUploadResponse = await operation.execute()

    if response_upload.Resp is None:
        raise Exception(
@@ -164,7 +164,7 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -205,7 +205,7 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -229,11 +229,11 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class PixverseImageToVideoNode(ComfyNodeABC):
@@ -302,7 +302,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -316,7 +316,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
        **kwargs,
    ):
        validate_string(prompt, strip_whitespace=False)
-        img_id = upload_image_to_pixverse(image, auth_kwargs=kwargs)
+        img_id = await upload_image_to_pixverse(image, auth_kwargs=kwargs)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -345,7 +345,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -369,10 +369,11 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_I2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class PixverseTransitionVideoNode(ComfyNodeABC):
@@ -436,7 +437,7 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        first_frame: torch.Tensor,
        last_frame: torch.Tensor,
@@ -450,8 +451,8 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
        **kwargs,
    ):
        validate_string(prompt, strip_whitespace=False)
-        first_frame_id = upload_image_to_pixverse(first_frame, auth_kwargs=kwargs)
-        last_frame_id = upload_image_to_pixverse(last_frame, auth_kwargs=kwargs)
+        first_frame_id = await upload_image_to_pixverse(first_frame, auth_kwargs=kwargs)
+        last_frame_id = await upload_image_to_pixverse(last_frame, auth_kwargs=kwargs)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -480,7 +481,7 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -504,10 +505,11 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_recraft.py
+++ b/comfy_api_nodes/nodes_recraft.py
@@ -37,7 +37,7 @@ from io import BytesIO
 from PIL import UnidentifiedImageError


-def handle_recraft_file_request(
+async def handle_recraft_file_request(
        image: torch.Tensor,
        path: str,
        mask: torch.Tensor=None,
@@ -71,13 +71,13 @@ def handle_recraft_file_request(
            auth_kwargs=auth_kwargs,
            multipart_parser=recraft_multipart_parser,
        )
-        response: RecraftImageGenerationResponse = operation.execute()
+        response: RecraftImageGenerationResponse = await operation.execute()
        all_bytesio = []
        if response.image is not None:
-            all_bytesio.append(download_url_to_bytesio(response.image.url, timeout=timeout))
+            all_bytesio.append(await download_url_to_bytesio(response.image.url, timeout=timeout))
        else:
            for data in response.data:
-                all_bytesio.append(download_url_to_bytesio(data.url, timeout=timeout))
+                all_bytesio.append(await download_url_to_bytesio(data.url, timeout=timeout))

        return all_bytesio

@@ -395,7 +395,7 @@ class RecraftTextToImageNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        size: str,
@@ -439,7 +439,7 @@ class RecraftTextToImageNode:
            ),
            auth_kwargs=kwargs,
        )
-        response: RecraftImageGenerationResponse = operation.execute()
+        response: RecraftImageGenerationResponse = await operation.execute()
        images = []
        urls = []
        for data in response.data:
@@ -451,7 +451,7 @@ class RecraftTextToImageNode:
                        f"Result URL: {urls_string}", unique_id
                    )
                image = bytesio_to_image_tensor(
-                    download_url_to_bytesio(data.url, timeout=1024)
+                    await download_url_to_bytesio(data.url, timeout=1024)
                )
            if len(image.shape) < 4:
                image = image.unsqueeze(0)
@@ -538,7 +538,7 @@ class RecraftImageToImageNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -578,7 +578,7 @@ class RecraftImageToImageNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/imageToImage",
                request=request,
@@ -654,7 +654,7 @@ class RecraftImageInpaintingNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        mask: torch.Tensor,
@@ -690,7 +690,7 @@ class RecraftImageInpaintingNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                mask=mask[i:i+1],
                path="/proxy/recraft/images/inpaint",
@@ -779,7 +779,7 @@ class RecraftTextToVectorNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        substyle: str,
@@ -821,7 +821,7 @@ class RecraftTextToVectorNode:
            ),
            auth_kwargs=kwargs,
        )
-        response: RecraftImageGenerationResponse = operation.execute()
+        response: RecraftImageGenerationResponse = await operation.execute()
        svg_data = []
        urls = []
        for data in response.data:
@@ -831,7 +831,7 @@ class RecraftTextToVectorNode:
                PromptServer.instance.send_progress_text(
                    f"Result URL: {' '.join(urls)}", unique_id
                )
-            svg_data.append(download_url_to_bytesio(data.url, timeout=1024))
+            svg_data.append(await download_url_to_bytesio(data.url, timeout=1024))

        return (SVG(svg_data),)

@@ -861,7 +861,7 @@ class RecraftVectorizeImageNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        **kwargs,
@@ -870,7 +870,7 @@ class RecraftVectorizeImageNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/vectorize",
                auth_kwargs=kwargs,
@@ -942,7 +942,7 @@ class RecraftReplaceBackgroundNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -973,7 +973,7 @@ class RecraftReplaceBackgroundNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/replaceBackground",
                request=request,
@@ -1011,7 +1011,7 @@ class RecraftRemoveBackgroundNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        **kwargs,
@@ -1020,7 +1020,7 @@ class RecraftRemoveBackgroundNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/removeBackground",
                auth_kwargs=kwargs,
@@ -1062,7 +1062,7 @@ class RecraftCrispUpscaleNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        **kwargs,
@@ -1071,7 +1071,7 @@ class RecraftCrispUpscaleNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path=self.RECRAFT_PATH,
                auth_kwargs=kwargs,
--- a/comfy_api_nodes/nodes_rodin.py
+++ b/comfy_api_nodes/nodes_rodin.py
@@ -9,11 +9,10 @@ from __future__ import annotations
 from inspect import cleandoc
 from comfy.comfy_types.node_typing import IO
 import folder_paths as comfy_paths
-import requests
+import aiohttp
 import os
 import datetime
-import shutil
-import time
+import asyncio
 import io
 import logging
 import math
@@ -66,7 +65,6 @@ def create_task_error(response: Rodin3DGenerateResponse):
    return hasattr(response, "error")


-
 class Rodin3DAPI:
    """
    Generate 3D Assets using Rodin API
@@ -123,8 +121,8 @@ class Rodin3DAPI:
        else:
            return "Generating"

-    def CreateGenerateTask(self, images=None, seed=1, material="PBR", quality="medium", tier="Regular", mesh_mode="Quad", **kwargs):
-        if images == None:
+    async def create_generate_task(self, images=None, seed=1, material="PBR", quality="medium", tier="Regular", mesh_mode="Quad", **kwargs):
+        if images is None:
            raise Exception("Rodin 3D generate requires at least 1 image.")
        if len(images) >= 5:
            raise Exception("Rodin 3D generate requires up to 5 image.")
@@ -155,7 +153,7 @@ class Rodin3DAPI:
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if create_task_error(response):
            error_message = f"Rodin3D Create 3D generate Task Failed. Message: {response.message}, error: {response.error}"
@@ -168,7 +166,7 @@ class Rodin3DAPI:
        logging.info(f"[ Rodin3D API - Submit Jobs ] UUID: {task_uuid}")
        return task_uuid, subscription_key

-    def poll_for_task_status(self, subscription_key, **kwargs) -> Rodin3DCheckStatusResponse:
+    async def poll_for_task_status(self, subscription_key, **kwargs) -> Rodin3DCheckStatusResponse:

        path = "/proxy/rodin/api/v2/status"

@@ -191,11 +189,9 @@ class Rodin3DAPI:

        logging.info("[ Rodin3D API - CheckStatus ] Generate Start!")

-        return poll_operation.execute()
+        return await poll_operation.execute()

-
-
-    def GetRodinDownloadList(self, uuid, **kwargs) -> Rodin3DDownloadResponse:
+    async def get_rodin_download_list(self, uuid, **kwargs) -> Rodin3DDownloadResponse:
        logging.info("[ Rodin3D API - Downloading ] Generate Successfully!")

        path = "/proxy/rodin/api/v2/download"
@@ -212,53 +208,59 @@ class Rodin3DAPI:
            auth_kwargs=kwargs
        )

-        return operation.execute()
+        return await operation.execute()

-    def GetQualityAndMode(self, PolyCount):
-        if PolyCount == "200K-Triangle":
+    def get_quality_mode(self, poly_count):
+        if poly_count == "200K-Triangle":
            mesh_mode = "Raw"
            quality = "medium"
        else:
            mesh_mode = "Quad"
-            if PolyCount == "4K-Quad":
+            if poly_count == "4K-Quad":
                quality = "extra-low"
-            elif PolyCount == "8K-Quad":
+            elif poly_count == "8K-Quad":
                quality = "low"
-            elif PolyCount == "18K-Quad":
+            elif poly_count == "18K-Quad":
                quality = "medium"
-            elif PolyCount == "50K-Quad":
+            elif poly_count == "50K-Quad":
                quality = "high"
            else:
                quality = "medium"

        return mesh_mode, quality

-    def DownLoadFiles(self, Url_List):
-        Save_path = os.path.join(comfy_paths.get_output_directory(), "Rodin3D", datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
-        os.makedirs(Save_path, exist_ok=True)
+    async def download_files(self, url_list):
+        save_path = os.path.join(comfy_paths.get_output_directory(), "Rodin3D", datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
+        os.makedirs(save_path, exist_ok=True)
        model_file_path = None
-        for Item in Url_List.list:
-            url = Item.url
-            file_name = Item.name
-            file_path = os.path.join(Save_path, file_name)
-            if file_path.endswith(".glb"):
-                model_file_path = file_path
-            logging.info(f"[ Rodin3D API - download_files ] Downloading file: {file_path}")
-            max_retries = 5
-            for attempt in range(max_retries):
-                try:
-                    with requests.get(url, stream=True) as r:
-                        r.raise_for_status()
-                        with open(file_path, "wb") as f:
-                            shutil.copyfileobj(r.raw, f)
-                    break
-                except Exception as e:
-                    logging.info(f"[ Rodin3D API - download_files ] Error downloading {file_path}:{e}")
-                    if attempt < max_retries - 1:
-                        logging.info("Retrying...")
-                        time.sleep(2)
-                    else:
-                        logging.info(f"[ Rodin3D API - download_files ] Failed to download {file_path} after {max_retries} attempts.")
+        async with aiohttp.ClientSession() as session:
+            for i in url_list.list:
+                url = i.url
+                file_name = i.name
+                file_path = os.path.join(save_path, file_name)
+                if file_path.endswith(".glb"):
+                    model_file_path = file_path
+                logging.info(f"[ Rodin3D API - download_files ] Downloading file: {file_path}")
+                max_retries = 5
+                for attempt in range(max_retries):
+                    try:
+                        async with session.get(url) as resp:
+                            resp.raise_for_status()
+                            with open(file_path, "wb") as f:
+                                async for chunk in resp.content.iter_chunked(32 * 1024):
+                                    f.write(chunk)
+                        break
+                    except Exception as e:
+                        logging.info(f"[ Rodin3D API - download_files ] Error downloading {file_path}:{e}")
+                        if attempt < max_retries - 1:
+                            logging.info("Retrying...")
+                            await asyncio.sleep(2)
+                        else:
+                            logging.info(
+                                "[ Rodin3D API - download_files ] Failed to download %s after %s attempts.",
+                                file_path,
+                                max_retries,
+                            )

        return model_file_path

@@ -285,7 +287,7 @@ class Rodin3D_Regular(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -298,14 +300,17 @@ class Rodin3D_Regular(Rodin3DAPI):
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
-        mesh_mode, quality = self.GetQualityAndMode(Polygon_count)
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=Material_Type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        mesh_mode, quality = self.get_quality_mode(Polygon_count)
+        task_uuid, subscription_key = await self.create_generate_task(images=m_images, seed=Seed, material=Material_Type,
+                                                                quality=quality, tier=tier, mesh_mode=mesh_mode,
+                                                                **kwargs)
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

+
 class Rodin3D_Detail(Rodin3DAPI):
    @classmethod
    def INPUT_TYPES(s):
@@ -328,7 +333,7 @@ class Rodin3D_Detail(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -341,14 +346,17 @@ class Rodin3D_Detail(Rodin3DAPI):
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
-        mesh_mode, quality = self.GetQualityAndMode(Polygon_count)
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=Material_Type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        mesh_mode, quality = self.get_quality_mode(Polygon_count)
+        task_uuid, subscription_key = await self.create_generate_task(images=m_images, seed=Seed, material=Material_Type,
+                                                                quality=quality, tier=tier, mesh_mode=mesh_mode,
+                                                                **kwargs)
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

+
 class Rodin3D_Smooth(Rodin3DAPI):
    @classmethod
    def INPUT_TYPES(s):
@@ -371,7 +379,7 @@ class Rodin3D_Smooth(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -384,14 +392,17 @@ class Rodin3D_Smooth(Rodin3DAPI):
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
-        mesh_mode, quality = self.GetQualityAndMode(Polygon_count)
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=Material_Type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        mesh_mode, quality = self.get_quality_mode(Polygon_count)
+        task_uuid, subscription_key = await self.create_generate_task(images=m_images, seed=Seed, material=Material_Type,
+                                                                quality=quality, tier=tier, mesh_mode=mesh_mode,
+                                                                **kwargs)
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

+
 class Rodin3D_Sketch(Rodin3DAPI):
    @classmethod
    def INPUT_TYPES(s):
@@ -423,7 +434,7 @@ class Rodin3D_Sketch(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -437,10 +448,12 @@ class Rodin3D_Sketch(Rodin3DAPI):
        material_type = "PBR"
        quality = "medium"
        mesh_mode = "Quad"
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=material_type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        task_uuid, subscription_key = await self.create_generate_task(
+            images=m_images, seed=Seed, material=material_type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs
+        )
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

--- a/comfy_api_nodes/nodes_runway.py
+++ b/comfy_api_nodes/nodes_runway.py
@@ -99,14 +99,14 @@ def validate_input_image(image: torch.Tensor) -> bool:
    return image.shape[2] < 8000 and image.shape[1] < 8000


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, TaskStatusResponse],
    estimated_duration: Optional[int] = None,
    node_id: Optional[str] = None,
 ) -> TaskStatusResponse:
    """Polls the Runway API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            TaskStatus.SUCCEEDED.value,
@@ -115,7 +115,7 @@ def poll_until_finished(
            TaskStatus.FAILED.value,
            TaskStatus.CANCELLED.value,
        ],
-        status_extractor=lambda response: (response.status.value),
+        status_extractor=lambda response: response.status.value,
        auth_kwargs=auth_kwargs,
        result_url_extractor=get_video_url_from_task_status,
        estimated_duration=estimated_duration,
@@ -167,11 +167,11 @@ class RunwayVideoGenNode(ComfyNodeABC):
            )
        return True

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> RunwayImageToVideoResponse:
        """Poll the task status until it is finished then get the response."""
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_GET_TASK_STATUS}/{task_id}",
@@ -183,7 +183,7 @@ class RunwayVideoGenNode(ComfyNodeABC):
            node_id=node_id,
        )

-    def generate_video(
+    async def generate_video(
        self,
        request: RunwayImageToVideoRequest,
        auth_kwargs: dict[str, str],
@@ -200,15 +200,15 @@ class RunwayVideoGenNode(ComfyNodeABC):
            auth_kwargs=auth_kwargs,
        )

-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        self.validate_task_created(initial_response)
        task_id = initial_response.id

-        final_response = self.get_response(task_id, auth_kwargs, node_id)
+        final_response = await self.get_response(task_id, auth_kwargs, node_id)
        self.validate_response(final_response)

        video_url = get_video_url_from_task_status(final_response)
-        return (download_url_to_video_output(video_url),)
+        return (await download_url_to_video_output(video_url),)


 class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
@@ -250,7 +250,7 @@ class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        start_frame: torch.Tensor,
@@ -265,7 +265,7 @@ class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
        validate_input_image(start_frame)

        # Upload image
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            start_frame,
            max_images=1,
            mime_type="image/png",
@@ -274,7 +274,7 @@ class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
        if len(download_urls) != 1:
            raise RunwayApiError("Failed to upload one or more images to comfy api.")

-        return self.generate_video(
+        return await self.generate_video(
            RunwayImageToVideoRequest(
                promptText=prompt,
                seed=seed,
@@ -333,7 +333,7 @@ class RunwayImageToVideoNodeGen4(RunwayVideoGenNode):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        start_frame: torch.Tensor,
@@ -348,7 +348,7 @@ class RunwayImageToVideoNodeGen4(RunwayVideoGenNode):
        validate_input_image(start_frame)

        # Upload image
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            start_frame,
            max_images=1,
            mime_type="image/png",
@@ -357,7 +357,7 @@ class RunwayImageToVideoNodeGen4(RunwayVideoGenNode):
        if len(download_urls) != 1:
            raise RunwayApiError("Failed to upload one or more images to comfy api.")

-        return self.generate_video(
+        return await self.generate_video(
            RunwayImageToVideoRequest(
                promptText=prompt,
                seed=seed,
@@ -382,10 +382,10 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):

    DESCRIPTION = "Upload first and last keyframes, draft a prompt, and generate a video. More complex transitions, such as cases where the Last frame is completely different from the First frame, may benefit from the longer 10s duration. This would give the generation more time to smoothly transition between the two inputs. Before diving in, review these best practices to ensure that your input selections will set your generation up for success: https://help.runwayml.com/hc/en-us/articles/34170748696595-Creating-with-Keyframes-on-Gen-3."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> RunwayImageToVideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_GET_TASK_STATUS}/{task_id}",
@@ -437,7 +437,7 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        start_frame: torch.Tensor,
@@ -455,7 +455,7 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):

        # Upload images
        stacked_input_images = image_tensor_pair_to_batch(start_frame, end_frame)
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            stacked_input_images,
            max_images=2,
            mime_type="image/png",
@@ -464,7 +464,7 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):
        if len(download_urls) != 2:
            raise RunwayApiError("Failed to upload one or more images to comfy api.")

-        return self.generate_video(
+        return await self.generate_video(
            RunwayImageToVideoRequest(
                promptText=prompt,
                seed=seed,
@@ -543,11 +543,11 @@ class RunwayTextToImageNode(ComfyNodeABC):
            )
        return True

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> TaskStatusResponse:
        """Poll the task status until it is finished then get the response."""
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_GET_TASK_STATUS}/{task_id}",
@@ -559,7 +559,7 @@ class RunwayTextToImageNode(ComfyNodeABC):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        ratio: str,
@@ -574,7 +574,7 @@ class RunwayTextToImageNode(ComfyNodeABC):
        reference_images = None
        if reference_image is not None:
            validate_input_image(reference_image)
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                reference_image,
                max_images=1,
                mime_type="image/png",
@@ -605,19 +605,19 @@ class RunwayTextToImageNode(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        self.validate_task_created(initial_response)
        task_id = initial_response.id

        # Poll for completion
-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        self.validate_response(final_response)

        # Download and return image
        image_url = get_image_url_from_task_status(final_response)
-        return (download_url_to_image_tensor(image_url),)
+        return (await download_url_to_image_tensor(image_url),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_stability.py
+++ b/comfy_api_nodes/nodes_stability.py
@@ -124,7 +124,7 @@ class StabilityStableImageUltraNode:
            },
        }

-    def api_call(self, prompt: str, aspect_ratio: str, style_preset: str, seed: int,
+    async def api_call(self, prompt: str, aspect_ratio: str, style_preset: str, seed: int,
                 negative_prompt: str=None, image: torch.Tensor = None, image_denoise: float=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
@@ -163,7 +163,7 @@ class StabilityStableImageUltraNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stable Image Ultra generation failed: {response_api.finish_reason}.")
@@ -257,7 +257,7 @@ class StabilityStableImageSD_3_5Node:
            },
        }

-    def api_call(self, model: str, prompt: str, aspect_ratio: str, style_preset: str, seed: int, cfg_scale: float,
+    async def api_call(self, model: str, prompt: str, aspect_ratio: str, style_preset: str, seed: int, cfg_scale: float,
                 negative_prompt: str=None, image: torch.Tensor = None, image_denoise: float=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
@@ -302,7 +302,7 @@ class StabilityStableImageSD_3_5Node:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stable Diffusion 3.5 Image generation failed: {response_api.finish_reason}.")
@@ -374,7 +374,7 @@ class StabilityUpscaleConservativeNode:
            },
        }

-    def api_call(self, image: torch.Tensor, prompt: str, creativity: float, seed: int, negative_prompt: str=None,
+    async def api_call(self, image: torch.Tensor, prompt: str, creativity: float, seed: int, negative_prompt: str=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
        image_binary = tensor_to_bytesio(image, total_pixels=1024*1024).read()
@@ -403,7 +403,7 @@ class StabilityUpscaleConservativeNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Conservative generation failed: {response_api.finish_reason}.")
@@ -480,7 +480,7 @@ class StabilityUpscaleCreativeNode:
            },
        }

-    def api_call(self, image: torch.Tensor, prompt: str, creativity: float, style_preset: str, seed: int, negative_prompt: str=None,
+    async def api_call(self, image: torch.Tensor, prompt: str, creativity: float, style_preset: str, seed: int, negative_prompt: str=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
        image_binary = tensor_to_bytesio(image, total_pixels=1024*1024).read()
@@ -512,7 +512,7 @@ class StabilityUpscaleCreativeNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -527,7 +527,7 @@ class StabilityUpscaleCreativeNode:
            status_extractor=lambda x: get_async_dummy_status(x),
            auth_kwargs=kwargs,
        )
-        response_poll: StabilityResultsGetResponse = operation.execute()
+        response_poll: StabilityResultsGetResponse = await operation.execute()

        if response_poll.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Creative generation failed: {response_poll.finish_reason}.")
@@ -563,8 +563,7 @@ class StabilityUpscaleFastNode:
            },
        }

-    def api_call(self, image: torch.Tensor,
-                 **kwargs):
+    async def api_call(self, image: torch.Tensor, **kwargs):
        image_binary = tensor_to_bytesio(image, total_pixels=4096*4096).read()

        files = {
@@ -583,7 +582,7 @@ class StabilityUpscaleFastNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Fast failed: {response_api.finish_reason}.")
--- a/comfy_api_nodes/nodes_tripo.py
+++ b/comfy_api_nodes/nodes_tripo.py
@@ -37,8 +37,8 @@ from comfy_api_nodes.apinode_utils import (
 )


-def upload_image_to_tripo(image, **kwargs):
-    urls = upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs)
+async def upload_image_to_tripo(image, **kwargs):
+    urls = await upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs)
    return TripoFileReference(TripoUrlReference(url=urls[0], type="jpeg"))

 def get_model_url_from_response(response: TripoTaskResponse) -> str:
@@ -49,7 +49,7 @@ def get_model_url_from_response(response: TripoTaskResponse) -> str:
    raise RuntimeError(f"Failed to get model url from response: {response}")


-def poll_until_finished(
+async def poll_until_finished(
    kwargs: dict[str, str],
    response: TripoTaskResponse,
 ) -> tuple[str, str]:
@@ -57,7 +57,7 @@ def poll_until_finished(
    if response.code != 0:
        raise RuntimeError(f"Failed to generate mesh: {response.error}")
    task_id = response.data.task_id
-    response_poll = PollingOperation(
+    response_poll = await PollingOperation(
        poll_endpoint=ApiEndpoint(
            path=f"/proxy/tripo/v2/openapi/task/{task_id}",
            method=HttpMethod.GET,
@@ -80,7 +80,7 @@ def poll_until_finished(
    ).execute()
    if response_poll.data.status == TripoTaskStatus.SUCCESS:
        url = get_model_url_from_response(response_poll)
-        bytesio = download_url_to_bytesio(url)
+        bytesio = await download_url_to_bytesio(url)
        # Save the downloaded model file
        model_file = f"tripo_model_{task_id}.glb"
        with open(os.path.join(get_output_directory(), model_file), "wb") as f:
@@ -88,6 +88,7 @@ def poll_until_finished(
        return model_file, task_id
    raise RuntimeError(f"Failed to generate mesh: {response_poll}")

+
 class TripoTextToModelNode:
    """
    Generates 3D models synchronously based on a text prompt using Tripo's API.
@@ -126,11 +127,11 @@ class TripoTextToModelNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_mesh(self, prompt, negative_prompt=None, model_version=None, style=None, texture=None, pbr=None, image_seed=None, model_seed=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
+    async def generate_mesh(self, prompt, negative_prompt=None, model_version=None, style=None, texture=None, pbr=None, image_seed=None, model_seed=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
        style_enum = None if style == "None" else style
        if not prompt:
            raise RuntimeError("Prompt is required")
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -155,7 +156,8 @@ class TripoTextToModelNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoImageToModelNode:
    """
@@ -195,12 +197,12 @@ class TripoImageToModelNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_mesh(self, image, model_version=None, style=None, texture=None, pbr=None, model_seed=None, orientation=None, texture_alignment=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
+    async def generate_mesh(self, image, model_version=None, style=None, texture=None, pbr=None, model_seed=None, orientation=None, texture_alignment=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
        style_enum = None if style == "None" else style
        if image is None:
            raise RuntimeError("Image is required")
-        tripo_file = upload_image_to_tripo(image, **kwargs)
-        response = SynchronousOperation(
+        tripo_file = await upload_image_to_tripo(image, **kwargs)
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -225,7 +227,8 @@ class TripoImageToModelNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoMultiviewToModelNode:
    """
@@ -267,7 +270,7 @@ class TripoMultiviewToModelNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_mesh(self, image, image_left=None, image_back=None, image_right=None, model_version=None, orientation=None, texture=None, pbr=None, model_seed=None, texture_seed=None, texture_quality=None, texture_alignment=None, face_limit=None, quad=None, **kwargs):
+    async def generate_mesh(self, image, image_left=None, image_back=None, image_right=None, model_version=None, orientation=None, texture=None, pbr=None, model_seed=None, texture_seed=None, texture_quality=None, texture_alignment=None, face_limit=None, quad=None, **kwargs):
        if image is None:
            raise RuntimeError("front image for multiview is required")
        images = []
@@ -282,11 +285,11 @@ class TripoMultiviewToModelNode:
        for image_name in ["image", "image_left", "image_back", "image_right"]:
            image_ = image_dict[image_name]
            if image_ is not None:
-                tripo_file = upload_image_to_tripo(image_, **kwargs)
+                tripo_file = await upload_image_to_tripo(image_, **kwargs)
                images.append(tripo_file)
            else:
                images.append(TripoFileEmptyReference())
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -309,7 +312,8 @@ class TripoMultiviewToModelNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoTextureNode:
    @classmethod
@@ -340,8 +344,8 @@ class TripoTextureNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 80

-    def generate_mesh(self, model_task_id, texture=None, pbr=None, texture_seed=None, texture_quality=None, texture_alignment=None, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, model_task_id, texture=None, pbr=None, texture_seed=None, texture_quality=None, texture_alignment=None, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -358,7 +362,7 @@ class TripoTextureNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)


 class TripoRefineNode:
@@ -387,8 +391,8 @@ class TripoRefineNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 240

-    def generate_mesh(self, model_task_id, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, model_task_id, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -400,7 +404,7 @@ class TripoRefineNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)


 class TripoRigNode:
@@ -425,8 +429,8 @@ class TripoRigNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 180

-    def generate_mesh(self, original_model_task_id, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, original_model_task_id, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -440,7 +444,8 @@ class TripoRigNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoRetargetNode:
    @classmethod
@@ -475,8 +480,8 @@ class TripoRetargetNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 30

-    def generate_mesh(self, animation, original_model_task_id, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, animation, original_model_task_id, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -491,7 +496,8 @@ class TripoRetargetNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoConversionNode:
    @classmethod
@@ -529,10 +535,10 @@ class TripoConversionNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 30

-    def generate_mesh(self, original_model_task_id, format, quad, face_limit, texture_size, texture_format, **kwargs):
+    async def generate_mesh(self, original_model_task_id, format, quad, face_limit, texture_size, texture_format, **kwargs):
        if not original_model_task_id:
            raise RuntimeError("original_model_task_id is required")
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -549,7 +555,8 @@ class TripoConversionNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 NODE_CLASS_MAPPINGS = {
    "TripoTextToModelNode": TripoTextToModelNode,
--- a/comfy_api_nodes/nodes_veo2.py
+++ b/comfy_api_nodes/nodes_veo2.py
@@ -1,17 +1,17 @@
 import io
 import logging
 import base64
-import requests
+import aiohttp
 import torch
 from typing import Optional

 from comfy.comfy_types.node_typing import IO, ComfyNodeABC
 from comfy_api.input_impl.video_types import VideoFromFile
 from comfy_api_nodes.apis import (
-    Veo2GenVidRequest,
-    Veo2GenVidResponse,
-    Veo2GenVidPollRequest,
-    Veo2GenVidPollResponse
+    VeoGenVidRequest,
+    VeoGenVidResponse,
+    VeoGenVidPollRequest,
+    VeoGenVidPollResponse
 )
 from comfy_api_nodes.apis.client import (
    ApiEndpoint,
@@ -35,7 +35,7 @@ def convert_image_to_base64(image: torch.Tensor):
    return tensor_to_base64_string(scaled_image)


-def get_video_url_from_response(poll_response: Veo2GenVidPollResponse) -> Optional[str]:
+def get_video_url_from_response(poll_response: VeoGenVidPollResponse) -> Optional[str]:
    if (
        poll_response.response
        and hasattr(poll_response.response, "videos")
@@ -130,6 +130,14 @@ class VeoVideoGenerationNode(ComfyNodeABC):
                    "default": None,
                    "tooltip": "Optional reference image to guide video generation",
                }),
+                "model": (
+                    IO.COMBO,
+                    {
+                        "options": ["veo-2.0-generate-001"],
+                        "default": "veo-2.0-generate-001",
+                        "tooltip": "Veo 2 model to use for video generation",
+                    },
+                ),
            },
            "hidden": {
                "auth_token": "AUTH_TOKEN_COMFY_ORG",
@@ -141,10 +149,10 @@ class VeoVideoGenerationNode(ComfyNodeABC):
    RETURN_TYPES = (IO.VIDEO,)
    FUNCTION = "generate_video"
    CATEGORY = "api node/video/Veo"
-    DESCRIPTION = "Generates videos from text prompts using Google's Veo API"
+    DESCRIPTION = "Generates videos from text prompts using Google's Veo 2 API"
    API_NODE = True

-    def generate_video(
+    async def generate_video(
        self,
        prompt,
        aspect_ratio="16:9",
@@ -154,6 +162,8 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        person_generation="ALLOW",
        seed=0,
        image=None,
+        model="veo-2.0-generate-001",
+        generate_audio=False,
        unique_id: Optional[str] = None,
        **kwargs,
    ):
@@ -188,23 +198,26 @@ class VeoVideoGenerationNode(ComfyNodeABC):
            parameters["negativePrompt"] = negative_prompt
        if seed > 0:
            parameters["seed"] = seed
+        # Only add generateAudio for Veo 3 models
+        if "veo-3.0" in model:
+            parameters["generateAudio"] = generate_audio

        # Initial request to start video generation
        initial_operation = SynchronousOperation(
            endpoint=ApiEndpoint(
-                path="/proxy/veo/generate",
+                path=f"/proxy/veo/{model}/generate",
                method=HttpMethod.POST,
-                request_model=Veo2GenVidRequest,
-                response_model=Veo2GenVidResponse
+                request_model=VeoGenVidRequest,
+                response_model=VeoGenVidResponse
            ),
-            request=Veo2GenVidRequest(
+            request=VeoGenVidRequest(
                instances=instances,
                parameters=parameters
            ),
            auth_kwargs=kwargs,
        )

-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        operation_name = initial_response.name

        logging.info(f"Veo generation started with operation name: {operation_name}")
@@ -223,16 +236,16 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        # Define the polling operation
        poll_operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
-                path="/proxy/veo/poll",
+                path=f"/proxy/veo/{model}/poll",
                method=HttpMethod.POST,
-                request_model=Veo2GenVidPollRequest,
-                response_model=Veo2GenVidPollResponse
+                request_model=VeoGenVidPollRequest,
+                response_model=VeoGenVidPollResponse
            ),
            completed_statuses=["completed"],
            failed_statuses=[],  # No failed statuses, we'll handle errors after polling
            status_extractor=status_extractor,
            progress_extractor=progress_extractor,
-            request=Veo2GenVidPollRequest(
+            request=VeoGenVidPollRequest(
                operationName=operation_name
            ),
            auth_kwargs=kwargs,
@@ -243,7 +256,7 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        )

        # Execute the polling operation
-        poll_response = poll_operation.execute()
+        poll_response = await poll_operation.execute()

        # Now check for errors in the final response
        # Check for error in poll response
@@ -268,7 +281,6 @@ class VeoVideoGenerationNode(ComfyNodeABC):
            raise Exception(error_message)

        # Extract video data
-        video_data = None
        if poll_response.response and hasattr(poll_response.response, 'videos') and poll_response.response.videos and len(poll_response.response.videos) > 0:
            video = poll_response.response.videos[0]

@@ -278,9 +290,9 @@ class VeoVideoGenerationNode(ComfyNodeABC):
                video_data = base64.b64decode(video.bytesBase64Encoded)
            elif hasattr(video, 'gcsUri') and video.gcsUri:
                # Download from URL
-                video_url = video.gcsUri
-                video_response = requests.get(video_url)
-                video_data = video_response.content
+                async with aiohttp.ClientSession() as session:
+                    async with session.get(video.gcsUri) as video_response:
+                        video_data = await video_response.content.read()
            else:
                raise Exception("Video returned but no data or URL was provided")
        else:
@@ -298,11 +310,64 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        return (VideoFromFile(video_io),)


-# Register the node
+class Veo3VideoGenerationNode(VeoVideoGenerationNode):
+    """
+    Generates videos from text prompts using Google's Veo 3 API.
+
+    Supported models:
+    - veo-3.0-generate-001
+    - veo-3.0-fast-generate-001
+
+    This node extends the base Veo node with Veo 3 specific features including
+    audio generation and fixed 8-second duration.
+    """
+
+    @classmethod
+    def INPUT_TYPES(s):
+        parent_input = super().INPUT_TYPES()
+
+        # Update model options for Veo 3
+        parent_input["optional"]["model"] = (
+            IO.COMBO,
+            {
+                "options": ["veo-3.0-generate-001", "veo-3.0-fast-generate-001"],
+                "default": "veo-3.0-generate-001",
+                "tooltip": "Veo 3 model to use for video generation",
+            },
+        )
+
+        # Add generateAudio parameter
+        parent_input["optional"]["generate_audio"] = (
+            IO.BOOLEAN,
+            {
+                "default": False,
+                "tooltip": "Generate audio for the video. Supported by all Veo 3 models.",
+            }
+        )
+
+        # Update duration constraints for Veo 3 (only 8 seconds supported)
+        parent_input["optional"]["duration_seconds"] = (
+            IO.INT,
+            {
+                "default": 8,
+                "min": 8,
+                "max": 8,
+                "step": 1,
+                "display": "number",
+                "tooltip": "Duration of the output video in seconds (Veo 3 only supports 8 seconds)",
+            },
+        )
+
+        return parent_input
+
+
+# Register the nodes
 NODE_CLASS_MAPPINGS = {
    "VeoVideoGenerationNode": VeoVideoGenerationNode,
+    "Veo3VideoGenerationNode": Veo3VideoGenerationNode,
 }

 NODE_DISPLAY_NAME_MAPPINGS = {
-    "VeoVideoGenerationNode": "Google Veo2 Video Generation",
+    "VeoVideoGenerationNode": "Google Veo 2 Video Generation",
+    "Veo3VideoGenerationNode": "Google Veo 3 Video Generation",
 }
--- a/comfy_extras/nodes_audio.py
+++ b/comfy_extras/nodes_audio.py
@@ -346,6 +346,24 @@ class LoadAudio:
            return "Invalid audio file: {}".format(audio)
        return True

+class RecordAudio:
+    @classmethod
+    def INPUT_TYPES(s):
+        return {"required": {"audio": ("AUDIO_RECORD", {})}}
+
+    CATEGORY = "audio"
+
+    RETURN_TYPES = ("AUDIO", )
+    FUNCTION = "load"
+
+    def load(self, audio):
+        audio_path = folder_paths.get_annotated_filepath(audio)
+
+        waveform, sample_rate = torchaudio.load(audio_path)
+        audio = {"waveform": waveform.unsqueeze(0), "sample_rate": sample_rate}
+        return (audio, )
+
+
 NODE_CLASS_MAPPINGS = {
    "EmptyLatentAudio": EmptyLatentAudio,
    "VAEEncodeAudio": VAEEncodeAudio,
@@ -356,6 +374,7 @@ NODE_CLASS_MAPPINGS = {
    "LoadAudio": LoadAudio,
    "PreviewAudio": PreviewAudio,
    "ConditioningStableAudio": ConditioningStableAudio,
+    "RecordAudio": RecordAudio,
 }

 NODE_DISPLAY_NAME_MAPPINGS = {
@@ -367,4 +386,5 @@ NODE_DISPLAY_NAME_MAPPINGS = {
    "SaveAudio": "Save Audio (FLAC)",
    "SaveAudioMP3": "Save Audio (MP3)",
    "SaveAudioOpus": "Save Audio (Opus)",
+    "RecordAudio": "Record Audio",
 }
--- a/comfy_extras/nodes_context_windows.py
+++ b/comfy_extras/nodes_context_windows.py
@@ -0,0 +1,89 @@
+from __future__ import annotations
+from comfy_api.latest import ComfyExtension, io
+import comfy.context_windows
+import nodes
+
+
+class ContextWindowsManualNode(io.ComfyNode):
+    @classmethod
+    def define_schema(cls) -> io.Schema:
+        return io.Schema(
+            node_id="ContextWindowsManual",
+            display_name="Context Windows (Manual)",
+            category="context",
+            description="Manually set context windows.",
+            inputs=[
+                io.Model.Input("model", tooltip="The model to apply context windows to during sampling."),
+                io.Int.Input("context_length", min=1, default=16, tooltip="The length of the context window."),
+                io.Int.Input("context_overlap", min=0, default=4, tooltip="The overlap of the context window."),
+                io.Combo.Input("context_schedule", options=[
+                    comfy.context_windows.ContextSchedules.STATIC_STANDARD,
+                    comfy.context_windows.ContextSchedules.UNIFORM_STANDARD,
+                    comfy.context_windows.ContextSchedules.UNIFORM_LOOPED,
+                    comfy.context_windows.ContextSchedules.BATCHED,
+                    ], tooltip="The stride of the context window."),
+                io.Int.Input("context_stride", min=1, default=1, tooltip="The stride of the context window; only applicable to uniform schedules."),
+                io.Boolean.Input("closed_loop", default=False, tooltip="Whether to close the context window loop; only applicable to looped schedules."),
+                io.Combo.Input("fuse_method", options=comfy.context_windows.ContextFuseMethods.LIST_STATIC, default=comfy.context_windows.ContextFuseMethods.PYRAMID, tooltip="The method to use to fuse the context windows."),
+                io.Int.Input("dim", min=0, max=5, default=0, tooltip="The dimension to apply the context windows to."),
+            ],
+            outputs=[
+                io.Model.Output(tooltip="The model with context windows applied during sampling."),
+            ],
+            is_experimental=True,
+        )
+
+    @classmethod
+    def execute(cls, model: io.Model.Type, context_length: int, context_overlap: int, context_schedule: str, context_stride: int, closed_loop: bool, fuse_method: str, dim: int) -> io.Model:
+        model = model.clone()
+        model.model_options["context_handler"] = comfy.context_windows.IndexListContextHandler(
+            context_schedule=comfy.context_windows.get_matching_context_schedule(context_schedule),
+            fuse_method=comfy.context_windows.get_matching_fuse_method(fuse_method),
+            context_length=context_length,
+            context_overlap=context_overlap,
+            context_stride=context_stride,
+            closed_loop=closed_loop,
+            dim=dim)
+        # make memory usage calculation only take into account the context window latents
+        comfy.context_windows.create_prepare_sampling_wrapper(model)
+        return io.NodeOutput(model)
+
+class WanContextWindowsManualNode(ContextWindowsManualNode):
+    @classmethod
+    def define_schema(cls) -> io.Schema:
+        schema = super().define_schema()
+        schema.node_id = "WanContextWindowsManual"
+        schema.display_name = "WAN Context Windows (Manual)"
+        schema.description = "Manually set context windows for WAN-like models (dim=2)."
+        schema.inputs = [
+            io.Model.Input("model", tooltip="The model to apply context windows to during sampling."),
+                io.Int.Input("context_length", min=1, max=nodes.MAX_RESOLUTION, step=4, default=81, tooltip="The length of the context window."),
+                io.Int.Input("context_overlap", min=0, default=30, tooltip="The overlap of the context window."),
+                io.Combo.Input("context_schedule", options=[
+                    comfy.context_windows.ContextSchedules.STATIC_STANDARD,
+                    comfy.context_windows.ContextSchedules.UNIFORM_STANDARD,
+                    comfy.context_windows.ContextSchedules.UNIFORM_LOOPED,
+                    comfy.context_windows.ContextSchedules.BATCHED,
+                    ], tooltip="The stride of the context window."),
+                io.Int.Input("context_stride", min=1, default=1, tooltip="The stride of the context window; only applicable to uniform schedules."),
+                io.Boolean.Input("closed_loop", default=False, tooltip="Whether to close the context window loop; only applicable to looped schedules."),
+                io.Combo.Input("fuse_method", options=comfy.context_windows.ContextFuseMethods.LIST_STATIC, default=comfy.context_windows.ContextFuseMethods.PYRAMID, tooltip="The method to use to fuse the context windows."),
+        ]
+        return schema
+
+    @classmethod
+    def execute(cls, model: io.Model.Type, context_length: int, context_overlap: int, context_schedule: str, context_stride: int, closed_loop: bool, fuse_method: str) -> io.Model:
+        context_length = max(((context_length - 1) // 4) + 1, 1)  # at least length 1
+        context_overlap = max(((context_overlap - 1) // 4) + 1, 0)  # at least overlap 0
+        return super().execute(model, context_length, context_overlap, context_schedule, context_stride, closed_loop, fuse_method, dim=2)
+
+
+class ContextWindowsExtension(ComfyExtension):
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        return [
+            ContextWindowsManualNode,
+            WanContextWindowsManualNode,
+        ]
+
+def comfy_entrypoint():
+    return ContextWindowsExtension()
--- a/comfy_extras/nodes_flux.py
+++ b/comfy_extras/nodes_flux.py
@@ -100,9 +100,28 @@ class FluxKontextImageScale:
        return (image, )


+class FluxKontextMultiReferenceLatentMethod:
+    @classmethod
+    def INPUT_TYPES(s):
+        return {"required": {
+            "conditioning": ("CONDITIONING", ),
+            "reference_latents_method": (("offset", "index"), ),
+            }}
+
+    RETURN_TYPES = ("CONDITIONING",)
+    FUNCTION = "append"
+    EXPERIMENTAL = True
+
+    CATEGORY = "advanced/conditioning/flux"
+
+    def append(self, conditioning, reference_latents_method):
+        c = node_helpers.conditioning_set_values(conditioning, {"reference_latents_method": reference_latents_method})
+        return (c, )
+
 NODE_CLASS_MAPPINGS = {
    "CLIPTextEncodeFlux": CLIPTextEncodeFlux,
    "FluxGuidance": FluxGuidance,
    "FluxDisableGuidance": FluxDisableGuidance,
    "FluxKontextImageScale": FluxKontextImageScale,
+    "FluxKontextMultiReferenceLatentMethod": FluxKontextMultiReferenceLatentMethod,
 }
--- a/comfy_extras/nodes_memory_reserve.py
+++ b/comfy_extras/nodes_memory_reserve.py
@@ -0,0 +1,33 @@
+from comfy_api.latest import io, ComfyExtension
+
+class MemoryReserveNode(io.ComfyNode):
+    @classmethod
+    def define_schema(cls) -> io.Schema:
+        return io.Schema(
+            node_id="ReserveAdditionalMemory",
+            display_name="Reserve Additional Memory",
+            description="Adds additional expected memory usage for the model, in gigabytes.",
+            category="advanced/debug/model",
+            inputs=[
+                io.Model.Input("model", tooltip="The model to add memory reserve to."),
+                io.Float.Input("memory_reserve_gb", min=0.0, default=0.0, max=2048.0, step=0.1, tooltip="The additional expected memory usage for the model, in gigabytes."),
+            ],
+            outputs=[
+                io.Model.Output(tooltip="The model with the additional memory reserve."),
+            ],
+        )
+
+    @classmethod
+    def execute(cls, model: io.Model.Type, memory_reserve_gb: float) -> io.NodeOutput:
+        model = model.clone()
+        model.add_model_memory_reserve(memory_reserve_gb)
+        return io.NodeOutput(model)
+
+class MemoryReserveExtension(ComfyExtension):
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        return [
+            MemoryReserveNode,
+        ]
+
+def comfy_entrypoint():
+    return MemoryReserveExtension()
--- a/comfy_extras/nodes_model_merging_model_specific.py
+++ b/comfy_extras/nodes_model_merging_model_specific.py
@@ -314,6 +314,29 @@ class ModelMergeCosmosPredict2_14B(comfy_extras.nodes_model_merging.ModelMergeBl

        return {"required": arg_dict}

+class ModelMergeQwenImage(comfy_extras.nodes_model_merging.ModelMergeBlocks):
+    CATEGORY = "advanced/model_merging/model_specific"
+
+    @classmethod
+    def INPUT_TYPES(s):
+        arg_dict = { "model1": ("MODEL",),
+                              "model2": ("MODEL",)}
+
+        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+
+        arg_dict["pos_embeds."] = argument
+        arg_dict["img_in."] = argument
+        arg_dict["txt_norm."] = argument
+        arg_dict["txt_in."] = argument
+        arg_dict["time_text_embed."] = argument
+
+        for i in range(60):
+            arg_dict["transformer_blocks.{}.".format(i)] = argument
+
+        arg_dict["proj_out."] = argument
+
+        return {"required": arg_dict}
+
 NODE_CLASS_MAPPINGS = {
    "ModelMergeSD1": ModelMergeSD1,
    "ModelMergeSD2": ModelMergeSD1, #SD1 and SD2 have the same blocks
@@ -329,4 +352,5 @@ NODE_CLASS_MAPPINGS = {
    "ModelMergeWAN2_1": ModelMergeWAN2_1,
    "ModelMergeCosmosPredict2_2B": ModelMergeCosmosPredict2_2B,
    "ModelMergeCosmosPredict2_14B": ModelMergeCosmosPredict2_14B,
+    "ModelMergeQwenImage": ModelMergeQwenImage,
 }
--- a/comfy_extras/nodes_wan.py
+++ b/comfy_extras/nodes_wan.py
@@ -9,29 +9,35 @@ import comfy.clip_vision
 import json
 import numpy as np
 from typing import Tuple
+from typing_extensions import override
+from comfy_api.latest import ComfyExtension, io

-class WanImageToVideo:
+class WanImageToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"positive": ("CONDITIONING", ),
-                             "negative": ("CONDITIONING", ),
-                             "vae": ("VAE", ),
-                             "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                },
-                "optional": {"clip_vision_output": ("CLIP_VISION_OUTPUT", ),
-                             "start_image": ("IMAGE", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanImageToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.ClipVisionOutput.Input("clip_vision_output", optional=True),
+                io.Image.Input("start_image", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative", "latent")
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    def encode(self, positive, negative, vae, width, height, length, batch_size, start_image=None, clip_vision_output=None):
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, start_image=None, clip_vision_output=None) -> io.NodeOutput:
        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        if start_image is not None:
            start_image = comfy.utils.common_upscale(start_image[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
@@ -51,32 +57,36 @@ class WanImageToVideo:

        out_latent = {}
        out_latent["samples"] = latent
-        return (positive, negative, out_latent)
+        return io.NodeOutput(positive, negative, out_latent)


-class WanFunControlToVideo:
+class WanFunControlToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"positive": ("CONDITIONING", ),
-                             "negative": ("CONDITIONING", ),
-                             "vae": ("VAE", ),
-                             "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                },
-                "optional": {"clip_vision_output": ("CLIP_VISION_OUTPUT", ),
-                             "start_image": ("IMAGE", ),
-                             "control_video": ("IMAGE", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanFunControlToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.ClipVisionOutput.Input("clip_vision_output", optional=True),
+                io.Image.Input("start_image", optional=True),
+                io.Image.Input("control_video", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative", "latent")
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    def encode(self, positive, negative, vae, width, height, length, batch_size, start_image=None, clip_vision_output=None, control_video=None):
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, start_image=None, clip_vision_output=None, control_video=None) -> io.NodeOutput:
        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        concat_latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        concat_latent = comfy.latent_formats.Wan21().process_out(concat_latent)
@@ -101,32 +111,96 @@ class WanFunControlToVideo:

        out_latent = {}
        out_latent["samples"] = latent
-        return (positive, negative, out_latent)
+        return io.NodeOutput(positive, negative, out_latent)

-class WanFirstLastFrameToVideo:
+class Wan22FunControlToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"positive": ("CONDITIONING", ),
-                             "negative": ("CONDITIONING", ),
-                             "vae": ("VAE", ),
-                             "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                },
-                "optional": {"clip_vision_start_image": ("CLIP_VISION_OUTPUT", ),
-                             "clip_vision_end_image": ("CLIP_VISION_OUTPUT", ),
-                             "start_image": ("IMAGE", ),
-                             "end_image": ("IMAGE", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="Wan22FunControlToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.Image.Input("ref_image", optional=True),
+                io.Image.Input("control_video", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative", "latent")
-    FUNCTION = "encode"
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, ref_image=None, start_image=None, control_video=None) -> io.NodeOutput:
+        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
+        concat_latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
+        concat_latent = comfy.latent_formats.Wan21().process_out(concat_latent)
+        concat_latent = concat_latent.repeat(1, 2, 1, 1, 1)
+        mask = torch.ones((1, 1, latent.shape[2] * 4, latent.shape[-2], latent.shape[-1]))

-    CATEGORY = "conditioning/video_models"
+        if start_image is not None:
+            start_image = comfy.utils.common_upscale(start_image[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
+            concat_latent_image = vae.encode(start_image[:, :, :, :3])
+            concat_latent[:,16:,:concat_latent_image.shape[2]] = concat_latent_image[:,:,:concat_latent.shape[2]]
+            mask[:, :, :start_image.shape[0] + 3] = 0.0

-    def encode(self, positive, negative, vae, width, height, length, batch_size, start_image=None, end_image=None, clip_vision_start_image=None, clip_vision_end_image=None):
+        ref_latent = None
+        if ref_image is not None:
+            ref_image = comfy.utils.common_upscale(ref_image[:1].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
+            ref_latent = vae.encode(ref_image[:, :, :, :3])
+
+        if control_video is not None:
+            control_video = comfy.utils.common_upscale(control_video[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
+            concat_latent_image = vae.encode(control_video[:, :, :, :3])
+            concat_latent[:,:16,:concat_latent_image.shape[2]] = concat_latent_image[:,:,:concat_latent.shape[2]]
+
+        mask = mask.view(1, mask.shape[2] // 4, 4, mask.shape[3], mask.shape[4]).transpose(1, 2)
+        positive = node_helpers.conditioning_set_values(positive, {"concat_latent_image": concat_latent, "concat_mask": mask, "concat_mask_index": 16})
+        negative = node_helpers.conditioning_set_values(negative, {"concat_latent_image": concat_latent, "concat_mask": mask, "concat_mask_index": 16})
+
+        if ref_latent is not None:
+            positive = node_helpers.conditioning_set_values(positive, {"reference_latents": [ref_latent]}, append=True)
+            negative = node_helpers.conditioning_set_values(negative, {"reference_latents": [ref_latent]}, append=True)
+
+        out_latent = {}
+        out_latent["samples"] = latent
+        return io.NodeOutput(positive, negative, out_latent)
+
+class WanFirstLastFrameToVideo(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanFirstLastFrameToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.ClipVisionOutput.Input("clip_vision_start_image", optional=True),
+                io.ClipVisionOutput.Input("clip_vision_end_image", optional=True),
+                io.Image.Input("start_image", optional=True),
+                io.Image.Input("end_image", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )
+
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, start_image=None, end_image=None, clip_vision_start_image=None, clip_vision_end_image=None) -> io.NodeOutput:
        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        if start_image is not None:
            start_image = comfy.utils.common_upscale(start_image[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
@@ -167,62 +241,70 @@ class WanFirstLastFrameToVideo:

        out_latent = {}
        out_latent["samples"] = latent
-        return (positive, negative, out_latent)
+        return io.NodeOutput(positive, negative, out_latent)


-class WanFunInpaintToVideo:
+class WanFunInpaintToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"positive": ("CONDITIONING", ),
-                             "negative": ("CONDITIONING", ),
-                             "vae": ("VAE", ),
-                             "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                },
-                "optional": {"clip_vision_output": ("CLIP_VISION_OUTPUT", ),
-                             "start_image": ("IMAGE", ),
-                             "end_image": ("IMAGE", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanFunInpaintToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.ClipVisionOutput.Input("clip_vision_output", optional=True),
+                io.Image.Input("start_image", optional=True),
+                io.Image.Input("end_image", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative", "latent")
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    def encode(self, positive, negative, vae, width, height, length, batch_size, start_image=None, end_image=None, clip_vision_output=None):
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, start_image=None, end_image=None, clip_vision_output=None) -> io.NodeOutput:
        flfv = WanFirstLastFrameToVideo()
-        return flfv.encode(positive, negative, vae, width, height, length, batch_size, start_image=start_image, end_image=end_image, clip_vision_start_image=clip_vision_output)
+        return flfv.execute(positive, negative, vae, width, height, length, batch_size, start_image=start_image, end_image=end_image, clip_vision_start_image=clip_vision_output)


-class WanVaceToVideo:
+class WanVaceToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"positive": ("CONDITIONING", ),
-                             "negative": ("CONDITIONING", ),
-                             "vae": ("VAE", ),
-                             "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                             "strength": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1000.0, "step": 0.01}),
-                },
-                "optional": {"control_video": ("IMAGE", ),
-                             "control_masks": ("MASK", ),
-                             "reference_image": ("IMAGE", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanVaceToVideo",
+            category="conditioning/video_models",
+            is_experimental=True,
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.Float.Input("strength", default=1.0, min=0.0, max=1000.0, step=0.01),
+                io.Image.Input("control_video", optional=True),
+                io.Mask.Input("control_masks", optional=True),
+                io.Image.Input("reference_image", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+                io.Int.Output(display_name="trim_latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT", "INT")
-    RETURN_NAMES = ("positive", "negative", "latent", "trim_latent")
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    EXPERIMENTAL = True
-
-    def encode(self, positive, negative, vae, width, height, length, batch_size, strength, control_video=None, control_masks=None, reference_image=None):
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, strength, control_video=None, control_masks=None, reference_image=None) -> io.NodeOutput:
        latent_length = ((length - 1) // 4) + 1
        if control_video is not None:
            control_video = comfy.utils.common_upscale(control_video[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
@@ -279,52 +361,59 @@ class WanVaceToVideo:
        latent = torch.zeros([batch_size, 16, latent_length, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        out_latent = {}
        out_latent["samples"] = latent
-        return (positive, negative, out_latent, trim_latent)
+        return io.NodeOutput(positive, negative, out_latent, trim_latent)

-class TrimVideoLatent:
+class TrimVideoLatent(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "samples": ("LATENT",),
-                              "trim_amount": ("INT", {"default": 0, "min": 0, "max": 99999}),
-                             }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="TrimVideoLatent",
+            category="latent/video",
+            is_experimental=True,
+            inputs=[
+                io.Latent.Input("samples"),
+                io.Int.Input("trim_amount", default=0, min=0, max=99999),
+            ],
+            outputs=[
+                io.Latent.Output(),
+            ],
+        )

-    RETURN_TYPES = ("LATENT",)
-    FUNCTION = "op"
-
-    CATEGORY = "latent/video"
-
-    EXPERIMENTAL = True
-
-    def op(self, samples, trim_amount):
+    @classmethod
+    def execute(cls, samples, trim_amount) -> io.NodeOutput:
        samples_out = samples.copy()

        s1 = samples["samples"]
        samples_out["samples"] = s1[:, :, trim_amount:]
-        return (samples_out,)
+        return io.NodeOutput(samples_out)

-class WanCameraImageToVideo:
+class WanCameraImageToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"positive": ("CONDITIONING", ),
-                             "negative": ("CONDITIONING", ),
-                             "vae": ("VAE", ),
-                             "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                },
-                "optional": {"clip_vision_output": ("CLIP_VISION_OUTPUT", ),
-                             "start_image": ("IMAGE", ),
-                             "camera_conditions": ("WAN_CAMERA_EMBEDDING", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanCameraImageToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.ClipVisionOutput.Input("clip_vision_output", optional=True),
+                io.Image.Input("start_image", optional=True),
+                io.WanCameraEmbedding.Input("camera_conditions", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative", "latent")
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    def encode(self, positive, negative, vae, width, height, length, batch_size, start_image=None, clip_vision_output=None, camera_conditions=None):
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, start_image=None, clip_vision_output=None, camera_conditions=None) -> io.NodeOutput:
        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        concat_latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        concat_latent = comfy.latent_formats.Wan21().process_out(concat_latent)
@@ -333,9 +422,12 @@ class WanCameraImageToVideo:
            start_image = comfy.utils.common_upscale(start_image[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
            concat_latent_image = vae.encode(start_image[:, :, :, :3])
            concat_latent[:,:,:concat_latent_image.shape[2]] = concat_latent_image[:,:,:concat_latent.shape[2]]
+            mask = torch.ones((1, 1, latent.shape[2] * 4, latent.shape[-2], latent.shape[-1]))
+            mask[:, :, :start_image.shape[0] + 3] = 0.0
+            mask = mask.view(1, mask.shape[2] // 4, 4, mask.shape[3], mask.shape[4]).transpose(1, 2)

-            positive = node_helpers.conditioning_set_values(positive, {"concat_latent_image": concat_latent})
-            negative = node_helpers.conditioning_set_values(negative, {"concat_latent_image": concat_latent})
+            positive = node_helpers.conditioning_set_values(positive, {"concat_latent_image": concat_latent, "concat_mask": mask})
+            negative = node_helpers.conditioning_set_values(negative, {"concat_latent_image": concat_latent, "concat_mask": mask})

        if camera_conditions is not None:
            positive = node_helpers.conditioning_set_values(positive, {'camera_conditions': camera_conditions})
@@ -347,29 +439,34 @@ class WanCameraImageToVideo:

        out_latent = {}
        out_latent["samples"] = latent
-        return (positive, negative, out_latent)
+        return io.NodeOutput(positive, negative, out_latent)

-class WanPhantomSubjectToVideo:
+class WanPhantomSubjectToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"positive": ("CONDITIONING", ),
-                             "negative": ("CONDITIONING", ),
-                             "vae": ("VAE", ),
-                             "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                             "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                },
-                "optional": {"images": ("IMAGE", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanPhantomSubjectToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.Image.Input("images", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative_text"),
+                io.Conditioning.Output(display_name="negative_img_text"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative_text", "negative_img_text", "latent")
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    def encode(self, positive, negative, vae, width, height, length, batch_size, images):
+    @classmethod
+    def execute(cls, positive, negative, vae, width, height, length, batch_size, images) -> io.NodeOutput:
        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8], device=comfy.model_management.intermediate_device())
        cond2 = negative
        if images is not None:
@@ -385,7 +482,7 @@ class WanPhantomSubjectToVideo:

        out_latent = {}
        out_latent["samples"] = latent
-        return (positive, cond2, negative, out_latent)
+        return io.NodeOutput(positive, cond2, negative, out_latent)

 def parse_json_tracks(tracks):
    """Parse JSON track data into a standardized format"""
@@ -598,39 +695,41 @@ def patch_motion(

    return out_mask_full, out_feature_full

-class WanTrackToVideo:
+class WanTrackToVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {
-                    "positive": ("CONDITIONING", ),
-                    "negative": ("CONDITIONING", ),
-                    "vae": ("VAE", ),
-                    "tracks": ("STRING", {"multiline": True, "default": "[]"}),
-                    "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                    "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
-                    "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                    "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                    "temperature": ("FLOAT", {"default": 220.0, "min": 1.0, "max": 1000.0, "step": 0.1}),
-                    "topk": ("INT", {"default": 2, "min": 1, "max": 10}),
-                    "start_image": ("IMAGE", ),
-                },
-                "optional": {
-                    "clip_vision_output": ("CLIP_VISION_OUTPUT", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="WanTrackToVideo",
+            category="conditioning/video_models",
+            inputs=[
+                io.Conditioning.Input("positive"),
+                io.Conditioning.Input("negative"),
+                io.Vae.Input("vae"),
+                io.String.Input("tracks", multiline=True, default="[]"),
+                io.Int.Input("width", default=832, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("height", default=480, min=16, max=nodes.MAX_RESOLUTION, step=16),
+                io.Int.Input("length", default=81, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.Float.Input("temperature", default=220.0, min=1.0, max=1000.0, step=0.1),
+                io.Int.Input("topk", default=2, min=1, max=10),
+                io.Image.Input("start_image"),
+                io.ClipVisionOutput.Input("clip_vision_output", optional=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative", "latent")
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    def encode(self, positive, negative, vae, tracks, width, height, length, batch_size,
-               temperature, topk, start_image=None, clip_vision_output=None):
+    @classmethod
+    def execute(cls, positive, negative, vae, tracks, width, height, length, batch_size,
+               temperature, topk, start_image=None, clip_vision_output=None) -> io.NodeOutput:

        tracks_data = parse_json_tracks(tracks)

        if not tracks_data:
-            return WanImageToVideo().encode(positive, negative, vae, width, height, length, batch_size, start_image=start_image, clip_vision_output=clip_vision_output)
+            return WanImageToVideo().execute(positive, negative, vae, width, height, length, batch_size, start_image=start_image, clip_vision_output=clip_vision_output)

        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8],
                           device=comfy.model_management.intermediate_device())
@@ -684,34 +783,36 @@ class WanTrackToVideo:

        out_latent = {}
        out_latent["samples"] = latent
-        return (positive, negative, out_latent)
+        return io.NodeOutput(positive, negative, out_latent)


-class Wan22ImageToVideoLatent:
+class Wan22ImageToVideoLatent(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"vae": ("VAE", ),
-                             "width": ("INT", {"default": 1280, "min": 32, "max": nodes.MAX_RESOLUTION, "step": 32}),
-                             "height": ("INT", {"default": 704, "min": 32, "max": nodes.MAX_RESOLUTION, "step": 32}),
-                             "length": ("INT", {"default": 49, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
-                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
-                },
-                "optional": {"start_image": ("IMAGE", ),
-                }}
+    def define_schema(cls):
+        return io.Schema(
+            node_id="Wan22ImageToVideoLatent",
+            category="conditioning/inpaint",
+            inputs=[
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=1280, min=32, max=nodes.MAX_RESOLUTION, step=32),
+                io.Int.Input("height", default=704, min=32, max=nodes.MAX_RESOLUTION, step=32),
+                io.Int.Input("length", default=49, min=1, max=nodes.MAX_RESOLUTION, step=4),
+                io.Int.Input("batch_size", default=1, min=1, max=4096),
+                io.Image.Input("start_image", optional=True),
+            ],
+            outputs=[
+                io.Latent.Output(),
+            ],
+        )

-
-    RETURN_TYPES = ("LATENT",)
-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/inpaint"
-
-    def encode(self, vae, width, height, length, batch_size, start_image=None):
+    @classmethod
+    def execute(cls, vae, width, height, length, batch_size, start_image=None) -> io.NodeOutput:
        latent = torch.zeros([1, 48, ((length - 1) // 4) + 1, height // 16, width // 16], device=comfy.model_management.intermediate_device())

        if start_image is None:
            out_latent = {}
            out_latent["samples"] = latent
-            return (out_latent,)
+            return io.NodeOutput(out_latent)

        mask = torch.ones([latent.shape[0], 1, ((length - 1) // 4) + 1, latent.shape[-2], latent.shape[-1]], device=comfy.model_management.intermediate_device())

@@ -726,18 +827,25 @@ class Wan22ImageToVideoLatent:
        latent = latent_format.process_out(latent) * mask + latent * (1.0 - mask)
        out_latent["samples"] = latent.repeat((batch_size, ) + (1,) * (latent.ndim - 1))
        out_latent["noise_mask"] = mask.repeat((batch_size, ) + (1,) * (mask.ndim - 1))
-        return (out_latent,)
+        return io.NodeOutput(out_latent)


-NODE_CLASS_MAPPINGS = {
-    "WanTrackToVideo": WanTrackToVideo,
-    "WanImageToVideo": WanImageToVideo,
-    "WanFunControlToVideo": WanFunControlToVideo,
-    "WanFunInpaintToVideo": WanFunInpaintToVideo,
-    "WanFirstLastFrameToVideo": WanFirstLastFrameToVideo,
-    "WanVaceToVideo": WanVaceToVideo,
-    "TrimVideoLatent": TrimVideoLatent,
-    "WanCameraImageToVideo": WanCameraImageToVideo,
-    "WanPhantomSubjectToVideo": WanPhantomSubjectToVideo,
-    "Wan22ImageToVideoLatent": Wan22ImageToVideoLatent,
-}
+class WanExtension(ComfyExtension):
+    @override
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        return [
+            WanTrackToVideo,
+            WanImageToVideo,
+            WanFunControlToVideo,
+            Wan22FunControlToVideo,
+            WanFunInpaintToVideo,
+            WanFirstLastFrameToVideo,
+            WanVaceToVideo,
+            TrimVideoLatent,
+            WanCameraImageToVideo,
+            WanPhantomSubjectToVideo,
+            Wan22ImageToVideoLatent,
+        ]
+
+async def comfy_entrypoint() -> WanExtension:
+    return WanExtension()
--- a/comfyui_version.py
+++ b/comfyui_version.py
@@ -1,3 +1,3 @@
 # This file is automatically generated by the build process when version is
 # updated in pyproject.toml.
-__version__ = "0.3.48"
+__version__ = "0.3.50"
--- a/execution.py
+++ b/execution.py
@@ -646,8 +646,6 @@ class PromptExecutor:
            self.add_message("execution_error", mes, broadcast=False)

    def execute(self, prompt, prompt_id, extra_data={}, execute_outputs=[]):
-        asyncio_loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(asyncio_loop)
        asyncio.run(self.execute_async(prompt, prompt_id, extra_data, execute_outputs))

    async def execute_async(self, prompt, prompt_id, extra_data={}, execute_outputs=[]):
--- a/nodes.py
+++ b/nodes.py
@@ -925,7 +925,7 @@ class CLIPLoader:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": { "clip_name": (folder_paths.get_filename_list("text_encoders"), ),
-                              "type": (["stable_diffusion", "stable_cascade", "sd3", "stable_audio", "mochi", "ltxv", "pixart", "cosmos", "lumina2", "wan", "hidream", "chroma", "ace", "omnigen2"], ),
+                              "type": (["stable_diffusion", "stable_cascade", "sd3", "stable_audio", "mochi", "ltxv", "pixart", "cosmos", "lumina2", "wan", "hidream", "chroma", "ace", "omnigen2", "qwen_image"], ),
                              },
                "optional": {
                              "device": (["default", "cpu"], {"advanced": True}),
@@ -1229,12 +1229,12 @@ class RepeatLatentBatch:
        s = samples.copy()
        s_in = samples["samples"]

-        s["samples"] = s_in.repeat((amount, 1,1,1))
+        s["samples"] = s_in.repeat((amount,) + ((1,) * (s_in.ndim - 1)))
        if "noise_mask" in samples and samples["noise_mask"].shape[0] > 1:
            masks = samples["noise_mask"]
            if masks.shape[0] < s_in.shape[0]:
-                masks = masks.repeat(math.ceil(s_in.shape[0] / masks.shape[0]), 1, 1, 1)[:s_in.shape[0]]
-            s["noise_mask"] = samples["noise_mask"].repeat((amount, 1,1,1))
+                masks = masks.repeat((math.ceil(s_in.shape[0] / masks.shape[0]),) + ((1,) * (masks.ndim - 1)))[:s_in.shape[0]]
+            s["noise_mask"] = samples["noise_mask"].repeat((amount,) + ((1,) * (samples["noise_mask"].ndim - 1)))
        if "batch_index" in s:
            offset = max(s["batch_index"]) - min(s["batch_index"]) + 1
            s["batch_index"] = s["batch_index"] + [x + (i * offset) for i in range(1, amount) for x in s["batch_index"]]
@@ -2320,6 +2320,8 @@ async def init_builtin_extra_nodes():
        "nodes_camera_trajectory.py",
        "nodes_edit_model.py",
        "nodes_tcfg.py",
+        "nodes_context_windows.py",
+        "nodes_memory_reserve.py",
    ]

    import_failed = []
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "ComfyUI"
-version = "0.3.48"
+version = "0.3.50"
 readme = "README.md"
 license = { file = "LICENSE" }
 requires-python = ">=3.9"
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,6 +1,6 @@
-comfyui-frontend-package==1.23.4
-comfyui-workflow-templates==0.1.45
-comfyui-embedded-docs==0.2.4
+comfyui-frontend-package==1.25.9
+comfyui-workflow-templates==0.1.60
+comfyui-embedded-docs==0.2.6
 torch
 torchsde
 torchvision
@@ -20,11 +20,11 @@ tqdm
 psutil
 alembic
 SQLAlchemy
+av>=14.2.0

 #non essential dependencies:
 kornia>=0.7.1
 spandrel
 soundfile
-av>=14.2.0
 pydantic~=2.0
 pydantic-settings~=2.0
--- a/server.py
+++ b/server.py
@@ -235,7 +235,7 @@ class PromptServer():
                                    sid,
                                )

-                                logging.info(
+                                logging.debug(
                                    f"Feature flags negotiated for client {sid}: {client_flags}"
                                )
                            first_message = False
Author	SHA1	Message	Date
Jedrzej Kosinski	6c611b0b99	Change node id to reflect node name	2025-08-18 15:39:16 -07:00
Jedrzej Kosinski	cd54d502fc	Make sure models_memory_reserve is considered with inference_memory as well in max func calls	2025-08-18 15:34:53 -07:00
Jedrzej Kosinski	63571c6c3d	Renamed to Reserve Additional Memory	2025-08-18 15:04:49 -07:00
Jedrzej Kosinski	bae0c31a68	Added missing model.clone() call	2025-08-18 14:51:11 -07:00
Jedrzej Kosinski	34b1f51f4a	Created Add Memory to Reserve node	2025-08-18 14:45:21 -07:00
Alexander Piskun	bd2ab73976	fix(WAN-nodes): invalid nodeid for WanTrackToVideo (#9396 )	2025-08-18 03:26:55 -04:00
Christian Byrne	da2efeaec6	Bump frontend to 1.25.9 (#9394 )	2025-08-17 20:21:02 -07:00
Jedrzej Kosinski	7f3b9b16c6	Make step index detection much more robust (#9392 )	2025-08-17 18:54:07 -04:00
ComfyUI Wiki	d4e353a94e	Update template to 0.1.60 (#9377 )	2025-08-17 17:38:40 -04:00
comfyanonymous	ed43784b0d	WIP Qwen edit model: The diffusion model part. (#9383 )	2025-08-17 16:45:39 -04:00
comfyanonymous	0f2b8525bc	Qwen image model refactor. (#9375 )	2025-08-16 17:51:28 -04:00
Terry Jia	20a84166d0	record audio node (#8716 ) * record audio node * sf	2025-08-16 02:07:12 -04:00
Christian Byrne	ed2e33c69a	bump frontend version to 1.25.8 (#9361 )	2025-08-15 23:32:58 -04:00
comfyanonymous	1702e6df16	Implement wan2.2 camera model. (#9357 ) Use the old WanCameraImageToVideo node.	2025-08-15 17:29:58 -04:00
comfyanonymous	c308a8840a	Add FluxKontextMultiReferenceLatentMethod node. (#9356 ) This node is only useful if someone trains the kontext model to properly use multiple reference images via the index method. The default is the offset method which feeds the multiple images like if they were stitched together as one. This method works with the current flux kontext model.	2025-08-15 15:50:39 -04:00
Alexander Piskun	027c63f63a	fix(OpenAIGPTImage1): set correct MIME type for multipart uploads to OpenAI edits (#9348 )	2025-08-15 14:57:47 -04:00
comfyanonymous	e08ecfbd8a	Add warning when using old pytorch. (#9347 )	2025-08-15 00:22:26 -04:00
comfyanonymous	4e5c230f6a	Fix last commit not working on older pytorch. (#9346 )	2025-08-14 23:44:02 -04:00
Xiangxi Guo (Ryan)	f0d5d0111f	Avoid torch compile graphbreak for older pytorch versions (#9344 ) Turns out torch.compile has some gaps in context manager decorator syntax support. I've sent patches to fix that in PyTorch, but it won't be available for all the folks running older versions of PyTorch, hence this trivial patch.	2025-08-14 23:41:37 -04:00
comfyanonymous	ad19a069f6	Make SLG nodes work on Qwen Image model. (#9345 )	2025-08-14 23:16:01 -04:00
Alexander Piskun	5d65d6753b	convert WAN nodes to V3 schema (#9201 )	2025-08-14 21:48:41 -04:00
guill	deebee4ff6	Update default parameters for Moonvalley video nodes (#9290 ) * Update default parameters for Moonvalley video nodes - Changed default negative prompts to a more extensive list for both BaseMoonvalleyVideoNode and MoonvalleyVideo2VideoNode. - Updated default guidance scale values for both nodes to enhance prompt adherence. - Set a fixed default seed value for consistency in video generation. * no message * ruff fix --------- Co-authored-by: thorsten <thorsten@tripod-digital.co.nz>	2025-08-14 21:46:55 -04:00
Yoland Yan	fa570cbf59	Update CODEOWNERS (#9343 )	2025-08-14 19:44:22 -04:00
filtered	644b23ac0b	Make custom node testing checkbox optional in issue templates (#9342 ) The checkbox for confirming custom node testing is now optional in both bug report and user support templates. This allows users to submit issues even if they haven't been able to test with custom nodes disabled, making the reporting process more accessible.	2025-08-14 17:36:53 -04:00
comfyanonymous	72fd4d22b6	av is an essential dependency. (#9341 )	2025-08-14 16:03:21 -04:00
Jedrzej Kosinski	e4f7ea105f	Added context window support to core sampling code (#9238 ) * Added initial support for basic context windows - in progress * Add prepare_sampling wrapper for context window to more accurately estimate latent memory requirements, fixed merging wrappers/callbacks dicts in prepare_model_patcher * Made context windows compatible with different dimensions; works for WAN, but results are bad * Fix comfy.patcher_extension.merge_nested_dicts calls in prepare_model_patcher in sampler_helpers.py * Considering adding some callbacks to context window code to allow extensions of behavior without the need to rewrite code * Made dim slicing cleaner * Add Wan Context WIndows node for testing * Made context schedule and fuse method functions be stored on the handler instead of needing to be registered in core code to be found * Moved some code around between node_context_windows.py and context_windows.py * Change manual context window nodes names/ids * Added callbacks to IndexListContexHandler * Adjusted default values for context_length and context_overlap, made schema.inputs definition for WAN Context Windows less annoying * Make get_resized_cond more robust for various dim sizes * Fix typo * Another small fix	2025-08-13 21:33:05 -04:00
Simon Lui	c991a5da65	Fix XPU iGPU regressions (#9322 ) * Change bf16 check and switch non-blocking to off default with option to force to regain speed on certain classes of iGPUs and refactor xpu check. * Turn non_blocking off by default for xpu. * Update README.md for Intel GPUs.	2025-08-13 19:13:35 -04:00
comfyanonymous	9df8792d4b	Make last PR not crash comfy on old pytorch. (#9324 )	2025-08-13 15:12:41 -04:00
contentis	3da5a07510	SDPA backend priority (#9299 )	2025-08-13 14:53:27 -04:00
comfyanonymous	afa0a45206	Reduce portable size again. (#9323 ) * compress more * test * not needed	2025-08-13 14:42:08 -04:00
comfyanonymous	615eb52049	Put back frontend version. (#9317 )	2025-08-13 03:48:06 -04:00
comfyanonymous	d5c1954d5c	ComfyUI version 0.3.50	2025-08-13 03:46:38 -04:00
comfyanonymous	e400f26c8f	Downgrade frontend for release. (#9316 )	2025-08-13 03:44:54 -04:00
comfyanonymous	5ca8e2fac3	Update release workflow to python3.13 pytorch cu129 (#9315 ) * Try to reduce size of portable even more. * Update stable release workflow to python 3.13 cu129 * Update dependencies workflow to python3.13 cu129	2025-08-13 03:01:12 -04:00
ComfyUI Wiki	3294782d19	Update template to 0.1.59 (#9313 )	2025-08-13 02:50:50 -04:00
Jedrzej Kosinski	898d88e10e	Make torchaudio exception catching less specific (#9309 )	2025-08-12 23:34:58 -04:00
comfyanonymous	560d38f34c	Wan2.2 fun control support. (#9292 )	2025-08-12 23:26:33 -04:00
comfyanonymous	e1d4f36d8d	Update test release package workflow with python 3.13 cu129. (#9306 )	2025-08-12 20:13:04 -04:00
ComfyUI Wiki	1e3ae1eed8	Update template to 0.1.58 (#9302 )	2025-08-12 17:14:27 -04:00
Alexander Piskun	f4231a80b1	fix(Kling Image API Node): do not pass "image_type" when no image (#9271 ) * fix(Kling Image API Node): do not pass "image_type" when no image * fix(Kling Image API Node): raise client-side error when kling_v1 is used with reference image	2025-08-11 17:15:14 -04:00
PsychoLogicAu	2208aa616d	Support SimpleTuner lycoris lora for Qwen-Image (#9280 )	2025-08-11 16:56:16 -04:00
ComfyUI Wiki	629b173837	Update template & embedded docs (#9283 ) * Update template & embedded docs * Update embedded docs to 0.2.6	2025-08-11 16:52:12 -04:00
Alexander Piskun	fa340add55	remove creation of non-used asyncio_loop (#9284 )	2025-08-11 16:48:17 -04:00
comfyanonymous	966f3a5206	Only show feature flags log when verbose. (#9281 )	2025-08-11 05:53:01 -04:00
comfyanonymous	0552de7c7d	Bump pytorch cuda and rocm versions in readme instructions. (#9273 )	2025-08-10 05:03:47 -04:00
comfyanonymous	5828607ccf	Not sure if AMD actually support fp16 acc but it doesn't crash. (#9258 )	2025-08-09 12:49:25 -04:00
comfyanonymous	735bb4bdb1	Users report gfx1201 is buggy on flux with pytorch attention. (#9244 )	2025-08-08 04:21:00 -04:00
Alexander Piskun	bf2a1b5b1e	async API nodes (#9129 ) * converted API nodes to async * converted BFL API nodes to async * fixed client bug; converted gemini, ideogram, minimax * fixed client bug; converted openai nodes * fixed client bug; converted moonvalley, pika nodes * fixed client bug; converted kling, luma nodes * converted pixverse, rodin nodes * converted tripo, veo2 * converted recraft nodes * add lost log_request_response call	2025-08-07 23:37:50 -04:00
Jedrzej Kosinski	42974a448c	_ui.py import torchaudio safety check (#9234 ) * Added safety around torchaudio import in _ui.py * Trusted cursor too much, fixed torchaudio bool	2025-08-07 17:54:09 -04:00
comfyanonymous	05df2df489	Fix RepeatLatentBatch not working on multi dim latents. (#9227 )	2025-08-07 11:20:40 -04:00
Christian Byrne	37d620a6b8	Update frontend to v1.24.3 (#9175 )	2025-08-06 19:52:39 -04:00
ComfyUI Wiki	32691b16f4	Update template to 0.1.52 (#9206 )	2025-08-06 13:26:29 -04:00
flybirdxx	4c3e57b0ae	Fixed an issue where qwenLora could not be loaded properly. (#9208 )	2025-08-06 13:23:11 -04:00
comfyanonymous	9126c0cfe4	Qwen Image model merging node. (#9202 )	2025-08-06 04:07:04 -04:00
comfyanonymous	d8c51ba15a	Add Qwen Image model to readme. (#9191 )	2025-08-05 07:41:18 -04:00
comfyanonymous	32a95bba8a	ComfyUI version 0.3.49	2025-08-05 07:33:02 -04:00
ComfyUI Wiki	da1ad9b516	Update template to 0.1.51 (#9187 )	2025-08-05 07:24:12 -04:00
comfyanonymous	d044a24398	Fix default shift and any latent size for qwen image model. (#9186 )	2025-08-05 06:12:27 -04:00
ComfyUI Wiki	5be6fd09ff	Update template to 0.1.48 (#9182 )	2025-08-05 03:48:56 -04:00
Christian Byrne	f69609bbd6	Add Veo3 video generation node with audio support (#9110 ) - Create new Veo3VideoGenerationNode that extends VeoVideoGenerationNode - Add support for generateAudio parameter (only for Veo3 models) - Support new Veo3 models: veo-3.0-generate-001, veo-3.0-fast-generate-001 - Fix Veo3 duration constraint to 8 seconds only - Update original node to be clearly Veo 2 only - Update API paths to use model parameter: /proxy/veo/{model}/generate - Regenerate API types from staging to include generateAudio parameter - Fix TripoModelVersion enum reference after regeneration - Mark generated API types file in .gitattributes	2025-08-05 01:52:25 -04:00
comfyanonymous	c012400240	Initial support for qwen image model. (#9179 )	2025-08-04 22:53:25 -04:00
comfyanonymous	03895dea7c	Fix another issue with the PR. (#9170 )	2025-08-04 04:33:04 -04:00
comfyanonymous	84f9759424	Add some warnings and prevent crash when cond devices don't match. (#9169 )	2025-08-04 04:20:12 -04:00
comfyanonymous	7991341e89	Various fixes for broken things from earlier PR. (#9168 )	2025-08-04 04:02:40 -04:00
comfyanonymous	140ffc7fdc	Fix broken controlnet from last PR. (#9167 )	2025-08-04 03:28:12 -04:00
comfyanonymous	182f90b5ec	Lower cond vram use by casting at the same time as device transfer. (#9159 )	2025-08-04 03:11:53 -04:00
comfyanonymous	aebac22193	Cleanup. (#9160 )	2025-08-03 07:08:11 -04:00
comfyanonymous	13aaa66ec2	Make sure context is on the right device. (#9154 )	2025-08-02 15:09:23 -04:00
comfyanonymous	5f582a9757	Make sure all the conds are on the right device. (#9151 )	2025-08-02 15:00:13 -04:00
ComfyUI Wiki	fbcc23945d	Update template to 0.1.47 (#9153 )	2025-08-02 14:15:29 -04:00
Johnpaul Chiwetelu	3dfefc88d0	API for Recently Used Items (#8792 ) * feat: add file creation time to model file metadata and user file info * fix linting	2025-08-01 22:02:06 -04:00