Merge branch 'master' into pysssss-model-db

Support SimpleTuner lycoris lora for Qwen-Image (#9280)
Update template & embedded docs (#9283)
2026-02-15 04:30:02 +00:00 · 2025-08-11 14:09:21 -07:00 · 2025-08-11 16:56:16 -04:00 · 2025-08-11 16:52:12 -04:00 · 2025-08-11 16:48:17 -04:00 · 2025-08-11 05:53:01 -04:00
112 changed files with 11959 additions and 2105 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,2 +1,3 @@
 /web/assets/** linguist-generated
 /web/** linguist-vendored
+comfy_api_nodes/apis/__init__.py linguist-generated
--- a/.github/workflows/check-line-endings.yml
+++ b/.github/workflows/check-line-endings.yml
@@ -17,7 +17,7 @@ jobs:
      - name: Check for Windows line endings (CRLF)
        run: |
          # Get the list of changed files in the PR
-          CHANGED_FILES=$(git diff --name-only origin/${{ github.base_ref }}..HEAD)
+          CHANGED_FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }})

          # Flag to track if CRLF is found
          CRLF_FOUND=false
--- a/README.md
+++ b/README.md
@@ -55,7 +55,7 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
 ## Features
 - Nodes/graph/flowchart interface to experiment and create complex Stable Diffusion workflows without needing to code anything.
 - Image Models
-   - SD1.x, SD2.x,
+   - SD1.x, SD2.x ([unCLIP](https://comfyanonymous.github.io/ComfyUI_examples/unclip/))
   - [SDXL](https://comfyanonymous.github.io/ComfyUI_examples/sdxl/), [SDXL Turbo](https://comfyanonymous.github.io/ComfyUI_examples/sdturbo/)
   - [Stable Cascade](https://comfyanonymous.github.io/ComfyUI_examples/stable_cascade/)
   - [SD3 and SD3.5](https://comfyanonymous.github.io/ComfyUI_examples/sd3/)
@@ -66,9 +66,11 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
   - [Lumina Image 2.0](https://comfyanonymous.github.io/ComfyUI_examples/lumina2/)
   - [HiDream](https://comfyanonymous.github.io/ComfyUI_examples/hidream/)
   - [Cosmos Predict2](https://comfyanonymous.github.io/ComfyUI_examples/cosmos_predict2/)
+   - [Qwen Image](https://comfyanonymous.github.io/ComfyUI_examples/qwen_image/)
 - Image Editing Models
   - [Omnigen 2](https://comfyanonymous.github.io/ComfyUI_examples/omnigen/)
   - [Flux Kontext](https://comfyanonymous.github.io/ComfyUI_examples/flux/#flux-kontext-image-editing-model)
+   - [HiDream E1.1](https://comfyanonymous.github.io/ComfyUI_examples/hidream/#hidream-e11)
 - Video Models
   - [Stable Video Diffusion](https://comfyanonymous.github.io/ComfyUI_examples/video/)
   - [Mochi](https://comfyanonymous.github.io/ComfyUI_examples/mochi/)
@@ -76,6 +78,7 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
   - [Hunyuan Video](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/)
   - [Nvidia Cosmos](https://comfyanonymous.github.io/ComfyUI_examples/cosmos/) and [Cosmos Predict2](https://comfyanonymous.github.io/ComfyUI_examples/cosmos_predict2/)
   - [Wan 2.1](https://comfyanonymous.github.io/ComfyUI_examples/wan/)
+   - [Wan 2.2](https://comfyanonymous.github.io/ComfyUI_examples/wan22/)
 - Audio Models
   - [Stable Audio](https://comfyanonymous.github.io/ComfyUI_examples/audio/)
   - [ACE Step](https://comfyanonymous.github.io/ComfyUI_examples/audio/)
@@ -83,9 +86,9 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
   - [Hunyuan3D 2.0](https://docs.comfy.org/tutorials/3d/hunyuan3D-2)
 - Asynchronous Queue system
 - Many optimizations: Only re-executes the parts of the workflow that changes between executions.
- Smart memory management: can automatically run models on GPUs with as low as 1GB vram.
+- Smart memory management: can automatically run large models on GPUs with as little as 1GB of VRAM by using smart offloading.
 - Works even if you don't have a GPU with: ```--cpu``` (slow)
- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs and CLIP models.
+- Can load ckpt and safetensors files: all-in-one checkpoints or standalone diffusion models, VAEs and CLIP models.
 - Safe loading of ckpt, pt, pth, etc.. files.
 - Embeddings/Textual inversion
 - [Loras (regular, locon and loha)](https://comfyanonymous.github.io/ComfyUI_examples/lora/)
@@ -97,7 +100,6 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
 - [Inpainting](https://comfyanonymous.github.io/ComfyUI_examples/inpaint/) with both regular and inpainting models.
 - [ControlNet and T2I-Adapter](https://comfyanonymous.github.io/ComfyUI_examples/controlnet/)
 - [Upscale Models (ESRGAN, ESRGAN variants, SwinIR, Swin2SR, etc...)](https://comfyanonymous.github.io/ComfyUI_examples/upscale_models/)
- [unCLIP Models](https://comfyanonymous.github.io/ComfyUI_examples/unclip/)
 - [GLIGEN](https://comfyanonymous.github.io/ComfyUI_examples/gligen/)
 - [Model Merging](https://comfyanonymous.github.io/ComfyUI_examples/model_merging/)
 - [LCM models and Loras](https://comfyanonymous.github.io/ComfyUI_examples/lcm/)
@@ -110,7 +112,7 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git

 ## Release Process

-ComfyUI follows a weekly release cycle every Friday, with three interconnected repositories:
+ComfyUI follows a weekly release cycle targeting Friday, but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:

 1. **[ComfyUI Core](https://github.com/comfyanonymous/ComfyUI)**
   - Releases a new stable version (e.g., v0.7.0)
@@ -201,7 +203,7 @@ Put your VAE in: models/vae
 ### AMD GPUs (Linux only)
 AMD users can install rocm and pytorch with pip if you don't have it already installed, this is the command to install the stable version:

-```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3```
+```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4```

 This is the command to install the nightly with ROCm 6.4 which might have some performance improvements:

@@ -235,7 +237,7 @@ Additional discussion and help can be found [here](https://github.com/comfyanony

 Nvidia users should install stable pytorch using this command:

-```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu128```
+```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu129```

 This is the command to install pytorch nightly instead which might have performance improvements.

@@ -293,6 +295,13 @@ For models compatible with Cambricon Extension for PyTorch (torch_mlu). Here's a
 2. Next, install the PyTorch(torch_mlu) following the instructions on the [Installation](https://www.cambricon.com/docs/sdk_1.15.0/cambricon_pytorch_1.17.0/user_guide_1.9/index.html)
 3. Launch ComfyUI by running `python main.py`

+#### Iluvatar Corex
+
+For models compatible with Iluvatar Extension for PyTorch. Here's a step-by-step guide tailored to your platform and installation method:
+
+1. Install the Iluvatar Corex Toolkit by adhering to the platform-specific instructions on the [Installation](https://support.iluvatar.com/#/DocumentCentre?id=1&nameCenter=2&productId=520117912052801536)
+2. Launch ComfyUI by running `python main.py`
+
 # Running

 ```python main.py```
--- a/alembic_db/versions/e9c714da8d57_init.py
+++ b/alembic_db/versions/e9c714da8d57_init.py
@@ -0,0 +1,40 @@
+"""init
+
+Revision ID: e9c714da8d57
+Revises:
+Create Date: 2025-05-30 20:14:33.772039
+
+"""
+from typing import Sequence, Union
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision: str = 'e9c714da8d57'
+down_revision: Union[str, None] = None
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    """Upgrade schema."""
+    op.create_table('model',
+    sa.Column('type', sa.Text(), nullable=False),
+    sa.Column('path', sa.Text(), nullable=False),
+    sa.Column('file_name', sa.Text(), nullable=True),
+    sa.Column('file_size', sa.Integer(), nullable=True),
+    sa.Column('hash', sa.Text(), nullable=True),
+    sa.Column('hash_algorithm', sa.Text(), nullable=True),
+    sa.Column('source_url', sa.Text(), nullable=True),
+    sa.Column('date_added', sa.DateTime(), server_default=sa.text('(CURRENT_TIMESTAMP)'), nullable=True),
+    sa.PrimaryKeyConstraint('type', 'path')
+    )
+
+
+def downgrade() -> None:
+    """Downgrade schema."""
+    # ### commands auto generated by Alembic - please adjust! ###
+    op.drop_table('model')
+    # ### end Alembic commands ###
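
Reviewer note: a minimal sketch of what the composite primary key `(type, path)` in this migration implies, written against the standard-library `sqlite3` module rather than Alembic or SQLAlchemy. The `SCHEMA` string is a hand-written approximation of the table above, and `demo` is a hypothetical helper, neither is part of the PR.

```python
import sqlite3

# Hand-written approximation of the `model` table created by the migration.
SCHEMA = """
CREATE TABLE model (
    type TEXT NOT NULL,
    path TEXT NOT NULL,
    file_name TEXT,
    file_size INTEGER,
    hash TEXT,
    hash_algorithm TEXT,
    source_url TEXT,
    date_added DATETIME DEFAULT (CURRENT_TIMESTAMP),
    PRIMARY KEY (type, path)
)
"""


def demo():
    """Insert rows and report (row count, whether a duplicate key was rejected)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    insert = "INSERT INTO model (type, path) VALUES (?, ?)"
    conn.execute(insert, ("checkpoints", "a.safetensors"))
    # The same relative path under a different type is a distinct row.
    conn.execute(insert, ("loras", "a.safetensors"))
    try:
        # Repeating an existing (type, path) pair violates the primary key.
        conn.execute(insert, ("loras", "a.safetensors"))
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    rows = conn.execute("SELECT COUNT(*) FROM model").fetchone()[0]
    conn.close()
    return rows, duplicate_rejected
```

Because the key spans both columns, a file name only has to be unique within its model type folder, which matches how the docstring in `app/database/models.py` describes `type` and `path`.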
--- a/app/database/models.py
+++ b/app/database/models.py
@@ -1,4 +1,11 @@
+from sqlalchemy import (
+    Column,
+    Integer,
+    Text,
+    DateTime,
+)
 from sqlalchemy.orm import declarative_base
+from sqlalchemy.sql import func

 Base = declarative_base()

@@ -11,4 +18,42 @@ def to_dict(obj):
        if (val := getattr(obj, field))
    }

-# TODO: Define models here
+
+class Model(Base):
+    """
+    sqlalchemy model representing a model file in the system.
+
+    This class defines the database schema for storing information about model files,
+    including their type, path, hash, and when they were added to the system.
+
+    Attributes:
+        type (Text): The type of the model, this is the name of the folder in the models folder (primary key)
+        path (Text): The file path of the model relative to the type folder (primary key)
+        file_name (Text): The name of the model file
+        file_size (Integer): The size of the model file in bytes
+        hash (Text): A hash of the model file
+        hash_algorithm (Text): The algorithm used to generate the hash
+        source_url (Text): The URL of the model file
+        date_added (DateTime): Timestamp of when the model was added to the system
+    """
+
+    __tablename__ = "model"
+
+    type = Column(Text, primary_key=True)
+    path = Column(Text, primary_key=True)
+    file_name = Column(Text)
+    file_size = Column(Integer)
+    hash = Column(Text)
+    hash_algorithm = Column(Text)
+    source_url = Column(Text)
+    date_added = Column(DateTime, server_default=func.now())
+
+    def to_dict(self):
+        """
+        Convert the model instance to a dictionary representation.
+
+        Returns:
+            dict: A dictionary containing the attributes of the model
+        """
+        result = to_dict(self)  # avoid shadowing the built-in `dict`
+        return result
--- a/app/frontend_management.py
+++ b/app/frontend_management.py
@@ -29,18 +29,48 @@ def frontend_install_warning_message():
 This error is happening because the ComfyUI frontend is no longer shipped as part of the main repo but as a pip package instead.
 """.strip()

+def parse_version(version: str) -> tuple[int, int, int]:
+    return tuple(map(int, version.split(".")))
+
+def is_valid_version(version: str) -> bool:
+    """Validate if a string is a valid semantic version (X.Y.Z format)."""
+    pattern = r"^(\d+)\.(\d+)\.(\d+)$"
+    return bool(re.match(pattern, version))
+
+def get_installed_frontend_version():
+    """Get the currently installed frontend package version."""
+    frontend_version_str = version("comfyui-frontend-package")
+    return frontend_version_str
+
+def get_required_frontend_version():
+    """Get the required frontend version from requirements.txt."""
+    try:
+        with open(requirements_path, "r", encoding="utf-8") as f:
+            for line in f:
+                line = line.strip()
+                if line.startswith("comfyui-frontend-package=="):
+                    version_str = line.split("==")[-1]
+                    if not is_valid_version(version_str):
+                        logging.error(f"Invalid version format in requirements.txt: {version_str}")
+                        return None
+                    return version_str
+            logging.error("comfyui-frontend-package not found in requirements.txt")
+            return None
+    except FileNotFoundError:
+        logging.error("requirements.txt not found. Cannot determine required frontend version.")
+        return None
+    except Exception as e:
+        logging.error(f"Error reading requirements.txt: {e}")
+        return None

 def check_frontend_version():
    """Check if the frontend version is up to date."""

-    def parse_version(version: str) -> tuple[int, int, int]:
-        return tuple(map(int, version.split(".")))
-
    try:
-        frontend_version_str = version("comfyui-frontend-package")
+        frontend_version_str = get_installed_frontend_version()
        frontend_version = parse_version(frontend_version_str)
-        with open(requirements_path, "r", encoding="utf-8") as f:
-            required_frontend = parse_version(f.readline().split("=")[-1])
+        required_frontend_str = get_required_frontend_version()
+        required_frontend = parse_version(required_frontend_str)
        if frontend_version < required_frontend:
            app.logger.log_startup_warning(
                f"""
@@ -166,10 +196,35 @@ def download_release_asset_zip(release: Release, destination_path: str) -> None:


 class FrontendManager:
+    """
+    A class to manage ComfyUI frontend versions and installations.
+
+    This class handles the initialization and management of different frontend versions,
+    including the default frontend from the pip package and custom frontend versions
+    from GitHub repositories.
+
+    Attributes:
+        CUSTOM_FRONTENDS_ROOT (str): The root directory where custom frontend versions are stored.
+    """
+
    CUSTOM_FRONTENDS_ROOT = str(Path(__file__).parents[1] / "web_custom_versions")

+    @classmethod
+    def get_required_frontend_version(cls) -> Optional[str]:
+        """Get the required frontend package version."""
+        return get_required_frontend_version()
+
    @classmethod
    def default_frontend_path(cls) -> str:
+        """
+        Get the path to the default frontend installation from the pip package.
+
+        Returns:
+            str: The path to the default frontend static files.
+
+        Raises:
+            SystemExit: If the comfyui-frontend-package is not installed.
+        """
        try:
            import comfyui_frontend_package

@@ -190,6 +245,15 @@ comfyui-frontend-package is not installed.

    @classmethod
    def templates_path(cls) -> str:
+        """
+        Get the path to the workflow templates.
+
+        Returns:
+            str: The path to the workflow templates directory.
+
+        Raises:
+            SystemExit: If the comfyui-workflow-templates package is not installed.
+        """
        try:
            import comfyui_workflow_templates

@@ -225,11 +289,16 @@ comfyui-workflow-templates is not installed.
    @classmethod
    def parse_version_string(cls, value: str) -> tuple[str, str, str]:
        """
+        Parse a version string into its components.
+
+        The version string should be in the format: 'owner/repo@version'
+        where version can be either a semantic version (v1.2.3) or 'latest'.
+
        Args:
            value (str): The version string to parse.

        Returns:
-            tuple[str, str]: A tuple containing provider name and version.
+            tuple[str, str, str]: A tuple containing (owner, repo, version).

        Raises:
            argparse.ArgumentTypeError: If the version string is invalid.
@@ -246,18 +315,22 @@ comfyui-workflow-templates is not installed.
        cls, version_string: str, provider: Optional[FrontEndProvider] = None
    ) -> str:
        """
-        Initializes the frontend for the specified version.
+        Initialize a frontend version without error handling.
+
+        This method attempts to initialize a specific frontend version, either from
+        the default pip package or from a custom GitHub repository. It will download
+        and extract the frontend files if necessary.

        Args:
-            version_string (str): The version string.
-            provider (FrontEndProvider, optional): The provider to use. Defaults to None.
+            version_string (str): The version string specifying which frontend to use.
+            provider (FrontEndProvider, optional): The provider to use for custom frontends.

        Returns:
            str: The path to the initialized frontend.

        Raises:
-            Exception: If there is an error during the initialization process.
-            main error source might be request timeout or invalid URL.
+            Exception: If there is an error during initialization (e.g., network timeout,
+                      invalid URL, or missing assets).
        """
        if version_string == DEFAULT_VERSION_STRING:
            check_frontend_version()
@@ -309,13 +382,17 @@ comfyui-workflow-templates is not installed.
    @classmethod
    def init_frontend(cls, version_string: str) -> str:
        """
-        Initializes the frontend with the specified version string.
+        Initialize a frontend version with error handling.
+
+        This is the main method to initialize a frontend version. It wraps init_frontend_unsafe
+        with error handling, falling back to the default frontend if initialization fails.

        Args:
-            version_string (str): The version string to initialize the frontend with.
+            version_string (str): The version string specifying which frontend to use.

        Returns:
-            str: The path of the initialized frontend.
+            str: The path to the initialized frontend. If initialization fails,
+                 returns the path to the default frontend.
        """
        try:
            return cls.init_frontend_unsafe(version_string)
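
Reviewer note: a small, self-contained sketch of why the version check compares integer tuples instead of raw strings, and of how a pinned version can be pulled out of requirements-style lines. `parse_version` and `is_valid_version` mirror the helpers in this diff; `required_version_from_lines` is a hypothetical variant that takes an iterable of lines instead of opening `requirements.txt`.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    # "1.10.0" -> (1, 10, 0); tuples compare element by element.
    return tuple(map(int, version.split(".")))


def is_valid_version(version: str) -> bool:
    # Strict X.Y.Z check, as in the diff.
    return bool(re.match(r"^(\d+)\.(\d+)\.(\d+)$", version))


def required_version_from_lines(lines):
    """Return the pinned frontend version from requirements-style lines, or None."""
    for line in lines:
        line = line.strip()
        if line.startswith("comfyui-frontend-package=="):
            candidate = line.split("==")[-1]
            return candidate if is_valid_version(candidate) else None
    return None
```

String comparison would rank `"1.10.0"` below `"1.9.2"` because it compares character by character, while the tuple form orders them numerically. Scanning every line also removes the old assumption that the frontend package is pinned on the first line of the file.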
--- a/app/model_manager.py
+++ b/app/model_manager.py
@@ -130,10 +130,21 @@ class ModelFileManager:

            for file_name in filenames:
                try:
-                    relative_path = os.path.relpath(os.path.join(dirpath, file_name), directory)
-                    result.append(relative_path)
-                except:
-                    logging.warning(f"Warning: Unable to access {file_name}. Skipping this file.")
+                    full_path = os.path.join(dirpath, file_name)
+                    relative_path = os.path.relpath(full_path, directory)
+
+                    # Get file metadata
+                    file_info = {
+                        "name": relative_path,
+                        "pathIndex": pathIndex,
+                        "modified": os.path.getmtime(full_path),  # Add modification time
+                        "created": os.path.getctime(full_path),   # Add creation time
+                        "size": os.path.getsize(full_path)        # Add file size
+                    }
+                    result.append(file_info)
+
+                except Exception as e:
+                    logging.warning(f"Warning: Unable to access {file_name}. Error: {e}. Skipping this file.")
                    continue

            for d in subdirs:
@@ -144,7 +155,7 @@ class ModelFileManager:
                    logging.warning(f"Warning: Unable to access {path}. Skipping this path.")
                    continue

-        return [{"name": f, "pathIndex": pathIndex} for f in result], dirs, time.perf_counter()
+        return result, dirs, time.perf_counter()

    def get_model_previews(self, filepath: str) -> list[str | BytesIO]:
        dirname = os.path.dirname(filepath)
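
Reviewer note: a rough standalone sketch of the per-file metadata dictionaries this hunk now returns. `list_files_with_metadata` and `demo` are hypothetical names, and the walk is simplified (no symlink or subdirectory tracking as in `ModelFileManager`).

```python
import os
import tempfile


def list_files_with_metadata(directory: str, path_index: int = 0) -> list[dict]:
    """Walk `directory` and collect one metadata dict per readable file."""
    result = []
    for dirpath, _subdirs, filenames in os.walk(directory):
        for file_name in filenames:
            full_path = os.path.join(dirpath, file_name)
            try:
                result.append({
                    "name": os.path.relpath(full_path, directory),
                    "pathIndex": path_index,
                    "modified": os.path.getmtime(full_path),
                    "created": os.path.getctime(full_path),
                    "size": os.path.getsize(full_path),
                })
            except OSError:
                # Skip files that vanish or cannot be stat'ed mid-walk.
                continue
    return result


def demo():
    """Create one 16-byte file in a temporary tree and list it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        with open(os.path.join(root, "sub", "m.safetensors"), "wb") as f:
            f.write(b"\x00" * 16)
        return list_files_with_metadata(root)
```

One caveat worth keeping in mind for the `created` field: `os.path.getctime` returns the creation time on Windows, but on Unix it is the time of the last metadata change, so the value is not a portable creation timestamp.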
--- a/app/model_processor.py
+++ b/app/model_processor.py
@@ -0,0 +1,331 @@
+import os
+import logging
+import time
+
+import requests
+from tqdm import tqdm
+from folder_paths import get_relative_path, get_full_path
+from app.database.db import create_session, dependencies_available, can_create_session
+import blake3
+import comfy.utils
+
+
+if dependencies_available():
+    from app.database.models import Model
+
+
+class ModelProcessor:
+    def _validate_path(self, model_path):
+        try:
+            if not self._file_exists(model_path):
+                logging.error(f"Model file not found: {model_path}")
+                return None
+
+            result = get_relative_path(model_path)
+            if not result:
+                logging.error(
+                    f"Model file not in a recognized model directory: {model_path}"
+                )
+                return None
+
+            return result
+        except Exception as e:
+            logging.error(f"Error validating model path {model_path}: {str(e)}")
+            return None
+
+    def _file_exists(self, path):
+        """Check if a file exists."""
+        return os.path.exists(path)
+
+    def _get_file_size(self, path):
+        """Get file size."""
+        return os.path.getsize(path)
+
+    def _get_hasher(self):
+        return blake3.blake3()
+
+    def _hash_file(self, model_path):
+        try:
+            hasher = self._get_hasher()
+            with open(model_path, "rb", buffering=0) as f:
+                b = bytearray(128 * 1024)
+                mv = memoryview(b)
+                while n := f.readinto(mv):
+                    hasher.update(mv[:n])
+            return hasher.hexdigest()
+        except Exception as e:
+            logging.error(f"Error hashing file {model_path}: {str(e)}")
+            return None
+
+    def _get_existing_model(self, session, model_type, model_relative_path):
+        return (
+            session.query(Model)
+            .filter(Model.type == model_type)
+            .filter(Model.path == model_relative_path)
+            .first()
+        )
+
+    def _ensure_source_url(self, session, model, source_url):
+        if model.source_url is None:
+            model.source_url = source_url
+            session.commit()
+
+    def _update_database(
+        self,
+        session,
+        model_type,
+        model_path,
+        model_relative_path,
+        model_hash,
+        model,
+        source_url,
+    ):
+        try:
+            if not model:
+                model = self._get_existing_model(
+                    session, model_type, model_relative_path
+                )
+
+            if not model:
+                model = Model(
+                    path=model_relative_path,
+                    type=model_type,
+                    file_name=os.path.basename(model_path),
+                )
+                session.add(model)
+
+            model.file_size = self._get_file_size(model_path)
+            model.hash = model_hash
+            if model_hash:
+                model.hash_algorithm = "blake3"
+            model.source_url = source_url
+
+            session.commit()
+            return model
+        except Exception as e:
+            logging.error(
+                f"Error updating database for {model_relative_path}: {str(e)}"
+            )
+
+    def process_file(self, model_path, source_url=None, model_hash=None):
+        """
+        Process a model file and update the database with metadata.
+        If the file already exists and matches the database, it will not be processed again.
+        Returns the model object or if an error occurs, returns None.
+        """
+        try:
+            if not can_create_session():
+                return
+
+            result = self._validate_path(model_path)
+            if not result:
+                return
+            model_type, model_relative_path = result
+
+            with create_session() as session:
+                session.expire_on_commit = False
+
+                existing_model = self._get_existing_model(
+                    session, model_type, model_relative_path
+                )
+                if (
+                    existing_model
+                    and existing_model.hash
+                    and existing_model.file_size == self._get_file_size(model_path)
+                ):
+                    # File exists with hash and same size, no need to process
+                    self._ensure_source_url(session, existing_model, source_url)
+                    return existing_model
+
+                if model_hash:
+                    model_hash = model_hash.lower()
+                    logging.info(f"Using provided hash: {model_hash}")
+                else:
+                    start_time = time.time()
+                    logging.info(f"Hashing model {model_relative_path}")
+                    model_hash = self._hash_file(model_path)
+                    if not model_hash:
+                        return
+                    logging.info(
+                        f"Model hash: {model_hash} (duration: {time.time() - start_time} seconds)"
+                    )
+
+                return self._update_database(
+                    session,
+                    model_type,
+                    model_path,
+                    model_relative_path,
+                    model_hash,
+                    existing_model,
+                    source_url,
+                )
+        except Exception as e:
+            logging.error(f"Error processing model file {model_path}: {str(e)}")
+            return None
+
+    def retrieve_model_by_hash(self, model_hash, model_type=None, session=None):
+        """
+        Retrieve a model file from the database by hash and optionally by model type.
+        Returns the model object or None if the model doesn't exist or an error occurs.
+        """
+        try:
+            if not can_create_session():
+                return
+
+            dispose_session = False
+
+            if session is None:
+                session = create_session()
+                dispose_session = True
+
+            model = session.query(Model).filter(Model.hash == model_hash)
+            if model_type is not None:
+                model = model.filter(Model.type == model_type)
+            return model.first()
+        except Exception as e:
+            logging.error(f"Error retrieving model by hash {model_hash}: {str(e)}")
+            return None
+        finally:
+            if dispose_session:
+                session.close()
+
+    def retrieve_hash(self, model_path, model_type=None):
+        """
+        Retrieve the hash of a model file from the database.
+        Returns the hash or None if the model doesn't exist or an error occurs.
+        """
+        try:
+            if not can_create_session():
+                return
+
+            # Resolve unconditionally; `result` was unbound when model_type was None.
+            result = self._validate_path(model_path)
+            if not result:
+                return None
+            model_type, model_relative_path = result
+
+            with create_session() as session:
+                model = self._get_existing_model(
+                    session, model_type, model_relative_path
+                )
+                if model and model.hash:
+                    return model.hash
+                return None
+        except Exception as e:
+            logging.error(f"Error retrieving hash for {model_path}: {str(e)}")
+            return None
+
+    def _validate_file_extension(self, file_name):
+        """Validate that the file extension is supported."""
+        extension = os.path.splitext(file_name)[1]
+        if extension not in (".safetensors", ".sft", ".txt", ".csv", ".json", ".yaml"):
+            raise ValueError(f"Unsupported unsafe file for download: {file_name}")
+
+    def _check_existing_file(self, model_type, file_name, expected_hash):
+        """Check if file exists and has correct hash."""
+        destination_path = get_full_path(model_type, file_name, allow_missing=True)
+        if self._file_exists(destination_path):
+            model = self.process_file(destination_path)
+            if model and (expected_hash is None or model.hash == expected_hash.lower()):
+                logging.debug(
+                    f"File {destination_path} already exists in the database and has the correct hash or no hash was provided."
+                )
+                return destination_path
+            else:
+                raise ValueError(
+                    f"File {destination_path} exists with hash {model.hash if model else 'unknown'} but expected {expected_hash}. Please delete the file and try again."
+                )
+        return None
+
+    def _check_existing_file_by_hash(self, hash, type, url):
+        """Check if a file with the given hash exists in the database and on disk."""
+        hash = hash.lower()
+        with create_session() as session:
+            model = self.retrieve_model_by_hash(hash, type, session)
+            if model:
+                existing_path = get_full_path(type, model.path)
+                if existing_path:
+                    logging.debug(
+                        f"File {model.path} already exists in the database at {existing_path}"
+                    )
+                    self._ensure_source_url(session, model, url)
+                    return existing_path
+                else:
+                    logging.debug(
+                        f"File {model.path} exists in the database but not on disk"
+                    )
+        return None
+
+    def _download_file(self, url, destination_path, hasher):
+        """Download a file and update the hasher with its contents."""
+        response = requests.get(url, stream=True)
+        logging.info(f"Downloading {url} to {destination_path}")
+
+        with open(destination_path, "wb") as f:
+            total_size = int(response.headers.get("content-length", 0))
+            if total_size > 0:
+                pbar = comfy.utils.ProgressBar(total_size)
+            else:
+                pbar = None
+            with tqdm(total=total_size, unit="B", unit_scale=True) as progress_bar:
+                for chunk in response.iter_content(chunk_size=128 * 1024):
+                    if chunk:
+                        f.write(chunk)
+                        hasher.update(chunk)
+                        progress_bar.update(len(chunk))
+                        if pbar:
+                            pbar.update(len(chunk))
+
+    def _verify_downloaded_hash(self, calculated_hash, expected_hash, destination_path):
+        """Verify that the downloaded file has the expected hash."""
+        if expected_hash is not None and calculated_hash != expected_hash.lower():
+            self._remove_file(destination_path)
+            raise ValueError(
+                f"Downloaded file hash {calculated_hash} does not match expected hash {expected_hash}"
+            )
+
+    def _remove_file(self, file_path):
+        """Remove a file from disk."""
+        os.remove(file_path)
+
+    def ensure_downloaded(self, type, url, desired_file_name, hash=None):
+        """
+        Ensure a model file is downloaded and has the correct hash.
+        Returns the path to the downloaded file.
+        """
+        logging.debug(
+            f"Ensuring {type} file is downloaded. URL='{url}' Destination='{desired_file_name}' Hash='{hash}'"
+        )
+
+        # Validate file extension
+        self._validate_file_extension(desired_file_name)
+
+        # Check if file exists with correct hash
+        if hash:
+            existing_path = self._check_existing_file_by_hash(hash, type, url)
+            if existing_path:
+                return existing_path
+
+        # Check if file exists locally
+        destination_path = get_full_path(type, desired_file_name, allow_missing=True)
+        existing_path = self._check_existing_file(type, desired_file_name, hash)
+        if existing_path:
+            return existing_path
+
+        # Download the file
+        hasher = self._get_hasher()
+        self._download_file(url, destination_path, hasher)
+
+        # Verify hash
+        calculated_hash = hasher.hexdigest()
+        self._verify_downloaded_hash(calculated_hash, hash, destination_path)
+
+        # Update database
+        self.process_file(destination_path, url, calculated_hash)
+
+        # TODO: Notify frontend to reload models
+
+        return destination_path
+
+
+model_processor = ModelProcessor()
--- a/app/user_manager.py
+++ b/app/user_manager.py
@@ -20,13 +20,15 @@ class FileInfo(TypedDict):
    path: str
    size: int
    modified: int
+    created: int


 def get_file_info(path: str, relative_to: str) -> FileInfo:
    return {
        "path": os.path.relpath(path, relative_to).replace(os.sep, '/'),
        "size": os.path.getsize(path),
-        "modified": os.path.getmtime(path)
+        "modified": os.path.getmtime(path),
+        "created": os.path.getctime(path)  # ctime: creation time on Windows, last metadata change on Unix
    }


--- a/comfy/cli_args.py
+++ b/comfy/cli_args.py
@@ -49,7 +49,8 @@ parser.add_argument("--temp-directory", type=str, default=None, help="Set the Co
 parser.add_argument("--input-directory", type=str, default=None, help="Set the ComfyUI input directory. Overrides --base-directory.")
 parser.add_argument("--auto-launch", action="store_true", help="Automatically launch ComfyUI in the default browser.")
 parser.add_argument("--disable-auto-launch", action="store_true", help="Disable auto launching the browser.")
-parser.add_argument("--cuda-device", type=int, default=None, metavar="DEVICE_ID", help="Set the id of the cuda device this instance will use.")
+parser.add_argument("--cuda-device", type=int, default=None, metavar="DEVICE_ID", help="Set the id of the cuda device this instance will use. All other devices will not be visible.")
+parser.add_argument("--default-device", type=int, default=None, metavar="DEFAULT_DEVICE_ID", help="Set the id of the default device. All other devices will stay visible.")
 cm_group = parser.add_mutually_exclusive_group()
 cm_group.add_argument("--cuda-malloc", action="store_true", help="Enable cudaMallocAsync (enabled by default for torch 2.0 and up).")
 cm_group.add_argument("--disable-cuda-malloc", action="store_true", help="Disable cudaMallocAsync.")
@@ -209,6 +210,7 @@ database_default_path = os.path.abspath(
    os.path.join(os.path.dirname(__file__), "..", "user", "comfyui.db")
 )
 parser.add_argument("--database-url", type=str, default=f"sqlite:///{database_default_path}", help="Specify the database URL, e.g. for an in-memory database you can use 'sqlite:///:memory:'.")
+parser.add_argument("--disable-model-processing", action="store_true", help="Disable model file processing, e.g. computing hashes and extracting metadata.")

 if comfy.options.args_parsing:
    args = parser.parse_args()
--- a/comfy/conds.py
+++ b/comfy/conds.py
@@ -1,6 +1,7 @@
 import torch
 import math
 import comfy.utils
+import logging


 class CONDRegular:
@@ -10,12 +11,15 @@ class CONDRegular:
    def _copy_with(self, cond):
        return self.__class__(cond)

-    def process_cond(self, batch_size, device, **kwargs):
-        return self._copy_with(comfy.utils.repeat_to_batch_size(self.cond, batch_size).to(device))
+    def process_cond(self, batch_size, **kwargs):
+        return self._copy_with(comfy.utils.repeat_to_batch_size(self.cond, batch_size))

    def can_concat(self, other):
        if self.cond.shape != other.cond.shape:
            return False
+        if self.cond.device != other.cond.device:
+            logging.warning("WARNING: conds not on same device, skipping concat.")
+            return False
        return True

    def concat(self, others):
@@ -29,14 +33,14 @@ class CONDRegular:


 class CONDNoiseShape(CONDRegular):
-    def process_cond(self, batch_size, device, area, **kwargs):
+    def process_cond(self, batch_size, area, **kwargs):
        data = self.cond
        if area is not None:
            dims = len(area) // 2
            for i in range(dims):
                data = data.narrow(i + 2, area[i + dims], area[i])

-        return self._copy_with(comfy.utils.repeat_to_batch_size(data, batch_size).to(device))
+        return self._copy_with(comfy.utils.repeat_to_batch_size(data, batch_size))


 class CONDCrossAttn(CONDRegular):
@@ -51,6 +55,9 @@ class CONDCrossAttn(CONDRegular):
            diff = mult_min // min(s1[1], s2[1])
            if diff > 4: #arbitrary limit on the padding because it's probably going to impact performance negatively if it's too much
                return False
+        if self.cond.device != other.cond.device:
+            logging.warning("WARNING: conds not on same device, skipping concat.")
+            return False
        return True

    def concat(self, others):
@@ -73,7 +80,7 @@ class CONDConstant(CONDRegular):
    def __init__(self, cond):
        self.cond = cond

-    def process_cond(self, batch_size, device, **kwargs):
+    def process_cond(self, batch_size, **kwargs):
        return self._copy_with(self.cond)

    def can_concat(self, other):
@@ -92,10 +99,10 @@ class CONDList(CONDRegular):
    def __init__(self, cond):
        self.cond = cond

-    def process_cond(self, batch_size, device, **kwargs):
+    def process_cond(self, batch_size, **kwargs):
        out = []
        for c in self.cond:
-            out.append(comfy.utils.repeat_to_batch_size(c, batch_size).to(device))
+            out.append(comfy.utils.repeat_to_batch_size(c, batch_size))

        return self._copy_with(out)

--- a/comfy/controlnet.py
+++ b/comfy/controlnet.py
@@ -28,6 +28,7 @@ import comfy.model_detection
 import comfy.model_patcher
 import comfy.ops
 import comfy.latent_formats
+import comfy.model_base

 import comfy.cldm.cldm
 import comfy.t2i_adapter.adapter
@@ -43,7 +44,6 @@ if TYPE_CHECKING:

 def broadcast_image_to(tensor, target_batch_size, batched_number):
    current_batch_size = tensor.shape[0]
-    #print(current_batch_size, target_batch_size)
    if current_batch_size == 1:
        return tensor

@@ -265,12 +265,12 @@ class ControlNet(ControlBase):
        for c in self.extra_conds:
            temp = cond.get(c, None)
            if temp is not None:
-                extra[c] = temp.to(dtype)
+                extra[c] = comfy.model_base.convert_tensor(temp, dtype, x_noisy.device)

        timestep = self.model_sampling_current.timestep(t)
        x_noisy = self.model_sampling_current.calculate_input(t, x_noisy)

-        control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=context.to(dtype), **extra)
+        control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=comfy.model_management.cast_to_device(context, x_noisy.device, dtype), **extra)
        return self.control_merge(control, control_prev, output_dtype=None)

    def copy(self):
--- a/comfy/k_diffusion/sampling.py
+++ b/comfy/k_diffusion/sampling.py
@@ -1210,39 +1210,21 @@ def sample_deis(model, x, sigmas, extra_args=None, callback=None, disable=None,
    return x_next


-@torch.no_grad()
-def sample_euler_cfg_pp(model, x, sigmas, extra_args=None, callback=None, disable=None):
-    extra_args = {} if extra_args is None else extra_args
-
-    temp = [0]
-    def post_cfg_function(args):
-        temp[0] = args["uncond_denoised"]
-        return args["denoised"]
-
-    model_options = extra_args.get("model_options", {}).copy()
-    extra_args["model_options"] = comfy.model_patcher.set_model_options_post_cfg_function(model_options, post_cfg_function, disable_cfg1_optimization=True)
-
-    s_in = x.new_ones([x.shape[0]])
-    for i in trange(len(sigmas) - 1, disable=disable):
-        sigma_hat = sigmas[i]
-        denoised = model(x, sigma_hat * s_in, **extra_args)
-        d = to_d(x, sigma_hat, temp[0])
-        if callback is not None:
-            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigma_hat, 'denoised': denoised})
-        # Euler method
-        x = denoised + d * sigmas[i + 1]
-    return x
-
 @torch.no_grad()
 def sample_euler_ancestral_cfg_pp(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
-    """Ancestral sampling with Euler method steps."""
+    """Ancestral sampling with Euler method steps (CFG++)."""
    extra_args = {} if extra_args is None else extra_args
    seed = extra_args.get("seed", None)
    noise_sampler = default_noise_sampler(x, seed=seed) if noise_sampler is None else noise_sampler

-    temp = [0]
+    model_sampling = model.inner_model.model_patcher.get_model_object("model_sampling")
+    lambda_fn = partial(sigma_to_half_log_snr, model_sampling=model_sampling)
+
+    uncond_denoised = None
+
    def post_cfg_function(args):
-        temp[0] = args["uncond_denoised"]
+        nonlocal uncond_denoised
+        uncond_denoised = args["uncond_denoised"]
        return args["denoised"]

    model_options = extra_args.get("model_options", {}).copy()
@@ -1251,15 +1233,33 @@ def sample_euler_ancestral_cfg_pp(model, x, sigmas, extra_args=None, callback=No
    s_in = x.new_ones([x.shape[0]])
    for i in trange(len(sigmas) - 1, disable=disable):
        denoised = model(x, sigmas[i] * s_in, **extra_args)
-        sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
        if callback is not None:
            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
-        d = to_d(x, sigmas[i], temp[0])
-        # Euler method
-        x = denoised + d * sigma_down
-        if sigmas[i + 1] > 0:
-            x = x + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * sigma_up
+        if sigmas[i + 1] == 0:
+            # Denoising step
+            x = denoised
+        else:
+            alpha_s = sigmas[i] * lambda_fn(sigmas[i]).exp()
+            alpha_t = sigmas[i + 1] * lambda_fn(sigmas[i + 1]).exp()
+            d = to_d(x, sigmas[i], alpha_s * uncond_denoised)   # to noise
+
+            # DDIM stochastic sampling
+            sigma_down, sigma_up = get_ancestral_step(sigmas[i] / alpha_s, sigmas[i + 1] / alpha_t, eta=eta)
+            sigma_down = alpha_t * sigma_down
+
+            # Euler method
+            x = alpha_t * denoised + sigma_down * d
+            if eta > 0 and s_noise > 0:
+                x = x + alpha_t * noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * sigma_up
    return x
+
+
+@torch.no_grad()
+def sample_euler_cfg_pp(model, x, sigmas, extra_args=None, callback=None, disable=None):
+    """Euler method steps (CFG++)."""
+    return sample_euler_ancestral_cfg_pp(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=0.0, s_noise=0.0, noise_sampler=None)
+
+
 @torch.no_grad()
 def sample_dpmpp_2s_ancestral_cfg_pp(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
    """Ancestral sampling with DPM-Solver++(2S) second-order steps."""
--- a/comfy/latent_formats.py
+++ b/comfy/latent_formats.py
@@ -457,6 +457,82 @@ class Wan21(LatentFormat):
        latents_std = self.latents_std.to(latent.device, latent.dtype)
        return latent * latents_std / self.scale_factor + latents_mean

+class Wan22(Wan21):
+    latent_channels = 48
+    latent_dimensions = 3
+
+    latent_rgb_factors = [
+            [ 0.0119,  0.0103,  0.0046],
+            [-0.1062, -0.0504,  0.0165],
+            [ 0.0140,  0.0409,  0.0491],
+            [-0.0813, -0.0677,  0.0607],
+            [ 0.0656,  0.0851,  0.0808],
+            [ 0.0264,  0.0463,  0.0912],
+            [ 0.0295,  0.0326,  0.0590],
+            [-0.0244, -0.0270,  0.0025],
+            [ 0.0443, -0.0102,  0.0288],
+            [-0.0465, -0.0090, -0.0205],
+            [ 0.0359,  0.0236,  0.0082],
+            [-0.0776,  0.0854,  0.1048],
+            [ 0.0564,  0.0264,  0.0561],
+            [ 0.0006,  0.0594,  0.0418],
+            [-0.0319, -0.0542, -0.0637],
+            [-0.0268,  0.0024,  0.0260],
+            [ 0.0539,  0.0265,  0.0358],
+            [-0.0359, -0.0312, -0.0287],
+            [-0.0285, -0.1032, -0.1237],
+            [ 0.1041,  0.0537,  0.0622],
+            [-0.0086, -0.0374, -0.0051],
+            [ 0.0390,  0.0670,  0.2863],
+            [ 0.0069,  0.0144,  0.0082],
+            [ 0.0006, -0.0167,  0.0079],
+            [ 0.0313, -0.0574, -0.0232],
+            [-0.1454, -0.0902, -0.0481],
+            [ 0.0714,  0.0827,  0.0447],
+            [-0.0304, -0.0574, -0.0196],
+            [ 0.0401,  0.0384,  0.0204],
+            [-0.0758, -0.0297, -0.0014],
+            [ 0.0568,  0.1307,  0.1372],
+            [-0.0055, -0.0310, -0.0380],
+            [ 0.0239, -0.0305,  0.0325],
+            [-0.0663, -0.0673, -0.0140],
+            [-0.0416, -0.0047, -0.0023],
+            [ 0.0166,  0.0112, -0.0093],
+            [-0.0211,  0.0011,  0.0331],
+            [ 0.1833,  0.1466,  0.2250],
+            [-0.0368,  0.0370,  0.0295],
+            [-0.3441, -0.3543, -0.2008],
+            [-0.0479, -0.0489, -0.0420],
+            [-0.0660, -0.0153,  0.0800],
+            [-0.0101,  0.0068,  0.0156],
+            [-0.0690, -0.0452, -0.0927],
+            [-0.0145,  0.0041,  0.0015],
+            [ 0.0421,  0.0451,  0.0373],
+            [ 0.0504, -0.0483, -0.0356],
+            [-0.0837,  0.0168,  0.0055]
+        ]
+
+    latent_rgb_factors_bias = [0.0317, -0.0878, -0.1388]
+
+    def __init__(self):
+        self.scale_factor = 1.0
+        self.latents_mean = torch.tensor([
+                -0.2289, -0.0052, -0.1323, -0.2339, -0.2799, 0.0174, 0.1838, 0.1557,
+                -0.1382, 0.0542, 0.2813, 0.0891, 0.1570, -0.0098, 0.0375, -0.1825,
+                -0.2246, -0.1207, -0.0698, 0.5109, 0.2665, -0.2108, -0.2158, 0.2502,
+                -0.2055, -0.0322, 0.1109, 0.1567, -0.0729, 0.0899, -0.2799, -0.1230,
+                -0.0313, -0.1649, 0.0117, 0.0723, -0.2839, -0.2083, -0.0520, 0.3748,
+                0.0152, 0.1957, 0.1433, -0.2944, 0.3573, -0.0548, -0.1681, -0.0667,
+            ]).view(1, self.latent_channels, 1, 1, 1)
+        self.latents_std = torch.tensor([
+                0.4765, 1.0364, 0.4514, 1.1677, 0.5313, 0.4990, 0.4818, 0.5013,
+                0.8158, 1.0344, 0.5894, 1.0901, 0.6885, 0.6165, 0.8454, 0.4978,
+                0.5759, 0.3523, 0.7135, 0.6804, 0.5833, 1.4146, 0.8986, 0.5659,
+                0.7069, 0.5338, 0.4889, 0.4917, 0.4069, 0.4999, 0.6866, 0.4093,
+                0.5709, 0.6065, 0.6415, 0.4944, 0.5726, 1.2042, 0.5458, 1.6887,
+                0.3971, 1.0600, 0.3943, 0.5537, 0.5444, 0.4089, 0.7468, 0.7744
+            ]).view(1, self.latent_channels, 1, 1, 1)
+
 class Hunyuan3Dv2(LatentFormat):
    latent_channels = 64
    latent_dimensions = 1
--- a/comfy/ldm/cosmos/cosmos_tokenizer/utils.py
+++ b/comfy/ldm/cosmos/cosmos_tokenizer/utils.py
@@ -58,7 +58,8 @@ def is_odd(n: int) -> bool:


 def nonlinearity(x):
-    return x * torch.sigmoid(x)
+    # x * sigmoid(x)
+    return torch.nn.functional.silu(x)


 def Normalize(in_channels, num_groups=32):
--- a/comfy/ldm/modules/diffusionmodules/model.py
+++ b/comfy/ldm/modules/diffusionmodules/model.py
@@ -36,7 +36,7 @@ def get_timestep_embedding(timesteps, embedding_dim):

 def nonlinearity(x):
    # swish
-    return x*torch.sigmoid(x)
+    return torch.nn.functional.silu(x)


 def Normalize(in_channels, num_groups=32):
--- a/comfy/ldm/qwen_image/model.py
+++ b/comfy/ldm/qwen_image/model.py
@@ -0,0 +1,400 @@
+# https://github.com/QwenLM/Qwen-Image (Apache 2.0)
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from typing import Optional, Tuple
+from einops import repeat
+
+from comfy.ldm.lightricks.model import TimestepEmbedding, Timesteps
+from comfy.ldm.modules.attention import optimized_attention_masked
+from comfy.ldm.flux.layers import EmbedND
+import comfy.ldm.common_dit
+
+class GELU(nn.Module):
+    def __init__(self, dim_in: int, dim_out: int, approximate: str = "none", bias: bool = True, dtype=None, device=None, operations=None):
+        super().__init__()
+        self.proj = operations.Linear(dim_in, dim_out, bias=bias, dtype=dtype, device=device)
+        self.approximate = approximate
+
+    def forward(self, hidden_states):
+        hidden_states = self.proj(hidden_states)
+        hidden_states = F.gelu(hidden_states, approximate=self.approximate)
+        return hidden_states
+
+
+class FeedForward(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        dim_out: Optional[int] = None,
+        mult: int = 4,
+        dropout: float = 0.0,
+        inner_dim=None,
+        bias: bool = True,
+        dtype=None, device=None, operations=None
+    ):
+        super().__init__()
+        if inner_dim is None:
+            inner_dim = int(dim * mult)
+        dim_out = dim_out if dim_out is not None else dim
+
+        self.net = nn.ModuleList([])
+        self.net.append(GELU(dim, inner_dim, approximate="tanh", bias=bias, dtype=dtype, device=device, operations=operations))
+        self.net.append(nn.Dropout(dropout))
+        self.net.append(operations.Linear(inner_dim, dim_out, bias=bias, dtype=dtype, device=device))
+
+    def forward(self, hidden_states: torch.Tensor, *args, **kwargs) -> torch.Tensor:
+        for module in self.net:
+            hidden_states = module(hidden_states)
+        return hidden_states
+
+
+def apply_rotary_emb(x, freqs_cis):
+    if x.shape[1] == 0:
+        return x
+
+    t_ = x.reshape(*x.shape[:-1], -1, 1, 2)
+    t_out = freqs_cis[..., 0] * t_[..., 0] + freqs_cis[..., 1] * t_[..., 1]
+    return t_out.reshape(*x.shape)
+
+
+class QwenTimestepProjEmbeddings(nn.Module):
+    def __init__(self, embedding_dim, pooled_projection_dim, dtype=None, device=None, operations=None):
+        super().__init__()
+        self.time_proj = Timesteps(num_channels=256, flip_sin_to_cos=True, downscale_freq_shift=0, scale=1000)
+        self.timestep_embedder = TimestepEmbedding(
+            in_channels=256,
+            time_embed_dim=embedding_dim,
+            dtype=dtype,
+            device=device,
+            operations=operations
+        )
+
+    def forward(self, timestep, hidden_states):
+        timesteps_proj = self.time_proj(timestep)
+        timesteps_emb = self.timestep_embedder(timesteps_proj.to(dtype=hidden_states.dtype))
+        return timesteps_emb
+
+
+class Attention(nn.Module):
+    def __init__(
+        self,
+        query_dim: int,
+        dim_head: int = 64,
+        heads: int = 8,
+        dropout: float = 0.0,
+        bias: bool = False,
+        eps: float = 1e-5,
+        out_bias: bool = True,
+        out_dim: int = None,
+        out_context_dim: int = None,
+        dtype=None,
+        device=None,
+        operations=None
+    ):
+        super().__init__()
+        self.inner_dim = out_dim if out_dim is not None else dim_head * heads
+        self.inner_kv_dim = self.inner_dim
+        self.heads = heads
+        self.dim_head = dim_head
+        self.out_dim = out_dim if out_dim is not None else query_dim
+        self.out_context_dim = out_context_dim if out_context_dim is not None else query_dim
+        self.dropout = dropout
+
+        # Q/K normalization
+        self.norm_q = operations.RMSNorm(dim_head, eps=eps, elementwise_affine=True, dtype=dtype, device=device)
+        self.norm_k = operations.RMSNorm(dim_head, eps=eps, elementwise_affine=True, dtype=dtype, device=device)
+        self.norm_added_q = operations.RMSNorm(dim_head, eps=eps, dtype=dtype, device=device)
+        self.norm_added_k = operations.RMSNorm(dim_head, eps=eps, dtype=dtype, device=device)
+
+        # Image stream projections
+        self.to_q = operations.Linear(query_dim, self.inner_dim, bias=bias, dtype=dtype, device=device)
+        self.to_k = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+        self.to_v = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+
+        # Text stream projections
+        self.add_q_proj = operations.Linear(query_dim, self.inner_dim, bias=bias, dtype=dtype, device=device)
+        self.add_k_proj = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+        self.add_v_proj = operations.Linear(query_dim, self.inner_kv_dim, bias=bias, dtype=dtype, device=device)
+
+        # Output projections
+        self.to_out = nn.ModuleList([
+            operations.Linear(self.inner_dim, self.out_dim, bias=out_bias, dtype=dtype, device=device),
+            nn.Dropout(dropout)
+        ])
+        self.to_add_out = operations.Linear(self.inner_dim, self.out_context_dim, bias=out_bias, dtype=dtype, device=device)
+
+    def forward(
+        self,
+        hidden_states: torch.FloatTensor,  # Image stream
+        encoder_hidden_states: torch.FloatTensor = None,  # Text stream
+        encoder_hidden_states_mask: torch.FloatTensor = None,
+        attention_mask: Optional[torch.FloatTensor] = None,
+        image_rotary_emb: Optional[torch.Tensor] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        seq_txt = encoder_hidden_states.shape[1]
+
+        img_query = self.to_q(hidden_states).unflatten(-1, (self.heads, -1))
+        img_key = self.to_k(hidden_states).unflatten(-1, (self.heads, -1))
+        img_value = self.to_v(hidden_states).unflatten(-1, (self.heads, -1))
+
+        txt_query = self.add_q_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_key = self.add_k_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_value = self.add_v_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+
+        img_query = self.norm_q(img_query)
+        img_key = self.norm_k(img_key)
+        txt_query = self.norm_added_q(txt_query)
+        txt_key = self.norm_added_k(txt_key)
+
+        joint_query = torch.cat([txt_query, img_query], dim=1)
+        joint_key = torch.cat([txt_key, img_key], dim=1)
+        joint_value = torch.cat([txt_value, img_value], dim=1)
+
+        joint_query = apply_rotary_emb(joint_query, image_rotary_emb)
+        joint_key = apply_rotary_emb(joint_key, image_rotary_emb)
+
+        joint_query = joint_query.flatten(start_dim=2)
+        joint_key = joint_key.flatten(start_dim=2)
+        joint_value = joint_value.flatten(start_dim=2)
+
+        joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads, attention_mask)
+
+        txt_attn_output = joint_hidden_states[:, :seq_txt, :]
+        img_attn_output = joint_hidden_states[:, seq_txt:, :]
+
+        img_attn_output = self.to_out[0](img_attn_output)
+        img_attn_output = self.to_out[1](img_attn_output)
+        txt_attn_output = self.to_add_out(txt_attn_output)
+
+        return img_attn_output, txt_attn_output
+
+
+class QwenImageTransformerBlock(nn.Module):
+    def __init__(
+        self,
+        dim: int,
+        num_attention_heads: int,
+        attention_head_dim: int,
+        eps: float = 1e-6,
+        dtype=None,
+        device=None,
+        operations=None
+    ):
+        super().__init__()
+        self.dim = dim
+        self.num_attention_heads = num_attention_heads
+        self.attention_head_dim = attention_head_dim
+
+        self.img_mod = nn.Sequential(
+            nn.SiLU(),
+            operations.Linear(dim, 6 * dim, bias=True, dtype=dtype, device=device),
+        )
+        self.img_norm1 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.img_norm2 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.img_mlp = FeedForward(dim=dim, dim_out=dim, dtype=dtype, device=device, operations=operations)
+
+        self.txt_mod = nn.Sequential(
+            nn.SiLU(),
+            operations.Linear(dim, 6 * dim, bias=True, dtype=dtype, device=device),
+        )
+        self.txt_norm1 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.txt_norm2 = operations.LayerNorm(dim, elementwise_affine=False, eps=eps, dtype=dtype, device=device)
+        self.txt_mlp = FeedForward(dim=dim, dim_out=dim, dtype=dtype, device=device, operations=operations)
+
+        self.attn = Attention(
+            query_dim=dim,
+            dim_head=attention_head_dim,
+            heads=num_attention_heads,
+            out_dim=dim,
+            bias=True,
+            eps=eps,
+            dtype=dtype,
+            device=device,
+            operations=operations,
+        )
+
+    def _modulate(self, x, mod_params):
+        shift, scale, gate = mod_params.chunk(3, dim=-1)
+        return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1), gate.unsqueeze(1)
+
+    def forward(
+        self,
+        hidden_states: torch.Tensor,
+        encoder_hidden_states: torch.Tensor,
+        encoder_hidden_states_mask: torch.Tensor,
+        temb: torch.Tensor,
+        image_rotary_emb: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        img_mod_params = self.img_mod(temb)
+        txt_mod_params = self.txt_mod(temb)
+        img_mod1, img_mod2 = img_mod_params.chunk(2, dim=-1)
+        txt_mod1, txt_mod2 = txt_mod_params.chunk(2, dim=-1)
+
+        img_normed = self.img_norm1(hidden_states)
+        img_modulated, img_gate1 = self._modulate(img_normed, img_mod1)
+        txt_normed = self.txt_norm1(encoder_hidden_states)
+        txt_modulated, txt_gate1 = self._modulate(txt_normed, txt_mod1)
+
+        img_attn_output, txt_attn_output = self.attn(
+            hidden_states=img_modulated,
+            encoder_hidden_states=txt_modulated,
+            encoder_hidden_states_mask=encoder_hidden_states_mask,
+            image_rotary_emb=image_rotary_emb,
+        )
+
+        hidden_states = hidden_states + img_gate1 * img_attn_output
+        encoder_hidden_states = encoder_hidden_states + txt_gate1 * txt_attn_output
+
+        img_normed2 = self.img_norm2(hidden_states)
+        img_modulated2, img_gate2 = self._modulate(img_normed2, img_mod2)
+        hidden_states = hidden_states + img_gate2 * self.img_mlp(img_modulated2)
+
+        txt_normed2 = self.txt_norm2(encoder_hidden_states)
+        txt_modulated2, txt_gate2 = self._modulate(txt_normed2, txt_mod2)
+        encoder_hidden_states = encoder_hidden_states + txt_gate2 * self.txt_mlp(txt_modulated2)
+
+        return encoder_hidden_states, hidden_states
+
+
+class LastLayer(nn.Module):
+    def __init__(
+        self,
+        embedding_dim: int,
+        conditioning_embedding_dim: int,
+        elementwise_affine=False,
+        eps=1e-6,
+        bias=True,
+        dtype=None, device=None, operations=None
+    ):
+        super().__init__()
+        self.silu = nn.SiLU()
+        self.linear = operations.Linear(conditioning_embedding_dim, embedding_dim * 2, bias=bias, dtype=dtype, device=device)
+        self.norm = operations.LayerNorm(embedding_dim, eps, elementwise_affine=False, bias=bias, dtype=dtype, device=device)
+
+    def forward(self, x: torch.Tensor, conditioning_embedding: torch.Tensor) -> torch.Tensor:
+        emb = self.linear(self.silu(conditioning_embedding))
+        scale, shift = torch.chunk(emb, 2, dim=1)
+        x = self.norm(x) * (1 + scale)[:, None, :] + shift[:, None, :]
+        return x
+
+
+class QwenImageTransformer2DModel(nn.Module):
+    def __init__(
+        self,
+        patch_size: int = 2,
+        in_channels: int = 64,
+        out_channels: Optional[int] = 16,
+        num_layers: int = 60,
+        attention_head_dim: int = 128,
+        num_attention_heads: int = 24,
+        joint_attention_dim: int = 3584,
+        pooled_projection_dim: int = 768,
+        guidance_embeds: bool = False,
+        axes_dims_rope: Tuple[int, int, int] = (16, 56, 56),
+        image_model=None,
+        dtype=None,
+        device=None,
+        operations=None,
+    ):
+        super().__init__()
+        self.dtype = dtype
+        self.patch_size = patch_size
+        self.out_channels = out_channels or in_channels
+        self.inner_dim = num_attention_heads * attention_head_dim
+
+        self.pe_embedder = EmbedND(dim=attention_head_dim, theta=10000, axes_dim=list(axes_dims_rope))
+
+        self.time_text_embed = QwenTimestepProjEmbeddings(
+            embedding_dim=self.inner_dim,
+            pooled_projection_dim=pooled_projection_dim,
+            dtype=dtype,
+            device=device,
+            operations=operations
+        )
+
+        self.txt_norm = operations.RMSNorm(joint_attention_dim, eps=1e-6, dtype=dtype, device=device)
+        self.img_in = operations.Linear(in_channels, self.inner_dim, dtype=dtype, device=device)
+        self.txt_in = operations.Linear(joint_attention_dim, self.inner_dim, dtype=dtype, device=device)
+
+        self.transformer_blocks = nn.ModuleList([
+            QwenImageTransformerBlock(
+                dim=self.inner_dim,
+                num_attention_heads=num_attention_heads,
+                attention_head_dim=attention_head_dim,
+                dtype=dtype,
+                device=device,
+                operations=operations
+            )
+            for _ in range(num_layers)
+        ])
+
+        self.norm_out = LastLayer(self.inner_dim, self.inner_dim, dtype=dtype, device=device, operations=operations)
+        self.proj_out = operations.Linear(self.inner_dim, patch_size * patch_size * self.out_channels, bias=True, dtype=dtype, device=device)
+        self.gradient_checkpointing = False
+
+    def pos_embeds(self, x, context):
+        bs, c, t, h, w = x.shape
+        patch_size = self.patch_size
+        h_len = ((h + (patch_size // 2)) // patch_size)
+        w_len = ((w + (patch_size // 2)) // patch_size)
+
+        img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
+        img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype).unsqueeze(1)
+        img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype).unsqueeze(0)
+        img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
+
+        txt_start = round(max(h_len, w_len))
+        txt_ids = torch.linspace(txt_start, txt_start + context.shape[1], steps=context.shape[1], device=x.device, dtype=x.dtype).reshape(1, -1, 1).repeat(bs, 1, 3)
+        ids = torch.cat((txt_ids, img_ids), dim=1)
+        return self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
+
+    def forward(
+        self,
+        x,
+        timesteps,
+        context,
+        attention_mask=None,
+        guidance: torch.Tensor = None,
+        **kwargs
+    ):
+        timestep = timesteps
+        encoder_hidden_states = context
+        encoder_hidden_states_mask = attention_mask
+
+        image_rotary_emb = self.pos_embeds(x, context)
+
+        hidden_states = comfy.ldm.common_dit.pad_to_patch_size(x, (1, self.patch_size, self.patch_size))
+        orig_shape = hidden_states.shape
+        hidden_states = hidden_states.view(orig_shape[0], orig_shape[1], orig_shape[-2] // 2, 2, orig_shape[-1] // 2, 2)
+        hidden_states = hidden_states.permute(0, 2, 4, 1, 3, 5)
+        hidden_states = hidden_states.reshape(orig_shape[0], (orig_shape[-2] // 2) * (orig_shape[-1] // 2), orig_shape[1] * 4)
+
+        hidden_states = self.img_in(hidden_states)
+        encoder_hidden_states = self.txt_norm(encoder_hidden_states)
+        encoder_hidden_states = self.txt_in(encoder_hidden_states)
+
+        if guidance is not None:
+            guidance = guidance * 1000
+
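+        # QwenTimestepProjEmbeddings.forward() accepts only (timestep, hidden_states),
+        # so passing guidance as an extra positional argument would raise a TypeError.
+        # guidance_embeds defaults to False and no guidance embedder is defined above,
+        # so the guidance value is not forwarded to the timestep embedder.
+        temb = self.time_text_embed(timestep, hidden_states)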
+
+        for block in self.transformer_blocks:
+            encoder_hidden_states, hidden_states = block(
+                hidden_states=hidden_states,
+                encoder_hidden_states=encoder_hidden_states,
+                encoder_hidden_states_mask=encoder_hidden_states_mask,
+                temb=temb,
+                image_rotary_emb=image_rotary_emb,
+            )
+
+        hidden_states = self.norm_out(hidden_states, temb)
+        hidden_states = self.proj_out(hidden_states)
+
+        hidden_states = hidden_states.view(orig_shape[0], orig_shape[-2] // 2, orig_shape[-1] // 2, orig_shape[1], 2, 2)
+        hidden_states = hidden_states.permute(0, 3, 1, 4, 2, 5)
+        return hidden_states.reshape(orig_shape)[:, :, :, :x.shape[-2], :x.shape[-1]]
--- a/comfy/ldm/wan/model.py
+++ b/comfy/ldm/wan/model.py
@@ -146,6 +146,15 @@ WAN_CROSSATTENTION_CLASSES = {
 }


+def repeat_e(e, x):
+    repeats = 1
+    if e.shape[1] > 1:
+        repeats = x.shape[1] // e.shape[1]
+    if repeats == 1:
+        return e
+    return torch.repeat_interleave(e, repeats, dim=1)
+
+
 class WanAttentionBlock(nn.Module):

    def __init__(self,
@@ -202,20 +211,23 @@ class WanAttentionBlock(nn.Module):
        """
        # assert e.dtype == torch.float32

-        e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e).chunk(6, dim=1)
+        if e.ndim < 4:
+            e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e).chunk(6, dim=1)
+        else:
+            e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device).unsqueeze(0) + e).unbind(2)
        # assert e[0].dtype == torch.float32

        # self-attention
        y = self.self_attn(
-            self.norm1(x) * (1 + e[1]) + e[0],
+            self.norm1(x) * (1 + repeat_e(e[1], x)) + repeat_e(e[0], x),
            freqs)

-        x = x + y * e[2]
+        x = x + y * repeat_e(e[2], x)

        # cross-attention & ffn
        x = x + self.cross_attn(self.norm3(x), context, context_img_len=context_img_len)
-        y = self.ffn(self.norm2(x) * (1 + e[4]) + e[3])
-        x = x + y * e[5]
+        y = self.ffn(self.norm2(x) * (1 + repeat_e(e[4], x)) + repeat_e(e[3], x))
+        x = x + y * repeat_e(e[5], x)
        return x


@@ -325,8 +337,12 @@ class Head(nn.Module):
            e(Tensor): Shape [B, C]
        """
        # assert e.dtype == torch.float32
-        e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e.unsqueeze(1)).chunk(2, dim=1)
-        x = (self.head(self.norm(x) * (1 + e[1]) + e[0]))
+        if e.ndim < 3:
+            e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e.unsqueeze(1)).chunk(2, dim=1)
+        else:
+            e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device).unsqueeze(0) + e.unsqueeze(2)).unbind(2)
+
+        x = (self.head(self.norm(x) * (1 + repeat_e(e[1], x)) + repeat_e(e[0], x)))
        return x


@@ -506,8 +522,9 @@ class WanModel(torch.nn.Module):

        # time embeddings
        e = self.time_embedding(
-            sinusoidal_embedding_1d(self.freq_dim, t).to(dtype=x[0].dtype))
-        e0 = self.time_projection(e).unflatten(1, (6, self.dim))
+            sinusoidal_embedding_1d(self.freq_dim, t.flatten()).to(dtype=x[0].dtype))
+        e = e.reshape(t.shape[0], -1, e.shape[-1])
+        e0 = self.time_projection(e).unflatten(2, (6, self.dim))

        # context
        context = self.text_embedding(context)
@@ -752,8 +769,7 @@ class CameraWanModel(WanModel):
        # embeddings
        x = self.patch_embedding(x.float()).to(x.dtype)
        if self.control_adapter is not None and camera_conditions is not None:
-            x_camera = self.control_adapter(camera_conditions).to(x.dtype)
-            x = x + x_camera
+            x = x + self.control_adapter(camera_conditions).to(x.dtype)
        grid_sizes = x.shape[2:]
        x = x.flatten(2).transpose(1, 2)

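Note on `repeat_e` in the hunk above: the helper appears to let a modulation tensor with fewer entries along dim 1 than `x` (for example one entry per frame) be stretched to one entry per token, while a single global entry is returned unchanged and left to broadcasting. Below is a minimal sketch of that ordering, assuming nested Python lists in place of tensors. The name `repeat_e_lists` and the string placeholders are hypothetical, used only for illustration.

```python
# Hypothetical pure-Python stand-in for repeat_e; nested lists replace tensors.
def repeat_e_lists(e, x_len):
    """e: list over batch of lists over dim 1; x_len: size of x along dim 1."""
    e_len = len(e[0])
    repeats = x_len // e_len if e_len > 1 else 1
    if repeats == 1:
        return e
    # Same ordering as torch.repeat_interleave(e, repeats, dim=1):
    # each entry is repeated consecutively before moving on to the next one.
    return [[item for item in row for _ in range(repeats)] for row in e]


# Two per-frame entries, six tokens in x: three tokens per frame.
print(repeat_e_lists([["f0", "f1"]], 6))  # [['f0', 'f0', 'f0', 'f1', 'f1', 'f1']]

# A single global entry is returned untouched.
print(repeat_e_lists([["g"]], 6))         # [['g']]
```

The consecutive ordering matters here: it only lines up with `x` if the tokens of one frame are contiguous along dim 1, which is what `x.flatten(2).transpose(1, 2)` would produce when the frame axis comes first.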
--- a/comfy/ldm/wan/vae.py
+++ b/comfy/ldm/wan/vae.py
@@ -24,12 +24,17 @@ class CausalConv3d(ops.Conv3d):
                         self.padding[1], 2 * self.padding[0], 0)
        self.padding = (0, 0, 0)

-    def forward(self, x, cache_x=None):
+    def forward(self, x, cache_x=None, cache_list=None, cache_idx=None):
+        if cache_list is not None:
+            cache_x = cache_list[cache_idx]
+            cache_list[cache_idx] = None
+
        padding = list(self._padding)
        if cache_x is not None and self._padding[4] > 0:
            cache_x = cache_x.to(x.device)
            x = torch.cat([cache_x, x], dim=2)
            padding[4] -= cache_x.shape[2]
+            del cache_x
        x = F.pad(x, padding)

        return super().forward(x)
@@ -52,15 +57,6 @@ class RMS_norm(nn.Module):
            x, dim=(1 if self.channel_first else -1)) * self.scale * self.gamma.to(x) + (self.bias.to(x) if self.bias is not None else 0)


-class Upsample(nn.Upsample):
-
-    def forward(self, x):
-        """
-        Fix bfloat16 support for nearest neighbor interpolation.
-        """
-        return super().forward(x.float()).type_as(x)
-
-
 class Resample(nn.Module):

    def __init__(self, dim, mode):
@@ -73,11 +69,11 @@ class Resample(nn.Module):
        # layers
        if mode == 'upsample2d':
            self.resample = nn.Sequential(
-                Upsample(scale_factor=(2., 2.), mode='nearest-exact'),
+                nn.Upsample(scale_factor=(2., 2.), mode='nearest-exact'),
                ops.Conv2d(dim, dim // 2, 3, padding=1))
        elif mode == 'upsample3d':
            self.resample = nn.Sequential(
-                Upsample(scale_factor=(2., 2.), mode='nearest-exact'),
+                nn.Upsample(scale_factor=(2., 2.), mode='nearest-exact'),
                ops.Conv2d(dim, dim // 2, 3, padding=1))
            self.time_conv = CausalConv3d(
                dim, dim * 2, (3, 1, 1), padding=(1, 0, 0))
@@ -157,29 +153,6 @@ class Resample(nn.Module):
                    feat_idx[0] += 1
        return x

-    def init_weight(self, conv):
-        conv_weight = conv.weight
-        nn.init.zeros_(conv_weight)
-        c1, c2, t, h, w = conv_weight.size()
-        one_matrix = torch.eye(c1, c2)
-        init_matrix = one_matrix
-        nn.init.zeros_(conv_weight)
-        #conv_weight.data[:,:,-1,1,1] = init_matrix * 0.5
-        conv_weight.data[:, :, 1, 0, 0] = init_matrix  #* 0.5
-        conv.weight.data.copy_(conv_weight)
-        nn.init.zeros_(conv.bias.data)
-
-    def init_weight2(self, conv):
-        conv_weight = conv.weight.data
-        nn.init.zeros_(conv_weight)
-        c1, c2, t, h, w = conv_weight.size()
-        init_matrix = torch.eye(c1 // 2, c2)
-        #init_matrix = repeat(init_matrix, 'o ... -> (o 2) ...').permute(1,0,2).contiguous().reshape(c1,c2)
-        conv_weight[:c1 // 2, :, -1, 0, 0] = init_matrix
-        conv_weight[c1 // 2:, :, -1, 0, 0] = init_matrix
-        conv.weight.data.copy_(conv_weight)
-        nn.init.zeros_(conv.bias.data)
-

 class ResidualBlock(nn.Module):

@@ -198,7 +171,7 @@ class ResidualBlock(nn.Module):
            if in_dim != out_dim else nn.Identity()

    def forward(self, x, feat_cache=None, feat_idx=[0]):
-        h = self.shortcut(x)
+        old_x = x
        for layer in self.residual:
            if isinstance(layer, CausalConv3d) and feat_cache is not None:
                idx = feat_idx[0]
@@ -210,12 +183,12 @@ class ResidualBlock(nn.Module):
                            cache_x.device), cache_x
                    ],
                                        dim=2)
-                x = layer(x, feat_cache[idx])
+                x = layer(x, cache_list=feat_cache, cache_idx=idx)
                feat_cache[idx] = cache_x
                feat_idx[0] += 1
            else:
                x = layer(x)
-        return x + h
+        return x + self.shortcut(old_x)


 class AttentionBlock(nn.Module):
@@ -494,12 +467,6 @@ class WanVAE(nn.Module):
        self.decoder = Decoder3d(dim, z_dim, dim_mult, num_res_blocks,
                                 attn_scales, self.temperal_upsample, dropout)

-    def forward(self, x):
-        mu, log_var = self.encode(x)
-        z = self.reparameterize(mu, log_var)
-        x_recon = self.decode(z)
-        return x_recon, mu, log_var
-
    def encode(self, x):
        self.clear_cache()
        ## cache
@@ -545,18 +512,6 @@ class WanVAE(nn.Module):
        self.clear_cache()
        return out

-    def reparameterize(self, mu, log_var):
-        std = torch.exp(0.5 * log_var)
-        eps = torch.randn_like(std)
-        return eps * std + mu
-
-    def sample(self, imgs, deterministic=False):
-        mu, log_var = self.encode(imgs)
-        if deterministic:
-            return mu
-        std = torch.exp(0.5 * log_var.clamp(-30.0, 20.0))
-        return mu + std * torch.randn_like(std)
-
    def clear_cache(self):
        self._conv_num = count_conv3d(self.decoder)
        self._conv_idx = [0]
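Note on the `cache_list` / `cache_idx` arguments added to `CausalConv3d.forward`: one plausible reading is that the callee now takes the cached tensor out of the shared list and clears the slot, so the old cache can be released as soon as it has been concatenated, before the caller writes the new one. The sketch below illustrates only that ownership hand-off. `FakeTensor` and `take_from_cache` are hypothetical names, and the immediate release relies on CPython reference counting.

```python
import weakref


class FakeTensor:
    """Hypothetical stand-in for a cached feature tensor."""


def take_from_cache(cache_list, cache_idx):
    # Mirror the hand-off in the patched forward(): read the slot, then clear it.
    cache_x = cache_list[cache_idx]
    cache_list[cache_idx] = None
    probe = weakref.ref(cache_x)
    # ... the real code concatenates cache_x onto x at this point ...
    del cache_x
    # With no other owner left, CPython's reference counting frees the object now.
    return probe() is None


feat_cache = [FakeTensor()]
print(take_from_cache(feat_cache, 0))  # True on CPython
print(feat_cache)                      # [None]
```

If the caller had passed `feat_cache[idx]` as a plain argument instead, the list would still hold a reference and the `del` inside the callee could not release anything.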
--- a/comfy/ldm/wan/vae2_2.py
+++ b/comfy/ldm/wan/vae2_2.py
@@ -0,0 +1,726 @@
+# original version: https://github.com/Wan-Video/Wan2.2/blob/main/wan/modules/vae2_2.py
+# Copyright 2024-2025 The Alibaba Wan Team Authors. All rights reserved.
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from einops import rearrange
+from .vae import AttentionBlock, CausalConv3d, RMS_norm
+
+import comfy.ops
+ops = comfy.ops.disable_weight_init
+
+CACHE_T = 2
+
+
+class Resample(nn.Module):
+
+    def __init__(self, dim, mode):
+        assert mode in (
+            "none",
+            "upsample2d",
+            "upsample3d",
+            "downsample2d",
+            "downsample3d",
+        )
+        super().__init__()
+        self.dim = dim
+        self.mode = mode
+
+        # layers
+        if mode == "upsample2d":
+            self.resample = nn.Sequential(
+                nn.Upsample(scale_factor=(2.0, 2.0), mode="nearest-exact"),
+                ops.Conv2d(dim, dim, 3, padding=1),
+            )
+        elif mode == "upsample3d":
+            self.resample = nn.Sequential(
+                nn.Upsample(scale_factor=(2.0, 2.0), mode="nearest-exact"),
+                ops.Conv2d(dim, dim, 3, padding=1),
+                # ops.Conv2d(dim, dim//2, 3, padding=1)
+            )
+            self.time_conv = CausalConv3d(
+                dim, dim * 2, (3, 1, 1), padding=(1, 0, 0))
+        elif mode == "downsample2d":
+            self.resample = nn.Sequential(
+                nn.ZeroPad2d((0, 1, 0, 1)),
+                ops.Conv2d(dim, dim, 3, stride=(2, 2)))
+        elif mode == "downsample3d":
+            self.resample = nn.Sequential(
+                nn.ZeroPad2d((0, 1, 0, 1)),
+                ops.Conv2d(dim, dim, 3, stride=(2, 2)))
+            self.time_conv = CausalConv3d(
+                dim, dim, (3, 1, 1), stride=(2, 1, 1), padding=(0, 0, 0))
+        else:
+            self.resample = nn.Identity()
+
+    def forward(self, x, feat_cache=None, feat_idx=[0]):
+        b, c, t, h, w = x.size()
+        if self.mode == "upsample3d":
+            if feat_cache is not None:
+                idx = feat_idx[0]
+                if feat_cache[idx] is None:
+                    feat_cache[idx] = "Rep"
+                    feat_idx[0] += 1
+                else:
+                    cache_x = x[:, :, -CACHE_T:, :, :].clone()
+                    if (cache_x.shape[2] < 2 and feat_cache[idx] is not None and
+                            feat_cache[idx] != "Rep"):
+                        # cache the last frame of each of the last two chunks
+                        cache_x = torch.cat(
+                            [
+                                feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
+                                    cache_x.device),
+                                cache_x,
+                            ],
+                            dim=2,
+                        )
+                    if (cache_x.shape[2] < 2 and feat_cache[idx] is not None and
+                            feat_cache[idx] == "Rep"):
+                        cache_x = torch.cat(
+                            [
+                                torch.zeros_like(cache_x).to(cache_x.device),
+                                cache_x
+                            ],
+                            dim=2,
+                        )
+                    if feat_cache[idx] == "Rep":
+                        x = self.time_conv(x)
+                    else:
+                        x = self.time_conv(x, feat_cache[idx])
+                    feat_cache[idx] = cache_x
+                    feat_idx[0] += 1
+                    x = x.reshape(b, 2, c, t, h, w)
+                    x = torch.stack((x[:, 0, :, :, :, :], x[:, 1, :, :, :, :]),
+                                    3)
+                    x = x.reshape(b, c, t * 2, h, w)
+        t = x.shape[2]
+        x = rearrange(x, "b c t h w -> (b t) c h w")
+        x = self.resample(x)
+        x = rearrange(x, "(b t) c h w -> b c t h w", t=t)
+
+        if self.mode == "downsample3d":
+            if feat_cache is not None:
+                idx = feat_idx[0]
+                if feat_cache[idx] is None:
+                    feat_cache[idx] = x.clone()
+                    feat_idx[0] += 1
+                else:
+                    cache_x = x[:, :, -1:, :, :].clone()
+                    x = self.time_conv(
+                        torch.cat([feat_cache[idx][:, :, -1:, :, :], x], 2))
+                    feat_cache[idx] = cache_x
+                    feat_idx[0] += 1
+        return x
+
+
+class ResidualBlock(nn.Module):
+
+    def __init__(self, in_dim, out_dim, dropout=0.0):
+        super().__init__()
+        self.in_dim = in_dim
+        self.out_dim = out_dim
+
+        # layers
+        self.residual = nn.Sequential(
+            RMS_norm(in_dim, images=False),
+            nn.SiLU(),
+            CausalConv3d(in_dim, out_dim, 3, padding=1),
+            RMS_norm(out_dim, images=False),
+            nn.SiLU(),
+            nn.Dropout(dropout),
+            CausalConv3d(out_dim, out_dim, 3, padding=1),
+        )
+        self.shortcut = (
+            CausalConv3d(in_dim, out_dim, 1)
+            if in_dim != out_dim else nn.Identity())
+
+    def forward(self, x, feat_cache=None, feat_idx=[0]):
+        old_x = x
+        for layer in self.residual:
+            if isinstance(layer, CausalConv3d) and feat_cache is not None:
+                idx = feat_idx[0]
+                cache_x = x[:, :, -CACHE_T:, :, :].clone()
+                if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
+                    # cache the last frame of each of the last two chunks
+                    cache_x = torch.cat(
+                        [
+                            feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
+                                cache_x.device),
+                            cache_x,
+                        ],
+                        dim=2,
+                    )
+                x = layer(x, cache_list=feat_cache, cache_idx=idx)
+                feat_cache[idx] = cache_x
+                feat_idx[0] += 1
+            else:
+                x = layer(x)
+        return x + self.shortcut(old_x)
+
+
+def patchify(x, patch_size):
+    if patch_size == 1:
+        return x
+    if x.dim() == 4:
+        x = rearrange(
+            x, "b c (h q) (w r) -> b (c r q) h w", q=patch_size, r=patch_size)
+    elif x.dim() == 5:
+        x = rearrange(
+            x,
+            "b c f (h q) (w r) -> b (c r q) f h w",
+            q=patch_size,
+            r=patch_size,
+        )
+    else:
+        raise ValueError(f"Invalid input shape: {x.shape}")
+
+    return x
+
+
+def unpatchify(x, patch_size):
+    if patch_size == 1:
+        return x
+
+    if x.dim() == 4:
+        x = rearrange(
+            x, "b (c r q) h w -> b c (h q) (w r)", q=patch_size, r=patch_size)
+    elif x.dim() == 5:
+        x = rearrange(
+            x,
+            "b (c r q) f h w -> b c f (h q) (w r)",
+            q=patch_size,
+            r=patch_size,
+        )
+    return x
+
+
+class AvgDown3D(nn.Module):
+
+    def __init__(
+        self,
+        in_channels,
+        out_channels,
+        factor_t,
+        factor_s=1,
+    ):
+        super().__init__()
+        self.in_channels = in_channels
+        self.out_channels = out_channels
+        self.factor_t = factor_t
+        self.factor_s = factor_s
+        self.factor = self.factor_t * self.factor_s * self.factor_s
+
+        assert in_channels * self.factor % out_channels == 0
+        self.group_size = in_channels * self.factor // out_channels
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        pad_t = (self.factor_t - x.shape[2] % self.factor_t) % self.factor_t
+        pad = (0, 0, 0, 0, pad_t, 0)
+        x = F.pad(x, pad)
+        B, C, T, H, W = x.shape
+        x = x.view(
+            B,
+            C,
+            T // self.factor_t,
+            self.factor_t,
+            H // self.factor_s,
+            self.factor_s,
+            W // self.factor_s,
+            self.factor_s,
+        )
+        x = x.permute(0, 1, 3, 5, 7, 2, 4, 6).contiguous()
+        x = x.view(
+            B,
+            C * self.factor,
+            T // self.factor_t,
+            H // self.factor_s,
+            W // self.factor_s,
+        )
+        x = x.view(
+            B,
+            self.out_channels,
+            self.group_size,
+            T // self.factor_t,
+            H // self.factor_s,
+            W // self.factor_s,
+        )
+        x = x.mean(dim=2)
+        return x
+
+
+class DupUp3D(nn.Module):
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: int,
+        factor_t,
+        factor_s=1,
+    ):
+        super().__init__()
+        self.in_channels = in_channels
+        self.out_channels = out_channels
+
+        self.factor_t = factor_t
+        self.factor_s = factor_s
+        self.factor = self.factor_t * self.factor_s * self.factor_s
+
+        assert out_channels * self.factor % in_channels == 0
+        self.repeats = out_channels * self.factor // in_channels
+
+    def forward(self, x: torch.Tensor, first_chunk=False) -> torch.Tensor:
+        x = x.repeat_interleave(self.repeats, dim=1)
+        x = x.view(
+            x.size(0),
+            self.out_channels,
+            self.factor_t,
+            self.factor_s,
+            self.factor_s,
+            x.size(2),
+            x.size(3),
+            x.size(4),
+        )
+        x = x.permute(0, 1, 5, 2, 6, 3, 7, 4).contiguous()
+        x = x.view(
+            x.size(0),
+            self.out_channels,
+            x.size(2) * self.factor_t,
+            x.size(4) * self.factor_s,
+            x.size(6) * self.factor_s,
+        )
+        if first_chunk:
+            x = x[:, :, self.factor_t - 1:, :, :]
+        return x
+
+
+class Down_ResidualBlock(nn.Module):
+
+    def __init__(self,
+                 in_dim,
+                 out_dim,
+                 dropout,
+                 mult,
+                 temperal_downsample=False,
+                 down_flag=False):
+        super().__init__()
+
+        # Shortcut path with downsample
+        self.avg_shortcut = AvgDown3D(
+            in_dim,
+            out_dim,
+            factor_t=2 if temperal_downsample else 1,
+            factor_s=2 if down_flag else 1,
+        )
+
+        # Main path with residual blocks and downsample
+        downsamples = []
+        for _ in range(mult):
+            downsamples.append(ResidualBlock(in_dim, out_dim, dropout))
+            in_dim = out_dim
+
+        # Add the final downsample block
+        if down_flag:
+            mode = "downsample3d" if temperal_downsample else "downsample2d"
+            downsamples.append(Resample(out_dim, mode=mode))
+
+        self.downsamples = nn.Sequential(*downsamples)
+
+    def forward(self, x, feat_cache=None, feat_idx=[0]):
+        x_copy = x
+        for module in self.downsamples:
+            x = module(x, feat_cache, feat_idx)
+
+        return x + self.avg_shortcut(x_copy)
+
+
+class Up_ResidualBlock(nn.Module):
+
+    def __init__(self,
+                 in_dim,
+                 out_dim,
+                 dropout,
+                 mult,
+                 temperal_upsample=False,
+                 up_flag=False):
+        super().__init__()
+        # Shortcut path with upsample
+        if up_flag:
+            self.avg_shortcut = DupUp3D(
+                in_dim,
+                out_dim,
+                factor_t=2 if temperal_upsample else 1,
+                factor_s=2 if up_flag else 1,
+            )
+        else:
+            self.avg_shortcut = None
+
+        # Main path with residual blocks and upsample
+        upsamples = []
+        for _ in range(mult):
+            upsamples.append(ResidualBlock(in_dim, out_dim, dropout))
+            in_dim = out_dim
+
+        # Add the final upsample block
+        if up_flag:
+            mode = "upsample3d" if temperal_upsample else "upsample2d"
+            upsamples.append(Resample(out_dim, mode=mode))
+
+        self.upsamples = nn.Sequential(*upsamples)
+
+    def forward(self, x, feat_cache=None, feat_idx=[0], first_chunk=False):
+        x_main = x
+        for module in self.upsamples:
+            x_main = module(x_main, feat_cache, feat_idx)
+        if self.avg_shortcut is not None:
+            x_shortcut = self.avg_shortcut(x, first_chunk)
+            return x_main + x_shortcut
+        else:
+            return x_main
+
+
+class Encoder3d(nn.Module):
+
+    def __init__(
+        self,
+        dim=128,
+        z_dim=4,
+        dim_mult=[1, 2, 4, 4],
+        num_res_blocks=2,
+        attn_scales=[],
+        temperal_downsample=[True, True, False],
+        dropout=0.0,
+    ):
+        super().__init__()
+        self.dim = dim
+        self.z_dim = z_dim
+        self.dim_mult = dim_mult
+        self.num_res_blocks = num_res_blocks
+        self.attn_scales = attn_scales
+        self.temperal_downsample = temperal_downsample
+
+        # dimensions
+        dims = [dim * u for u in [1] + dim_mult]
+        scale = 1.0
+
+        # init block
+        self.conv1 = CausalConv3d(12, dims[0], 3, padding=1)
+
+        # downsample blocks
+        downsamples = []
+        for i, (in_dim, out_dim) in enumerate(zip(dims[:-1], dims[1:])):
+            t_down_flag = (
+                temperal_downsample[i]
+                if i < len(temperal_downsample) else False)
+            downsamples.append(
+                Down_ResidualBlock(
+                    in_dim=in_dim,
+                    out_dim=out_dim,
+                    dropout=dropout,
+                    mult=num_res_blocks,
+                    temperal_downsample=t_down_flag,
+                    down_flag=i != len(dim_mult) - 1,
+                ))
+            scale /= 2.0
+        self.downsamples = nn.Sequential(*downsamples)
+
+        # middle blocks
+        self.middle = nn.Sequential(
+            ResidualBlock(out_dim, out_dim, dropout),
+            AttentionBlock(out_dim),
+            ResidualBlock(out_dim, out_dim, dropout),
+        )
+
+        # output blocks
+        self.head = nn.Sequential(
+            RMS_norm(out_dim, images=False),
+            nn.SiLU(),
+            CausalConv3d(out_dim, z_dim, 3, padding=1),
+        )
+
+    def forward(self, x, feat_cache=None, feat_idx=[0]):
+
+        if feat_cache is not None:
+            idx = feat_idx[0]
+            cache_x = x[:, :, -CACHE_T:, :, :].clone()
+            if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
+                cache_x = torch.cat(
+                    [
+                        feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
+                            cache_x.device),
+                        cache_x,
+                    ],
+                    dim=2,
+                )
+            x = self.conv1(x, feat_cache[idx])
+            feat_cache[idx] = cache_x
+            feat_idx[0] += 1
+        else:
+            x = self.conv1(x)
+
+        ## downsamples
+        for layer in self.downsamples:
+            if feat_cache is not None:
+                x = layer(x, feat_cache, feat_idx)
+            else:
+                x = layer(x)
+
+        ## middle
+        for layer in self.middle:
+            if isinstance(layer, ResidualBlock) and feat_cache is not None:
+                x = layer(x, feat_cache, feat_idx)
+            else:
+                x = layer(x)
+
+        ## head
+        for layer in self.head:
+            if isinstance(layer, CausalConv3d) and feat_cache is not None:
+                idx = feat_idx[0]
+                cache_x = x[:, :, -CACHE_T:, :, :].clone()
+                if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
+                    cache_x = torch.cat(
+                        [
+                            feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
+                                cache_x.device),
+                            cache_x,
+                        ],
+                        dim=2,
+                    )
+                x = layer(x, feat_cache[idx])
+                feat_cache[idx] = cache_x
+                feat_idx[0] += 1
+            else:
+                x = layer(x)
+
+        return x
+
+
+class Decoder3d(nn.Module):
+
+    def __init__(
+        self,
+        dim=128,
+        z_dim=4,
+        dim_mult=[1, 2, 4, 4],
+        num_res_blocks=2,
+        attn_scales=[],
+        temperal_upsample=[False, True, True],
+        dropout=0.0,
+    ):
+        super().__init__()
+        self.dim = dim
+        self.z_dim = z_dim
+        self.dim_mult = dim_mult
+        self.num_res_blocks = num_res_blocks
+        self.attn_scales = attn_scales
+        self.temperal_upsample = temperal_upsample
+
+        # dimensions
+        dims = [dim * u for u in [dim_mult[-1]] + dim_mult[::-1]]
+        # init block
+        self.conv1 = CausalConv3d(z_dim, dims[0], 3, padding=1)
+
+        # middle blocks
+        self.middle = nn.Sequential(
+            ResidualBlock(dims[0], dims[0], dropout),
+            AttentionBlock(dims[0]),
+            ResidualBlock(dims[0], dims[0], dropout),
+        )
+
+        # upsample blocks
+        upsamples = []
+        for i, (in_dim, out_dim) in enumerate(zip(dims[:-1], dims[1:])):
+            t_up_flag = temperal_upsample[i] if i < len(
+                temperal_upsample) else False
+            upsamples.append(
+                Up_ResidualBlock(
+                    in_dim=in_dim,
+                    out_dim=out_dim,
+                    dropout=dropout,
+                    mult=num_res_blocks + 1,
+                    temperal_upsample=t_up_flag,
+                    up_flag=i != len(dim_mult) - 1,
+                ))
+        self.upsamples = nn.Sequential(*upsamples)
+
+        # output blocks
+        self.head = nn.Sequential(
+            RMS_norm(out_dim, images=False),
+            nn.SiLU(),
+            CausalConv3d(out_dim, 12, 3, padding=1),
+        )
+
+    def forward(self, x, feat_cache=None, feat_idx=[0], first_chunk=False):
+        if feat_cache is not None:
+            idx = feat_idx[0]
+            cache_x = x[:, :, -CACHE_T:, :, :].clone()
+            if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
+                cache_x = torch.cat(
+                    [
+                        feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
+                            cache_x.device),
+                        cache_x,
+                    ],
+                    dim=2,
+                )
+            x = self.conv1(x, feat_cache[idx])
+            feat_cache[idx] = cache_x
+            feat_idx[0] += 1
+        else:
+            x = self.conv1(x)
+
+        for layer in self.middle:
+            if isinstance(layer, ResidualBlock) and feat_cache is not None:
+                x = layer(x, feat_cache, feat_idx)
+            else:
+                x = layer(x)
+
+        ## upsamples
+        for layer in self.upsamples:
+            if feat_cache is not None:
+                x = layer(x, feat_cache, feat_idx, first_chunk)
+            else:
+                x = layer(x)
+
+        ## head
+        for layer in self.head:
+            if isinstance(layer, CausalConv3d) and feat_cache is not None:
+                idx = feat_idx[0]
+                cache_x = x[:, :, -CACHE_T:, :, :].clone()
+                if cache_x.shape[2] < 2 and feat_cache[idx] is not None:
+                    cache_x = torch.cat(
+                        [
+                            feat_cache[idx][:, :, -1, :, :].unsqueeze(2).to(
+                                cache_x.device),
+                            cache_x,
+                        ],
+                        dim=2,
+                    )
+                x = layer(x, feat_cache[idx])
+                feat_cache[idx] = cache_x
+                feat_idx[0] += 1
+            else:
+                x = layer(x)
+        return x
+
+
+def count_conv3d(model):
+    count = 0
+    for m in model.modules():
+        if isinstance(m, CausalConv3d):
+            count += 1
+    return count
+
+
+class WanVAE(nn.Module):
+
+    def __init__(
+        self,
+        dim=160,
+        dec_dim=256,
+        z_dim=16,
+        dim_mult=[1, 2, 4, 4],
+        num_res_blocks=2,
+        attn_scales=[],
+        temperal_downsample=[True, True, False],
+        dropout=0.0,
+    ):
+        super().__init__()
+        self.dim = dim
+        self.z_dim = z_dim
+        self.dim_mult = dim_mult
+        self.num_res_blocks = num_res_blocks
+        self.attn_scales = attn_scales
+        self.temperal_downsample = temperal_downsample
+        self.temperal_upsample = temperal_downsample[::-1]
+
+        # modules
+        self.encoder = Encoder3d(
+            dim,
+            z_dim * 2,
+            dim_mult,
+            num_res_blocks,
+            attn_scales,
+            self.temperal_downsample,
+            dropout,
+        )
+        self.conv1 = CausalConv3d(z_dim * 2, z_dim * 2, 1)
+        self.conv2 = CausalConv3d(z_dim, z_dim, 1)
+        self.decoder = Decoder3d(
+            dec_dim,
+            z_dim,
+            dim_mult,
+            num_res_blocks,
+            attn_scales,
+            self.temperal_upsample,
+            dropout,
+        )
+
+    def encode(self, x):
+        self.clear_cache()
+        x = patchify(x, patch_size=2)
+        t = x.shape[2]
+        iter_ = 1 + (t - 1) // 4
+        for i in range(iter_):
+            self._enc_conv_idx = [0]
+            if i == 0:
+                out = self.encoder(
+                    x[:, :, :1, :, :],
+                    feat_cache=self._enc_feat_map,
+                    feat_idx=self._enc_conv_idx,
+                )
+            else:
+                out_ = self.encoder(
+                    x[:, :, 1 + 4 * (i - 1):1 + 4 * i, :, :],
+                    feat_cache=self._enc_feat_map,
+                    feat_idx=self._enc_conv_idx,
+                )
+                out = torch.cat([out, out_], 2)
+        mu, log_var = self.conv1(out).chunk(2, dim=1)
+        self.clear_cache()
+        return mu
+
+    def decode(self, z):
+        self.clear_cache()
+        iter_ = z.shape[2]
+        x = self.conv2(z)
+        for i in range(iter_):
+            self._conv_idx = [0]
+            if i == 0:
+                out = self.decoder(
+                    x[:, :, i:i + 1, :, :],
+                    feat_cache=self._feat_map,
+                    feat_idx=self._conv_idx,
+                    first_chunk=True,
+                )
+            else:
+                out_ = self.decoder(
+                    x[:, :, i:i + 1, :, :],
+                    feat_cache=self._feat_map,
+                    feat_idx=self._conv_idx,
+                )
+                out = torch.cat([out, out_], 2)
+        out = unpatchify(out, patch_size=2)
+        self.clear_cache()
+        return out
+
+    def reparameterize(self, mu, log_var):
+        std = torch.exp(0.5 * log_var)
+        eps = torch.randn_like(std)
+        return eps * std + mu
+
+    def sample(self, imgs, deterministic=False):
+        mu, log_var = self.encode(imgs)
+        if deterministic:
+            return mu
+        std = torch.exp(0.5 * log_var.clamp(-30.0, 20.0))
+        return mu + std * torch.randn_like(std)
+
+    def clear_cache(self):
+        self._conv_num = count_conv3d(self.decoder)
+        self._conv_idx = [0]
+        self._feat_map = [None] * self._conv_num
+        # cache encode
+        self._enc_conv_num = count_conv3d(self.encoder)
+        self._enc_conv_idx = [0]
+        self._enc_feat_map = [None] * self._enc_conv_num
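Review note on the `encode` loop in this file: it feeds the first frame on its own and then groups of four frames, reusing the feature cache between calls. Below is a minimal sketch of just the slicing arithmetic in plain Python. `encode_chunks` is a hypothetical helper written for illustration and is not part of the patch.

```python
def encode_chunks(t):
    """Return the (start, stop) frame slices the encode loop visits for t frames."""
    iter_ = 1 + (t - 1) // 4  # same expression as in the patch
    chunks = []
    for i in range(iter_):
        if i == 0:
            chunks.append((0, 1))  # first frame alone
        else:
            chunks.append((1 + 4 * (i - 1), 1 + 4 * i))  # then four frames at a time
    return chunks


print(encode_chunks(9))   # [(0, 1), (1, 5), (5, 9)]
print(encode_chunks(10))  # [(0, 1), (1, 5), (5, 9)]
```

As the second call suggests, a frame count that is not of the form 4k + 1 appears to leave its trailing frames unvisited by this loop.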
--- a/comfy/lora.py
+++ b/comfy/lora.py
@@ -293,6 +293,16 @@ def model_lora_keys_unet(model, key_map={}):
                key_lora = k[len("diffusion_model."):-len(".weight")]
                key_map["{}".format(key_lora)] = k

+    if isinstance(model, comfy.model_base.QwenImage):
+        for k in sdk:
+            if k.startswith("diffusion_model.") and k.endswith(".weight"): #QwenImage lora format
+                key_lora = k[len("diffusion_model."):-len(".weight")]
+                # Direct mapping for transformer_blocks format (QwenImage LoRA format)
+                key_map["{}".format(key_lora)] = k
+                # Support transformer prefix format
+                key_map["transformer.{}".format(key_lora)] = k
+                key_map["lycoris_{}".format(key_lora.replace(".", "_"))] = k #SimpleTuner lycoris format
+
    return key_map


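Review note on the QwenImage branch of `model_lora_keys_unet`: each `diffusion_model.*.weight` key gets three aliases (bare, `transformer.`-prefixed, and the SimpleTuner lycoris form with dots replaced by underscores). The sketch below reproduces that mapping outside ComfyUI. `qwen_lora_aliases` and the sample key name are hypothetical and only illustrate the string handling.

```python
def qwen_lora_aliases(state_dict_keys):
    """Map each LoRA-side alias to the model weight key, mirroring the added branch."""
    key_map = {}
    for k in state_dict_keys:
        if k.startswith("diffusion_model.") and k.endswith(".weight"):
            key_lora = k[len("diffusion_model."):-len(".weight")]
            key_map[key_lora] = k                                           # direct format
            key_map["transformer.{}".format(key_lora)] = k                  # transformer prefix
            key_map["lycoris_{}".format(key_lora.replace(".", "_"))] = k    # SimpleTuner lycoris
    return key_map


# Hypothetical key name, chosen only to show the three alias shapes.
sample = ["diffusion_model.transformer_blocks.0.attn.to_q.weight"]
for alias in qwen_lora_aliases(sample):
    print(alias)
```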
--- a/comfy/model_base.py
+++ b/comfy/model_base.py
@@ -42,6 +42,7 @@ import comfy.ldm.hidream.model
 import comfy.ldm.chroma.model
 import comfy.ldm.ace.model
 import comfy.ldm.omnigen.omnigen2
+import comfy.ldm.qwen_image.model

 import comfy.model_management
 import comfy.patcher_extension
@@ -106,10 +107,12 @@ def model_sampling(model_config, model_type):
    return ModelSampling(model_config)


-def convert_tensor(extra, dtype):
+def convert_tensor(extra, dtype, device):
    if hasattr(extra, "dtype"):
        if extra.dtype != torch.int and extra.dtype != torch.long:
-            extra = extra.to(dtype)
+            extra = comfy.model_management.cast_to_device(extra, device, dtype)
+        else:
+            extra = comfy.model_management.cast_to_device(extra, device, None)
    return extra


@@ -160,7 +163,7 @@ class BaseModel(torch.nn.Module):
        xc = self.model_sampling.calculate_input(sigma, x)

        if c_concat is not None:
-            xc = torch.cat([xc] + [c_concat], dim=1)
+            xc = torch.cat([xc] + [comfy.model_management.cast_to_device(c_concat, xc.device, xc.dtype)], dim=1)

        context = c_crossattn
        dtype = self.get_dtype()
@@ -169,20 +172,21 @@ class BaseModel(torch.nn.Module):
            dtype = self.manual_cast_dtype

        xc = xc.to(dtype)
+        device = xc.device
        t = self.model_sampling.timestep(t).float()
        if context is not None:
-            context = context.to(dtype)
+            context = comfy.model_management.cast_to_device(context, device, dtype)

        extra_conds = {}
        for o in kwargs:
            extra = kwargs[o]

            if hasattr(extra, "dtype"):
-                extra = convert_tensor(extra, dtype)
+                extra = convert_tensor(extra, dtype, device)
            elif isinstance(extra, list):
                ex = []
                for ext in extra:
-                    ex.append(convert_tensor(ext, dtype))
+                    ex.append(convert_tensor(ext, dtype, device))
                extra = ex
            extra_conds[o] = extra

@@ -398,7 +402,7 @@ class SD21UNCLIP(BaseModel):
        unclip_conditioning = kwargs.get("unclip_conditioning", None)
        device = kwargs["device"]
        if unclip_conditioning is None:
-            return torch.zeros((1, self.adm_channels))
+            return torch.zeros((1, self.adm_channels), device=device)
        else:
            return unclip_adm(unclip_conditioning, device, self.noise_augmentor, kwargs.get("unclip_noise_augment_merge", 0.05), kwargs.get("seed", 0) - 10)

@@ -612,9 +616,11 @@ class IP2P:

        if image is None:
            image = torch.zeros_like(noise)
+        else:
+            image = image.to(device=device)

        if image.shape[1:] != noise.shape[1:]:
-            image = utils.common_upscale(image.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
+            image = utils.common_upscale(image, noise.shape[-1], noise.shape[-2], "bilinear", "center")

        image = utils.resize_to_batch_size(image, noise.shape[0])
        return self.process_ip2p_image_in(image)
@@ -693,7 +699,7 @@ class StableCascade_B(BaseModel):
        #size of prior doesn't really matter if zeros because it gets resized but I still want it to get batched
        prior = kwargs.get("stable_cascade_prior", torch.zeros((1, 16, (noise.shape[2] * 4) // 42, (noise.shape[3] * 4) // 42), dtype=noise.dtype, layout=noise.layout, device=noise.device))

-        out["effnet"] = comfy.conds.CONDRegular(prior)
+        out["effnet"] = comfy.conds.CONDRegular(prior.to(device=noise.device))
        out["sca"] = comfy.conds.CONDRegular(torch.zeros((1,)))
        return out

@@ -1097,8 +1103,9 @@ class WAN21(BaseModel):
                image[:, i: i + 16] = self.process_latent_in(image[:, i: i + 16])
            image = utils.resize_to_batch_size(image, noise.shape[0])

-        if not self.image_to_video or extra_channels == image.shape[1]:
-            return image
+        if extra_channels != image.shape[1] + 4:
+            if not self.image_to_video or extra_channels == image.shape[1]:
+                return image

        if image.shape[1] > (extra_channels - 4):
            image = image[:, :(extra_channels - 4)]
@@ -1157,10 +1164,10 @@ class WAN21_Vace(WAN21):

        vace_frames_out = []
        for j in range(len(vace_frames)):
-            vf = vace_frames[j].clone()
+            vf = vace_frames[j].to(device=noise.device, dtype=noise.dtype, copy=True)
            for i in range(0, vf.shape[1], 16):
                vf[:, i:i + 16] = self.process_latent_in(vf[:, i:i + 16])
-            vf = torch.cat([vf, mask[j]], dim=1)
+            vf = torch.cat([vf, mask[j].to(device=noise.device, dtype=noise.dtype)], dim=1)
            vace_frames_out.append(vf)

        vace_frames = torch.stack(vace_frames_out, dim=1)
@@ -1182,6 +1189,31 @@ class WAN21_Camera(WAN21):
            out['camera_conditions'] = comfy.conds.CONDRegular(camera_conditions)
        return out

+class WAN22(BaseModel):
+    def __init__(self, model_config, model_type=ModelType.FLOW, image_to_video=False, device=None):
+        super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.wan.model.WanModel)
+        self.image_to_video = image_to_video
+
+    def extra_conds(self, **kwargs):
+        out = super().extra_conds(**kwargs)
+        cross_attn = kwargs.get("cross_attn", None)
+        if cross_attn is not None:
+            out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
+
+        denoise_mask = kwargs.get("concat_mask", kwargs.get("denoise_mask", None))
+        if denoise_mask is not None:
+            out["denoise_mask"] = comfy.conds.CONDRegular(denoise_mask)
+        return out
+
+    def process_timestep(self, timestep, x, denoise_mask=None, **kwargs):
+        if denoise_mask is None:
+            return timestep
+        temp_ts = (torch.mean(denoise_mask[:, :, :, :, :], dim=(1, 3, 4), keepdim=True) * timestep.view([timestep.shape[0]] + [1] * (denoise_mask.ndim - 1))).reshape(timestep.shape[0], -1)
+        return temp_ts
+
+    def scale_latent_inpaint(self, sigma, noise, latent_image, **kwargs):
+        return latent_image
+
 class Hunyuan3Dv2(BaseModel):
    def __init__(self, model_config, model_type=ModelType.FLOW, device=None):
        super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.hunyuan3d.model.Hunyuan3Dv2)
@@ -1277,3 +1309,14 @@ class Omnigen2(BaseModel):
        if ref_latents is not None:
            out['ref_latents'] = list([1, 16, sum(map(lambda a: math.prod(a.size()), ref_latents)) // 16])
        return out
+
+class QwenImage(BaseModel):
+    def __init__(self, model_config, model_type=ModelType.FLUX, device=None):
+        super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.qwen_image.model.QwenImageTransformer2DModel)
+
+    def extra_conds(self, **kwargs):
+        out = super().extra_conds(**kwargs)
+        cross_attn = kwargs.get("cross_attn", None)
+        if cross_attn is not None:
+            out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
+        return out
--- a/comfy/model_detection.py
+++ b/comfy/model_detection.py
@@ -346,7 +346,9 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
        dit_config = {}
        dit_config["image_model"] = "wan2.1"
        dim = state_dict['{}head.modulation'.format(key_prefix)].shape[-1]
+        out_dim = state_dict['{}head.head.weight'.format(key_prefix)].shape[0] // 4
        dit_config["dim"] = dim
+        dit_config["out_dim"] = out_dim
        dit_config["num_heads"] = dim // 128
        dit_config["ffn_dim"] = state_dict['{}blocks.0.ffn.0.weight'.format(key_prefix)].shape[0]
        dit_config["num_layers"] = count_blocks(state_dict_keys, '{}blocks.'.format(key_prefix) + '{}.')
@@ -479,6 +481,11 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
        dit_config["timestep_scale"] = 1000.0
        return dit_config

+    if '{}txt_norm.weight'.format(key_prefix) in state_dict_keys:  # Qwen Image
+        dit_config = {}
+        dit_config["image_model"] = "qwen_image"
+        return dit_config
+
    if '{}input_blocks.0.0.weight'.format(key_prefix) not in state_dict_keys:
        return None

@@ -865,7 +872,7 @@ def convert_diffusers_mmdit(state_dict, output_prefix=""):
        depth_single_blocks = count_blocks(state_dict, 'single_transformer_blocks.{}.')
        hidden_size = state_dict["x_embedder.bias"].shape[0]
        sd_map = comfy.utils.flux_to_diffusers({"depth": depth, "depth_single_blocks": depth_single_blocks, "hidden_size": hidden_size}, output_prefix=output_prefix)
-    elif 'transformer_blocks.0.attn.add_q_proj.weight' in state_dict: #SD3
+    elif 'transformer_blocks.0.attn.add_q_proj.weight' in state_dict and 'pos_embed.proj.weight' in state_dict: #SD3
        num_blocks = count_blocks(state_dict, 'transformer_blocks.{}.')
        depth = state_dict["pos_embed.proj.weight"].shape[0] // 64
        sd_map = comfy.utils.mmdit_to_diffusers({"depth": depth, "num_blocks": num_blocks}, output_prefix=output_prefix)
--- a/comfy/model_management.py
+++ b/comfy/model_management.py
@@ -101,7 +101,7 @@ if args.directml is not None:
    lowvram_available = False #TODO: need to find a way to get free memory in directml before this can be enabled by default.

 try:
-    import intel_extension_for_pytorch as ipex
+    import intel_extension_for_pytorch as ipex  # noqa: F401
    _ = torch.xpu.device_count()
    xpu_available = xpu_available or torch.xpu.is_available()
 except:
@@ -128,6 +128,11 @@ try:
 except:
    mlu_available = False

+try:
+    ixuca_available = hasattr(torch, "corex")
+except:
+    ixuca_available = False
+
 if args.cpu:
    cpu_state = CPUState.CPU

@@ -151,6 +156,12 @@ def is_mlu():
        return True
    return False

+def is_ixuca():
+    global ixuca_available
+    if ixuca_available:
+        return True
+    return False
+
 def get_torch_device():
    global directml_enabled
    global cpu_state
@@ -186,8 +197,9 @@ def get_total_memory(dev=None, torch_total_too=False):
        elif is_intel_xpu():
            stats = torch.xpu.memory_stats(dev)
            mem_reserved = stats['reserved_bytes.all.current']
+            mem_total_xpu = torch.xpu.get_device_properties(dev).total_memory
            mem_total_torch = mem_reserved
-            mem_total = torch.xpu.get_device_properties(dev).total_memory
+            mem_total = mem_total_xpu
        elif is_ascend_npu():
            stats = torch.npu.memory_stats(dev)
            mem_reserved = stats['reserved_bytes.all.current']
@@ -288,7 +300,7 @@ try:
        if torch_version_numeric[0] >= 2:
            if ENABLE_PYTORCH_ATTENTION == False and args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
                ENABLE_PYTORCH_ATTENTION = True
-    if is_intel_xpu() or is_ascend_npu() or is_mlu():
+    if is_intel_xpu() or is_ascend_npu() or is_mlu() or is_ixuca():
        if args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
            ENABLE_PYTORCH_ATTENTION = True
 except:
@@ -307,8 +319,11 @@ try:
        logging.info("ROCm version: {}".format(rocm_version))
        if args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
            if torch_version_numeric >= (2, 7):  # works on 2.6 but doesn't actually seem to improve much
-                if any((a in arch) for a in ["gfx90a", "gfx942", "gfx1100", "gfx1101", "gfx1151"]):  # TODO: more arches, TODO: gfx1201 and gfx950
+                if any((a in arch) for a in ["gfx90a", "gfx942", "gfx1100", "gfx1101", "gfx1151"]):  # TODO: more arches, TODO: gfx950
                    ENABLE_PYTORCH_ATTENTION = True
+#            if torch_version_numeric >= (2, 8):
+#                if any((a in arch) for a in ["gfx1201"]):
+#                    ENABLE_PYTORCH_ATTENTION = True
        if torch_version_numeric >= (2, 7) and rocm_version >= (6, 4):
            if any((a in arch) for a in ["gfx1201", "gfx942", "gfx950"]):  # TODO: more arches
                SUPPORT_FP8_OPS = True
@@ -325,7 +340,7 @@ if ENABLE_PYTORCH_ATTENTION:

 PRIORITIZE_FP16 = False  # TODO: remove and replace with something that shows exactly which dtype is faster than the other
 try:
-    if is_nvidia() and PerformanceFeature.Fp16Accumulation in args.fast:
+    if (is_nvidia() or is_amd()) and PerformanceFeature.Fp16Accumulation in args.fast:
        torch.backends.cuda.matmul.allow_fp16_accumulation = True
        PRIORITIZE_FP16 = True  # TODO: limit to cards where it actually boosts performance
        logging.info("Enabled fp16 accumulation.")
@@ -377,6 +392,8 @@ def get_torch_device_name(device):
            except:
                allocator_backend = ""
            return "{} {} : {}".format(device, torch.cuda.get_device_name(device), allocator_backend)
+        elif device.type == "xpu":
+            return "{} {}".format(device, torch.xpu.get_device_name(device))
        else:
            return "{}".format(device.type)
    elif is_intel_xpu():
@@ -512,6 +529,8 @@ WINDOWS = any(platform.win32_ver())
 EXTRA_RESERVED_VRAM = 400 * 1024 * 1024
 if WINDOWS:
    EXTRA_RESERVED_VRAM = 600 * 1024 * 1024 #Windows is higher because of the shared vram issue
+    if total_vram > (15 * 1024):  # more extra reserved vram on 16GB+ cards
+        EXTRA_RESERVED_VRAM += 100 * 1024 * 1024

 if args.reserve_vram is not None:
    EXTRA_RESERVED_VRAM = args.reserve_vram * 1024 * 1024 * 1024
@@ -876,6 +895,7 @@ def vae_dtype(device=None, allowed_dtypes=[]):
            return d

        # NOTE: bfloat16 seems to work on AMD for the VAE but is extremely slow in some cases compared to fp32
+        # slowness still a problem on pytorch nightly 2.9.0.dev20250720+rocm6.4 tested on RDNA3
        if d == torch.bfloat16 and (not is_amd()) and should_use_bf16(device):
            return d

@@ -929,7 +949,7 @@ def device_supports_non_blocking(device):
    if is_device_mps(device):
        return False #pytorch bug? mps doesn't support non blocking
    if is_intel_xpu():
-        return False
+        return True
    if args.deterministic: #TODO: figure out why deterministic breaks non blocking from gpu to cpu (previews)
        return False
    if directml_enabled:
@@ -968,6 +988,8 @@ def get_offload_stream(device):
        stream_counter = (stream_counter + 1) % len(ss)
        if is_device_cuda(device):
            ss[stream_counter].wait_stream(torch.cuda.current_stream())
+        elif is_device_xpu(device):
+            ss[stream_counter].wait_stream(torch.xpu.current_stream())
        stream_counters[device] = stream_counter
        return s
    elif is_device_cuda(device):
@@ -979,6 +1001,15 @@ def get_offload_stream(device):
        stream_counter = (stream_counter + 1) % len(ss)
        stream_counters[device] = stream_counter
        return s
+    elif is_device_xpu(device):
+        ss = []
+        for k in range(NUM_STREAMS):
+            ss.append(torch.xpu.Stream(device=device, priority=0))
+        STREAMS[device] = ss
+        s = ss[stream_counter]
+        stream_counter = (stream_counter + 1) % len(ss)
+        stream_counters[device] = stream_counter
+        return s
    return None

 def sync_stream(device, stream):
@@ -986,6 +1017,8 @@ def sync_stream(device, stream):
        return
    if is_device_cuda(device):
        torch.cuda.current_stream().wait_stream(stream)
+    elif is_device_xpu(device):
+        torch.xpu.current_stream().wait_stream(stream)

 def cast_to(weight, dtype=None, device=None, non_blocking=False, copy=False, stream=None):
    if device is None or weight.device == device:
@@ -1027,6 +1060,8 @@ def xformers_enabled():
        return False
    if is_mlu():
        return False
+    if is_ixuca():
+        return False
    if directml_enabled:
        return False
    return XFORMERS_IS_AVAILABLE
@@ -1062,6 +1097,8 @@ def pytorch_attention_flash_attention():
            return True
        if is_amd():
            return True #if you have pytorch attention enabled on AMD it probably supports at least mem efficient attention
+        if is_ixuca():
+            return True
    return False

 def force_upcast_attention_dtype():
@@ -1092,8 +1129,8 @@ def get_free_memory(dev=None, torch_free_too=False):
            stats = torch.xpu.memory_stats(dev)
            mem_active = stats['active_bytes.all.current']
            mem_reserved = stats['reserved_bytes.all.current']
-            mem_free_torch = mem_reserved - mem_active
            mem_free_xpu = torch.xpu.get_device_properties(dev).total_memory - mem_reserved
+            mem_free_torch = mem_reserved - mem_active
            mem_free_total = mem_free_xpu + mem_free_torch
        elif is_ascend_npu():
            stats = torch.npu.memory_stats(dev)
@@ -1142,6 +1179,9 @@ def is_device_cpu(device):
 def is_device_mps(device):
    return is_device_type(device, 'mps')

+def is_device_xpu(device):
+    return is_device_type(device, 'xpu')
+
 def is_device_cuda(device):
    return is_device_type(device, 'cuda')

@@ -1173,7 +1213,10 @@ def should_use_fp16(device=None, model_params=0, prioritize_performance=True, ma
        return False

    if is_intel_xpu():
-        return True
+        if torch_version_numeric < (2, 3):
+            return True
+        else:
+            return torch.xpu.get_device_properties(device).has_fp16

    if is_ascend_npu():
        return True
@@ -1181,6 +1224,9 @@ def should_use_fp16(device=None, model_params=0, prioritize_performance=True, ma
    if is_mlu():
        return True

+    if is_ixuca():
+        return True
+
    if torch.version.hip:
        return True

@@ -1236,11 +1282,17 @@ def should_use_bf16(device=None, model_params=0, prioritize_performance=True, ma
        return False

    if is_intel_xpu():
-        return True
+        if torch_version_numeric < (2, 6):
+            return True
+        else:
+            return torch.xpu.get_device_capability(device)['has_bfloat16_conversions']

    if is_ascend_npu():
        return True

+    if is_ixuca():
+        return True
+
    if is_amd():
        arch = torch.cuda.get_device_properties(device).gcnArchName
        if any((a in arch) for a in ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]):  # RDNA2 and older don't support bf16
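Review note on the `EXTRA_RESERVED_VRAM` change in this file: the reserve now depends on platform and card size, and `--reserve-vram` still overrides both. A minimal sketch of that decision follows, assuming `total_vram` is expressed in MiB (which the `15 * 1024` comparison and the "16GB+" comment suggest). `extra_reserved_vram` is a hypothetical standalone function, not a ComfyUI API.

```python
MB = 1024 * 1024


def extra_reserved_vram(total_vram_mb, windows, reserve_vram_gb=None):
    """Return the reserved VRAM in bytes, following the branches in the patch."""
    reserved = 400 * MB
    if windows:
        reserved = 600 * MB            # higher on Windows (shared VRAM issue)
        if total_vram_mb > 15 * 1024:  # 16GB+ cards get a further 100 MiB
            reserved += 100 * MB
    if reserve_vram_gb is not None:    # an explicit user setting replaces the defaults
        reserved = reserve_vram_gb * 1024 * MB
    return reserved


print(extra_reserved_vram(16 * 1024, windows=True) // MB)   # 700
print(extra_reserved_vram(8 * 1024, windows=True) // MB)    # 600
print(extra_reserved_vram(24 * 1024, windows=False) // MB)  # 400
```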
--- a/comfy/samplers.py
+++ b/comfy/samplers.py
@@ -89,7 +89,7 @@ def get_area_and_mult(conds, x_in, timestep_in):
    conditioning = {}
    model_conds = conds["model_conds"]
    for c in model_conds:
-        conditioning[c] = model_conds[c].process_cond(batch_size=x_in.shape[0], device=x_in.device, area=area)
+        conditioning[c] = model_conds[c].process_cond(batch_size=x_in.shape[0], area=area)

    hooks = conds.get('hooks', None)
    control = conds.get('control', None)
--- a/comfy/sd.py
+++ b/comfy/sd.py
@@ -14,6 +14,7 @@ import comfy.ldm.genmo.vae.model
 import comfy.ldm.lightricks.vae.causal_video_autoencoder
 import comfy.ldm.cosmos.vae
 import comfy.ldm.wan.vae
+import comfy.ldm.wan.vae2_2
 import comfy.ldm.hunyuan3d.vae
 import comfy.ldm.ace.vae.music_dcae_pipeline
 import yaml
@@ -46,6 +47,7 @@ import comfy.text_encoders.wan
 import comfy.text_encoders.hidream
 import comfy.text_encoders.ace
 import comfy.text_encoders.omnigen2
+import comfy.text_encoders.qwen_image

 import comfy.model_patcher
 import comfy.lora
@@ -420,17 +422,30 @@ class VAE:
                self.memory_used_encode = lambda shape, dtype: (50 * (round((shape[2] + 7) / 8) * 8) * shape[3] * shape[4]) * model_management.dtype_size(dtype)
                self.working_dtypes = [torch.bfloat16, torch.float32]
            elif "decoder.middle.0.residual.0.gamma" in sd:
-                self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 8, 8)
-                self.upscale_index_formula = (4, 8, 8)
-                self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 8, 8)
-                self.downscale_index_formula = (4, 8, 8)
-                self.latent_dim = 3
-                self.latent_channels = 16
-                ddconfig = {"dim": 96, "z_dim": self.latent_channels, "dim_mult": [1, 2, 4, 4], "num_res_blocks": 2, "attn_scales": [], "temperal_downsample": [False, True, True], "dropout": 0.0}
-                self.first_stage_model = comfy.ldm.wan.vae.WanVAE(**ddconfig)
-                self.working_dtypes = [torch.bfloat16, torch.float16, torch.float32]
-                self.memory_used_encode = lambda shape, dtype: 6000 * shape[3] * shape[4] * model_management.dtype_size(dtype)
-                self.memory_used_decode = lambda shape, dtype: 7000 * shape[3] * shape[4] * (8 * 8) * model_management.dtype_size(dtype)
+                if "decoder.upsamples.0.upsamples.0.residual.2.weight" in sd:  # Wan 2.2 VAE
+                    self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 16, 16)
+                    self.upscale_index_formula = (4, 16, 16)
+                    self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 16, 16)
+                    self.downscale_index_formula = (4, 16, 16)
+                    self.latent_dim = 3
+                    self.latent_channels = 48
+                    ddconfig = {"dim": 160, "z_dim": self.latent_channels, "dim_mult": [1, 2, 4, 4], "num_res_blocks": 2, "attn_scales": [], "temperal_downsample": [False, True, True], "dropout": 0.0}
+                    self.first_stage_model = comfy.ldm.wan.vae2_2.WanVAE(**ddconfig)
+                    self.working_dtypes = [torch.bfloat16, torch.float16, torch.float32]
+                    self.memory_used_encode = lambda shape, dtype: 3300 * shape[3] * shape[4] * model_management.dtype_size(dtype)
+                    self.memory_used_decode = lambda shape, dtype: 8000 * shape[3] * shape[4] * (16 * 16) * model_management.dtype_size(dtype)
+                else:  # Wan 2.1 VAE
+                    self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 8, 8)
+                    self.upscale_index_formula = (4, 8, 8)
+                    self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 8, 8)
+                    self.downscale_index_formula = (4, 8, 8)
+                    self.latent_dim = 3
+                    self.latent_channels = 16
+                    ddconfig = {"dim": 96, "z_dim": self.latent_channels, "dim_mult": [1, 2, 4, 4], "num_res_blocks": 2, "attn_scales": [], "temperal_downsample": [False, True, True], "dropout": 0.0}
+                    self.first_stage_model = comfy.ldm.wan.vae.WanVAE(**ddconfig)
+                    self.working_dtypes = [torch.bfloat16, torch.float16, torch.float32]
+                    self.memory_used_encode = lambda shape, dtype: 6000 * shape[3] * shape[4] * model_management.dtype_size(dtype)
+                    self.memory_used_decode = lambda shape, dtype: 7000 * shape[3] * shape[4] * (8 * 8) * model_management.dtype_size(dtype)
            elif "geo_decoder.cross_attn_decoder.ln_1.bias" in sd:
                self.latent_dim = 1
                ln_post = "geo_decoder.ln_post.weight" in sd
@@ -757,6 +772,7 @@ class CLIPType(Enum):
    CHROMA = 15
    ACE = 16
    OMNIGEN2 = 17
+    QWEN_IMAGE = 18


 def load_clip(ckpt_paths, embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
@@ -777,6 +793,7 @@ class TEModel(Enum):
    T5_XXL_OLD = 8
    GEMMA_2_2B = 9
    QWEN25_3B = 10
+    QWEN25_7B = 11

 def detect_te_model(sd):
    if "text_model.encoder.layers.30.mlp.fc1.weight" in sd:
@@ -798,7 +815,11 @@ def detect_te_model(sd):
    if 'model.layers.0.post_feedforward_layernorm.weight' in sd:
        return TEModel.GEMMA_2_2B
    if 'model.layers.0.self_attn.k_proj.bias' in sd:
-        return TEModel.QWEN25_3B
+        weight = sd['model.layers.0.self_attn.k_proj.bias']
+        if weight.shape[0] == 256:
+            return TEModel.QWEN25_3B
+        if weight.shape[0] == 512:
+            return TEModel.QWEN25_7B
    if "model.layers.0.post_attention_layernorm.weight" in sd:
        return TEModel.LLAMA3_8
    return None
@@ -903,6 +924,9 @@ def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip
        elif te_model == TEModel.QWEN25_3B:
            clip_target.clip = comfy.text_encoders.omnigen2.te(**llama_detect(clip_data))
            clip_target.tokenizer = comfy.text_encoders.omnigen2.Omnigen2Tokenizer
+        elif te_model == TEModel.QWEN25_7B:
+            clip_target.clip = comfy.text_encoders.qwen_image.te(**llama_detect(clip_data))
+            clip_target.tokenizer = comfy.text_encoders.qwen_image.QwenImageTokenizer
        else:
            # clip_l
            if clip_type == CLIPType.SD3:
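Review note on the Wan VAE branch in this file: both variants share the temporal formulas (`a * 4 - 3` up, `floor((a + 3) / 4)` down) and differ in spatial factor (8 vs 16) and latent channels (16 vs 48). The sketch below works through those numbers. The `WAN_VAE` table and helper names are hypothetical, and the plain integer division for height and width is a simplifying assumption.

```python
import math

# Values copied from the two branches of the patch.
WAN_VAE = {
    "wan2.1": {"spatial": 8, "latent_channels": 16},
    "wan2.2": {"spatial": 16, "latent_channels": 48},
}


def latent_frames(pixel_frames):
    """Temporal downscale: floor((a + 3) / 4), clamped at zero."""
    return max(0, math.floor((pixel_frames + 3) / 4))


def pixel_frames(latent_frame_count):
    """Temporal upscale: a * 4 - 3, clamped at zero."""
    return max(0, latent_frame_count * 4 - 3)


def latent_shape(version, frames, height, width):
    cfg = WAN_VAE[version]
    return (cfg["latent_channels"], latent_frames(frames),
            height // cfg["spatial"], width // cfg["spatial"])


print(latent_shape("wan2.1", 81, 704, 1280))  # (16, 21, 88, 160)
print(latent_shape("wan2.2", 81, 704, 1280))  # (48, 21, 44, 80)
print(pixel_frames(21))                       # 81
```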
--- a/comfy/supported_models.py
+++ b/comfy/supported_models.py
@@ -19,6 +19,7 @@ import comfy.text_encoders.lumina2
 import comfy.text_encoders.wan
 import comfy.text_encoders.ace
 import comfy.text_encoders.omnigen2
+import comfy.text_encoders.qwen_image

 from . import supported_models_base
 from . import latent_formats
@@ -1059,6 +1060,19 @@ class WAN21_Vace(WAN21_T2V):
        out = model_base.WAN21_Vace(self, image_to_video=False, device=device)
        return out

+class WAN22_T2V(WAN21_T2V):
+    unet_config = {
+        "image_model": "wan2.1",
+        "model_type": "t2v",
+        "out_dim": 48,
+    }
+
+    latent_format = latent_formats.Wan22
+
+    def get_model(self, state_dict, prefix="", device=None):
+        out = model_base.WAN22(self, image_to_video=True, device=device)
+        return out
+
 class Hunyuan3Dv2(supported_models_base.BASE):
    unet_config = {
        "image_model": "hunyuan3d2",
@@ -1216,7 +1230,36 @@ class Omnigen2(supported_models_base.BASE):
        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_3b.transformer.".format(pref))
        return supported_models_base.ClipTarget(comfy.text_encoders.omnigen2.Omnigen2Tokenizer, comfy.text_encoders.omnigen2.te(**hunyuan_detect))

+class QwenImage(supported_models_base.BASE):
+    unet_config = {
+        "image_model": "qwen_image",
+    }

-models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, Hunyuan3Dv2mini, Hunyuan3Dv2, HiDream, Chroma, ACEStep, Omnigen2]
+    sampling_settings = {
+        "multiplier": 1.0,
+        "shift": 1.15,
+    }
+
+    memory_usage_factor = 1.8 #TODO
+
+    unet_extra_config = {}
+    latent_format = latent_formats.Wan21
+
+    supported_inference_dtypes = [torch.bfloat16, torch.float32]
+
+    vae_key_prefix = ["vae."]
+    text_encoder_key_prefix = ["text_encoders."]
+
+    def get_model(self, state_dict, prefix="", device=None):
+        out = model_base.QwenImage(self, device=device)
+        return out
+
+    def clip_target(self, state_dict={}):
+        pref = self.text_encoder_key_prefix[0]
+        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_7b.transformer.".format(pref))
+        return supported_models_base.ClipTarget(comfy.text_encoders.qwen_image.QwenImageTokenizer, comfy.text_encoders.qwen_image.te(**hunyuan_detect))
+
+
+models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, Hunyuan3Dv2mini, Hunyuan3Dv2, HiDream, Chroma, ACEStep, Omnigen2, QwenImage]

 models += [SVD_img2vid]
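Review note on the `models` list: `WAN22_T2V` is inserted ahead of `WAN21_T2V`, and its `unet_config` is a superset (it adds `"out_dim": 48`). If detection returns the first entry whose `unet_config` keys all match the detected config (an assumption about code outside this diff), the order matters. The sketch below illustrates this. `first_match`, the `WAN21_T2V` config shown, and the detected values are all hypothetical.

```python
def first_match(detected, candidates):
    """Return the first candidate whose required keys all equal the detected values."""
    for name, required in candidates:
        if all(detected.get(k) == v for k, v in required.items()):
            return name
    return None


# Assumed shape of the two configs; only WAN22_T2V's is shown in this diff.
WAN21_T2V = ("WAN21_T2V", {"image_model": "wan2.1", "model_type": "t2v"})
WAN22_T2V = ("WAN22_T2V", {"image_model": "wan2.1", "model_type": "t2v", "out_dim": 48})

# Illustrative detected config for a checkpoint with 48 output channels.
detected = {"image_model": "wan2.1", "model_type": "t2v", "out_dim": 48}

print(first_match(detected, [WAN22_T2V, WAN21_T2V]))  # WAN22_T2V
print(first_match(detected, [WAN21_T2V, WAN22_T2V]))  # WAN21_T2V
```

Under that assumption, listing the more specific class first is what keeps the less specific one from claiming the checkpoint.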
--- a/comfy/text_encoders/llama.py
+++ b/comfy/text_encoders/llama.py
@@ -43,6 +43,23 @@ class Qwen25_3BConfig:
    mlp_activation = "silu"
    qkv_bias = True

+@dataclass
+class Qwen25_7BVLI_Config:
+    vocab_size: int = 152064
+    hidden_size: int = 3584
+    intermediate_size: int = 18944
+    num_hidden_layers: int = 28
+    num_attention_heads: int = 28
+    num_key_value_heads: int = 4
+    max_position_embeddings: int = 128000
+    rms_norm_eps: float = 1e-6
+    rope_theta: float = 1000000.0
+    transformer_type: str = "llama"
+    head_dim = 128
+    rms_norm_add = False
+    mlp_activation = "silu"
+    qkv_bias = True
+
 @dataclass
 class Gemma2_2B_Config:
    vocab_size: int = 256000
@@ -348,6 +365,15 @@ class Qwen25_3B(BaseLlama, torch.nn.Module):
        self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
        self.dtype = dtype

+class Qwen25_7BVLI(BaseLlama, torch.nn.Module):
+    def __init__(self, config_dict, dtype, device, operations):
+        super().__init__()
+        config = Qwen25_7BVLI_Config(**config_dict)
+        self.num_layers = config.num_hidden_layers
+
+        self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
+        self.dtype = dtype
+
 class Gemma2_2B(BaseLlama, torch.nn.Module):
    def __init__(self, config_dict, dtype, device, operations):
        super().__init__()
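Review note tying `Qwen25_7BVLI_Config` to the `detect_te_model` change in `comfy/sd.py`: the `k_proj` bias has one entry per key/value channel, so its length should be `num_key_value_heads * head_dim`. With the 7B values in this diff (4 and 128) that gives 512. The 256 check for the 3B model implies 2 key/value heads at the same head size, which is an inference and is not shown in this diff. The helper names below are hypothetical.

```python
def k_proj_bias_size(num_key_value_heads, head_dim):
    """Length of the k_proj bias vector for grouped-query attention."""
    return num_key_value_heads * head_dim


def guess_qwen25_variant(k_proj_bias_len):
    """Mirror the size check added to detect_te_model (simplified sketch).

    The patch falls through to later checks on an unknown size;
    this sketch simply returns None instead.
    """
    if k_proj_bias_len == 256:
        return "QWEN25_3B"
    if k_proj_bias_len == 512:
        return "QWEN25_7B"
    return None


print(guess_qwen25_variant(k_proj_bias_size(4, 128)))  # QWEN25_7B
print(guess_qwen25_variant(k_proj_bias_size(2, 128)))  # QWEN25_3B
```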
--- a/comfy/text_encoders/qwen_image.py
+++ b/comfy/text_encoders/qwen_image.py
@@ -0,0 +1,71 @@
+from transformers import Qwen2Tokenizer
+from comfy import sd1_clip
+import comfy.text_encoders.llama
+import os
+import torch
+import numbers
+
+class Qwen25_7BVLITokenizer(sd1_clip.SDTokenizer):
+    def __init__(self, embedding_directory=None, tokenizer_data={}):
+        tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "qwen25_tokenizer")
+        super().__init__(tokenizer_path, pad_with_end=False, embedding_size=3584, embedding_key='qwen25_7b', tokenizer_class=Qwen2Tokenizer, has_start_token=False, has_end_token=False, pad_to_max_length=False, max_length=99999999, min_length=1, pad_token=151643, tokenizer_data=tokenizer_data)
+
+
+class QwenImageTokenizer(sd1_clip.SD1Tokenizer):
+    def __init__(self, embedding_directory=None, tokenizer_data={}):
+        super().__init__(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data, name="qwen25_7b", tokenizer=Qwen25_7BVLITokenizer)
+        self.llama_template = "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n"
+
+    def tokenize_with_weights(self, text, return_word_ids=False, llama_template=None,**kwargs):
+        if llama_template is None:
+            llama_text = self.llama_template.format(text)
+        else:
+            llama_text = llama_template.format(text)
+        return super().tokenize_with_weights(llama_text, return_word_ids=return_word_ids, **kwargs)
+
+
+class Qwen25_7BVLIModel(sd1_clip.SDClipModel):
+    def __init__(self, device="cpu", layer="last", layer_idx=None, dtype=None, attention_mask=True, model_options={}):
+        super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"pad": 151643}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Qwen25_7BVLI, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
+
+
+class QwenImageTEModel(sd1_clip.SD1ClipModel):
+    def __init__(self, device="cpu", dtype=None, model_options={}):
+        super().__init__(device=device, dtype=dtype, name="qwen25_7b", clip_model=Qwen25_7BVLIModel, model_options=model_options)
+
+    def encode_token_weights(self, token_weight_pairs):
+        out, pooled, extra = super().encode_token_weights(token_weight_pairs)
+        tok_pairs = token_weight_pairs["qwen25_7b"][0]
+        template_end, count_im_start = 0, 0
+        for i, v in enumerate(tok_pairs):
+            elem = v[0]
+            if not torch.is_tensor(elem):
+                if isinstance(elem, numbers.Integral):
+                    if elem == 151644 and count_im_start < 2:
+                        template_end = i
+                        count_im_start += 1
+
+        if out.shape[1] > (template_end + 3):
+            if tok_pairs[template_end + 1][0] == 872:
+                if tok_pairs[template_end + 2][0] == 198:
+                    template_end += 3
+
+        out = out[:, template_end:]
+
+        extra["attention_mask"] = extra["attention_mask"][:, template_end:]
+        if extra["attention_mask"].sum() == torch.numel(extra["attention_mask"]):
+            extra.pop("attention_mask")  # attention mask is useless if no masked elements
+
+        return out, pooled, extra
+
+
+def te(dtype_llama=None, llama_scaled_fp8=None):
+    class QwenImageTEModel_(QwenImageTEModel):
+        def __init__(self, device="cpu", dtype=None, model_options={}):
+            if llama_scaled_fp8 is not None and "scaled_fp8" not in model_options:
+                model_options = model_options.copy()
+                model_options["scaled_fp8"] = llama_scaled_fp8
+            if dtype_llama is not None:
+                dtype = dtype_llama
+            super().__init__(device=device, dtype=dtype, model_options=model_options)
+    return QwenImageTEModel_
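Reviewer note: the template-stripping step in `encode_token_weights` can be sketched on plain Python lists instead of tensors. This is a minimal sketch, not the code in the diff: `find_template_end` is a hypothetical helper name, every id other than 151644, 872 and 198 is a placeholder, the length check uses the token count where the diff uses `out.shape[1]`, and `template_end` starts at 0 so that input without the start marker is returned unchanged.

```python
IM_START = 151644  # start-of-turn marker id compared against in the diff
USER = 872         # id the diff expects right after the second marker
NEWLINE = 198      # id the diff expects after USER


def find_template_end(token_ids):
    """Return the index where the user prompt starts (hypothetical helper)."""
    template_end = 0
    count_im_start = 0
    for i, tok in enumerate(token_ids):
        # Only the first two start markers are considered, as in the diff.
        if tok == IM_START and count_im_start < 2:
            template_end = i
            count_im_start += 1

    # Skip the marker, USER and NEWLINE when they follow in that order.
    if len(token_ids) > template_end + 3:
        if token_ids[template_end + 1] == USER and token_ids[template_end + 2] == NEWLINE:
            template_end += 3
    return template_end


# Placeholder sequence: system turn, then user turn with prompt ids 11, 12, 13.
tokens = [IM_START, 9001, NEWLINE, 1, 2, 9002, NEWLINE,
          IM_START, USER, NEWLINE, 11, 12, 13, 9002]
start = find_template_end(tokens)
kept = tokens[start:]
```

Slicing at `start` drops the system turn and the user-turn header, which is what `out[:, template_end:]` does to the hidden states and the attention mask.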
--- a/comfy/utils.py
+++ b/comfy/utils.py
@@ -50,10 +50,16 @@ if hasattr(torch.serialization, "add_safe_globals"):  # TODO: this was added in
 else:
    logging.info("Warning, you are using an old pytorch version and some ckpt/pt files might be loaded unsafely. Upgrading to 2.4 or above is recommended.")

+def is_html_file(file_path):
+    with open(file_path, "rb") as f:
+        content = f.read(100)
+        return b"<!DOCTYPE html>" in content or b"<html" in content
+
 def load_torch_file(ckpt, safe_load=False, device=None, return_metadata=False):
    if device is None:
        device = torch.device("cpu")
    metadata = None
+
    if ckpt.lower().endswith(".safetensors") or ckpt.lower().endswith(".sft"):
        try:
            with safetensors.safe_open(ckpt, framework="pt", device=device.type) as f:
@@ -66,6 +72,8 @@ def load_torch_file(ckpt, safe_load=False, device=None, return_metadata=False):
                if return_metadata:
                    metadata = f.metadata()
        except Exception as e:
+            if is_html_file(ckpt):
+                raise ValueError("{}\n\nFile path: {}\n\nThe requested file is an HTML document not a safetensors file. Please re-download the file, not the web page.".format(e, ckpt))
            if len(e.args) > 0:
                message = e.args[0]
                if "HeaderTooLarge" in message:
@@ -93,6 +101,13 @@ def load_torch_file(ckpt, safe_load=False, device=None, return_metadata=False):
                    sd = pl_sd
            else:
                sd = pl_sd
+
+    try:
+        from app.model_processor import model_processor
+        model_processor.process_file(ckpt)
+    except Exception as e:
+        logging.error(f"Error processing file {ckpt}: {e}")
+
    return (sd, metadata) if return_metadata else sd

 def save_torch_file(sd, ckpt, metadata=None):
@@ -698,6 +713,26 @@ def resize_to_batch_size(tensor, batch_size):

    return output

+def resize_list_to_batch_size(l, batch_size):
+    in_batch_size = len(l)
+    if in_batch_size == batch_size or in_batch_size == 0:
+        return l
+
+    if batch_size <= 1:
+        return l[:batch_size]
+
+    output = []
+    if batch_size < in_batch_size:
+        scale = (in_batch_size - 1) / (batch_size - 1)
+        for i in range(batch_size):
+            output.append(l[min(round(i * scale), in_batch_size - 1)])
+    else:
+        scale = in_batch_size / batch_size
+        for i in range(batch_size):
+            output.append(l[min(math.floor((i + 0.5) * scale), in_batch_size - 1)])
+
+    return output
+
 def convert_sd_to(state_dict, dtype):
    keys = list(state_dict.keys())
    for k in keys:
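Reviewer note: the behaviour of `resize_list_to_batch_size` is easiest to see on small lists. The sketch below restates the function from the hunk above so that it runs without ComfyUI; it assumes `math` is already imported in `comfy/utils.py`, and the sample lists are illustrative only.

```python
import math


def resize_list_to_batch_size(l, batch_size):
    in_batch_size = len(l)
    if in_batch_size == batch_size or in_batch_size == 0:
        return l

    if batch_size <= 1:
        return l[:batch_size]

    output = []
    if batch_size < in_batch_size:
        # Downsample: keep the first and last items, spread the rest evenly.
        scale = (in_batch_size - 1) / (batch_size - 1)
        for i in range(batch_size):
            output.append(l[min(round(i * scale), in_batch_size - 1)])
    else:
        # Upsample: repeat items, picking by the centre of each output slot.
        scale = in_batch_size / batch_size
        for i in range(batch_size):
            output.append(l[min(math.floor((i + 0.5) * scale), in_batch_size - 1)])

    return output


shrunk = resize_list_to_batch_size(["a", "b", "c", "d", "e"], 3)
grown = resize_list_to_batch_size(["a", "b"], 4)
single = resize_list_to_batch_size(["a", "b", "c"], 1)
```

Shrinking keeps both endpoints, growing repeats each item in place, and a target of 1 simply keeps the first item.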
--- a/comfy/weight_adapter/__init__.py
+++ b/comfy/weight_adapter/__init__.py
@@ -15,9 +15,20 @@ adapters: list[type[WeightAdapterBase]] = [
    OFTAdapter,
    BOFTAdapter,
 ]
+adapter_maps: dict[str, type[WeightAdapterBase]] = {
+    "LoRA": LoRAAdapter,
+    "LoHa": LoHaAdapter,
+    "LoKr": LoKrAdapter,
+    "OFT": OFTAdapter,
+    ## Algorithms that are not implemented yet are disabled for now
+    # "GLoRA": GLoRAAdapter,
+    # "BOFT": BOFTAdapter,
+}
+

 __all__ = [
    "WeightAdapterBase",
    "WeightAdapterTrainBase",
-    "adapters"
+    "adapters",
+    "adapter_maps",
 ] + [a.__name__ for a in adapters]
--- a/comfy/weight_adapter/base.py
+++ b/comfy/weight_adapter/base.py
@@ -133,3 +133,43 @@ def tucker_weight_from_conv(up, down, mid):
 def tucker_weight(wa, wb, t):
    temp = torch.einsum("i j ..., j r -> i r ...", t, wb)
    return torch.einsum("i j ..., i r -> r j ...", temp, wa)
+
+
+def factorization(dimension: int, factor: int = -1) -> tuple[int, int]:
+    """
+    Return a tuple of two values whose product is the input dimension, decomposed by the number closest to factor.
+    The second value is greater than or equal to the first value.
+
+    examples)
+    factor
+        -1               2                4               8               16               ...
+    127 -> 1, 127   127 -> 1, 127    127 -> 1, 127   127 -> 1, 127   127 -> 1, 127
+    128 -> 8, 16    128 -> 2, 64     128 -> 4, 32    128 -> 8, 16    128 -> 8, 16
+    250 -> 10, 25   250 -> 2, 125    250 -> 2, 125   250 -> 5, 50    250 -> 10, 25
+    360 -> 18, 20   360 -> 2, 180    360 -> 4, 90    360 -> 8, 45    360 -> 15, 24
+    512 -> 16, 32   512 -> 2, 256    512 -> 4, 128   512 -> 8, 64    512 -> 16, 32
+    1024 -> 32, 32  1024 -> 2, 512   1024 -> 4, 256  1024 -> 8, 128  1024 -> 16, 64
+    """
+
+    if factor > 0 and (dimension % factor) == 0 and dimension >= factor**2:
+        m = factor
+        n = dimension // factor
+        if m > n:
+            n, m = m, n
+        return m, n
+    if factor < 0:
+        factor = dimension
+    m, n = 1, dimension
+    length = m + n
+    while m < n:
+        new_m = m + 1
+        while dimension % new_m != 0:
+            new_m += 1
+        new_n = dimension // new_m
+        if new_m + new_n > length or new_m > factor:
+            break
+        else:
+            m, n = new_m, new_n
+    if m > n:
+        n, m = m, n
+    return m, n
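Reviewer note: a few concrete calls make the two code paths of `factorization` easier to follow. The sketch below restates the function from the hunk above (docstring omitted) so that it runs standalone; the chosen inputs are illustrative.

```python
def factorization(dimension: int, factor: int = -1) -> tuple[int, int]:
    # Fast path: factor divides dimension and is not larger than its square root.
    if factor > 0 and (dimension % factor) == 0 and dimension >= factor**2:
        m = factor
        n = dimension // factor
        if m > n:
            n, m = m, n
        return m, n
    if factor < 0:
        factor = dimension
    # Search path: walk up through the divisors of dimension until the two
    # halves cross or the smaller one would exceed factor.
    m, n = 1, dimension
    length = m + n
    while m < n:
        new_m = m + 1
        while dimension % new_m != 0:
            new_m += 1
        new_n = dimension // new_m
        if new_m + new_n > length or new_m > factor:
            break
        else:
            m, n = new_m, new_n
    if m > n:
        n, m = m, n
    return m, n


balanced = factorization(128)          # no factor given: most balanced split
fast_path = factorization(128, 2)      # factor divides 128 and 2**2 <= 128
capped = factorization(250, 4)         # 4 does not divide 250, search is capped at 4
prime = factorization(127)             # a prime only splits as 1 * dimension
fallback = factorization(128, 16)      # 16**2 > 128, so the search path is used
```

With the default `factor=-1` the search runs to the most balanced pair of divisors; a positive factor either short-circuits or caps how large the smaller value may grow.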
--- a/comfy/weight_adapter/loha.py
+++ b/comfy/weight_adapter/loha.py
@@ -3,7 +3,120 @@ from typing import Optional

 import torch
 import comfy.model_management
-from .base import WeightAdapterBase, weight_decompose
+from .base import WeightAdapterBase, WeightAdapterTrainBase, weight_decompose
+
+
+class HadaWeight(torch.autograd.Function):
+    @staticmethod
+    def forward(ctx, w1u, w1d, w2u, w2d, scale=torch.tensor(1)):
+        ctx.save_for_backward(w1d, w1u, w2d, w2u, scale)
+        diff_weight = ((w1u @ w1d) * (w2u @ w2d)) * scale
+        return diff_weight
+
+    @staticmethod
+    def backward(ctx, grad_out):
+        (w1d, w1u, w2d, w2u, scale) = ctx.saved_tensors
+        grad_out = grad_out * scale
+        temp = grad_out * (w2u @ w2d)
+        grad_w1u = temp @ w1d.T
+        grad_w1d = w1u.T @ temp
+
+        temp = grad_out * (w1u @ w1d)
+        grad_w2u = temp @ w2d.T
+        grad_w2d = w2u.T @ temp
+
+        del temp
+        return grad_w1u, grad_w1d, grad_w2u, grad_w2d, None
+
+
+class HadaWeightTucker(torch.autograd.Function):
+    @staticmethod
+    def forward(ctx, t1, w1u, w1d, t2, w2u, w2d, scale=torch.tensor(1)):
+        ctx.save_for_backward(t1, w1d, w1u, t2, w2d, w2u, scale)
+
+        rebuild1 = torch.einsum("i j ..., j r, i p -> p r ...", t1, w1d, w1u)
+        rebuild2 = torch.einsum("i j ..., j r, i p -> p r ...", t2, w2d, w2u)
+
+        return rebuild1 * rebuild2 * scale
+
+    @staticmethod
+    def backward(ctx, grad_out):
+        (t1, w1d, w1u, t2, w2d, w2u, scale) = ctx.saved_tensors
+        grad_out = grad_out * scale
+
+        temp = torch.einsum("i j ..., j r -> i r ...", t2, w2d)
+        rebuild = torch.einsum("i j ..., i r -> r j ...", temp, w2u)
+
+        grad_w = rebuild * grad_out
+        del rebuild
+
+        grad_w1u = torch.einsum("r j ..., i j ... -> r i", temp, grad_w)
+        grad_temp = torch.einsum("i j ..., i r -> r j ...", grad_w, w1u.T)
+        del grad_w, temp
+
+        grad_w1d = torch.einsum("i r ..., i j ... -> r j", t1, grad_temp)
+        grad_t1 = torch.einsum("i j ..., j r -> i r ...", grad_temp, w1d.T)
+        del grad_temp
+
+        temp = torch.einsum("i j ..., j r -> i r ...", t1, w1d)
+        rebuild = torch.einsum("i j ..., i r -> r j ...", temp, w1u)
+
+        grad_w = rebuild * grad_out
+        del rebuild
+
+        grad_w2u = torch.einsum("r j ..., i j ... -> r i", temp, grad_w)
+        grad_temp = torch.einsum("i j ..., i r -> r j ...", grad_w, w2u.T)
+        del grad_w, temp
+
+        grad_w2d = torch.einsum("i r ..., i j ... -> r j", t2, grad_temp)
+        grad_t2 = torch.einsum("i j ..., j r -> i r ...", grad_temp, w2d.T)
+        del grad_temp
+        return grad_t1, grad_w1u, grad_w1d, grad_t2, grad_w2u, grad_w2d, None
+
+
+class LohaDiff(WeightAdapterTrainBase):
+    def __init__(self, weights):
+        super().__init__()
+        # Unpack weights tuple from LoHaAdapter
+        w1a, w1b, alpha, w2a, w2b, t1, t2, _ = weights
+
+        # Create trainable parameters
+        self.hada_w1_a = torch.nn.Parameter(w1a)
+        self.hada_w1_b = torch.nn.Parameter(w1b)
+        self.hada_w2_a = torch.nn.Parameter(w2a)
+        self.hada_w2_b = torch.nn.Parameter(w2b)
+
+        self.use_tucker = False
+        if t1 is not None and t2 is not None:
+            self.use_tucker = True
+            self.hada_t1 = torch.nn.Parameter(t1)
+            self.hada_t2 = torch.nn.Parameter(t2)
+        else:
+            # Keep the attributes for consistent access
+            self.hada_t1 = None
+            self.hada_t2 = None
+
+        # Store rank and non-trainable alpha
+        self.rank = w1b.shape[0]
+        self.alpha = torch.nn.Parameter(torch.tensor(alpha), requires_grad=False)
+
+    def __call__(self, w):
+        org_dtype = w.dtype
+
+        scale = self.alpha / self.rank
+        if self.use_tucker:
+            diff_weight = HadaWeightTucker.apply(self.hada_t1, self.hada_w1_a, self.hada_w1_b, self.hada_t2, self.hada_w2_a, self.hada_w2_b, scale)
+        else:
+            diff_weight = HadaWeight.apply(self.hada_w1_a, self.hada_w1_b, self.hada_w2_a, self.hada_w2_b, scale)
+
+        # Add the scaled difference to the original weight
+        weight = w.to(diff_weight) + diff_weight.reshape(w.shape)
+
+        return weight.to(org_dtype)
+
+    def passive_memory_usage(self):
+        """Calculates memory usage of the trainable parameters."""
+        return sum(param.numel() * param.element_size() for param in self.parameters())


 class LoHaAdapter(WeightAdapterBase):
@@ -13,6 +126,25 @@ class LoHaAdapter(WeightAdapterBase):
        self.loaded_keys = loaded_keys
        self.weights = weights

+    @classmethod
+    def create_train(cls, weight, rank=1, alpha=1.0):
+        out_dim = weight.shape[0]
+        in_dim = weight.shape[1:].numel()
+        mat1 = torch.empty(out_dim, rank, device=weight.device, dtype=weight.dtype)
+        mat2 = torch.empty(rank, in_dim, device=weight.device, dtype=weight.dtype)
+        torch.nn.init.normal_(mat1, 0.1)
+        torch.nn.init.constant_(mat2, 0.0)
+        mat3 = torch.empty(out_dim, rank, device=weight.device, dtype=weight.dtype)
+        mat4 = torch.empty(rank, in_dim, device=weight.device, dtype=weight.dtype)
+        torch.nn.init.normal_(mat3, 0.1)
+        torch.nn.init.normal_(mat4, 0.01)
+        return LohaDiff(
+            (mat1, mat2, alpha, mat3, mat4, None, None, None)
+        )
+
+    def to_train(self):
+        return LohaDiff(self.weights)
+
    @classmethod
    def load(
        cls,
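Reviewer note: the weight delta that `HadaWeight.forward` builds is the elementwise product of two low-rank products, scaled by `alpha / rank`. The sketch below reproduces that arithmetic on nested lists without PyTorch; `matmul` and `loha_diff` are hypothetical helper names and the matrices are made-up values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def loha_diff(w1u, w1d, w2u, w2d, alpha, rank):
    """Compute ((w1u @ w1d) * (w2u @ w2d)) * (alpha / rank) elementwise."""
    scale = alpha / rank
    left = matmul(w1u, w1d)
    right = matmul(w2u, w2d)
    return [
        [l * r * scale for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


# Rank-1 factors for a 2x3 weight.
w1u, w2u = [[1], [2]], [[1], [1]]
w2d = [[1, 2, 3]]

# A zero-filled down matrix, as create_train uses for mat2, gives a zero delta.
initial = loha_diff(w1u, [[0, 0, 0]], w2u, w2d, alpha=1.0, rank=1)

# Once the down matrix has moved away from zero the delta becomes non-zero.
trained = loha_diff(w1u, [[1, 0, 2]], w2u, w2d, alpha=0.5, rank=1)
```

Because one factor of the first product starts at zero, a freshly created adapter leaves the base weight unchanged until training updates it.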
--- a/comfy/weight_adapter/lokr.py
+++ b/comfy/weight_adapter/lokr.py
@@ -3,7 +3,77 @@ from typing import Optional

 import torch
 import comfy.model_management
-from .base import WeightAdapterBase, weight_decompose
+from .base import (
+    WeightAdapterBase,
+    WeightAdapterTrainBase,
+    weight_decompose,
+    factorization,
+)
+
+
+class LokrDiff(WeightAdapterTrainBase):
+    def __init__(self, weights):
+        super().__init__()
+        (lokr_w1, lokr_w2, alpha, lokr_w1_a, lokr_w1_b, lokr_w2_a, lokr_w2_b, lokr_t2, dora_scale) = weights
+        self.use_tucker = False
+        if lokr_w1_a is not None:
+            _, rank_a = lokr_w1_a.shape[0], lokr_w1_a.shape[1]
+            rank_a, _ = lokr_w1_b.shape[0], lokr_w1_b.shape[1]
+            self.lokr_w1_a = torch.nn.Parameter(lokr_w1_a)
+            self.lokr_w1_b = torch.nn.Parameter(lokr_w1_b)
+            self.w1_rebuild = True
+            self.ranka = rank_a
+
+        if lokr_w2_a is not None:
+            _, rank_b = lokr_w2_a.shape[0], lokr_w2_a.shape[1]
+            rank_b, _ = lokr_w2_b.shape[0], lokr_w2_b.shape[1]
+            self.lokr_w2_a = torch.nn.Parameter(lokr_w2_a)
+            self.lokr_w2_b = torch.nn.Parameter(lokr_w2_b)
+            if lokr_t2 is not None:
+                self.use_tucker = True
+                self.lokr_t2 = torch.nn.Parameter(lokr_t2)
+            self.w2_rebuild = True
+            self.rankb = rank_b
+
+        if lokr_w1 is not None:
+            self.lokr_w1 = torch.nn.Parameter(lokr_w1)
+            self.w1_rebuild = False
+
+        if lokr_w2 is not None:
+            self.lokr_w2 = torch.nn.Parameter(lokr_w2)
+            self.w2_rebuild = False
+
+        self.alpha = torch.nn.Parameter(torch.tensor(alpha), requires_grad=False)
+
+    @property
+    def w1(self):
+        if self.w1_rebuild:
+            return (self.lokr_w1_a @ self.lokr_w1_b) * (self.alpha / self.ranka)
+        else:
+            return self.lokr_w1
+
+    @property
+    def w2(self):
+        if self.w2_rebuild:
+            if self.use_tucker:
+                w2 = torch.einsum(
+                    'i j k l, j r, i p -> p r k l',
+                    self.lokr_t2,
+                    self.lokr_w2_b,
+                    self.lokr_w2_a
+                )
+            else:
+                w2 = self.lokr_w2_a @ self.lokr_w2_b
+            return w2 * (self.alpha / self.rankb)
+        else:
+            return self.lokr_w2
+
+    def __call__(self, w):
+        diff = torch.kron(self.w1, self.w2)
+        return w + diff.reshape(w.shape).to(w)
+
+    def passive_memory_usage(self):
+        return sum(param.numel() * param.element_size() for param in self.parameters())


 class LoKrAdapter(WeightAdapterBase):
@@ -13,6 +83,20 @@ class LoKrAdapter(WeightAdapterBase):
        self.loaded_keys = loaded_keys
        self.weights = weights

+    @classmethod
+    def create_train(cls, weight, rank=1, alpha=1.0):
+        out_dim = weight.shape[0]
+        in_dim = weight.shape[1:].numel()
+        out1, out2 = factorization(out_dim, rank)
+        in1, in2 = factorization(in_dim, rank)
+        mat1 = torch.empty(out1, in1, device=weight.device, dtype=weight.dtype)
+        mat2 = torch.empty(out2, in2, device=weight.device, dtype=weight.dtype)
+        torch.nn.init.kaiming_uniform_(mat2, a=5**0.5)
+        torch.nn.init.constant_(mat1, 0.0)
+        return LokrDiff(
+            (mat1, mat2, alpha, None, None, None, None, None, None)
+        )
+
    @classmethod
    def load(
        cls,
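Reviewer note: `LokrDiff.__call__` relies on the Kronecker product turning an `(out1, in1)` matrix and an `(out2, in2)` matrix into one of shape `(out1 * out2, in1 * in2)`. The sketch below shows that shape arithmetic on nested lists; `kron` here is a hypothetical pure-Python stand-in for `torch.kron` on 2-D inputs, and the values are made up.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    out = []
    for row_a in a:
        for k in range(rows_b):
            # Each entry of row_a scales the k-th row of b.
            out.append([x * b[k][l] for x in row_a for l in range(cols_b)])
    return out


w1 = [[1, 2], [3, 4]]        # (out1, in1) = (2, 2)
w2 = [[1, 0, 1], [0, 1, 0]]  # (out2, in2) = (2, 3)

diff = kron(w1, w2)
shape = (len(diff), len(diff[0]))

# A zero-filled w1, as create_train uses for mat1, yields an all-zero delta.
zero_diff = kron([[0, 0], [0, 0]], w2)
```

The resulting 4x6 matrix can then be reshaped to the target weight, which is why `create_train` factorizes `out_dim` and `in_dim` separately.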
--- a/comfy/weight_adapter/lora.py
+++ b/comfy/weight_adapter/lora.py
@@ -96,6 +96,7 @@ class LoRAAdapter(WeightAdapterBase):
        diffusers3_lora = "{}.lora.up.weight".format(x)
        mochi_lora = "{}.lora_B".format(x)
        transformers_lora = "{}.lora_linear_layer.up.weight".format(x)
+        qwen_default_lora = "{}.lora_B.default.weight".format(x)
        A_name = None

        if regular_lora in lora.keys():
@@ -122,6 +123,10 @@ class LoRAAdapter(WeightAdapterBase):
            A_name = transformers_lora
            B_name = "{}.lora_linear_layer.down.weight".format(x)
            mid_name = None
+        elif qwen_default_lora in lora.keys():
+            A_name = qwen_default_lora
+            B_name = "{}.lora_A.default.weight".format(x)
+            mid_name = None

        if A_name is not None:
            mid = None
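Reviewer note: the new branch adds one more key layout to the chain of `elif` checks. The lookup can be sketched as a table of (up, down) name patterns; `find_up_down_keys` is a hypothetical helper, only the two layouts whose names are fully visible in this hunk are listed, and the layer prefix is made up.

```python
def find_up_down_keys(prefix, keys):
    """Return the (up, down) key names for prefix, or None if no layout matches."""
    layouts = [
        ("{}.lora_linear_layer.up.weight", "{}.lora_linear_layer.down.weight"),
        ("{}.lora_B.default.weight", "{}.lora_A.default.weight"),
    ]
    for up_fmt, down_fmt in layouts:
        up_name = up_fmt.format(prefix)
        if up_name in keys:
            return up_name, down_fmt.format(prefix)
    return None


# Hypothetical state-dict keys in the ".default.weight" layout.
state_dict_keys = {
    "blocks.0.attn.to_q.lora_A.default.weight",
    "blocks.0.attn.to_q.lora_B.default.weight",
}
match = find_up_down_keys("blocks.0.attn.to_q", state_dict_keys)
missing = find_up_down_keys("blocks.1.attn.to_q", state_dict_keys)
```

As in the diff, the `lora_B` tensor plays the role of the up matrix (`A_name`) and `lora_A` the down matrix (`B_name`).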
--- a/comfy/weight_adapter/oft.py
+++ b/comfy/weight_adapter/oft.py
@@ -3,7 +3,58 @@ from typing import Optional

 import torch
 import comfy.model_management
-from .base import WeightAdapterBase, weight_decompose
+from .base import WeightAdapterBase, WeightAdapterTrainBase, weight_decompose, factorization
+
+
+class OFTDiff(WeightAdapterTrainBase):
+    def __init__(self, weights):
+        super().__init__()
+        # Unpack weights tuple from OFTAdapter
+        blocks, rescale, alpha, _ = weights
+
+        # Create trainable parameters
+        self.oft_blocks = torch.nn.Parameter(blocks)
+        if rescale is not None:
+            self.rescale = torch.nn.Parameter(rescale)
+            self.rescaled = True
+        else:
+            self.rescaled = False
+        self.block_num, self.block_size, _ = blocks.shape
+        self.constraint = float(alpha)
+        self.alpha = torch.nn.Parameter(torch.tensor(alpha), requires_grad=False)
+
+    def __call__(self, w):
+        org_dtype = w.dtype
+        I = torch.eye(self.block_size, device=self.oft_blocks.device)
+
+        ## generate r
+        # for Q = -Q^T
+        q = self.oft_blocks - self.oft_blocks.transpose(1, 2)
+        normed_q = q
+        if self.constraint:
+            q_norm = torch.norm(q) + 1e-8
+            if q_norm > self.constraint:
+                normed_q = q * self.constraint / q_norm
+        # use float() to prevent unsupported type
+        r = (I + normed_q) @ (I - normed_q).float().inverse()
+
+        ## Apply chunked matmul on weight
+        _, *shape = w.shape
+        org_weight = w.to(dtype=r.dtype)
+        org_weight = org_weight.unflatten(0, (self.block_num, self.block_size))
+        # Init R=0, so add I on it to ensure the output of step0 is original model output
+        weight = torch.einsum(
+            "k n m, k n ... -> k m ...",
+            r,
+            org_weight,
+        ).flatten(0, 1)
+        if self.rescaled:
+            weight = self.rescale * weight
+        return weight.to(org_dtype)
+
+    def passive_memory_usage(self):
+        """Calculates memory usage of the trainable parameters."""
+        return sum(param.numel() * param.element_size() for param in self.parameters())


 class OFTAdapter(WeightAdapterBase):
@@ -13,6 +64,18 @@ class OFTAdapter(WeightAdapterBase):
        self.loaded_keys = loaded_keys
        self.weights = weights

+    @classmethod
+    def create_train(cls, weight, rank=1, alpha=1.0):
+        out_dim = weight.shape[0]
+        block_size, block_num = factorization(out_dim, rank)
+        block = torch.zeros(block_num, block_size, block_size, device=weight.device, dtype=weight.dtype)
+        return OFTDiff(
+            (block, None, alpha, None)
+        )
+
+    def to_train(self):
+        return OFTDiff(self.weights)
+
    @classmethod
    def load(
        cls,
@@ -60,6 +123,8 @@ class OFTAdapter(WeightAdapterBase):
        blocks = v[0]
        rescale = v[1]
        alpha = v[2]
+        if alpha is None:
+            alpha = 0
        dora_scale = v[3]

        blocks = comfy.model_management.cast_to_device(blocks, weight.device, intermediate_dtype)
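Reviewer note: `OFTDiff.__call__` builds its rotation with the Cayley transform `r = (I + q) @ (I - q).inverse()`, where `q` is skew-symmetric. The sketch below works the 2x2 case by hand in plain Python to show that the result is orthogonal and that `q = 0` gives the identity; `cayley_2x2` and `is_orthogonal` are hypothetical helper names.

```python
import math


def cayley_2x2(a):
    """Cayley transform (I + Q)(I - Q)^-1 for Q = [[0, a], [-a, 0]]."""
    i_plus_q = [[1.0, a], [-a, 1.0]]
    # I - Q = [[1, -a], [a, 1]] has determinant 1 + a*a, never zero.
    det = 1.0 + a * a
    inv = [[1.0 / det, a / det], [-a / det, 1.0 / det]]
    return [
        [sum(i_plus_q[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def is_orthogonal(r, tol=1e-12):
    """Check that r @ r.T equals the identity within tol."""
    for i in range(2):
        for j in range(2):
            dot = sum(r[i][k] * r[j][k] for k in range(2))
            if not math.isclose(dot, 1.0 if i == j else 0.0, abs_tol=tol):
                return False
    return True


identity = cayley_2x2(0.0)  # zero blocks, as create_train initialises them
rotation = cayley_2x2(0.5)
```

Zero-initialised blocks therefore leave the weight untouched at the first step, and any trained skew-symmetric block still yields a pure rotation.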
--- a/comfy_api/generate_api_stubs.py
+++ b/comfy_api/generate_api_stubs.py
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+"""
+Script to generate .pyi stub files for the synchronous API wrappers.
+This allows generating stubs without running the full ComfyUI application.
+"""
+
+import os
+import sys
+import logging
+import importlib
+
+# Add ComfyUI to path so we can import modules
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from comfy_api.internal.async_to_sync import AsyncToSyncConverter
+from comfy_api.version_list import supported_versions
+
+
+def generate_stubs_for_module(module_name: str) -> None:
+    """Generate stub files for a specific module that exports ComfyAPI and ComfyAPISync."""
+    try:
+        # Import the module
+        module = importlib.import_module(module_name)
+
+        # Check if module has ComfyAPISync (the sync wrapper)
+        if hasattr(module, "ComfyAPISync"):
+            # Module already has a sync class
+            api_class = getattr(module, "ComfyAPI", None)
+            sync_class = getattr(module, "ComfyAPISync")
+
+            if api_class:
+                # Generate the stub file
+                AsyncToSyncConverter.generate_stub_file(api_class, sync_class)
+                logging.info(f"Generated stub file for {module_name}")
+            else:
+                logging.warning(
+                    f"Module {module_name} has ComfyAPISync but no ComfyAPI"
+                )
+
+        elif hasattr(module, "ComfyAPI"):
+            # Module only has async API, need to create sync wrapper first
+            from comfy_api.internal.async_to_sync import create_sync_class
+
+            api_class = getattr(module, "ComfyAPI")
+            sync_class = create_sync_class(api_class)
+
+            # Generate the stub file
+            AsyncToSyncConverter.generate_stub_file(api_class, sync_class)
+            logging.info(f"Generated stub file for {module_name}")
+        else:
+            logging.warning(
+                f"Module {module_name} does not export ComfyAPI or ComfyAPISync"
+            )
+
+    except Exception as e:
+        logging.error(f"Failed to generate stub for {module_name}: {e}")
+        import traceback
+
+        traceback.print_exc()
+
+
+def main():
+    """Main function to generate all API stub files."""
+    logging.basicConfig(level=logging.INFO)
+
+    logging.info("Starting stub generation...")
+
+    # Dynamically get module names from supported_versions
+    api_modules = []
+    for api_class in supported_versions:
+        # Extract module name from the class
+        module_name = api_class.__module__
+        if module_name not in api_modules:
+            api_modules.append(module_name)
+
+    logging.info(f"Found {len(api_modules)} API modules: {api_modules}")
+
+    # Generate stubs for each module
+    for module_name in api_modules:
+        generate_stubs_for_module(module_name)
+
+    logging.info("Stub generation complete!")
+
+
+if __name__ == "__main__":
+    main()
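Reviewer note: `main()` collects each API class's `__module__` once, in first-seen order. A minimal sketch of that step follows; `unique_modules` is a hypothetical helper and the classes and module names are stand-ins, not the real entries of `supported_versions`.

```python
def unique_modules(classes):
    """Return the module name of each class once, in first-seen order."""
    modules = []
    for cls in classes:
        name = cls.__module__
        if name not in modules:
            modules.append(name)
    return modules


# Stand-in classes with made-up module names.
ApiA = type("ApiA", (), {"__module__": "example_api.v1"})
ApiB = type("ApiB", (), {"__module__": "example_api.v2"})
ApiC = type("ApiC", (), {"__module__": "example_api.v1"})

found = unique_modules([ApiA, ApiB, ApiC])
```

The same result could be obtained with `list(dict.fromkeys(...))`; the explicit loop mirrors the shape of the code in the diff.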
--- a/comfy_api/input/__init__.py
+++ b/comfy_api/input/__init__.py
@@ -1,8 +1,16 @@
-from .basic_types import ImageInput, AudioInput
-from .video_types import VideoInput
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input import (
+    ImageInput,
+    AudioInput,
+    MaskInput,
+    LatentInput,
+    VideoInput,
+)

 __all__ = [
    "ImageInput",
    "AudioInput",
+    "MaskInput",
+    "LatentInput",
    "VideoInput",
 ]
--- a/comfy_api/input/basic_types.py
+++ b/comfy_api/input/basic_types.py
@@ -1,20 +1,14 @@
-import torch
-from typing import TypedDict
-
-ImageInput = torch.Tensor
-"""
-An image in format [B, H, W, C] where B is the batch size, C is the number of channels,
-"""
-
-class AudioInput(TypedDict):
-    """
-    TypedDict representing audio input.
-    """
-
-    waveform: torch.Tensor
-    """
-    Tensor in the format [B, C, T] where B is the batch size, C is the number of channels,
-    """
-
-    sample_rate: int
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input.basic_types import (
+    ImageInput,
+    AudioInput,
+    MaskInput,
+    LatentInput,
+)

+__all__ = [
+    "ImageInput",
+    "AudioInput",
+    "MaskInput",
+    "LatentInput",
+]
--- a/comfy_api/input/video_types.py
+++ b/comfy_api/input/video_types.py
@@ -1,72 +1,6 @@
-from __future__ import annotations
-from abc import ABC, abstractmethod
-from typing import Optional, Union
-import io
-from comfy_api.util import VideoContainer, VideoCodec, VideoComponents
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input.video_types import VideoInput

-class VideoInput(ABC):
-    """
-    Abstract base class for video input types.
-    """
-
-    @abstractmethod
-    def get_components(self) -> VideoComponents:
-        """
-        Abstract method to get the video components (images, audio, and frame rate).
-
-        Returns:
-            VideoComponents containing images, audio, and frame rate
-        """
-        pass
-
-    @abstractmethod
-    def save_to(
-        self,
-        path: str,
-        format: VideoContainer = VideoContainer.AUTO,
-        codec: VideoCodec = VideoCodec.AUTO,
-        metadata: Optional[dict] = None
-    ):
-        """
-        Abstract method to save the video input to a file.
-        """
-        pass
-
-    def get_stream_source(self) -> Union[str, io.BytesIO]:
-        """
-        Get a streamable source for the video. This allows processing without
-        loading the entire video into memory.
-
-        Returns:
-            Either a file path (str) or a BytesIO object that can be opened with av.
-
-        Default implementation creates a BytesIO buffer, but subclasses should
-        override this for better performance when possible.
-        """
-        buffer = io.BytesIO()
-        self.save_to(buffer)
-        buffer.seek(0)
-        return buffer
-
-    # Provide a default implementation, but subclasses can provide optimized versions
-    # if possible.
-    def get_dimensions(self) -> tuple[int, int]:
-        """
-        Returns the dimensions of the video input.
-
-        Returns:
-            Tuple of (width, height)
-        """
-        components = self.get_components()
-        return components.images.shape[2], components.images.shape[1]
-
-    def get_duration(self) -> float:
-        """
-        Returns the duration of the video in seconds.
-
-        Returns:
-            Duration in seconds
-        """
-        components = self.get_components()
-        frame_count = components.images.shape[0]
-        return float(frame_count / components.frame_rate)
+__all__ = [
+    "VideoInput",
+]
--- a/comfy_api/input_impl/__init__.py
+++ b/comfy_api/input_impl/__init__.py
@@ -1,7 +1,7 @@
-from .video_types import VideoFromFile, VideoFromComponents
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input_impl import VideoFromFile, VideoFromComponents

 __all__ = [
-    # Implementations
    "VideoFromFile",
    "VideoFromComponents",
 ]
--- a/comfy_api/input_impl/video_types.py
+++ b/comfy_api/input_impl/video_types.py
@@ -1,312 +1,2 @@
-from __future__ import annotations
-from av.container import InputContainer
-from av.subtitles.stream import SubtitleStream
-from fractions import Fraction
-from typing import Optional
-from comfy_api.input import AudioInput
-import av
-import io
-import json
-import numpy as np
-import torch
-from comfy_api.input import VideoInput
-from comfy_api.util import VideoContainer, VideoCodec, VideoComponents
-
-
-def container_to_output_format(container_format: str | None) -> str | None:
-    """
-    A container's `format` may be a comma-separated list of formats.
-    E.g., iso container's `format` may be `mov,mp4,m4a,3gp,3g2,mj2`.
-    However, writing to a file/stream with `av.open` requires a single format,
-    or `None` to auto-detect.
-    """
-    if not container_format:
-        return None  # Auto-detect
-
-    if "," not in container_format:
-        return container_format
-
-    formats = container_format.split(",")
-    return formats[0]
-
-
-def get_open_write_kwargs(
-    dest: str | io.BytesIO, container_format: str, to_format: str | None
-) -> dict:
-    """Get kwargs for writing a `VideoFromFile` to a file/stream with `av.open`"""
-    open_kwargs = {
-        "mode": "w",
-        # If isobmff, preserve custom metadata tags (workflow, prompt, extra_pnginfo)
-        "options": {"movflags": "use_metadata_tags"},
-    }
-
-    is_write_to_buffer = isinstance(dest, io.BytesIO)
-    if is_write_to_buffer:
-        # Set output format explicitly, since it cannot be inferred from file extension
-        if to_format == VideoContainer.AUTO:
-            to_format = container_format.lower()
-        elif isinstance(to_format, str):
-            to_format = to_format.lower()
-        open_kwargs["format"] = container_to_output_format(to_format)
-
-    return open_kwargs
-
-
-class VideoFromFile(VideoInput):
-    """
-    Class representing video input from a file.
-    """
-
-    def __init__(self, file: str | io.BytesIO):
-        """
-        Initialize the VideoFromFile object based off of either a path on disk or a BytesIO object
-        containing the file contents.
-        """
-        self.__file = file
-
-    def get_stream_source(self) -> str | io.BytesIO:
-        """
-        Return the underlying file source for efficient streaming.
-        This avoids unnecessary memory copies when the source is already a file path.
-        """
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)
-        return self.__file
-
-    def get_dimensions(self) -> tuple[int, int]:
-        """
-        Returns the dimensions of the video input.
-
-        Returns:
-            Tuple of (width, height)
-        """
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)  # Reset the BytesIO object to the beginning
-        with av.open(self.__file, mode='r') as container:
-            for stream in container.streams:
-                if stream.type == 'video':
-                    assert isinstance(stream, av.VideoStream)
-                    return stream.width, stream.height
-        raise ValueError(f"No video stream found in file '{self.__file}'")
-
-    def get_duration(self) -> float:
-        """
-        Returns the duration of the video in seconds.
-
-        Returns:
-            Duration in seconds
-        """
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)
-        with av.open(self.__file, mode="r") as container:
-            if container.duration is not None:
-                return float(container.duration / av.time_base)
-
-            # Fallback: calculate from frame count and frame rate
-            video_stream = next(
-                (s for s in container.streams if s.type == "video"), None
-            )
-            if video_stream and video_stream.frames and video_stream.average_rate:
-                return float(video_stream.frames / video_stream.average_rate)
-
-            # Last resort: decode frames to count them
-            if video_stream and video_stream.average_rate:
-                frame_count = 0
-                container.seek(0)
-                for packet in container.demux(video_stream):
-                    for _ in packet.decode():
-                        frame_count += 1
-                if frame_count > 0:
-                    return float(frame_count / video_stream.average_rate)
-
-        raise ValueError(f"Could not determine duration for file '{self.__file}'")
-
-    def get_components_internal(self, container: InputContainer) -> VideoComponents:
-        # Get video frames
-        frames = []
-        for frame in container.decode(video=0):
-            img = frame.to_ndarray(format='rgb24')  # shape: (H, W, 3)
-            img = torch.from_numpy(img) / 255.0  # shape: (H, W, 3)
-            frames.append(img)
-
-        images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 3, 0, 0)
-
-        # Get frame rate
-        video_stream = next(s for s in container.streams if s.type == 'video')
-        frame_rate = Fraction(video_stream.average_rate) if video_stream and video_stream.average_rate else Fraction(1)
-
-        # Get audio if available
-        audio = None
-        try:
-            container.seek(0)  # Reset the container to the beginning
-            for stream in container.streams:
-                if stream.type != 'audio':
-                    continue
-                assert isinstance(stream, av.AudioStream)
-                audio_frames = []
-                for packet in container.demux(stream):
-                    for frame in packet.decode():
-                        assert isinstance(frame, av.AudioFrame)
-                        audio_frames.append(frame.to_ndarray())  # shape: (channels, samples)
-                if len(audio_frames) > 0:
-                    audio_data = np.concatenate(audio_frames, axis=1)  # shape: (channels, total_samples)
-                    audio_tensor = torch.from_numpy(audio_data).unsqueeze(0)  # shape: (1, channels, total_samples)
-                    audio = AudioInput({
-                        "waveform": audio_tensor,
-                        "sample_rate": int(stream.sample_rate) if stream.sample_rate else 1,
-                    })
-        except StopIteration:
-            pass  # No audio stream
-
-        metadata = container.metadata
-        return VideoComponents(images=images, audio=audio, frame_rate=frame_rate, metadata=metadata)
-
-    def get_components(self) -> VideoComponents:
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)  # Reset the BytesIO object to the beginning
-        with av.open(self.__file, mode='r') as container:
-            return self.get_components_internal(container)
-        raise ValueError(f"No video stream found in file '{self.__file}'")
-
-    def save_to(
-        self,
-        path: str | io.BytesIO,
-        format: VideoContainer = VideoContainer.AUTO,
-        codec: VideoCodec = VideoCodec.AUTO,
-        metadata: Optional[dict] = None
-    ):
-        if isinstance(self.__file, io.BytesIO):
-            self.__file.seek(0)  # Reset the BytesIO object to the beginning
-        with av.open(self.__file, mode='r') as container:
-            container_format = container.format.name
-            video_encoding = container.streams.video[0].codec.name if len(container.streams.video) > 0 else None
-            reuse_streams = True
-            if format != VideoContainer.AUTO and format not in container_format.split(","):
-                reuse_streams = False
-            if codec != VideoCodec.AUTO and codec != video_encoding and video_encoding is not None:
-                reuse_streams = False
-
-            if not reuse_streams:
-                components = self.get_components_internal(container)
-                video = VideoFromComponents(components)
-                return video.save_to(
-                    path,
-                    format=format,
-                    codec=codec,
-                    metadata=metadata
-                )
-
-            streams = container.streams
-
-            open_kwargs = get_open_write_kwargs(path, container_format, format)
-            with av.open(path, **open_kwargs) as output_container:
-                # Copy over the original metadata
-                for key, value in container.metadata.items():
-                    if metadata is None or key not in metadata:
-                        output_container.metadata[key] = value
-
-                # Add our new metadata
-                if metadata is not None:
-                    for key, value in metadata.items():
-                        if isinstance(value, str):
-                            output_container.metadata[key] = value
-                        else:
-                            output_container.metadata[key] = json.dumps(value)
-
-                # Add streams to the new container
-                stream_map = {}
-                for stream in streams:
-                    if isinstance(stream, (av.VideoStream, av.AudioStream, SubtitleStream)):
-                        out_stream = output_container.add_stream_from_template(template=stream, opaque=True)
-                        stream_map[stream] = out_stream
-
-                # Write packets to the new container
-                for packet in container.demux():
-                    if packet.stream in stream_map and packet.dts is not None:
-                        packet.stream = stream_map[packet.stream]
-                        output_container.mux(packet)
-
-class VideoFromComponents(VideoInput):
-    """
-    Class representing video input from tensors.
-    """
-
-    def __init__(self, components: VideoComponents):
-        self.__components = components
-
-    def get_components(self) -> VideoComponents:
-        return VideoComponents(
-            images=self.__components.images,
-            audio=self.__components.audio,
-            frame_rate=self.__components.frame_rate
-        )
-
-    def save_to(
-        self,
-        path: str,
-        format: VideoContainer = VideoContainer.AUTO,
-        codec: VideoCodec = VideoCodec.AUTO,
-        metadata: Optional[dict] = None
-    ):
-        if format != VideoContainer.AUTO and format != VideoContainer.MP4:
-            raise ValueError("Only MP4 format is supported for now")
-        if codec != VideoCodec.AUTO and codec != VideoCodec.H264:
-            raise ValueError("Only H264 codec is supported for now")
-        with av.open(path, mode='w', options={'movflags': 'use_metadata_tags'}) as output:
-            # Add metadata before writing any streams
-            if metadata is not None:
-                for key, value in metadata.items():
-                    output.metadata[key] = json.dumps(value)
-
-            frame_rate = Fraction(round(self.__components.frame_rate * 1000), 1000)
-            # Create a video stream
-            video_stream = output.add_stream('h264', rate=frame_rate)
-            video_stream.width = self.__components.images.shape[2]
-            video_stream.height = self.__components.images.shape[1]
-            video_stream.pix_fmt = 'yuv420p'
-
-            # Create an audio stream
-            audio_sample_rate = 1
-            audio_stream: Optional[av.AudioStream] = None
-            if self.__components.audio:
-                audio_sample_rate = int(self.__components.audio['sample_rate'])
-                audio_stream = output.add_stream('aac', rate=audio_sample_rate)
-                audio_stream.sample_rate = audio_sample_rate
-                audio_stream.format = 'fltp'
-
-            # Encode video
-            for i, frame in enumerate(self.__components.images):
-                img = (frame * 255).clamp(0, 255).byte().cpu().numpy() # shape: (H, W, 3)
-                frame = av.VideoFrame.from_ndarray(img, format='rgb24')
-                frame = frame.reformat(format='yuv420p')  # Convert to YUV420P as required by h264
-                packet = video_stream.encode(frame)
-                output.mux(packet)
-
-            # Flush video
-            packet = video_stream.encode(None)
-            output.mux(packet)
-
-            if audio_stream and self.__components.audio:
-                # Encode audio
-                samples_per_frame = int(audio_sample_rate / frame_rate)
-                num_frames = self.__components.audio['waveform'].shape[2] // samples_per_frame
-                for i in range(num_frames):
-                    start = i * samples_per_frame
-                    end = start + samples_per_frame
-                    # TODO(Feature) - Add support for stereo audio
-                    chunk = (
-                        self.__components.audio["waveform"][0, 0, start:end]
-                        .unsqueeze(0)
-                        .contiguous()
-                        .numpy()
-                    )
-                    audio_frame = av.AudioFrame.from_ndarray(chunk, format='fltp', layout='mono')
-                    audio_frame.sample_rate = audio_sample_rate
-                    audio_frame.pts = i * samples_per_frame
-                    for packet in audio_stream.encode(audio_frame):
-                        output.mux(packet)
-
-                # Flush audio
-                for packet in audio_stream.encode(None):
-                    output.mux(packet)
-
+# This file only exists for backwards compatibility.
+from comfy_api.latest._input_impl.video_types import *  # noqa: F403
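The two added lines above reduce the old module to a re-export shim. As a rough sketch of how such a wildcard shim behaves, the snippet below builds two throwaway in-memory modules. The names `new_location_sketch`, `old_location_sketch`, `VideoThing`, and `_helper` are hypothetical and are not part of this PR.

```python
import sys
import types

# Hypothetical "new" module that now holds the real implementation.
new_mod = types.ModuleType("new_location_sketch")
exec(
    "class VideoThing:\n"
    "    pass\n"
    "_helper = 1\n",
    new_mod.__dict__,
)
sys.modules["new_location_sketch"] = new_mod

# Hypothetical "old" module reduced to a shim, mirroring the added lines.
old_mod = types.ModuleType("old_location_sketch")
exec("from new_location_sketch import *  # noqa: F403", old_mod.__dict__)
sys.modules["old_location_sketch"] = old_mod

# Code that still imports from the old path gets the very same object.
from old_location_sketch import VideoThing

same_object = VideoThing is new_mod.VideoThing
helper_reexported = hasattr(old_mod, "_helper")
```

In this sketch the target module defines no `__all__`, so the wildcard import skips underscore-prefixed names. Whether the real `video_types` module defines `__all__` is not visible in this part of the diff.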
--- a/comfy_api/internal/__init__.py
+++ b/comfy_api/internal/__init__.py
@@ -0,0 +1,150 @@
+# Internal infrastructure for ComfyAPI
+from .api_registry import (
+    ComfyAPIBase as ComfyAPIBase,
+    ComfyAPIWithVersion as ComfyAPIWithVersion,
+    register_versions as register_versions,
+    get_all_versions as get_all_versions,
+)
+
+import asyncio
+from dataclasses import asdict
+from typing import Callable, Optional
+
+
+def first_real_override(cls: type, name: str, *, base: Optional[type] = None) -> Optional[Callable]:
+    """Return the *callable* override of `name` visible on `cls`, or None if every
+    implementation up to (and including) `base` is the placeholder defined on `base`.
+
+    If `base` is not provided, `cls` is assumed to define GET_BASE_CLASS().
+    """
+    if base is None:
+        if not hasattr(cls, "GET_BASE_CLASS"):
+            raise ValueError("base is required if cls does not have a GET_BASE_CLASS; is this a valid ComfyNode subclass?")
+        base = cls.GET_BASE_CLASS()
+    base_attr = getattr(base, name, None)
+    if base_attr is None:
+        return None
+    base_func = base_attr.__func__
+    for c in cls.mro():                       # NodeB, NodeA, ComfyNode, object …
+        if c is base:                         # reached the placeholder – we're done
+            break
+        if name in c.__dict__:                # first class that *defines* the attr
+            func = getattr(c, name).__func__
+            if func is not base_func:         # real override
+                return getattr(cls, name)     # bound to *cls*
+    return None
+
+
+class _ComfyNodeInternal:
+    """Class that all V3-based APIs inherit from for ComfyNode.
+
+    This is intended to only be referenced within execution.py, as it has to handle all V3 APIs going forward."""
+    @classmethod
+    def GET_NODE_INFO_V1(cls):
+        ...
+
+
+class _NodeOutputInternal:
+    """Class that all V3-based APIs inherit from for NodeOutput.
+
+    This is intended to only be referenced within execution.py, as it has to handle all V3 APIs going forward."""
+    ...
+
+
+def as_pruned_dict(dataclass_obj):
+    '''Return dict of dataclass object with pruned None values.'''
+    return prune_dict(asdict(dataclass_obj))
+
+def prune_dict(d: dict):
+    return {k: v for k,v in d.items() if v is not None}
+
+
+def is_class(obj):
+    '''
+    Returns True if obj is a class (a type object).
+    Returns False if obj is an instance of a class.
+    '''
+    return isinstance(obj, type)
+
+
+def copy_class(cls: type) -> type:
+    '''
+    Copy a class and its attributes.
+    '''
+    if cls is None:
+        return None
+    cls_dict = {
+            k: v for k, v in cls.__dict__.items()
+            if k not in ('__dict__', '__weakref__', '__module__', '__doc__')
+        }
+    # new class
+    new_cls = type(
+        cls.__name__,
+        (cls,),
+        cls_dict
+    )
+    # metadata preservation
+    new_cls.__module__ = cls.__module__
+    new_cls.__doc__ = cls.__doc__
+    return new_cls
+
+
+class classproperty(object):
+    def __init__(self, f):
+        self.f = f
+    def __get__(self, obj, owner):
+        return self.f(owner)
+
+
+# NOTE: this was ai generated and validated by hand
+def shallow_clone_class(cls, new_name=None):
+    '''
+    Shallow clone a class while preserving super() functionality.
+    '''
+    new_name = new_name or f"{cls.__name__}Clone"
+    # Include the original class in the bases to maintain proper inheritance
+    new_bases = (cls,) + cls.__bases__
+    return type(new_name, new_bases, dict(cls.__dict__))
+
+# NOTE: this was ai generated and validated by hand
+def lock_class(cls):
+    '''
+    Lock a class so that its top-level attributes cannot be modified.
+    '''
+    # Locked instance __setattr__
+    def locked_instance_setattr(self, name, value):
+        raise AttributeError(
+            f"Cannot set attribute '{name}' on immutable instance of {type(self).__name__}"
+        )
+    # Locked metaclass
+    class LockedMeta(type(cls)):
+        def __setattr__(cls_, name, value):
+            raise AttributeError(
+                f"Cannot modify class attribute '{name}' on locked class '{cls_.__name__}'"
+            )
+    # Rebuild class with locked behavior
+    locked_dict = dict(cls.__dict__)
+    locked_dict['__setattr__'] = locked_instance_setattr
+
+    return LockedMeta(cls.__name__, cls.__bases__, locked_dict)
+
+
+def make_locked_method_func(type_obj, func, class_clone):
+    """
+    Returns a function that, when called with **inputs, will execute:
+    getattr(type_obj, func).__func__(lock_class(class_clone), **inputs)
+
+    Supports both synchronous and asynchronous methods.
+    """
+    locked_class = lock_class(class_clone)
+    method = getattr(type_obj, func).__func__
+
+    # Check if the original method is async
+    if asyncio.iscoroutinefunction(method):
+        async def wrapped_async_func(**inputs):
+            return await method(locked_class, **inputs)
+        return wrapped_async_func
+    else:
+        def wrapped_func(**inputs):
+            return method(locked_class, **inputs)
+        return wrapped_func
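To illustrate what `lock_class` and `make_locked_method_func` above are for, here is a minimal sketch of the idea: a classmethod is invoked with a rebuilt class whose metaclass rejects attribute assignment, so node code cannot mutate shared class state. `lock_class_sketch`, `make_locked_call`, and `ExampleNode` are hypothetical names. The sketch covers only the synchronous path and, unlike the diff, drops the `__dict__`/`__weakref__` descriptors when rebuilding the class.

```python
def lock_class_sketch(cls):
    """Simplified variant of lock_class: only class-level assignment is blocked."""

    class LockedMeta(type(cls)):
        def __setattr__(cls_, name, value):
            raise AttributeError(
                f"Cannot modify class attribute '{name}' on locked class '{cls_.__name__}'"
            )

    # Assumption of this sketch: skip the per-class descriptors so the
    # rebuilt class gets fresh ones of its own.
    namespace = {
        k: v for k, v in cls.__dict__.items() if k not in ("__dict__", "__weakref__")
    }
    return LockedMeta(cls.__name__, cls.__bases__, namespace)


def make_locked_call(type_obj, func_name, class_clone):
    """Sync-only stand-in for make_locked_method_func."""
    locked_class = lock_class_sketch(class_clone)
    method = getattr(type_obj, func_name).__func__  # unwrap the bound classmethod

    def wrapped_func(**inputs):
        return method(locked_class, **inputs)

    return wrapped_func


class ExampleNode:  # hypothetical node class
    calls = 0

    @classmethod
    def execute(cls, value):
        return value * 2

    @classmethod
    def execute_and_count(cls, value):
        cls.calls += 1  # tries to mutate class state
        return value * 2


# Read-only classmethods run normally against the locked class.
ok = make_locked_call(ExampleNode, "execute", ExampleNode)(value=21)

# A classmethod that assigns to cls hits the locked metaclass.
try:
    make_locked_call(ExampleNode, "execute_and_count", ExampleNode)(value=21)
    blocked = False
except AttributeError:
    blocked = True
```

The original class is left untouched in both cases, since the lock applies only to the rebuilt class that is passed in as `cls`.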
--- a/comfy_api/internal/api_registry.py
+++ b/comfy_api/internal/api_registry.py
@@ -0,0 +1,39 @@
+from typing import Type, List, NamedTuple
+from comfy_api.internal.singleton import ProxiedSingleton
+from packaging import version as packaging_version
+
+
+class ComfyAPIBase(ProxiedSingleton):
+    def __init__(self):
+        pass
+
+
+class ComfyAPIWithVersion(NamedTuple):
+    version: str
+    api_class: Type[ComfyAPIBase]
+
+
+def parse_version(version_str: str) -> packaging_version.Version:
+    """
+    Parses a version string into a packaging_version.Version object.
+    Raises ValueError if the version string is invalid.
+    """
+    if version_str == "latest":
+        return packaging_version.parse("9999999.9999999.9999999")
+    return packaging_version.parse(version_str)
+
+
+registered_versions: List[ComfyAPIWithVersion] = []
+
+
+def register_versions(versions: List[ComfyAPIWithVersion]):
+    versions.sort(key=lambda x: parse_version(x.version))
+    global registered_versions
+    registered_versions = versions
+
+
+def get_all_versions() -> List[ComfyAPIWithVersion]:
+    """
+    Returns a list of all registered ComfyAPI versions.
+    """
+    return registered_versions
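The ordering that `register_versions` relies on can be sketched without the `packaging` dependency. The snippet below swaps in a deliberately simplified parser that handles dotted integers only, and it keeps the same "latest" sentinel as the diff. `ApiVersionSketch`, `parse_version_sketch`, and the three placeholder classes are hypothetical names used for illustration.

```python
from typing import List, NamedTuple, Tuple


class ApiVersionSketch(NamedTuple):  # hypothetical stand-in for ComfyAPIWithVersion
    version: str
    api_class: type


def parse_version_sketch(version_str: str) -> Tuple[int, ...]:
    """Simplified stand-in for packaging's parser: dotted integers only."""
    if version_str == "latest":
        return (9999999, 9999999, 9999999)  # sentinel that sorts after real versions
    return tuple(int(part) for part in version_str.split("."))


class ApiV0_0_2: ...
class ApiV0_0_10: ...
class ApiLatest: ...


versions: List[ApiVersionSketch] = [
    ApiVersionSketch("latest", ApiLatest),
    ApiVersionSketch("0.0.10", ApiV0_0_10),
    ApiVersionSketch("0.0.2", ApiV0_0_2),
]

# Sort by parsed version rather than by raw string.
versions.sort(key=lambda v: parse_version_sketch(v.version))
ordered = [v.version for v in versions]

# For contrast: plain string sorting puts "0.0.10" before "0.0.2".
naive = sorted(v.version for v in versions)
```

As in the diff, the sort happens in place, so the list passed to `register_versions` is itself reordered and then stored as the module-level registry.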
--- a/comfy_api/internal/async_to_sync.py
+++ b/comfy_api/internal/async_to_sync.py
@@ -0,0 +1,987 @@
+import asyncio
+import concurrent.futures
+import contextvars
+import functools
+import inspect
+import logging
+import os
+import textwrap
+import threading
+from enum import Enum
+from typing import Optional, Type, get_origin, get_args
+
+
+class TypeTracker:
+    """Tracks types discovered during stub generation for automatic import generation."""
+
+    def __init__(self):
+        self.discovered_types = {}  # type_name -> (module, qualname)
+        self.builtin_types = {
+            "Any",
+            "Dict",
+            "List",
+            "Optional",
+            "Tuple",
+            "Union",
+            "Set",
+            "Sequence",
+            "cast",
+            "NamedTuple",
+            "str",
+            "int",
+            "float",
+            "bool",
+            "None",
+            "bytes",
+            "object",
+            "type",
+            "dict",
+            "list",
+            "tuple",
+            "set",
+        }
+        self.already_imported = (
+            set()
+        )  # Track types already imported to avoid duplicates
+
+    def track_type(self, annotation):
+        """Track a type annotation and record its module/import info."""
+        if annotation is None or annotation is type(None):
+            return
+
+        # Skip builtins and typing module types we already import
+        type_name = getattr(annotation, "__name__", None)
+        if type_name and (
+            type_name in self.builtin_types or type_name in self.already_imported
+        ):
+            return
+
+        # Get module and qualname
+        module = getattr(annotation, "__module__", None)
+        qualname = getattr(annotation, "__qualname__", type_name or "")
+
+        # Skip types from typing module (they're already imported)
+        if module == "typing":
+            return
+
+        # Skip UnionType and GenericAlias from types module as they're handled specially
+        if module == "types" and type_name in ("UnionType", "GenericAlias"):
+            return
+
+        if module and module not in ["builtins", "__main__"]:
+            # Store the type info
+            if type_name:
+                self.discovered_types[type_name] = (module, qualname)
+
+    def get_imports(self, main_module_name: str) -> list[str]:
+        """Generate import statements for all discovered types."""
+        imports = []
+        imports_by_module = {}
+
+        for type_name, (module, qualname) in sorted(self.discovered_types.items()):
+            # Skip types from the main module (they're already imported)
+            if main_module_name and module == main_module_name:
+                continue
+
+            if module not in imports_by_module:
+                imports_by_module[module] = []
+            if type_name not in imports_by_module[module]:  # Avoid duplicates
+                imports_by_module[module].append(type_name)
+
+        # Generate import statements
+        for module, types in sorted(imports_by_module.items()):
+            if len(types) == 1:
+                imports.append(f"from {module} import {types[0]}")
+            else:
+                imports.append(f"from {module} import {', '.join(sorted(set(types)))}")
+
+        return imports
+
+
+class AsyncToSyncConverter:
+    """
+    Provides utilities to convert async classes to sync classes with proper type hints.
+    """
+
+    _thread_pool: Optional[concurrent.futures.ThreadPoolExecutor] = None
+    _thread_pool_lock = threading.Lock()
+    _thread_pool_initialized = False
+
+    @classmethod
+    def get_thread_pool(cls, max_workers=None) -> concurrent.futures.ThreadPoolExecutor:
+        """Get or create the shared thread pool with proper thread-safe initialization."""
+        # Fast path - check if already initialized without acquiring lock
+        if cls._thread_pool_initialized:
+            assert cls._thread_pool is not None, "Thread pool should be initialized"
+            return cls._thread_pool
+
+        # Slow path - acquire lock and create pool if needed
+        with cls._thread_pool_lock:
+            if not cls._thread_pool_initialized:
+                cls._thread_pool = concurrent.futures.ThreadPoolExecutor(
+                    max_workers=max_workers, thread_name_prefix="async_to_sync_"
+                )
+                cls._thread_pool_initialized = True
+
+        # This should never be None at this point, but add assertion for type checker
+        assert cls._thread_pool is not None
+        return cls._thread_pool
+
+    @classmethod
+    def run_async_in_thread(cls, coro_func, *args, **kwargs):
+        """
+        Run an async function in a separate thread from the thread pool.
+        Blocks until the async function completes.
+        Properly propagates contextvars between threads and manages event loops.
+        """
+        # Capture current context - this includes all context variables
+        context = contextvars.copy_context()
+
+        # Store the result and any exception that occurs
+        result_container: dict = {"result": None, "exception": None}
+
+        # Function that runs in the thread pool
+        def run_in_thread():
+            # Create new event loop for this thread
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+
+            try:
+                # Create the coroutine within the context
+                async def run_with_context():
+                    # The coroutine function might access context variables
+                    return await coro_func(*args, **kwargs)
+
+                # Run the coroutine with the captured context
+                # This ensures all context variables are available in the async function
+                result = context.run(loop.run_until_complete, run_with_context())
+                result_container["result"] = result
+            except Exception as e:
+                # Store the exception to re-raise in the calling thread
+                result_container["exception"] = e
+            finally:
+                # Ensure event loop is properly closed to prevent warnings
+                try:
+                    # Cancel any remaining tasks
+                    pending = asyncio.all_tasks(loop)
+                    for task in pending:
+                        task.cancel()
+
+                    # Run the loop briefly to handle cancellations
+                    if pending:
+                        loop.run_until_complete(
+                            asyncio.gather(*pending, return_exceptions=True)
+                        )
+                except Exception:
+                    pass  # Ignore errors during cleanup
+
+                # Close the event loop
+                loop.close()
+
+                # Clear the event loop from the thread
+                asyncio.set_event_loop(None)
+
+        # Submit to thread pool and wait for result
+        thread_pool = cls.get_thread_pool()
+        future = thread_pool.submit(run_in_thread)
+        future.result()  # Wait for completion
+
+        # Re-raise any exception that occurred in the thread
+        if result_container["exception"] is not None:
+            raise result_container["exception"]
+
+        return result_container["result"]
+
+    @classmethod
+    def create_sync_class(cls, async_class: Type, thread_pool_size=10) -> Type:
+        """
+        Creates a new class with synchronous versions of all async methods.
+
+        Args:
+            async_class: The async class to convert
+            thread_pool_size: Size of thread pool to use
+
+        Returns:
+            A new class with sync versions of all async methods
+        """
+        sync_class_name = "ComfyAPISyncStub"
+        cls.get_thread_pool(thread_pool_size)
+
+        # Create a proper class with docstrings and proper base classes
+        sync_class_dict = {
+            "__doc__": async_class.__doc__,
+            "__module__": async_class.__module__,
+            "__qualname__": sync_class_name,
+            "__orig_class__": async_class,  # Store original class for typing references
+        }
+
+        # Create __init__ method
+        def __init__(self, *args, **kwargs):
+            self._async_instance = async_class(*args, **kwargs)
+
+            # Handle annotated class attributes (like execution: Execution)
+            # Get all annotations from the class hierarchy
+            all_annotations = {}
+            for base_class in reversed(inspect.getmro(async_class)):
+                if hasattr(base_class, "__annotations__"):
+                    all_annotations.update(base_class.__annotations__)
+
+            # For each annotated attribute, check if it needs to be created or wrapped
+            for attr_name, attr_type in all_annotations.items():
+                if hasattr(self._async_instance, attr_name):
+                    # Attribute exists on the instance
+                    attr = getattr(self._async_instance, attr_name)
+                    # Check if this attribute needs a sync wrapper
+                    if hasattr(attr, "__class__"):
+                        from comfy_api.internal.singleton import ProxiedSingleton
+
+                        if isinstance(attr, ProxiedSingleton):
+                            # Create a sync version of this attribute
+                            try:
+                                sync_attr_class = cls.create_sync_class(attr.__class__)
+                                # Create instance of the sync wrapper with the async instance
+                                sync_attr = object.__new__(sync_attr_class)  # type: ignore
+                                sync_attr._async_instance = attr
+                                setattr(self, attr_name, sync_attr)
+                            except Exception:
+                                # If we can't create a sync version, keep the original
+                                setattr(self, attr_name, attr)
+                        else:
+                            # Not async, just copy the reference
+                            setattr(self, attr_name, attr)
+                else:
+                    # Attribute doesn't exist, but is annotated - create it
+                    # This handles cases like execution: Execution
+                    if isinstance(attr_type, type):
+                        # Check if the type is defined as an inner class
+                        if hasattr(async_class, attr_type.__name__):
+                            inner_class = getattr(async_class, attr_type.__name__)
+                            from comfy_api.internal.singleton import ProxiedSingleton
+
+                            # Create an instance of the inner class
+                            try:
+                                # For ProxiedSingleton classes, get or create the singleton instance
+                                if issubclass(inner_class, ProxiedSingleton):
+                                    async_instance = inner_class.get_instance()
+                                else:
+                                    async_instance = inner_class()
+
+                                # Create sync wrapper
+                                sync_attr_class = cls.create_sync_class(inner_class)
+                                sync_attr = object.__new__(sync_attr_class)  # type: ignore
+                                sync_attr._async_instance = async_instance
+                                setattr(self, attr_name, sync_attr)
+                                # Also set on the async instance for consistency
+                                setattr(self._async_instance, attr_name, async_instance)
+                            except Exception as e:
+                                logging.warning(
+                                    f"Failed to create instance for {attr_name}: {e}"
+                                )
+
+            # Handle other instance attributes that might not be annotated
+            for name, attr in inspect.getmembers(self._async_instance):
+                if name.startswith("_") or hasattr(self, name):
+                    continue
+
+                # If attribute is an instance of a class, and that class is defined in the original class
+                # we need to check if it needs a sync wrapper
+                if isinstance(attr, object) and not isinstance(
+                    attr, (str, int, float, bool, list, dict, tuple)
+                ):
+                    from comfy_api.internal.singleton import ProxiedSingleton
+
+                    if isinstance(attr, ProxiedSingleton):
+                        # Create a sync version of this nested class
+                        try:
+                            sync_attr_class = cls.create_sync_class(attr.__class__)
+                            # Create instance of the sync wrapper with the async instance
+                            sync_attr = object.__new__(sync_attr_class)  # type: ignore
+                            sync_attr._async_instance = attr
+                            setattr(self, name, sync_attr)
+                        except Exception:
+                            # If we can't create a sync version, keep the original
+                            setattr(self, name, attr)
+
+        sync_class_dict["__init__"] = __init__
+
+        # Process methods from the async class
+        for name, method in inspect.getmembers(
+            async_class, predicate=inspect.isfunction
+        ):
+            if name.startswith("_"):
+                continue
+
+            # Coroutine functions get a blocking wrapper; other methods are proxied as-is
+            if inspect.iscoroutinefunction(method):
+                # Create sync version of async method with proper signature
+                @functools.wraps(method)
+                def sync_method(self, *args, _method_name=name, **kwargs):
+                    async_method = getattr(self._async_instance, _method_name)
+                    return AsyncToSyncConverter.run_async_in_thread(
+                        async_method, *args, **kwargs
+                    )
+
+                # Add to the class dict
+                sync_class_dict[name] = sync_method
+            else:
+                # For regular methods, create a proxy method
+                @functools.wraps(method)
+                def proxy_method(self, *args, _method_name=name, **kwargs):
+                    method = getattr(self._async_instance, _method_name)
+                    return method(*args, **kwargs)
+
+                # Add to the class dict
+                sync_class_dict[name] = proxy_method
+
+        # Handle property access
+        for name, prop in inspect.getmembers(
+            async_class, lambda x: isinstance(x, property)
+        ):
+
+            def make_property(name, prop_obj):
+                def getter(self):
+                    value = getattr(self._async_instance, name)
+                    if inspect.iscoroutinefunction(value):
+
+                        def sync_fn(*args, **kwargs):
+                            return AsyncToSyncConverter.run_async_in_thread(
+                                value, *args, **kwargs
+                            )
+
+                        return sync_fn
+                    return value
+
+                def setter(self, value):
+                    setattr(self._async_instance, name, value)
+
+                return property(getter, setter if prop_obj.fset else None)
+
+            sync_class_dict[name] = make_property(name, prop)
+
+        # Create the class
+        sync_class = type(sync_class_name, (object,), sync_class_dict)
+
+        return sync_class
+
+    @classmethod
+    def _format_type_annotation(
+        cls, annotation, type_tracker: Optional[TypeTracker] = None
+    ) -> str:
+        """Convert a type annotation to its string representation for stub files."""
+        if (
+            annotation is inspect.Parameter.empty
+            or annotation is inspect.Signature.empty
+        ):
+            return "Any"
+
+        # Handle None type
+        if annotation is type(None):
+            return "None"
+
+        # Track the type if we have a tracker
+        if type_tracker:
+            type_tracker.track_type(annotation)
+
+        # Try using typing.get_origin/get_args for Python 3.8+
+        try:
+            origin = get_origin(annotation)
+            args = get_args(annotation)
+
+            if origin is not None:
+                # Track the origin type
+                if type_tracker:
+                    type_tracker.track_type(origin)
+
+                # Get the origin name
+                origin_name = getattr(origin, "__name__", str(origin))
+                if "." in origin_name:
+                    origin_name = origin_name.split(".")[-1]
+
+                # Special handling for types.UnionType (Python 3.10+ pipe operator)
+                # Convert to old-style Union for compatibility
+                if str(origin) == "<class 'types.UnionType'>" or origin_name == "UnionType":
+                    origin_name = "Union"
+
+                # Format arguments recursively
+                if args:
+                    formatted_args = []
+                    for arg in args:
+                        # Track each type in the union
+                        if type_tracker:
+                            type_tracker.track_type(arg)
+                        formatted_args.append(cls._format_type_annotation(arg, type_tracker))
+                    return f"{origin_name}[{', '.join(formatted_args)}]"
+                else:
+                    return origin_name
+        except (AttributeError, TypeError):
+            # Fallback for older Python versions or non-generic types
+            pass
+
+        # Handle generic types the old way for compatibility
+        if hasattr(annotation, "__origin__") and hasattr(annotation, "__args__"):
+            origin = annotation.__origin__
+            origin_name = (
+                origin.__name__
+                if hasattr(origin, "__name__")
+                else str(origin).split("'")[1]
+            )
+
+            # Format each type argument
+            args = []
+            for arg in annotation.__args__:
+                args.append(cls._format_type_annotation(arg, type_tracker))
+
+            return f"{origin_name}[{', '.join(args)}]"
+
+        # Handle regular types with __name__
+        if hasattr(annotation, "__name__"):
+            return annotation.__name__
+
+        # Handle special module types (like types from typing module)
+        if hasattr(annotation, "__module__") and hasattr(annotation, "__qualname__"):
+            # For types like typing.Literal, typing.TypedDict, etc.
+            return annotation.__qualname__
+
+        # Last resort: string conversion with cleanup
+        type_str = str(annotation)
+
+        # Clean up common patterns more robustly
+        if type_str.startswith("<class '") and type_str.endswith("'>"):
+            type_str = type_str[8:-2]  # Remove "<class '" and "'>"
+
+        # Remove module prefixes for common modules
+        for prefix in ["typing.", "builtins.", "types."]:
+            if type_str.startswith(prefix):
+                type_str = type_str[len(prefix) :]
+
+        # Handle special cases
+        if type_str in ("_empty", "inspect._empty"):
+            return "None"
+
+        # Fix NoneType (this should rarely be needed now)
+        if type_str == "NoneType":
+            return "None"
+
+        return type_str
+
+    @classmethod
+    def _extract_coroutine_return_type(cls, annotation):
+        """Extract the actual return type from a Coroutine annotation."""
+        if hasattr(annotation, "__args__") and len(annotation.__args__) > 2:
+            # Coroutine[Any, Any, ReturnType] -> extract ReturnType
+            return annotation.__args__[2]
+        return annotation
+
+    @classmethod
+    def _format_parameter_default(cls, default_value) -> str:
+        """Format a parameter's default value for stub files."""
+        if default_value is inspect.Parameter.empty:
+            return ""
+        elif default_value is None:
+            return " = None"
+        elif isinstance(default_value, bool):
+            return f" = {default_value}"
+        elif default_value == {}:
+            return " = {}"
+        elif default_value == []:
+            return " = []"
+        else:
+            return f" = {default_value!r}"
+
+    @classmethod
+    def _format_method_parameters(
+        cls,
+        sig: inspect.Signature,
+        skip_self: bool = True,
+        type_hints: Optional[dict] = None,
+        type_tracker: Optional[TypeTracker] = None,
+    ) -> str:
+        """Format method parameters for stub files."""
+        params = []
+        if type_hints is None:
+            type_hints = {}
+
+        for i, (param_name, param) in enumerate(sig.parameters.items()):
+            if i == 0 and param_name == "self" and skip_self:
+                params.append("self")
+            else:
+                # Get type annotation from type hints if available, otherwise from signature
+                annotation = type_hints.get(param_name, param.annotation)
+                type_str = cls._format_type_annotation(annotation, type_tracker)
+
+                # Get default value
+                default_str = cls._format_parameter_default(param.default)
+
+                # Combine parameter parts
+                if annotation is inspect.Parameter.empty:
+                    params.append(f"{param_name}: Any{default_str}")
+                else:
+                    params.append(f"{param_name}: {type_str}{default_str}")
+
+        return ", ".join(params)
+
+    @classmethod
+    def _generate_method_signature(
+        cls,
+        method_name: str,
+        method,
+        is_async: bool = False,
+        type_tracker: Optional[TypeTracker] = None,
+    ) -> str:
+        """Generate a complete method signature for stub files."""
+        sig = inspect.signature(method)
+
+        # Try to get evaluated type hints to resolve string annotations
+        try:
+            from typing import get_type_hints
+            type_hints = get_type_hints(method)
+        except Exception:
+            # Fallback to empty dict if we can't get type hints
+            type_hints = {}
+
+        # For async methods, extract the actual return type
+        return_annotation = type_hints.get('return', sig.return_annotation)
+        if is_async and inspect.iscoroutinefunction(method):
+            return_annotation = cls._extract_coroutine_return_type(return_annotation)
+
+        # Format parameters with type hints
+        params_str = cls._format_method_parameters(sig, type_hints=type_hints, type_tracker=type_tracker)
+
+        # Format return type
+        return_type = cls._format_type_annotation(return_annotation, type_tracker)
+        if return_annotation is inspect.Signature.empty:
+            return_type = "None"
+
+        return f"def {method_name}({params_str}) -> {return_type}: ..."
+
+    @classmethod
+    def _generate_imports(
+        cls, async_class: Type, type_tracker: TypeTracker
+    ) -> list[str]:
+        """Generate import statements for the stub file."""
+        imports = []
+
+        # Add standard typing imports
+        imports.append(
+            "from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple"
+        )
+
+        # Add imports from the original module
+        if async_class.__module__ != "builtins":
+            module = inspect.getmodule(async_class)
+            additional_types = []
+
+            if module:
+                # Check if module has __all__ defined
+                module_all = getattr(module, "__all__", None)
+
+                for name, obj in sorted(inspect.getmembers(module)):
+                    if isinstance(obj, type):
+                        # Skip if __all__ is defined and this name isn't in it
+                        # unless it's already been tracked as used in type annotations
+                        if module_all is not None and name not in module_all:
+                            # Check if this type was actually used in annotations
+                            if name not in type_tracker.discovered_types:
+                                continue
+
+                        # Check for NamedTuple
+                        if issubclass(obj, tuple) and hasattr(obj, "_fields"):
+                            additional_types.append(name)
+                            # Mark as already imported
+                            type_tracker.already_imported.add(name)
+                        # Check for Enum
+                        elif issubclass(obj, Enum) and name != "Enum":
+                            additional_types.append(name)
+                            # Mark as already imported
+                            type_tracker.already_imported.add(name)
+
+            if additional_types:
+                type_imports = ", ".join([async_class.__name__] + additional_types)
+                imports.append(f"from {async_class.__module__} import {type_imports}")
+            else:
+                imports.append(
+                    f"from {async_class.__module__} import {async_class.__name__}"
+                )
+
+        # Add imports for all discovered types
+        # Pass the main module name to avoid duplicate imports
+        imports.extend(
+            type_tracker.get_imports(main_module_name=async_class.__module__)
+        )
+
+        # Add base module import if needed
+        if hasattr(inspect.getmodule(async_class), "__name__"):
+            module_name = inspect.getmodule(async_class).__name__
+            if "." in module_name:
+                base_module = module_name.split(".")[0]
+                # Only add if not already importing from it
+                if not any(imp.startswith(f"from {base_module}") for imp in imports):
+                    imports.append(f"import {base_module}")
+
+        return imports
+
+    @classmethod
+    def _get_class_attributes(cls, async_class: Type) -> list[tuple[str, Type]]:
+        """Extract class attributes that are classes themselves."""
+        class_attributes = []
+
+        # Look for class attributes that are classes
+        for name, attr in sorted(inspect.getmembers(async_class)):
+            if isinstance(attr, type) and not name.startswith("_"):
+                class_attributes.append((name, attr))
+            elif (
+                hasattr(async_class, "__annotations__")
+                and name in async_class.__annotations__
+            ):
+                annotation = async_class.__annotations__[name]
+                if isinstance(annotation, type):
+                    class_attributes.append((name, annotation))
+
+        return class_attributes
+
+    @classmethod
+    def _generate_inner_class_stub(
+        cls,
+        name: str,
+        attr: Type,
+        indent: str = "    ",
+        type_tracker: Optional[TypeTracker] = None,
+    ) -> list[str]:
+        """Generate stub for an inner class."""
+        stub_lines = []
+        stub_lines.append(f"{indent}class {name}Sync:")
+
+        # Add docstring if available
+        if hasattr(attr, "__doc__") and attr.__doc__:
+            stub_lines.extend(
+                cls._format_docstring_for_stub(attr.__doc__, f"{indent}    ")
+            )
+
+        # Add __init__ if it exists
+        if hasattr(attr, "__init__"):
+            try:
+                init_method = getattr(attr, "__init__")
+                init_sig = inspect.signature(init_method)
+
+                # Try to get type hints
+                try:
+                    from typing import get_type_hints
+                    init_hints = get_type_hints(init_method)
+                except Exception:
+                    init_hints = {}
+
+                # Format parameters
+                params_str = cls._format_method_parameters(
+                    init_sig, type_hints=init_hints, type_tracker=type_tracker
+                )
+                # Add __init__ docstring if available (before the method)
+                if hasattr(init_method, "__doc__") and init_method.__doc__:
+                    stub_lines.extend(
+                        cls._format_docstring_for_stub(
+                            init_method.__doc__, f"{indent}    "
+                        )
+                    )
+                stub_lines.append(
+                    f"{indent}    def __init__({params_str}) -> None: ..."
+                )
+            except (ValueError, TypeError):
+                stub_lines.append(
+                    f"{indent}    def __init__(self, *args, **kwargs) -> None: ..."
+                )
+
+        # Add methods to the inner class
+        has_methods = False
+        for method_name, method in sorted(
+            inspect.getmembers(attr, predicate=inspect.isfunction)
+        ):
+            if method_name.startswith("_"):
+                continue
+
+            has_methods = True
+            try:
+                # Add method docstring if available (before the method signature)
+                if method.__doc__:
+                    stub_lines.extend(
+                        cls._format_docstring_for_stub(method.__doc__, f"{indent}    ")
+                    )
+
+                method_sig = cls._generate_method_signature(
+                    method_name, method, is_async=True, type_tracker=type_tracker
+                )
+                stub_lines.append(f"{indent}    {method_sig}")
+            except (ValueError, TypeError):
+                stub_lines.append(
+                    f"{indent}    def {method_name}(self, *args, **kwargs): ..."
+                )
+
+        if not has_methods:
+            stub_lines.append(f"{indent}    pass")
+
+        return stub_lines
+
+    @classmethod
+    def _format_docstring_for_stub(
+        cls, docstring: str, indent: str = "    "
+    ) -> list[str]:
+        """Format a docstring for inclusion in a stub file with proper indentation."""
+        if not docstring:
+            return []
+
+        # First, dedent the docstring to remove any existing indentation
+        dedented = textwrap.dedent(docstring).strip()
+
+        # Split into lines
+        lines = dedented.split("\n")
+
+        # Build the properly indented docstring
+        result = []
+        result.append(f'{indent}"""')
+
+        for line in lines:
+            if line.strip():  # Non-empty line
+                result.append(f"{indent}{line}")
+            else:  # Empty line
+                result.append("")
+
+        result.append(f'{indent}"""')
+        return result
+
+    @classmethod
+    def _post_process_stub_content(cls, stub_content: list[str]) -> list[str]:
+        """Post-process stub content to fix any remaining issues."""
+        processed = []
+
+        for line in stub_content:
+            # Skip processing imports
+            if line.startswith(("from ", "import ")):
+                processed.append(line)
+                continue
+
+            # Fix method signatures missing return types
+            if (
+                line.strip().startswith("def ")
+                and line.strip().endswith(": ...")
+                and ") -> " not in line
+            ):
+                # Add -> None for methods without return annotation
+                line = line.replace(": ...", " -> None: ...")
+
+            processed.append(line)
+
+        return processed
+
+    @classmethod
+    def generate_stub_file(cls, async_class: Type, sync_class: Type) -> None:
+        """
+        Generate a .pyi stub file for the sync class to help IDEs with type checking.
+        """
+        try:
+            # Only generate stub if we can determine module path
+            if async_class.__module__ == "__main__":
+                return
+
+            module = inspect.getmodule(async_class)
+            if not module:
+                return
+
+            module_path = module.__file__
+            if not module_path:
+                return
+
+            # Create stub file path in a 'generated' subdirectory
+            module_dir = os.path.dirname(module_path)
+            stub_dir = os.path.join(module_dir, "generated")
+
+            # Ensure the generated directory exists
+            os.makedirs(stub_dir, exist_ok=True)
+
+            module_name = os.path.basename(module_path)
+            if module_name.endswith(".py"):
+                module_name = module_name[:-3]
+
+            sync_stub_path = os.path.join(stub_dir, f"{sync_class.__name__}.pyi")
+
+            # Create a type tracker for this stub generation
+            type_tracker = TypeTracker()
+
+            stub_content = []
+
+            # We'll generate imports after processing all methods to capture all types
+            # Leave a placeholder for imports
+            imports_placeholder_index = len(stub_content)
+            stub_content.append("")  # Will be replaced with imports later
+
+            # Class definition
+            stub_content.append(f"class {sync_class.__name__}:")
+
+            # Docstring
+            if async_class.__doc__:
+                stub_content.extend(
+                    cls._format_docstring_for_stub(async_class.__doc__, "    ")
+                )
+
+            # Generate __init__
+            try:
+                init_method = async_class.__init__
+                init_signature = inspect.signature(init_method)
+
+                # Try to get type hints for __init__
+                try:
+                    from typing import get_type_hints
+                    init_hints = get_type_hints(init_method)
+                except Exception:
+                    init_hints = {}
+
+                # Format parameters
+                params_str = cls._format_method_parameters(
+                    init_signature, type_hints=init_hints, type_tracker=type_tracker
+                )
+                # Add __init__ docstring if available (before the method)
+                if hasattr(init_method, "__doc__") and init_method.__doc__:
+                    stub_content.extend(
+                        cls._format_docstring_for_stub(init_method.__doc__, "    ")
+                    )
+                stub_content.append(f"    def __init__({params_str}) -> None: ...")
+            except (ValueError, TypeError):
+                stub_content.append(
+                    "    def __init__(self, *args, **kwargs) -> None: ..."
+                )
+
+            stub_content.append("")  # Add newline after __init__
+
+            # Get class attributes
+            class_attributes = cls._get_class_attributes(async_class)
+
+            # Generate inner classes
+            for name, attr in class_attributes:
+                inner_class_stub = cls._generate_inner_class_stub(
+                    name, attr, type_tracker=type_tracker
+                )
+                stub_content.extend(inner_class_stub)
+                stub_content.append("")  # Add newline after the inner class
+
+            # Add methods to the main class
+            processed_methods = set()  # Keep track of methods we've processed
+            for name, method in sorted(
+                inspect.getmembers(async_class, predicate=inspect.isfunction)
+            ):
+                if name.startswith("_") or name in processed_methods:
+                    continue
+
+                processed_methods.add(name)
+
+                try:
+                    method_sig = cls._generate_method_signature(
+                        name, method, is_async=True, type_tracker=type_tracker
+                    )
+
+                    # Add docstring if available (before the method signature for proper formatting)
+                    if method.__doc__:
+                        stub_content.extend(
+                            cls._format_docstring_for_stub(method.__doc__, "    ")
+                        )
+
+                    stub_content.append(f"    {method_sig}")
+
+                    stub_content.append("")  # Add newline after each method
+
+                except (ValueError, TypeError):
+                    # If we can't get the signature, just add a simple stub
+                    stub_content.append(f"    def {name}(self, *args, **kwargs): ...")
+                    stub_content.append("")  # Add newline
+
+            # Add properties
+            for name, prop in sorted(
+                inspect.getmembers(async_class, lambda x: isinstance(x, property))
+            ):
+                stub_content.append("    @property")
+                stub_content.append(f"    def {name}(self) -> Any: ...")
+                if prop.fset:
+                    stub_content.append(f"    @{name}.setter")
+                    stub_content.append(
+                        f"    def {name}(self, value: Any) -> None: ..."
+                    )
+                stub_content.append("")  # Add newline after each property
+
+            # Add placeholders for the nested class instances
+            # Check the actual attribute names from class annotations and attributes
+            attribute_mappings = {}
+
+            # First check annotations for typed attributes (including from parent classes)
+            # Collect all annotations from the class hierarchy
+            all_annotations = {}
+            for base_class in reversed(inspect.getmro(async_class)):
+                if hasattr(base_class, "__annotations__"):
+                    all_annotations.update(base_class.__annotations__)
+
+            for attr_name, attr_type in sorted(all_annotations.items()):
+                for class_name, class_type in class_attributes:
+                    # If the class type matches the annotated type
+                    if (
+                        attr_type == class_type
+                        or (hasattr(attr_type, "__name__") and attr_type.__name__ == class_name)
+                        or (isinstance(attr_type, str) and attr_type == class_name)
+                    ):
+                        attribute_mappings[class_name] = attr_name
+
+            # Remove the extra checking - annotations should be sufficient
+
+            # Add the attribute declarations with proper names
+            for class_name, class_type in class_attributes:
+                # Check if there's a mapping from annotation
+                attr_name = attribute_mappings.get(class_name, class_name)
+                # Use the annotation name if it exists, even if the attribute doesn't exist yet
+                # This is because the attribute might be created at runtime
+                stub_content.append(f"    {attr_name}: {class_name}Sync")
+
+            stub_content.append("")  # Add a final newline
+
+            # Now generate imports with all discovered types
+            imports = cls._generate_imports(async_class, type_tracker)
+
+            # Deduplicate imports while preserving order
+            seen = set()
+            unique_imports = []
+            for imp in imports:
+                if imp not in seen:
+                    seen.add(imp)
+                    unique_imports.append(imp)
+                else:
+                    logging.warning(f"Duplicate import detected: {imp}")
+
+            # Replace the placeholder with actual imports
+            stub_content[imports_placeholder_index : imports_placeholder_index + 1] = (
+                unique_imports
+            )
+
+            # Post-process stub content
+            stub_content = cls._post_process_stub_content(stub_content)
+
+            # Write stub file
+            with open(sync_stub_path, "w") as f:
+                f.write("\n".join(stub_content))
+
+            logging.info(f"Generated stub file: {sync_stub_path}")
+
+        except Exception as e:
+            # If stub generation fails, log the error but don't break the main functionality
+            logging.error(
+                f"Error generating stub file for {sync_class.__name__}: {str(e)}"
+            )
+            import traceback
+
+            logging.error(traceback.format_exc())
+
+
+def create_sync_class(async_class: Type, thread_pool_size=10) -> Type:
+    """
+    Creates a sync version of an async class
+
+    Args:
+        async_class: The async class to convert
+        thread_pool_size: Size of thread pool to use
+
+    Returns:
+        A new class with sync versions of all async methods
+    """
+    return AsyncToSyncConverter.create_sync_class(async_class, thread_pool_size)
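Reviewer note: the wrapping approach used by `create_sync_class` can be hard to follow from the diff alone, so here is a minimal standalone sketch of the same idea. It is not the code from this PR. `run_coroutine_blocking`, `make_sync_class`, and `Counter` are hypothetical names invented for illustration, and the sketch starts one plain thread per call instead of using a thread pool.

```python
import asyncio
import functools
import inspect
import threading


def run_coroutine_blocking(coro_fn, *args, **kwargs):
    """Hypothetical helper: run coro_fn on a fresh event loop in a worker thread."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro_fn(*args, **kwargs))
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def make_sync_class(async_class):
    """Hypothetical helper: build a class whose public methods all block."""
    namespace = {}

    def init(self, *args, **kwargs):
        # The sync object only holds a reference to the real async object.
        self._async_instance = async_class(*args, **kwargs)

    namespace["__init__"] = init

    for name, method in inspect.getmembers(async_class, inspect.isfunction):
        if name.startswith("_"):
            continue
        if inspect.iscoroutinefunction(method):

            @functools.wraps(method)
            def sync_method(self, *args, _name=name, **kwargs):
                # _name is bound as a default so each wrapper keeps its own name.
                bound = getattr(self._async_instance, _name)
                return run_coroutine_blocking(bound, *args, **kwargs)

            namespace[name] = sync_method
        else:

            @functools.wraps(method)
            def proxy_method(self, *args, _name=name, **kwargs):
                return getattr(self._async_instance, _name)(*args, **kwargs)

            namespace[name] = proxy_method

    return type(async_class.__name__ + "Sync", (object,), namespace)


class Counter:
    """Toy async class used only for this illustration."""

    def __init__(self):
        self.total = 0

    async def add(self, amount):
        await asyncio.sleep(0)
        self.total += amount
        return self.total

    def peek(self):
        return self.total


CounterSync = make_sync_class(Counter)
counter = CounterSync()
first = counter.add(2)   # blocks until the coroutine has finished
second = counter.add(3)
```

Binding the method name through a keyword-only default (`_name=name`) avoids the usual late-binding closure problem in the loop, which appears to be why `proxy_method` in the diff uses `_method_name=name`.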
--- a/comfy_api/internal/singleton.py
+++ b/comfy_api/internal/singleton.py
@@ -0,0 +1,33 @@
+from typing import Type, TypeVar
+
+class SingletonMetaclass(type):
+    T = TypeVar("T", bound="SingletonMetaclass")
+    _instances = {}
+
+    def __call__(cls, *args, **kwargs):
+        if cls not in cls._instances:
+            cls._instances[cls] = super(SingletonMetaclass, cls).__call__(
+                *args, **kwargs
+            )
+        return cls._instances[cls]
+
+    def inject_instance(cls: Type[T], instance: T) -> None:
+        assert cls not in SingletonMetaclass._instances, (
+            "Cannot inject instance after first instantiation"
+        )
+        SingletonMetaclass._instances[cls] = instance
+
+    def get_instance(cls: Type[T], *args, **kwargs) -> T:
+        """
+        Gets the singleton instance of the class, creating it if it doesn't exist.
+        """
+        if cls not in SingletonMetaclass._instances:
+            SingletonMetaclass._instances[cls] = super(
+                SingletonMetaclass, cls
+            ).__call__(*args, **kwargs)
+        return cls._instances[cls]
+
+
+class ProxiedSingleton(object, metaclass=SingletonMetaclass):
+    def __init__(self):
+        super().__init__()
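Reviewer note: a short standalone sketch of how a metaclass like `SingletonMetaclass` behaves from the caller's side may help when reading this file. `SingletonMeta`, `Registry`, and `Settings` are hypothetical names, and raising `RuntimeError` instead of using `assert` is a choice made for this sketch, not something the PR does.

```python
class SingletonMeta(type):
    """Illustrative metaclass: one cached instance per class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Calling the class builds the instance once, then reuses it.
        if cls not in SingletonMeta._instances:
            SingletonMeta._instances[cls] = super().__call__(*args, **kwargs)
        return SingletonMeta._instances[cls]

    def inject_instance(cls, instance):
        # Only allowed before the class has been instantiated.
        if cls in SingletonMeta._instances:
            raise RuntimeError("Cannot inject instance after first instantiation")
        SingletonMeta._instances[cls] = instance


class Registry(metaclass=SingletonMeta):
    def __init__(self):
        self.items = []


class Settings(metaclass=SingletonMeta):
    def __init__(self):
        self.name = "default"


# Two calls return the same object.
a = Registry()
b = Registry()

# Build an instance without going through SingletonMeta.__call__, then
# register it as the singleton (for example, a stand-in or proxy object).
preset = object.__new__(Settings)
preset.name = "injected"
Settings.inject_instance(preset)
resolved = Settings()
```

Because the cache is keyed by class, `Registry` and `Settings` each get their own instance even though they share one dictionary.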
--- a/comfy_api/latest/__init__.py
+++ b/comfy_api/latest/__init__.py
@@ -0,0 +1,124 @@
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from typing import Type, TYPE_CHECKING
+from comfy_api.internal import ComfyAPIBase
+from comfy_api.internal.singleton import ProxiedSingleton
+from comfy_api.internal.async_to_sync import create_sync_class
+from comfy_api.latest._input import ImageInput, AudioInput, MaskInput, LatentInput, VideoInput
+from comfy_api.latest._input_impl import VideoFromFile, VideoFromComponents
+from comfy_api.latest._util import VideoCodec, VideoContainer, VideoComponents
+from comfy_api.latest._io import _IO as io  #noqa: F401
+from comfy_api.latest._ui import _UI as ui  #noqa: F401
+# from comfy_api.latest._resources import _RESOURCES as resources  #noqa: F401
+from comfy_execution.utils import get_executing_context
+from comfy_execution.progress import get_progress_state, PreviewImageTuple
+from PIL import Image
+from comfy.cli_args import args
+import numpy as np
+
+
+class ComfyAPI_latest(ComfyAPIBase):
+    VERSION = "latest"
+    STABLE = False
+
+    class Execution(ProxiedSingleton):
+        async def set_progress(
+            self,
+            value: float,
+            max_value: float,
+            node_id: str | None = None,
+            preview_image: Image.Image | ImageInput | None = None,
+            ignore_size_limit: bool = False,
+        ) -> None:
+            """
+            Update the progress bar displayed in the ComfyUI interface.
+
+            This function allows custom nodes and API calls to report their progress
+            back to the user interface, providing visual feedback during long operations.
+
+            Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+            """
+            executing_context = get_executing_context()
+            if node_id is None and executing_context is not None:
+                node_id = executing_context.node_id
+            if node_id is None:
+                raise ValueError("node_id must be provided if not in executing context")
+
+            # Convert preview_image to PreviewImageTuple if needed
+            to_display: PreviewImageTuple | Image.Image | ImageInput | None = preview_image
+            if to_display is not None:
+                # First convert to PIL Image if needed
+                if isinstance(to_display, ImageInput):
+                    # Convert ImageInput (torch.Tensor) to PIL Image
+                    # Handle tensor shape [B, H, W, C] -> get first image if batch
+                    tensor = to_display
+                    if len(tensor.shape) == 4:
+                        tensor = tensor[0]
+
+                    # Convert to numpy array and scale to 0-255
+                    image_np = (tensor.cpu().numpy() * 255).astype(np.uint8)
+                    to_display = Image.fromarray(image_np)
+
+                if isinstance(to_display, Image.Image):
+                    # Detect image format from PIL Image
+                    image_format = to_display.format if to_display.format else "JPEG"
+                    # Use None for preview_size if ignore_size_limit is True
+                    preview_size = None if ignore_size_limit else args.preview_size
+                    to_display = (image_format, to_display, preview_size)
+
+            get_progress_state().update_progress(
+                node_id=node_id,
+                value=value,
+                max_value=max_value,
+                image=to_display,
+            )
+
+    execution: Execution
+
+class ComfyExtension(ABC):
+    async def on_load(self) -> None:
+        """
+        Called when an extension is loaded.
+        This should be used to initialize any global resources needed by the extension.
+        """
+
+    @abstractmethod
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        """
+        Returns a list of nodes that this extension provides.
+        """
+
+class Input:
+    Image = ImageInput
+    Audio = AudioInput
+    Mask = MaskInput
+    Latent = LatentInput
+    Video = VideoInput
+
+class InputImpl:
+    VideoFromFile = VideoFromFile
+    VideoFromComponents = VideoFromComponents
+
+class Types:
+    VideoCodec = VideoCodec
+    VideoContainer = VideoContainer
+    VideoComponents = VideoComponents
+
+ComfyAPI = ComfyAPI_latest
+
+# Create a synchronous version of the API
+if TYPE_CHECKING:
+    import comfy_api.latest.generated.ComfyAPISyncStub  # type: ignore
+
+    ComfyAPISync: Type[comfy_api.latest.generated.ComfyAPISyncStub.ComfyAPISyncStub]
+ComfyAPISync = create_sync_class(ComfyAPI_latest)
+
+__all__ = [
+    "ComfyAPI",
+    "ComfyAPISync",
+    "Input",
+    "InputImpl",
+    "Types",
+    "ComfyExtension",
+]
--- a/comfy_api/latest/_input/__init__.py
+++ b/comfy_api/latest/_input/__init__.py
@@ -0,0 +1,10 @@
+from .basic_types import ImageInput, AudioInput, MaskInput, LatentInput
+from .video_types import VideoInput
+
+__all__ = [
+    "ImageInput",
+    "AudioInput",
+    "VideoInput",
+    "MaskInput",
+    "LatentInput",
+]
--- a/comfy_api/latest/_input/basic_types.py
+++ b/comfy_api/latest/_input/basic_types.py
@@ -0,0 +1,42 @@
+import torch
+from typing import TypedDict, List, Optional
+
+ImageInput = torch.Tensor
+"""
+An image in format [B, H, W, C] where B is the batch size, H is the height, W is the width, and C is the number of channels.
+"""
+
+MaskInput = torch.Tensor
+"""
+A mask in format [B, H, W] where B is the batch size
+"""
+
+class AudioInput(TypedDict):
+    """
+    TypedDict representing audio input.
+    """
+
+    waveform: torch.Tensor
+    """
+    Tensor in the format [B, C, T] where B is the batch size, C is the number of channels, and T is the number of samples.
+    """
+
+    sample_rate: int
+
+class LatentInput(TypedDict):
+    """
+    TypedDict representing latent input.
+    """
+
+    samples: torch.Tensor
+    """
+    Tensor in the format [B, C, H, W] where B is the batch size, C is the number of channels,
+    H is the height, and W is the width.
+    """
+
+    noise_mask: Optional[MaskInput]
+    """
+    Optional noise mask tensor in the same format as samples.
+    """
+
+    batch_index: Optional[List[int]]
--- a/comfy_api/latest/_input/video_types.py
+++ b/comfy_api/latest/_input/video_types.py
@@ -0,0 +1,85 @@
+from __future__ import annotations
+from abc import ABC, abstractmethod
+from typing import Optional, Union
+import io
+import av
+from comfy_api.util import VideoContainer, VideoCodec, VideoComponents
+
+class VideoInput(ABC):
+    """
+    Abstract base class for video input types.
+    """
+
+    @abstractmethod
+    def get_components(self) -> VideoComponents:
+        """
+        Abstract method to get the video components (images, audio, and frame rate).
+
+        Returns:
+            VideoComponents containing images, audio, and frame rate
+        """
+        pass
+
+    @abstractmethod
+    def save_to(
+        self,
+        path: Union[str, io.BytesIO],
+        format: VideoContainer = VideoContainer.AUTO,
+        codec: VideoCodec = VideoCodec.AUTO,
+        metadata: Optional[dict] = None
+    ):
+        """
+        Abstract method to save the video input to a file.
+        """
+        pass
+
+    def get_stream_source(self) -> Union[str, io.BytesIO]:
+        """
+        Get a streamable source for the video. This allows processing without
+        loading the entire video into memory.
+
+        Returns:
+            Either a file path (str) or a BytesIO object that can be opened with av.
+
+        Default implementation creates a BytesIO buffer, but subclasses should
+        override this for better performance when possible.
+        """
+        buffer = io.BytesIO()
+        self.save_to(buffer)
+        buffer.seek(0)
+        return buffer
+
+    # Provide a default implementation, but subclasses can provide optimized versions
+    # if possible.
+    def get_dimensions(self) -> tuple[int, int]:
+        """
+        Returns the dimensions of the video input.
+
+        Returns:
+            Tuple of (width, height)
+        """
+        components = self.get_components()
+        return components.images.shape[2], components.images.shape[1]
+
+    def get_duration(self) -> float:
+        """
+        Returns the duration of the video in seconds.
+
+        Returns:
+            Duration in seconds
+        """
+        components = self.get_components()
+        frame_count = components.images.shape[0]
+        return float(frame_count / components.frame_rate)
+
+    def get_container_format(self) -> str:
+        """
+        Returns the container format of the video (e.g., 'mp4', 'mov', 'avi').
+
+        Returns:
+            Container format as string
+        """
+        # Default implementation - subclasses should override for better performance
+        source = self.get_stream_source()
+        with av.open(source, mode="r") as container:
+            return container.format.name
--- a/comfy_api/latest/_input_impl/__init__.py
+++ b/comfy_api/latest/_input_impl/__init__.py
@@ -0,0 +1,7 @@
+from .video_types import VideoFromFile, VideoFromComponents
+
+__all__ = [
+    # Implementations
+    "VideoFromFile",
+    "VideoFromComponents",
+]
--- a/comfy_api/latest/_input_impl/video_types.py
+++ b/comfy_api/latest/_input_impl/video_types.py
@@ -0,0 +1,324 @@
+from __future__ import annotations
+from av.container import InputContainer
+from av.subtitles.stream import SubtitleStream
+from fractions import Fraction
+from typing import Optional
+from comfy_api.latest._input import AudioInput, VideoInput
+import av
+import io
+import json
+import numpy as np
+import torch
+from comfy_api.latest._util import VideoContainer, VideoCodec, VideoComponents
+
+
+def container_to_output_format(container_format: str | None) -> str | None:
+    """
+    A container's `format` may be a comma-separated list of formats.
+    E.g., an ISO container's `format` may be `mov,mp4,m4a,3gp,3g2,mj2`.
+    However, writing to a file/stream with `av.open` requires a single format,
+    or `None` to auto-detect.
+    """
+    if not container_format:
+        return None  # Auto-detect
+
+    if "," not in container_format:
+        return container_format
+
+    formats = container_format.split(",")
+    return formats[0]
+
+
+def get_open_write_kwargs(
+    dest: str | io.BytesIO, container_format: str, to_format: str | None
+) -> dict:
+    """Get kwargs for writing a `VideoFromFile` to a file/stream with `av.open`"""
+    open_kwargs = {
+        "mode": "w",
+        # If isobmff, preserve custom metadata tags (workflow, prompt, extra_pnginfo)
+        "options": {"movflags": "use_metadata_tags"},
+    }
+
+    is_write_to_buffer = isinstance(dest, io.BytesIO)
+    if is_write_to_buffer:
+        # Set output format explicitly, since it cannot be inferred from file extension
+        if to_format == VideoContainer.AUTO:
+            to_format = container_format.lower()
+        elif isinstance(to_format, str):
+            to_format = to_format.lower()
+        open_kwargs["format"] = container_to_output_format(to_format)
+
+    return open_kwargs
+
+
+class VideoFromFile(VideoInput):
+    """
+    Class representing video input from a file.
+    """
+
+    def __init__(self, file: str | io.BytesIO):
+        """
+        Initialize the VideoFromFile object from either a path on disk or a BytesIO object
+        containing the file contents.
+        """
+        self.__file = file
+
+    def get_stream_source(self) -> str | io.BytesIO:
+        """
+        Return the underlying file source for efficient streaming.
+        This avoids unnecessary memory copies when the source is already a file path.
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)
+        return self.__file
+
+    def get_dimensions(self) -> tuple[int, int]:
+        """
+        Returns the dimensions of the video input.
+
+        Returns:
+            Tuple of (width, height)
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)  # Reset the BytesIO object to the beginning
+        with av.open(self.__file, mode='r') as container:
+            for stream in container.streams:
+                if stream.type == 'video':
+                    assert isinstance(stream, av.VideoStream)
+                    return stream.width, stream.height
+        raise ValueError(f"No video stream found in file '{self.__file}'")
+
+    def get_duration(self) -> float:
+        """
+        Returns the duration of the video in seconds.
+
+        Returns:
+            Duration in seconds
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)
+        with av.open(self.__file, mode="r") as container:
+            if container.duration is not None:
+                return float(container.duration / av.time_base)
+
+            # Fallback: calculate from frame count and frame rate
+            video_stream = next(
+                (s for s in container.streams if s.type == "video"), None
+            )
+            if video_stream and video_stream.frames and video_stream.average_rate:
+                return float(video_stream.frames / video_stream.average_rate)
+
+            # Last resort: decode frames to count them
+            if video_stream and video_stream.average_rate:
+                frame_count = 0
+                container.seek(0)
+                for packet in container.demux(video_stream):
+                    for _ in packet.decode():
+                        frame_count += 1
+                if frame_count > 0:
+                    return float(frame_count / video_stream.average_rate)
+
+        raise ValueError(f"Could not determine duration for file '{self.__file}'")
+
+    def get_container_format(self) -> str:
+        """
+        Returns the container format of the video (e.g., 'mp4', 'mov', 'avi').
+
+        Returns:
+            Container format as string
+        """
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)
+        with av.open(self.__file, mode='r') as container:
+            return container.format.name
+
+    def get_components_internal(self, container: InputContainer) -> VideoComponents:
+        # Get video frames
+        frames = []
+        for frame in container.decode(video=0):
+            img = frame.to_ndarray(format='rgb24')  # shape: (H, W, 3)
+            img = torch.from_numpy(img) / 255.0  # shape: (H, W, 3)
+            frames.append(img)
+
+        images = torch.stack(frames) if len(frames) > 0 else torch.zeros(0, 0, 0, 3)  # empty batch, still [B, H, W, C]
+
+        # Get frame rate
+        video_stream = next(s for s in container.streams if s.type == 'video')
+        frame_rate = Fraction(video_stream.average_rate) if video_stream and video_stream.average_rate else Fraction(1)
+
+        # Get audio if available
+        audio = None
+        try:
+            container.seek(0)  # Reset the container to the beginning
+            for stream in container.streams:
+                if stream.type != 'audio':
+                    continue
+                assert isinstance(stream, av.AudioStream)
+                audio_frames = []
+                for packet in container.demux(stream):
+                    for frame in packet.decode():
+                        assert isinstance(frame, av.AudioFrame)
+                        audio_frames.append(frame.to_ndarray())  # shape: (channels, samples)
+                if len(audio_frames) > 0:
+                    audio_data = np.concatenate(audio_frames, axis=1)  # shape: (channels, total_samples)
+                    audio_tensor = torch.from_numpy(audio_data).unsqueeze(0)  # shape: (1, channels, total_samples)
+                    audio = AudioInput({
+                        "waveform": audio_tensor,
+                        "sample_rate": int(stream.sample_rate) if stream.sample_rate else 1,
+                    })
+        except StopIteration:
+            pass  # No audio stream
+
+        metadata = container.metadata
+        return VideoComponents(images=images, audio=audio, frame_rate=frame_rate, metadata=metadata)
+
+    def get_components(self) -> VideoComponents:
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)  # Reset the BytesIO object to the beginning
+        with av.open(self.__file, mode='r') as container:
+            return self.get_components_internal(container)
+        raise ValueError(f"No video stream found in file '{self.__file}'")
+
+    def save_to(
+        self,
+        path: str | io.BytesIO,
+        format: VideoContainer = VideoContainer.AUTO,
+        codec: VideoCodec = VideoCodec.AUTO,
+        metadata: Optional[dict] = None
+    ):
+        if isinstance(self.__file, io.BytesIO):
+            self.__file.seek(0)  # Reset the BytesIO object to the beginning
+        with av.open(self.__file, mode='r') as container:
+            container_format = container.format.name
+            video_encoding = container.streams.video[0].codec.name if len(container.streams.video) > 0 else None
+            reuse_streams = True
+            if format != VideoContainer.AUTO and format not in container_format.split(","):
+                reuse_streams = False
+            if codec != VideoCodec.AUTO and codec != video_encoding and video_encoding is not None:
+                reuse_streams = False
+
+            if not reuse_streams:
+                components = self.get_components_internal(container)
+                video = VideoFromComponents(components)
+                return video.save_to(
+                    path,
+                    format=format,
+                    codec=codec,
+                    metadata=metadata
+                )
+
+            streams = container.streams
+
+            open_kwargs = get_open_write_kwargs(path, container_format, format)
+            with av.open(path, **open_kwargs) as output_container:
+                # Copy over the original metadata
+                for key, value in container.metadata.items():
+                    if metadata is None or key not in metadata:
+                        output_container.metadata[key] = value
+
+                # Add our new metadata
+                if metadata is not None:
+                    for key, value in metadata.items():
+                        if isinstance(value, str):
+                            output_container.metadata[key] = value
+                        else:
+                            output_container.metadata[key] = json.dumps(value)
+
+                # Add streams to the new container
+                stream_map = {}
+                for stream in streams:
+                    if isinstance(stream, (av.VideoStream, av.AudioStream, SubtitleStream)):
+                        out_stream = output_container.add_stream_from_template(template=stream, opaque=True)
+                        stream_map[stream] = out_stream
+
+                # Write packets to the new container
+                for packet in container.demux():
+                    if packet.stream in stream_map and packet.dts is not None:
+                        packet.stream = stream_map[packet.stream]
+                        output_container.mux(packet)
+
+class VideoFromComponents(VideoInput):
+    """
+    Class representing video input from tensors.
+    """
+
+    def __init__(self, components: VideoComponents):
+        self.__components = components
+
+    def get_components(self) -> VideoComponents:
+        return VideoComponents(
+            images=self.__components.images,
+            audio=self.__components.audio,
+            frame_rate=self.__components.frame_rate
+        )
+
+    def save_to(
+        self,
+        path: str,
+        format: VideoContainer = VideoContainer.AUTO,
+        codec: VideoCodec = VideoCodec.AUTO,
+        metadata: Optional[dict] = None
+    ):
+        if format != VideoContainer.AUTO and format != VideoContainer.MP4:
+            raise ValueError("Only MP4 format is supported for now")
+        if codec != VideoCodec.AUTO and codec != VideoCodec.H264:
+            raise ValueError("Only H264 codec is supported for now")
+        with av.open(path, mode='w', options={'movflags': 'use_metadata_tags'}) as output:
+            # Add metadata before writing any streams
+            if metadata is not None:
+                for key, value in metadata.items():
+                    output.metadata[key] = json.dumps(value)
+
+            frame_rate = Fraction(round(self.__components.frame_rate * 1000), 1000)
+            # Create a video stream
+            video_stream = output.add_stream('h264', rate=frame_rate)
+            video_stream.width = self.__components.images.shape[2]
+            video_stream.height = self.__components.images.shape[1]
+            video_stream.pix_fmt = 'yuv420p'
+
+            # Create an audio stream
+            audio_sample_rate = 1
+            audio_stream: Optional[av.AudioStream] = None
+            if self.__components.audio:
+                audio_sample_rate = int(self.__components.audio['sample_rate'])
+                audio_stream = output.add_stream('aac', rate=audio_sample_rate)
+                audio_stream.sample_rate = audio_sample_rate
+                audio_stream.format = 'fltp'
+
+            # Encode video
+            for i, frame in enumerate(self.__components.images):
+                img = (frame * 255).clamp(0, 255).byte().cpu().numpy() # shape: (H, W, 3)
+                frame = av.VideoFrame.from_ndarray(img, format='rgb24')
+                frame = frame.reformat(format='yuv420p')  # Convert to YUV420P as required by h264
+                packet = video_stream.encode(frame)
+                output.mux(packet)
+
+            # Flush video
+            packet = video_stream.encode(None)
+            output.mux(packet)
+
+            if audio_stream and self.__components.audio:
+                # Encode audio
+                samples_per_frame = int(audio_sample_rate / frame_rate)
+                num_frames = self.__components.audio['waveform'].shape[2] // samples_per_frame
+                for i in range(num_frames):
+                    start = i * samples_per_frame
+                    end = start + samples_per_frame
+                    # TODO(Feature) - Add support for stereo audio
+                    chunk = (
+                        self.__components.audio["waveform"][0, 0, start:end]
+                        .unsqueeze(0)
+                        .contiguous()
+                        .numpy()
+                    )
+                    audio_frame = av.AudioFrame.from_ndarray(chunk, format='fltp', layout='mono')
+                    audio_frame.sample_rate = audio_sample_rate
+                    audio_frame.pts = i * samples_per_frame
+                    for packet in audio_stream.encode(audio_frame):
+                        output.mux(packet)
+
+                # Flush audio
+                for packet in audio_stream.encode(None):
+                    output.mux(packet)
+
+
--- a/comfy_api/latest/_io.py
+++ b/comfy_api/latest/_io.py
--- a/comfy_api/latest/_resources.py
+++ b/comfy_api/latest/_resources.py
@@ -0,0 +1,72 @@
+from __future__ import annotations
+import comfy.utils
+import folder_paths
+import logging
+from abc import ABC, abstractmethod
+from typing import Any
+import torch
+
+class ResourceKey(ABC):
+    Type = Any
+    def __init__(self):
+        ...
+
+class TorchDictFolderFilename(ResourceKey):
+    '''Key for requesting a torch file via file_name from a folder category.'''
+    Type = dict[str, torch.Tensor]
+    def __init__(self, folder_name: str, file_name: str):
+        self.folder_name = folder_name
+        self.file_name = file_name
+
+    def __hash__(self):
+        return hash((self.folder_name, self.file_name))
+
+    def __eq__(self, other: object) -> bool:
+        if not isinstance(other, TorchDictFolderFilename):
+            return False
+        return self.folder_name == other.folder_name and self.file_name == other.file_name
+
+    def __str__(self):
+        return f"{self.folder_name} -> {self.file_name}"
+
+class Resources(ABC):
+    def __init__(self):
+        ...
+
+    @abstractmethod
+    def get(self, key: ResourceKey, default: Any=...) -> Any:
+        pass
+
+class ResourcesLocal(Resources):
+    def __init__(self):
+        super().__init__()
+        self.local_resources: dict[ResourceKey, Any] = {}
+
+    def get(self, key: ResourceKey, default: Any=...) -> Any:
+        cached = self.local_resources.get(key, None)
+        if cached is not None:
+            logging.info(f"Using cached resource '{key}'")
+            return cached
+        logging.info(f"Loading resource '{key}'")
+        to_return = None
+        if isinstance(key, TorchDictFolderFilename):
+            if default is ...:
+                to_return = comfy.utils.load_torch_file(folder_paths.get_full_path_or_raise(key.folder_name, key.file_name), safe_load=True)
+            else:
+                full_path = folder_paths.get_full_path(key.folder_name, key.file_name)
+                if full_path is not None:
+                    to_return = comfy.utils.load_torch_file(full_path, safe_load=True)
+
+        if to_return is not None:
+            self.local_resources[key] = to_return
+            return to_return
+        if default is not ...:
+            return default
+        raise Exception(f"Unsupported resource key type: {type(key)}")
+
+
+class _RESOURCES:
+    ResourceKey = ResourceKey
+    TorchDictFolderFilename = TorchDictFolderFilename
+    Resources = Resources
+    ResourcesLocal = ResourcesLocal
--- a/comfy_api/latest/_ui.py
+++ b/comfy_api/latest/_ui.py
@@ -0,0 +1,463 @@
+from __future__ import annotations
+
+import json
+import os
+import random
+from io import BytesIO
+from typing import Type
+
+import av
+import numpy as np
+import torch
+try:
+    import torchaudio
+    TORCH_AUDIO_AVAILABLE = True
+except ImportError:
+    TORCH_AUDIO_AVAILABLE = False
+from PIL import Image as PILImage
+from PIL.PngImagePlugin import PngInfo
+
+import folder_paths
+
+# used for image preview
+from comfy.cli_args import args
+from comfy_api.latest._io import ComfyNode, FolderType, Image, _UIOutput
+
+
+class SavedResult(dict):
+    def __init__(self, filename: str, subfolder: str, type: FolderType):
+        super().__init__(filename=filename, subfolder=subfolder, type=type.value)
+
+    @property
+    def filename(self) -> str:
+        return self["filename"]
+
+    @property
+    def subfolder(self) -> str:
+        return self["subfolder"]
+
+    @property
+    def type(self) -> FolderType:
+        return FolderType(self["type"])
+
+
+class SavedImages(_UIOutput):
+    """A UI output class to represent one or more saved images, potentially animated."""
+    def __init__(self, results: list[SavedResult], is_animated: bool = False):
+        super().__init__()
+        self.results = results
+        self.is_animated = is_animated
+
+    def as_dict(self) -> dict:
+        data = {"images": self.results}
+        if self.is_animated:
+            data["animated"] = (True,)
+        return data
+
+
+class SavedAudios(_UIOutput):
+    """UI wrapper around one or more audio files on disk (FLAC / MP3 / Opus)."""
+    def __init__(self, results: list[SavedResult]):
+        super().__init__()
+        self.results = results
+
+    def as_dict(self) -> dict:
+        return {"audio": self.results}
+
+
+def _get_directory_by_folder_type(folder_type: FolderType) -> str:
+    if folder_type == FolderType.input:
+        return folder_paths.get_input_directory()
+    if folder_type == FolderType.output:
+        return folder_paths.get_output_directory()
+    return folder_paths.get_temp_directory()
+
+
+class ImageSaveHelper:
+    """A helper class with static methods to handle image saving and metadata."""
+
+    @staticmethod
+    def _convert_tensor_to_pil(image_tensor: torch.Tensor) -> PILImage.Image:
+        """Converts a single torch tensor to a PIL Image."""
+        return PILImage.fromarray(np.clip(255.0 * image_tensor.cpu().numpy(), 0, 255).astype(np.uint8))
+
+    @staticmethod
+    def _create_png_metadata(cls: Type[ComfyNode] | None) -> PngInfo | None:
+        """Creates a PngInfo object with prompt and extra_pnginfo."""
+        if args.disable_metadata or cls is None or not cls.hidden:
+            return None
+        metadata = PngInfo()
+        if cls.hidden.prompt:
+            metadata.add_text("prompt", json.dumps(cls.hidden.prompt))
+        if cls.hidden.extra_pnginfo:
+            for x in cls.hidden.extra_pnginfo:
+                metadata.add_text(x, json.dumps(cls.hidden.extra_pnginfo[x]))
+        return metadata
+
+    @staticmethod
+    def _create_animated_png_metadata(cls: Type[ComfyNode] | None) -> PngInfo | None:
+        """Creates a PngInfo object with prompt and extra_pnginfo for animated PNGs (APNG)."""
+        if args.disable_metadata or cls is None or not cls.hidden:
+            return None
+        metadata = PngInfo()
+        if cls.hidden.prompt:
+            metadata.add(
+                b"comf",
+                "prompt".encode("latin-1", "strict")
+                + b"\0"
+                + json.dumps(cls.hidden.prompt).encode("latin-1", "strict"),
+                after_idat=True,
+            )
+        if cls.hidden.extra_pnginfo:
+            for x in cls.hidden.extra_pnginfo:
+                metadata.add(
+                    b"comf",
+                    x.encode("latin-1", "strict")
+                    + b"\0"
+                    + json.dumps(cls.hidden.extra_pnginfo[x]).encode("latin-1", "strict"),
+                    after_idat=True,
+                )
+        return metadata
+
+    @staticmethod
+    def _create_webp_metadata(pil_image: PILImage.Image, cls: Type[ComfyNode] | None) -> PILImage.Exif:
+        """Creates an EXIF metadata object for WebP images."""
+        exif_data = pil_image.getexif()
+        if args.disable_metadata or cls is None or cls.hidden is None:
+            return exif_data
+        if cls.hidden.prompt is not None:
+            exif_data[0x0110] = "prompt:{}".format(json.dumps(cls.hidden.prompt))  # EXIF 0x0110 = Model
+        if cls.hidden.extra_pnginfo is not None:
+            initial_exif_tag = 0x010F  # EXIF 0x010f = Make
+            for key, value in cls.hidden.extra_pnginfo.items():
+                exif_data[initial_exif_tag] = "{}:{}".format(key, json.dumps(value))
+                initial_exif_tag -= 1
+        return exif_data
+
+    @staticmethod
+    def save_images(
+        images, filename_prefix: str, folder_type: FolderType, cls: Type[ComfyNode] | None, compress_level = 4,
+    ) -> list[SavedResult]:
+        """Saves a batch of images as individual PNG files."""
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type), images[0].shape[1], images[0].shape[0]
+        )
+        results = []
+        metadata = ImageSaveHelper._create_png_metadata(cls)
+        for batch_number, image_tensor in enumerate(images):
+            img = ImageSaveHelper._convert_tensor_to_pil(image_tensor)
+            filename_with_batch_num = filename.replace("%batch_num%", str(batch_number))
+            file = f"{filename_with_batch_num}_{counter:05}_.png"
+            img.save(os.path.join(full_output_folder, file), pnginfo=metadata, compress_level=compress_level)
+            results.append(SavedResult(file, subfolder, folder_type))
+            counter += 1
+        return results
+
+    @staticmethod
+    def get_save_images_ui(images, filename_prefix: str, cls: Type[ComfyNode] | None, compress_level=4) -> SavedImages:
+        """Saves a batch of images and returns a UI object for the node output."""
+        return SavedImages(
+            ImageSaveHelper.save_images(
+                images,
+                filename_prefix=filename_prefix,
+                folder_type=FolderType.output,
+                cls=cls,
+                compress_level=compress_level,
+            )
+        )
+
+    @staticmethod
+    def save_animated_png(
+        images, filename_prefix: str, folder_type: FolderType, cls: Type[ComfyNode] | None, fps: float, compress_level: int
+    ) -> SavedResult:
+        """Saves a batch of images as a single animated PNG."""
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type), images[0].shape[1], images[0].shape[0]
+        )
+        pil_images = [ImageSaveHelper._convert_tensor_to_pil(img) for img in images]
+        metadata = ImageSaveHelper._create_animated_png_metadata(cls)
+        file = f"{filename}_{counter:05}_.png"
+        save_path = os.path.join(full_output_folder, file)
+        pil_images[0].save(
+            save_path,
+            pnginfo=metadata,
+            compress_level=compress_level,
+            save_all=True,
+            duration=int(1000.0 / fps),
+            append_images=pil_images[1:],
+        )
+        return SavedResult(file, subfolder, folder_type)
+
+    @staticmethod
+    def get_save_animated_png_ui(
+        images, filename_prefix: str, cls: Type[ComfyNode] | None, fps: float, compress_level: int
+    ) -> SavedImages:
+        """Saves an animated PNG and returns a UI object for the node output."""
+        result = ImageSaveHelper.save_animated_png(
+            images,
+            filename_prefix=filename_prefix,
+            folder_type=FolderType.output,
+            cls=cls,
+            fps=fps,
+            compress_level=compress_level,
+        )
+        return SavedImages([result], is_animated=len(images) > 1)
+
+    @staticmethod
+    def save_animated_webp(
+        images,
+        filename_prefix: str,
+        folder_type: FolderType,
+        cls: Type[ComfyNode] | None,
+        fps: float,
+        lossless: bool,
+        quality: int,
+        method: int,
+    ) -> SavedResult:
+        """Saves a batch of images as a single animated WebP."""
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type), images[0].shape[1], images[0].shape[0]
+        )
+        pil_images = [ImageSaveHelper._convert_tensor_to_pil(img) for img in images]
+        pil_exif = ImageSaveHelper._create_webp_metadata(pil_images[0], cls)
+        file = f"{filename}_{counter:05}_.webp"
+        pil_images[0].save(
+            os.path.join(full_output_folder, file),
+            save_all=True,
+            duration=int(1000.0 / fps),
+            append_images=pil_images[1:],
+            exif=pil_exif,
+            lossless=lossless,
+            quality=quality,
+            method=method,
+        )
+        return SavedResult(file, subfolder, folder_type)
+
+    @staticmethod
+    def get_save_animated_webp_ui(
+        images,
+        filename_prefix: str,
+        cls: Type[ComfyNode] | None,
+        fps: float,
+        lossless: bool,
+        quality: int,
+        method: int,
+    ) -> SavedImages:
+        """Saves an animated WebP and returns a UI object for the node output."""
+        result = ImageSaveHelper.save_animated_webp(
+            images,
+            filename_prefix=filename_prefix,
+            folder_type=FolderType.output,
+            cls=cls,
+            fps=fps,
+            lossless=lossless,
+            quality=quality,
+            method=method,
+        )
+        return SavedImages([result], is_animated=len(images) > 1)
+
+
+class AudioSaveHelper:
+    """A helper class with static methods to handle audio saving and metadata."""
+    _OPUS_RATES = [8000, 12000, 16000, 24000, 48000]
+
+    @staticmethod
+    def save_audio(
+        audio: dict,
+        filename_prefix: str,
+        folder_type: FolderType,
+        cls: Type[ComfyNode] | None,
+        format: str = "flac",
+        quality: str = "128k",
+    ) -> list[SavedResult]:
+        full_output_folder, filename, counter, subfolder, _ = folder_paths.get_save_image_path(
+            filename_prefix, _get_directory_by_folder_type(folder_type)
+        )
+
+        metadata = {}
+        if not args.disable_metadata and cls is not None and cls.hidden is not None:
+            if cls.hidden.prompt is not None:
+                metadata["prompt"] = json.dumps(cls.hidden.prompt)
+            if cls.hidden.extra_pnginfo is not None:
+                for x in cls.hidden.extra_pnginfo:
+                    metadata[x] = json.dumps(cls.hidden.extra_pnginfo[x])
+
+        results = []
+        for batch_number, waveform in enumerate(audio["waveform"].cpu()):
+            filename_with_batch_num = filename.replace("%batch_num%", str(batch_number))
+            file = f"{filename_with_batch_num}_{counter:05}_.{format}"
+            output_path = os.path.join(full_output_folder, file)
+
+            # Use original sample rate initially
+            sample_rate = audio["sample_rate"]
+
+            # Handle Opus sample rate requirements
+            if format == "opus":
+                if sample_rate > 48000:
+                    sample_rate = 48000
+                elif sample_rate not in AudioSaveHelper._OPUS_RATES:
+                    # Find the next highest supported rate
+                    for rate in sorted(AudioSaveHelper._OPUS_RATES):
+                        if rate > sample_rate:
+                            sample_rate = rate
+                            break
+                    if sample_rate not in AudioSaveHelper._OPUS_RATES:  # Fallback if still not supported
+                        sample_rate = 48000
+
+                # Resample if necessary
+                if sample_rate != audio["sample_rate"]:
+                    if not TORCH_AUDIO_AVAILABLE:
+                        raise Exception("torchaudio is not available; cannot resample audio.")
+                    waveform = torchaudio.functional.resample(waveform, audio["sample_rate"], sample_rate)
+
+            # Create output with specified format
+            output_buffer = BytesIO()
+            output_container = av.open(output_buffer, mode="w", format=format)
+
+            # Set metadata on the container
+            for key, value in metadata.items():
+                output_container.metadata[key] = value
+
+            # Set up the output stream with appropriate properties
+            if format == "opus":
+                out_stream = output_container.add_stream("libopus", rate=sample_rate)
+                if quality == "64k":
+                    out_stream.bit_rate = 64000
+                elif quality == "96k":
+                    out_stream.bit_rate = 96000
+                elif quality == "128k":
+                    out_stream.bit_rate = 128000
+                elif quality == "192k":
+                    out_stream.bit_rate = 192000
+                elif quality == "320k":
+                    out_stream.bit_rate = 320000
+            elif format == "mp3":
+                out_stream = output_container.add_stream("libmp3lame", rate=sample_rate)
+                if quality == "V0":
+                    # TODO i would really love to support V3 and V5 but there doesn't seem to be a way to set the qscale level, the property below is a bool
+                    out_stream.codec_context.qscale = 1
+                elif quality == "128k":
+                    out_stream.bit_rate = 128000
+                elif quality == "320k":
+                    out_stream.bit_rate = 320000
+            else:  # format == "flac":
+                out_stream = output_container.add_stream("flac", rate=sample_rate)
+
+            frame = av.AudioFrame.from_ndarray(
+                waveform.movedim(0, 1).reshape(1, -1).float().numpy(),
+                format="flt",
+                layout="mono" if waveform.shape[0] == 1 else "stereo",
+            )
+            frame.sample_rate = sample_rate
+            frame.pts = 0
+            output_container.mux(out_stream.encode(frame))
+
+            # Flush encoder
+            output_container.mux(out_stream.encode(None))
+
+            # Close containers
+            output_container.close()
+
+            # Write the output to file
+            output_buffer.seek(0)
+            with open(output_path, "wb") as f:
+                f.write(output_buffer.getbuffer())
+
+            results.append(SavedResult(file, subfolder, folder_type))
+            counter += 1
+
+        return results
+
+    @staticmethod
+    def get_save_audio_ui(
+        audio, filename_prefix: str, cls: Type[ComfyNode] | None, format: str = "flac", quality: str = "128k",
+    ) -> SavedAudios:
+        """Save and instantly wrap for UI."""
+        return SavedAudios(
+            AudioSaveHelper.save_audio(
+                audio,
+                filename_prefix=filename_prefix,
+                folder_type=FolderType.output,
+                cls=cls,
+                format=format,
+                quality=quality,
+            )
+        )
+
+
+class PreviewImage(_UIOutput):
+    def __init__(self, image: Image.Type, animated: bool = False, cls: Type[ComfyNode] = None, **kwargs):
+        self.values = ImageSaveHelper.save_images(
+            image,
+            filename_prefix="ComfyUI_temp_" + ''.join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(5)),
+            folder_type=FolderType.temp,
+            cls=cls,
+            compress_level=1,
+        )
+        self.animated = animated
+
+    def as_dict(self):
+        return {
+            "images": self.values,
+            "animated": (self.animated,)
+        }
+
+
+class PreviewMask(PreviewImage):
+    def __init__(self, mask: PreviewMask.Type, animated: bool = False, cls: Type[ComfyNode] = None, **kwargs):
+        preview = mask.reshape((-1, 1, mask.shape[-2], mask.shape[-1])).movedim(1, -1).expand(-1, -1, -1, 3)
+        super().__init__(preview, animated, cls, **kwargs)
+
+
+class PreviewAudio(_UIOutput):
+    def __init__(self, audio: dict, cls: Type[ComfyNode] = None, **kwargs):
+        self.values = AudioSaveHelper.save_audio(
+            audio,
+            filename_prefix="ComfyUI_temp_" + "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(5)),
+            folder_type=FolderType.temp,
+            cls=cls,
+            format="flac",
+            quality="128k",
+        )
+
+    def as_dict(self) -> dict:
+        return {"audio": self.values}
+
+
+class PreviewVideo(_UIOutput):
+    def __init__(self, values: list[SavedResult | dict], **kwargs):
+        self.values = values
+
+    def as_dict(self):
+        return {"images": self.values, "animated": (True,)}
+
+
+class PreviewUI3D(_UIOutput):
+    def __init__(self, model_file, camera_info, **kwargs):
+        self.model_file = model_file
+        self.camera_info = camera_info
+
+    def as_dict(self):
+        return {"result": [self.model_file, self.camera_info]}
+
+
+class PreviewText(_UIOutput):
+    def __init__(self, value: str, **kwargs):
+        self.value = value
+
+    def as_dict(self):
+        return {"text": (self.value,)}
+
+
+class _UI:
+    SavedResult = SavedResult
+    SavedImages = SavedImages
+    SavedAudios = SavedAudios
+    ImageSaveHelper = ImageSaveHelper
+    AudioSaveHelper = AudioSaveHelper
+    PreviewImage = PreviewImage
+    PreviewMask = PreviewMask
+    PreviewAudio = PreviewAudio
+    PreviewVideo = PreviewVideo
+    PreviewUI3D = PreviewUI3D
+    PreviewText = PreviewText
--- a/comfy_api/latest/_util/__init__.py
+++ b/comfy_api/latest/_util/__init__.py
@@ -0,0 +1,8 @@
+from .video_types import VideoContainer, VideoCodec, VideoComponents
+
+__all__ = [
+    # Utility Types
+    "VideoContainer",
+    "VideoCodec",
+    "VideoComponents",
+]
--- a/comfy_api/latest/_util/video_types.py
+++ b/comfy_api/latest/_util/video_types.py
@@ -0,0 +1,52 @@
+from __future__ import annotations
+from dataclasses import dataclass
+from enum import Enum
+from fractions import Fraction
+from typing import Optional
+from comfy_api.latest._input import ImageInput, AudioInput
+
+class VideoCodec(str, Enum):
+    AUTO = "auto"
+    H264 = "h264"
+
+    @classmethod
+    def as_input(cls) -> list[str]:
+        """
+        Returns a list of codec names that can be used as node input.
+        """
+        return [member.value for member in cls]
+
+class VideoContainer(str, Enum):
+    AUTO = "auto"
+    MP4 = "mp4"
+
+    @classmethod
+    def as_input(cls) -> list[str]:
+        """
+        Returns a list of container names that can be used as node input.
+        """
+        return [member.value for member in cls]
+
+    @classmethod
+    def get_extension(cls, value) -> str:
+        """
+        Returns the file extension for the container.
+        """
+        if isinstance(value, str):
+            value = cls(value)
+        if value == VideoContainer.MP4 or value == VideoContainer.AUTO:
+            return "mp4"
+        return ""
+
+@dataclass
+class VideoComponents:
+    """
+    Dataclass representing the components of a video.
+    """
+
+    images: ImageInput
+    frame_rate: Fraction
+    audio: Optional[AudioInput] = None
+    metadata: Optional[dict] = None
+
+
--- a/comfy_api/latest/generated/ComfyAPISyncStub.pyi
+++ b/comfy_api/latest/generated/ComfyAPISyncStub.pyi
@@ -0,0 +1,20 @@
+from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple
+from comfy_api.latest import ComfyAPI_latest
+from PIL.Image import Image
+from torch import Tensor
+class ComfyAPISyncStub:
+    def __init__(self) -> None: ...
+
+    class ExecutionSync:
+        def __init__(self) -> None: ...
+        """
+        Update the progress bar displayed in the ComfyUI interface.
+
+        This function allows custom nodes and API calls to report their progress
+        back to the user interface, providing visual feedback during long operations.
+
+        Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+        """
+        def set_progress(self, value: float, max_value: float, node_id: Union[str, None] = None, preview_image: Union[Image, Tensor, None] = None, ignore_size_limit: bool = False) -> None: ...
+
+    execution: ExecutionSync
--- a/comfy_api/util.py
+++ b/comfy_api/util.py
@@ -0,0 +1,8 @@
+# This file only exists for backwards compatibility.
+from comfy_api.latest._util import VideoCodec, VideoContainer, VideoComponents
+
+__all__ = [
+    "VideoCodec",
+    "VideoContainer",
+    "VideoComponents",
+]
--- a/comfy_api/util/__init__.py
+++ b/comfy_api/util/__init__.py
@@ -1,7 +1,7 @@
-from .video_types import VideoContainer, VideoCodec, VideoComponents
+# This file only exists for backwards compatibility.
+from comfy_api.latest._util import VideoContainer, VideoCodec, VideoComponents

 __all__ = [
-    # Utility Types
    "VideoContainer",
    "VideoCodec",
    "VideoComponents",
--- a/comfy_api/util/video_types.py
+++ b/comfy_api/util/video_types.py
@@ -1,51 +1,12 @@
-from __future__ import annotations
-from dataclasses import dataclass
-from enum import Enum
-from fractions import Fraction
-from typing import Optional
-from comfy_api.input import ImageInput, AudioInput
-
-class VideoCodec(str, Enum):
-    AUTO = "auto"
-    H264 = "h264"
-
-    @classmethod
-    def as_input(cls) -> list[str]:
-        """
-        Returns a list of codec names that can be used as node input.
-        """
-        return [member.value for member in cls]
-
-class VideoContainer(str, Enum):
-    AUTO = "auto"
-    MP4 = "mp4"
-
-    @classmethod
-    def as_input(cls) -> list[str]:
-        """
-        Returns a list of container names that can be used as node input.
-        """
-        return [member.value for member in cls]
-
-    @classmethod
-    def get_extension(cls, value) -> str:
-        """
-        Returns the file extension for the container.
-        """
-        if isinstance(value, str):
-            value = cls(value)
-        if value == VideoContainer.MP4 or value == VideoContainer.AUTO:
-            return "mp4"
-        return ""
-
-@dataclass
-class VideoComponents:
-    """
-    Dataclass representing the components of a video.
-    """
-
-    images: ImageInput
-    frame_rate: Fraction
-    audio: Optional[AudioInput] = None
-    metadata: Optional[dict] = None
+# This file only exists for backwards compatibility.
+from comfy_api.latest._util.video_types import (
+    VideoContainer,
+    VideoCodec,
+    VideoComponents,
+)

+__all__ = [
+    "VideoContainer",
+    "VideoCodec",
+    "VideoComponents",
+]
--- a/comfy_api/v0_0_1/__init__.py
+++ b/comfy_api/v0_0_1/__init__.py
@@ -0,0 +1,42 @@
+from comfy_api.v0_0_2 import (
+    ComfyAPIAdapter_v0_0_2,
+    Input as Input_v0_0_2,
+    InputImpl as InputImpl_v0_0_2,
+    Types as Types_v0_0_2,
+)
+from typing import Type, TYPE_CHECKING
+from comfy_api.internal.async_to_sync import create_sync_class
+
+
+# This version only exists to serve as a template for future version adapters.
+# There is no reason anyone should ever use it.
+class ComfyAPIAdapter_v0_0_1(ComfyAPIAdapter_v0_0_2):
+    VERSION = "0.0.1"
+    STABLE = True
+
+class Input(Input_v0_0_2):
+    pass
+
+class InputImpl(InputImpl_v0_0_2):
+    pass
+
+class Types(Types_v0_0_2):
+    pass
+
+ComfyAPI = ComfyAPIAdapter_v0_0_1
+
+# Create a synchronous version of the API
+if TYPE_CHECKING:
+    from comfy_api.v0_0_1.generated.ComfyAPISyncStub import ComfyAPISyncStub  # type: ignore
+
+    ComfyAPISync: Type[ComfyAPISyncStub]
+
+ComfyAPISync = create_sync_class(ComfyAPIAdapter_v0_0_1)
+
+__all__ = [
+    "ComfyAPI",
+    "ComfyAPISync",
+    "Input",
+    "InputImpl",
+    "Types",
+]
--- a/comfy_api/v0_0_1/generated/ComfyAPISyncStub.pyi
+++ b/comfy_api/v0_0_1/generated/ComfyAPISyncStub.pyi
@@ -0,0 +1,20 @@
+from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple
+from comfy_api.v0_0_1 import ComfyAPIAdapter_v0_0_1
+from PIL.Image import Image
+from torch import Tensor
+class ComfyAPISyncStub:
+    def __init__(self) -> None: ...
+
+    class ExecutionSync:
+        def __init__(self) -> None: ...
+        """
+        Update the progress bar displayed in the ComfyUI interface.
+
+        This function allows custom nodes and API calls to report their progress
+        back to the user interface, providing visual feedback during long operations.
+
+        Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+        """
+        def set_progress(self, value: float, max_value: float, node_id: Union[str, None] = None, preview_image: Union[Image, Tensor, None] = None, ignore_size_limit: bool = False) -> None: ...
+
+    execution: ExecutionSync
--- a/comfy_api/v0_0_2/__init__.py
+++ b/comfy_api/v0_0_2/__init__.py
@@ -0,0 +1,45 @@
+from comfy_api.latest import (
+    ComfyAPI_latest,
+    Input as Input_latest,
+    InputImpl as InputImpl_latest,
+    Types as Types_latest,
+)
+from typing import Type, TYPE_CHECKING
+from comfy_api.internal.async_to_sync import create_sync_class
+from comfy_api.latest import io, ui, ComfyExtension  #noqa: F401
+
+
+class ComfyAPIAdapter_v0_0_2(ComfyAPI_latest):
+    VERSION = "0.0.2"
+    STABLE = False
+
+
+class Input(Input_latest):
+    pass
+
+
+class InputImpl(InputImpl_latest):
+    pass
+
+
+class Types(Types_latest):
+    pass
+
+
+ComfyAPI = ComfyAPIAdapter_v0_0_2
+
+# Create a synchronous version of the API
+if TYPE_CHECKING:
+    from comfy_api.v0_0_2.generated.ComfyAPISyncStub import ComfyAPISyncStub  # type: ignore
+
+    ComfyAPISync: Type[ComfyAPISyncStub]
+ComfyAPISync = create_sync_class(ComfyAPIAdapter_v0_0_2)
+
+__all__ = [
+    "ComfyAPI",
+    "ComfyAPISync",
+    "Input",
+    "InputImpl",
+    "Types",
+    "ComfyExtension",
+]
--- a/comfy_api/v0_0_2/generated/ComfyAPISyncStub.pyi
+++ b/comfy_api/v0_0_2/generated/ComfyAPISyncStub.pyi
@@ -0,0 +1,20 @@
+from typing import Any, Dict, List, Optional, Tuple, Union, Set, Sequence, cast, NamedTuple
+from comfy_api.v0_0_2 import ComfyAPIAdapter_v0_0_2
+from PIL.Image import Image
+from torch import Tensor
+class ComfyAPISyncStub:
+    def __init__(self) -> None: ...
+
+    class ExecutionSync:
+        def __init__(self) -> None: ...
+        """
+        Update the progress bar displayed in the ComfyUI interface.
+
+        This function allows custom nodes and API calls to report their progress
+        back to the user interface, providing visual feedback during long operations.
+
+        Migration from previous API: comfy.utils.PROGRESS_BAR_HOOK
+        """
+        def set_progress(self, value: float, max_value: float, node_id: Union[str, None] = None, preview_image: Union[Image, Tensor, None] = None, ignore_size_limit: bool = False) -> None: ...
+
+    execution: ExecutionSync
--- a/comfy_api/version_list.py
+++ b/comfy_api/version_list.py
@@ -0,0 +1,12 @@
+from comfy_api.latest import ComfyAPI_latest
+from comfy_api.v0_0_2 import ComfyAPIAdapter_v0_0_2
+from comfy_api.v0_0_1 import ComfyAPIAdapter_v0_0_1
+from comfy_api.internal import ComfyAPIBase
+from typing import List, Type
+
+supported_versions: List[Type[ComfyAPIBase]] = [
+    ComfyAPI_latest,
+    ComfyAPIAdapter_v0_0_2,
+    ComfyAPIAdapter_v0_0_1,
+]
+
--- a/comfy_api_nodes/README.md
+++ b/comfy_api_nodes/README.md
@@ -2,7 +2,7 @@

 ## Introduction 

-Below are a collection of nodes that work by calling external APIs. More information available in our [docs](https://docs.comfy.org/tutorials/api-nodes/overview#api-nodes).
+Below is a collection of nodes that work by calling external APIs. More information is available in our [docs](https://docs.comfy.org/tutorials/api-nodes/overview).

 ## Development

--- a/comfy_api_nodes/apinode_utils.py
+++ b/comfy_api_nodes/apinode_utils.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+import aiohttp
 import io
 import logging
 import mimetypes
@@ -21,7 +22,6 @@ from server import PromptServer

 import numpy as np
 from PIL import Image
-import requests
 import torch
 import math
 import base64
@@ -30,7 +30,7 @@ from io import BytesIO
 import av


-def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFromFile:
+async def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFromFile:
    """Downloads a video from a URL and returns a `VIDEO` output.

    Args:
@@ -39,7 +39,7 @@ def download_url_to_video_output(video_url: str, timeout: int = None) -> VideoFr
    Returns:
        A Comfy node `VIDEO` output.
    """
-    video_io = download_url_to_bytesio(video_url, timeout)
+    video_io = await download_url_to_bytesio(video_url, timeout)
    if video_io is None:
        error_msg = f"Failed to download video from {video_url}"
        logging.error(error_msg)
@@ -62,7 +62,7 @@ def downscale_image_tensor(image, total_pixels=1536 * 1024) -> torch.Tensor:
    return s


-def validate_and_cast_response(
+async def validate_and_cast_response(
    response, timeout: int = None, node_id: Union[str, None] = None
 ) -> torch.Tensor:
    """Validates and casts a response to a torch.Tensor.
@@ -86,35 +86,24 @@ def validate_and_cast_response(
    image_tensors: list[torch.Tensor] = []

    # Process each image in the data array
-    for image_data in data:
-        image_url = image_data.url
-        b64_data = image_data.b64_json
+    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=timeout)) as session:
+        for img_data in data:
+            img_bytes: bytes
+            if img_data.b64_json:
+                img_bytes = base64.b64decode(img_data.b64_json)
+            elif img_data.url:
+                if node_id:
+                    PromptServer.instance.send_progress_text(f"Result URL: {img_data.url}", node_id)
+                async with session.get(img_data.url) as resp:
+                    if resp.status != 200:
+                        raise ValueError("Failed to download generated image")
+                    img_bytes = await resp.read()
+            else:
+                raise ValueError("Invalid image payload – neither URL nor base64 data present.")

-        if not image_url and not b64_data:
-            raise ValueError("No image was generated in the response")
-
-        if b64_data:
-            img_data = base64.b64decode(b64_data)
-            img = Image.open(io.BytesIO(img_data))
-
-        elif image_url:
-            if node_id:
-                PromptServer.instance.send_progress_text(
-                    f"Result URL: {image_url}", node_id
-                )
-            img_response = requests.get(image_url, timeout=timeout)
-            if img_response.status_code != 200:
-                raise ValueError("Failed to download the image")
-            img = Image.open(io.BytesIO(img_response.content))
-
-        img = img.convert("RGBA")
-
-        # Convert to numpy array, normalize to float32 between 0 and 1
-        img_array = np.array(img).astype(np.float32) / 255.0
-        img_tensor = torch.from_numpy(img_array)
-
-        # Add to list of tensors
-        image_tensors.append(img_tensor)
+            pil_img = Image.open(BytesIO(img_bytes)).convert("RGBA")
+            arr = np.asarray(pil_img).astype(np.float32) / 255.0
+            image_tensors.append(torch.from_numpy(arr))

    return torch.stack(image_tensors, dim=0)

@@ -175,7 +164,7 @@ def mimetype_to_extension(mime_type: str) -> str:
    return mime_type.split("/")[-1].lower()


-def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
+async def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
    """Downloads content from a URL using requests and returns it as BytesIO.

    Args:
@@ -185,9 +174,11 @@ def download_url_to_bytesio(url: str, timeout: int = None) -> BytesIO:
    Returns:
        BytesIO object containing the downloaded content.
    """
-    response = requests.get(url, stream=True, timeout=timeout)
-    response.raise_for_status()  # Raises HTTPError for bad responses (4XX or 5XX)
-    return BytesIO(response.content)
+    timeout_cfg = aiohttp.ClientTimeout(total=timeout) if timeout else None
+    async with aiohttp.ClientSession(timeout=timeout_cfg) as session:
+        async with session.get(url) as resp:
+            resp.raise_for_status()  # Raises aiohttp.ClientResponseError for bad responses (4XX or 5XX)
+            return BytesIO(await resp.read())


 def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch.Tensor:
@@ -210,15 +201,15 @@ def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch
    return torch.from_numpy(image_array).unsqueeze(0)


-def download_url_to_image_tensor(url: str, timeout: int = None) -> torch.Tensor:
+async def download_url_to_image_tensor(url: str, timeout: int = None) -> torch.Tensor:
    """Downloads an image from a URL and returns a [B, H, W, C] tensor."""
-    image_bytesio = download_url_to_bytesio(url, timeout)
+    image_bytesio = await download_url_to_bytesio(url, timeout)
    return bytesio_to_image_tensor(image_bytesio)


-def process_image_response(response: requests.Response) -> torch.Tensor:
+def process_image_response(response_content: bytes | str) -> torch.Tensor:
    """Uses content from a Response object and converts it to a torch.Tensor"""
-    return bytesio_to_image_tensor(BytesIO(response.content))
+    return bytesio_to_image_tensor(BytesIO(response_content))


 def _tensor_to_pil(image: torch.Tensor, total_pixels: int = 2048 * 2048) -> Image.Image:
@@ -336,10 +327,10 @@ def text_filepath_to_data_uri(filepath: str) -> str:
    return f"data:{mime_type};base64,{base64_string}"


-def upload_file_to_comfyapi(
+async def upload_file_to_comfyapi(
    file_bytes_io: BytesIO,
    filename: str,
-    upload_mime_type: str,
+    upload_mime_type: Optional[str],
    auth_kwargs: Optional[dict[str, str]] = None,
 ) -> str:
    """
@@ -354,7 +345,10 @@ def upload_file_to_comfyapi(
    Returns:
        The download URL for the uploaded file.
    """
-    request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
+    if upload_mime_type is None:
+        request_object = UploadRequest(file_name=filename)
+    else:
+        request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
    operation = SynchronousOperation(
        endpoint=ApiEndpoint(
            path="/customers/storage",
@@ -366,12 +360,8 @@ def upload_file_to_comfyapi(
        auth_kwargs=auth_kwargs,
    )

-    response: UploadResponse = operation.execute()
-    upload_response = ApiClient.upload_file(
-        response.upload_url, file_bytes_io, content_type=upload_mime_type
-    )
-    upload_response.raise_for_status()
-
+    response: UploadResponse = await operation.execute()
+    await ApiClient.upload_file(response.upload_url, file_bytes_io, content_type=upload_mime_type)
    return response.download_url


@@ -399,7 +389,7 @@ def video_to_base64_string(
    return base64.b64encode(video_bytes_io.getvalue()).decode("utf-8")


-def upload_video_to_comfyapi(
+async def upload_video_to_comfyapi(
    video: VideoInput,
    auth_kwargs: Optional[dict[str, str]] = None,
    container: VideoContainer = VideoContainer.MP4,
@@ -439,9 +429,7 @@ def upload_video_to_comfyapi(
    video.save_to(video_bytes_io, format=container, codec=codec)
    video_bytes_io.seek(0)

-    return upload_file_to_comfyapi(
-        video_bytes_io, filename, upload_mime_type, auth_kwargs
-    )
+    return await upload_file_to_comfyapi(video_bytes_io, filename, upload_mime_type, auth_kwargs)


 def audio_tensor_to_contiguous_ndarray(waveform: torch.Tensor) -> np.ndarray:
@@ -501,7 +489,7 @@ def audio_ndarray_to_bytesio(
    return audio_bytes_io


-def upload_audio_to_comfyapi(
+async def upload_audio_to_comfyapi(
    audio: AudioInput,
    auth_kwargs: Optional[dict[str, str]] = None,
    container_format: str = "mp4",
@@ -527,7 +515,7 @@ def upload_audio_to_comfyapi(
        audio_data_np, sample_rate, container_format, codec_name
    )

-    return upload_file_to_comfyapi(audio_bytes_io, filename, mime_type, auth_kwargs)
+    return await upload_file_to_comfyapi(audio_bytes_io, filename, mime_type, auth_kwargs)


 def audio_to_base64_string(
@@ -544,7 +532,7 @@ def audio_to_base64_string(
    return base64.b64encode(audio_bytes).decode("utf-8")


-def upload_images_to_comfyapi(
+async def upload_images_to_comfyapi(
    image: torch.Tensor,
    max_images=8,
    auth_kwargs: Optional[dict[str, str]] = None,
@@ -561,55 +549,15 @@ def upload_images_to_comfyapi(
        mime_type: Optional MIME type for the image.
    """
    # if batch, try to upload each file if max_images is greater than 0
-    idx_image = 0
    download_urls: list[str] = []
    is_batch = len(image.shape) > 3
-    batch_length = 1
-    if is_batch:
-        batch_length = image.shape[0]
-    while True:
-        curr_image = image
-        if len(image.shape) > 3:
-            curr_image = image[idx_image]
-        # get BytesIO version of image
-        img_binary = tensor_to_bytesio(curr_image, mime_type=mime_type)
-        # first, request upload/download urls from comfy API
-        if not mime_type:
-            request_object = UploadRequest(file_name=img_binary.name)
-        else:
-            request_object = UploadRequest(
-                file_name=img_binary.name, content_type=mime_type
-            )
-        operation = SynchronousOperation(
-            endpoint=ApiEndpoint(
-                path="/customers/storage",
-                method=HttpMethod.POST,
-                request_model=UploadRequest,
-                response_model=UploadResponse,
-            ),
-            request=request_object,
-            auth_kwargs=auth_kwargs,
-        )
-        response = operation.execute()
+    batch_len = image.shape[0] if is_batch else 1

-        upload_response = ApiClient.upload_file(
-            response.upload_url, img_binary, content_type=mime_type
-        )
-        # verify success
-        try:
-            upload_response.raise_for_status()
-        except requests.exceptions.HTTPError as e:
-            raise ValueError(f"Could not upload one or more images: {e}") from e
-        # add download_url to list
-        download_urls.append(response.download_url)
-
-        idx_image += 1
-        # stop uploading additional files if done
-        if is_batch and max_images > 0:
-            if idx_image >= max_images:
-                break
-            if idx_image >= batch_length:
-                break
+    for idx in range(min(batch_len, max_images)):
+        tensor = image[idx] if is_batch else image
+        img_io = tensor_to_bytesio(tensor, mime_type=mime_type)
+        url = await upload_file_to_comfyapi(img_io, img_io.name, mime_type, auth_kwargs)
+        download_urls.append(url)
    return download_urls


--- a/comfy_api_nodes/apis/__init__.py
+++ b/comfy_api_nodes/apis/__init__.py
--- a/comfy_api_nodes/apis/client.py
+++ b/comfy_api_nodes/apis/client.py
--- a/comfy_api_nodes/apis/request_logger.py
+++ b/comfy_api_nodes/apis/request_logger.py
@@ -1,3 +1,5 @@
+from __future__ import annotations
+
 import os
 import datetime
 import json
--- a/comfy_api_nodes/apis/tripo_api.py
+++ b/comfy_api_nodes/apis/tripo_api.py
@@ -127,7 +127,7 @@ class TripoTextToModelRequest(BaseModel):
    type: TripoTaskType = Field(TripoTaskType.TEXT_TO_MODEL, description='Type of task')
    prompt: str = Field(..., description='The text prompt describing the model to generate', max_length=1024)
    negative_prompt: Optional[str] = Field(None, description='The negative text prompt', max_length=1024)
-    model_version: Optional[TripoModelVersion] = TripoModelVersion.V2_5
+    model_version: Optional[TripoModelVersion] = TripoModelVersion.v2_5_20250123
    face_limit: Optional[int] = Field(None, description='The number of faces to limit the generation to')
    texture: Optional[bool] = Field(True, description='Whether to apply texture to the generated model')
    pbr: Optional[bool] = Field(True, description='Whether to apply PBR to the generated model')
--- a/comfy_api_nodes/nodes_bfl.py
+++ b/comfy_api_nodes/nodes_bfl.py
@@ -1,3 +1,4 @@
+import asyncio
 import io
 from inspect import cleandoc
 from typing import Union, Optional
@@ -28,7 +29,7 @@ from comfy_api_nodes.apinode_utils import (

 import numpy as np
 from PIL import Image
-import requests
+import aiohttp
 import torch
 import base64
 import time
@@ -44,18 +45,18 @@ def convert_mask_to_image(mask: torch.Tensor):
    return mask


-def handle_bfl_synchronous_operation(
+async def handle_bfl_synchronous_operation(
    operation: SynchronousOperation,
    timeout_bfl_calls=360,
    node_id: Union[str, None] = None,
 ):
-    response_api: BFLFluxProGenerateResponse = operation.execute()
-    return _poll_until_generated(
+    response_api: BFLFluxProGenerateResponse = await operation.execute()
+    return await _poll_until_generated(
        response_api.polling_url, timeout=timeout_bfl_calls, node_id=node_id
    )


-def _poll_until_generated(
+async def _poll_until_generated(
    polling_url: str, timeout=360, node_id: Union[str, None] = None
 ):
    # used bfl-comfy-nodes to verify code implementation:
@@ -66,55 +67,56 @@ def _poll_until_generated(
    retry_404_seconds = 2
    retry_202_seconds = 2
    retry_pending_seconds = 1
-    request = requests.Request(method=HttpMethod.GET, url=polling_url)
-    # NOTE: should True loop be replaced with checking if workflow has been interrupted?
-    while True:
-        if node_id:
-            time_elapsed = time.time() - start_time
-            PromptServer.instance.send_progress_text(
-                f"Generating ({time_elapsed:.0f}s)", node_id
-            )

-        response = requests.Session().send(request.prepare())
-        if response.status_code == 200:
-            result = response.json()
-            if result["status"] == BFLStatus.ready:
-                img_url = result["result"]["sample"]
-                if node_id:
-                    PromptServer.instance.send_progress_text(
-                        f"Result URL: {img_url}", node_id
-                    )
-                img_response = requests.get(img_url)
-                return process_image_response(img_response)
-            elif result["status"] in [
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-            ]:
-                status = result["status"]
-                raise Exception(
-                    f"BFL API did not return an image due to: {status}."
+    async with aiohttp.ClientSession() as session:
+        # NOTE: should True loop be replaced with checking if workflow has been interrupted?
+        while True:
+            if node_id:
+                time_elapsed = time.time() - start_time
+                PromptServer.instance.send_progress_text(
+                    f"Generating ({time_elapsed:.0f}s)", node_id
                )
-            elif result["status"] == BFLStatus.error:
-                raise Exception(f"BFL API encountered an error: {result}.")
-            elif result["status"] == BFLStatus.pending:
-                time.sleep(retry_pending_seconds)
-                continue
-        elif response.status_code == 404:
-            if retries_404 < max_retries_404:
-                retries_404 += 1
-                time.sleep(retry_404_seconds)
-                continue
-            raise Exception(
-                f"BFL API could not find task after {max_retries_404} tries."
-            )
-        elif response.status_code == 202:
-            time.sleep(retry_202_seconds)
-        elif time.time() - start_time > timeout:
-            raise Exception(
-                f"BFL API experienced a timeout; could not return request under {timeout} seconds."
-            )
-        else:
-            raise Exception(f"BFL API encountered an error: {response.json()}")
+
+            async with session.get(polling_url) as response:
+                if response.status == 200:
+                    result = await response.json()
+                    if result["status"] == BFLStatus.ready:
+                        img_url = result["result"]["sample"]
+                        if node_id:
+                            PromptServer.instance.send_progress_text(
+                                f"Result URL: {img_url}", node_id
+                            )
+                        async with session.get(img_url) as img_resp:
+                            return process_image_response(await img_resp.content.read())
+                    elif result["status"] in [
+                        BFLStatus.request_moderated,
+                        BFLStatus.content_moderated,
+                    ]:
+                        status = result["status"]
+                        raise Exception(
+                            f"BFL API did not return an image due to: {status}."
+                        )
+                    elif result["status"] == BFLStatus.error:
+                        raise Exception(f"BFL API encountered an error: {result}.")
+                    elif result["status"] == BFLStatus.pending:
+                        await asyncio.sleep(retry_pending_seconds)
+                        continue
+                elif response.status == 404:
+                    if retries_404 < max_retries_404:
+                        retries_404 += 1
+                        await asyncio.sleep(retry_404_seconds)
+                        continue
+                    raise Exception(
+                        f"BFL API could not find task after {max_retries_404} tries."
+                    )
+                elif response.status == 202:
+                    await asyncio.sleep(retry_202_seconds)
+                elif time.time() - start_time > timeout:
+                    raise Exception(
+                        f"BFL API experienced a timeout; could not return request under {timeout} seconds."
+                    )
+                else:
+                    raise Exception(f"BFL API encountered an error: {await response.json()}")

 def convert_image_to_base64(image: torch.Tensor):
    scaled_image = downscale_image_tensor(image, total_pixels=2048 * 2048)
@@ -222,7 +224,7 @@ class FluxProUltraImageNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -266,7 +268,7 @@ class FluxProUltraImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -354,7 +356,7 @@ class FluxKontextProImageNode(ComfyNodeABC):

    BFL_PATH = "/proxy/bfl/flux-kontext-pro/generate"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -397,7 +399,7 @@ class FluxKontextProImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -489,7 +491,7 @@ class FluxProImageNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        prompt_upsampling,
@@ -524,7 +526,7 @@ class FluxProImageNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -632,7 +634,7 @@ class FluxProExpandNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -670,7 +672,7 @@ class FluxProExpandNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -744,7 +746,7 @@ class FluxProFillNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        mask: torch.Tensor,
@@ -780,7 +782,7 @@ class FluxProFillNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -879,7 +881,7 @@ class FluxProCannyNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        control_image: torch.Tensor,
        prompt: str,
@@ -929,7 +931,7 @@ class FluxProCannyNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


@@ -1008,7 +1010,7 @@ class FluxProDepthNode(ComfyNodeABC):
    API_NODE = True
    CATEGORY = "api node/image/BFL"

-    def api_call(
+    async def api_call(
        self,
        control_image: torch.Tensor,
        prompt: str,
@@ -1045,7 +1047,7 @@ class FluxProDepthNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        output_image = handle_bfl_synchronous_operation(operation, node_id=unique_id)
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=unique_id)
        return (output_image,)


--- a/comfy_api_nodes/nodes_gemini.py
+++ b/comfy_api_nodes/nodes_gemini.py
@@ -2,6 +2,8 @@
 API Nodes for Gemini Multimodal LLM Usage via Remote API
 See: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
 """
+from __future__ import annotations
+

 import os
 from enum import Enum
@@ -301,7 +303,7 @@ class GeminiNode(ComfyNodeABC):
        """
        return GeminiPart(text=text)

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: GeminiModel,
@@ -330,7 +332,7 @@ class GeminiNode(ComfyNodeABC):
            parts.extend(files)

        # Create response
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=get_gemini_endpoint(model),
            request=GeminiGenerateContentRequest(
                contents=[
--- a/comfy_api_nodes/nodes_ideogram.py
+++ b/comfy_api_nodes/nodes_ideogram.py
@@ -212,7 +212,7 @@ V3_RESOLUTIONS= [
    "1536x640"
 ]

-def download_and_process_images(image_urls):
+async def download_and_process_images(image_urls):
    """Helper function to download and process multiple images from URLs"""

    # Initialize list to store image tensors
@@ -220,7 +220,7 @@ def download_and_process_images(image_urls):

    for image_url in image_urls:
        # Using functions from apinode_utils.py to handle downloading and processing
-        image_bytesio = download_url_to_bytesio(image_url)  # Download image content to BytesIO
+        image_bytesio = await download_url_to_bytesio(image_url)  # Download image content to BytesIO
        img_tensor = bytesio_to_image_tensor(image_bytesio, mode="RGB")  # Convert to torch.Tensor with RGB mode
        image_tensors.append(img_tensor)

@@ -328,7 +328,7 @@ class IdeogramV1(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        turbo=False,
@@ -367,7 +367,7 @@ class IdeogramV1(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -378,7 +378,7 @@ class IdeogramV1(ComfyNodeABC):
            raise Exception("No image URLs were generated in the response")

        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        return (await download_and_process_images(image_urls),)


 class IdeogramV2(ComfyNodeABC):
@@ -487,7 +487,7 @@ class IdeogramV2(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        turbo=False,
@@ -543,7 +543,7 @@ class IdeogramV2(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -554,7 +554,7 @@ class IdeogramV2(ComfyNodeABC):
            raise Exception("No image URLs were generated in the response")

        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        return (await download_and_process_images(image_urls),)

 class IdeogramV3(ComfyNodeABC):
    """
@@ -653,7 +653,7 @@ class IdeogramV3(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        image=None,
@@ -774,7 +774,7 @@ class IdeogramV3(ComfyNodeABC):
            )

        # Execute the operation and process response
-        response = operation.execute()
+        response = await operation.execute()

        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")
@@ -785,7 +785,7 @@ class IdeogramV3(ComfyNodeABC):
            raise Exception("No image URLs were generated in the response")

        display_image_urls_on_node(image_urls, unique_id)
-        return (download_and_process_images(image_urls),)
+        return (await download_and_process_images(image_urls),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_kling.py
+++ b/comfy_api_nodes/nodes_kling.py
@@ -109,7 +109,7 @@ class KlingApiError(Exception):
    pass


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, R],
    result_url_extractor: Optional[Callable[[R], str]] = None,
@@ -117,7 +117,7 @@ def poll_until_finished(
    node_id: Optional[str] = None,
 ) -> R:
    """Polls the Kling API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            KlingTaskStatus.succeed.value,
@@ -278,18 +278,18 @@ def get_images_urls_from_response(response) -> Optional[str]:
        return None


-def video_result_to_node_output(
+async def video_result_to_node_output(
    video: KlingVideoResult,
 ) -> tuple[VideoFromFile, str, str]:
    """Converts a KlingVideoResult to a tuple of (VideoFromFile, str, str) to be used as a ComfyUI node output."""
    return (
-        download_url_to_video_output(video.url),
+        await download_url_to_video_output(str(video.url)),
        str(video.id),
        str(video.duration),
    )


-def image_result_to_node_output(
+async def image_result_to_node_output(
    images: list[KlingImageResult],
 ) -> torch.Tensor:
    """
@@ -297,9 +297,9 @@ def image_result_to_node_output(
    If multiple images are returned, they will be stacked along the batch dimension.
    """
    if len(images) == 1:
-        return download_url_to_image_tensor(images[0].url)
+        return await download_url_to_image_tensor(str(images[0].url))
    else:
-        return torch.cat([download_url_to_image_tensor(image.url) for image in images])
+        return torch.cat([await download_url_to_image_tensor(str(image.url)) for image in images])


 class KlingNodeBase(ComfyNodeABC):
@@ -467,10 +467,10 @@ class KlingTextToVideoNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Text to Video Node"

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingText2VideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_TEXT_TO_VIDEO}/{task_id}",
@@ -483,7 +483,7 @@ class KlingTextToVideoNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -519,17 +519,17 @@ class KlingTextToVideoNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)

        task_id = task_creation_response.data.task_id
-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingCameraControlT2VNode(KlingTextToVideoNode):
@@ -581,7 +581,7 @@ class KlingCameraControlT2VNode(KlingTextToVideoNode):

    DESCRIPTION = "Transform text into cinematic videos with professional camera movements that simulate real-world cinematography. Control virtual camera actions including zoom, rotation, pan, tilt, and first-person view, while maintaining focus on your original text."

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -591,7 +591,7 @@ class KlingCameraControlT2VNode(KlingTextToVideoNode):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            model_name=KlingVideoGenModelName.kling_v1,
            cfg_scale=cfg_scale,
            mode=KlingVideoGenMode.std,
@@ -670,10 +670,10 @@ class KlingImage2VideoNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Image to Video Node"

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingImage2VideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_IMAGE_TO_VIDEO}/{task_id}",
@@ -686,7 +686,7 @@ class KlingImage2VideoNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        prompt: str,
@@ -733,17 +733,17 @@ class KlingImage2VideoNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingCameraControlI2VNode(KlingImage2VideoNode):
@@ -798,7 +798,7 @@ class KlingCameraControlI2VNode(KlingImage2VideoNode):

    DESCRIPTION = "Transform still images into cinematic videos with professional camera movements that simulate real-world cinematography. Control virtual camera actions including zoom, rotation, pan, tilt, and first-person view, while maintaining focus on your original image."

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        prompt: str,
@@ -809,7 +809,7 @@ class KlingCameraControlI2VNode(KlingImage2VideoNode):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            model_name=KlingVideoGenModelName.kling_v1_5,
            start_frame=start_frame,
            cfg_scale=cfg_scale,
@@ -897,7 +897,7 @@ class KlingStartEndFrameNode(KlingImage2VideoNode):

    DESCRIPTION = "Generate a video sequence that transitions between your provided start and end images. The node creates all frames in between, producing a smooth transformation from the first frame to the last."

-    def api_call(
+    async def api_call(
        self,
        start_frame: torch.Tensor,
        end_frame: torch.Tensor,
@@ -912,7 +912,7 @@ class KlingStartEndFrameNode(KlingImage2VideoNode):
        mode, duration, model_name = KlingStartEndFrameNode.get_mode_string_mapping()[
            mode
        ]
-        return super().api_call(
+        return await super().api_call(
            prompt=prompt,
            negative_prompt=negative_prompt,
            model_name=model_name,
@@ -964,10 +964,10 @@ class KlingVideoExtendNode(KlingNodeBase):
    RETURN_NAMES = ("VIDEO", "video_id", "duration")
    DESCRIPTION = "Kling Video Extend Node. Extend videos made by other Kling nodes. The video_id is created by using other Kling Nodes."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVideoExtendResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIDEO_EXTEND}/{task_id}",
@@ -980,7 +980,7 @@ class KlingVideoExtendNode(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        negative_prompt: str,
@@ -1006,17 +1006,17 @@ class KlingVideoExtendNode(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingVideoEffectsBase(KlingNodeBase):
@@ -1025,10 +1025,10 @@ class KlingVideoEffectsBase(KlingNodeBase):
    RETURN_TYPES = ("VIDEO", "STRING", "STRING")
    RETURN_NAMES = ("VIDEO", "video_id", "duration")

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVideoEffectsResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIDEO_EFFECTS}/{task_id}",
@@ -1041,7 +1041,7 @@ class KlingVideoEffectsBase(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        dual_character: bool,
        effect_scene: KlingDualCharacterEffectsScene | KlingSingleImageEffectsScene,
@@ -1084,17 +1084,17 @@ class KlingVideoEffectsBase(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
@@ -1142,7 +1142,7 @@ class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
    RETURN_TYPES = ("VIDEO", "STRING")
    RETURN_NAMES = ("VIDEO", "duration")

-    def api_call(
+    async def api_call(
        self,
        image_left: torch.Tensor,
        image_right: torch.Tensor,
@@ -1153,7 +1153,7 @@ class KlingDualCharacterVideoEffectNode(KlingVideoEffectsBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        video, _, duration = super().api_call(
+        video, _, duration = await super().api_call(
            dual_character=True,
            effect_scene=effect_scene,
            model_name=model_name,
@@ -1208,7 +1208,7 @@ class KlingSingleImageVideoEffectNode(KlingVideoEffectsBase):

    DESCRIPTION = "Achieve different special effects when generating a video based on the effect_scene."

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        effect_scene: KlingSingleImageEffectsScene,
@@ -1217,7 +1217,7 @@ class KlingSingleImageVideoEffectNode(KlingVideoEffectsBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            dual_character=False,
            effect_scene=effect_scene,
            model_name=model_name,
@@ -1253,11 +1253,11 @@ class KlingLipSyncBase(KlingNodeBase):
                f"Text is too long. Maximum length is {MAX_PROMPT_LENGTH_LIP_SYNC} characters."
            )

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingLipSyncResponse:
        """Polls the Kling API endpoint until the task reaches a terminal state."""
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_LIP_SYNC}/{task_id}",
@@ -1270,7 +1270,7 @@ class KlingLipSyncBase(KlingNodeBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        audio: Optional[AudioInput] = None,
@@ -1287,12 +1287,12 @@ class KlingLipSyncBase(KlingNodeBase):
        self.validate_lip_sync_video(video)

        # Upload video to Comfy API and get download URL
-        video_url = upload_video_to_comfyapi(video, auth_kwargs=kwargs)
+        video_url = await upload_video_to_comfyapi(video, auth_kwargs=kwargs)
        logging.info("Uploaded video to Comfy API. URL: %s", video_url)

        # Upload the audio file to Comfy API and get download URL
        if audio:
-            audio_url = upload_audio_to_comfyapi(audio, auth_kwargs=kwargs)
+            audio_url = await upload_audio_to_comfyapi(audio, auth_kwargs=kwargs)
            logging.info("Uploaded audio to Comfy API. URL: %s", audio_url)
        else:
            audio_url = None
@@ -1319,17 +1319,17 @@ class KlingLipSyncBase(KlingNodeBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_video_result_response(final_response)

        video = get_video_from_response(final_response)
-        return video_result_to_node_output(video)
+        return await video_result_to_node_output(video)


 class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):
@@ -1357,7 +1357,7 @@ class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):

    DESCRIPTION = "Kling Lip Sync Audio to Video Node. Syncs mouth movements in a video file to the audio content of an audio file. When using, ensure that the audio contains clearly distinguishable vocals and that the video contains a distinct face. The audio file should not be larger than 5MB. The video file should not be larger than 100MB, should have height/width between 720px and 1920px, and should be between 2s and 10s in length."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        audio: AudioInput,
@@ -1365,7 +1365,7 @@ class KlingLipSyncAudioToVideoNode(KlingLipSyncBase):
        unique_id: Optional[str] = None,
        **kwargs,
    ):
-        return super().api_call(
+        return await super().api_call(
            video=video,
            audio=audio,
            voice_language=voice_language,
@@ -1469,7 +1469,7 @@ class KlingLipSyncTextToVideoNode(KlingLipSyncBase):

    DESCRIPTION = "Kling Lip Sync Text to Video Node. Syncs mouth movements in a video file to a text prompt. The video file should not be larger than 100MB, should have height/width between 720px and 1920px, and should be between 2s and 10s in length."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        text: str,
@@ -1479,7 +1479,7 @@ class KlingLipSyncTextToVideoNode(KlingLipSyncBase):
        **kwargs,
    ):
        voice_id, voice_language = KlingLipSyncTextToVideoNode.get_voice_config()[voice]
-        return super().api_call(
+        return await super().api_call(
            video=video,
            text=text,
            voice_language=voice_language,
@@ -1533,10 +1533,10 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):

    DESCRIPTION = "Kling Virtual Try On Node. Input a human image and a cloth image to try on the cloth on the human. You can merge multiple clothing item pictures into one image with a white background."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> KlingVirtualTryOnResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_VIRTUAL_TRY_ON}/{task_id}",
@@ -1549,7 +1549,7 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        human_image: torch.Tensor,
        cloth_image: torch.Tensor,
@@ -1572,17 +1572,17 @@ class KlingVirtualTryOnNode(KlingImageGenerationBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_image_result_response(final_response)

        images = get_images_from_response(final_response)
-        return (image_result_to_node_output(images),)
+        return (await image_result_to_node_output(images),)


 class KlingImageGenerationNode(KlingImageGenerationBase):
@@ -1655,13 +1655,13 @@ class KlingImageGenerationNode(KlingImageGenerationBase):

    DESCRIPTION = "Kling Image Generation Node. Generate an image from a text prompt with an optional reference image."

-    def get_response(
+    async def get_response(
        self,
        task_id: str,
        auth_kwargs: Optional[dict[str, str]],
        node_id: Optional[str] = None,
    ) -> KlingImageGenerationsResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_IMAGE_GENERATIONS}/{task_id}",
@@ -1674,7 +1674,7 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        model_name: KlingImageGenModelName,
        prompt: str,
@@ -1714,17 +1714,17 @@ class KlingImageGenerationNode(KlingImageGenerationBase):
            auth_kwargs=kwargs,
        )

-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        validate_image_result_response(final_response)

        images = get_images_from_response(final_response)
-        return (image_result_to_node_output(images),)
+        return (await image_result_to_node_output(images),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_luma.py
+++ b/comfy_api_nodes/nodes_luma.py
@@ -38,7 +38,7 @@ from comfy_api_nodes.apinode_utils import (
 )
 from server import PromptServer

-import requests
+import aiohttp
 import torch
 from io import BytesIO

@@ -217,7 +217,7 @@ class LumaImageGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -234,19 +234,19 @@ class LumaImageGenerationNode(ComfyNodeABC):
        # handle image_luma_ref
        api_image_ref = None
        if image_luma_ref is not None:
-            api_image_ref = self._convert_luma_refs(
+            api_image_ref = await self._convert_luma_refs(
                image_luma_ref, max_refs=4, auth_kwargs=kwargs,
            )
        # handle style_luma_ref
        api_style_ref = None
        if style_image is not None:
-            api_style_ref = self._convert_style_image(
+            api_style_ref = await self._convert_style_image(
                style_image, weight=style_image_weight, auth_kwargs=kwargs,
            )
        # handle character_ref images
        character_ref = None
        if character_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                character_image, max_images=4, auth_kwargs=kwargs,
            )
            character_ref = LumaCharacterRef(
@@ -270,7 +270,7 @@ class LumaImageGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -286,19 +286,20 @@ class LumaImageGenerationNode(ComfyNodeABC):
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        img_response = requests.get(response_poll.assets.image)
-        img = process_image_response(img_response)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
        return (img,)

-    def _convert_luma_refs(
+    async def _convert_luma_refs(
        self, luma_ref: LumaReferenceChain, max_refs: int, auth_kwargs: Optional[dict[str,str]] = None
    ):
        luma_urls = []
        ref_count = 0
        for ref in luma_ref.refs:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                ref.image, max_images=1, auth_kwargs=auth_kwargs
            )
            luma_urls.append(download_urls[0])
@@ -307,13 +308,13 @@ class LumaImageGenerationNode(ComfyNodeABC):
                break
        return luma_ref.create_api_model(download_urls=luma_urls, max_refs=max_refs)

-    def _convert_style_image(
+    async def _convert_style_image(
        self, style_image: torch.Tensor, weight: float, auth_kwargs: Optional[dict[str,str]] = None
    ):
        chain = LumaReferenceChain(
            first_ref=LumaReference(image=style_image, weight=weight)
        )
-        return self._convert_luma_refs(chain, max_refs=1, auth_kwargs=auth_kwargs)
+        return await self._convert_luma_refs(chain, max_refs=1, auth_kwargs=auth_kwargs)


 class LumaImageModifyNode(ComfyNodeABC):
@@ -370,7 +371,7 @@ class LumaImageModifyNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -381,7 +382,7 @@ class LumaImageModifyNode(ComfyNodeABC):
        **kwargs,
    ):
        # first, upload image
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            image, max_images=1, auth_kwargs=kwargs,
        )
        image_url = download_urls[0]
@@ -402,7 +403,7 @@ class LumaImageModifyNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -418,10 +419,11 @@ class LumaImageModifyNode(ComfyNodeABC):
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        img_response = requests.get(response_poll.assets.image)
-        img = process_image_response(img_response)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
        return (img,)


@@ -494,7 +496,7 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -529,7 +531,7 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        if unique_id:
            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", unique_id)
@@ -549,10 +551,11 @@ class LumaTextToVideoGenerationNode(ComfyNodeABC):
            estimated_duration=LUMA_T2V_AVERAGE_DURATION,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.assets.video)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class LumaImageToVideoGenerationNode(ComfyNodeABC):
@@ -626,7 +629,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        model: str,
@@ -644,7 +647,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            raise Exception(
                "At least one of first_image and last_image requires an input."
            )
-        keyframes = self._convert_to_keyframes(first_image, last_image, auth_kwargs=kwargs)
+        keyframes = await self._convert_to_keyframes(first_image, last_image, auth_kwargs=kwargs)
        duration = duration if model != LumaVideoModel.ray_1_6 else None
        resolution = resolution if model != LumaVideoModel.ray_1_6 else None

@@ -667,7 +670,7 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api: LumaGeneration = operation.execute()
+        response_api: LumaGeneration = await operation.execute()

        if unique_id:
            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", unique_id)
@@ -687,12 +690,13 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
            estimated_duration=LUMA_I2V_AVERAGE_DURATION,
            auth_kwargs=kwargs,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.assets.video)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)

-    def _convert_to_keyframes(
+    async def _convert_to_keyframes(
        self,
        first_image: torch.Tensor = None,
        last_image: torch.Tensor = None,
@@ -703,12 +707,12 @@ class LumaImageToVideoGenerationNode(ComfyNodeABC):
        frame0 = None
        frame1 = None
        if first_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                first_image, max_images=1, auth_kwargs=auth_kwargs,
            )
            frame0 = LumaImageReference(type="image", url=download_urls[0])
        if last_image is not None:
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                last_image, max_images=1, auth_kwargs=auth_kwargs,
            )
            frame1 = LumaImageReference(type="image", url=download_urls[0])
--- a/comfy_api_nodes/nodes_minimax.py
+++ b/comfy_api_nodes/nodes_minimax.py
@@ -86,7 +86,7 @@ class MinimaxTextToVideoNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_video(
+    async def generate_video(
        self,
        prompt_text,
        seed=0,
@@ -104,12 +104,12 @@ class MinimaxTextToVideoNode:
        # upload image, if passed in
        image_url = None
        if image is not None:
-            image_url = upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs)[0]
+            image_url = (await upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs))[0]

        # TODO: figure out how to deal with subject properly, API returns invalid params when using S2V-01 model
        subject_reference = None
        if subject is not None:
-            subject_url = upload_images_to_comfyapi(subject, max_images=1, auth_kwargs=kwargs)[0]
+            subject_url = (await upload_images_to_comfyapi(subject, max_images=1, auth_kwargs=kwargs))[0]
            subject_reference = [SubjectReferenceItem(image=subject_url)]


@@ -130,7 +130,7 @@ class MinimaxTextToVideoNode:
            ),
            auth_kwargs=kwargs,
        )
-        response = video_generate_operation.execute()
+        response = await video_generate_operation.execute()

        task_id = response.task_id
        if not task_id:
@@ -151,7 +151,7 @@ class MinimaxTextToVideoNode:
            node_id=unique_id,
            auth_kwargs=kwargs,
        )
-        task_result = video_generate_operation.execute()
+        task_result = await video_generate_operation.execute()

        file_id = task_result.file_id
        if file_id is None:
@@ -167,7 +167,7 @@ class MinimaxTextToVideoNode:
            request=EmptyRequest(),
            auth_kwargs=kwargs,
        )
-        file_result = file_retrieve_operation.execute()
+        file_result = await file_retrieve_operation.execute()

        file_url = file_result.file.download_url
        if file_url is None:
@@ -182,7 +182,7 @@ class MinimaxTextToVideoNode:
                message = f"Result URL: {file_url}"
            PromptServer.instance.send_progress_text(message, unique_id)

-        video_io = download_url_to_bytesio(file_url)
+        video_io = await download_url_to_bytesio(file_url)
        if video_io is None:
            error_msg = f"Failed to download video from {file_url}"
            logging.error(error_msg)
--- a/comfy_api_nodes/nodes_moonvalley.py
+++ b/comfy_api_nodes/nodes_moonvalley.py
@@ -2,7 +2,10 @@ import logging
 from typing import Any, Callable, Optional, TypeVar
 import random
 import torch
-from comfy_api_nodes.util.validation_utils import get_image_dimensions, validate_image_dimensions, validate_video_dimensions
+from comfy_api_nodes.util.validation_utils import (
+    get_image_dimensions,
+    validate_image_dimensions,
+)


 from comfy_api_nodes.apis import (
@@ -10,7 +13,7 @@ from comfy_api_nodes.apis import (
    MoonvalleyTextToVideoInferenceParams,
    MoonvalleyVideoToVideoInferenceParams,
    MoonvalleyVideoToVideoRequest,
-    MoonvalleyPromptResponse
+    MoonvalleyPromptResponse,
 )
 from comfy_api_nodes.apis.client import (
    ApiEndpoint,
@@ -54,20 +57,26 @@ MAX_VIDEO_SIZE = 1024 * 1024 * 1024  # 1 GB max for in-memory video processing

 MOONVALLEY_MAREY_MAX_PROMPT_LENGTH = 5000
 R = TypeVar("R")
+
+
 class MoonvalleyApiError(Exception):
    """Base exception for Moonvalley API errors."""
+
    pass

+
 def is_valid_task_creation_response(response: MoonvalleyPromptResponse) -> bool:
    """Verifies that the initial response contains a task ID."""
    return bool(response.id)

+
 def validate_task_creation_response(response) -> None:
    if not is_valid_task_creation_response(response):
        error_msg = f"Moonvalley Marey API: Initial request failed. Code: {response.code}, Message: {response.message}, Data: {response}"
        logging.error(error_msg)
        raise MoonvalleyApiError(error_msg)

+
 def get_video_from_response(response):
    video = response.output_url
    logging.info(
@@ -86,14 +95,14 @@ def get_video_url_from_response(response) -> Optional[str]:
        return None


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, R],
    result_url_extractor: Optional[Callable[[R], str]] = None,
    node_id: Optional[str] = None,
 ) -> R:
    """Polls the Moonvalley API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            "completed",
@@ -102,16 +111,17 @@ def poll_until_finished(
        poll_interval=16.0,
        failed_statuses=["error"],
        status_extractor=lambda response: (
-            response.status
-            if response and response.status
-            else None
+            response.status if response and response.status else None
        ),
        auth_kwargs=auth_kwargs,
        result_url_extractor=result_url_extractor,
        node_id=node_id,
    ).execute()

-def validate_prompts(prompt:str, negative_prompt: str, max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH):
+
+def validate_prompts(
+    prompt: str, negative_prompt: str, max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH
+):
    """Verifies that the prompt isn't empty and that neither prompt is too long."""
    if not prompt:
        raise ValueError("Positive prompt is empty")
@@ -123,16 +133,15 @@ def validate_prompts(prompt:str, negative_prompt: str, max_length=MOONVALLEY_MAR
        )
    return True

+
 def validate_input_media(width, height, with_frame_conditioning, num_frames_in=None):
-        # inference validation
-        # T = num_frames
-        # in all cases, the following must be true: T divisible by 16 and H,W by 8. in addition...
-        # with image conditioning: H*W must be divisible by 8192
-        # without image conditioning: T divisible by 32
-    if num_frames_in and not num_frames_in % 16 == 0 :
-        return False, (
-            "The input video total frame count must be divisible by 16!"
-        )
+    # inference validation
+    # T = num_frames
+    # in all cases, the following must be true: T divisible by 16 and H,W by 8. in addition...
+    # with image conditioning: H*W must be divisible by 8192
+    # without image conditioning: T divisible by 32
+    if num_frames_in and not num_frames_in % 16 == 0:
+        return False, ("The input video total frame count must be divisible by 16!")

    if height % 8 != 0 or width % 8 != 0:
        return False, (
@@ -146,13 +155,13 @@ def validate_input_media(width, height, with_frame_conditioning, num_frames_in=N
                "divisible by 8192 for frame conditioning"
            )
    else:
-        if num_frames_in and not num_frames_in % 32 == 0 :
-            return False, (
-                "The input video total frame count must be divisible by 32!"
-            )
+        if num_frames_in and not num_frames_in % 32 == 0:
+            return False, ("The input video total frame count must be divisible by 32!")


-def validate_input_image(image: torch.Tensor, with_frame_conditioning: bool=False) -> None:
+def validate_input_image(
+    image: torch.Tensor, with_frame_conditioning: bool = False
+) -> None:
    """
    Validates the input image adheres to the expectations of the API:
    - The image resolution should not be less than 300*300px
@@ -160,42 +169,82 @@ def validate_input_image(image: torch.Tensor, with_frame_conditioning: bool=Fals

    """
    height, width = get_image_dimensions(image)
-    validate_input_media(width, height, with_frame_conditioning )
-    validate_image_dimensions(image, min_width=300, min_height=300, max_height=MAX_HEIGHT, max_width=MAX_WIDTH)
+    validate_input_media(width, height, with_frame_conditioning)
+    validate_image_dimensions(
+        image, min_width=300, min_height=300, max_height=MAX_HEIGHT, max_width=MAX_WIDTH
+    )

-def validate_input_video(video: VideoInput, num_frames_out: int, with_frame_conditioning: bool=False):
+
+def validate_video_to_video_input(video: VideoInput) -> VideoInput:
+    """
+    Validates and processes video input for Moonvalley Video-to-Video generation.
+
+    Args:
+        video: Input video to validate
+
+    Returns:
+        Validated and potentially trimmed video
+
+    Raises:
+        ValueError: If video doesn't meet requirements
+        MoonvalleyApiError: If video duration is too short
+    """
+    width, height = _get_video_dimensions(video)
+    _validate_video_dimensions(width, height)
+    _validate_container_format(video)
+
+    return _validate_and_trim_duration(video)
+
+
+def _get_video_dimensions(video: VideoInput) -> tuple[int, int]:
+    """Extracts video dimensions with error handling."""
    try:
-        width, height = video.get_dimensions()
+        return video.get_dimensions()
    except Exception as e:
        logging.error("Error getting dimensions of video: %s", e)
        raise ValueError(f"Cannot get video dimensions: {e}") from e

-    validate_input_media(width, height, with_frame_conditioning)
-    validate_video_dimensions(video, min_width=MIN_VID_WIDTH, min_height=MIN_VID_HEIGHT, max_width=MAX_VID_WIDTH, max_height=MAX_VID_HEIGHT)

-    trimmed_video = validate_input_video_length(video, num_frames_out)
-    return trimmed_video
+def _validate_video_dimensions(width: int, height: int) -> None:
+    """Validates video dimensions meet Moonvalley V2V requirements."""
+    supported_resolutions = {
+        (1920, 1080), (1080, 1920), (1152, 1152),
+        (1536, 1152), (1152, 1536)
+    }
+
+    if (width, height) not in supported_resolutions:
+        supported_list = ', '.join([f'{w}x{h}' for w, h in sorted(supported_resolutions)])
+        raise ValueError(f"Resolution {width}x{height} not supported. Supported: {supported_list}")


-def validate_input_video_length(video: VideoInput, num_frames: int):
+def _validate_container_format(video: VideoInput) -> None:
+    """Validates video container format is MP4."""
+    container_format = video.get_container_format()
+    if container_format not in ['mp4', 'mov,mp4,m4a,3gp,3g2,mj2']:
+        raise ValueError(f"Only MP4 container format supported. Got: {container_format}")

-    if video.get_duration() > 60:
-        raise MoonvalleyApiError("Input Video lenth should be less than 1min. Please trim.")

-    if num_frames == 128:
-       if video.get_duration() < 5:
-           raise MoonvalleyApiError("Input Video length is less than 5s. Please use a video longer than or equal to 5s.")
-       if video.get_duration() > 5:
-        #    trim video to 5s
-        video = trim_video(video, 5)
-    if num_frames == 256:
-        if video.get_duration() < 10:
-            raise MoonvalleyApiError("Input Video length is less than 10s. Please use a video longer than or equal to 10s.")
-        if video.get_duration() > 10:
-            # trim video to 10s
-            video = trim_video(video, 10)
+def _validate_and_trim_duration(video: VideoInput) -> VideoInput:
+    """Validates video duration and trims to 5 seconds if needed."""
+    duration = video.get_duration()
+    _validate_minimum_duration(duration)
+    return _trim_if_too_long(video, duration)
+
+
+def _validate_minimum_duration(duration: float) -> None:
+    """Ensures video is at least 5 seconds long."""
+    if duration < 5:
+        raise MoonvalleyApiError("Input video must be at least 5 seconds long.")
+
+
+def _trim_if_too_long(video: VideoInput, duration: float) -> VideoInput:
+    """Trims video to 5 seconds if longer."""
+    if duration > 5:
+        return trim_video(video, 5)
    return video

+
+
 def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
    """
    Returns a new VideoInput object trimmed from the beginning to the specified duration,
@@ -219,8 +268,8 @@ def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
        input_source = video.get_stream_source()

        # Open containers
-        input_container = av.open(input_source, mode='r')
-        output_container = av.open(output_buffer, mode='w', format='mp4')
+        input_container = av.open(input_source, mode="r")
+        output_container = av.open(output_buffer, mode="w", format="mp4")

        # Set up output streams for re-encoding
        video_stream = None
@@ -230,25 +279,33 @@ def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
            logging.info(f"Found stream: type={stream.type}, class={type(stream)}")
            if isinstance(stream, av.VideoStream):
                # Create output video stream with same parameters
-                video_stream = output_container.add_stream('h264', rate=stream.average_rate)
+                video_stream = output_container.add_stream(
+                    "h264", rate=stream.average_rate
+                )
                video_stream.width = stream.width
                video_stream.height = stream.height
-                video_stream.pix_fmt = 'yuv420p'
-                logging.info(f"Added video stream: {stream.width}x{stream.height} @ {stream.average_rate}fps")
+                video_stream.pix_fmt = "yuv420p"
+                logging.info(
+                    f"Added video stream: {stream.width}x{stream.height} @ {stream.average_rate}fps"
+                )
            elif isinstance(stream, av.AudioStream):
                # Create output audio stream with same parameters
-                audio_stream = output_container.add_stream('aac', rate=stream.sample_rate)
+                audio_stream = output_container.add_stream(
+                    "aac", rate=stream.sample_rate
+                )
                audio_stream.sample_rate = stream.sample_rate
                audio_stream.layout = stream.layout
-                logging.info(f"Added audio stream: {stream.sample_rate}Hz, {stream.channels} channels")
+                logging.info(
+                    f"Added audio stream: {stream.sample_rate}Hz, {stream.channels} channels"
+                )

-        # Calculate target frame count that's divisible by 32
+        # Calculate target frame count that's divisible by 16
        fps = input_container.streams.video[0].average_rate
        estimated_frames = int(duration_sec * fps)
-        target_frames = (estimated_frames // 32) * 32  # Round down to nearest multiple of 32
+        target_frames = (estimated_frames // 16) * 16  # Round down to nearest multiple of 16

        if target_frames == 0:
-            raise ValueError("Video too short: need at least 32 frames for Moonvalley")
+            raise ValueError("Video too short: need at least 16 frames for Moonvalley")

        frame_count = 0
        audio_frame_count = 0
@@ -268,7 +325,9 @@ def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
            for packet in video_stream.encode():
                output_container.mux(packet)

-            logging.info(f"Encoded {frame_count} video frames (target: {target_frames})")
+            logging.info(
+                f"Encoded {frame_count} video frames (target: {target_frames})"
+            )

        # Decode and re-encode audio frames
        if audio_stream:
@@ -292,7 +351,6 @@ def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
        output_container.close()
        input_container.close()

-
        # Return as VideoFromFile using the buffer
        output_buffer.seek(0)
        return VideoFromFile(output_buffer)
@@ -305,6 +363,7 @@ def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
            output_container.close()
        raise RuntimeError(f"Failed to trim video: {str(e)}") from e

+
 # --- BaseMoonvalleyVideoNode ---
 class BaseMoonvalleyVideoNode:
    def parseWidthHeightFromRes(self, resolution: str):
@@ -313,8 +372,8 @@ class BaseMoonvalleyVideoNode:
            "16:9 (1920 x 1080)": {"width": 1920, "height": 1080},
            "9:16 (1080 x 1920)": {"width": 1080, "height": 1920},
            "1:1 (1152 x 1152)": {"width": 1152, "height": 1152},
-            "4:3 (1440 x 1080)": {"width": 1440, "height": 1080},
-            "3:4 (1080 x 1440)": {"width": 1080, "height": 1440},
+            "4:3 (1536 x 1152)": {"width": 1536, "height": 1152},
+            "3:4 (1152 x 1536)": {"width": 1152, "height": 1536},
            "21:9 (2560 x 1080)": {"width": 2560, "height": 1080},
        }
        if resolution in res_map:
@@ -328,17 +387,17 @@ class BaseMoonvalleyVideoNode:
            "Motion Transfer": "motion_control",
            "Canny": "canny_control",
            "Pose Transfer": "pose_control",
-            "Depth": "depth_control"
+            "Depth": "depth_control",
        }
        if value in control_map:
            return control_map[value]
        else:
            return control_map["Motion Transfer"]

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> MoonvalleyPromptResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{API_PROMPTS_ENDPOINT}/{task_id}",
@@ -355,31 +414,63 @@ class BaseMoonvalleyVideoNode:
        return {
            "required": {
                "prompt": model_field_to_node_input(
-                    IO.STRING, MoonvalleyTextToVideoRequest, "prompt_text",
-                    multiline=True
+                    IO.STRING,
+                    MoonvalleyTextToVideoRequest,
+                    "prompt_text",
+                    multiline=True,
                ),
                "negative_prompt": model_field_to_node_input(
                    IO.STRING,
                    MoonvalleyTextToVideoInferenceParams,
                    "negative_prompt",
                    multiline=True,
-                    default="gopro, bright, contrast, static, overexposed, bright, vignette, artifacts, still, noise, texture, scanlines, videogame, 360 camera, VR, transition, flare, saturation, distorted, warped, wide angle, contrast, saturated, vibrant, glowing, cross dissolve, texture, videogame, saturation, cheesy, ugly hands, mutated hands, mutant, disfigured, extra fingers, blown out, horrible, blurry, worst quality, bad, transition, dissolve, cross-dissolve, melt, fade in, fade out, wobbly, weird, low quality, plastic, stock footage, video camera, boring, static",
+                    default="low-poly, flat shader, bad rigging, stiff animation, uncanny eyes, low-quality textures, looping glitch, cheap effect, overbloom, bloom spam, default lighting, game asset, stiff face, ugly specular, AI artifacts",
                ),
-
-                "resolution": (IO.COMBO, {
-                        "options": ["16:9 (1920 x 1080)",
-                                    "9:16 (1080 x 1920)",
-                                    "1:1 (1152 x 1152)",
-                                    "4:3 (1440 x 1080)",
-                                    "3:4 (1080 x 1440)",
-                                    "21:9 (2560 x 1080)"],
+                "resolution": (
+                    IO.COMBO,
+                    {
+                        "options": [
+                            "16:9 (1920 x 1080)",
+                            "9:16 (1080 x 1920)",
+                            "1:1 (1152 x 1152)",
+                            "4:3 (1536 x 1152)",
+                            "3:4 (1152 x 1536)",
+                            "21:9 (2560 x 1080)",
+                        ],
                        "default": "16:9 (1920 x 1080)",
                        "tooltip": "Resolution of the output video",
-                    }),
+                    },
+                ),
                # "length": (IO.COMBO,{"options":['5s','10s'], "default": '5s'}),
-                "prompt_adherence": model_field_to_node_input(IO.FLOAT,MoonvalleyTextToVideoInferenceParams,"guidance_scale",default=7.0, step=1, min=1, max=20),
-                "seed": model_field_to_node_input(IO.INT,MoonvalleyTextToVideoInferenceParams, "seed", default=random.randint(0, 2**32 - 1), min=0, max=4294967295, step=1, display="number", tooltip="Random seed value", control_after_generate=True),
-                "steps": model_field_to_node_input(IO.INT, MoonvalleyTextToVideoInferenceParams, "steps", default=100, min=1, max=100),
+                "prompt_adherence": model_field_to_node_input(
+                    IO.FLOAT,
+                    MoonvalleyTextToVideoInferenceParams,
+                    "guidance_scale",
+                    default=7.0,
+                    step=1,
+                    min=1,
+                    max=20,
+                ),
+                "seed": model_field_to_node_input(
+                    IO.INT,
+                    MoonvalleyTextToVideoInferenceParams,
+                    "seed",
+                    default=random.randint(0, 2**32 - 1),
+                    min=0,
+                    max=4294967295,
+                    step=1,
+                    display="number",
+                    tooltip="Random seed value",
+                    control_after_generate=True,
+                ),
+                "steps": model_field_to_node_input(
+                    IO.INT,
+                    MoonvalleyTextToVideoInferenceParams,
+                    "steps",
+                    default=100,
+                    min=1,
+                    max=100,
+                ),
            },
            "hidden": {
                "auth_token": "AUTH_TOKEN_COMFY_ORG",
@@ -393,7 +484,7 @@ class BaseMoonvalleyVideoNode:
                    "image_url",
                    tooltip="The reference image used to generate the video",
                ),
-            }
+            },
        }

    RETURN_TYPES = ("STRING",)
@@ -404,6 +495,7 @@ class BaseMoonvalleyVideoNode:
    def generate(self, **kwargs):
        return None

+
 # --- MoonvalleyImg2VideoNode ---
 class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):

@@ -415,55 +507,58 @@ class MoonvalleyImg2VideoNode(BaseMoonvalleyVideoNode):
    RETURN_NAMES = ("video",)
    DESCRIPTION = "Moonvalley Marey Image to Video Node"

-    def generate(self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs):
+    async def generate(
+        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
+    ):
        image = kwargs.get("image", None)
-        if (image is None):
+        if image is None:
            raise MoonvalleyApiError("image is required")
-        total_frames = get_total_frames_from_length()

-        validate_input_image(image,True)
+        validate_input_image(image, True)
        validate_prompts(prompt, negative_prompt, MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
        width_height = self.parseWidthHeightFromRes(kwargs.get("resolution"))

-        inference_params=MoonvalleyTextToVideoInferenceParams(
-                    negative_prompt=negative_prompt,
-                    steps=kwargs.get("steps"),
-                    seed=kwargs.get("seed"),
-                    guidance_scale=kwargs.get("prompt_adherence"),
-                    num_frames=total_frames,
-                    width=width_height.get("width"),
-                    height=width_height.get("height"),
-                    use_negative_prompts=True
-                )
+        inference_params = MoonvalleyTextToVideoInferenceParams(
+            negative_prompt=negative_prompt,
+            steps=kwargs.get("steps"),
+            seed=kwargs.get("seed"),
+            guidance_scale=kwargs.get("prompt_adherence"),
+            num_frames=128,
+            width=width_height.get("width"),
+            height=width_height.get("height"),
+            use_negative_prompts=True,
+        )
        """Upload image to comfy backend to have a URL available for further processing"""
        # Get MIME type from tensor - assuming PNG format for image tensors
        mime_type = "image/png"

-        image_url = upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs, mime_type=mime_type)[0]
+        image_url = (await upload_images_to_comfyapi(
+            image, max_images=1, auth_kwargs=kwargs, mime_type=mime_type
+        ))[0]

        request = MoonvalleyTextToVideoRequest(
-                image_url=image_url,
-                prompt_text=prompt,
-                inference_params=inference_params
-            )
+            image_url=image_url, prompt_text=prompt, inference_params=inference_params
+        )
        initial_operation = SynchronousOperation(
-            endpoint=ApiEndpoint(path=API_IMG2VIDEO_ENDPOINT,
-                                 method=HttpMethod.POST,
-                                 request_model=MoonvalleyTextToVideoRequest,
-                                 response_model=MoonvalleyPromptResponse
-                                 ),
+            endpoint=ApiEndpoint(
+                path=API_IMG2VIDEO_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=MoonvalleyTextToVideoRequest,
+                response_model=MoonvalleyPromptResponse,
+            ),
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
-        video = download_url_to_video_output(final_response.output_url)
-        return (video, )
+        video = await download_url_to_video_output(final_response.output_url)
+        return (video,)
+

 # --- MoonvalleyVid2VidNode ---
 class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
@@ -472,14 +567,28 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):

    @classmethod
    def INPUT_TYPES(cls):
-        input_types = super().INPUT_TYPES()
-        for param in ["resolution", "image"]:
-            if param in input_types["required"]:
-                del input_types["required"][param]
-            if param in input_types["optional"]:
-                del input_types["optional"][param]
-        input_types["optional"] = {
-                "video": (IO.VIDEO, {"default": "", "multiline": False, "tooltip": "The reference video used to generate the output video. Input a 5s video for 128 frames and a 10s video for 256 frames. Longer videos will be trimmed automatically."}),
+        return {
+            "required": {
+                "prompt": model_field_to_node_input(
+                    IO.STRING, MoonvalleyVideoToVideoRequest, "prompt_text",
+                    multiline=True
+                ),
+                "negative_prompt": model_field_to_node_input(
+                    IO.STRING,
+                    MoonvalleyVideoToVideoInferenceParams,
+                    "negative_prompt",
+                    multiline=True,
+                    default="low-poly, flat shader, bad rigging, stiff animation, uncanny eyes, low-quality textures, looping glitch, cheap effect, overbloom, bloom spam, default lighting, game asset, stiff face, ugly specular, AI artifacts"
+                ),
+                "seed": model_field_to_node_input(IO.INT, MoonvalleyVideoToVideoInferenceParams, "seed", default=random.randint(0, 2**32 - 1), min=0, max=4294967295, step=1, display="number", tooltip="Random seed value", control_after_generate=True),
+            },
+            "hidden": {
+                "auth_token": "AUTH_TOKEN_COMFY_ORG",
+                "comfy_api_key": "API_KEY_COMFY_ORG",
+                "unique_id": "UNIQUE_ID",
+            },
+            "optional": {
+                "video": (IO.VIDEO, {"default": "", "multiline": False, "tooltip": "The reference video used to generate the output video. Must be at least 5 seconds long. Videos longer than 5s will be automatically trimmed. Only MP4 format supported."}),
                "control_type": (
                    ["Motion Transfer", "Pose Transfer"],
                    {"default": "Motion Transfer"},
@@ -495,68 +604,72 @@ class MoonvalleyVideo2VideoNode(BaseMoonvalleyVideoNode):
                    },
                )
            }
-
-        return input_types
+        }

    RETURN_TYPES = ("VIDEO",)
    RETURN_NAMES = ("video",)

-    def generate(self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs):
+    async def generate(
+        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
+    ):
        video = kwargs.get("video")
-        num_frames = get_total_frames_from_length()

-        if not video :
+        if not video:
            raise MoonvalleyApiError("video is required")

-
-        """Validate video input"""
-        video_url=""
+        video_url = ""
        if video:
-            validated_video = validate_input_video(video, num_frames, False)
-            video_url = upload_video_to_comfyapi(validated_video, auth_kwargs=kwargs)
+            validated_video = validate_video_to_video_input(video)
+            video_url = await upload_video_to_comfyapi(validated_video, auth_kwargs=kwargs)

        control_type = kwargs.get("control_type")
        motion_intensity = kwargs.get("motion_intensity")

        """Validate prompts and inference input"""
        validate_prompts(prompt, negative_prompt)
+
+        # Only include motion_intensity for Motion Transfer
+        control_params = {}
+        if control_type == "Motion Transfer" and motion_intensity is not None:
+            control_params['motion_intensity'] = motion_intensity
+
        inference_params=MoonvalleyVideoToVideoInferenceParams(
            negative_prompt=negative_prompt,
-            steps=kwargs.get("steps"),
            seed=kwargs.get("seed"),
-            guidance_scale=kwargs.get("prompt_adherence"),
-            control_params={'motion_intensity': motion_intensity}
+            control_params=control_params
        )

        control = self.parseControlParameter(control_type)

        request = MoonvalleyVideoToVideoRequest(
-                control_type=control,
-                video_url=video_url,
-                prompt_text=prompt,
-                inference_params=inference_params
-            )
+            control_type=control,
+            video_url=video_url,
+            prompt_text=prompt,
+            inference_params=inference_params,
+        )

        initial_operation = SynchronousOperation(
-            endpoint=ApiEndpoint(path=API_VIDEO2VIDEO_ENDPOINT,
-                                 method=HttpMethod.POST,
-                                 request_model=MoonvalleyVideoToVideoRequest,
-                                 response_model=MoonvalleyPromptResponse
-                                 ),
+            endpoint=ApiEndpoint(
+                path=API_VIDEO2VIDEO_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=MoonvalleyVideoToVideoRequest,
+                response_model=MoonvalleyPromptResponse,
+            ),
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )

-        video = download_url_to_video_output(final_response.output_url)
+        video = await download_url_to_video_output(final_response.output_url)
+
+        return (video,)

-        return (video, )

 # --- MoonvalleyTxt2VideoNode ---
 class MoonvalleyTxt2VideoNode(BaseMoonvalleyVideoNode):
@@ -575,65 +688,56 @@ class MoonvalleyTxt2VideoNode(BaseMoonvalleyVideoNode):
                del input_types["optional"][param]
        return input_types

-    def generate(self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs):
+    async def generate(
+        self, prompt, negative_prompt, unique_id: Optional[str] = None, **kwargs
+    ):
        validate_prompts(prompt, negative_prompt, MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
        width_height = self.parseWidthHeightFromRes(kwargs.get("resolution"))
-        num_frames = get_total_frames_from_length()

        inference_params=MoonvalleyTextToVideoInferenceParams(
                    negative_prompt=negative_prompt,
                    steps=kwargs.get("steps"),
                    seed=kwargs.get("seed"),
                    guidance_scale=kwargs.get("prompt_adherence"),
-                    num_frames=num_frames,
+                    num_frames=128,
                    width=width_height.get("width"),
                    height=width_height.get("height"),
                )
        request = MoonvalleyTextToVideoRequest(
-                prompt_text=prompt,
-                inference_params=inference_params
-            )
+            prompt_text=prompt, inference_params=inference_params
+        )

        initial_operation = SynchronousOperation(
-            endpoint=ApiEndpoint(path=API_TXT2VIDEO_ENDPOINT,
-                                 method=HttpMethod.POST,
-                                 request_model=MoonvalleyTextToVideoRequest,
-                                 response_model=MoonvalleyPromptResponse
-                                 ),
+            endpoint=ApiEndpoint(
+                path=API_TXT2VIDEO_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=MoonvalleyTextToVideoRequest,
+                response_model=MoonvalleyPromptResponse,
+            ),
            request=request,
            auth_kwargs=kwargs,
        )
-        task_creation_response = initial_operation.execute()
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.id

-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )

-        video = download_url_to_video_output(final_response.output_url)
-        return (video, )
-
+        video = await download_url_to_video_output(final_response.output_url)
+        return (video,)


 NODE_CLASS_MAPPINGS = {
    "MoonvalleyImg2VideoNode": MoonvalleyImg2VideoNode,
    "MoonvalleyTxt2VideoNode": MoonvalleyTxt2VideoNode,
-    # "MoonvalleyVideo2VideoNode": MoonvalleyVideo2VideoNode,
+    "MoonvalleyVideo2VideoNode": MoonvalleyVideo2VideoNode,
 }


 NODE_DISPLAY_NAME_MAPPINGS = {
    "MoonvalleyImg2VideoNode": "Moonvalley Marey Image to Video",
    "MoonvalleyTxt2VideoNode": "Moonvalley Marey Text to Video",
-    # "MoonvalleyVideo2VideoNode": "Moonvalley Marey Video to Video",
+    "MoonvalleyVideo2VideoNode": "Moonvalley Marey Video to Video",
 }
-
-def get_total_frames_from_length(length="5s"):
-    # if length == '5s':
-    #     return 128
-    # elif length == '10s':
-    #     return 256
-    return 128
-    # else:
-    #     raise MoonvalleyApiError("length is required")
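
Note on the `MoonvalleyVideo2VideoNode.generate` change above: `motion_intensity` is now sent only when the control type is "Motion Transfer" and a value was supplied. A minimal sketch of that rule as a standalone function follows. The helper name `build_control_params` is hypothetical and does not appear in the diff, where the logic is inline.

```python
from typing import Optional


def build_control_params(control_type: str, motion_intensity: Optional[int]) -> dict:
    """Hypothetical helper mirroring the inline logic in the diff above."""
    control_params = {}
    # motion_intensity is only included for "Motion Transfer"; for any other
    # control type, or when no value is given, the dict stays empty.
    if control_type == "Motion Transfer" and motion_intensity is not None:
        control_params["motion_intensity"] = motion_intensity
    return control_params


print(build_control_params("Motion Transfer", 50))
print(build_control_params("Pose Transfer", 50))
```

The `is not None` check means an explicit intensity of `0` is still forwarded, while a missing value is not.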
--- a/comfy_api_nodes/nodes_openai.py
+++ b/comfy_api_nodes/nodes_openai.py
@@ -163,7 +163,7 @@ class OpenAIDalle2(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -233,9 +233,9 @@ class OpenAIDalle2(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -311,7 +311,7 @@ class OpenAIDalle3(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -343,9 +343,9 @@ class OpenAIDalle3(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -446,7 +446,7 @@ class OpenAIGPTImage1(ComfyNodeABC):
    DESCRIPTION = cleandoc(__doc__ or "")
    API_NODE = True

-    def api_call(
+    async def api_call(
        self,
        prompt,
        seed=0,
@@ -537,9 +537,9 @@ class OpenAIGPTImage1(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

-        img_tensor = validate_and_cast_response(response, node_id=unique_id)
+        img_tensor = await validate_and_cast_response(response, node_id=unique_id)
        return (img_tensor,)


@@ -623,7 +623,7 @@ class OpenAIChatNode(OpenAITextNode):

    DESCRIPTION = "Generate text responses from an OpenAI model."

-    def get_result_response(
+    async def get_result_response(
        self,
        response_id: str,
        include: Optional[list[Includable]] = None,
@@ -639,7 +639,7 @@ class OpenAIChatNode(OpenAITextNode):
                creation above for more information.

        """
-        return PollingOperation(
+        return await PollingOperation(
            poll_endpoint=ApiEndpoint(
                path=f"{RESPONSES_ENDPOINT}/{response_id}",
                method=HttpMethod.GET,
@@ -784,7 +784,7 @@ class OpenAIChatNode(OpenAITextNode):

        self.history[session_id] = new_history

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        persist_context: bool,
@@ -815,7 +815,7 @@ class OpenAIChatNode(OpenAITextNode):
            previous_response_id = None

        # Create response
-        create_response = SynchronousOperation(
+        create_response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path=RESPONSES_ENDPOINT,
                method=HttpMethod.POST,
@@ -848,7 +848,7 @@ class OpenAIChatNode(OpenAITextNode):
        response_id = create_response.id

        # Get result output
-        result_response = self.get_result_response(response_id, auth_kwargs=kwargs)
+        result_response = await self.get_result_response(response_id, auth_kwargs=kwargs)
        output_text = self.parse_output_text_from_response(result_response)

        # Update history
--- a/comfy_api_nodes/nodes_pika.py
+++ b/comfy_api_nodes/nodes_pika.py
@@ -122,7 +122,7 @@ class PikaNodeBase(ComfyNodeABC):
    FUNCTION = "api_call"
    RETURN_TYPES = ("VIDEO",)

-    def poll_for_task_status(
+    async def poll_for_task_status(
        self,
        task_id: str,
        auth_kwargs: Optional[dict[str, str]] = None,
@@ -152,9 +152,9 @@ class PikaNodeBase(ComfyNodeABC):
            node_id=node_id,
            estimated_duration=60
        )
-        return polling_operation.execute()
+        return await polling_operation.execute()

-    def execute_task(
+    async def execute_task(
        self,
        initial_operation: SynchronousOperation[R, PikaGenerateResponse],
        auth_kwargs: Optional[dict[str, str]] = None,
@@ -169,14 +169,14 @@ class PikaNodeBase(ComfyNodeABC):
        Returns:
            A tuple containing the video file as a VIDEO output.
        """
-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        if not is_valid_initial_response(initial_response):
            error_msg = f"Pika initial request failed. Code: {initial_response.code}, Message: {initial_response.message}, Data: {initial_response.data}"
            logging.error(error_msg)
            raise PikaApiError(error_msg)

        task_id = initial_response.video_id
-        final_response = self.poll_for_task_status(task_id, auth_kwargs)
+        final_response = await self.poll_for_task_status(task_id, auth_kwargs)
        if not is_valid_video_response(final_response):
            error_msg = (
                f"Pika task {task_id} succeeded but no video data found in response."
@@ -187,7 +187,7 @@ class PikaNodeBase(ComfyNodeABC):
        video_url = str(final_response.url)
        logging.info("Pika task %s succeeded. Video URL: %s", task_id, video_url)

-        return (download_url_to_video_output(video_url),)
+        return (await download_url_to_video_output(video_url),)


 class PikaImageToVideoV2_2(PikaNodeBase):
@@ -212,7 +212,7 @@ class PikaImageToVideoV2_2(PikaNodeBase):

    DESCRIPTION = "Sends an image and prompt to the Pika API v2.2 to generate a video."

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt_text: str,
@@ -251,7 +251,7 @@ class PikaImageToVideoV2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaTextToVideoNodeV2_2(PikaNodeBase):
@@ -281,7 +281,7 @@ class PikaTextToVideoNodeV2_2(PikaNodeBase):

    DESCRIPTION = "Sends a text prompt to the Pika API v2.2 to generate a video."

-    def api_call(
+    async def api_call(
        self,
        prompt_text: str,
        negative_prompt: str,
@@ -311,7 +311,7 @@ class PikaTextToVideoNodeV2_2(PikaNodeBase):
            content_type="application/x-www-form-urlencoded",
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaScenesV2_2(PikaNodeBase):
@@ -361,7 +361,7 @@ class PikaScenesV2_2(PikaNodeBase):

    DESCRIPTION = "Combine your images to create a video with the objects in them. Upload multiple images as ingredients and generate a high-quality video that incorporates all of them."

-    def api_call(
+    async def api_call(
        self,
        prompt_text: str,
        negative_prompt: str,
@@ -420,7 +420,7 @@ class PikaScenesV2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikAdditionsNode(PikaNodeBase):
@@ -462,7 +462,7 @@ class PikAdditionsNode(PikaNodeBase):

    DESCRIPTION = "Add any object or image into your video. Upload a video and specify what you'd like to add to create a seamlessly integrated result."

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        image: torch.Tensor,
@@ -481,10 +481,10 @@ class PikAdditionsNode(PikaNodeBase):
        image_bytes_io = tensor_to_bytesio(image)
        image_bytes_io.seek(0)

-        pika_files = [
-            ("video", ("video.mp4", video_bytes_io, "video/mp4")),
-            ("image", ("image.png", image_bytes_io, "image/png")),
-        ]
+        pika_files = {
+            "video": ("video.mp4", video_bytes_io, "video/mp4"),
+            "image": ("image.png", image_bytes_io, "image/png"),
+        }

        # Prepare non-file data
        pika_request_data = PikaBodyGeneratePikadditionsGeneratePikadditionsPost(
@@ -506,7 +506,7 @@ class PikAdditionsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaSwapsNode(PikaNodeBase):
@@ -558,7 +558,7 @@ class PikaSwapsNode(PikaNodeBase):
    DESCRIPTION = "Swap out any object or region of your video with a new image or object. Define areas to replace either with a mask or coordinates."
    RETURN_TYPES = ("VIDEO",)

-    def api_call(
+    async def api_call(
        self,
        video: VideoInput,
        image: torch.Tensor,
@@ -587,11 +587,11 @@ class PikaSwapsNode(PikaNodeBase):
        image_bytes_io = tensor_to_bytesio(image)
        image_bytes_io.seek(0)

-        pika_files = [
-            ("video", ("video.mp4", video_bytes_io, "video/mp4")),
-            ("image", ("image.png", image_bytes_io, "image/png")),
-            ("modifyRegionMask", ("mask.png", mask_bytes_io, "image/png")),
-        ]
+        pika_files = {
+            "video": ("video.mp4", video_bytes_io, "video/mp4"),
+            "image": ("image.png", image_bytes_io, "image/png"),
+            "modifyRegionMask": ("mask.png", mask_bytes_io, "image/png"),
+        }

        # Prepare non-file data
        pika_request_data = PikaBodyGeneratePikaswapsGeneratePikaswapsPost(
@@ -613,7 +613,7 @@ class PikaSwapsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaffectsNode(PikaNodeBase):
@@ -664,7 +664,7 @@ class PikaffectsNode(PikaNodeBase):

    DESCRIPTION = "Generate a video with a specific Pikaffect. Supported Pikaffects: Cake-ify, Crumble, Crush, Decapitate, Deflate, Dissolve, Explode, Eye-pop, Inflate, Levitate, Melt, Peel, Poke, Squish, Ta-da, Tear"

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        pikaffect: str,
@@ -693,7 +693,7 @@ class PikaffectsNode(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 class PikaStartEndFrameNode2_2(PikaNodeBase):
@@ -718,7 +718,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):

    DESCRIPTION = "Generate a video by combining your first and last frame. Upload two images to define the start and end points, and let the AI create a smooth transition between them."

-    def api_call(
+    async def api_call(
        self,
        image_start: torch.Tensor,
        image_end: torch.Tensor,
@@ -732,10 +732,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):
    ) -> tuple[VideoFromFile]:

        pika_files = [
-            (
-                "keyFrames",
-                ("image_start.png", tensor_to_bytesio(image_start), "image/png"),
-            ),
+            ("keyFrames", ("image_start.png", tensor_to_bytesio(image_start), "image/png")),
            ("keyFrames", ("image_end.png", tensor_to_bytesio(image_end), "image/png")),
        ]

@@ -758,7 +755,7 @@ class PikaStartEndFrameNode2_2(PikaNodeBase):
            auth_kwargs=kwargs,
        )

-        return self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)
+        return await self.execute_task(initial_operation, auth_kwargs=kwargs, node_id=unique_id)


 NODE_CLASS_MAPPINGS = {
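
Note on the `pika_files` changes above: `PikAdditionsNode` and `PikaSwapsNode` switch from a list of pairs to a dict, while `PikaStartEndFrameNode2_2` keeps a list. The diff does not say why, but one plausible reason is that the start/end node repeats the field name `keyFrames`, and a dict cannot hold the same key twice. The sketch below uses placeholder byte payloads in place of `tensor_to_bytesio(...)` and only illustrates the Python behavior, not how the request layer consumes the value.

```python
from io import BytesIO

# Placeholder payloads standing in for tensor_to_bytesio(...) output.
start = BytesIO(b"start")
end = BytesIO(b"end")

# A list of (field_name, file_tuple) pairs keeps both "keyFrames" parts.
as_list = [
    ("keyFrames", ("image_start.png", start, "image/png")),
    ("keyFrames", ("image_end.png", end, "image/png")),
]

# Converting to a dict collapses the repeated key; only the last entry survives.
as_dict = dict(as_list)

print(len(as_list), len(as_dict))
print(as_dict["keyFrames"][0])
```

With distinct field names such as `video`, `image` and `modifyRegionMask`, nothing is lost by using a dict.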
--- a/comfy_api_nodes/nodes_pixverse.py
+++ b/comfy_api_nodes/nodes_pixverse.py
@@ -30,7 +30,7 @@ from comfy.comfy_types.node_typing import IO, ComfyNodeABC
 from comfy_api.input_impl import VideoFromFile

 import torch
-import requests
+import aiohttp
 from io import BytesIO


@@ -47,7 +47,7 @@ def get_video_url_from_response(
    return str(response.Resp.url)


-def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
+async def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
    # first, upload image to Pixverse and get image id to use in actual generation call
    files = {"image": tensor_to_bytesio(image)}
    operation = SynchronousOperation(
@@ -62,7 +62,7 @@ def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
        content_type="multipart/form-data",
        auth_kwargs=auth_kwargs,
    )
-    response_upload: PixverseImageUploadResponse = operation.execute()
+    response_upload: PixverseImageUploadResponse = await operation.execute()

    if response_upload.Resp is None:
        raise Exception(
@@ -164,7 +164,7 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        aspect_ratio: str,
@@ -205,7 +205,7 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -229,11 +229,11 @@ class PixverseTextToVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class PixverseImageToVideoNode(ComfyNodeABC):
@@ -302,7 +302,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -316,7 +316,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
        **kwargs,
    ):
        validate_string(prompt, strip_whitespace=False)
-        img_id = upload_image_to_pixverse(image, auth_kwargs=kwargs)
+        img_id = await upload_image_to_pixverse(image, auth_kwargs=kwargs)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -345,7 +345,7 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -369,10 +369,11 @@ class PixverseImageToVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_I2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 class PixverseTransitionVideoNode(ComfyNodeABC):
@@ -436,7 +437,7 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        first_frame: torch.Tensor,
        last_frame: torch.Tensor,
@@ -450,8 +451,8 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
        **kwargs,
    ):
        validate_string(prompt, strip_whitespace=False)
-        first_frame_id = upload_image_to_pixverse(first_frame, auth_kwargs=kwargs)
-        last_frame_id = upload_image_to_pixverse(last_frame, auth_kwargs=kwargs)
+        first_frame_id = await upload_image_to_pixverse(first_frame, auth_kwargs=kwargs)
+        last_frame_id = await upload_image_to_pixverse(last_frame, auth_kwargs=kwargs)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -480,7 +481,7 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            ),
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")
@@ -504,10 +505,11 @@ class PixverseTransitionVideoNode(ComfyNodeABC):
            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        response_poll = operation.execute()
+        response_poll = await operation.execute()

-        vid_response = requests.get(response_poll.Resp.url)
-        return (VideoFromFile(BytesIO(vid_response.content)),)
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return (VideoFromFile(BytesIO(await vid_response.content.read())),)


 NODE_CLASS_MAPPINGS = {
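
Note on the pattern in the PixVerse hunks above: each `api_call` becomes a coroutine that awaits the initial request, then the polling operation, then the download (which moves from `requests.get` to `aiohttp`). A rough sketch of that shape, using only `asyncio` and stand-in coroutines, is below. The names `create_task`, `poll_task` and `download` are hypothetical, the URL is made up, and no network access takes place.

```python
import asyncio


async def create_task(prompt: str) -> str:
    # Stand-in for `await operation.execute()` on the initial request.
    await asyncio.sleep(0.01)
    return f"task-for-{prompt}"


async def poll_task(task_id: str) -> str:
    # Stand-in for the polling operation; returns a made-up result URL.
    await asyncio.sleep(0.01)
    return f"https://example.invalid/{task_id}.mp4"


async def download(url: str) -> bytes:
    # Stand-in for the aiohttp download step.
    await asyncio.sleep(0.01)
    return url.encode()


async def api_call(prompt: str) -> tuple:
    # Steps run in order within one call, as in the diff.
    task_id = await create_task(prompt)
    url = await poll_task(task_id)
    return (await download(url),)


async def run_two() -> list:
    # Every await yields to the event loop, so the two calls can overlap.
    return await asyncio.gather(api_call("a"), api_call("b"))


results = asyncio.run(run_two())
print(results)
```

Within a single call the steps still run sequentially. The benefit of `await` is that other coroutines can make progress while one call is waiting, which a blocking `requests.get` would prevent.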
--- a/comfy_api_nodes/nodes_recraft.py
+++ b/comfy_api_nodes/nodes_recraft.py
@@ -37,7 +37,7 @@ from io import BytesIO
 from PIL import UnidentifiedImageError


-def handle_recraft_file_request(
+async def handle_recraft_file_request(
        image: torch.Tensor,
        path: str,
        mask: torch.Tensor=None,
@@ -71,13 +71,13 @@ def handle_recraft_file_request(
            auth_kwargs=auth_kwargs,
            multipart_parser=recraft_multipart_parser,
        )
-        response: RecraftImageGenerationResponse = operation.execute()
+        response: RecraftImageGenerationResponse = await operation.execute()
        all_bytesio = []
        if response.image is not None:
-            all_bytesio.append(download_url_to_bytesio(response.image.url, timeout=timeout))
+            all_bytesio.append(await download_url_to_bytesio(response.image.url, timeout=timeout))
        else:
            for data in response.data:
-                all_bytesio.append(download_url_to_bytesio(data.url, timeout=timeout))
+                all_bytesio.append(await download_url_to_bytesio(data.url, timeout=timeout))

        return all_bytesio

@@ -395,7 +395,7 @@ class RecraftTextToImageNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        size: str,
@@ -439,7 +439,7 @@ class RecraftTextToImageNode:
            ),
            auth_kwargs=kwargs,
        )
-        response: RecraftImageGenerationResponse = operation.execute()
+        response: RecraftImageGenerationResponse = await operation.execute()
        images = []
        urls = []
        for data in response.data:
@@ -451,7 +451,7 @@ class RecraftTextToImageNode:
                        f"Result URL: {urls_string}", unique_id
                    )
                image = bytesio_to_image_tensor(
-                    download_url_to_bytesio(data.url, timeout=1024)
+                    await download_url_to_bytesio(data.url, timeout=1024)
                )
            if len(image.shape) < 4:
                image = image.unsqueeze(0)
@@ -538,7 +538,7 @@ class RecraftImageToImageNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -578,7 +578,7 @@ class RecraftImageToImageNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/imageToImage",
                request=request,
@@ -654,7 +654,7 @@ class RecraftImageInpaintingNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        mask: torch.Tensor,
@@ -690,7 +690,7 @@ class RecraftImageInpaintingNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                mask=mask[i:i+1],
                path="/proxy/recraft/images/inpaint",
@@ -779,7 +779,7 @@ class RecraftTextToVectorNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        substyle: str,
@@ -821,7 +821,7 @@ class RecraftTextToVectorNode:
            ),
            auth_kwargs=kwargs,
        )
-        response: RecraftImageGenerationResponse = operation.execute()
+        response: RecraftImageGenerationResponse = await operation.execute()
        svg_data = []
        urls = []
        for data in response.data:
@@ -831,7 +831,7 @@ class RecraftTextToVectorNode:
                PromptServer.instance.send_progress_text(
                    f"Result URL: {' '.join(urls)}", unique_id
                )
-            svg_data.append(download_url_to_bytesio(data.url, timeout=1024))
+            svg_data.append(await download_url_to_bytesio(data.url, timeout=1024))

        return (SVG(svg_data),)

@@ -861,7 +861,7 @@ class RecraftVectorizeImageNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        **kwargs,
@@ -870,7 +870,7 @@ class RecraftVectorizeImageNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/vectorize",
                auth_kwargs=kwargs,
@@ -942,7 +942,7 @@ class RecraftReplaceBackgroundNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        prompt: str,
@@ -973,7 +973,7 @@ class RecraftReplaceBackgroundNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/replaceBackground",
                request=request,
@@ -1011,7 +1011,7 @@ class RecraftRemoveBackgroundNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        **kwargs,
@@ -1020,7 +1020,7 @@ class RecraftRemoveBackgroundNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path="/proxy/recraft/images/removeBackground",
                auth_kwargs=kwargs,
@@ -1062,7 +1062,7 @@ class RecraftCrispUpscaleNode:
            },
        }

-    def api_call(
+    async def api_call(
        self,
        image: torch.Tensor,
        **kwargs,
@@ -1071,7 +1071,7 @@ class RecraftCrispUpscaleNode:
        total = image.shape[0]
        pbar = ProgressBar(total)
        for i in range(total):
-            sub_bytes = handle_recraft_file_request(
+            sub_bytes = await handle_recraft_file_request(
                image=image[i],
                path=self.RECRAFT_PATH,
                auth_kwargs=kwargs,
--- a/comfy_api_nodes/nodes_rodin.py
+++ b/comfy_api_nodes/nodes_rodin.py
@@ -9,11 +9,10 @@ from __future__ import annotations
 from inspect import cleandoc
 from comfy.comfy_types.node_typing import IO
 import folder_paths as comfy_paths
-import requests
+import aiohttp
 import os
 import datetime
-import shutil
-import time
+import asyncio
 import io
 import logging
 import math
@@ -66,7 +65,6 @@ def create_task_error(response: Rodin3DGenerateResponse):
    return hasattr(response, "error")


-
 class Rodin3DAPI:
    """
    Generate 3D Assets using Rodin API
@@ -123,8 +121,8 @@ class Rodin3DAPI:
        else:
            return "Generating"

-    def CreateGenerateTask(self, images=None, seed=1, material="PBR", quality="medium", tier="Regular", mesh_mode="Quad", **kwargs):
-        if images == None:
+    async def create_generate_task(self, images=None, seed=1, material="PBR", quality="medium", tier="Regular", mesh_mode="Quad", **kwargs):
+        if images is None:
            raise Exception("Rodin 3D generate requires at least 1 image.")
        if len(images) >= 5:
            raise Exception("Rodin 3D generate requires up to 5 image.")
@@ -155,7 +153,7 @@ class Rodin3DAPI:
            auth_kwargs=kwargs,
        )

-        response = operation.execute()
+        response = await operation.execute()

        if create_task_error(response):
            error_message = f"Rodin3D Create 3D generate Task Failed. Message: {response.message}, error: {response.error}"
@@ -168,7 +166,7 @@ class Rodin3DAPI:
        logging.info(f"[ Rodin3D API - Submit Jobs ] UUID: {task_uuid}")
        return task_uuid, subscription_key

-    def poll_for_task_status(self, subscription_key, **kwargs) -> Rodin3DCheckStatusResponse:
+    async def poll_for_task_status(self, subscription_key, **kwargs) -> Rodin3DCheckStatusResponse:

        path = "/proxy/rodin/api/v2/status"

@@ -191,11 +189,9 @@ class Rodin3DAPI:

        logging.info("[ Rodin3D API - CheckStatus ] Generate Start!")

-        return poll_operation.execute()
+        return await poll_operation.execute()

-
-
-    def GetRodinDownloadList(self, uuid, **kwargs) -> Rodin3DDownloadResponse:
+    async def get_rodin_download_list(self, uuid, **kwargs) -> Rodin3DDownloadResponse:
        logging.info("[ Rodin3D API - Downloading ] Generate Successfully!")

        path = "/proxy/rodin/api/v2/download"
@@ -212,53 +208,59 @@ class Rodin3DAPI:
            auth_kwargs=kwargs
        )

-        return operation.execute()
+        return await operation.execute()

-    def GetQualityAndMode(self, PolyCount):
-        if PolyCount == "200K-Triangle":
+    def get_quality_mode(self, poly_count):
+        if poly_count == "200K-Triangle":
            mesh_mode = "Raw"
            quality = "medium"
        else:
            mesh_mode = "Quad"
-            if PolyCount == "4K-Quad":
+            if poly_count == "4K-Quad":
                quality = "extra-low"
-            elif PolyCount == "8K-Quad":
+            elif poly_count == "8K-Quad":
                quality = "low"
-            elif PolyCount == "18K-Quad":
+            elif poly_count == "18K-Quad":
                quality = "medium"
-            elif PolyCount == "50K-Quad":
+            elif poly_count == "50K-Quad":
                quality = "high"
            else:
                quality = "medium"

        return mesh_mode, quality

-    def DownLoadFiles(self, Url_List):
-        Save_path = os.path.join(comfy_paths.get_output_directory(), "Rodin3D", datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
-        os.makedirs(Save_path, exist_ok=True)
+    async def download_files(self, url_list):
+        save_path = os.path.join(comfy_paths.get_output_directory(), "Rodin3D", datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
+        os.makedirs(save_path, exist_ok=True)
        model_file_path = None
-        for Item in Url_List.list:
-            url = Item.url
-            file_name = Item.name
-            file_path = os.path.join(Save_path, file_name)
-            if file_path.endswith(".glb"):
-                model_file_path = file_path
-            logging.info(f"[ Rodin3D API - download_files ] Downloading file: {file_path}")
-            max_retries = 5
-            for attempt in range(max_retries):
-                try:
-                    with requests.get(url, stream=True) as r:
-                        r.raise_for_status()
-                        with open(file_path, "wb") as f:
-                            shutil.copyfileobj(r.raw, f)
-                    break
-                except Exception as e:
-                    logging.info(f"[ Rodin3D API - download_files ] Error downloading {file_path}:{e}")
-                    if attempt < max_retries - 1:
-                        logging.info("Retrying...")
-                        time.sleep(2)
-                    else:
-                        logging.info(f"[ Rodin3D API - download_files ] Failed to download {file_path} after {max_retries} attempts.")
+        async with aiohttp.ClientSession() as session:
+            for i in url_list.list:
+                url = i.url
+                file_name = i.name
+                file_path = os.path.join(save_path, file_name)
+                if file_path.endswith(".glb"):
+                    model_file_path = file_path
+                logging.info(f"[ Rodin3D API - download_files ] Downloading file: {file_path}")
+                max_retries = 5
+                for attempt in range(max_retries):
+                    try:
+                        async with session.get(url) as resp:
+                            resp.raise_for_status()
+                            with open(file_path, "wb") as f:
+                                async for chunk in resp.content.iter_chunked(32 * 1024):
+                                    f.write(chunk)
+                        break
+                    except Exception as e:
+                        logging.info(f"[ Rodin3D API - download_files ] Error downloading {file_path}:{e}")
+                        if attempt < max_retries - 1:
+                            logging.info("Retrying...")
+                            await asyncio.sleep(2)
+                        else:
+                            logging.info(
+                                "[ Rodin3D API - download_files ] Failed to download %s after %s attempts.",
+                                file_path,
+                                max_retries,
+                            )

        return model_file_path

@@ -285,7 +287,7 @@ class Rodin3D_Regular(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -298,14 +300,17 @@ class Rodin3D_Regular(Rodin3DAPI):
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
-        mesh_mode, quality = self.GetQualityAndMode(Polygon_count)
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=Material_Type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        mesh_mode, quality = self.get_quality_mode(Polygon_count)
+        task_uuid, subscription_key = await self.create_generate_task(images=m_images, seed=Seed, material=Material_Type,
+                                                                quality=quality, tier=tier, mesh_mode=mesh_mode,
+                                                                **kwargs)
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

+
 class Rodin3D_Detail(Rodin3DAPI):
    @classmethod
    def INPUT_TYPES(s):
@@ -328,7 +333,7 @@ class Rodin3D_Detail(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -341,14 +346,17 @@ class Rodin3D_Detail(Rodin3DAPI):
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
-        mesh_mode, quality = self.GetQualityAndMode(Polygon_count)
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=Material_Type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        mesh_mode, quality = self.get_quality_mode(Polygon_count)
+        task_uuid, subscription_key = await self.create_generate_task(images=m_images, seed=Seed, material=Material_Type,
+                                                                quality=quality, tier=tier, mesh_mode=mesh_mode,
+                                                                **kwargs)
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

+
 class Rodin3D_Smooth(Rodin3DAPI):
    @classmethod
    def INPUT_TYPES(s):
@@ -371,7 +379,7 @@ class Rodin3D_Smooth(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -384,14 +392,17 @@ class Rodin3D_Smooth(Rodin3DAPI):
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
-        mesh_mode, quality = self.GetQualityAndMode(Polygon_count)
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=Material_Type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        mesh_mode, quality = self.get_quality_mode(Polygon_count)
+        task_uuid, subscription_key = await self.create_generate_task(images=m_images, seed=Seed, material=Material_Type,
+                                                                quality=quality, tier=tier, mesh_mode=mesh_mode,
+                                                                **kwargs)
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

+
 class Rodin3D_Sketch(Rodin3DAPI):
    @classmethod
    def INPUT_TYPES(s):
@@ -423,7 +434,7 @@ class Rodin3D_Sketch(Rodin3DAPI):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        Images,
        Seed,
@@ -437,10 +448,12 @@ class Rodin3D_Sketch(Rodin3DAPI):
        material_type = "PBR"
        quality = "medium"
        mesh_mode = "Quad"
-        task_uuid, subscription_key = self.CreateGenerateTask(images=m_images, seed=Seed, material=material_type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs)
-        self.poll_for_task_status(subscription_key, **kwargs)
-        Download_List = self.GetRodinDownloadList(task_uuid, **kwargs)
-        model = self.DownLoadFiles(Download_List)
+        task_uuid, subscription_key = await self.create_generate_task(
+            images=m_images, seed=Seed, material=material_type, quality=quality, tier=tier, mesh_mode=mesh_mode, **kwargs
+        )
+        await self.poll_for_task_status(subscription_key, **kwargs)
+        download_list = await self.get_rodin_download_list(task_uuid, **kwargs)
+        model = await self.download_files(download_list)

        return (model,)

--- a/comfy_api_nodes/nodes_runway.py
+++ b/comfy_api_nodes/nodes_runway.py
@@ -99,14 +99,14 @@ def validate_input_image(image: torch.Tensor) -> bool:
    return image.shape[2] < 8000 and image.shape[1] < 8000


-def poll_until_finished(
+async def poll_until_finished(
    auth_kwargs: dict[str, str],
    api_endpoint: ApiEndpoint[Any, TaskStatusResponse],
    estimated_duration: Optional[int] = None,
    node_id: Optional[str] = None,
 ) -> TaskStatusResponse:
    """Polls the Runway API endpoint until the task reaches a terminal state, then returns the response."""
-    return PollingOperation(
+    return await PollingOperation(
        poll_endpoint=api_endpoint,
        completed_statuses=[
            TaskStatus.SUCCEEDED.value,
@@ -115,7 +115,7 @@ def poll_until_finished(
            TaskStatus.FAILED.value,
            TaskStatus.CANCELLED.value,
        ],
-        status_extractor=lambda response: (response.status.value),
+        status_extractor=lambda response: response.status.value,
        auth_kwargs=auth_kwargs,
        result_url_extractor=get_video_url_from_task_status,
        estimated_duration=estimated_duration,
@@ -167,11 +167,11 @@ class RunwayVideoGenNode(ComfyNodeABC):
            )
        return True

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> RunwayImageToVideoResponse:
        """Poll the task status until it is finished then get the response."""
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_GET_TASK_STATUS}/{task_id}",
@@ -183,7 +183,7 @@ class RunwayVideoGenNode(ComfyNodeABC):
            node_id=node_id,
        )

-    def generate_video(
+    async def generate_video(
        self,
        request: RunwayImageToVideoRequest,
        auth_kwargs: dict[str, str],
@@ -200,15 +200,15 @@ class RunwayVideoGenNode(ComfyNodeABC):
            auth_kwargs=auth_kwargs,
        )

-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        self.validate_task_created(initial_response)
        task_id = initial_response.id

-        final_response = self.get_response(task_id, auth_kwargs, node_id)
+        final_response = await self.get_response(task_id, auth_kwargs, node_id)
        self.validate_response(final_response)

        video_url = get_video_url_from_task_status(final_response)
-        return (download_url_to_video_output(video_url),)
+        return (await download_url_to_video_output(video_url),)


 class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
@@ -250,7 +250,7 @@ class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        start_frame: torch.Tensor,
@@ -265,7 +265,7 @@ class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
        validate_input_image(start_frame)

        # Upload image
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            start_frame,
            max_images=1,
            mime_type="image/png",
@@ -274,7 +274,7 @@ class RunwayImageToVideoNodeGen3a(RunwayVideoGenNode):
        if len(download_urls) != 1:
            raise RunwayApiError("Failed to upload one or more images to comfy api.")

-        return self.generate_video(
+        return await self.generate_video(
            RunwayImageToVideoRequest(
                promptText=prompt,
                seed=seed,
@@ -333,7 +333,7 @@ class RunwayImageToVideoNodeGen4(RunwayVideoGenNode):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        start_frame: torch.Tensor,
@@ -348,7 +348,7 @@ class RunwayImageToVideoNodeGen4(RunwayVideoGenNode):
        validate_input_image(start_frame)

        # Upload image
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            start_frame,
            max_images=1,
            mime_type="image/png",
@@ -357,7 +357,7 @@ class RunwayImageToVideoNodeGen4(RunwayVideoGenNode):
        if len(download_urls) != 1:
            raise RunwayApiError("Failed to upload one or more images to comfy api.")

-        return self.generate_video(
+        return await self.generate_video(
            RunwayImageToVideoRequest(
                promptText=prompt,
                seed=seed,
@@ -382,10 +382,10 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):

    DESCRIPTION = "Upload first and last keyframes, draft a prompt, and generate a video. More complex transitions, such as cases where the Last frame is completely different from the First frame, may benefit from the longer 10s duration. This would give the generation more time to smoothly transition between the two inputs. Before diving in, review these best practices to ensure that your input selections will set your generation up for success: https://help.runwayml.com/hc/en-us/articles/34170748696595-Creating-with-Keyframes-on-Gen-3."

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> RunwayImageToVideoResponse:
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_GET_TASK_STATUS}/{task_id}",
@@ -437,7 +437,7 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):
            },
        }

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        start_frame: torch.Tensor,
@@ -455,7 +455,7 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):

        # Upload images
        stacked_input_images = image_tensor_pair_to_batch(start_frame, end_frame)
-        download_urls = upload_images_to_comfyapi(
+        download_urls = await upload_images_to_comfyapi(
            stacked_input_images,
            max_images=2,
            mime_type="image/png",
@@ -464,7 +464,7 @@ class RunwayFirstLastFrameNode(RunwayVideoGenNode):
        if len(download_urls) != 2:
            raise RunwayApiError("Failed to upload one or more images to comfy api.")

-        return self.generate_video(
+        return await self.generate_video(
            RunwayImageToVideoRequest(
                promptText=prompt,
                seed=seed,
@@ -543,11 +543,11 @@ class RunwayTextToImageNode(ComfyNodeABC):
            )
        return True

-    def get_response(
+    async def get_response(
        self, task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
    ) -> TaskStatusResponse:
        """Poll the task status until it is finished then get the response."""
-        return poll_until_finished(
+        return await poll_until_finished(
            auth_kwargs,
            ApiEndpoint(
                path=f"{PATH_GET_TASK_STATUS}/{task_id}",
@@ -559,7 +559,7 @@ class RunwayTextToImageNode(ComfyNodeABC):
            node_id=node_id,
        )

-    def api_call(
+    async def api_call(
        self,
        prompt: str,
        ratio: str,
@@ -574,7 +574,7 @@ class RunwayTextToImageNode(ComfyNodeABC):
        reference_images = None
        if reference_image is not None:
            validate_input_image(reference_image)
-            download_urls = upload_images_to_comfyapi(
+            download_urls = await upload_images_to_comfyapi(
                reference_image,
                max_images=1,
                mime_type="image/png",
@@ -605,19 +605,19 @@ class RunwayTextToImageNode(ComfyNodeABC):
            auth_kwargs=kwargs,
        )

-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        self.validate_task_created(initial_response)
        task_id = initial_response.id

        # Poll for completion
-        final_response = self.get_response(
+        final_response = await self.get_response(
            task_id, auth_kwargs=kwargs, node_id=unique_id
        )
        self.validate_response(final_response)

        # Download and return image
        image_url = get_image_url_from_task_status(final_response)
-        return (download_url_to_image_tensor(image_url),)
+        return (await download_url_to_image_tensor(image_url),)


 NODE_CLASS_MAPPINGS = {
--- a/comfy_api_nodes/nodes_stability.py
+++ b/comfy_api_nodes/nodes_stability.py
@@ -124,7 +124,7 @@ class StabilityStableImageUltraNode:
            },
        }

-    def api_call(self, prompt: str, aspect_ratio: str, style_preset: str, seed: int,
+    async def api_call(self, prompt: str, aspect_ratio: str, style_preset: str, seed: int,
                 negative_prompt: str=None, image: torch.Tensor = None, image_denoise: float=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
@@ -163,7 +163,7 @@ class StabilityStableImageUltraNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stable Image Ultra generation failed: {response_api.finish_reason}.")
@@ -257,7 +257,7 @@ class StabilityStableImageSD_3_5Node:
            },
        }

-    def api_call(self, model: str, prompt: str, aspect_ratio: str, style_preset: str, seed: int, cfg_scale: float,
+    async def api_call(self, model: str, prompt: str, aspect_ratio: str, style_preset: str, seed: int, cfg_scale: float,
                 negative_prompt: str=None, image: torch.Tensor = None, image_denoise: float=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
@@ -302,7 +302,7 @@ class StabilityStableImageSD_3_5Node:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stable Diffusion 3.5 Image generation failed: {response_api.finish_reason}.")
@@ -374,7 +374,7 @@ class StabilityUpscaleConservativeNode:
            },
        }

-    def api_call(self, image: torch.Tensor, prompt: str, creativity: float, seed: int, negative_prompt: str=None,
+    async def api_call(self, image: torch.Tensor, prompt: str, creativity: float, seed: int, negative_prompt: str=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
        image_binary = tensor_to_bytesio(image, total_pixels=1024*1024).read()
@@ -403,7 +403,7 @@ class StabilityUpscaleConservativeNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Conservative generation failed: {response_api.finish_reason}.")
@@ -480,7 +480,7 @@ class StabilityUpscaleCreativeNode:
            },
        }

-    def api_call(self, image: torch.Tensor, prompt: str, creativity: float, style_preset: str, seed: int, negative_prompt: str=None,
+    async def api_call(self, image: torch.Tensor, prompt: str, creativity: float, style_preset: str, seed: int, negative_prompt: str=None,
                 **kwargs):
        validate_string(prompt, strip_whitespace=False)
        image_binary = tensor_to_bytesio(image, total_pixels=1024*1024).read()
@@ -512,7 +512,7 @@ class StabilityUpscaleCreativeNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
@@ -527,7 +527,7 @@ class StabilityUpscaleCreativeNode:
            status_extractor=lambda x: get_async_dummy_status(x),
            auth_kwargs=kwargs,
        )
-        response_poll: StabilityResultsGetResponse = operation.execute()
+        response_poll: StabilityResultsGetResponse = await operation.execute()

        if response_poll.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Creative generation failed: {response_poll.finish_reason}.")
@@ -563,8 +563,7 @@ class StabilityUpscaleFastNode:
            },
        }

-    def api_call(self, image: torch.Tensor,
-                 **kwargs):
+    async def api_call(self, image: torch.Tensor, **kwargs):
        image_binary = tensor_to_bytesio(image, total_pixels=4096*4096).read()

        files = {
@@ -583,7 +582,7 @@ class StabilityUpscaleFastNode:
            content_type="multipart/form-data",
            auth_kwargs=kwargs,
        )
-        response_api = operation.execute()
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Fast failed: {response_api.finish_reason}.")
--- a/comfy_api_nodes/nodes_tripo.py
+++ b/comfy_api_nodes/nodes_tripo.py
@@ -37,8 +37,8 @@ from comfy_api_nodes.apinode_utils import (
 )


-def upload_image_to_tripo(image, **kwargs):
-    urls = upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs)
+async def upload_image_to_tripo(image, **kwargs):
+    urls = await upload_images_to_comfyapi(image, max_images=1, auth_kwargs=kwargs)
    return TripoFileReference(TripoUrlReference(url=urls[0], type="jpeg"))

 def get_model_url_from_response(response: TripoTaskResponse) -> str:
@@ -49,7 +49,7 @@ def get_model_url_from_response(response: TripoTaskResponse) -> str:
    raise RuntimeError(f"Failed to get model url from response: {response}")


-def poll_until_finished(
+async def poll_until_finished(
    kwargs: dict[str, str],
    response: TripoTaskResponse,
 ) -> tuple[str, str]:
@@ -57,7 +57,7 @@ def poll_until_finished(
    if response.code != 0:
        raise RuntimeError(f"Failed to generate mesh: {response.error}")
    task_id = response.data.task_id
-    response_poll = PollingOperation(
+    response_poll = await PollingOperation(
        poll_endpoint=ApiEndpoint(
            path=f"/proxy/tripo/v2/openapi/task/{task_id}",
            method=HttpMethod.GET,
@@ -80,7 +80,7 @@ def poll_until_finished(
    ).execute()
    if response_poll.data.status == TripoTaskStatus.SUCCESS:
        url = get_model_url_from_response(response_poll)
-        bytesio = download_url_to_bytesio(url)
+        bytesio = await download_url_to_bytesio(url)
        # Save the downloaded model file
        model_file = f"tripo_model_{task_id}.glb"
        with open(os.path.join(get_output_directory(), model_file), "wb") as f:
@@ -88,6 +88,7 @@ def poll_until_finished(
        return model_file, task_id
    raise RuntimeError(f"Failed to generate mesh: {response_poll}")

+
 class TripoTextToModelNode:
    """
    Generates 3D models synchronously based on a text prompt using Tripo's API.
@@ -126,11 +127,11 @@ class TripoTextToModelNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_mesh(self, prompt, negative_prompt=None, model_version=None, style=None, texture=None, pbr=None, image_seed=None, model_seed=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
+    async def generate_mesh(self, prompt, negative_prompt=None, model_version=None, style=None, texture=None, pbr=None, image_seed=None, model_seed=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
        style_enum = None if style == "None" else style
        if not prompt:
            raise RuntimeError("Prompt is required")
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -155,7 +156,8 @@ class TripoTextToModelNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoImageToModelNode:
    """
@@ -195,12 +197,12 @@ class TripoImageToModelNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_mesh(self, image, model_version=None, style=None, texture=None, pbr=None, model_seed=None, orientation=None, texture_alignment=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
+    async def generate_mesh(self, image, model_version=None, style=None, texture=None, pbr=None, model_seed=None, orientation=None, texture_alignment=None, texture_seed=None, texture_quality=None, face_limit=None, quad=None, **kwargs):
        style_enum = None if style == "None" else style
        if image is None:
            raise RuntimeError("Image is required")
-        tripo_file = upload_image_to_tripo(image, **kwargs)
-        response = SynchronousOperation(
+        tripo_file = await upload_image_to_tripo(image, **kwargs)
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -225,7 +227,8 @@ class TripoImageToModelNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoMultiviewToModelNode:
    """
@@ -267,7 +270,7 @@ class TripoMultiviewToModelNode:
    API_NODE = True
    OUTPUT_NODE = True

-    def generate_mesh(self, image, image_left=None, image_back=None, image_right=None, model_version=None, orientation=None, texture=None, pbr=None, model_seed=None, texture_seed=None, texture_quality=None, texture_alignment=None, face_limit=None, quad=None, **kwargs):
+    async def generate_mesh(self, image, image_left=None, image_back=None, image_right=None, model_version=None, orientation=None, texture=None, pbr=None, model_seed=None, texture_seed=None, texture_quality=None, texture_alignment=None, face_limit=None, quad=None, **kwargs):
        if image is None:
            raise RuntimeError("front image for multiview is required")
        images = []
@@ -282,11 +285,11 @@ class TripoMultiviewToModelNode:
        for image_name in ["image", "image_left", "image_back", "image_right"]:
            image_ = image_dict[image_name]
            if image_ is not None:
-                tripo_file = upload_image_to_tripo(image_, **kwargs)
+                tripo_file = await upload_image_to_tripo(image_, **kwargs)
                images.append(tripo_file)
            else:
                images.append(TripoFileEmptyReference())
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -309,7 +312,8 @@ class TripoMultiviewToModelNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoTextureNode:
    @classmethod
@@ -340,8 +344,8 @@ class TripoTextureNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 80

-    def generate_mesh(self, model_task_id, texture=None, pbr=None, texture_seed=None, texture_quality=None, texture_alignment=None, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, model_task_id, texture=None, pbr=None, texture_seed=None, texture_quality=None, texture_alignment=None, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -358,7 +362,7 @@ class TripoTextureNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)


 class TripoRefineNode:
@@ -387,8 +391,8 @@ class TripoRefineNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 240

-    def generate_mesh(self, model_task_id, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, model_task_id, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -400,7 +404,7 @@ class TripoRefineNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)


 class TripoRigNode:
@@ -425,8 +429,8 @@ class TripoRigNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 180

-    def generate_mesh(self, original_model_task_id, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, original_model_task_id, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -440,7 +444,8 @@ class TripoRigNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoRetargetNode:
    @classmethod
@@ -475,8 +480,8 @@ class TripoRetargetNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 30

-    def generate_mesh(self, animation, original_model_task_id, **kwargs):
-        response = SynchronousOperation(
+    async def generate_mesh(self, animation, original_model_task_id, **kwargs):
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -491,7 +496,8 @@ class TripoRetargetNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 class TripoConversionNode:
    @classmethod
@@ -529,10 +535,10 @@ class TripoConversionNode:
    OUTPUT_NODE = True
    AVERAGE_DURATION = 30

-    def generate_mesh(self, original_model_task_id, format, quad, face_limit, texture_size, texture_format, **kwargs):
+    async def generate_mesh(self, original_model_task_id, format, quad, face_limit, texture_size, texture_format, **kwargs):
        if not original_model_task_id:
            raise RuntimeError("original_model_task_id is required")
-        response = SynchronousOperation(
+        response = await SynchronousOperation(
            endpoint=ApiEndpoint(
                path="/proxy/tripo/v2/openapi/task",
                method=HttpMethod.POST,
@@ -549,7 +555,8 @@ class TripoConversionNode:
            ),
            auth_kwargs=kwargs,
        ).execute()
-        return poll_until_finished(kwargs, response)
+        return await poll_until_finished(kwargs, response)
+

 NODE_CLASS_MAPPINGS = {
    "TripoTextToModelNode": TripoTextToModelNode,
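Note on the Tripo hunks above: each generate_mesh becomes a coroutine that awaits the task-creation request and then awaits the polling helper. The shape of that change can be sketched with asyncio alone. FakeOperation, poll_until_done and the payload values below are hypothetical stand-ins for illustration, not the ComfyUI classes touched by this diff.

```python
import asyncio


class FakeOperation:
    """Hypothetical stand-in for an API operation whose execute() is a coroutine."""

    def __init__(self, payload):
        self.payload = payload

    async def execute(self):
        # Simulate network latency without blocking the event loop.
        await asyncio.sleep(0)
        return {"task_id": f"task-for-{self.payload}"}


async def poll_until_done(response, attempts=3):
    """Hypothetical poller: pretends the task finishes on the last attempt."""
    for attempt in range(1, attempts + 1):
        await asyncio.sleep(0)
        if attempt == attempts:
            return {"task_id": response["task_id"], "status": "success"}


async def generate_mesh(prompt):
    if not prompt:
        raise RuntimeError("Prompt is required")
    # execute() alone only creates a coroutine object; await runs it.
    # "await" applies to the whole call expression FakeOperation(...).execute().
    response = await FakeOperation(prompt).execute()
    return await poll_until_done(response)


result = asyncio.run(generate_mesh("a chair"))
print(result)
```

If either await were dropped, the caller would receive a coroutine object instead of a response, which is presumably why every call site in these hunks changes together.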
--- a/comfy_api_nodes/nodes_veo2.py
+++ b/comfy_api_nodes/nodes_veo2.py
@@ -1,17 +1,17 @@
 import io
 import logging
 import base64
-import requests
+import aiohttp
 import torch
 from typing import Optional

 from comfy.comfy_types.node_typing import IO, ComfyNodeABC
 from comfy_api.input_impl.video_types import VideoFromFile
 from comfy_api_nodes.apis import (
-    Veo2GenVidRequest,
-    Veo2GenVidResponse,
-    Veo2GenVidPollRequest,
-    Veo2GenVidPollResponse
+    VeoGenVidRequest,
+    VeoGenVidResponse,
+    VeoGenVidPollRequest,
+    VeoGenVidPollResponse
 )
 from comfy_api_nodes.apis.client import (
    ApiEndpoint,
@@ -35,7 +35,7 @@ def convert_image_to_base64(image: torch.Tensor):
    return tensor_to_base64_string(scaled_image)


-def get_video_url_from_response(poll_response: Veo2GenVidPollResponse) -> Optional[str]:
+def get_video_url_from_response(poll_response: VeoGenVidPollResponse) -> Optional[str]:
    if (
        poll_response.response
        and hasattr(poll_response.response, "videos")
@@ -130,6 +130,14 @@ class VeoVideoGenerationNode(ComfyNodeABC):
                    "default": None,
                    "tooltip": "Optional reference image to guide video generation",
                }),
+                "model": (
+                    IO.COMBO,
+                    {
+                        "options": ["veo-2.0-generate-001"],
+                        "default": "veo-2.0-generate-001",
+                        "tooltip": "Veo 2 model to use for video generation",
+                    },
+                ),
            },
            "hidden": {
                "auth_token": "AUTH_TOKEN_COMFY_ORG",
@@ -141,10 +149,10 @@ class VeoVideoGenerationNode(ComfyNodeABC):
    RETURN_TYPES = (IO.VIDEO,)
    FUNCTION = "generate_video"
    CATEGORY = "api node/video/Veo"
-    DESCRIPTION = "Generates videos from text prompts using Google's Veo API"
+    DESCRIPTION = "Generates videos from text prompts using Google's Veo 2 API"
    API_NODE = True

-    def generate_video(
+    async def generate_video(
        self,
        prompt,
        aspect_ratio="16:9",
@@ -154,6 +162,8 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        person_generation="ALLOW",
        seed=0,
        image=None,
+        model="veo-2.0-generate-001",
+        generate_audio=False,
        unique_id: Optional[str] = None,
        **kwargs,
    ):
@@ -188,23 +198,26 @@ class VeoVideoGenerationNode(ComfyNodeABC):
            parameters["negativePrompt"] = negative_prompt
        if seed > 0:
            parameters["seed"] = seed
+        # Only add generateAudio for Veo 3 models
+        if "veo-3.0" in model:
+            parameters["generateAudio"] = generate_audio

        # Initial request to start video generation
        initial_operation = SynchronousOperation(
            endpoint=ApiEndpoint(
-                path="/proxy/veo/generate",
+                path=f"/proxy/veo/{model}/generate",
                method=HttpMethod.POST,
-                request_model=Veo2GenVidRequest,
-                response_model=Veo2GenVidResponse
+                request_model=VeoGenVidRequest,
+                response_model=VeoGenVidResponse
            ),
-            request=Veo2GenVidRequest(
+            request=VeoGenVidRequest(
                instances=instances,
                parameters=parameters
            ),
            auth_kwargs=kwargs,
        )

-        initial_response = initial_operation.execute()
+        initial_response = await initial_operation.execute()
        operation_name = initial_response.name

        logging.info(f"Veo generation started with operation name: {operation_name}")
@@ -223,16 +236,16 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        # Define the polling operation
        poll_operation = PollingOperation(
            poll_endpoint=ApiEndpoint(
-                path="/proxy/veo/poll",
+                path=f"/proxy/veo/{model}/poll",
                method=HttpMethod.POST,
-                request_model=Veo2GenVidPollRequest,
-                response_model=Veo2GenVidPollResponse
+                request_model=VeoGenVidPollRequest,
+                response_model=VeoGenVidPollResponse
            ),
            completed_statuses=["completed"],
            failed_statuses=[],  # No failed statuses, we'll handle errors after polling
            status_extractor=status_extractor,
            progress_extractor=progress_extractor,
-            request=Veo2GenVidPollRequest(
+            request=VeoGenVidPollRequest(
                operationName=operation_name
            ),
            auth_kwargs=kwargs,
@@ -243,7 +256,7 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        )

        # Execute the polling operation
-        poll_response = poll_operation.execute()
+        poll_response = await poll_operation.execute()

        # Now check for errors in the final response
        # Check for error in poll response
@@ -268,7 +281,6 @@ class VeoVideoGenerationNode(ComfyNodeABC):
            raise Exception(error_message)

        # Extract video data
-        video_data = None
        if poll_response.response and hasattr(poll_response.response, 'videos') and poll_response.response.videos and len(poll_response.response.videos) > 0:
            video = poll_response.response.videos[0]

@@ -278,9 +290,9 @@ class VeoVideoGenerationNode(ComfyNodeABC):
                video_data = base64.b64decode(video.bytesBase64Encoded)
            elif hasattr(video, 'gcsUri') and video.gcsUri:
                # Download from URL
-                video_url = video.gcsUri
-                video_response = requests.get(video_url)
-                video_data = video_response.content
+                async with aiohttp.ClientSession() as session:
+                    async with session.get(video.gcsUri) as video_response:
+                        video_data = await video_response.content.read()
            else:
                raise Exception("Video returned but no data or URL was provided")
        else:
@@ -298,11 +310,64 @@ class VeoVideoGenerationNode(ComfyNodeABC):
        return (VideoFromFile(video_io),)


-# Register the node
+class Veo3VideoGenerationNode(VeoVideoGenerationNode):
+    """
+    Generates videos from text prompts using Google's Veo 3 API.
+
+    Supported models:
+    - veo-3.0-generate-001
+    - veo-3.0-fast-generate-001
+
+    This node extends the base Veo node with Veo 3 specific features including
+    audio generation and fixed 8-second duration.
+    """
+
+    @classmethod
+    def INPUT_TYPES(s):
+        parent_input = super().INPUT_TYPES()
+
+        # Update model options for Veo 3
+        parent_input["optional"]["model"] = (
+            IO.COMBO,
+            {
+                "options": ["veo-3.0-generate-001", "veo-3.0-fast-generate-001"],
+                "default": "veo-3.0-generate-001",
+                "tooltip": "Veo 3 model to use for video generation",
+            },
+        )
+
+        # Add generateAudio parameter
+        parent_input["optional"]["generate_audio"] = (
+            IO.BOOLEAN,
+            {
+                "default": False,
+                "tooltip": "Generate audio for the video. Supported by all Veo 3 models.",
+            }
+        )
+
+        # Update duration constraints for Veo 3 (only 8 seconds supported)
+        parent_input["optional"]["duration_seconds"] = (
+            IO.INT,
+            {
+                "default": 8,
+                "min": 8,
+                "max": 8,
+                "step": 1,
+                "display": "number",
+                "tooltip": "Duration of the output video in seconds (Veo 3 only supports 8 seconds)",
+            },
+        )
+
+        return parent_input
+
+
+# Register the nodes
 NODE_CLASS_MAPPINGS = {
    "VeoVideoGenerationNode": VeoVideoGenerationNode,
+    "Veo3VideoGenerationNode": Veo3VideoGenerationNode,
 }

 NODE_DISPLAY_NAME_MAPPINGS = {
-    "VeoVideoGenerationNode": "Google Veo2 Video Generation",
+    "VeoVideoGenerationNode": "Google Veo 2 Video Generation",
+    "Veo3VideoGenerationNode": "Google Veo 3 Video Generation",
 }
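Note on Veo3VideoGenerationNode above: the subclass reuses the parent's input schema by calling super().INPUT_TYPES() and overwriting individual entries. A minimal sketch of that pattern follows. BaseNode, DerivedNode and every option value are hypothetical, and the sketch assumes the parent builds a new dict on each call, so mutating the result does not leak back into the parent.

```python
class BaseNode:
    @classmethod
    def INPUT_TYPES(cls):
        # Built fresh on every call, so callers may mutate the result safely.
        return {
            "required": {"prompt": ("STRING", {"default": ""})},
            "optional": {
                "duration_seconds": ("INT", {"default": 5, "min": 5, "max": 8}),
                "model": ("COMBO", {"options": ["base-model"], "default": "base-model"}),
            },
        }


class DerivedNode(BaseNode):
    @classmethod
    def INPUT_TYPES(cls):
        inputs = super().INPUT_TYPES()
        # Replace the model choices with the subclass-specific ones.
        inputs["optional"]["model"] = (
            "COMBO",
            {"options": ["new-model", "new-model-fast"], "default": "new-model"},
        )
        # Add an input that only the subclass exposes.
        inputs["optional"]["generate_audio"] = ("BOOLEAN", {"default": False})
        # Pin the duration to a single allowed value.
        inputs["optional"]["duration_seconds"] = ("INT", {"default": 8, "min": 8, "max": 8})
        return inputs


derived = DerivedNode.INPUT_TYPES()
base = BaseNode.INPUT_TYPES()
print(sorted(derived["optional"]))
print(sorted(base["optional"]))
```

If the parent returned a shared module-level dict instead, the subclass would need to copy it (for example with copy.deepcopy) before editing.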
--- a/comfy_execution/graph.py
+++ b/comfy_execution/graph.py
@@ -4,9 +4,12 @@ from typing import Type, Literal
 import nodes
 import asyncio
 import inspect
-from comfy_execution.graph_utils import is_link
+from comfy_execution.graph_utils import is_link, ExecutionBlocker
 from comfy.comfy_types.node_typing import ComfyNodeABC, InputTypeDict, InputTypeOptions

+# NOTE: ExecutionBlocker code got moved to graph_utils.py to prevent torch being imported too soon during unit tests
+ExecutionBlocker = ExecutionBlocker
+
 class DependencyCycleError(Exception):
    pass

@@ -294,21 +297,3 @@ class ExecutionList(TopologicalSort):
                del blocked_by[node_id]
            to_remove = [node_id for node_id in blocked_by if len(blocked_by[node_id]) == 0]
        return list(blocked_by.keys())
-
-class ExecutionBlocker:
-    """
-    Return this from a node and any users will be blocked with the given error message.
-    If the message is None, execution will be blocked silently instead.
-    Generally, you should avoid using this functionality unless absolutely necessary. Whenever it's
-    possible, a lazy input will be more efficient and have a better user experience.
-    This functionality is useful in two cases:
-    1. You want to conditionally prevent an output node from executing. (Particularly a built-in node
-       like SaveImage. For your own output nodes, I would recommend just adding a BOOL input and using
-       lazy evaluation to let it conditionally disable itself.)
-    2. You have a node with multiple possible outputs, some of which are invalid and should not be used.
-       (I would recommend not making nodes like this in the future -- instead, make multiple nodes with
-       different outputs. Unfortunately, there are several popular existing nodes using this pattern.)
-    """
-    def __init__(self, message):
-        self.message = message
-
--- a/comfy_execution/graph_utils.py
+++ b/comfy_execution/graph_utils.py
@@ -137,3 +137,19 @@ def add_graph_prefix(graph, outputs, prefix):

    return new_graph, tuple(new_outputs)

+class ExecutionBlocker:
+    """
+    Return this from a node and any users will be blocked with the given error message.
+    If the message is None, execution will be blocked silently instead.
+    Generally, you should avoid using this functionality unless absolutely necessary. Whenever it's
+    possible, a lazy input will be more efficient and have a better user experience.
+    This functionality is useful in two cases:
+    1. You want to conditionally prevent an output node from executing. (Particularly a built-in node
+       like SaveImage. For your own output nodes, I would recommend just adding a BOOL input and using
+       lazy evaluation to let it conditionally disable itself.)
+    2. You have a node with multiple possible outputs, some of which are invalid and should not be used.
+       (I would recommend not making nodes like this in the future -- instead, make multiple nodes with
+       different outputs. Unfortunately, there are several popular existing nodes using this pattern.)
+    """
+    def __init__(self, message):
+        self.message = message
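Note on the ExecutionBlocker docstring above: it describes returning a blocker from a node so that downstream users do not run. A minimal sketch of case 1 (conditionally suppressing an output) follows. The ExecutionBlocker class is a local copy so the example runs standalone, and ConditionalPassThrough is a hypothetical node, not one defined in this repository.

```python
class ExecutionBlocker:
    """Local copy of the class shown in the diff, so the sketch is self-contained."""

    def __init__(self, message):
        self.message = message


class ConditionalPassThrough:
    """Hypothetical node: forwards its input when enabled, otherwise blocks users."""

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "run"

    def run(self, value, enabled):
        if enabled:
            return (value,)
        # message=None requests a silent block, per the docstring.
        return (ExecutionBlocker(None),)


node = ConditionalPassThrough()
passed = node.run("image-data", True)
blocked = node.run("image-data", False)
print(passed, type(blocked[0]).__name__, blocked[0].message)
```

As the docstring itself recommends, a lazy input is usually the better choice for nodes you control; this pattern is mainly for gating nodes you cannot modify.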
--- a/comfy_execution/progress.py
+++ b/comfy_execution/progress.py
@@ -1,4 +1,6 @@
-from typing import TypedDict, Dict, Optional
+from __future__ import annotations
+
+from typing import TypedDict, Dict, Optional, Tuple
 from typing_extensions import override
 from PIL import Image
 from enum import Enum
@@ -10,6 +12,7 @@ if TYPE_CHECKING:
 from protocol import BinaryEventTypes
 from comfy_api import feature_flags

+PreviewImageTuple = Tuple[str, Image.Image, Optional[int]]

 class NodeState(Enum):
    Pending = "pending"
@@ -52,7 +55,7 @@ class ProgressHandler(ABC):
        max_value: float,
        state: NodeProgressState,
        prompt_id: str,
-        image: Optional[Image.Image] = None,
+        image: PreviewImageTuple | None = None,
    ):
        """Called when a node's progress is updated"""
        pass
@@ -103,7 +106,7 @@ class CLIProgressHandler(ProgressHandler):
        max_value: float,
        state: NodeProgressState,
        prompt_id: str,
-        image: Optional[Image.Image] = None,
+        image: PreviewImageTuple | None = None,
    ):
        # Handle case where start_handler wasn't called
        if node_id not in self.progress_bars:
@@ -196,7 +199,7 @@ class WebUIProgressHandler(ProgressHandler):
        max_value: float,
        state: NodeProgressState,
        prompt_id: str,
-        image: Optional[Image.Image] = None,
+        image: PreviewImageTuple | None = None,
    ):
        # Send progress state of all nodes
        if self.registry:
@@ -231,7 +234,6 @@ class WebUIProgressHandler(ProgressHandler):
        if self.registry:
            self._send_progress_state(prompt_id, self.registry.nodes)

-
 class ProgressRegistry:
    """
    Registry that maintains node progress state and notifies registered handlers.
@@ -285,7 +287,7 @@ class ProgressRegistry:
                handler.start_handler(node_id, entry, self.prompt_id)

    def update_progress(
-        self, node_id: str, value: float, max_value: float, image: Optional[Image.Image]
+        self, node_id: str, value: float, max_value: float, image: PreviewImageTuple | None = None
    ) -> None:
        """Update progress for a node"""
        entry = self.ensure_entry(node_id)
@@ -317,7 +319,7 @@ class ProgressRegistry:
            handler.reset()

 # Global registry instance
-global_progress_registry: ProgressRegistry = None
+global_progress_registry: ProgressRegistry | None = None

 def reset_progress_state(prompt_id: str, dynprompt: "DynamicPrompt") -> None:
    global global_progress_registry
--- a/comfy_extras/nodes_audio.py
+++ b/comfy_extras/nodes_audio.py
@@ -278,6 +278,42 @@ class PreviewAudio(SaveAudio):
                "hidden": {"prompt": "PROMPT", "extra_pnginfo": "EXTRA_PNGINFO"},
                }

+def f32_pcm(wav: torch.Tensor) -> torch.Tensor:
+    """Convert audio to float 32 bits PCM format."""
+    if wav.dtype.is_floating_point:
+        return wav
+    elif wav.dtype == torch.int16:
+        return wav.float() / (2 ** 15)
+    elif wav.dtype == torch.int32:
+        return wav.float() / (2 ** 31)
+    raise ValueError(f"Unsupported wav dtype: {wav.dtype}")
+
+def load(filepath: str) -> tuple[torch.Tensor, int]:
+    with av.open(filepath) as af:
+        if not af.streams.audio:
+            raise ValueError("No audio stream found in the file.")
+
+        stream = af.streams.audio[0]
+        sr = stream.codec_context.sample_rate
+        n_channels = stream.channels
+
+        frames = []
+        length = 0
+        for frame in af.decode(streams=stream.index):
+            buf = torch.from_numpy(frame.to_ndarray())
+            if buf.shape[0] != n_channels:
+                buf = buf.view(-1, n_channels).t()
+
+            frames.append(buf)
+            length += buf.shape[1]
+
+        if not frames:
+            raise ValueError("No audio frames decoded.")
+
+        wav = torch.cat(frames, dim=1)
+        wav = f32_pcm(wav)
+        return wav, sr
+
 class LoadAudio:
    @classmethod
    def INPUT_TYPES(s):
@@ -292,7 +328,7 @@ class LoadAudio:

    def load(self, audio):
        audio_path = folder_paths.get_annotated_filepath(audio)
-        waveform, sample_rate = torchaudio.load(audio_path)
+        waveform, sample_rate = load(audio_path)
        audio = {"waveform": waveform.unsqueeze(0), "sample_rate": sample_rate}
        return (audio, )

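Note on f32_pcm and load above: integer PCM is scaled by 2 ** (bits - 1) to land in [-1.0, 1.0), and a frame whose first dimension does not match the channel count is reshaped from interleaved to per-channel layout. The arithmetic can be sketched in plain Python without torch or av. Both helper names are hypothetical, and the sketch assumes the mismatching frames are in packed (interleaved) order.

```python
def int_pcm_to_float(samples, bits):
    """Scale signed integer PCM samples into the range [-1.0, 1.0)."""
    if bits not in (16, 32):
        raise ValueError(f"Unsupported bit depth: {bits}")
    scale = float(2 ** (bits - 1))
    return [s / scale for s in samples]


def deinterleave(samples, n_channels):
    """Split [L0, R0, L1, R1, ...] into one list per channel."""
    return [samples[c::n_channels] for c in range(n_channels)]


floats = int_pcm_to_float([-32768, 0, 16384], 16)
planar = deinterleave([1, 2, 3, 4, 5, 6], 2)
print(floats)
print(planar)
```

The deinterleave helper mirrors what view(-1, n_channels).t() does to a packed buffer: rows become samples, columns become channels, and the transpose puts channels first.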
--- a/comfy_extras/nodes_model_merging_model_specific.py
+++ b/comfy_extras/nodes_model_merging_model_specific.py
@@ -314,6 +314,29 @@ class ModelMergeCosmosPredict2_14B(comfy_extras.nodes_model_merging.ModelMergeBl

        return {"required": arg_dict}

+class ModelMergeQwenImage(comfy_extras.nodes_model_merging.ModelMergeBlocks):
+    CATEGORY = "advanced/model_merging/model_specific"
+
+    @classmethod
+    def INPUT_TYPES(s):
+        arg_dict = { "model1": ("MODEL",),
+                              "model2": ("MODEL",)}
+
+        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+
+        arg_dict["pos_embeds."] = argument
+        arg_dict["img_in."] = argument
+        arg_dict["txt_norm."] = argument
+        arg_dict["txt_in."] = argument
+        arg_dict["time_text_embed."] = argument
+
+        for i in range(60):
+            arg_dict["transformer_blocks.{}.".format(i)] = argument
+
+        arg_dict["proj_out."] = argument
+
+        return {"required": arg_dict}
+
 NODE_CLASS_MAPPINGS = {
    "ModelMergeSD1": ModelMergeSD1,
    "ModelMergeSD2": ModelMergeSD1, #SD1 and SD2 have the same blocks
@@ -329,4 +352,5 @@ NODE_CLASS_MAPPINGS = {
    "ModelMergeWAN2_1": ModelMergeWAN2_1,
    "ModelMergeCosmosPredict2_2B": ModelMergeCosmosPredict2_2B,
    "ModelMergeCosmosPredict2_14B": ModelMergeCosmosPredict2_14B,
+    "ModelMergeQwenImage": ModelMergeQwenImage,
 }
--- a/comfy_extras/nodes_train.py
+++ b/comfy_extras/nodes_train.py
@@ -20,7 +20,7 @@ import folder_paths
 import node_helpers
 from comfy.cli_args import args
 from comfy.comfy_types.node_typing import IO
-from comfy.weight_adapter import adapters
+from comfy.weight_adapter import adapters, adapter_maps


 def make_batch_extra_option_dict(d, indicies, full_size=None):
@@ -39,13 +39,13 @@ def make_batch_extra_option_dict(d, indicies, full_size=None):


 class TrainSampler(comfy.samplers.Sampler):
-
-    def __init__(self, loss_fn, optimizer, loss_callback=None, batch_size=1, total_steps=1, seed=0, training_dtype=torch.bfloat16):
+    def __init__(self, loss_fn, optimizer, loss_callback=None, batch_size=1, grad_acc=1, total_steps=1, seed=0, training_dtype=torch.bfloat16):
        self.loss_fn = loss_fn
        self.optimizer = optimizer
        self.loss_callback = loss_callback
        self.batch_size = batch_size
        self.total_steps = total_steps
+        self.grad_acc = grad_acc
        self.seed = seed
        self.training_dtype = training_dtype

@@ -92,8 +92,9 @@ class TrainSampler(comfy.samplers.Sampler):
                self.loss_callback(loss.item())
            pbar.set_postfix({"loss": f"{loss.item():.4f}"})

-            self.optimizer.step()
-            self.optimizer.zero_grad()
+            if (i+1) % self.grad_acc == 0:
+                self.optimizer.step()
+                self.optimizer.zero_grad()
        torch.cuda.empty_cache()
        return torch.zeros_like(latent_image)

@@ -419,6 +420,16 @@ class TrainLoraNode:
                        "tooltip": "The batch size to use for training.",
                    },
                ),
+                "grad_accumulation_steps": (
+                    IO.INT,
+                    {
+                        "default": 1,
+                        "min": 1,
+                        "max": 1024,
+                        "step": 1,
+                        "tooltip": "The number of gradient accumulation steps to use for training.",
+                    }
+                ),
                "steps": (
                    IO.INT,
                    {
@@ -478,6 +489,17 @@ class TrainLoraNode:
                    ["bf16", "fp32"],
                    {"default": "bf16", "tooltip": "The dtype to use for lora."},
                ),
+                "algorithm": (
+                    list(adapter_maps.keys()),
+                    {"default": list(adapter_maps.keys())[0], "tooltip": "The algorithm to use for training."},
+                ),
+                "gradient_checkpointing": (
+                    IO.BOOLEAN,
+                    {
+                        "default": True,
+                        "tooltip": "Use gradient checkpointing for training.",
+                    }
+                ),
                "existing_lora": (
                    folder_paths.get_filename_list("loras") + ["[None]"],
                    {
@@ -501,6 +523,7 @@ class TrainLoraNode:
        positive,
        batch_size,
        steps,
+        grad_accumulation_steps,
        learning_rate,
        rank,
        optimizer,
@@ -508,6 +531,8 @@ class TrainLoraNode:
        seed,
        training_dtype,
        lora_dtype,
+        algorithm,
+        gradient_checkpointing,
        existing_lora,
    ):
        mp = model.clone()
@@ -558,10 +583,8 @@ class TrainLoraNode:
                                if existing_adapter is not None:
                                    break
                            else:
-                                # If no existing adapter found, use LoRA
-                                # We will add algo option in the future
                                existing_adapter = None
-                                adapter_cls = adapters[0]
+                                adapter_cls = adapter_maps[algorithm]

                            if existing_adapter is not None:
                                train_adapter = existing_adapter.to_train().to(lora_dtype)
@@ -615,8 +638,9 @@ class TrainLoraNode:
                criterion = torch.nn.SmoothL1Loss()

            # setup models
-            for m in find_all_highest_child_module_with_forward(mp.model.diffusion_model):
-                patch(m)
+            if gradient_checkpointing:
+                for m in find_all_highest_child_module_with_forward(mp.model.diffusion_model):
+                    patch(m)
            mp.model.requires_grad_(False)
            comfy.model_management.load_models_gpu([mp], memory_required=1e20, force_full_load=True)

@@ -629,7 +653,8 @@ class TrainLoraNode:
                optimizer,
                loss_callback=loss_callback,
                batch_size=batch_size,
-                total_steps=steps,
+                grad_acc=grad_accumulation_steps,
+                total_steps=steps*grad_accumulation_steps,
                seed=seed,
                training_dtype=dtype
            )
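Note on the gradient accumulation hunks above: the sampler now runs steps * grad_accumulation_steps iterations and calls the optimizer only when (i+1) is a multiple of grad_acc. A small counting sketch shows the effect. It assumes the loop index i runs from 0 to total_steps - 1, which is not visible in these hunks, and count_optimizer_steps is a hypothetical helper with no real optimizer involved.

```python
def count_optimizer_steps(steps, grad_accumulation_steps):
    """Return (optimizer updates performed, micro-batches left unapplied)."""
    total_micro_steps = steps * grad_accumulation_steps
    optimizer_steps = 0
    pending = 0  # micro-batches accumulated since the last update
    for i in range(total_micro_steps):
        pending += 1  # stands in for loss.backward() adding to the gradients
        if (i + 1) % grad_accumulation_steps == 0:
            optimizer_steps += 1  # stands in for optimizer.step()
            pending = 0           # stands in for optimizer.zero_grad()
    return optimizer_steps, pending


updates, leftover = count_optimizer_steps(steps=10, grad_accumulation_steps=4)
print(updates, leftover)
```

Multiplying total_steps by the accumulation factor keeps the number of optimizer updates equal to the user-facing steps value, and because the total is always a multiple of the factor, no accumulated gradients are left unapplied at the end.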
--- a/comfy_extras/nodes_video.py
+++ b/comfy_extras/nodes_video.py
@@ -8,9 +8,7 @@ import json
 from typing import Optional, Literal
 from fractions import Fraction
 from comfy.comfy_types import IO, FileLocator, ComfyNodeABC
-from comfy_api.input import ImageInput, AudioInput, VideoInput
-from comfy_api.util import VideoContainer, VideoCodec, VideoComponents
-from comfy_api.input_impl import VideoFromFile, VideoFromComponents
+from comfy_api.latest import Input, InputImpl, Types
 from comfy.cli_args import args

 class SaveWEBM:
@@ -91,8 +89,8 @@ class SaveVideo(ComfyNodeABC):
            "required": {
                "video": (IO.VIDEO, {"tooltip": "The video to save."}),
                "filename_prefix": ("STRING", {"default": "video/ComfyUI", "tooltip": "The prefix for the file to save. This may include formatting information such as %date:yyyy-MM-dd% or %Empty Latent Image.width% to include values from nodes."}),
-                "format": (VideoContainer.as_input(), {"default": "auto", "tooltip": "The format to save the video as."}),
-                "codec": (VideoCodec.as_input(), {"default": "auto", "tooltip": "The codec to use for the video."}),
+                "format": (Types.VideoContainer.as_input(), {"default": "auto", "tooltip": "The format to save the video as."}),
+                "codec": (Types.VideoCodec.as_input(), {"default": "auto", "tooltip": "The codec to use for the video."}),
            },
            "hidden": {
                "prompt": "PROMPT",
@@ -108,7 +106,7 @@ class SaveVideo(ComfyNodeABC):
    CATEGORY = "image/video"
    DESCRIPTION = "Saves the input images to your ComfyUI output directory."

-    def save_video(self, video: VideoInput, filename_prefix, format, codec, prompt=None, extra_pnginfo=None):
+    def save_video(self, video: Input.Video, filename_prefix, format, codec, prompt=None, extra_pnginfo=None):
        filename_prefix += self.prefix_append
        width, height = video.get_dimensions()
        full_output_folder, filename, counter, subfolder, filename_prefix = folder_paths.get_save_image_path(
@@ -127,7 +125,7 @@ class SaveVideo(ComfyNodeABC):
                metadata["prompt"] = prompt
            if len(metadata) > 0:
                saved_metadata = metadata
-        file = f"{filename}_{counter:05}_.{VideoContainer.get_extension(format)}"
+        file = f"{filename}_{counter:05}_.{Types.VideoContainer.get_extension(format)}"
        video.save_to(
            os.path.join(full_output_folder, file),
            format=format,
@@ -163,9 +161,9 @@ class CreateVideo(ComfyNodeABC):
    CATEGORY = "image/video"
    DESCRIPTION = "Create a video from images."

-    def create_video(self, images: ImageInput, fps: float, audio: Optional[AudioInput] = None):
-        return (VideoFromComponents(
-            VideoComponents(
+    def create_video(self, images: Input.Image, fps: float, audio: Optional[Input.Audio] = None):
+        return (InputImpl.VideoFromComponents(
+            Types.VideoComponents(
            images=images,
            audio=audio,
            frame_rate=Fraction(fps),
@@ -187,7 +185,7 @@ class GetVideoComponents(ComfyNodeABC):
    CATEGORY = "image/video"
    DESCRIPTION = "Extracts all components from a video: frames, audio, and framerate."

-    def get_components(self, video: VideoInput):
+    def get_components(self, video: Input.Video):
        components = video.get_components()

        return (components.images, components.audio, float(components.frame_rate))
@@ -208,7 +206,7 @@ class LoadVideo(ComfyNodeABC):
    FUNCTION = "load_video"
    def load_video(self, file):
        video_path = folder_paths.get_annotated_filepath(file)
-        return (VideoFromFile(video_path),)
+        return (InputImpl.VideoFromFile(video_path),)

    @classmethod
    def IS_CHANGED(cls, file):
@@ -239,3 +237,4 @@ NODE_DISPLAY_NAME_MAPPINGS = {
    "GetVideoComponents": "Get Video Components",
    "LoadVideo": "Load Video",
 }
+
--- a/comfy_extras/nodes_wan.py
+++ b/comfy_extras/nodes_wan.py
@@ -1,3 +1,4 @@
+import math
 import nodes
 import node_helpers
 import torch
@@ -5,7 +6,9 @@ import comfy.model_management
 import comfy.utils
 import comfy.latent_formats
 import comfy.clip_vision
-
+import json
+import numpy as np
+from typing import Tuple

 class WanImageToVideo:
    @classmethod
@@ -146,6 +149,7 @@ class WanFirstLastFrameToVideo:
        positive = node_helpers.conditioning_set_values(positive, {"concat_latent_image": concat_latent_image, "concat_mask": mask})
        negative = node_helpers.conditioning_set_values(negative, {"concat_latent_image": concat_latent_image, "concat_mask": mask})

+        clip_vision_output = None
        if clip_vision_start_image is not None:
            clip_vision_output = clip_vision_start_image

@@ -383,7 +387,350 @@ class WanPhantomSubjectToVideo:
        out_latent["samples"] = latent
        return (positive, cond2, negative, out_latent)

+def parse_json_tracks(tracks):
+    """Parse JSON track data into a standardized format"""
+    tracks_data = []
+    try:
+        # If tracks is a string, try to parse it as JSON
+        if isinstance(tracks, str):
+            parsed = json.loads(tracks.replace("'", '"'))
+            tracks_data.extend(parsed)
+        else:
+            # If tracks is a list of strings, parse each one
+            for track_str in tracks:
+                parsed = json.loads(track_str.replace("'", '"'))
+                tracks_data.append(parsed)
+
+        # Check if we have a single track (dict with x,y) or a list of tracks
+        if tracks_data and isinstance(tracks_data[0], dict) and 'x' in tracks_data[0]:
+            # Single track detected, wrap it in a list
+            tracks_data = [tracks_data]
+        elif tracks_data and isinstance(tracks_data[0], list) and tracks_data[0] and isinstance(tracks_data[0][0], dict) and 'x' in tracks_data[0][0]:
+            # Already a list of tracks, nothing to do
+            pass
+        else:
+            # Unexpected format
+            pass
+
+    except json.JSONDecodeError:
+        tracks_data = []
+    return tracks_data
+
+def process_tracks(tracks_np: np.ndarray, frame_size: Tuple[int, int], num_frames, quant_multi: int = 8, **kwargs):
+    # tracks: shape [t, h, w, 3] => samples align with 24 fps, model trained with 16 fps.
+    # frame_size: tuple (W, H)
+    tracks = torch.from_numpy(tracks_np).float()
+
+    if tracks.shape[1] == 121:
+        tracks = torch.permute(tracks, (1, 0, 2, 3))
+
+    tracks, visibles = tracks[..., :2], tracks[..., 2:3]
+
+    short_edge = min(*frame_size)
+
+    frame_center = torch.tensor([*frame_size]).type_as(tracks) / 2
+    tracks = tracks - frame_center
+
+    tracks = tracks / short_edge * 2
+
+    visibles = visibles * 2 - 1
+
+    trange = torch.linspace(-1, 1, tracks.shape[0]).view(-1, 1, 1, 1).expand(*visibles.shape)
+
+    out_ = torch.cat([trange, tracks, visibles], dim=-1).view(121, -1, 4)
+
+    out_0 = out_[:1]
+
+    out_l = out_[1:] # 121 => 120 | 1
+    a = 120 // math.gcd(120, num_frames)
+    b = num_frames // math.gcd(120, num_frames)
+    out_l = torch.repeat_interleave(out_l, b, dim=0)[1::a]  # 120 => 120 * b => 120 * b / a == F
+
+    final_result = torch.cat([out_0, out_l], dim=0)
+
+    return final_result
+
+FIXED_LENGTH = 121
+def pad_pts(tr):
+    """Convert list of {x,y} to (FIXED_LENGTH,1,3) array, padding/truncating."""
+    pts = np.array([[p['x'], p['y'], 1] for p in tr], dtype=np.float32)
+    n = pts.shape[0]
+    if n < FIXED_LENGTH:
+        pad = np.zeros((FIXED_LENGTH - n, 3), dtype=np.float32)
+        pts = np.vstack((pts, pad))
+    else:
+        pts = pts[:FIXED_LENGTH]
+    return pts.reshape(FIXED_LENGTH, 1, 3)
+
+def ind_sel(target: torch.Tensor, ind: torch.Tensor, dim: int = 1):
+    """Index selection utility function"""
+    assert (
+        len(ind.shape) > dim
+    ), "Index must have the target dim, but got dim: %d, ind shape: %s" % (dim, str(ind.shape))
+
+    target = target.expand(
+        *tuple(
+            [ind.shape[k] if target.shape[k] == 1 else -1 for k in range(dim)]
+            + [
+                -1,
+            ]
+            * (len(target.shape) - dim)
+        )
+    )
+
+    ind_pad = ind
+
+    if len(target.shape) > dim + 1:
+        for _ in range(len(target.shape) - (dim + 1)):
+            ind_pad = ind_pad.unsqueeze(-1)
+        ind_pad = ind_pad.expand(*(-1,) * (dim + 1), *target.shape[(dim + 1) : :])
+
+    return torch.gather(target, dim=dim, index=ind_pad)
+
+def merge_final(vert_attr: torch.Tensor, weight: torch.Tensor, vert_assign: torch.Tensor):
+    """Merge vertex attributes with weights"""
+    target_dim = len(vert_assign.shape) - 1
+    if len(vert_attr.shape) == 2:
+        assert vert_attr.shape[0] > vert_assign.max()
+        new_shape = [1] * target_dim + list(vert_attr.shape)
+        tensor = vert_attr.reshape(new_shape)
+        sel_attr = ind_sel(tensor, vert_assign.type(torch.long), dim=target_dim)
+    else:
+        assert vert_attr.shape[1] > vert_assign.max()
+        new_shape = [vert_attr.shape[0]] + [1] * (target_dim - 1) + list(vert_attr.shape[1:])
+        tensor = vert_attr.reshape(new_shape)
+        sel_attr = ind_sel(tensor, vert_assign.type(torch.long), dim=target_dim)
+
+    final_attr = torch.sum(sel_attr * weight.unsqueeze(-1), dim=-2)
+    return final_attr
+
+
+def _patch_motion_single(
+    tracks: torch.FloatTensor,  # (B, T, N, 4)
+    vid: torch.FloatTensor,     # (C, T, H, W)
+    temperature: float,
+    vae_divide: tuple,
+    topk: int,
+):
+    """Apply motion patching based on tracks"""
+    _, T, H, W = vid.shape
+    N = tracks.shape[2]
+    _, tracks_xy, visible = torch.split(
+        tracks, [1, 2, 1], dim=-1
+    )  # (B, T, N, 2) | (B, T, N, 1)
+    tracks_n = tracks_xy / torch.tensor([W / min(H, W), H / min(H, W)], device=tracks_xy.device)
+    tracks_n = tracks_n.clamp(-1, 1)
+    visible = visible.clamp(0, 1)
+
+    xx = torch.linspace(-W / min(H, W), W / min(H, W), W)
+    yy = torch.linspace(-H / min(H, W), H / min(H, W), H)
+
+    grid = torch.stack(torch.meshgrid(yy, xx, indexing="ij")[::-1], dim=-1).to(
+        tracks_xy.device
+    )
+
+    tracks_pad = tracks_xy[:, 1:]
+    visible_pad = visible[:, 1:]
+
+    visible_align = visible_pad.view(T - 1, 4, *visible_pad.shape[2:]).sum(1)
+    tracks_align = (tracks_pad * visible_pad).view(T - 1, 4, *tracks_pad.shape[2:]).sum(
+        1
+    ) / (visible_align + 1e-5)
+    dist_ = (
+        (tracks_align[:, None, None] - grid[None, :, :, None]).pow(2).sum(-1)
+    )  # T, H, W, N
+    weight = torch.exp(-dist_ * temperature) * visible_align.clamp(0, 1).view(
+        T - 1, 1, 1, N
+    )
+    vert_weight, vert_index = torch.topk(
+        weight, k=min(topk, weight.shape[-1]), dim=-1
+    )
+
+    grid_mode = "bilinear"
+    point_feature = torch.nn.functional.grid_sample(
+        vid.permute(1, 0, 2, 3)[:1],
+        tracks_n[:, :1].type(vid.dtype),
+        mode=grid_mode,
+        padding_mode="zeros",
+        align_corners=False,
+    )
+    point_feature = point_feature.squeeze(0).squeeze(1).permute(1, 0) # N, C=16
+
+    out_feature = merge_final(point_feature, vert_weight, vert_index).permute(3, 0, 1, 2) # T - 1, H, W, C => C, T - 1, H, W
+    out_weight = vert_weight.sum(-1) # T - 1, H, W
+
+    # out feature -> already soft weighted
+    mix_feature = out_feature + vid[:, 1:] * (1 - out_weight.clamp(0, 1))
+
+    out_feature_full = torch.cat([vid[:, :1], mix_feature], dim=1) # C, T, H, W
+    out_mask_full = torch.cat([torch.ones_like(out_weight[:1]), out_weight], dim=0)  # T, H, W
+
+    return out_mask_full[None].expand(vae_divide[0], -1, -1, -1), out_feature_full
+
+
+def patch_motion(
+    tracks: torch.FloatTensor,  # (B, TB, T, N, 4)
+    vid: torch.FloatTensor,     # (B, C, T, H, W)
+    temperature: float = 220.0,
+    vae_divide: tuple = (4, 16),
+    topk: int = 2,
+):
+    B = len(tracks)
+
+    # Process each batch separately
+    out_masks = []
+    out_features = []
+
+    for b in range(B):
+        mask, feature = _patch_motion_single(
+            tracks[b],  # (TB, T, N, 4)
+            vid[b],        # (C, T, H, W)
+            temperature,
+            vae_divide,
+            topk
+        )
+        out_masks.append(mask)
+        out_features.append(feature)
+
+    # Stack results: (B, C, T, H, W)
+    out_mask_full = torch.stack(out_masks, dim=0)
+    out_feature_full = torch.stack(out_features, dim=0)
+
+    return out_mask_full, out_feature_full
+
+class WanTrackToVideo:
+    @classmethod
+    def INPUT_TYPES(s):
+        return {"required": {
+                    "positive": ("CONDITIONING", ),
+                    "negative": ("CONDITIONING", ),
+                    "vae": ("VAE", ),
+                    "tracks": ("STRING", {"multiline": True, "default": "[]"}),
+                    "width": ("INT", {"default": 832, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
+                    "height": ("INT", {"default": 480, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 16}),
+                    "length": ("INT", {"default": 81, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
+                    "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
+                    "temperature": ("FLOAT", {"default": 220.0, "min": 1.0, "max": 1000.0, "step": 0.1}),
+                    "topk": ("INT", {"default": 2, "min": 1, "max": 10}),
+                    "start_image": ("IMAGE", ),
+                },
+                "optional": {
+                    "clip_vision_output": ("CLIP_VISION_OUTPUT", ),
+                }}
+
+    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
+    RETURN_NAMES = ("positive", "negative", "latent")
+    FUNCTION = "encode"
+
+    CATEGORY = "conditioning/video_models"
+
+    def encode(self, positive, negative, vae, tracks, width, height, length, batch_size,
+               temperature, topk, start_image=None, clip_vision_output=None):
+
+        tracks_data = parse_json_tracks(tracks)
+
+        if not tracks_data:
+            return WanImageToVideo().encode(positive, negative, vae, width, height, length, batch_size, start_image=start_image, clip_vision_output=clip_vision_output)
+
+        latent = torch.zeros([batch_size, 16, ((length - 1) // 4) + 1, height // 8, width // 8],
+                           device=comfy.model_management.intermediate_device())
+
+        if isinstance(tracks_data[0][0], dict):
+            tracks_data = [tracks_data]
+
+        processed_tracks = []
+        for batch in tracks_data:
+            arrs = []
+            for track in batch:
+                pts = pad_pts(track)
+                arrs.append(pts)
+
+            tracks_np = np.stack(arrs, axis=0)
+            processed_tracks.append(process_tracks(tracks_np, (width, height), length - 1).unsqueeze(0))
+
+        if start_image is not None:
+            start_image = comfy.utils.common_upscale(start_image[:batch_size].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
+            videos = torch.ones((start_image.shape[0], length, height, width, start_image.shape[-1]), device=start_image.device, dtype=start_image.dtype) * 0.5
+            for i in range(start_image.shape[0]):
+                videos[i, 0] = start_image[i]
+
+            latent_videos = []
+            videos = comfy.utils.resize_to_batch_size(videos, batch_size)
+            for i in range(batch_size):
+                latent_videos += [vae.encode(videos[i, :, :, :, :3])]
+            y = torch.cat(latent_videos, dim=0)
+
+            # Scale latent since patch_motion is non-linear
+            y = comfy.latent_formats.Wan21().process_in(y)
+
+            processed_tracks = comfy.utils.resize_list_to_batch_size(processed_tracks, batch_size)
+            res = patch_motion(
+                processed_tracks, y, temperature=temperature, topk=topk, vae_divide=(4, 16)
+            )
+
+            mask, concat_latent_image = res
+            concat_latent_image = comfy.latent_formats.Wan21().process_out(concat_latent_image)
+            mask = -mask + 1.0  # Invert mask to match expected format
+            positive = node_helpers.conditioning_set_values(positive,
+                                                            {"concat_mask": mask,
+                                                            "concat_latent_image": concat_latent_image})
+            negative = node_helpers.conditioning_set_values(negative,
+                                                            {"concat_mask": mask,
+                                                            "concat_latent_image": concat_latent_image})
+
+        if clip_vision_output is not None:
+            positive = node_helpers.conditioning_set_values(positive, {"clip_vision_output": clip_vision_output})
+            negative = node_helpers.conditioning_set_values(negative, {"clip_vision_output": clip_vision_output})
+
+        out_latent = {}
+        out_latent["samples"] = latent
+        return (positive, negative, out_latent)
+
+
+class Wan22ImageToVideoLatent:
+    @classmethod
+    def INPUT_TYPES(s):
+        return {"required": {"vae": ("VAE", ),
+                             "width": ("INT", {"default": 1280, "min": 32, "max": nodes.MAX_RESOLUTION, "step": 32}),
+                             "height": ("INT", {"default": 704, "min": 32, "max": nodes.MAX_RESOLUTION, "step": 32}),
+                             "length": ("INT", {"default": 49, "min": 1, "max": nodes.MAX_RESOLUTION, "step": 4}),
+                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 4096}),
+                },
+                "optional": {"start_image": ("IMAGE", ),
+                }}
+
+
+    RETURN_TYPES = ("LATENT",)
+    FUNCTION = "encode"
+
+    CATEGORY = "conditioning/inpaint"
+
+    def encode(self, vae, width, height, length, batch_size, start_image=None):
+        latent = torch.zeros([1, 48, ((length - 1) // 4) + 1, height // 16, width // 16], device=comfy.model_management.intermediate_device())
+
+        if start_image is None:
+            out_latent = {}
+            out_latent["samples"] = latent
+            return (out_latent,)
+
+        mask = torch.ones([latent.shape[0], 1, ((length - 1) // 4) + 1, latent.shape[-2], latent.shape[-1]], device=comfy.model_management.intermediate_device())
+
+        if start_image is not None:
+            start_image = comfy.utils.common_upscale(start_image[:length].movedim(-1, 1), width, height, "bilinear", "center").movedim(1, -1)
+            latent_temp = vae.encode(start_image)
+            latent[:, :, :latent_temp.shape[-3]] = latent_temp
+            mask[:, :, :latent_temp.shape[-3]] *= 0.0
+
+        out_latent = {}
+        latent_format = comfy.latent_formats.Wan22()
+        latent = latent_format.process_out(latent) * mask + latent * (1.0 - mask)
+        out_latent["samples"] = latent.repeat((batch_size, ) + (1,) * (latent.ndim - 1))
+        out_latent["noise_mask"] = mask.repeat((batch_size, ) + (1,) * (mask.ndim - 1))
+        return (out_latent,)
+
+
 NODE_CLASS_MAPPINGS = {
+    "WanTrackToVideo": WanTrackToVideo,
    "WanImageToVideo": WanImageToVideo,
    "WanFunControlToVideo": WanFunControlToVideo,
    "WanFunInpaintToVideo": WanFunInpaintToVideo,
@@ -392,4 +739,5 @@ NODE_CLASS_MAPPINGS = {
    "TrimVideoLatent": TrimVideoLatent,
    "WanCameraImageToVideo": WanCameraImageToVideo,
    "WanPhantomSubjectToVideo": WanPhantomSubjectToVideo,
+    "Wan22ImageToVideoLatent": Wan22ImageToVideoLatent,
 }
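
Review note on `process_tracks` in `comfy_extras/nodes_wan.py`: the comment `120 => 120 * b => 120 * b / a == F` describes resampling 120 track samples down to `num_frames` by repeating each sample `b` times and then keeping every `a`-th entry, starting at offset 1. The sketch below reproduces only that index arithmetic on plain Python lists so the resulting length is easy to check. `resample_indices` is a hypothetical helper written for illustration and is not part of the patch.

```python
import math


def resample_indices(num_src: int, num_frames: int) -> list[int]:
    """Return the source indices kept by 'repeat each b times, then take [1::a]'.

    Hypothetical helper: mirrors the gcd arithmetic in process_tracks,
    using list operations in place of torch.repeat_interleave.
    """
    g = math.gcd(num_src, num_frames)
    a = num_src // g
    b = num_frames // g
    # Equivalent of repeat_interleave(..., b, dim=0) on the index sequence.
    repeated = [i for i in range(num_src) for _ in range(b)]
    # Equivalent of the [1::a] slice applied afterwards.
    return repeated[1::a]


# length=81 in the node gives num_frames=80: a=3, b=2.
kept = resample_indices(120, 80)
print(len(kept), kept[:5])
```

In this sketch the slice yields exactly `num_frames` indices whenever `a > 1`. When `num_frames` is a multiple of 120, `a` is 1 and `[1::1]` drops one entry, so the list comes back one element short. Whether that case is reachable in the node depends on the `length` values used, so it may be worth a test.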
--- a/comfyui_version.py
+++ b/comfyui_version.py
@@ -1,3 +1,3 @@
 # This file is automatically generated by the build process when version is
 # updated in pyproject.toml.
-__version__ = "0.3.45"
+__version__ = "0.3.49"
--- a/cuda_malloc.py
+++ b/cuda_malloc.py
@@ -74,7 +74,8 @@ if not args.cuda_malloc:
                module = importlib.util.module_from_spec(spec)
                spec.loader.exec_module(module)
                version = module.__version__
-        if int(version[0]) >= 2: #enable by default for torch version 2.0 and up
+
+        if int(version[0]) >= 2 and "+cu" in version: # enable by default for torch 2.0 and up, but only on CUDA builds of torch
            args.cuda_malloc = cuda_malloc_supported()
    except:
        pass
--- a/execution.py
+++ b/execution.py
@@ -7,7 +7,7 @@ import threading
 import time
 import traceback
 from enum import Enum
-from typing import List, Literal, NamedTuple, Optional
+from typing import List, Literal, NamedTuple, Optional, Union
 import asyncio

 import torch
@@ -32,6 +32,8 @@ from comfy_execution.graph_utils import GraphBuilder, is_link
 from comfy_execution.validation import validate_node_input
 from comfy_execution.progress import get_progress_state, reset_progress_state, add_progress_handler, WebUIProgressHandler
 from comfy_execution.utils import CurrentNodeContext
+from comfy_api.internal import _ComfyNodeInternal, _NodeOutputInternal, first_real_override, is_class, make_locked_method_func
+from comfy_api.latest import io


 class ExecutionResult(Enum):
@@ -56,7 +58,15 @@ class IsChangedCache:
        node = self.dynprompt.get_node(node_id)
        class_type = node["class_type"]
        class_def = nodes.NODE_CLASS_MAPPINGS[class_type]
-        if not hasattr(class_def, "IS_CHANGED"):
+        has_is_changed = False
+        is_changed_name = None
+        if issubclass(class_def, _ComfyNodeInternal) and first_real_override(class_def, "fingerprint_inputs") is not None:
+            has_is_changed = True
+            is_changed_name = "fingerprint_inputs"
+        elif hasattr(class_def, "IS_CHANGED"):
+            has_is_changed = True
+            is_changed_name = "IS_CHANGED"
+        if not has_is_changed:
            self.is_changed[node_id] = False
            return self.is_changed[node_id]

@@ -65,9 +75,9 @@ class IsChangedCache:
            return self.is_changed[node_id]

        # Intentionally do not use cached outputs here. We only want constants in IS_CHANGED
-        input_data_all, _ = get_input_data(node["inputs"], class_def, node_id, None)
+        input_data_all, _, hidden_inputs = get_input_data(node["inputs"], class_def, node_id, None)
        try:
-            is_changed = await _async_map_node_over_list(self.prompt_id, node_id, class_def, input_data_all, "IS_CHANGED")
+            is_changed = await _async_map_node_over_list(self.prompt_id, node_id, class_def, input_data_all, is_changed_name)
            is_changed = await resolve_map_node_over_list_results(is_changed)
            node["is_changed"] = [None if isinstance(x, ExecutionBlocker) else x for x in is_changed]
        except Exception as e:
@@ -126,9 +136,14 @@ class CacheSet:
 SENSITIVE_EXTRA_DATA_KEYS = ("auth_token_comfy_org", "api_key_comfy_org")

 def get_input_data(inputs, class_def, unique_id, outputs=None, dynprompt=None, extra_data={}):
-    valid_inputs = class_def.INPUT_TYPES()
+    is_v3 = issubclass(class_def, _ComfyNodeInternal)
+    if is_v3:
+        valid_inputs, schema = class_def.INPUT_TYPES(include_hidden=False, return_schema=True)
+    else:
+        valid_inputs = class_def.INPUT_TYPES()
    input_data_all = {}
    missing_keys = {}
+    hidden_inputs_v3 = {}
    for x in inputs:
        input_data = inputs[x]
        _, input_category, input_info = get_input_info(class_def, x, valid_inputs)
@@ -153,22 +168,37 @@ def get_input_data(inputs, class_def, unique_id, outputs=None, dynprompt=None, e
        elif input_category is not None:
            input_data_all[x] = [input_data]

-    if "hidden" in valid_inputs:
-        h = valid_inputs["hidden"]
-        for x in h:
-            if h[x] == "PROMPT":
-                input_data_all[x] = [dynprompt.get_original_prompt() if dynprompt is not None else {}]
-            if h[x] == "DYNPROMPT":
-                input_data_all[x] = [dynprompt]
-            if h[x] == "EXTRA_PNGINFO":
-                input_data_all[x] = [extra_data.get('extra_pnginfo', None)]
-            if h[x] == "UNIQUE_ID":
-                input_data_all[x] = [unique_id]
-            if h[x] == "AUTH_TOKEN_COMFY_ORG":
-                input_data_all[x] = [extra_data.get("auth_token_comfy_org", None)]
-            if h[x] == "API_KEY_COMFY_ORG":
-                input_data_all[x] = [extra_data.get("api_key_comfy_org", None)]
-    return input_data_all, missing_keys
+    if is_v3:
+        if schema.hidden:
+            if io.Hidden.prompt in schema.hidden:
+                hidden_inputs_v3[io.Hidden.prompt] = dynprompt.get_original_prompt() if dynprompt is not None else {}
+            if io.Hidden.dynprompt in schema.hidden:
+                hidden_inputs_v3[io.Hidden.dynprompt] = dynprompt
+            if io.Hidden.extra_pnginfo in schema.hidden:
+                hidden_inputs_v3[io.Hidden.extra_pnginfo] = extra_data.get('extra_pnginfo', None)
+            if io.Hidden.unique_id in schema.hidden:
+                hidden_inputs_v3[io.Hidden.unique_id] = unique_id
+            if io.Hidden.auth_token_comfy_org in schema.hidden:
+                hidden_inputs_v3[io.Hidden.auth_token_comfy_org] = extra_data.get("auth_token_comfy_org", None)
+            if io.Hidden.api_key_comfy_org in schema.hidden:
+                hidden_inputs_v3[io.Hidden.api_key_comfy_org] = extra_data.get("api_key_comfy_org", None)
+    else:
+        if "hidden" in valid_inputs:
+            h = valid_inputs["hidden"]
+            for x in h:
+                if h[x] == "PROMPT":
+                    input_data_all[x] = [dynprompt.get_original_prompt() if dynprompt is not None else {}]
+                if h[x] == "DYNPROMPT":
+                    input_data_all[x] = [dynprompt]
+                if h[x] == "EXTRA_PNGINFO":
+                    input_data_all[x] = [extra_data.get('extra_pnginfo', None)]
+                if h[x] == "UNIQUE_ID":
+                    input_data_all[x] = [unique_id]
+                if h[x] == "AUTH_TOKEN_COMFY_ORG":
+                    input_data_all[x] = [extra_data.get("auth_token_comfy_org", None)]
+                if h[x] == "API_KEY_COMFY_ORG":
+                    input_data_all[x] = [extra_data.get("api_key_comfy_org", None)]
+    return input_data_all, missing_keys, hidden_inputs_v3

 map_node_over_list = None #Don't hook this please

@@ -184,7 +214,7 @@ async def resolve_map_node_over_list_results(results):
                raise exc
        return [x.result() if isinstance(x, asyncio.Task) else x for x in results]

-async def _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, func, allow_interrupt=False, execution_block_cb=None, pre_execute_cb=None):
+async def _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, func, allow_interrupt=False, execution_block_cb=None, pre_execute_cb=None, hidden_inputs=None):
    # check if node wants the lists
    input_is_list = getattr(obj, "INPUT_IS_LIST", False)

@@ -214,7 +244,22 @@ async def _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, f
        if execution_block is None:
            if pre_execute_cb is not None and index is not None:
                pre_execute_cb(index)
-            f = getattr(obj, func)
+            # V3
+            if isinstance(obj, _ComfyNodeInternal) or (is_class(obj) and issubclass(obj, _ComfyNodeInternal)):
+                # if obj is just a class, assign no resources or state; just create a clone
+                if is_class(obj):
+                    type_obj = obj
+                    obj.VALIDATE_CLASS()
+                    class_clone = obj.PREPARE_CLASS_CLONE(hidden_inputs)
+                # otherwise, use class instance to populate/reuse some fields
+                else:
+                    type_obj = type(obj)
+                    type_obj.VALIDATE_CLASS()
+                    class_clone = type_obj.PREPARE_CLASS_CLONE(hidden_inputs)
+                f = make_locked_method_func(type_obj, func, class_clone)
+            # V1
+            else:
+                f = getattr(obj, func)
            if inspect.iscoroutinefunction(f):
                async def async_wrapper(f, prompt_id, unique_id, list_index, args):
                    with CurrentNodeContext(prompt_id, unique_id, list_index):
@@ -266,8 +311,8 @@ def merge_result_data(results, obj):
            output.append([o[i] for o in results])
    return output

-async def get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=None, pre_execute_cb=None):
-    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
+async def get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=None, pre_execute_cb=None, hidden_inputs=None):
+    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
    has_pending_task = any(isinstance(r, asyncio.Task) and not r.done() for r in return_values)
    if has_pending_task:
        return return_values, {}, False, has_pending_task
@@ -298,6 +343,26 @@ def get_output_from_returns(return_values, obj):
                    result = tuple([result] * len(obj.RETURN_TYPES))
                results.append(result)
                subgraph_results.append((None, result))
+        elif isinstance(r, _NodeOutputInternal):
+            # V3
+            if r.ui is not None:
+                if isinstance(r.ui, dict):
+                    uis.append(r.ui)
+                else:
+                    uis.append(r.ui.as_dict())
+            if r.expand is not None:
+                has_subgraph = True
+                new_graph = r.expand
+                result = r.result
+                if r.block_execution is not None:
+                    result = tuple([ExecutionBlocker(r.block_execution)] * len(obj.RETURN_TYPES))
+                subgraph_results.append((new_graph, result))
+            elif r.result is not None:
+                result = r.result
+                if r.block_execution is not None:
+                    result = tuple([ExecutionBlocker(r.block_execution)] * len(obj.RETURN_TYPES))
+                results.append(result)
+                subgraph_results.append((None, result))
        else:
            if isinstance(r, ExecutionBlocker):
                r = tuple([r] * len(obj.RETURN_TYPES))
@@ -381,7 +446,7 @@ async def execute(server, dynprompt, caches, current_item, extra_data, executed,
            has_subgraph = False
        else:
            get_progress_state().start_progress(unique_id)
-            input_data_all, missing_keys = get_input_data(inputs, class_def, unique_id, caches.outputs, dynprompt, extra_data)
+            input_data_all, missing_keys, hidden_inputs = get_input_data(inputs, class_def, unique_id, caches.outputs, dynprompt, extra_data)
            if server.client_id is not None:
                server.last_node_id = display_node_id
                server.send_sync("executing", { "node": unique_id, "display_node": display_node_id, "prompt_id": prompt_id }, server.client_id)
@@ -391,8 +456,12 @@ async def execute(server, dynprompt, caches, current_item, extra_data, executed,
                obj = class_def()
                caches.objects.set(unique_id, obj)

-            if hasattr(obj, "check_lazy_status"):
-                required_inputs = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, "check_lazy_status", allow_interrupt=True)
+            if issubclass(class_def, _ComfyNodeInternal):
+                lazy_status_present = first_real_override(class_def, "check_lazy_status") is not None
+            else:
+                lazy_status_present = getattr(obj, "check_lazy_status", None) is not None
+            if lazy_status_present:
+                required_inputs = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, "check_lazy_status", allow_interrupt=True, hidden_inputs=hidden_inputs)
                required_inputs = await resolve_map_node_over_list_results(required_inputs)
                required_inputs = set(sum([r for r in required_inputs if isinstance(r,list)], []))
                required_inputs = [x for x in required_inputs if isinstance(x,str) and (
@@ -424,7 +493,7 @@ async def execute(server, dynprompt, caches, current_item, extra_data, executed,
            def pre_execute_cb(call_index):
                # TODO - How to handle this with async functions without contextvars (which requires Python 3.12)?
                GraphBuilder.set_default_prefix(unique_id, call_index, 0)
-            output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
+            output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
            if has_pending_tasks:
                pending_async_nodes[unique_id] = output_data
                unblock = execution_list.add_external_block(unique_id)
@@ -577,8 +646,6 @@ class PromptExecutor:
            self.add_message("execution_error", mes, broadcast=False)

    def execute(self, prompt, prompt_id, extra_data={}, execute_outputs=[]):
-        asyncio_loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(asyncio_loop)
        asyncio.run(self.execute_async(prompt, prompt_id, extra_data, execute_outputs))

    async def execute_async(self, prompt, prompt_id, extra_data={}, execute_outputs=[]):
@@ -672,8 +739,14 @@ async def validate_inputs(prompt_id, prompt, item, validated):

    validate_function_inputs = []
    validate_has_kwargs = False
-    if hasattr(obj_class, "VALIDATE_INPUTS"):
-        argspec = inspect.getfullargspec(obj_class.VALIDATE_INPUTS)
+    if issubclass(obj_class, _ComfyNodeInternal):
+        validate_function_name = "validate_inputs"
+        validate_function = first_real_override(obj_class, validate_function_name)
+    else:
+        validate_function_name = "VALIDATE_INPUTS"
+        validate_function = getattr(obj_class, validate_function_name, None)
+    if validate_function is not None:
+        argspec = inspect.getfullargspec(validate_function)
        validate_function_inputs = argspec.args
        validate_has_kwargs = argspec.varkw is not None
    received_types = {}
@@ -848,7 +921,7 @@ async def validate_inputs(prompt_id, prompt, item, validated):
                        continue

    if len(validate_function_inputs) > 0 or validate_has_kwargs:
-        input_data_all, _ = get_input_data(inputs, obj_class, unique_id)
+        input_data_all, _, hidden_inputs = get_input_data(inputs, obj_class, unique_id)
        input_filtered = {}
        for x in input_data_all:
            if x in validate_function_inputs or validate_has_kwargs:
@@ -856,8 +929,7 @@ async def validate_inputs(prompt_id, prompt, item, validated):
        if 'input_types' in validate_function_inputs:
            input_filtered['input_types'] = [received_types]

-        #ret = obj_class.VALIDATE_INPUTS(**input_filtered)
-        ret = await _async_map_node_over_list(prompt_id, unique_id, obj_class, input_filtered, "VALIDATE_INPUTS")
+        ret = await _async_map_node_over_list(prompt_id, unique_id, obj_class, input_filtered, validate_function_name, hidden_inputs=hidden_inputs)
        ret = await resolve_map_node_over_list_results(ret)
        for x in input_filtered:
            for i, r in enumerate(ret):
@@ -891,7 +963,7 @@ def full_type_name(klass):
        return klass.__qualname__
    return module + '.' + klass.__qualname__

-async def validate_prompt(prompt_id, prompt):
+async def validate_prompt(prompt_id, prompt, partial_execution_list: Union[list[str], None]):
    outputs = set()
    for x in prompt:
        if 'class_type' not in prompt[x]:
@@ -915,7 +987,8 @@ async def validate_prompt(prompt_id, prompt):
            return (False, error, [], {})

        if hasattr(class_, 'OUTPUT_NODE') and class_.OUTPUT_NODE is True:
-            outputs.add(x)
+            if partial_execution_list is None or x in partial_execution_list:
+                outputs.add(x)

    if len(outputs) == 0:
        error = {
@@ -1097,7 +1170,7 @@ class PromptQueue:
                    return True
        return False

-    def get_history(self, prompt_id=None, max_items=None, offset=-1):
+    def get_history(self, prompt_id=None, max_items=None, offset=-1, map_function=None):
        with self.mutex:
            if prompt_id is None:
                out = {}
@@ -1106,13 +1179,21 @@ class PromptQueue:
                    offset = len(self.history) - max_items
                for k in self.history:
                    if i >= offset:
-                        out[k] = self.history[k]
+                        p = self.history[k]
+                        if map_function is not None:
+                            p = map_function(p)
+                        out[k] = p
                        if max_items is not None and len(out) >= max_items:
                            break
                    i += 1
                return out
            elif prompt_id in self.history:
-                return {prompt_id: copy.deepcopy(self.history[prompt_id])}
+                p = self.history[prompt_id]
+                if map_function is None:
+                    p = copy.deepcopy(p)
+                else:
+                    p = map_function(p)
+                return {prompt_id: p}
            else:
                return {}
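
The `map_function` parameter added to `get_history` above can be illustrated with a small stand-alone sketch. `HistoryStore` and `summarize` are hypothetical names invented for this example, not part of the repository. The sketch only mirrors the branch logic visible in the hunk: entries are deep-copied when no mapper is given, and passed through the mapper otherwise.

```python
import copy
import threading


class HistoryStore:
    """Hypothetical stand-in for the history portion of PromptQueue."""

    def __init__(self):
        self.mutex = threading.RLock()
        self.history = {}

    def get_history(self, prompt_id=None, map_function=None):
        with self.mutex:
            if prompt_id is None:
                out = {}
                for k, p in self.history.items():
                    # Apply the mapper per entry when one is supplied.
                    if map_function is not None:
                        p = map_function(p)
                    out[k] = p
                return out
            if prompt_id in self.history:
                p = self.history[prompt_id]
                # No mapper: return a deep copy. With a mapper: return
                # whatever the mapper produces from the stored entry.
                if map_function is None:
                    p = copy.deepcopy(p)
                else:
                    p = map_function(p)
                return {prompt_id: p}
            return {}


def summarize(entry):
    # Hypothetical mapper that keeps only a small summary of an entry.
    return {
        "status": entry.get("status"),
        "num_outputs": len(entry.get("outputs", {})),
    }


store = HistoryStore()
store.history["abc"] = {
    "status": "success",
    "outputs": {"9": {"images": []}},
    "prompt": ["large payload"],
}

summary = store.get_history(prompt_id="abc", map_function=summarize)
full = store.get_history(prompt_id="abc")
```

A mapper that returns a small summary avoids deep-copying the whole entry, which is presumably the motivation for the parameter. The diff itself does not state one.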

--- a/folder_paths.py
+++ b/folder_paths.py
@@ -275,10 +275,7 @@ def filter_files_extensions(files: Collection[str], extensions: Collection[str])



-def get_full_path(folder_name: str, filename: str) -> str | None:
-    """
-    Get the full path of a file in a folder, has to be a file
-    """
+def get_full_path(folder_name: str, filename: str, allow_missing: bool = False) -> str | None:
    global folder_names_and_paths
    folder_name = map_legacy(folder_name)
    if folder_name not in folder_names_and_paths:
@@ -291,6 +288,8 @@ def get_full_path(folder_name: str, filename: str) -> str | None:
            return full_path
        elif os.path.islink(full_path):
            logging.warning("WARNING path {} exists but doesn't link anywhere, skipping.".format(full_path))
+        elif allow_missing:
+            return full_path

    return None

@@ -305,6 +304,27 @@ def get_full_path_or_raise(folder_name: str, filename: str) -> str:
    return full_path


+def get_relative_path(full_path: str) -> tuple[str, str] | None:
+    """Convert a full path back to a type-relative path.
+
+    Args:
+        full_path: The full path to the file
+
+    Returns:
+        tuple[str, str] | None: A tuple of (model_type, relative_path) if found, None otherwise
+    """
+    global folder_names_and_paths
+    full_path = os.path.normpath(full_path)
+
+    for model_type, (paths, _) in folder_names_and_paths.items():
+        for base_path in paths:
+            base_path = os.path.normpath(base_path)
+            if full_path.startswith(base_path):
+                relative_path = os.path.relpath(full_path, base_path)
+                return model_type, relative_path
+
+    return None
+
 def get_filename_list_(folder_name: str) -> tuple[list[str], dict[str, float], float]:
    folder_name = map_legacy(folder_name)
    global folder_names_and_paths
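
One thing to watch in the `get_relative_path` hunk above: `str.startswith` compares raw characters, so a base directory such as `/models/loras` also matches a sibling such as `/models/loras_extra`. Whether this matters depends on the configured folders. The sketch below shows a stricter containment check built on `os.path.commonpath`; `is_within` is a hypothetical helper, not something the PR defines.

```python
import os


def is_within(base_path: str, full_path: str) -> bool:
    """Return True only if full_path is base_path or lies beneath it."""
    base_path = os.path.normpath(base_path)
    full_path = os.path.normpath(full_path)
    try:
        # commonpath compares whole path components, not characters.
        return os.path.commonpath([base_path, full_path]) == base_path
    except ValueError:
        # Raised e.g. for mixed absolute/relative paths or different drives.
        return False


base = os.path.join(os.sep, "models", "loras")
inside = os.path.join(base, "style", "a.safetensors")
sibling = os.path.join(os.sep, "models", "loras_extra", "a.safetensors")

# A plain prefix test accepts the sibling directory.
prefix_matches_sibling = os.path.normpath(sibling).startswith(os.path.normpath(base))

# The component-wise test does not.
strict_inside = is_within(base, inside)
strict_sibling = is_within(base, sibling)
```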
Author	SHA1	Message	Date
Jedrzej Kosinski	3089936a2c	Merge branch 'master' into pysssss-model-db	2025-08-11 14:09:21 -07:00
PsychoLogicAu	2208aa616d	Support SimpleTuner lycoris lora for Qwen-Image (#9280 )	2025-08-11 16:56:16 -04:00
ComfyUI Wiki	629b173837	Update template & embedded docs (#9283 ) * Update template & embedded docs * Update embedded docs to 0.2.6	2025-08-11 16:52:12 -04:00
Alexander Piskun	fa340add55	remove creation of non-used asyncio_loop (#9284 )	2025-08-11 16:48:17 -04:00
comfyanonymous	966f3a5206	Only show feature flags log when verbose. (#9281 )	2025-08-11 05:53:01 -04:00
comfyanonymous	0552de7c7d	Bump pytorch cuda and rocm versions in readme instructions. (#9273 )	2025-08-10 05:03:47 -04:00
comfyanonymous	5828607ccf	Not sure if AMD actually support fp16 acc but it doesn't crash. (#9258 )	2025-08-09 12:49:25 -04:00
comfyanonymous	735bb4bdb1	Users report gfx1201 is buggy on flux with pytorch attention. (#9244 )	2025-08-08 04:21:00 -04:00
Jedrzej Kosinski	cd679129e3	Merge branch 'master' into pysssss-model-db	2025-08-07 21:12:30 -07:00
Alexander Piskun	bf2a1b5b1e	async API nodes (#9129 ) * converted API nodes to async * converted BFL API nodes to async * fixed client bug; converted gemini, ideogram, minimax * fixed client bug; converted openai nodes * fixed client bug; converted moonvalley, pika nodes * fixed client bug; converted kling, luma nodes * converted pixverse, rodin nodes * converted tripo, veo2 * converted recraft nodes * add lost log_request_response call	2025-08-07 23:37:50 -04:00
Jedrzej Kosinski	42974a448c	_ui.py import torchaudio safety check (#9234 ) * Added safety around torchaudio import in _ui.py * Trusted cursor too much, fixed torchaudio bool	2025-08-07 17:54:09 -04:00
comfyanonymous	05df2df489	Fix RepeatLatentBatch not working on multi dim latents. (#9227 )	2025-08-07 11:20:40 -04:00
Christian Byrne	37d620a6b8	Update frontend to v1.24.3 (#9175 )	2025-08-06 19:52:39 -04:00
ComfyUI Wiki	32691b16f4	Update template to 0.1.52 (#9206 )	2025-08-06 13:26:29 -04:00
flybirdxx	4c3e57b0ae	Fixed an issue where qwenLora could not be loaded properly. (#9208 )	2025-08-06 13:23:11 -04:00
comfyanonymous	9126c0cfe4	Qwen Image model merging node. (#9202 )	2025-08-06 04:07:04 -04:00
comfyanonymous	d8c51ba15a	Add Qwen Image model to readme. (#9191 )	2025-08-05 07:41:18 -04:00
comfyanonymous	32a95bba8a	ComfyUI version 0.3.49	2025-08-05 07:33:02 -04:00
ComfyUI Wiki	da1ad9b516	Update template to 0.1.51 (#9187 )	2025-08-05 07:24:12 -04:00
comfyanonymous	d044a24398	Fix default shift and any latent size for qwen image model. (#9186 )	2025-08-05 06:12:27 -04:00
ComfyUI Wiki	5be6fd09ff	Update template to 0.1.48 (#9182 )	2025-08-05 03:48:56 -04:00
Christian Byrne	f69609bbd6	Add Veo3 video generation node with audio support (#9110 ) - Create new Veo3VideoGenerationNode that extends VeoVideoGenerationNode - Add support for generateAudio parameter (only for Veo3 models) - Support new Veo3 models: veo-3.0-generate-001, veo-3.0-fast-generate-001 - Fix Veo3 duration constraint to 8 seconds only - Update original node to be clearly Veo 2 only - Update API paths to use model parameter: /proxy/veo/{model}/generate - Regenerate API types from staging to include generateAudio parameter - Fix TripoModelVersion enum reference after regeneration - Mark generated API types file in .gitattributes	2025-08-05 01:52:25 -04:00
comfyanonymous	c012400240	Initial support for qwen image model. (#9179 )	2025-08-04 22:53:25 -04:00
comfyanonymous	03895dea7c	Fix another issue with the PR. (#9170 )	2025-08-04 04:33:04 -04:00
comfyanonymous	84f9759424	Add some warnings and prevent crash when cond devices don't match. (#9169 )	2025-08-04 04:20:12 -04:00
comfyanonymous	7991341e89	Various fixes for broken things from earlier PR. (#9168 )	2025-08-04 04:02:40 -04:00
comfyanonymous	140ffc7fdc	Fix broken controlnet from last PR. (#9167 )	2025-08-04 03:28:12 -04:00
comfyanonymous	182f90b5ec	Lower cond vram use by casting at the same time as device transfer. (#9159 )	2025-08-04 03:11:53 -04:00
pythongosssss	d7062277a7	fix bad merge	2025-08-03 16:40:27 +01:00
pythongosssss	54cf14cbbb	Merge remote-tracking branch 'origin/master' into pysssss-model-db	2025-08-03 16:36:49 +01:00
comfyanonymous	aebac22193	Cleanup. (#9160 )	2025-08-03 07:08:11 -04:00
comfyanonymous	13aaa66ec2	Make sure context is on the right device. (#9154 )	2025-08-02 15:09:23 -04:00
comfyanonymous	5f582a9757	Make sure all the conds are on the right device. (#9151 )	2025-08-02 15:00:13 -04:00
ComfyUI Wiki	fbcc23945d	Update template to 0.1.47 (#9153 )	2025-08-02 14:15:29 -04:00
Johnpaul Chiwetelu	3dfefc88d0	API for Recently Used Items (#8792 ) * feat: add file creation time to model file metadata and user file info * fix linting	2025-08-01 22:02:06 -04:00
comfyanonymous	bff60b5cfc	ComfyUI version 0.3.48	2025-08-01 20:03:22 -04:00
comfyanonymous	1e638a140b	Tiny wan vae optimizations. (#9136 )	2025-08-01 05:25:38 -04:00
ComfyUI Wiki	4696d74305	update template to 0.1.45 (#9135 )	2025-08-01 03:06:18 -04:00
comfyanonymous	5ee381c058	Fix WanFirstLastFrameToVideo node when no clip vision. (#9134 )	2025-07-31 23:33:27 -04:00
Jedrzej Kosinski	4887743a2a	V3 Node Schema Definition - initial (#8656 )	2025-07-31 18:02:12 -04:00
comfyanonymous	97b8a2c26a	More accurate explanation of release process. (#9126 )	2025-07-31 05:46:23 -04:00
guill	97eb256a35	Add support for partial execution in backend (#9123 ) When a prompt is submitted, it can optionally include `partial_execution_targets` as a list of ids. If it does, rather than adding all outputs to the execution list, we add only those in the list.	2025-07-30 22:55:28 -04:00
chaObserv	61b08d4ba6	Replace manual x * sigmoid(x) with torch silu in VAE nonlinearity (#9057 )	2025-07-30 19:25:56 -04:00
comfyanonymous	da9dab7edd	Small wan camera memory optimization. (#9111 )	2025-07-30 05:55:26 -04:00
ComfyUI Wiki	d2aaef029c	Update template to 0.1.44 (#9104 )	2025-07-29 22:50:49 -04:00
guill	0a3d062e06	ComfyAPI Core v0.0.2 (#8962 ) * ComfyAPI Core v0.0.2 * Respond to PR feedback * Fix Python 3.9 errors * Fix missing backward compatibility proxy * Reorganize types a bit The input types, input impls, and utility types are now all available in the versioned API. See the change in `comfy_extras/nodes_video.py` for an example of their usage. * Remove the need for `--generate-api-stubs` * Fix generated stubs differing by Python version * Fix ruff formatting issues	2025-07-29 22:17:22 -04:00
comfyanonymous	2f74e17975	ComfyUI version 0.3.47	2025-07-29 20:08:25 -04:00
comfyanonymous	dca6bdd4fa	Make wan2.2 5B i2v take a lot less memory. (#9102 )	2025-07-29 19:44:18 -04:00
comfyanonymous	7d593baf91	Extra reserved vram on large cards on windows. (#9093 )	2025-07-29 04:07:45 -04:00
comfyanonymous	c60dc4177c	Remove unecessary clones in the wan2.2 VAE. (#9083 )	2025-07-28 14:48:19 -04:00
comfyanonymous	5d4cc3ba1b	ComfyUI 0.3.46	2025-07-28 08:04:04 -04:00
comfyanonymous	9f1388c0a3	Add wan2.2 to readme. (#9081 )	2025-07-28 08:01:53 -04:00
comfyanonymous	a88788dce6	Wan 2.2 support. (#9080 )	2025-07-28 08:00:23 -04:00
ComfyUI Wiki	d0210fe2e5	Update template to 0.1.41 (#9079 )	2025-07-28 07:55:02 -04:00
Christian Byrne	e6d9f62744	Add Moonvalley Marey V2V node with updated input validation (#9069 ) * [moonvalley] Update V2V node to match API specification - Add exact resolution validation for supported resolutions (1920x1080, 1080x1920, 1152x1152, 1536x1152, 1152x1536) - Change frame count validation from divisible by 32 to 16 - Add MP4 container format validation - Remove internal parameters (steps, guidance_scale) from V2V inference params - Update video duration handling to support only 5 seconds (auto-trim if longer) - Add motion_intensity parameter (0-100) for Motion Transfer control type - Add get_container_format() method to VideoInput classes * update negative prompt	2025-07-27 19:51:36 -04:00
comfyanonymous	78672d0ee6	Small readme update. (#9071 )	2025-07-27 07:42:58 -04:00
ComfyUI Wiki	1ef70fcde4	Fix the broken link (#9060 )	2025-07-26 17:25:33 -04:00
comfyanonymous	0621d73a9c	Remove useless code. (#9059 )	2025-07-26 04:44:19 -04:00
comfyanonymous	b850d9a8bb	Add map_function to get_history. (#9056 )	2025-07-25 21:25:45 -04:00
Thor-ATX	c60467a148	Update negative prompt for Moonvalley nodes (#9038 ) Co-authored-by: thorsten <thorsten@tripod-digital.co.nz>	2025-07-25 17:27:03 -04:00
comfyanonymous	c0207b473f	Fix issue with line endings github workflow. (#9053 )	2025-07-25 17:25:08 -04:00
ComfyUI Wiki	93bc2f8e4d	Update template to 0.1.40 (#9048 )	2025-07-25 13:24:23 -04:00
comfyanonymous	e6e5d33b35	Remove useless code. (#9041 ) This is only needed on old pytorch 2.0 and older.	2025-07-25 04:58:28 -04:00
Eugene Fairley	4293e4da21	Add WAN ATI support (#8874 ) * Add WAN ATI support * Fixes * Fix length * Remove extra functions * Fix * Fix * Ruff fix * Remove torch.no_grad * Add batch trajectory logic * Scale inputs before and after motion patch * Batch image/trajectory * Ruff fix * Clean up	2025-07-24 20:59:19 -04:00
comfyanonymous	69cb57b342	Print xpu device name. (#9035 )	2025-07-24 15:06:25 -04:00
SHIVANSH GUPTA	d03ae077b4	Added parameter required_frontend_version in the /system_stats API response (#8875 ) * Added the parameter required_frontend_version in the /system_stats api response * Update server.py * Created a function get_required_frontend_version and wrote tests for it * Refactored the function to return currently installed frontend pacakage version * Moved required_frontend to a new function and imported that in server.py * Corrected test cases using mocking techniques * Corrected files to comply with ruff formatting	2025-07-24 14:05:54 -04:00
honglyua	0ccc88b03f	Support Iluvatar CoreX (#8585 ) * Support Iluvatar CoreX Co-authored-by: mingjiang.li <mingjiang.li@iluvatar.com>	2025-07-24 13:57:36 -04:00
Kohaku-Blueleaf	eb2f78b4e0	[Training Node] algo support, grad acc, optional grad ckpt (#9015 ) * Add factorization utils for lokr * Add lokr train impl * Add loha train impl * Add adapter map for algo selection * Add optional grad ckpt and algo selection * Update __init__.py * correct key name for loha * Use custom fwd/bwd func and better init for loha * Support gradient accumulation * Fix bugs of loha * use more stable init * Add OFT training * linting	2025-07-23 20:57:27 -04:00
chaObserv	e729a5cc11	Separate denoised and noise estimation in Euler CFG++ (#9008 ) This will change their behavior with the sampling CONST type. It also combines euler_cfg_pp and euler_ancestral_cfg_pp into one main function.	2025-07-23 19:47:05 -04:00
comfyanonymous	e78d230496	Only enable cuda malloc on cuda torch. (#9031 )	2025-07-23 19:37:43 -04:00
comfyanonymous	d3504e1778	Enable pytorch attention by default for gfx1201 on torch 2.8 (#9029 )	2025-07-23 19:21:29 -04:00
comfyanonymous	a86a58c308	Fix xpu function not implemented p2. (#9027 )	2025-07-23 18:18:20 -04:00
comfyanonymous	39dda1d40d	Fix xpu function not implemented. (#9026 )	2025-07-23 18:10:59 -04:00
comfyanonymous	5ad33787de	Add default device argument. (#9023 )	2025-07-23 14:20:49 -04:00
Simon Lui	255f139863	Add xpu version for async offload and some other things. (#9004 )	2025-07-22 15:20:09 -04:00
comfyanonymous	5ac9ec214b	Try to fix line endings workflow. (#9001 )	2025-07-22 04:07:51 -04:00
comfyanonymous	0aa1c58b04	This is not needed. (#8991 )	2025-07-21 16:48:25 -04:00
comfyanonymous	5249e45a1c	Add hidream e1.1 example to readme. (#8990 )	2025-07-21 15:23:41 -04:00
comfyanonymous	54a45b9967	Replace torchaudio.load with pyav. (#8989 )	2025-07-21 14:19:14 -04:00
pythongosssss	7d5160f92c	Tidy	2025-06-01 15:45:15 +01:00
pythongosssss	7f7b3f1695	tidy	2025-06-01 15:41:00 +01:00
pythongosssss	9da6aca0d0	Add additional db model metadata fields and model downloading function	2025-06-01 15:32:13 +01:00
pythongosssss	1cb3c98947	Implement database & model hashing	2025-06-01 15:32:02 +01:00
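
The partial-execution behaviour described in commit 97eb256a35 above (only the listed output nodes are added when a target list is supplied) can be sketched as follows. `select_outputs`, `FakeSave`, and `FakeSampler` are hypothetical names for illustration. The real logic lives inside `validate_prompt` and does considerably more validation.

```python
from typing import Optional


class FakeSave:
    # Hypothetical node class flagged as an output node.
    OUTPUT_NODE = True


class FakeSampler:
    # Hypothetical node class with no OUTPUT_NODE attribute.
    pass


def select_outputs(
    prompt: dict,
    node_classes: dict,
    partial_execution_list: Optional[list] = None,
) -> set:
    """Collect output node ids, optionally restricted to the given targets."""
    outputs = set()
    for node_id, node in prompt.items():
        class_ = node_classes[node["class_type"]]
        if getattr(class_, "OUTPUT_NODE", False) is True:
            # With no target list, every output node is kept; otherwise
            # only the output nodes named in the list are kept.
            if partial_execution_list is None or node_id in partial_execution_list:
                outputs.add(node_id)
    return outputs


node_classes = {"FakeSave": FakeSave, "FakeSampler": FakeSampler}
prompt = {
    "2": {"class_type": "FakeSampler"},
    "3": {"class_type": "FakeSave"},
    "4": {"class_type": "FakeSave"},
}

all_outputs = select_outputs(prompt, node_classes)
only_four = select_outputs(prompt, node_classes, ["4"])
non_output_target = select_outputs(prompt, node_classes, ["2"])
```

Note that naming a non-output node as a target yields an empty set in this sketch. In the diff, an empty `outputs` set leads into the existing `len(outputs) == 0` error path.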