Bring patches changes from _calc_cond_batch into _calc_cond_batch_multigpu

Merge branch 'master' into worksplit-multigpu
Satisfy ruff
2026-02-13 19:50:02 +00:00 · 2025-10-15 17:34:36 -07:00 · 2025-10-15 17:33:02 -07:00 · 2025-10-13 22:00:34 -07:00 · 2025-10-13 21:53:14 -07:00 · 2025-09-24 23:45:26 -07:00
120 changed files with 8685 additions and 10423 deletions
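
Note on the `ControlIsolation` helper added in `comfy/controlnet.py` below: the following is a minimal sketch of how such a context manager behaves, not code taken from this branch. The `ControlIsolation` class is reproduced from the hunk. `FakeControl` and its `chain_names` method are hypothetical stand-ins for `ControlBase`, which is only assumed here to follow its `previous_controlnet` link when called.

```python
class ControlIsolation:
    """Reproduced from the comfy/controlnet.py hunk (type annotation dropped)."""
    def __init__(self, control):
        self.control = control
        self.orig_previous_controlnet = control.previous_controlnet

    def __enter__(self):
        # Detach the chain so calls on this control do not cascade.
        self.control.previous_controlnet = None

    def __exit__(self, *args):
        # Reattach the original link, also when the body raised.
        self.control.previous_controlnet = self.orig_previous_controlnet


class FakeControl:
    """Hypothetical stand-in for ControlBase; models only the chain link."""
    def __init__(self, name, previous_controlnet=None):
        self.name = name
        self.previous_controlnet = previous_controlnet

    def chain_names(self):
        # Walk the chain the way a cascading call would.
        names = [self.name]
        if self.previous_controlnet is not None:
            names += self.previous_controlnet.chain_names()
        return names


first = FakeControl("first")
second = FakeControl("second", previous_controlnet=first)

full_chain = second.chain_names()          # follows the link to `first`
with ControlIsolation(second):
    isolated_chain = second.chain_names()  # link is detached inside the block
restored_chain = second.chain_names()      # link is back after the block
```

Because `__enter__` returns nothing, `with ControlIsolation(c) as x` would bind `x` to `None`; the sketch therefore keeps its own reference to the control. `__exit__` also returns nothing, so an exception raised inside the block still propagates after the link is restored.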
--- a/.ci/windows_nvidia_base_files/advanced/run_nvidia_gpu_disable_api_nodes.bat
+++ b/.ci/windows_nvidia_base_files/advanced/run_nvidia_gpu_disable_api_nodes.bat
@@ -1,3 +0,0 @@
-..\python_embeded\python.exe -s ..\ComfyUI\main.py --windows-standalone-build --disable-api-nodes
-echo If you see this and ComfyUI did not start try updating your Nvidia Drivers to the latest.
-pause
--- a/.ci/windows_nvidia_base_files/run_nvidia_gpu.bat
+++ b/.ci/windows_nvidia_base_files/run_nvidia_gpu.bat
@@ -1,3 +1,2 @@
 .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
-echo If you see this and ComfyUI did not start try updating your Nvidia Drivers to the latest.
 pause
--- a/.ci/windows_nvidia_base_files/run_nvidia_gpu_fast_fp16_accumulation.bat
+++ b/.ci/windows_nvidia_base_files/run_nvidia_gpu_fast_fp16_accumulation.bat
@@ -1,3 +1,2 @@
 .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --fast fp16_accumulation
-echo If you see this and ComfyUI did not start try updating your Nvidia Drivers to the latest.
 pause
--- a/.github/ISSUE_TEMPLATE/bug-report.yml
+++ b/.github/ISSUE_TEMPLATE/bug-report.yml
@@ -8,15 +8,13 @@ body:
        Before submitting a **Bug Report**, please ensure the following:

        - **1:** You are running the latest version of ComfyUI.
-        - **2:** You have your ComfyUI logs and relevant workflow on hand and will post them in this bug report.
+        - **2:** You have looked at the existing bug reports and made sure this isn't already reported.
        - **3:** You confirmed that the bug is not caused by a custom node. You can disable all custom nodes by passing
-        `--disable-all-custom-nodes` command line argument. If you have custom node try updating them to the latest version.
+        `--disable-all-custom-nodes` command line argument.
        - **4:** This is an actual bug in ComfyUI, not just a support question. A bug is when you can specify exact
        steps to replicate what went wrong and others will be able to repeat your steps and see the same issue happen.

-        ## Very Important
-
-        Please make sure that you post ALL your ComfyUI logs in the bug report. A bug report without logs will likely be ignored.
+        If unsure, ask on the [ComfyUI Matrix Space](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) or the [Comfy Org Discord](https://discord.gg/comfyorg) first.
  - type: checkboxes
    id: custom-nodes-test
    attributes:
--- a/.github/PULL_REQUEST_TEMPLATE/api-node.md
+++ b/.github/PULL_REQUEST_TEMPLATE/api-node.md
@@ -1,21 +0,0 @@
-<!-- API_NODE_PR_CHECKLIST: do not remove -->
-
-## API Node PR Checklist
-
-### Scope
- [ ] **Is API Node Change**
-
-### Pricing & Billing
- [ ] **Need pricing update**
- [ ] **No pricing update**
-
-If **Need pricing update**:
- [ ] Metronome rate cards updated
- [ ] Auto‑billing tests updated and passing
-
-### QA
- [ ] **QA done**
- [ ] **QA not required**
-
-### Comms
- [ ] Informed **Kosinkadink**
--- a/.github/workflows/api-node-template.yml
+++ b/.github/workflows/api-node-template.yml
@@ -1,58 +0,0 @@
-name: Append API Node PR template
-
-on:
-  pull_request_target:
-    types: [opened, reopened, synchronize, ready_for_review]
-    paths:
-      - 'comfy_api_nodes/**'   # only run if these files changed
-
-permissions:
-  contents: read
-  pull-requests: write
-
-jobs:
-  inject:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Ensure template exists and append to PR body
-        uses: actions/github-script@v7
-        with:
-          script: |
-            const { owner, repo } = context.repo;
-            const number = context.payload.pull_request.number;
-            const templatePath = '.github/PULL_REQUEST_TEMPLATE/api-node.md';
-            const marker = '<!-- API_NODE_PR_CHECKLIST: do not remove -->';
-
-            const { data: pr } = await github.rest.pulls.get({ owner, repo, pull_number: number });
-
-            let templateText;
-            try {
-              const res = await github.rest.repos.getContent({
-                owner,
-                repo,
-                path: templatePath,
-                ref: pr.base.ref
-              });
-              const buf = Buffer.from(res.data.content, res.data.encoding || 'base64');
-              templateText = buf.toString('utf8');
-            } catch (e) {
-              core.setFailed(`Required PR template not found at "${templatePath}" on ${pr.base.ref}. Please add it to the repo.`);
-              return;
-            }
-
-            // Enforce the presence of the marker inside the template (for idempotence)
-            if (!templateText.includes(marker)) {
-              core.setFailed(`Template at "${templatePath}" does not contain the required marker:\n${marker}\nAdd it so we can detect duplicates safely.`);
-              return;
-            }
-
-            // If the PR already contains the marker, do not append again.
-            const body = pr.body || '';
-            if (body.includes(marker)) {
-              core.info('Template already present in PR body; nothing to inject.');
-              return;
-            }
-
-            const newBody = (body ? body + '\n\n' : '') + templateText + '\n';
-            await github.rest.pulls.update({ owner, repo, pull_number: number, body: newBody });
-            core.notice('API Node template appended to PR description.');
--- a/.github/workflows/release-stable-all.yml
+++ b/.github/workflows/release-stable-all.yml
@@ -14,13 +14,13 @@ jobs:
      contents: "write"
      packages: "write"
      pull-requests: "read"
-    name: "Release NVIDIA Default (cu130)"
+    name: "Release NVIDIA Default (cu129)"
    uses: ./.github/workflows/stable-release.yml
    with:
      git_tag: ${{ inputs.git_tag }}
-      cache_tag: "cu130"
+      cache_tag: "cu129"
      python_minor: "13"
-      python_patch: "9"
+      python_patch: "6"
      rel_name: "nvidia"
      rel_extra_name: ""
      test_release: true
@@ -43,23 +43,6 @@ jobs:
      test_release: true
    secrets: inherit

-  release_nvidia_cu126:
-    permissions:
-      contents: "write"
-      packages: "write"
-      pull-requests: "read"
-    name: "Release NVIDIA cu126"
-    uses: ./.github/workflows/stable-release.yml
-    with:
-      git_tag: ${{ inputs.git_tag }}
-      cache_tag: "cu126"
-      python_minor: "12"
-      python_patch: "10"
-      rel_name: "nvidia"
-      rel_extra_name: "_cu126"
-      test_release: true
-    secrets: inherit
-
  release_amd_rocm:
    permissions:
      contents: "write"
--- a/.github/workflows/test-ci.yml
+++ b/.github/workflows/test-ci.yml
@@ -21,15 +21,14 @@ jobs:
      fail-fast: false
      matrix:
        # os: [macos, linux, windows]
-        # os: [macos, linux]
-        os: [linux]
-        python_version: ["3.10", "3.11", "3.12"]
+        os: [macos, linux]
+        python_version: ["3.9", "3.10", "3.11", "3.12"]
        cuda_version: ["12.1"]
        torch_version: ["stable"]
        include:
-          # - os: macos
-          #   runner_label: [self-hosted, macOS]
-          #   flags: "--use-pytorch-cross-attention"
+          - os: macos
+            runner_label: [self-hosted, macOS]
+            flags: "--use-pytorch-cross-attention"
          - os: linux
            runner_label: [self-hosted, Linux]
            flags: ""
@@ -74,15 +73,14 @@ jobs:
    strategy:
      fail-fast: false
      matrix:
-        # os: [macos, linux]
-        os: [linux]
+        os: [macos, linux]
        python_version: ["3.11"]
        cuda_version: ["12.1"]
        torch_version: ["nightly"]
        include:
-          # - os: macos
-          #   runner_label: [self-hosted, macOS]
-          #   flags: "--use-pytorch-cross-attention"
+          - os: macos
+            runner_label: [self-hosted, macOS]
+            flags: "--use-pytorch-cross-attention"
          - os: linux
            runner_label: [self-hosted, Linux]
            flags: ""
--- a/.github/workflows/windows_release_dependencies.yml
+++ b/.github/workflows/windows_release_dependencies.yml
@@ -17,7 +17,7 @@ on:
        description: 'cuda version'
        required: true
        type: string
-        default: "130"
+        default: "129"

      python_minor:
        description: 'python minor version'
@@ -29,7 +29,7 @@ on:
        description: 'python patch version'
        required: true
        type: string
-        default: "9"
+        default: "6"
 #  push:
 #    branches:
 #      - master
--- a/QUANTIZATION.md
+++ b/QUANTIZATION.md
@@ -1,168 +0,0 @@
-# The Comfy guide to Quantization
-
-
-## How does quantization work?
-
-Quantization aims to map a high-precision value x_f to a lower precision format with minimal loss in accuracy. These smaller formats then serve to reduce the models memory footprint and increase throughput by using specialized hardware.
-
-When simply converting a value from FP16 to FP8 using the round-nearest method we might hit two issues:
- The dynamic range of FP16 (-65,504, 65,504) far exceeds FP8 formats like E4M3 (-448, 448) or E5M2 (-57,344, 57,344), potentially resulting in clipped values
- The original values are concentrated in a small range (e.g. -1,1) leaving many FP8-bits "unused"
-
-By using a scaling factor, we aim to map these values into the quantized-dtype range, making use of the full spectrum. One of the easiest approaches, and common, is using per-tensor absolute-maximum scaling.
-
-```
-absmax = max(abs(tensor))
-scale = amax / max_dynamic_range_low_precision
-
-# Quantization
-tensor_q = (tensor / scale).to(low_precision_dtype)
-
-# De-Quantization
-tensor_dq = tensor_q.to(fp16) * scale
-
-tensor_dq ~ tensor
-```
-
-Given that additional information (scaling factor) is needed to "interpret" the quantized values, we describe those as derived datatypes.
-
-
-## Quantization in Comfy
-
-```
-QuantizedTensor (torch.Tensor subclass)
-  ↓ __torch_dispatch__
-Two-Level Registry (generic + layout handlers)
-  ↓
-MixedPrecisionOps + Metadata Detection
-```
-
-### Representation
-
-To represent these derived datatypes, ComfyUI uses a subclass of torch.Tensor to implements these using the `QuantizedTensor` class found in `comfy/quant_ops.py`
-
-A `Layout` class defines how a specific quantization format behaves:
- Required parameters
- Quantize method
- De-Quantize method
-
-```python
-from comfy.quant_ops import QuantizedLayout
-
-class MyLayout(QuantizedLayout):
-    @classmethod
-    def quantize(cls, tensor, **kwargs):
-        # Convert to quantized format
-        qdata = ...
-        params = {'scale': ..., 'orig_dtype': tensor.dtype}
-        return qdata, params
-    
-    @staticmethod
-    def dequantize(qdata, scale, orig_dtype, **kwargs):
-        return qdata.to(orig_dtype) * scale
-```
-
-To then run operations using these QuantizedTensors we use two registry systems to define supported operations. 
-The first is a **generic registry** that handles operations common to all quantized formats (e.g., `.to()`, `.clone()`, `.reshape()`).
-
-The second registry is layout-specific and allows to implement fast-paths like nn.Linear.
-```python
-from comfy.quant_ops import register_layout_op
-
-@register_layout_op(torch.ops.aten.linear.default, MyLayout)
-def my_linear(func, args, kwargs):
-    # Extract tensors, call optimized kernel
-    ...
-```
-When `torch.nn.functional.linear()` is called with QuantizedTensor arguments, `__torch_dispatch__` automatically routes to the registered implementation.
-For any unsupported operation, QuantizedTensor will fallback to call `dequantize` and dispatch using the high-precision implementation.
-
-
-### Mixed Precision
-
-The `MixedPrecisionOps` class (lines 542-648 in `comfy/ops.py`) enables per-layer quantization decisions, allowing different layers in a model to use different precisions. This is activated when a model config contains a `layer_quant_config` dictionary that specifies which layers should be quantized and how.
-
-**Architecture:**
-
-```python
-class MixedPrecisionOps(disable_weight_init):
-    _layer_quant_config = {}  # Maps layer names to quantization configs
-    _compute_dtype = torch.bfloat16  # Default compute / dequantize precision
-```
-
-**Key mechanism:**
-
-The custom `Linear._load_from_state_dict()` method inspects each layer during model loading:
- If the layer name is **not** in `_layer_quant_config`: load weight as regular tensor in `_compute_dtype`
- If the layer name **is** in `_layer_quant_config`: 
-  - Load weight as `QuantizedTensor` with the specified layout (e.g., `TensorCoreFP8Layout`)
-  - Load associated quantization parameters (scales, block_size, etc.)
-
-**Why it's needed:**
-
-Not all layers tolerate quantization equally. Sensitive operations like final projections can be kept in higher precision, while compute-heavy matmuls are quantized. This provides most of the performance benefits while maintaining quality.
-
-The system is selected in `pick_operations()` when `model_config.layer_quant_config` is present, making it the highest-priority operation mode.
-
-
-## Checkpoint Format
-
-Quantized checkpoints are stored as standard safetensors files with quantized weight tensors and associated scaling parameters, plus a `_quantization_metadata` JSON entry describing the quantization scheme.
-
-The quantized checkpoint will contain the same layers as the original checkpoint but:
- The weights are stored as quantized values, sometimes using a different storage datatype. E.g. uint8 container for fp8.
- For each quantized weight a number of additional scaling parameters are stored alongside depending on the recipe.
- We store a metadata.json in the metadata of the final safetensor containing the `_quantization_metadata` describing which layers are quantized and what layout has been used.
-
-### Scaling Parameters details
-We define 4 possible scaling parameters that should cover most recipes in the near-future:
- **weight_scale**: quantization scalers for the weights
- **weight_scale_2**: global scalers in the context of double scaling
- **pre_quant_scale**: scalers used for smoothing salient weights
- **input_scale**: quantization scalers for the activations
-
-| Format | Storage dtype | weight_scale | weight_scale_2 | pre_quant_scale | input_scale |
-|--------|---------------|--------------|----------------|-----------------|-------------|
-| float8_e4m3fn | float32 | float32 (scalar) | - | - | float32 (scalar) |
-
-You can find the defined formats in `comfy/quant_ops.py` (QUANT_ALGOS).
-
-### Quantization Metadata
-
-The metadata stored alongside the checkpoint contains:
- **format_version**: String to define a version of the standard
- **layers**: A dictionary mapping layer names to their quantization format. The format string maps to the definitions found in `QUANT_ALGOS`. 
-
-Example:
-```json
-{
-  "_quantization_metadata": {
-    "format_version": "1.0",
-    "layers": {
-      "model.layers.0.mlp.up_proj": "float8_e4m3fn",
-      "model.layers.0.mlp.down_proj": "float8_e4m3fn",
-      "model.layers.1.mlp.up_proj": "float8_e4m3fn"
-    }
-  }
-}
-```
-
-
-## Creating Quantized Checkpoints
-
-To create compatible checkpoints, use any quantization tool provided the output follows the checkpoint format described above and uses a layout defined in `QUANT_ALGOS`.
-
-### Weight Quantization
-
-Weight quantization is straightforward - compute the scaling factor directly from the weight tensor using the absolute maximum method described earlier. Each layer's weights are quantized independently and stored with their corresponding `weight_scale` parameter.
-
-### Calibration (for Activation Quantization)
-
-Activation quantization (e.g., for FP8 Tensor Core operations) requires `input_scale` parameters that cannot be determined from static weights alone. Since activation values depend on actual inputs, we use **post-training calibration (PTQ)**:
-
-1. **Collect statistics**: Run inference on N representative samples
-2. **Track activations**: Record the absolute maximum (`amax`) of inputs to each quantized layer
-3. **Compute scales**: Derive `input_scale` from collected statistics
-4. **Store in checkpoint**: Save `input_scale` parameters alongside weights
-
-The calibration dataset should be representative of your target use case. For diffusion models, this typically means a diverse set of prompts and generation parameters.
--- a/README.md
+++ b/README.md
@@ -112,11 +112,10 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git

 ## Release Process

-ComfyUI follows a weekly release cycle targeting Monday but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:
+ComfyUI follows a weekly release cycle targeting Friday but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:

 1. **[ComfyUI Core](https://github.com/comfyanonymous/ComfyUI)**
-   - Releases a new stable version (e.g., v0.7.0) roughly every week.
-   - Commits outside of the stable release tags may be very unstable and break many custom nodes.
+   - Releases a new stable version (e.g., v0.7.0)
   - Serves as the foundation for the desktop release

 2. **[ComfyUI Desktop](https://github.com/Comfy-Org/desktop)**
@@ -173,19 +172,15 @@ There is a portable standalone build for Windows that should work for running on

 ### [Direct link to download](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z)

-Simply download, extract with [7-Zip](https://7-zip.org) or with the windows explorer on recent windows versions and run. For smaller models you normally only need to put the checkpoints (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints but many of the larger models have multiple files. Make sure to follow the instructions to know which subfolder to put them in ComfyUI\models\
+Simply download, extract with [7-Zip](https://7-zip.org) and run. Make sure you put your Stable Diffusion checkpoints/models (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints

 If you have trouble extracting it, right click the file -> properties -> unblock

-Update your Nvidia drivers if it doesn't start.
-
 #### Alternative Downloads:

 [Experimental portable for AMD GPUs](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_amd.7z)

-[Portable with pytorch cuda 12.8 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu128.7z).
-
-[Portable with pytorch cuda 12.6 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu126.7z) (Supports Nvidia 10 series and older GPUs).
+[Portable with pytorch cuda 12.8 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu128.7z) (Supports Nvidia 10 series and older GPUs).

 #### How do I share models between another UI and ComfyUI?

@@ -202,12 +197,10 @@ comfy install

 ## Manual Install (Windows, Linux)

-Python 3.14 works but you may encounter issues with the torch compile node. The free threaded variant is still missing some dependencies.
+Python 3.14 will work if you comment out the `kornia` dependency in the requirements.txt file (breaks the canny node) and install pytorch nightly but it is not recommended.

 Python 3.13 is very well supported. If you have trouble with some custom node dependencies on 3.13 you can try 3.12

-### Instructions:
-
 Git clone this repo.

 Put your SD checkpoints (the huge ckpt/safetensors files) in: models/checkpoints
@@ -223,7 +216,7 @@ AMD users can install rocm and pytorch with pip if you don't have it already ins

 This is the command to install the nightly with ROCm 7.0 which might have some performance improvements:

-```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.1```
+```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.0```


 ### AMD GPUs (Experimental: Windows and Linux), RDNA 3, 3.5 and 4 only.
@@ -244,7 +237,7 @@ RDNA 4 (RX 9000 series):

 ### Intel GPUs (Windows and Linux)

-Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
+(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)

 1. To install PyTorch xpu, use the following command:

@@ -254,6 +247,10 @@ This is the command to install the Pytorch xpu nightly which might have some per

 ```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu```

+(Option 2) Alternatively, Intel GPUs supported by Intel Extension for PyTorch (IPEX) can leverage IPEX for improved performance.
+
+1. visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.
+
 ### NVIDIA

 Nvidia users should install stable pytorch using this command:
--- a/app/frontend_management.py
+++ b/app/frontend_management.py
@@ -10,8 +10,7 @@ import importlib
 from dataclasses import dataclass
 from functools import cached_property
 from pathlib import Path
-from typing import Dict, TypedDict, Optional
-from aiohttp import web
+from typing import TypedDict, Optional
 from importlib.metadata import version

 import requests
@@ -258,54 +257,7 @@ comfyui-frontend-package is not installed.
            sys.exit(-1)

    @classmethod
-    def template_asset_map(cls) -> Optional[Dict[str, str]]:
-        """Return a mapping of template asset names to their absolute paths."""
-        try:
-            from comfyui_workflow_templates import (
-                get_asset_path,
-                iter_templates,
-            )
-        except ImportError:
-            logging.error(
-                f"""
-********** ERROR ***********
-
-comfyui-workflow-templates is not installed.
-
-{frontend_install_warning_message()}
-
-********** ERROR ***********
-""".strip()
-            )
-            return None
-
-        try:
-            template_entries = list(iter_templates())
-        except Exception as exc:
-            logging.error(f"Failed to enumerate workflow templates: {exc}")
-            return None
-
-        asset_map: Dict[str, str] = {}
-        try:
-            for entry in template_entries:
-                for asset in entry.assets:
-                    asset_map[asset.filename] = get_asset_path(
-                        entry.template_id, asset.filename
-                    )
-        except Exception as exc:
-            logging.error(f"Failed to resolve template asset paths: {exc}")
-            return None
-
-        if not asset_map:
-            logging.error("No workflow template assets found. Did the packages install correctly?")
-            return None
-
-        return asset_map
-
-
-    @classmethod
-    def legacy_templates_path(cls) -> Optional[str]:
-        """Return the legacy templates directory shipped inside the meta package."""
+    def templates_path(cls) -> str:
        try:
            import comfyui_workflow_templates

@@ -324,7 +276,6 @@ comfyui-workflow-templates is not installed.
 ********** ERROR ***********
 """.strip()
            )
-            return None

    @classmethod
    def embedded_docs_path(cls) -> str:
@@ -441,17 +392,3 @@ comfyui-workflow-templates is not installed.
            logging.info("Falling back to the default frontend.")
            check_frontend_version()
            return cls.default_frontend_path()
-    @classmethod
-    def template_asset_handler(cls):
-        assets = cls.template_asset_map()
-        if not assets:
-            return None
-
-        async def serve_template(request: web.Request) -> web.StreamResponse:
-            rel_path = request.match_info.get("path", "")
-            target = assets.get(rel_path)
-            if target is None:
-                raise web.HTTPNotFound()
-            return web.FileResponse(target)
-
-        return serve_template
--- a/app/subgraph_manager.py
+++ b/app/subgraph_manager.py
@@ -1,112 +0,0 @@
-from __future__ import annotations
-
-from typing import TypedDict
-import os
-import folder_paths
-import glob
-from aiohttp import web
-import hashlib
-
-
-class Source:
-    custom_node = "custom_node"
-
-class SubgraphEntry(TypedDict):
-    source: str
-    """
-    Source of subgraph - custom_nodes vs templates.
-    """
-    path: str
-    """
-    Relative path of the subgraph file.
-    For custom nodes, will be the relative directory like <custom_node_dir>/subgraphs/<name>.json
-    """
-    name: str
-    """
-    Name of subgraph file.
-    """
-    info: CustomNodeSubgraphEntryInfo
-    """
-    Additional info about subgraph; in the case of custom_nodes, will contain nodepack name
-    """
-    data: str
-
-class CustomNodeSubgraphEntryInfo(TypedDict):
-    node_pack: str
-    """Node pack name."""
-
-class SubgraphManager:
-    def __init__(self):
-        self.cached_custom_node_subgraphs: dict[SubgraphEntry] | None = None
-
-    async def load_entry_data(self, entry: SubgraphEntry):
-        with open(entry['path'], 'r') as f:
-            entry['data'] = f.read()
-        return entry
-
-    async def sanitize_entry(self, entry: SubgraphEntry | None, remove_data=False) -> SubgraphEntry | None:
-        if entry is None:
-            return None
-        entry = entry.copy()
-        entry.pop('path', None)
-        if remove_data:
-            entry.pop('data', None)
-        return entry
-
-    async def sanitize_entries(self, entries: dict[str, SubgraphEntry], remove_data=False) -> dict[str, SubgraphEntry]:
-        entries = entries.copy()
-        for key in list(entries.keys()):
-            entries[key] = await self.sanitize_entry(entries[key], remove_data)
-        return entries
-
-    async def get_custom_node_subgraphs(self, loadedModules, force_reload=False):
-        # if not forced to reload and cached, return cache
-        if not force_reload and self.cached_custom_node_subgraphs is not None:
-            return self.cached_custom_node_subgraphs
-        # Load subgraphs from custom nodes
-        subfolder = "subgraphs"
-        subgraphs_dict: dict[SubgraphEntry] = {}
-
-        for folder in folder_paths.get_folder_paths("custom_nodes"):
-            pattern = os.path.join(folder, f"*/{subfolder}/*.json")
-            matched_files = glob.glob(pattern)
-            for file in matched_files:
-                # replace backslashes with forward slashes
-                file = file.replace('\\', '/')
-                info: CustomNodeSubgraphEntryInfo = {
-                    "node_pack": "custom_nodes." + file.split('/')[-3]
-                }
-                source = Source.custom_node
-                # hash source + path to make sure id will be as unique as possible, but
-                # reproducible across backend reloads
-                id = hashlib.sha256(f"{source}{file}".encode()).hexdigest()
-                entry: SubgraphEntry = {
-                    "source": Source.custom_node,
-                    "name": os.path.splitext(os.path.basename(file))[0],
-                    "path": file,
-                    "info": info,
-                }
-                subgraphs_dict[id] = entry
-        self.cached_custom_node_subgraphs = subgraphs_dict
-        return subgraphs_dict
-
-    async def get_custom_node_subgraph(self, id: str, loadedModules):
-        subgraphs = await self.get_custom_node_subgraphs(loadedModules)
-        entry: SubgraphEntry = subgraphs.get(id, None)
-        if entry is not None and entry.get('data', None) is None:
-            await self.load_entry_data(entry)
-        return entry
-
-    def add_routes(self, routes, loadedModules):
-        @routes.get("/global_subgraphs")
-        async def get_global_subgraphs(request):
-            subgraphs_dict = await self.get_custom_node_subgraphs(loadedModules)
-            # NOTE: we may want to include other sources of global subgraphs such as templates in the future;
-            # that's the reasoning for the current implementation
-            return web.json_response(await self.sanitize_entries(subgraphs_dict, remove_data=True))
-
-        @routes.get("/global_subgraphs/{id}")
-        async def get_global_subgraph(request):
-            id = request.match_info.get("id", None)
-            subgraph = await self.get_custom_node_subgraph(id, loadedModules)
-            return web.json_response(await self.sanitize_entry(subgraph))
--- a/comfy/cli_args.py
+++ b/comfy/cli_args.py
@@ -49,7 +49,7 @@ parser.add_argument("--temp-directory", type=str, default=None, help="Set the Co
 parser.add_argument("--input-directory", type=str, default=None, help="Set the ComfyUI input directory. Overrides --base-directory.")
 parser.add_argument("--auto-launch", action="store_true", help="Automatically launch ComfyUI in the default browser.")
 parser.add_argument("--disable-auto-launch", action="store_true", help="Disable auto launching the browser.")
-parser.add_argument("--cuda-device", type=int, default=None, metavar="DEVICE_ID", help="Set the id of the cuda device this instance will use. All other devices will not be visible.")
+parser.add_argument("--cuda-device", type=str, default=None, metavar="DEVICE_ID", help="Set the ids of cuda devices this instance will use. All other devices will not be visible.")
 parser.add_argument("--default-device", type=int, default=None, metavar="DEFAULT_DEVICE_ID", help="Set the id of the default device, all other devices will stay visible.")
 cm_group = parser.add_mutually_exclusive_group()
 cm_group.add_argument("--cuda-malloc", action="store_true", help="Enable cudaMallocAsync (enabled by default for torch 2.0 and up).")
@@ -105,7 +105,6 @@ cache_group = parser.add_mutually_exclusive_group()
 cache_group.add_argument("--cache-classic", action="store_true", help="Use the old style (aggressive) caching.")
 cache_group.add_argument("--cache-lru", type=int, default=0, help="Use LRU caching with a maximum of N node results cached. May use more RAM/VRAM.")
 cache_group.add_argument("--cache-none", action="store_true", help="Reduced RAM/VRAM usage at the expense of executing every node for each run.")
-cache_group.add_argument("--cache-ram", nargs='?', const=4.0, type=float, default=0, help="Use RAM pressure caching with the specified headroom threshold. If available RAM drops below the threhold the cache remove large items to free RAM. Default 4GB")

 attn_group = parser.add_mutually_exclusive_group()
 attn_group.add_argument("--use-split-cross-attention", action="store_true", help="Use the split cross attention optimization. Ignored when xformers is used.")
@@ -146,9 +145,7 @@ class PerformanceFeature(enum.Enum):
    CublasOps = "cublas_ops"
    AutoTune = "autotune"

-parser.add_argument("--fast", nargs="*", type=PerformanceFeature, help="Enable some untested and potentially quality deteriorating optimizations. This is used to test new features so using it might crash your comfyui. --fast with no arguments enables everything. You can pass a list specific optimizations if you only want to enable specific ones. Current valid optimizations: {}".format(" ".join(map(lambda c: c.value, PerformanceFeature))))
-
-parser.add_argument("--disable-pinned-memory", action="store_true", help="Disable pinned memory use.")
+parser.add_argument("--fast", nargs="*", type=PerformanceFeature, help="Enable some untested and potentially quality deteriorating optimizations. --fast with no arguments enables everything. You can pass a list specific optimizations if you only want to enable specific ones. Current valid optimizations: {}".format(" ".join(map(lambda c: c.value, PerformanceFeature))))

 parser.add_argument("--mmap-torch-files", action="store_true", help="Use mmap when loading ckpt/pt files.")
 parser.add_argument("--disable-mmap", action="store_true", help="Don't use mmap when loading safetensors.")
--- a/comfy/controlnet.py
+++ b/comfy/controlnet.py
@@ -15,13 +15,14 @@
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <https://www.gnu.org/licenses/>.
 """
-
+from __future__ import annotations

 import torch
 from enum import Enum
 import math
 import os
 import logging
+import copy
 import comfy.utils
 import comfy.model_management
 import comfy.model_detection
@@ -38,7 +39,7 @@ import comfy.ldm.hydit.controlnet
 import comfy.ldm.flux.controlnet
 import comfy.ldm.qwen_image.controlnet
 import comfy.cldm.dit_embedder
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Union
 if TYPE_CHECKING:
    from comfy.hooks import HookGroup

@@ -64,6 +65,18 @@ class StrengthType(Enum):
    CONSTANT = 1
    LINEAR_UP = 2

+class ControlIsolation:
+    '''Temporarily set a ControlBase object's previous_controlnet to None to prevent cascading calls.'''
+    def __init__(self, control: ControlBase):
+        self.control = control
+        self.orig_previous_controlnet = control.previous_controlnet
+
+    def __enter__(self):
+        self.control.previous_controlnet = None
+
+    def __exit__(self, *args):
+        self.control.previous_controlnet = self.orig_previous_controlnet
+
 class ControlBase:
    def __init__(self):
        self.cond_hint_original = None
@@ -77,7 +90,7 @@ class ControlBase:
        self.compression_ratio = 8
        self.upscale_algorithm = 'nearest-exact'
        self.extra_args = {}
-        self.previous_controlnet = None
+        self.previous_controlnet: Union[ControlBase, None] = None
        self.extra_conds = []
        self.strength_type = StrengthType.CONSTANT
        self.concat_mask = False
@@ -85,6 +98,7 @@ class ControlBase:
        self.extra_concat = None
        self.extra_hooks: HookGroup = None
        self.preprocess_image = lambda a: a
+        self.multigpu_clones: dict[torch.device, ControlBase] = {}

    def set_cond_hint(self, cond_hint, strength=1.0, timestep_percent_range=(0.0, 1.0), vae=None, extra_concat=[]):
        self.cond_hint_original = cond_hint
@@ -111,17 +125,38 @@ class ControlBase:
    def cleanup(self):
        if self.previous_controlnet is not None:
            self.previous_controlnet.cleanup()
-
+        for device_cnet in self.multigpu_clones.values():
+            with ControlIsolation(device_cnet):
+                device_cnet.cleanup()
        self.cond_hint = None
        self.extra_concat = None
        self.timestep_range = None

    def get_models(self):
        out = []
+        for device_cnet in self.multigpu_clones.values():
+            out += device_cnet.get_models_only_self()
        if self.previous_controlnet is not None:
            out += self.previous_controlnet.get_models()
        return out

+    def get_models_only_self(self):
+        'Calls get_models, but temporarily sets previous_controlnet to None.'
+        with ControlIsolation(self):
+            return self.get_models()
+
+    def get_instance_for_device(self, device):
+        'Returns instance of this Control object intended for selected device.'
+        return self.multigpu_clones.get(device, self)
+
+    def deepclone_multigpu(self, load_device, autoregister=False):
+        '''
+        Create a deep clone of this Control object whose model(s) target the given load_device.
+
+        When autoregister is set to True, the deep clone is also added to multigpu_clones dict.
+        '''
+        raise NotImplementedError("Classes inheriting from ControlBase should define their own deepclone_multigpu function.")
+
    def get_extra_hooks(self):
        out = []
        if self.extra_hooks is not None:
@@ -130,7 +165,7 @@ class ControlBase:
            out += self.previous_controlnet.get_extra_hooks()
        return out

-    def copy_to(self, c):
+    def copy_to(self, c: ControlBase):
        c.cond_hint_original = self.cond_hint_original
        c.strength = self.strength
        c.timestep_percent_range = self.timestep_percent_range
@@ -284,6 +319,14 @@ class ControlNet(ControlBase):
        self.copy_to(c)
        return c

+    def deepclone_multigpu(self, load_device, autoregister=False):
+        c = self.copy()
+        c.control_model = copy.deepcopy(c.control_model)
+        c.control_model_wrapped = comfy.model_patcher.ModelPatcher(c.control_model, load_device=load_device, offload_device=comfy.model_management.unet_offload_device())
+        if autoregister:
+            self.multigpu_clones[load_device] = c
+        return c
+
    def get_models(self):
        out = super().get_models()
        out.append(self.control_model_wrapped)
@@ -310,13 +353,11 @@ class ControlLoraOps:
            self.bias = None

        def forward(self, input):
-            weight, bias, offload_stream = comfy.ops.cast_bias_weight(self, input, offloadable=True)
+            weight, bias = comfy.ops.cast_bias_weight(self, input)
            if self.up is not None:
-                x = torch.nn.functional.linear(input, weight + (torch.mm(self.up.flatten(start_dim=1), self.down.flatten(start_dim=1))).reshape(self.weight.shape).type(input.dtype), bias)
+                return torch.nn.functional.linear(input, weight + (torch.mm(self.up.flatten(start_dim=1), self.down.flatten(start_dim=1))).reshape(self.weight.shape).type(input.dtype), bias)
            else:
-                x = torch.nn.functional.linear(input, weight, bias)
-            comfy.ops.uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+                return torch.nn.functional.linear(input, weight, bias)

    class Conv2d(torch.nn.Module, comfy.ops.CastWeightBiasOp):
        def __init__(
@@ -352,13 +393,12 @@ class ControlLoraOps:


        def forward(self, input):
-            weight, bias, offload_stream = comfy.ops.cast_bias_weight(self, input, offloadable=True)
+            weight, bias = comfy.ops.cast_bias_weight(self, input)
            if self.up is not None:
-                x = torch.nn.functional.conv2d(input, weight + (torch.mm(self.up.flatten(start_dim=1), self.down.flatten(start_dim=1))).reshape(self.weight.shape).type(input.dtype), bias, self.stride, self.padding, self.dilation, self.groups)
+                return torch.nn.functional.conv2d(input, weight + (torch.mm(self.up.flatten(start_dim=1), self.down.flatten(start_dim=1))).reshape(self.weight.shape).type(input.dtype), bias, self.stride, self.padding, self.dilation, self.groups)
            else:
-                x = torch.nn.functional.conv2d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups)
-            comfy.ops.uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+                return torch.nn.functional.conv2d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups)
+

 class ControlLora(ControlNet):
    def __init__(self, control_weights, global_average_pooling=False, model_options={}): #TODO? model_options
@@ -832,6 +872,14 @@ class T2IAdapter(ControlBase):
        self.copy_to(c)
        return c

+    def deepclone_multigpu(self, load_device, autoregister=False):
+        c = self.copy()
+        c.t2i_model = copy.deepcopy(c.t2i_model)
+        c.device = load_device
+        if autoregister:
+            self.multigpu_clones[load_device] = c
+        return c
+
 def load_t2i_adapter(t2i_data, model_options={}): #TODO: model_options
    compression_ratio = 8
    upscale_algorithm = 'nearest-exact'
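
Reviewer note on the controlnet.py hunks above: the sketch below illustrates why the per-device clones are wrapped in ControlIsolation when their models are collected. It is a minimal stand-in and not the ComfyUI classes: `Node`, `Isolation`, the string "models", and the device keys are all hypothetical, and it assumes a clone keeps the same previous-controlnet link as its source.

```python
class Isolation:
    """Hypothetical analogue of ControlIsolation: hide the chain link temporarily."""
    def __init__(self, node):
        self.node = node
        self.saved = node.previous

    def __enter__(self):
        self.node.previous = None

    def __exit__(self, *args):
        # Runs even when the with-body returns early, so the link is always restored.
        self.node.previous = self.saved


class Node:
    """Hypothetical stand-in for a control object with a chain and device clones."""
    def __init__(self, name, previous=None):
        self.name = name
        self.previous = previous
        self.clones = {}  # device key -> Node

    def get_models(self):
        out = []
        for clone in self.clones.values():
            # Only the clone's own model; the shared chain is collected once below.
            out += clone.get_models_only_self()
        if self.previous is not None:
            out += self.previous.get_models()
        out.append(self.name)
        return out

    def get_models_only_self(self):
        with Isolation(self):
            return self.get_models()

    def get_instance_for_device(self, device):
        # Fall back to this object when no clone is registered for the device.
        return self.clones.get(device, self)


base = Node("base")
top = Node("top", previous=base)
clone = Node("top@cuda:1", previous=base)  # assumed to share the chain
top.clones["cuda:1"] = clone

models = top.get_models()
print(models)  # ['top@cuda:1', 'base', 'top']
```

Without the isolation step, the clone would walk the chain as well and "base" would be listed twice.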
--- a/comfy/latent_formats.py
+++ b/comfy/latent_formats.py
@@ -611,66 +611,6 @@ class HunyuanImage21Refiner(LatentFormat):
    latent_dimensions = 3
    scale_factor = 1.03682

-    def process_in(self, latent):
-        out = latent * self.scale_factor
-        out = torch.cat((out[:, :, :1], out), dim=2)
-        out = out.permute(0, 2, 1, 3, 4)
-        b, f_times_2, c, h, w = out.shape
-        out = out.reshape(b, f_times_2 // 2, 2 * c, h, w)
-        out = out.permute(0, 2, 1, 3, 4).contiguous()
-        return out
-
-    def process_out(self, latent):
-        z = latent / self.scale_factor
-        z = z.permute(0, 2, 1, 3, 4)
-        b, f, c, h, w = z.shape
-        z = z.reshape(b, f, 2, c // 2, h, w)
-        z = z.permute(0, 1, 2, 3, 4, 5).reshape(b, f * 2, c // 2, h, w)
-        z = z.permute(0, 2, 1, 3, 4)
-        z = z[:, :, 1:]
-        return z
-
-class HunyuanVideo15(LatentFormat):
-    latent_rgb_factors = [
-        [ 0.0568, -0.0521, -0.0131],
-        [ 0.0014,  0.0735,  0.0326],
-        [ 0.0186,  0.0531, -0.0138],
-        [-0.0031,  0.0051,  0.0288],
-        [ 0.0110,  0.0556,  0.0432],
-        [-0.0041, -0.0023, -0.0485],
-        [ 0.0530,  0.0413,  0.0253],
-        [ 0.0283,  0.0251,  0.0339],
-        [ 0.0277, -0.0372, -0.0093],
-        [ 0.0393,  0.0944,  0.1131],
-        [ 0.0020,  0.0251,  0.0037],
-        [-0.0017,  0.0012,  0.0234],
-        [ 0.0468,  0.0436,  0.0203],
-        [ 0.0354,  0.0439, -0.0233],
-        [ 0.0090,  0.0123,  0.0346],
-        [ 0.0382,  0.0029,  0.0217],
-        [ 0.0261, -0.0300,  0.0030],
-        [-0.0088, -0.0220, -0.0283],
-        [-0.0272, -0.0121, -0.0363],
-        [-0.0664, -0.0622,  0.0144],
-        [ 0.0414,  0.0479,  0.0529],
-        [ 0.0355,  0.0612, -0.0247],
-        [ 0.0147,  0.0264,  0.0174],
-        [ 0.0438,  0.0038,  0.0542],
-        [ 0.0431, -0.0573, -0.0033],
-        [-0.0162, -0.0211, -0.0406],
-        [-0.0487, -0.0295, -0.0393],
-        [ 0.0005, -0.0109,  0.0253],
-        [ 0.0296,  0.0591,  0.0353],
-        [ 0.0119,  0.0181, -0.0306],
-        [-0.0085, -0.0362,  0.0229],
-        [ 0.0005, -0.0106,  0.0242]
-    ]
-
-    latent_rgb_factors_bias = [ 0.0456, -0.0202, -0.0644]
-    latent_channels = 32
-    latent_dimensions = 3
-    scale_factor = 1.03682
-
 class Hunyuan3Dv2(LatentFormat):
    latent_channels = 64
    latent_dimensions = 1
--- a/comfy/ldm/chroma/layers.py
+++ b/comfy/ldm/chroma/layers.py
@@ -1,15 +1,15 @@
 import torch
 from torch import Tensor, nn

+from comfy.ldm.flux.math import attention
 from comfy.ldm.flux.layers import (
    MLPEmbedder,
    RMSNorm,
+    QKNorm,
+    SelfAttention,
    ModulationOut,
 )

-# TODO: remove this in a few months
-SingleStreamBlock = None
-DoubleStreamBlock = None


 class ChromaModulationOut(ModulationOut):
@@ -48,6 +48,124 @@ class Approximator(nn.Module):
        return x


+class DoubleStreamBlock(nn.Module):
+    def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, flipped_img_txt=False, dtype=None, device=None, operations=None):
+        super().__init__()
+
+        mlp_hidden_dim = int(hidden_size * mlp_ratio)
+        self.num_heads = num_heads
+        self.hidden_size = hidden_size
+        self.img_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
+        self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
+
+        self.img_norm2 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
+        self.img_mlp = nn.Sequential(
+            operations.Linear(hidden_size, mlp_hidden_dim, bias=True, dtype=dtype, device=device),
+            nn.GELU(approximate="tanh"),
+            operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
+        )
+
+        self.txt_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
+        self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
+
+        self.txt_norm2 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
+        self.txt_mlp = nn.Sequential(
+            operations.Linear(hidden_size, mlp_hidden_dim, bias=True, dtype=dtype, device=device),
+            nn.GELU(approximate="tanh"),
+            operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
+        )
+        self.flipped_img_txt = flipped_img_txt
+
+    def forward(self, img: Tensor, txt: Tensor, pe: Tensor, vec: Tensor, attn_mask=None, transformer_options={}):
+        (img_mod1, img_mod2), (txt_mod1, txt_mod2) = vec
+
+        # prepare image for attention
+        img_modulated = torch.addcmul(img_mod1.shift, 1 + img_mod1.scale, self.img_norm1(img))
+        img_qkv = self.img_attn.qkv(img_modulated)
+        img_q, img_k, img_v = img_qkv.view(img_qkv.shape[0], img_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
+        img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
+
+        # prepare txt for attention
+        txt_modulated = torch.addcmul(txt_mod1.shift, 1 + txt_mod1.scale, self.txt_norm1(txt))
+        txt_qkv = self.txt_attn.qkv(txt_modulated)
+        txt_q, txt_k, txt_v = txt_qkv.view(txt_qkv.shape[0], txt_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
+        txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
+
+        # run actual attention
+        attn = attention(torch.cat((txt_q, img_q), dim=2),
+                         torch.cat((txt_k, img_k), dim=2),
+                         torch.cat((txt_v, img_v), dim=2),
+                         pe=pe, mask=attn_mask, transformer_options=transformer_options)
+
+        txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
+
+        # calculate the img blocks
+        img.addcmul_(img_mod1.gate, self.img_attn.proj(img_attn))
+        img.addcmul_(img_mod2.gate, self.img_mlp(torch.addcmul(img_mod2.shift, 1 + img_mod2.scale, self.img_norm2(img))))
+
+        # calculate the txt blocks
+        txt.addcmul_(txt_mod1.gate, self.txt_attn.proj(txt_attn))
+        txt.addcmul_(txt_mod2.gate, self.txt_mlp(torch.addcmul(txt_mod2.shift, 1 + txt_mod2.scale, self.txt_norm2(txt))))
+
+        if txt.dtype == torch.float16:
+            txt = torch.nan_to_num(txt, nan=0.0, posinf=65504, neginf=-65504)
+
+        return img, txt
+
+
+class SingleStreamBlock(nn.Module):
+    """
+    A DiT block with parallel linear layers as described in
+    https://arxiv.org/abs/2302.05442 and adapted modulation interface.
+    """
+
+    def __init__(
+        self,
+        hidden_size: int,
+        num_heads: int,
+        mlp_ratio: float = 4.0,
+        qk_scale: float = None,
+        dtype=None,
+        device=None,
+        operations=None
+    ):
+        super().__init__()
+        self.hidden_dim = hidden_size
+        self.num_heads = num_heads
+        head_dim = hidden_size // num_heads
+        self.scale = qk_scale or head_dim**-0.5
+
+        self.mlp_hidden_dim = int(hidden_size * mlp_ratio)
+        # qkv and mlp_in
+        self.linear1 = operations.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim, dtype=dtype, device=device)
+        # proj and mlp_out
+        self.linear2 = operations.Linear(hidden_size + self.mlp_hidden_dim, hidden_size, dtype=dtype, device=device)
+
+        self.norm = QKNorm(head_dim, dtype=dtype, device=device, operations=operations)
+
+        self.hidden_size = hidden_size
+        self.pre_norm = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
+
+        self.mlp_act = nn.GELU(approximate="tanh")
+
+    def forward(self, x: Tensor, pe: Tensor, vec: Tensor, attn_mask=None, transformer_options={}) -> Tensor:
+        mod = vec
+        x_mod = torch.addcmul(mod.shift, 1 + mod.scale, self.pre_norm(x))
+        qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
+
+        q, k, v = qkv.view(qkv.shape[0], qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
+        q, k = self.norm(q, k, v)
+
+        # compute attention
+        attn = attention(q, k, v, pe=pe, mask=attn_mask, transformer_options=transformer_options)
+        # compute activation in mlp stream, cat again and run second linear layer
+        output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
+        x.addcmul_(mod.gate, output)
+        if x.dtype == torch.float16:
+            x = torch.nan_to_num(x, nan=0.0, posinf=65504, neginf=-65504)
+        return x
+
+
 class LastLayer(nn.Module):
    def __init__(self, hidden_size: int, patch_size: int, out_channels: int, dtype=None, device=None, operations=None):
        super().__init__()
--- a/comfy/ldm/chroma/model.py
+++ b/comfy/ldm/chroma/model.py
@@ -11,12 +11,12 @@ import comfy.ldm.common_dit
 from comfy.ldm.flux.layers import (
    EmbedND,
    timestep_embedding,
-    DoubleStreamBlock,
-    SingleStreamBlock,
 )

 from .layers import (
+    DoubleStreamBlock,
    LastLayer,
+    SingleStreamBlock,
    Approximator,
    ChromaModulationOut,
 )
@@ -90,7 +90,6 @@ class Chroma(nn.Module):
                    self.num_heads,
                    mlp_ratio=params.mlp_ratio,
                    qkv_bias=params.qkv_bias,
-                    modulation=False,
                    dtype=dtype, device=device, operations=operations
                )
                for _ in range(params.depth)
@@ -99,7 +98,7 @@ class Chroma(nn.Module):

        self.single_blocks = nn.ModuleList(
            [
-                SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio, modulation=False, dtype=dtype, device=device, operations=operations)
+                SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio, dtype=dtype, device=device, operations=operations)
                for _ in range(params.depth_single_blocks)
            ]
        )
--- a/comfy/ldm/chroma_radiance/model.py
+++ b/comfy/ldm/chroma_radiance/model.py
@@ -10,10 +10,12 @@ from torch import Tensor, nn
 from einops import repeat
 import comfy.ldm.common_dit

-from comfy.ldm.flux.layers import EmbedND, DoubleStreamBlock, SingleStreamBlock
+from comfy.ldm.flux.layers import EmbedND

 from comfy.ldm.chroma.model import Chroma, ChromaParams
 from comfy.ldm.chroma.layers import (
+    DoubleStreamBlock,
+    SingleStreamBlock,
    Approximator,
 )
 from .layers import (
@@ -87,6 +89,7 @@ class ChromaRadiance(Chroma):
                    dtype=dtype, device=device, operations=operations
                )

+
        self.double_blocks = nn.ModuleList(
            [
                DoubleStreamBlock(
@@ -94,7 +97,6 @@ class ChromaRadiance(Chroma):
                    self.num_heads,
                    mlp_ratio=params.mlp_ratio,
                    qkv_bias=params.qkv_bias,
-                    modulation=False,
                    dtype=dtype, device=device, operations=operations
                )
                for _ in range(params.depth)
@@ -107,7 +109,6 @@ class ChromaRadiance(Chroma):
                    self.hidden_size,
                    self.num_heads,
                    mlp_ratio=params.mlp_ratio,
-                    modulation=False,
                    dtype=dtype, device=device, operations=operations,
                )
                for _ in range(params.depth_single_blocks)
@@ -188,15 +189,15 @@ class ChromaRadiance(Chroma):
        nerf_pixels = nn.functional.unfold(img_orig, kernel_size=patch_size, stride=patch_size)
        nerf_pixels = nerf_pixels.transpose(1, 2) # -> [B, NumPatches, C * P * P]

-        # Reshape for per-patch processing
-        nerf_hidden = img_out.reshape(B * num_patches, params.hidden_size)
-        nerf_pixels = nerf_pixels.reshape(B * num_patches, C, patch_size**2).transpose(1, 2)
-
        if params.nerf_tile_size > 0 and num_patches > params.nerf_tile_size:
            # Enable tiling if nerf_tile_size isn't 0 and we actually have more patches than
            # the tile size.
-            img_dct = self.forward_tiled_nerf(nerf_hidden, nerf_pixels, B, C, num_patches, patch_size, params)
+            img_dct = self.forward_tiled_nerf(img_out, nerf_pixels, B, C, num_patches, patch_size, params)
        else:
+            # Reshape for per-patch processing
+            nerf_hidden = img_out.reshape(B * num_patches, params.hidden_size)
+            nerf_pixels = nerf_pixels.reshape(B * num_patches, C, patch_size**2).transpose(1, 2)
+
            # Get DCT-encoded pixel embeddings [pixel-dct]
            img_dct = self.nerf_image_embedder(nerf_pixels)

@@ -239,8 +240,17 @@ class ChromaRadiance(Chroma):
            end = min(i + tile_size, num_patches)

            # Slice the current tile from the input tensors
-            nerf_hidden_tile = nerf_hidden[i * batch:end * batch]
-            nerf_pixels_tile = nerf_pixels[i * batch:end * batch]
+            nerf_hidden_tile = nerf_hidden[:, i:end, :]
+            nerf_pixels_tile = nerf_pixels[:, i:end, :]
+
+            # Get the actual number of patches in this tile (can be smaller for the last tile)
+            num_patches_tile = nerf_hidden_tile.shape[1]
+
+            # Reshape the tile for per-patch processing
+            # [B, NumPatches_tile, D] -> [B * NumPatches_tile, D]
+            nerf_hidden_tile = nerf_hidden_tile.reshape(batch * num_patches_tile, params.hidden_size)
+            # [B, NumPatches_tile, C*P*P] -> [B*NumPatches_tile, C, P*P] -> [B*NumPatches_tile, P*P, C]
+            nerf_pixels_tile = nerf_pixels_tile.reshape(batch * num_patches_tile, channels, patch_size**2).transpose(1, 2)

            # get DCT-encoded pixel embeddings [pixel-dct]
            img_dct_tile = self.nerf_image_embedder(nerf_pixels_tile)
--- a/comfy/ldm/flux/layers.py
+++ b/comfy/ldm/flux/layers.py
@@ -130,17 +130,13 @@ def apply_mod(tensor, m_mult, m_add=None, modulation_dims=None):


 class DoubleStreamBlock(nn.Module):
-    def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, flipped_img_txt=False, modulation=True, dtype=None, device=None, operations=None):
+    def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, flipped_img_txt=False, dtype=None, device=None, operations=None):
        super().__init__()

        mlp_hidden_dim = int(hidden_size * mlp_ratio)
        self.num_heads = num_heads
        self.hidden_size = hidden_size
-        self.modulation = modulation
-
-        if self.modulation:
-            self.img_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
-
+        self.img_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
        self.img_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
        self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)

@@ -151,9 +147,7 @@ class DoubleStreamBlock(nn.Module):
            operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
        )

-        if self.modulation:
-            self.txt_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
-
+        self.txt_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
        self.txt_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
        self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)

@@ -166,65 +160,46 @@ class DoubleStreamBlock(nn.Module):
        self.flipped_img_txt = flipped_img_txt

    def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor, attn_mask=None, modulation_dims_img=None, modulation_dims_txt=None, transformer_options={}):
-        if self.modulation:
-            img_mod1, img_mod2 = self.img_mod(vec)
-            txt_mod1, txt_mod2 = self.txt_mod(vec)
-        else:
-            (img_mod1, img_mod2), (txt_mod1, txt_mod2) = vec
+        img_mod1, img_mod2 = self.img_mod(vec)
+        txt_mod1, txt_mod2 = self.txt_mod(vec)

        # prepare image for attention
        img_modulated = self.img_norm1(img)
        img_modulated = apply_mod(img_modulated, (1 + img_mod1.scale), img_mod1.shift, modulation_dims_img)
        img_qkv = self.img_attn.qkv(img_modulated)
-        del img_modulated
        img_q, img_k, img_v = img_qkv.view(img_qkv.shape[0], img_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
-        del img_qkv
        img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)

        # prepare txt for attention
        txt_modulated = self.txt_norm1(txt)
        txt_modulated = apply_mod(txt_modulated, (1 + txt_mod1.scale), txt_mod1.shift, modulation_dims_txt)
        txt_qkv = self.txt_attn.qkv(txt_modulated)
-        del txt_modulated
        txt_q, txt_k, txt_v = txt_qkv.view(txt_qkv.shape[0], txt_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
-        del txt_qkv
        txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)

        if self.flipped_img_txt:
-            q = torch.cat((img_q, txt_q), dim=2)
-            del img_q, txt_q
-            k = torch.cat((img_k, txt_k), dim=2)
-            del img_k, txt_k
-            v = torch.cat((img_v, txt_v), dim=2)
-            del img_v, txt_v
            # run actual attention
-            attn = attention(q, k, v,
+            attn = attention(torch.cat((img_q, txt_q), dim=2),
+                             torch.cat((img_k, txt_k), dim=2),
+                             torch.cat((img_v, txt_v), dim=2),
                             pe=pe, mask=attn_mask, transformer_options=transformer_options)
-            del q, k, v

            img_attn, txt_attn = attn[:, : img.shape[1]], attn[:, img.shape[1]:]
        else:
-            q = torch.cat((txt_q, img_q), dim=2)
-            del txt_q, img_q
-            k = torch.cat((txt_k, img_k), dim=2)
-            del txt_k, img_k
-            v = torch.cat((txt_v, img_v), dim=2)
-            del txt_v, img_v
            # run actual attention
-            attn = attention(q, k, v,
+            attn = attention(torch.cat((txt_q, img_q), dim=2),
+                             torch.cat((txt_k, img_k), dim=2),
+                             torch.cat((txt_v, img_v), dim=2),
                             pe=pe, mask=attn_mask, transformer_options=transformer_options)
-            del q, k, v

            txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1]:]

        # calculate the img bloks
-        img += apply_mod(self.img_attn.proj(img_attn), img_mod1.gate, None, modulation_dims_img)
-        del img_attn
-        img += apply_mod(self.img_mlp(apply_mod(self.img_norm2(img), (1 + img_mod2.scale), img_mod2.shift, modulation_dims_img)), img_mod2.gate, None, modulation_dims_img)
+        img = img + apply_mod(self.img_attn.proj(img_attn), img_mod1.gate, None, modulation_dims_img)
+        img = img + apply_mod(self.img_mlp(apply_mod(self.img_norm2(img), (1 + img_mod2.scale), img_mod2.shift, modulation_dims_img)), img_mod2.gate, None, modulation_dims_img)

        # calculate the txt bloks
        txt += apply_mod(self.txt_attn.proj(txt_attn), txt_mod1.gate, None, modulation_dims_txt)
-        del txt_attn
        txt += apply_mod(self.txt_mlp(apply_mod(self.txt_norm2(txt), (1 + txt_mod2.scale), txt_mod2.shift, modulation_dims_txt)), txt_mod2.gate, None, modulation_dims_txt)

        if txt.dtype == torch.float16:
@@ -245,7 +220,6 @@ class SingleStreamBlock(nn.Module):
        num_heads: int,
        mlp_ratio: float = 4.0,
        qk_scale: float = None,
-        modulation=True,
        dtype=None,
        device=None,
        operations=None
@@ -268,29 +242,19 @@ class SingleStreamBlock(nn.Module):
        self.pre_norm = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)

        self.mlp_act = nn.GELU(approximate="tanh")
-        if modulation:
-            self.modulation = Modulation(hidden_size, double=False, dtype=dtype, device=device, operations=operations)
-        else:
-            self.modulation = None
+        self.modulation = Modulation(hidden_size, double=False, dtype=dtype, device=device, operations=operations)

    def forward(self, x: Tensor, vec: Tensor, pe: Tensor, attn_mask=None, modulation_dims=None, transformer_options={}) -> Tensor:
-        if self.modulation:
-            mod, _ = self.modulation(vec)
-        else:
-            mod = vec
-
+        mod, _ = self.modulation(vec)
        qkv, mlp = torch.split(self.linear1(apply_mod(self.pre_norm(x), (1 + mod.scale), mod.shift, modulation_dims)), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)

        q, k, v = qkv.view(qkv.shape[0], qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
-        del qkv
        q, k = self.norm(q, k, v)

        # compute attention
        attn = attention(q, k, v, pe=pe, mask=attn_mask, transformer_options=transformer_options)
-        del q, k, v
        # compute activation in mlp stream, cat again and run second linear layer
-        mlp = self.mlp_act(mlp)
-        output = self.linear2(torch.cat((attn, mlp), 2))
+        output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
        x += apply_mod(output, mod.gate, None, modulation_dims)
        if x.dtype == torch.float16:
            x = torch.nan_to_num(x, nan=0.0, posinf=65504, neginf=-65504)
--- a/comfy/ldm/flux/math.py
+++ b/comfy/ldm/flux/math.py
@@ -7,8 +7,15 @@ import comfy.model_management


 def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor, mask=None, transformer_options={}) -> Tensor:
+    q_shape = q.shape
+    k_shape = k.shape
+
    if pe is not None:
-        q, k = apply_rope(q, k, pe)
+        q = q.to(dtype=pe.dtype).reshape(*q.shape[:-1], -1, 1, 2)
+        k = k.to(dtype=pe.dtype).reshape(*k.shape[:-1], -1, 1, 2)
+        q = (pe[..., 0] * q[..., 0] + pe[..., 1] * q[..., 1]).reshape(*q_shape).type_as(v)
+        k = (pe[..., 0] * k[..., 0] + pe[..., 1] * k[..., 1]).reshape(*k_shape).type_as(v)
+
    heads = q.shape[1]
    x = optimized_attention(q, k, v, heads, skip_reshape=True, mask=mask, transformer_options=transformer_options)
    return x
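
Reviewer note on the math.py hunk above: the inlined expression `pe[..., 0] * q[..., 0] + pe[..., 1] * q[..., 1]` is a per-pair 2x2 matrix-vector product written as a weighted sum of the matrix columns. The pure-Python sketch below shows the same arithmetic on a flat list. It assumes each 2x2 block is a rotation matrix [[cos, -sin], [sin, cos]], which the hunk itself does not show; `rotate_pairs` and its arguments are hypothetical names.

```python
import math


def rotate_pairs(vec, angles):
    """Rotate consecutive (even, odd) pairs of vec by the matching angle."""
    assert len(vec) == 2 * len(angles)
    out = []
    for i, theta in enumerate(angles):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        col0 = (c, s)    # plays the role of pe[..., 0] for this pair
        col1 = (-s, c)   # plays the role of pe[..., 1] for this pair
        # column 0 weighted by the even element plus column 1 weighted by the odd one
        out.append(col0[0] * x0 + col1[0] * x1)
        out.append(col0[1] * x0 + col1[1] * x1)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, -0.5, 0.25]
k = [0.3, -1.0, 2.0, 1.5]

# Shifting both positions by the same offset leaves the q.k score unchanged,
# which is the property rotary embeddings rely on.
score_a = dot(rotate_pairs(q, [0.1, 0.2]), rotate_pairs(k, [0.4, 0.8]))
score_b = dot(rotate_pairs(q, [1.1, 2.2]), rotate_pairs(k, [1.4, 2.8]))
print(abs(score_a - score_b) < 1e-9)
```

The real code does this for every head and token at once through broadcasting, and casts to `pe.dtype` before the multiply and back to the dtype of `v` afterwards.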
--- a/comfy/ldm/flux/model.py
+++ b/comfy/ldm/flux/model.py
@@ -210,7 +210,7 @@ class Flux(nn.Module):
        img = self.final_layer(img, vec)  # (N, T, patch_size ** 2 * out_channels)
        return img

-    def process_img(self, x, index=0, h_offset=0, w_offset=0, transformer_options={}):
+    def process_img(self, x, index=0, h_offset=0, w_offset=0):
        bs, c, h, w = x.shape
        patch_size = self.patch_size
        x = comfy.ldm.common_dit.pad_to_patch_size(x, (patch_size, patch_size))
@@ -222,22 +222,10 @@ class Flux(nn.Module):
        h_offset = ((h_offset + (patch_size // 2)) // patch_size)
        w_offset = ((w_offset + (patch_size // 2)) // patch_size)

-        steps_h = h_len
-        steps_w = w_len
-
-        rope_options = transformer_options.get("rope_options", None)
-        if rope_options is not None:
-            h_len = (h_len - 1.0) * rope_options.get("scale_y", 1.0) + 1.0
-            w_len = (w_len - 1.0) * rope_options.get("scale_x", 1.0) + 1.0
-
-            index += rope_options.get("shift_t", 0.0)
-            h_offset += rope_options.get("shift_y", 0.0)
-            w_offset += rope_options.get("shift_x", 0.0)
-
-        img_ids = torch.zeros((steps_h, steps_w, 3), device=x.device, dtype=x.dtype)
+        img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
        img_ids[:, :, 0] = img_ids[:, :, 1] + index
-        img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(h_offset, h_len - 1 + h_offset, steps=steps_h, device=x.device, dtype=x.dtype).unsqueeze(1)
-        img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(w_offset, w_len - 1 + w_offset, steps=steps_w, device=x.device, dtype=x.dtype).unsqueeze(0)
+        img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(h_offset, h_len - 1 + h_offset, steps=h_len, device=x.device, dtype=x.dtype).unsqueeze(1)
+        img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(w_offset, w_len - 1 + w_offset, steps=w_len, device=x.device, dtype=x.dtype).unsqueeze(0)
        return img, repeat(img_ids, "h w c -> b (h w) c", b=bs)

    def forward(self, x, timestep, context, y=None, guidance=None, ref_latents=None, control=None, transformer_options={}, **kwargs):
@@ -253,7 +241,7 @@ class Flux(nn.Module):

        h_len = ((h_orig + (patch_size // 2)) // patch_size)
        w_len = ((w_orig + (patch_size // 2)) // patch_size)
-        img, img_ids = self.process_img(x, transformer_options=transformer_options)
+        img, img_ids = self.process_img(x)
        img_tokens = img.shape[1]
        if ref_latents is not None:
            h = 0
--- a/comfy/ldm/hunyuan_video/model.py
+++ b/comfy/ldm/hunyuan_video/model.py
@@ -6,6 +6,7 @@ import comfy.ldm.flux.layers
 import comfy.ldm.modules.diffusionmodules.mmdit
 from comfy.ldm.modules.attention import optimized_attention

+
 from dataclasses import dataclass
 from einops import repeat

@@ -41,8 +42,6 @@ class HunyuanVideoParams:
    guidance_embed: bool
    byt5: bool
    meanflow: bool
-    use_cond_type_embedding: bool
-    vision_in_dim: int


 class SelfAttentionRef(nn.Module):
@@ -158,10 +157,7 @@ class TokenRefiner(nn.Module):
        t = self.t_embedder(timestep_embedding(timesteps, 256, time_factor=1.0).to(x.dtype))
        # m = mask.float().unsqueeze(-1)
        # c = (x.float() * m).sum(dim=1) / m.sum(dim=1) #TODO: the following works when the x.shape is the same length as the tokens but might break otherwise
-        if x.dtype == torch.float16:
-            c = x.float().sum(dim=1) / x.shape[1]
-        else:
-            c = x.sum(dim=1) / x.shape[1]
+        c = x.sum(dim=1) / x.shape[1]

        c = t + self.c_embedder(c.to(x.dtype))
        x = self.input_embedder(x)
@@ -200,15 +196,11 @@ class HunyuanVideo(nn.Module):
    def __init__(self, image_model=None, final_layer=True, dtype=None, device=None, operations=None, **kwargs):
        super().__init__()
        self.dtype = dtype
-        operation_settings = {"operations": operations, "device": device, "dtype": dtype}
-
        params = HunyuanVideoParams(**kwargs)
        self.params = params
        self.patch_size = params.patch_size
        self.in_channels = params.in_channels
        self.out_channels = params.out_channels
-        self.use_cond_type_embedding = params.use_cond_type_embedding
-        self.vision_in_dim = params.vision_in_dim
        if params.hidden_size % params.num_heads != 0:
            raise ValueError(
                f"Hidden size {params.hidden_size} must be divisible by num_heads {params.num_heads}"
@@ -274,18 +266,6 @@ class HunyuanVideo(nn.Module):
        if final_layer:
            self.final_layer = LastLayer(self.hidden_size, self.patch_size[-1], self.out_channels, dtype=dtype, device=device, operations=operations)

-        # HunyuanVideo 1.5 specific modules
-        if self.vision_in_dim is not None:
-            from comfy.ldm.wan.model import MLPProj
-            self.vision_in = MLPProj(in_dim=self.vision_in_dim, out_dim=self.hidden_size, operation_settings=operation_settings)
-        else:
-            self.vision_in = None
-        if self.use_cond_type_embedding:
-            # 0: text_encoder feature 1: byt5 feature 2: vision_encoder feature
-            self.cond_type_embedding = nn.Embedding(3, self.hidden_size)
-        else:
-            self.cond_type_embedding = None
-
    def forward_orig(
        self,
        img: Tensor,
@@ -296,7 +276,6 @@ class HunyuanVideo(nn.Module):
        timesteps: Tensor,
        y: Tensor = None,
        txt_byt5=None,
-        clip_fea=None,
        guidance: Tensor = None,
        guiding_frame_index=None,
        ref_latent=None,
@@ -352,31 +331,12 @@ class HunyuanVideo(nn.Module):

        txt = self.txt_in(txt, timesteps, txt_mask, transformer_options=transformer_options)

-        if self.cond_type_embedding is not None:
-            self.cond_type_embedding.to(txt.device)
-            cond_emb = self.cond_type_embedding(torch.zeros_like(txt[:, :, 0], device=txt.device, dtype=torch.long))
-            txt = txt + cond_emb.to(txt.dtype)
-
        if self.byt5_in is not None and txt_byt5 is not None:
            txt_byt5 = self.byt5_in(txt_byt5)
-            if self.cond_type_embedding is not None:
-                cond_emb = self.cond_type_embedding(torch.ones_like(txt_byt5[:, :, 0], device=txt_byt5.device, dtype=torch.long))
-                txt_byt5 = txt_byt5 + cond_emb.to(txt_byt5.dtype)
-                txt = torch.cat((txt_byt5, txt), dim=1) # byt5 first for HunyuanVideo1.5
-            else:
-                txt = torch.cat((txt, txt_byt5), dim=1)
            txt_byt5_ids = torch.zeros((txt_ids.shape[0], txt_byt5.shape[1], txt_ids.shape[-1]), device=txt_ids.device, dtype=txt_ids.dtype)
+            txt = torch.cat((txt, txt_byt5), dim=1)
            txt_ids = torch.cat((txt_ids, txt_byt5_ids), dim=1)

-        if clip_fea is not None:
-            txt_vision_states = self.vision_in(clip_fea)
-            if self.cond_type_embedding is not None:
-                cond_emb = self.cond_type_embedding(2 * torch.ones_like(txt_vision_states[:, :, 0], dtype=torch.long, device=txt_vision_states.device))
-                txt_vision_states = txt_vision_states + cond_emb
-            txt = torch.cat((txt_vision_states.to(txt.dtype), txt), dim=1)
-            extra_txt_ids = torch.zeros((txt_ids.shape[0], txt_vision_states.shape[1], txt_ids.shape[-1]), device=txt_ids.device, dtype=txt_ids.dtype)
-            txt_ids = torch.cat((txt_ids, extra_txt_ids), dim=1)
-
        ids = torch.cat((img_ids, txt_ids), dim=1)
        pe = self.pe_embedder(ids)

@@ -470,14 +430,14 @@ class HunyuanVideo(nn.Module):
        img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype).unsqueeze(0)
        return repeat(img_ids, "h w c -> b (h w) c", b=bs)

-    def forward(self, x, timestep, context, y=None, txt_byt5=None, clip_fea=None, guidance=None, attention_mask=None, guiding_frame_index=None, ref_latent=None, disable_time_r=False, control=None, transformer_options={}, **kwargs):
+    def forward(self, x, timestep, context, y=None, txt_byt5=None, guidance=None, attention_mask=None, guiding_frame_index=None, ref_latent=None, disable_time_r=False, control=None, transformer_options={}, **kwargs):
        return comfy.patcher_extension.WrapperExecutor.new_class_executor(
            self._forward,
            self,
            comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, transformer_options)
-        ).execute(x, timestep, context, y, txt_byt5, clip_fea, guidance, attention_mask, guiding_frame_index, ref_latent, disable_time_r, control, transformer_options, **kwargs)
+        ).execute(x, timestep, context, y, txt_byt5, guidance, attention_mask, guiding_frame_index, ref_latent, disable_time_r, control, transformer_options, **kwargs)

-    def _forward(self, x, timestep, context, y=None, txt_byt5=None, clip_fea=None, guidance=None, attention_mask=None, guiding_frame_index=None, ref_latent=None, disable_time_r=False, control=None, transformer_options={}, **kwargs):
+    def _forward(self, x, timestep, context, y=None, txt_byt5=None, guidance=None, attention_mask=None, guiding_frame_index=None, ref_latent=None, disable_time_r=False, control=None, transformer_options={}, **kwargs):
        bs = x.shape[0]
        if len(self.patch_size) == 3:
            img_ids = self.img_ids(x)
@@ -485,5 +445,5 @@ class HunyuanVideo(nn.Module):
        else:
            img_ids = self.img_ids_2d(x)
            txt_ids = torch.zeros((bs, context.shape[1], 2), device=x.device, dtype=x.dtype)
-        out = self.forward_orig(x, img_ids, context, txt_ids, attention_mask, timestep, y, txt_byt5, clip_fea, guidance, guiding_frame_index, ref_latent, disable_time_r=disable_time_r, control=control, transformer_options=transformer_options)
+        out = self.forward_orig(x, img_ids, context, txt_ids, attention_mask, timestep, y, txt_byt5, guidance, guiding_frame_index, ref_latent, disable_time_r=disable_time_r, control=control, transformer_options=transformer_options)
        return out
--- a/comfy/ldm/hunyuan_video/upsampler.py
+++ b/comfy/ldm/hunyuan_video/upsampler.py
@@ -1,120 +0,0 @@
-import torch
-import torch.nn as nn
-import torch.nn.functional as F
-from comfy.ldm.hunyuan_video.vae_refiner import RMS_norm, ResnetBlock, VideoConv3d
-import model_management, model_patcher
-
-class SRResidualCausalBlock3D(nn.Module):
-    def __init__(self, channels: int):
-        super().__init__()
-        self.block = nn.Sequential(
-            VideoConv3d(channels, channels, kernel_size=3),
-            nn.SiLU(inplace=True),
-            VideoConv3d(channels, channels, kernel_size=3),
-            nn.SiLU(inplace=True),
-            VideoConv3d(channels, channels, kernel_size=3),
-        )
-
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        return x + self.block(x)
-
-class SRModel3DV2(nn.Module):
-    def __init__(
-        self,
-        in_channels: int,
-        out_channels: int,
-        hidden_channels: int = 64,
-        num_blocks: int = 6,
-        global_residual: bool = False,
-    ):
-        super().__init__()
-        self.in_conv = VideoConv3d(in_channels, hidden_channels, kernel_size=3)
-        self.blocks = nn.ModuleList([SRResidualCausalBlock3D(hidden_channels) for _ in range(num_blocks)])
-        self.out_conv = VideoConv3d(hidden_channels, out_channels, kernel_size=3)
-        self.global_residual = bool(global_residual)
-
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        residual = x
-        y = self.in_conv(x)
-        for blk in self.blocks:
-            y = blk(y)
-        y = self.out_conv(y)
-        if self.global_residual and (y.shape == residual.shape):
-            y = y + residual
-        return y
-
-
-class Upsampler(nn.Module):
-    def __init__(
-        self,
-        z_channels: int,
-        out_channels: int,
-        block_out_channels: tuple[int, ...],
-        num_res_blocks: int = 2,
-    ):
-        super().__init__()
-        self.num_res_blocks = num_res_blocks
-        self.block_out_channels = block_out_channels
-        self.z_channels = z_channels
-
-        ch = block_out_channels[0]
-        self.conv_in = VideoConv3d(z_channels, ch, kernel_size=3)
-
-        self.up = nn.ModuleList()
-
-        for i, tgt in enumerate(block_out_channels):
-            stage = nn.Module()
-            stage.block = nn.ModuleList([ResnetBlock(in_channels=ch if j == 0 else tgt,
-                                                    out_channels=tgt,
-                                                    temb_channels=0,
-                                                    conv_shortcut=False,
-                                                    conv_op=VideoConv3d, norm_op=RMS_norm)
-                                        for j in range(num_res_blocks + 1)])
-            ch = tgt
-            self.up.append(stage)
-
-        self.norm_out = RMS_norm(ch)
-        self.conv_out = VideoConv3d(ch, out_channels, kernel_size=3)
-
-    def forward(self, z):
-        """
-        Args:
-            z: (B, C, T, H, W)
-            target_shape: (H, W)
-        """
-        # z to block_in
-        repeats = self.block_out_channels[0] // (self.z_channels)
-        x = self.conv_in(z) + z.repeat_interleave(repeats=repeats, dim=1)
-
-        # upsampling
-        for stage in self.up:
-            for blk in stage.block:
-                x = blk(x)
-
-        out = self.conv_out(F.silu(self.norm_out(x)))
-        return out
-
-UPSAMPLERS = {
-    "720p": SRModel3DV2,
-    "1080p": Upsampler,
-}
-
-class HunyuanVideo15SRModel():
-    def __init__(self, model_type, config):
-        self.load_device = model_management.vae_device()
-        offload_device = model_management.vae_offload_device()
-        self.dtype = model_management.vae_dtype(self.load_device)
-        self.model_class = UPSAMPLERS.get(model_type)
-        self.model = self.model_class(**config).eval()
-
-        self.patcher = model_patcher.ModelPatcher(self.model, load_device=self.load_device, offload_device=offload_device)
-
-    def load_sd(self, sd):
-        return self.model.load_state_dict(sd, strict=True)
-
-    def get_sd(self):
-        return self.model.state_dict()
-
-    def resample_latent(self, latent):
-        model_management.load_model_gpu(self.patcher)
-        return self.model(latent.to(self.load_device))
--- a/comfy/ldm/hunyuan_video/vae_refiner.py
+++ b/comfy/ldm/hunyuan_video/vae_refiner.py
@@ -4,40 +4,8 @@ import torch.nn.functional as F
 from comfy.ldm.modules.diffusionmodules.model import ResnetBlock, AttnBlock, VideoConv3d, Normalize
 import comfy.ops
 import comfy.ldm.models.autoencoder
-import comfy.model_management
 ops = comfy.ops.disable_weight_init

-class NoPadConv3d(nn.Module):
-    def __init__(self, n_channels, out_channels, kernel_size, stride=1, dilation=1, padding=0, **kwargs):
-        super().__init__()
-        self.conv = ops.Conv3d(n_channels, out_channels, kernel_size, stride=stride, dilation=dilation, **kwargs)
-
-    def forward(self, x):
-        return self.conv(x)
-
-
-def conv_carry_causal_3d(xl, op, conv_carry_in=None, conv_carry_out=None):
-
-    x = xl[0]
-    xl.clear()
-
-    if conv_carry_out is not None:
-        to_push = x[:, :, -2:, :, :].clone()
-        conv_carry_out.append(to_push)
-
-    if isinstance(op, NoPadConv3d):
-        if conv_carry_in is None:
-            x = torch.nn.functional.pad(x, (1, 1, 1, 1, 2, 0), mode = 'replicate')
-        else:
-            carry_len = conv_carry_in[0].shape[2]
-            x = torch.cat([conv_carry_in.pop(0), x], dim=2)
-            x = torch.nn.functional.pad(x, (1, 1, 1, 1, 2 - carry_len, 0), mode = 'replicate')
-
-    out = op(x)
-
-    return out
-
-
 class RMS_norm(nn.Module):
    def __init__(self, dim):
        super().__init__()
@@ -46,7 +14,7 @@ class RMS_norm(nn.Module):
        self.gamma = nn.Parameter(torch.empty(shape))

    def forward(self, x):
-        return F.normalize(x, dim=1) * self.scale * comfy.model_management.cast_to(self.gamma, dtype=x.dtype, device=x.device)
+        return F.normalize(x, dim=1) * self.scale * self.gamma

 class DnSmpl(nn.Module):
    def __init__(self, ic, oc, tds=True, refiner_vae=True, op=VideoConv3d):
@@ -59,12 +27,11 @@ class DnSmpl(nn.Module):
        self.tds = tds
        self.gs = fct * ic // oc

-    def forward(self, x, conv_carry_in=None, conv_carry_out=None):
+    def forward(self, x):
        r1 = 2 if self.tds else 1
-        h = conv_carry_causal_3d([x], self.conv, conv_carry_in, conv_carry_out)
-
-        if self.tds and self.refiner_vae and conv_carry_in is None:
+        h = self.conv(x)

+        if self.tds and self.refiner_vae:
            hf = h[:, :, :1, :, :]
            b, c, f, ht, wd = hf.shape
            hf = hf.reshape(b, c, f, ht // 2, 2, wd // 2, 2)
@@ -72,7 +39,14 @@ class DnSmpl(nn.Module):
            hf = hf.reshape(b, 2 * 2 * c, f, ht // 2, wd // 2)
            hf = torch.cat([hf, hf], dim=1)

-            h = h[:, :, 1:, :, :]
+            hn = h[:, :, 1:, :, :]
+            b, c, frms, ht, wd = hn.shape
+            nf = frms // r1
+            hn = hn.reshape(b, c, nf, r1, ht // 2, 2, wd // 2, 2)
+            hn = hn.permute(0, 3, 5, 7, 1, 2, 4, 6)
+            hn = hn.reshape(b, r1 * 2 * 2 * c, nf, ht // 2, wd // 2)
+
+            h = torch.cat([hf, hn], dim=2)

            xf = x[:, :, :1, :, :]
            b, ci, f, ht, wd = xf.shape
@@ -80,32 +54,34 @@ class DnSmpl(nn.Module):
            xf = xf.permute(0, 4, 6, 1, 2, 3, 5)
            xf = xf.reshape(b, 2 * 2 * ci, f, ht // 2, wd // 2)
            B, C, T, H, W = xf.shape
-            xf = xf.view(B, hf.shape[1], self.gs // 2, T, H, W).mean(dim=2)
+            xf = xf.view(B, h.shape[1], self.gs // 2, T, H, W).mean(dim=2)

-            x = x[:, :, 1:, :, :]
+            xn = x[:, :, 1:, :, :]
+            b, ci, frms, ht, wd = xn.shape
+            nf = frms // r1
+            xn = xn.reshape(b, ci, nf, r1, ht // 2, 2, wd // 2, 2)
+            xn = xn.permute(0, 3, 5, 7, 1, 2, 4, 6)
+            xn = xn.reshape(b, r1 * 2 * 2 * ci, nf, ht // 2, wd // 2)
+            B, C, T, H, W = xn.shape
+            xn = xn.view(B, h.shape[1], self.gs, T, H, W).mean(dim=2)
+            sc = torch.cat([xf, xn], dim=2)
+        else:
+            b, c, frms, ht, wd = h.shape

-        if h.shape[2] == 0:
-            return hf + xf
+            nf = frms // r1
+            h = h.reshape(b, c, nf, r1, ht // 2, 2, wd // 2, 2)
+            h = h.permute(0, 3, 5, 7, 1, 2, 4, 6)
+            h = h.reshape(b, r1 * 2 * 2 * c, nf, ht // 2, wd // 2)

-        b, c, frms, ht, wd = h.shape
-        nf = frms // r1
-        h = h.reshape(b, c, nf, r1, ht // 2, 2, wd // 2, 2)
-        h = h.permute(0, 3, 5, 7, 1, 2, 4, 6)
-        h = h.reshape(b, r1 * 2 * 2 * c, nf, ht // 2, wd // 2)
+            b, ci, frms, ht, wd = x.shape
+            nf = frms // r1
+            sc = x.reshape(b, ci, nf, r1, ht // 2, 2, wd // 2, 2)
+            sc = sc.permute(0, 3, 5, 7, 1, 2, 4, 6)
+            sc = sc.reshape(b, r1 * 2 * 2 * ci, nf, ht // 2, wd // 2)
+            B, C, T, H, W = sc.shape
+            sc = sc.view(B, h.shape[1], self.gs, T, H, W).mean(dim=2)

-        b, ci, frms, ht, wd = x.shape
-        nf = frms // r1
-        x = x.reshape(b, ci, nf, r1, ht // 2, 2, wd // 2, 2)
-        x = x.permute(0, 3, 5, 7, 1, 2, 4, 6)
-        x = x.reshape(b, r1 * 2 * 2 * ci, nf, ht // 2, wd // 2)
-        B, C, T, H, W = x.shape
-        x = x.view(B, h.shape[1], self.gs, T, H, W).mean(dim=2)
-
-        if self.tds and self.refiner_vae and conv_carry_in is None:
-            h = torch.cat([hf, h], dim=2)
-            x = torch.cat([xf, x], dim=2)
-
-        return h + x
+        return h + sc


 class UpSmpl(nn.Module):
@@ -118,11 +94,11 @@ class UpSmpl(nn.Module):
        self.tus = tus
        self.rp = fct * oc // ic

-    def forward(self, x, conv_carry_in=None, conv_carry_out=None):
+    def forward(self, x):
        r1 = 2 if self.tus else 1
-        h = conv_carry_causal_3d([x], self.conv, conv_carry_in, conv_carry_out)
+        h = self.conv(x)

-        if self.tus and self.refiner_vae and conv_carry_in is None:
+        if self.tus and self.refiner_vae:
            hf = h[:, :, :1, :, :]
            b, c, f, ht, wd = hf.shape
            nc = c // (2 * 2)
@@ -131,7 +107,14 @@ class UpSmpl(nn.Module):
            hf = hf.reshape(b, nc, f, ht * 2, wd * 2)
            hf = hf[:, : hf.shape[1] // 2]

-            h = h[:, :, 1:, :, :]
+            hn = h[:, :, 1:, :, :]
+            b, c, frms, ht, wd = hn.shape
+            nc = c // (r1 * 2 * 2)
+            hn = hn.reshape(b, r1, 2, 2, nc, frms, ht, wd)
+            hn = hn.permute(0, 4, 5, 1, 6, 2, 7, 3)
+            hn = hn.reshape(b, nc, frms * r1, ht * 2, wd * 2)
+
+            h = torch.cat([hf, hn], dim=2)

            xf = x[:, :, :1, :, :]
            b, ci, f, ht, wd = xf.shape
@@ -142,43 +125,29 @@ class UpSmpl(nn.Module):
            xf = xf.permute(0, 3, 4, 5, 1, 6, 2)
            xf = xf.reshape(b, nc, f, ht * 2, wd * 2)

-            x = x[:, :, 1:, :, :]
+            xn = x[:, :, 1:, :, :]
+            xn = xn.repeat_interleave(repeats=self.rp, dim=1)
+            b, c, frms, ht, wd = xn.shape
+            nc = c // (r1 * 2 * 2)
+            xn = xn.reshape(b, r1, 2, 2, nc, frms, ht, wd)
+            xn = xn.permute(0, 4, 5, 1, 6, 2, 7, 3)
+            xn = xn.reshape(b, nc, frms * r1, ht * 2, wd * 2)
+            sc = torch.cat([xf, xn], dim=2)
+        else:
+            b, c, frms, ht, wd = h.shape
+            nc = c // (r1 * 2 * 2)
+            h = h.reshape(b, r1, 2, 2, nc, frms, ht, wd)
+            h = h.permute(0, 4, 5, 1, 6, 2, 7, 3)
+            h = h.reshape(b, nc, frms * r1, ht * 2, wd * 2)

-        b, c, frms, ht, wd = h.shape
-        nc = c // (r1 * 2 * 2)
-        h = h.reshape(b, r1, 2, 2, nc, frms, ht, wd)
-        h = h.permute(0, 4, 5, 1, 6, 2, 7, 3)
-        h = h.reshape(b, nc, frms * r1, ht * 2, wd * 2)
+            sc = x.repeat_interleave(repeats=self.rp, dim=1)
+            b, c, frms, ht, wd = sc.shape
+            nc = c // (r1 * 2 * 2)
+            sc = sc.reshape(b, r1, 2, 2, nc, frms, ht, wd)
+            sc = sc.permute(0, 4, 5, 1, 6, 2, 7, 3)
+            sc = sc.reshape(b, nc, frms * r1, ht * 2, wd * 2)

-        x = x.repeat_interleave(repeats=self.rp, dim=1)
-        b, c, frms, ht, wd = x.shape
-        nc = c // (r1 * 2 * 2)
-        x = x.reshape(b, r1, 2, 2, nc, frms, ht, wd)
-        x = x.permute(0, 4, 5, 1, 6, 2, 7, 3)
-        x = x.reshape(b, nc, frms * r1, ht * 2, wd * 2)
-
-        if self.tus and self.refiner_vae and conv_carry_in is None:
-            h = torch.cat([hf, h], dim=2)
-            x = torch.cat([xf, x], dim=2)
-
-        return h + x
-
-class HunyuanRefinerResnetBlock(ResnetBlock):
-    def __init__(self, in_channels, out_channels, conv_op=NoPadConv3d, norm_op=RMS_norm):
-        super().__init__(in_channels=in_channels, out_channels=out_channels, temb_channels=0, conv_op=conv_op, norm_op=norm_op)
-
-    def forward(self, x, conv_carry_in=None, conv_carry_out=None):
-        h = x
-        h = [ self.swish(self.norm1(x)) ]
-        h = conv_carry_causal_3d(h, self.conv1, conv_carry_in=conv_carry_in, conv_carry_out=conv_carry_out)
-
-        h = [ self.dropout(self.swish(self.norm2(h))) ]
-        h = conv_carry_causal_3d(h, self.conv2, conv_carry_in=conv_carry_in, conv_carry_out=conv_carry_out)
-
-        if self.in_channels != self.out_channels:
-            x = self.nin_shortcut(x)
-
-        return x+h
+        return h + sc

 class Encoder(nn.Module):
    def __init__(self, in_channels, z_channels, block_out_channels, num_res_blocks,
@@ -191,7 +160,7 @@ class Encoder(nn.Module):

        self.refiner_vae = refiner_vae
        if self.refiner_vae:
-            conv_op = NoPadConv3d
+            conv_op = VideoConv3d
            norm_op = RMS_norm
        else:
            conv_op = ops.Conv3d
@@ -206,9 +175,10 @@ class Encoder(nn.Module):

        for i, tgt in enumerate(block_out_channels):
            stage = nn.Module()
-            stage.block = nn.ModuleList([HunyuanRefinerResnetBlock(in_channels=ch if j == 0 else tgt,
-                                                                   out_channels=tgt,
-                                                                   conv_op=conv_op, norm_op=norm_op)
+            stage.block = nn.ModuleList([ResnetBlock(in_channels=ch if j == 0 else tgt,
+                                                     out_channels=tgt,
+                                                     temb_channels=0,
+                                                     conv_op=conv_op, norm_op=norm_op)
                                        for j in range(num_res_blocks)])
            ch = tgt
            if i < depth:
@@ -218,9 +188,9 @@ class Encoder(nn.Module):
            self.down.append(stage)

        self.mid = nn.Module()
-        self.mid.block_1 = HunyuanRefinerResnetBlock(in_channels=ch, out_channels=ch, conv_op=conv_op, norm_op=norm_op)
+        self.mid.block_1 = ResnetBlock(in_channels=ch, out_channels=ch, temb_channels=0, conv_op=conv_op, norm_op=norm_op)
        self.mid.attn_1 = AttnBlock(ch, conv_op=ops.Conv3d, norm_op=norm_op)
-        self.mid.block_2 = HunyuanRefinerResnetBlock(in_channels=ch, out_channels=ch, conv_op=conv_op, norm_op=norm_op)
+        self.mid.block_2 = ResnetBlock(in_channels=ch, out_channels=ch, temb_channels=0, conv_op=conv_op, norm_op=norm_op)

        self.norm_out = norm_op(ch)
        self.conv_out = conv_op(ch, z_channels << 1, 3, 1, 1)
@@ -231,50 +201,31 @@ class Encoder(nn.Module):
        if not self.refiner_vae and x.shape[2] == 1:
            x = x.expand(-1, -1, self.ffactor_temporal, -1, -1)

-        if self.refiner_vae:
-            xl = [x[:, :, :1, :, :]]
-            if x.shape[2] > self.ffactor_temporal:
-                xl += torch.split(x[:, :, 1: 1 + ((x.shape[2] - 1) // self.ffactor_temporal) * self.ffactor_temporal, :, :], self.ffactor_temporal * 2, dim=2)
-            x = xl
-        else:
-            x = [x]
-        out = []
+        x = self.conv_in(x)

-        conv_carry_in = None
+        for stage in self.down:
+            for blk in stage.block:
+                x = blk(x)
+            if hasattr(stage, 'downsample'):
+                x = stage.downsample(x)

-        for i, x1 in enumerate(x):
-            conv_carry_out = []
-            if i == len(x) - 1:
-                conv_carry_out = None
-            x1 = [ x1 ]
-            x1 = conv_carry_causal_3d(x1, self.conv_in, conv_carry_in, conv_carry_out)
-
-            for stage in self.down:
-                for blk in stage.block:
-                    x1 = blk(x1, conv_carry_in, conv_carry_out)
-                if hasattr(stage, 'downsample'):
-                    x1 = stage.downsample(x1, conv_carry_in, conv_carry_out)
-
-            out.append(x1)
-            conv_carry_in = conv_carry_out
-
-        if len(out) > 1:
-            out = torch.cat(out, dim=2)
-        else:
-            out = out[0]
-
-        x = self.mid.block_2(self.mid.attn_1(self.mid.block_1(out)))
-        del out
+        x = self.mid.block_2(self.mid.attn_1(self.mid.block_1(x)))

        b, c, t, h, w = x.shape
        grp = c // (self.z_channels << 1)
        skip = x.view(b, c // grp, grp, t, h, w).mean(2)

-        out = conv_carry_causal_3d([F.silu(self.norm_out(x))], self.conv_out) + skip
+        out = self.conv_out(F.silu(self.norm_out(x))) + skip

        if self.refiner_vae:
            out = self.regul(out)[0]

+            out = torch.cat((out[:, :, :1], out), dim=2)
+            out = out.permute(0, 2, 1, 3, 4)
+            b, f_times_2, c, h, w = out.shape
+            out = out.reshape(b, f_times_2 // 2, 2 * c, h, w)
+            out = out.permute(0, 2, 1, 3, 4).contiguous()
+
        return out

 class Decoder(nn.Module):
@@ -288,7 +239,7 @@ class Decoder(nn.Module):

        self.refiner_vae = refiner_vae
        if self.refiner_vae:
-            conv_op = NoPadConv3d
+            conv_op = VideoConv3d
            norm_op = RMS_norm
        else:
            conv_op = ops.Conv3d
@@ -298,9 +249,9 @@ class Decoder(nn.Module):
        self.conv_in = conv_op(z_channels, ch, kernel_size=3, stride=1, padding=1)

        self.mid = nn.Module()
-        self.mid.block_1 = HunyuanRefinerResnetBlock(in_channels=ch, out_channels=ch, conv_op=conv_op, norm_op=norm_op)
+        self.mid.block_1 = ResnetBlock(in_channels=ch, out_channels=ch, temb_channels=0, conv_op=conv_op, norm_op=norm_op)
        self.mid.attn_1 = AttnBlock(ch, conv_op=ops.Conv3d, norm_op=norm_op)
-        self.mid.block_2 = HunyuanRefinerResnetBlock(in_channels=ch, out_channels=ch,  conv_op=conv_op, norm_op=norm_op)
+        self.mid.block_2 = ResnetBlock(in_channels=ch, out_channels=ch, temb_channels=0, conv_op=conv_op, norm_op=norm_op)

        self.up = nn.ModuleList()
        depth = (ffactor_spatial >> 1).bit_length()
@@ -308,9 +259,10 @@ class Decoder(nn.Module):

        for i, tgt in enumerate(block_out_channels):
            stage = nn.Module()
-            stage.block = nn.ModuleList([HunyuanRefinerResnetBlock(in_channels=ch if j == 0 else tgt,
-                                                                   out_channels=tgt,
-                                                                   conv_op=conv_op, norm_op=norm_op)
+            stage.block = nn.ModuleList([ResnetBlock(in_channels=ch if j == 0 else tgt,
+                                                     out_channels=tgt,
+                                                     temb_channels=0,
+                                                     conv_op=conv_op, norm_op=norm_op)
                                        for j in range(num_res_blocks + 1)])
            ch = tgt
            if i < depth:
@@ -323,41 +275,27 @@ class Decoder(nn.Module):
        self.conv_out = conv_op(ch, out_channels, 3, stride=1, padding=1)

    def forward(self, z):
-        x = conv_carry_causal_3d([z], self.conv_in) + z.repeat_interleave(self.block_out_channels[0] // self.z_channels, 1)
+        if self.refiner_vae:
+            z = z.permute(0, 2, 1, 3, 4)
+            b, f, c, h, w = z.shape
+            z = z.reshape(b, f, 2, c // 2, h, w)
+            z = z.permute(0, 1, 2, 3, 4, 5).reshape(b, f * 2, c // 2, h, w)
+            z = z.permute(0, 2, 1, 3, 4)
+            z = z[:, :, 1:]
+
+        x = self.conv_in(z) + z.repeat_interleave(self.block_out_channels[0] // self.z_channels, 1)
        x = self.mid.block_2(self.mid.attn_1(self.mid.block_1(x)))

-        if self.refiner_vae:
-            x = torch.split(x, 2, dim=2)
-        else:
-            x = [ x ]
-        out = []
+        for stage in self.up:
+            for blk in stage.block:
+                x = blk(x)
+            if hasattr(stage, 'upsample'):
+                x = stage.upsample(x)

-        conv_carry_in = None
-
-        for i, x1 in enumerate(x):
-            conv_carry_out = []
-            if i == len(x) - 1:
-                conv_carry_out = None
-            for stage in self.up:
-                for blk in stage.block:
-                    x1 = blk(x1, conv_carry_in, conv_carry_out)
-                if hasattr(stage, 'upsample'):
-                    x1 = stage.upsample(x1, conv_carry_in, conv_carry_out)
-
-            x1 = [ F.silu(self.norm_out(x1)) ]
-            x1 = conv_carry_causal_3d(x1, self.conv_out, conv_carry_in, conv_carry_out)
-            out.append(x1)
-            conv_carry_in = conv_carry_out
-        del x
-
-        if len(out) > 1:
-            out = torch.cat(out, dim=2)
-        else:
-            out = out[0]
+        out = self.conv_out(F.silu(self.norm_out(x)))

        if not self.refiner_vae:
            if z.shape[-3] == 1:
                out = out[:, :, -1:]

        return out
-
--- a/comfy/ldm/lightricks/model.py
+++ b/comfy/ldm/lightricks/model.py
@@ -3,11 +3,12 @@ from torch import nn
 import comfy.patcher_extension
 import comfy.ldm.modules.attention
 import comfy.ldm.common_dit
+from einops import rearrange
 import math
 from typing import Dict, Optional, Tuple

 from .symmetric_patchifier import SymmetricPatchifier, latent_to_pixel_coords
-from comfy.ldm.flux.math import apply_rope1
+

 def get_timestep_embedding(
    timesteps: torch.Tensor,
@@ -237,6 +238,20 @@ class FeedForward(nn.Module):
        return self.net(x)


+def apply_rotary_emb(input_tensor, freqs_cis): #TODO: remove duplicate funcs and pick the best/fastest one
+    cos_freqs = freqs_cis[0]
+    sin_freqs = freqs_cis[1]
+
+    t_dup = rearrange(input_tensor, "... (d r) -> ... d r", r=2)
+    t1, t2 = t_dup.unbind(dim=-1)
+    t_dup = torch.stack((-t2, t1), dim=-1)
+    input_tensor_rot = rearrange(t_dup, "... d r -> ... (d r)")
+
+    out = input_tensor * cos_freqs + input_tensor_rot * sin_freqs
+
+    return out
+
+
 class CrossAttention(nn.Module):
    def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, dropout=0., attn_precision=None, dtype=None, device=None, operations=None):
        super().__init__()
@@ -266,8 +281,8 @@ class CrossAttention(nn.Module):
        k = self.k_norm(k)

        if pe is not None:
-            q = apply_rope1(q.unsqueeze(1), pe).squeeze(1)
-            k = apply_rope1(k.unsqueeze(1), pe).squeeze(1)
+            q = apply_rotary_emb(q, pe)
+            k = apply_rotary_emb(k, pe)

        if mask is None:
            out = comfy.ldm.modules.attention.optimized_attention(q, k, v, self.heads, attn_precision=self.attn_precision, transformer_options=transformer_options)
@@ -291,17 +306,12 @@ class BasicTransformerBlock(nn.Module):
    def forward(self, x, context=None, attention_mask=None, timestep=None, pe=None, transformer_options={}):
        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (self.scale_shift_table[None, None].to(device=x.device, dtype=x.dtype) + timestep.reshape(x.shape[0], timestep.shape[1], self.scale_shift_table.shape[0], -1)).unbind(dim=2)

-        attn1_input = comfy.ldm.common_dit.rms_norm(x)
-        attn1_input = torch.addcmul(attn1_input, attn1_input, scale_msa).add_(shift_msa)
-        attn1_input = self.attn1(attn1_input, pe=pe, transformer_options=transformer_options)
-        x.addcmul_(attn1_input, gate_msa)
-        del attn1_input
+        x += self.attn1(comfy.ldm.common_dit.rms_norm(x) * (1 + scale_msa) + shift_msa, pe=pe, transformer_options=transformer_options) * gate_msa

        x += self.attn2(x, context=context, mask=attention_mask, transformer_options=transformer_options)

-        y = comfy.ldm.common_dit.rms_norm(x)
-        y = torch.addcmul(y, y, scale_mlp).add_(shift_mlp)
-        x.addcmul_(self.ff(y), gate_mlp)
+        y = comfy.ldm.common_dit.rms_norm(x) * (1 + scale_mlp) + shift_mlp
+        x += self.ff(y) * gate_mlp

        return x

@@ -317,35 +327,41 @@ def get_fractional_positions(indices_grid, max_pos):


 def precompute_freqs_cis(indices_grid, dim, out_dtype, theta=10000.0, max_pos=[20, 2048, 2048]):
-    dtype = torch.float32
-    device = indices_grid.device
+    dtype = torch.float32 #self.dtype

-    # Get fractional positions and compute frequency indices
    fractional_positions = get_fractional_positions(indices_grid, max_pos)
-    indices = theta ** torch.linspace(0, 1, dim // 6, device=device, dtype=dtype) * math.pi / 2

-    # Compute frequencies and apply cos/sin
-    freqs = (indices * (fractional_positions.unsqueeze(-1) * 2 - 1)).transpose(-1, -2).flatten(2)
-    cos_vals = freqs.cos().repeat_interleave(2, dim=-1)
-    sin_vals = freqs.sin().repeat_interleave(2, dim=-1)
+    start = 1
+    end = theta
+    device = fractional_positions.device

-    # Pad if dim is not divisible by 6
+    indices = theta ** (
+        torch.linspace(
+            math.log(start, theta),
+            math.log(end, theta),
+            dim // 6,
+            device=device,
+            dtype=dtype,
+        )
+    )
+    indices = indices.to(dtype=dtype)
+
+    indices = indices * math.pi / 2
+
+    freqs = (
+        (indices * (fractional_positions.unsqueeze(-1) * 2 - 1))
+        .transpose(-1, -2)
+        .flatten(2)
+    )
+
+    cos_freq = freqs.cos().repeat_interleave(2, dim=-1)
+    sin_freq = freqs.sin().repeat_interleave(2, dim=-1)
    if dim % 6 != 0:
-        padding_size = dim % 6
-        cos_vals = torch.cat([torch.ones_like(cos_vals[:, :, :padding_size]), cos_vals], dim=-1)
-        sin_vals = torch.cat([torch.zeros_like(sin_vals[:, :, :padding_size]), sin_vals], dim=-1)
-
-    # Reshape and extract one value per pair (since repeat_interleave duplicates each value)
-    cos_vals = cos_vals.reshape(*cos_vals.shape[:2], -1, 2)[..., 0].to(out_dtype)  # [B, N, dim//2]
-    sin_vals = sin_vals.reshape(*sin_vals.shape[:2], -1, 2)[..., 0].to(out_dtype)  # [B, N, dim//2]
-
-    # Build rotation matrix [[cos, -sin], [sin, cos]] and add heads dimension
-    freqs_cis = torch.stack([
-        torch.stack([cos_vals, -sin_vals], dim=-1),
-        torch.stack([sin_vals, cos_vals], dim=-1)
-    ], dim=-2).unsqueeze(1)  # [B, 1, N, dim//2, 2, 2]
-
-    return freqs_cis
+        cos_padding = torch.ones_like(cos_freq[:, :, : dim % 6])
+        sin_padding = torch.zeros_like(cos_freq[:, :, : dim % 6])
+        cos_freq = torch.cat([cos_padding, cos_freq], dim=-1)
+        sin_freq = torch.cat([sin_padding, sin_freq], dim=-1)
+    return cos_freq.to(out_dtype), sin_freq.to(out_dtype)


 class LTXVModel(torch.nn.Module):
@@ -485,7 +501,7 @@ class LTXVModel(torch.nn.Module):
        shift, scale = scale_shift_values[:, :, 0], scale_shift_values[:, :, 1]
        x = self.norm_out(x)
        # Modulation
-        x = torch.addcmul(x, x, scale).add_(shift)
+        x = x * (1 + scale) + shift
        x = self.proj_out(x)

        x = self.patchifier.unpatchify(
--- a/comfy/ldm/lumina/model.py
+++ b/comfy/ldm/lumina/model.py
@@ -522,7 +522,7 @@ class NextDiT(nn.Module):
        max_cap_len = max(l_effective_cap_len)
        max_img_len = max(l_effective_img_len)

-        position_ids = torch.zeros(bsz, max_seq_len, 3, dtype=torch.float32, device=device)
+        position_ids = torch.zeros(bsz, max_seq_len, 3, dtype=torch.int32, device=device)

        for i in range(bsz):
            cap_len = l_effective_cap_len[i]
@@ -531,22 +531,10 @@ class NextDiT(nn.Module):
            H_tokens, W_tokens = H // pH, W // pW
            assert H_tokens * W_tokens == img_len

-            rope_options = transformer_options.get("rope_options", None)
-            h_scale = 1.0
-            w_scale = 1.0
-            h_start = 0
-            w_start = 0
-            if rope_options is not None:
-                h_scale = rope_options.get("scale_y", 1.0)
-                w_scale = rope_options.get("scale_x", 1.0)
-
-                h_start = rope_options.get("shift_y", 0.0)
-                w_start = rope_options.get("shift_x", 0.0)
-
-            position_ids[i, :cap_len, 0] = torch.arange(cap_len, dtype=torch.float32, device=device)
+            position_ids[i, :cap_len, 0] = torch.arange(cap_len, dtype=torch.int32, device=device)
            position_ids[i, cap_len:cap_len+img_len, 0] = cap_len
-            row_ids = (torch.arange(H_tokens, dtype=torch.float32, device=device) * h_scale + h_start).view(-1, 1).repeat(1, W_tokens).flatten()
-            col_ids = (torch.arange(W_tokens, dtype=torch.float32, device=device) * w_scale + w_start).view(1, -1).repeat(H_tokens, 1).flatten()
+            row_ids = torch.arange(H_tokens, dtype=torch.int32, device=device).view(-1, 1).repeat(1, W_tokens).flatten()
+            col_ids = torch.arange(W_tokens, dtype=torch.int32, device=device).view(1, -1).repeat(H_tokens, 1).flatten()
            position_ids[i, cap_len:cap_len+img_len, 1] = row_ids
            position_ids[i, cap_len:cap_len+img_len, 2] = col_ids

--- a/comfy/ldm/qwen_image/controlnet.py
+++ b/comfy/ldm/qwen_image/controlnet.py
@@ -44,7 +44,7 @@ class QwenImageControlNetModel(QwenImageTransformer2DModel):
        txt_start = round(max(((x.shape[-1] + (self.patch_size // 2)) // self.patch_size) // 2, ((x.shape[-2] + (self.patch_size // 2)) // self.patch_size) // 2))
        txt_ids = torch.arange(txt_start, txt_start + context.shape[1], device=x.device).reshape(1, -1, 1).repeat(x.shape[0], 1, 3)
        ids = torch.cat((txt_ids, img_ids), dim=1)
-        image_rotary_emb = self.pe_embedder(ids).to(x.dtype).contiguous()
+        image_rotary_emb = self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
        del ids, txt_ids, img_ids

        hidden_states = self.img_in(hidden_states) + self.controlnet_x_embedder(hint)
--- a/comfy/ldm/qwen_image/model.py
+++ b/comfy/ldm/qwen_image/model.py
@@ -10,7 +10,6 @@ from comfy.ldm.modules.attention import optimized_attention_masked
 from comfy.ldm.flux.layers import EmbedND
 import comfy.ldm.common_dit
 import comfy.patcher_extension
-from comfy.ldm.flux.math import apply_rope1

 class GELU(nn.Module):
    def __init__(self, dim_in: int, dim_out: int, approximate: str = "none", bias: bool = True, dtype=None, device=None, operations=None):
@@ -135,34 +134,33 @@ class Attention(nn.Module):
        image_rotary_emb: Optional[torch.Tensor] = None,
        transformer_options={},
    ) -> Tuple[torch.Tensor, torch.Tensor]:
-        batch_size = hidden_states.shape[0]
-        seq_img = hidden_states.shape[1]
        seq_txt = encoder_hidden_states.shape[1]

-        # Project and reshape to BHND format (batch, heads, seq, dim)
-        img_query = self.to_q(hidden_states).view(batch_size, seq_img, self.heads, -1).transpose(1, 2).contiguous()
-        img_key = self.to_k(hidden_states).view(batch_size, seq_img, self.heads, -1).transpose(1, 2).contiguous()
-        img_value = self.to_v(hidden_states).view(batch_size, seq_img, self.heads, -1).transpose(1, 2)
+        img_query = self.to_q(hidden_states).unflatten(-1, (self.heads, -1))
+        img_key = self.to_k(hidden_states).unflatten(-1, (self.heads, -1))
+        img_value = self.to_v(hidden_states).unflatten(-1, (self.heads, -1))

-        txt_query = self.add_q_proj(encoder_hidden_states).view(batch_size, seq_txt, self.heads, -1).transpose(1, 2).contiguous()
-        txt_key = self.add_k_proj(encoder_hidden_states).view(batch_size, seq_txt, self.heads, -1).transpose(1, 2).contiguous()
-        txt_value = self.add_v_proj(encoder_hidden_states).view(batch_size, seq_txt, self.heads, -1).transpose(1, 2)
+        txt_query = self.add_q_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_key = self.add_k_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
+        txt_value = self.add_v_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))

        img_query = self.norm_q(img_query)
        img_key = self.norm_k(img_key)
        txt_query = self.norm_added_q(txt_query)
        txt_key = self.norm_added_k(txt_key)

-        joint_query = torch.cat([txt_query, img_query], dim=2)
-        joint_key = torch.cat([txt_key, img_key], dim=2)
-        joint_value = torch.cat([txt_value, img_value], dim=2)
+        joint_query = torch.cat([txt_query, img_query], dim=1)
+        joint_key = torch.cat([txt_key, img_key], dim=1)
+        joint_value = torch.cat([txt_value, img_value], dim=1)

-        joint_query = apply_rope1(joint_query, image_rotary_emb)
-        joint_key = apply_rope1(joint_key, image_rotary_emb)
+        joint_query = apply_rotary_emb(joint_query, image_rotary_emb)
+        joint_key = apply_rotary_emb(joint_key, image_rotary_emb)

-        joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads,
-                                                         attention_mask, transformer_options=transformer_options,
-                                                         skip_reshape=True)
+        joint_query = joint_query.flatten(start_dim=2)
+        joint_key = joint_key.flatten(start_dim=2)
+        joint_value = joint_value.flatten(start_dim=2)
+
+        joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads, attention_mask, transformer_options=transformer_options)

        txt_attn_output = joint_hidden_states[:, :seq_txt, :]
        img_attn_output = joint_hidden_states[:, seq_txt:, :]
@@ -236,10 +234,10 @@ class QwenImageTransformerBlock(nn.Module):
        img_mod1, img_mod2 = img_mod_params.chunk(2, dim=-1)
        txt_mod1, txt_mod2 = txt_mod_params.chunk(2, dim=-1)

-        img_modulated, img_gate1 = self._modulate(self.img_norm1(hidden_states), img_mod1)
-        del img_mod1
-        txt_modulated, txt_gate1 = self._modulate(self.txt_norm1(encoder_hidden_states), txt_mod1)
-        del txt_mod1
+        img_normed = self.img_norm1(hidden_states)
+        img_modulated, img_gate1 = self._modulate(img_normed, img_mod1)
+        txt_normed = self.txt_norm1(encoder_hidden_states)
+        txt_modulated, txt_gate1 = self._modulate(txt_normed, txt_mod1)

        img_attn_output, txt_attn_output = self.attn(
            hidden_states=img_modulated,
@@ -248,20 +246,16 @@ class QwenImageTransformerBlock(nn.Module):
            image_rotary_emb=image_rotary_emb,
            transformer_options=transformer_options,
        )
-        del img_modulated
-        del txt_modulated

        hidden_states = hidden_states + img_gate1 * img_attn_output
        encoder_hidden_states = encoder_hidden_states + txt_gate1 * txt_attn_output
-        del img_attn_output
-        del txt_attn_output
-        del img_gate1
-        del txt_gate1

-        img_modulated2, img_gate2 = self._modulate(self.img_norm2(hidden_states), img_mod2)
+        img_normed2 = self.img_norm2(hidden_states)
+        img_modulated2, img_gate2 = self._modulate(img_normed2, img_mod2)
        hidden_states = torch.addcmul(hidden_states, img_gate2, self.img_mlp(img_modulated2))

-        txt_modulated2, txt_gate2 = self._modulate(self.txt_norm2(encoder_hidden_states), txt_mod2)
+        txt_normed2 = self.txt_norm2(encoder_hidden_states)
+        txt_modulated2, txt_gate2 = self._modulate(txt_normed2, txt_mod2)
        encoder_hidden_states = torch.addcmul(encoder_hidden_states, txt_gate2, self.txt_mlp(txt_modulated2))

        return encoder_hidden_states, hidden_states
@@ -419,7 +413,7 @@ class QwenImageTransformer2DModel(nn.Module):
        txt_start = round(max(((x.shape[-1] + (self.patch_size // 2)) // self.patch_size) // 2, ((x.shape[-2] + (self.patch_size // 2)) // self.patch_size) // 2))
        txt_ids = torch.arange(txt_start, txt_start + context.shape[1], device=x.device).reshape(1, -1, 1).repeat(x.shape[0], 1, 3)
        ids = torch.cat((txt_ids, img_ids), dim=1)
-        image_rotary_emb = self.pe_embedder(ids).to(x.dtype).contiguous()
+        image_rotary_emb = self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
        del ids, txt_ids, img_ids

        hidden_states = self.img_in(hidden_states)
--- a/comfy/ldm/wan/model.py
+++ b/comfy/ldm/wan/model.py
@@ -232,7 +232,6 @@ class WanAttentionBlock(nn.Module):
        # assert e[0].dtype == torch.float32

        # self-attention
-        x = x.contiguous() # otherwise implicit in LayerNorm
        y = self.self_attn(
            torch.addcmul(repeat_e(e[0], x), self.norm1(x), 1 + repeat_e(e[1], x)),
            freqs, transformer_options=transformer_options)
@@ -589,7 +588,7 @@ class WanModel(torch.nn.Module):
        x = self.unpatchify(x, grid_sizes)
        return x

-    def rope_encode(self, t, h, w, t_start=0, steps_t=None, steps_h=None, steps_w=None, device=None, dtype=None, transformer_options={}):
+    def rope_encode(self, t, h, w, t_start=0, steps_t=None, steps_h=None, steps_w=None, device=None, dtype=None):
        patch_size = self.patch_size
        t_len = ((t + (patch_size[0] // 2)) // patch_size[0])
        h_len = ((h + (patch_size[1] // 2)) // patch_size[1])
@@ -602,22 +601,10 @@ class WanModel(torch.nn.Module):
        if steps_w is None:
            steps_w = w_len

-        h_start = 0
-        w_start = 0
-        rope_options = transformer_options.get("rope_options", None)
-        if rope_options is not None:
-            t_len = (t_len - 1.0) * rope_options.get("scale_t", 1.0) + 1.0
-            h_len = (h_len - 1.0) * rope_options.get("scale_y", 1.0) + 1.0
-            w_len = (w_len - 1.0) * rope_options.get("scale_x", 1.0) + 1.0
-
-            t_start += rope_options.get("shift_t", 0.0)
-            h_start += rope_options.get("shift_y", 0.0)
-            w_start += rope_options.get("shift_x", 0.0)
-
        img_ids = torch.zeros((steps_t, steps_h, steps_w, 3), device=device, dtype=dtype)
        img_ids[:, :, :, 0] = img_ids[:, :, :, 0] + torch.linspace(t_start, t_start + (t_len - 1), steps=steps_t, device=device, dtype=dtype).reshape(-1, 1, 1)
-        img_ids[:, :, :, 1] = img_ids[:, :, :, 1] + torch.linspace(h_start, h_start + (h_len - 1), steps=steps_h, device=device, dtype=dtype).reshape(1, -1, 1)
-        img_ids[:, :, :, 2] = img_ids[:, :, :, 2] + torch.linspace(w_start, w_start + (w_len - 1), steps=steps_w, device=device, dtype=dtype).reshape(1, 1, -1)
+        img_ids[:, :, :, 1] = img_ids[:, :, :, 1] + torch.linspace(0, h_len - 1, steps=steps_h, device=device, dtype=dtype).reshape(1, -1, 1)
+        img_ids[:, :, :, 2] = img_ids[:, :, :, 2] + torch.linspace(0, w_len - 1, steps=steps_w, device=device, dtype=dtype).reshape(1, 1, -1)
        img_ids = img_ids.reshape(1, -1, img_ids.shape[-1])

        freqs = self.rope_embedder(img_ids).movedim(1, 2)
@@ -643,7 +630,7 @@ class WanModel(torch.nn.Module):
        if self.ref_conv is not None and "reference_latent" in kwargs:
            t_len += 1

-        freqs = self.rope_encode(t_len, h, w, device=x.device, dtype=x.dtype, transformer_options=transformer_options)
+        freqs = self.rope_encode(t_len, h, w, device=x.device, dtype=x.dtype)
        return self.forward_orig(x, timestep, context, clip_fea=clip_fea, freqs=freqs, transformer_options=transformer_options, **kwargs)[:, :, :t, :h, :w]

    def unpatchify(self, x, grid_sizes):
--- a/comfy/model_base.py
+++ b/comfy/model_base.py
@@ -134,7 +134,7 @@ class BaseModel(torch.nn.Module):
        if not unet_config.get("disable_unet_model_creation", False):
            if model_config.custom_operations is None:
                fp8 = model_config.optimizations.get("fp8", False)
-                operations = comfy.ops.pick_operations(unet_config.get("dtype", None), self.manual_cast_dtype, fp8_optimizations=fp8, scaled_fp8=model_config.scaled_fp8, model_config=model_config)
+                operations = comfy.ops.pick_operations(unet_config.get("dtype", None), self.manual_cast_dtype, fp8_optimizations=fp8, scaled_fp8=model_config.scaled_fp8)
            else:
                operations = model_config.custom_operations
            self.diffusion_model = unet_model(**unet_config, device=device, operations=operations)
@@ -197,14 +197,8 @@ class BaseModel(torch.nn.Module):
            extra_conds[o] = extra

        t = self.process_timestep(t, x=x, **extra_conds)
-        if "latent_shapes" in extra_conds:
-            xc = utils.unpack_latents(xc, extra_conds.pop("latent_shapes"))
-
-        model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
-        if len(model_output) > 1 and not torch.is_tensor(model_output):
-            model_output, _ = utils.pack_latents(model_output)
-
-        return self.model_sampling.calculate_denoised(sigma, model_output.float(), x)
+        model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
+        return self.model_sampling.calculate_denoised(sigma, model_output, x)

    def process_timestep(self, timestep, **kwargs):
        return timestep
@@ -333,14 +327,6 @@ class BaseModel(torch.nn.Module):
        if self.model_config.scaled_fp8 is not None:
            unet_state_dict["scaled_fp8"] = torch.tensor([], dtype=self.model_config.scaled_fp8)

-        # Save mixed precision metadata
-        if hasattr(self.model_config, 'layer_quant_config') and self.model_config.layer_quant_config:
-            metadata = {
-                "format_version": "1.0",
-                "layers": self.model_config.layer_quant_config
-            }
-            unet_state_dict["_quantization_metadata"] = metadata
-
        unet_state_dict = self.model_config.process_unet_state_dict_for_saving(unet_state_dict)

        if self.model_type == ModelType.V_PREDICTION:
@@ -1536,94 +1522,3 @@ class HunyuanImage21Refiner(HunyuanImage21):
        out = super().extra_conds(**kwargs)
        out['disable_time_r'] = comfy.conds.CONDConstant(True)
        return out
-
-class HunyuanVideo15(HunyuanVideo):
-    def __init__(self, model_config, model_type=ModelType.FLOW, device=None):
-        super().__init__(model_config, model_type, device=device)
-
-    def concat_cond(self, **kwargs):
-        noise = kwargs.get("noise", None)
-        extra_channels = self.diffusion_model.img_in.proj.weight.shape[1] - noise.shape[1] - 1 #noise 32 img cond 32 + mask 1
-        if extra_channels == 0:
-            return None
-
-        image = kwargs.get("concat_latent_image", None)
-        device = kwargs["device"]
-
-        if image is None:
-            shape_image = list(noise.shape)
-            shape_image[1] = extra_channels
-            image = torch.zeros(shape_image, dtype=noise.dtype, layout=noise.layout, device=noise.device)
-        else:
-            latent_dim = self.latent_format.latent_channels
-            image = utils.common_upscale(image.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
-            for i in range(0, image.shape[1], latent_dim):
-                image[:, i: i + latent_dim] = self.process_latent_in(image[:, i: i + latent_dim])
-            image = utils.resize_to_batch_size(image, noise.shape[0])
-
-        mask = kwargs.get("concat_mask", kwargs.get("denoise_mask", None))
-        if mask is None:
-            mask = torch.zeros_like(noise)[:, :1]
-        else:
-            mask = 1.0 - mask
-            mask = utils.common_upscale(mask.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
-            if mask.shape[-3] < noise.shape[-3]:
-                mask = torch.nn.functional.pad(mask, (0, 0, 0, 0, 0, noise.shape[-3] - mask.shape[-3]), mode='constant', value=0)
-            mask = utils.resize_to_batch_size(mask, noise.shape[0])
-
-        return torch.cat((image, mask), dim=1)
-
-    def extra_conds(self, **kwargs):
-        out = super().extra_conds(**kwargs)
-        attention_mask = kwargs.get("attention_mask", None)
-        if attention_mask is not None:
-            if torch.numel(attention_mask) != attention_mask.sum():
-                out['attention_mask'] = comfy.conds.CONDRegular(attention_mask)
-        cross_attn = kwargs.get("cross_attn", None)
-        if cross_attn is not None:
-            out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
-
-        conditioning_byt5small = kwargs.get("conditioning_byt5small", None)
-        if conditioning_byt5small is not None:
-            out['txt_byt5'] = comfy.conds.CONDRegular(conditioning_byt5small)
-
-        guidance = kwargs.get("guidance", 6.0)
-        if guidance is not None:
-            out['guidance'] = comfy.conds.CONDRegular(torch.FloatTensor([guidance]))
-
-        clip_vision_output = kwargs.get("clip_vision_output", None)
-        if clip_vision_output is not None:
-            out['clip_fea'] = comfy.conds.CONDRegular(clip_vision_output.last_hidden_state)
-
-        return out
-
-class HunyuanVideo15_SR_Distilled(HunyuanVideo15):
-    def __init__(self, model_config, model_type=ModelType.FLOW, device=None):
-        super().__init__(model_config, model_type, device=device)
-
-    def concat_cond(self, **kwargs):
-        noise = kwargs.get("noise", None)
-        image = kwargs.get("concat_latent_image", None)
-        noise_augmentation = kwargs.get("noise_augmentation", 0.0)
-        device = kwargs["device"]
-
-        if image is None:
-            image = torch.zeros([noise.shape[0], noise.shape[1] * 2 + 2, noise.shape[-3], noise.shape[-2], noise.shape[-1]], device=comfy.model_management.intermediate_device())
-        else:
-            image = utils.common_upscale(image.to(device), noise.shape[-1], noise.shape[-2], "bilinear", "center")
-            #image = self.process_latent_in(image) # scaling wasn't applied in reference code
-            image = utils.resize_to_batch_size(image, noise.shape[0])
-            lq_image_slice = slice(noise.shape[1] + 1, 2 * noise.shape[1] + 1)
-            if noise_augmentation > 0:
-                generator = torch.Generator(device="cpu")
-                generator.manual_seed(kwargs.get("seed", 0) - 10)
-                noise = torch.randn(image[:, lq_image_slice].shape, generator=generator, dtype=image.dtype, device="cpu").to(image.device)
-                image[:, lq_image_slice] = noise_augmentation * noise + min(1.0 - noise_augmentation, 0.75) * image[:, lq_image_slice]
-            else:
-                image[:, lq_image_slice] = 0.75 * image[:, lq_image_slice]
-        return image
-
-    def extra_conds(self, **kwargs):
-        out = super().extra_conds(**kwargs)
-        out['disable_time_r'] = comfy.conds.CONDConstant(False)
-        return out
--- a/comfy/model_detection.py
+++ b/comfy/model_detection.py
@@ -6,20 +6,6 @@ import math
 import logging
 import torch

-
-def detect_layer_quantization(metadata):
-    quant_key = "_quantization_metadata"
-    if metadata is not None and quant_key in metadata:
-        quant_metadata = metadata.pop(quant_key)
-        quant_metadata = json.loads(quant_metadata)
-        if isinstance(quant_metadata, dict) and "layers" in quant_metadata:
-            logging.info(f"Found quantization metadata (version {quant_metadata.get('format_version', 'unknown')})")
-            return quant_metadata["layers"]
-        else:
-            raise ValueError("Invalid quantization metadata format")
-    return None
-
-
 def count_blocks(state_dict_keys, prefix_string):
    count = 0
    while True:
@@ -186,16 +172,6 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):

        guidance_keys = list(filter(lambda a: a.startswith("{}guidance_in.".format(key_prefix)), state_dict_keys))
        dit_config["guidance_embed"] = len(guidance_keys) > 0
-
-        # HunyuanVideo 1.5
-        if '{}cond_type_embedding.weight'.format(key_prefix) in state_dict_keys:
-            dit_config["use_cond_type_embedding"] = True
-        else:
-            dit_config["use_cond_type_embedding"] = False
-        if '{}vision_in.proj.0.weight'.format(key_prefix) in state_dict_keys:
-            dit_config["vision_in_dim"] = state_dict['{}vision_in.proj.0.weight'.format(key_prefix)].shape[0]
-        else:
-            dit_config["vision_in_dim"] = None
        return dit_config

    if '{}double_blocks.0.img_attn.norm.key_norm.scale'.format(key_prefix) in state_dict_keys and ('{}img_in.weight'.format(key_prefix) in state_dict_keys or f"{key_prefix}distilled_guidance_layer.norms.0.scale" in state_dict_keys): #Flux, Chroma or Chroma Radiance (has no img_in.weight)
@@ -237,7 +213,7 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
                dit_config["nerf_mlp_ratio"] = 4
                dit_config["nerf_depth"] = 4
                dit_config["nerf_max_freqs"] = 8
-                dit_config["nerf_tile_size"] = 512
+                dit_config["nerf_tile_size"] = 32
                dit_config["nerf_final_head_type"] = "conv" if f"{key_prefix}nerf_final_layer_conv.norm.scale" in state_dict_keys else "linear"
                dit_config["nerf_embedder_dtype"] = torch.float32
        else:
@@ -725,12 +701,6 @@ def model_config_from_unet(state_dict, unet_key_prefix, use_base_if_no_match=Fal
        else:
            model_config.optimizations["fp8"] = True

-    # Detect per-layer quantization (mixed precision)
-    layer_quant_config = detect_layer_quantization(metadata)
-    if layer_quant_config:
-        model_config.layer_quant_config = layer_quant_config
-        logging.info(f"Detected mixed precision quantization: {len(layer_quant_config)} layers quantized")
-
    return model_config

 def unet_prefix_from_state_dict(state_dict):
--- a/comfy/model_management.py
+++ b/comfy/model_management.py
@@ -15,6 +15,7 @@
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <https://www.gnu.org/licenses/>.
 """
+from __future__ import annotations

 import psutil
 import logging
@@ -27,6 +28,10 @@ import platform
 import weakref
 import gc

+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from comfy.model_patcher import ModelPatcher
+
 class VRAMState(Enum):
    DISABLED = 0    #No vram present: no need to move models to vram
    NO_VRAM = 1     #Very low vram: enable all the options to save vram
@@ -89,7 +94,6 @@ if args.deterministic:

 directml_enabled = False
 if args.directml is not None:
-    logging.warning("WARNING: torch-directml barely works, is very slow, has not been updated in over 1 year and might be removed soon, please don't use it, there are better options.")
    import torch_directml
    directml_enabled = True
    device_index = args.directml
@@ -187,6 +191,25 @@ def get_torch_device():
        else:
            return torch.device(torch.cuda.current_device())

+def get_all_torch_devices(exclude_current=False):
+    global cpu_state
+    devices = []
+    if cpu_state == CPUState.GPU:
+        if is_nvidia():
+            for i in range(torch.cuda.device_count()):
+                devices.append(torch.device(i))
+        elif is_intel_xpu():
+            for i in range(torch.xpu.device_count()):
+                devices.append(torch.device(i))
+        elif is_ascend_npu():
+            for i in range(torch.npu.device_count()):
+                devices.append(torch.device(i))
+    else:
+        devices.append(get_torch_device())
+    if exclude_current:
+        devices.remove(get_torch_device())
+    return devices
+
 def get_total_memory(dev=None, torch_total_too=False):
    global directml_enabled
    if dev is None:
@@ -331,21 +354,14 @@ except:


 SUPPORT_FP8_OPS = args.supports_fp8_compute
-
-AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]
-
 try:
    if is_amd():
-        arch = torch.cuda.get_device_properties(get_torch_device()).gcnArchName
-        if not (any((a in arch) for a in AMD_RDNA2_AND_OLDER_ARCH)):
-            torch.backends.cudnn.enabled = False  # Seems to improve things a lot on AMD
-            logging.info("Set: torch.backends.cudnn.enabled = False for better AMD performance.")
-
+        torch.backends.cudnn.enabled = False  # Seems to improve things a lot on AMD
        try:
            rocm_version = tuple(map(int, str(torch.version.hip).split(".")[:2]))
        except:
            rocm_version = (6, -1)
-
+        arch = torch.cuda.get_device_properties(get_torch_device()).gcnArchName
        logging.info("AMD arch: {}".format(arch))
        logging.info("ROCm version: {}".format(rocm_version))
        if args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
@@ -379,9 +395,6 @@ try:
 except:
    pass

-if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
-    torch.backends.cudnn.benchmark = True
-
 try:
    if torch_version_numeric >= (2, 5):
        torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
@@ -444,9 +457,13 @@ try:
    logging.info("Device: {}".format(get_torch_device_name(get_torch_device())))
 except:
    logging.warning("Could not pick default device.")
+try:
+    for device in get_all_torch_devices(exclude_current=True):
+        logging.info("Device: {}".format(get_torch_device_name(device)))
+except:
+    pass

-
-current_loaded_models = []
+current_loaded_models: list[LoadedModel] = []

 def module_size(module):
    module_mem = 0
@@ -457,7 +474,7 @@ def module_size(module):
    return module_mem

 class LoadedModel:
-    def __init__(self, model):
+    def __init__(self, model: ModelPatcher):
        self._set_model(model)
        self.device = model.load_device
        self.real_model = None
@@ -465,7 +482,7 @@ class LoadedModel:
        self.model_finalizer = None
        self._patcher_finalizer = None

-    def _set_model(self, model):
+    def _set_model(self, model: ModelPatcher):
        self._model = weakref.ref(model)
        if model.parent is not None:
            self._parent_model = weakref.ref(model.parent)
@@ -504,7 +521,6 @@ class LoadedModel:
        if use_more_vram == 0:
            use_more_vram = 1e32
        self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
-
        real_model = self.model.model

        if is_intel_xpu() and not args.disable_ipex_optimize and 'ipex' in globals() and real_model is not None:
@@ -690,10 +706,7 @@ def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimu
            current_free_mem = get_free_memory(torch_dev) + loaded_memory

            lowvram_model_memory = max(128 * 1024 * 1024, (current_free_mem - minimum_memory_required), min(current_free_mem * MIN_WEIGHT_MEMORY_RATIO, current_free_mem - minimum_inference_memory()))
-            lowvram_model_memory = lowvram_model_memory - loaded_memory
-
-            if lowvram_model_memory == 0:
-                lowvram_model_memory = 0.1
+            lowvram_model_memory = max(0.1, lowvram_model_memory - loaded_memory)

        if vram_set_state == VRAMState.NO_VRAM:
            lowvram_model_memory = 0.1
@@ -1003,6 +1016,12 @@ def device_supports_non_blocking(device):
        return False
    return True

+def device_should_use_non_blocking(device):
+    if not device_supports_non_blocking(device):
+        return False
+    return False
+    # return True #TODO: figure out why this causes memory issues on Nvidia and possibly others
+
 def force_channels_last():
    if args.force_channels_last:
        return True
@@ -1017,16 +1036,6 @@ if args.async_offload:
    NUM_STREAMS = 2
    logging.info("Using async weight offloading with {} streams".format(NUM_STREAMS))

-def current_stream(device):
-    if device is None:
-        return None
-    if is_device_cuda(device):
-        return torch.cuda.current_stream()
-    elif is_device_xpu(device):
-        return torch.xpu.current_stream()
-    else:
-        return None
-
 stream_counters = {}
 def get_offload_stream(device):
    stream_counter = stream_counters.get(device, 0)
@@ -1035,17 +1044,21 @@ def get_offload_stream(device):

    if device in STREAMS:
        ss = STREAMS[device]
-        #Sync the oldest stream in the queue with the current
-        ss[stream_counter].wait_stream(current_stream(device))
+        s = ss[stream_counter]
        stream_counter = (stream_counter + 1) % len(ss)
+        if is_device_cuda(device):
+            ss[stream_counter].wait_stream(torch.cuda.current_stream())
+        elif is_device_xpu(device):
+            ss[stream_counter].wait_stream(torch.xpu.current_stream())
        stream_counters[device] = stream_counter
-        return ss[stream_counter]
+        return s
    elif is_device_cuda(device):
        ss = []
        for k in range(NUM_STREAMS):
            ss.append(torch.cuda.Stream(device=device, priority=0))
        STREAMS[device] = ss
        s = ss[stream_counter]
+        stream_counter = (stream_counter + 1) % len(ss)
        stream_counters[device] = stream_counter
        return s
    elif is_device_xpu(device):
@@ -1054,14 +1067,18 @@ def get_offload_stream(device):
            ss.append(torch.xpu.Stream(device=device, priority=0))
        STREAMS[device] = ss
        s = ss[stream_counter]
+        stream_counter = (stream_counter + 1) % len(ss)
        stream_counters[device] = stream_counter
        return s
    return None

 def sync_stream(device, stream):
-    if stream is None or current_stream(device) is None:
+    if stream is None:
        return
-    current_stream(device).wait_stream(stream)
+    if is_device_cuda(device):
+        torch.cuda.current_stream().wait_stream(stream)
+    elif is_device_xpu(device):
+        torch.xpu.current_stream().wait_stream(stream)

 def cast_to(weight, dtype=None, device=None, non_blocking=False, copy=False, stream=None):
    if device is None or weight.device == device:
@@ -1086,79 +1103,6 @@ def cast_to_device(tensor, device, dtype, copy=False):
    non_blocking = device_supports_non_blocking(device)
    return cast_to(tensor, dtype=dtype, device=device, non_blocking=non_blocking, copy=copy)

-
-PINNED_MEMORY = {}
-TOTAL_PINNED_MEMORY = 0
-MAX_PINNED_MEMORY = -1
-if not args.disable_pinned_memory:
-    if is_nvidia() or is_amd():
-        if WINDOWS:
-            MAX_PINNED_MEMORY = get_total_memory(torch.device("cpu")) * 0.45  # Windows limit is apparently 50%
-        else:
-            MAX_PINNED_MEMORY = get_total_memory(torch.device("cpu")) * 0.95
-        logging.info("Enabled pinned memory {}".format(MAX_PINNED_MEMORY // (1024 * 1024)))
-
-
-def pin_memory(tensor):
-    global TOTAL_PINNED_MEMORY
-    if MAX_PINNED_MEMORY <= 0:
-        return False
-
-    if type(tensor) is not torch.nn.parameter.Parameter:
-        return False
-
-    if not is_device_cpu(tensor.device):
-        return False
-
-    if tensor.is_pinned():
-        #NOTE: Cuda does detect when a tensor is already pinned and would
-        #error below, but there are proven cases where this also queues an error
-        #on the GPU async. So dont trust the CUDA API and guard here
-        return False
-
-    if not tensor.is_contiguous():
-        return False
-
-    size = tensor.numel() * tensor.element_size()
-    if (TOTAL_PINNED_MEMORY + size) > MAX_PINNED_MEMORY:
-        return False
-
-    ptr = tensor.data_ptr()
-    if torch.cuda.cudart().cudaHostRegister(ptr, size, 1) == 0:
-        PINNED_MEMORY[ptr] = size
-        TOTAL_PINNED_MEMORY += size
-        return True
-
-    return False
-
-def unpin_memory(tensor):
-    global TOTAL_PINNED_MEMORY
-    if MAX_PINNED_MEMORY <= 0:
-        return False
-
-    if not is_device_cpu(tensor.device):
-        return False
-
-    ptr = tensor.data_ptr()
-    size = tensor.numel() * tensor.element_size()
-
-    size_stored = PINNED_MEMORY.get(ptr, None)
-    if size_stored is None:
-        logging.warning("Tried to unpin tensor not pinned by ComfyUI")
-        return False
-
-    if size != size_stored:
-        logging.warning("Size of pinned tensor changed")
-        return False
-
-    if torch.cuda.cudart().cudaHostUnregister(ptr) == 0:
-        TOTAL_PINNED_MEMORY -= PINNED_MEMORY.pop(ptr)
-        if len(PINNED_MEMORY) == 0:
-            TOTAL_PINNED_MEMORY = 0
-        return True
-
-    return False
-
 def sage_attention_enabled():
    return args.use_sage_attention

@@ -1411,7 +1355,7 @@ def should_use_bf16(device=None, model_params=0, prioritize_performance=True, ma

    if is_amd():
        arch = torch.cuda.get_device_properties(device).gcnArchName
-        if any((a in arch) for a in AMD_RDNA2_AND_OLDER_ARCH):  # RDNA2 and older don't support bf16
+        if any((a in arch) for a in ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]):  # RDNA2 and older don't support bf16
            if manual_cast:
                return True
            return False
@@ -1480,8 +1424,34 @@ def soft_empty_cache(force=False):
        torch.cuda.ipc_collect()

 def unload_all_models():
-    free_memory(1e30, get_torch_device())
+    for device in get_all_torch_devices():
+        free_memory(1e30, device)

+def unload_model_and_clones(model: ModelPatcher, unload_additional_models=True, all_devices=False):
+    'Unload only model and its clones - primarily for multigpu cloning purposes.'
+    initial_keep_loaded: list[LoadedModel] = current_loaded_models.copy()
+    additional_models = []
+    if unload_additional_models:
+        additional_models = model.get_nested_additional_models()
+    keep_loaded = []
+    for loaded_model in initial_keep_loaded:
+        if loaded_model.model is not None:
+            if model.clone_base_uuid == loaded_model.model.clone_base_uuid:
+                continue
+            # check additional models if they are a match
+            skip = False
+            for add_model in additional_models:
+                if add_model.clone_base_uuid == loaded_model.model.clone_base_uuid:
+                    skip = True
+                    break
+            if skip:
+                continue
+        keep_loaded.append(loaded_model)
+    if not all_devices:
+        free_memory(1e30, get_torch_device(), keep_loaded)
+    else:
+        for device in get_all_torch_devices():
+            free_memory(1e30, device, keep_loaded)

 #TODO: might be cleaner to put this somewhere else
 import threading
--- a/comfy/model_patcher.py
+++ b/comfy/model_patcher.py
@@ -87,12 +87,15 @@ def set_model_options_pre_cfg_function(model_options, pre_cfg_function, disable_
 def create_model_options_clone(orig_model_options: dict):
    return comfy.patcher_extension.copy_nested_dicts(orig_model_options)

-def create_hook_patches_clone(orig_hook_patches):
+def create_hook_patches_clone(orig_hook_patches, copy_tuples=False):
    new_hook_patches = {}
    for hook_ref in orig_hook_patches:
        new_hook_patches[hook_ref] = {}
        for k in orig_hook_patches[hook_ref]:
            new_hook_patches[hook_ref][k] = orig_hook_patches[hook_ref][k][:]
+            if copy_tuples:
+                for i in range(len(new_hook_patches[hook_ref][k])):
+                    new_hook_patches[hook_ref][k][i] = tuple(new_hook_patches[hook_ref][k][i])
    return new_hook_patches

 def wipe_lowvram_weight(m):
@@ -238,7 +241,6 @@ class ModelPatcher:
        self.force_cast_weights = False
        self.patches_uuid = uuid.uuid4()
        self.parent = None
-        self.pinned = set()

        self.attachments: dict[str] = {}
        self.additional_models: dict[str, list[ModelPatcher]] = {}
@@ -258,6 +260,9 @@ class ModelPatcher:
        self.is_clip = False
        self.hook_mode = comfy.hooks.EnumHookMode.MaxSpeed

+        self.is_multigpu_base_clone = False
+        self.clone_base_uuid = uuid.uuid4()
+
        if not hasattr(self.model, 'model_loaded_weight_memory'):
            self.model.model_loaded_weight_memory = 0

@@ -276,9 +281,6 @@ class ModelPatcher:
        self.size = comfy.model_management.module_size(self.model)
        return self.size

-    def get_ram_usage(self):
-        return self.model_size()
-
    def loaded_size(self):
        return self.model.model_loaded_weight_memory

@@ -298,7 +300,6 @@ class ModelPatcher:
        n.backup = self.backup
        n.object_patches_backup = self.object_patches_backup
        n.parent = self
-        n.pinned = self.pinned

        n.force_cast_weights = self.force_cast_weights

@@ -340,18 +341,92 @@ class ModelPatcher:
        n.is_clip = self.is_clip
        n.hook_mode = self.hook_mode

+        n.is_multigpu_base_clone = self.is_multigpu_base_clone
+        n.clone_base_uuid = self.clone_base_uuid
+
        for callback in self.get_all_callbacks(CallbacksMP.ON_CLONE):
            callback(self, n)
        return n

+    def deepclone_multigpu(self, new_load_device=None, models_cache: dict[uuid.UUID,ModelPatcher]=None):
+        logging.info(f"Creating deepclone of {self.model.__class__.__name__} for {new_load_device if new_load_device else self.load_device}.")
+        comfy.model_management.unload_model_and_clones(self)
+        n = self.clone()
+        # set load device, if present
+        if new_load_device is not None:
+            n.load_device = new_load_device
+        # unlike a normal clone, the backup dicts must not share references with the original;
+        # otherwise, patchers that have deep copies of base models will erroneously influence each other.
+        n.backup = copy.deepcopy(n.backup)
+        n.object_patches_backup = copy.deepcopy(n.object_patches_backup)
+        n.hook_backup = copy.deepcopy(n.hook_backup)
+        n.model = copy.deepcopy(n.model)
+        # multigpu clone should not have multigpu additional_models entry
+        n.remove_additional_models("multigpu")
+        # multigpu_clone all stored additional_models; make sure circular references are properly handled
+        if models_cache is None:
+            models_cache = {}
+        for key, model_list in n.additional_models.items():
+            for i in range(len(model_list)):
+                add_model = n.additional_models[key][i]
+                if add_model.clone_base_uuid not in models_cache:
+                    models_cache[add_model.clone_base_uuid] = add_model.deepclone_multigpu(new_load_device=new_load_device, models_cache=models_cache)
+                n.additional_models[key][i] = models_cache[add_model.clone_base_uuid]
+        for callback in self.get_all_callbacks(CallbacksMP.ON_DEEPCLONE_MULTIGPU):
+            callback(self, n)
+        return n
+
+    def match_multigpu_clones(self):
+        multigpu_models = self.get_additional_models_with_key("multigpu")
+        if len(multigpu_models) > 0:
+            new_multigpu_models = []
+            for mm in multigpu_models:
+                # clone main model, but bring over relevant props from existing multigpu clone
+                n = self.clone()
+                n.load_device = mm.load_device
+                n.backup = mm.backup
+                n.object_patches_backup = mm.object_patches_backup
+                n.hook_backup = mm.hook_backup
+                n.model = mm.model
+                n.is_multigpu_base_clone = mm.is_multigpu_base_clone
+                n.remove_additional_models("multigpu")
+                orig_additional_models: dict[str, list[ModelPatcher]] = comfy.patcher_extension.copy_nested_dicts(n.additional_models)
+                n.additional_models = comfy.patcher_extension.copy_nested_dicts(mm.additional_models)
+                # figure out which additional models are not present in multigpu clone
+                models_cache = {}
+                for mm_add_model in mm.get_additional_models():
+                    models_cache[mm_add_model.clone_base_uuid] = mm_add_model
+                remove_models_uuids = set(list(models_cache.keys()))
+                for key, model_list in orig_additional_models.items():
+                    for orig_add_model in model_list:
+                        if orig_add_model.clone_base_uuid not in models_cache:
+                            models_cache[orig_add_model.clone_base_uuid] = orig_add_model.deepclone_multigpu(new_load_device=n.load_device, models_cache=models_cache)
+                            existing_list = n.get_additional_models_with_key(key)
+                            existing_list.append(models_cache[orig_add_model.clone_base_uuid])
+                            n.set_additional_models(key, existing_list)
+                        if orig_add_model.clone_base_uuid in remove_models_uuids:
+                            remove_models_uuids.remove(orig_add_model.clone_base_uuid)
+                # remove additional models that the main model no longer has
+                for key, model_list in n.additional_models.items():
+                    new_model_list = [x for x in model_list if x.clone_base_uuid not in remove_models_uuids]
+                    n.set_additional_models(key, new_model_list)
+                for callback in self.get_all_callbacks(CallbacksMP.ON_MATCH_MULTIGPU_CLONES):
+                    callback(self, n)
+                new_multigpu_models.append(n)
+            self.set_additional_models("multigpu", new_multigpu_models)
+
    def is_clone(self, other):
        if hasattr(other, 'model') and self.model is other.model:
            return True
        return False

-    def clone_has_same_weights(self, clone: 'ModelPatcher'):
-        if not self.is_clone(clone):
-            return False
+    def clone_has_same_weights(self, clone: ModelPatcher, allow_multigpu=False):
+        if allow_multigpu:
+            if self.clone_base_uuid != clone.clone_base_uuid:
+                return False
+        else:
+            if not self.is_clone(clone):
+                return False

        if self.current_hooks != clone.current_hooks:
            return False
@@ -455,19 +530,6 @@ class ModelPatcher:
    def set_model_post_input_patch(self, patch):
        self.set_model_patch(patch, "post_input")

-    def set_model_rope_options(self, scale_x, shift_x, scale_y, shift_y, scale_t, shift_t, **kwargs):
-        rope_options = self.model_options["transformer_options"].get("rope_options", {})
-        rope_options["scale_x"] = scale_x
-        rope_options["scale_y"] = scale_y
-        rope_options["scale_t"] = scale_t
-
-        rope_options["shift_x"] = shift_x
-        rope_options["shift_y"] = shift_y
-        rope_options["shift_t"] = shift_t
-
-        self.model_options["transformer_options"]["rope_options"] = rope_options
-
-
    def add_object_patch(self, name, obj):
        self.object_patches[name] = obj

@@ -636,21 +698,6 @@ class ModelPatcher:
        else:
            set_func(out_weight, inplace_update=inplace_update, seed=string_to_seed(key))

-    def pin_weight_to_device(self, key):
-        weight, set_func, convert_func = get_key_weight(self.model, key)
-        if comfy.model_management.pin_memory(weight):
-            self.pinned.add(key)
-
-    def unpin_weight(self, key):
-        if key in self.pinned:
-            weight, set_func, convert_func = get_key_weight(self.model, key)
-            comfy.model_management.unpin_memory(weight)
-            self.pinned.remove(key)
-
-    def unpin_all_weights(self):
-        for key in list(self.pinned):
-            self.unpin_weight(key)
-
    def _load_list(self):
        loading = []
        for n, m in self.model.named_modules():
@@ -672,11 +719,9 @@ class ModelPatcher:
            mem_counter = 0
            patch_counter = 0
            lowvram_counter = 0
-            lowvram_mem_counter = 0
            loading = self._load_list()

            load_completely = []
-            offloaded = []
            loading.sort(reverse=True)
            for x in loading:
                n = x[1]
@@ -693,7 +738,6 @@ class ModelPatcher:
                    if mem_counter + module_mem >= lowvram_model_memory:
                        lowvram_weight = True
                        lowvram_counter += 1
-                        lowvram_mem_counter += module_mem
                        if hasattr(m, "prev_comfy_cast_weights"): #Already lowvramed
                            continue

@@ -719,7 +763,6 @@ class ModelPatcher:
                            patch_counter += 1

                    cast_weight = True
-                    offloaded.append((module_mem, n, m, params))
                else:
                    if hasattr(m, "comfy_cast_weights"):
                        wipe_lowvram_weight(m)
@@ -750,9 +793,7 @@ class ModelPatcher:
                        continue

                for param in params:
-                    key = "{}.{}".format(n, param)
-                    self.unpin_weight(key)
-                    self.patch_weight_to_device(key, device_to=device_to)
+                    self.patch_weight_to_device("{}.{}".format(n, param), device_to=device_to)

                logging.debug("lowvram: loaded module regularly {} {}".format(n, m))
                m.comfy_patched_weights = True
@@ -760,17 +801,11 @@ class ModelPatcher:
            for x in load_completely:
                x[2].to(device_to)

-            for x in offloaded:
-                n = x[1]
-                params = x[3]
-                for param in params:
-                    self.pin_weight_to_device("{}.{}".format(n, param))
-
            if lowvram_counter > 0:
-                logging.info("loaded partially; {:.2f} MB usable, {:.2f} MB loaded, {:.2f} MB offloaded, lowvram patches: {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), lowvram_mem_counter / (1024 * 1024), patch_counter))
+                logging.info("loaded partially {} {} {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), patch_counter))
                self.model.model_lowvram = True
            else:
-                logging.info("loaded completely; {:.2f} MB usable, {:.2f} MB loaded, full load: {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), full_load))
+                logging.info("loaded completely {} {} {}".format(lowvram_model_memory / (1024 * 1024), mem_counter / (1024 * 1024), full_load))
                self.model.model_lowvram = False
                if full_load:
                    self.model.to(device_to)
@@ -807,7 +842,6 @@ class ModelPatcher:
        self.eject_model()
        if unpatch_weights:
            self.unpatch_hooks()
-            self.unpin_all_weights()
            if self.model.model_lowvram:
                for m in self.model.modules():
                    move_weight_functions(m, device_to)
@@ -843,7 +877,7 @@ class ModelPatcher:

        self.object_patches_backup.clear()

-    def partially_unload(self, device_to, memory_to_free=0, force_patch_weights=False):
+    def partially_unload(self, device_to, memory_to_free=0):
        with self.use_ejected():
            hooks_unpatched = False
            memory_freed = 0
@@ -887,19 +921,13 @@ class ModelPatcher:
                        module_mem += move_weight_functions(m, device_to)
                        if lowvram_possible:
                            if weight_key in self.patches:
-                                if force_patch_weights:
-                                    self.patch_weight_to_device(weight_key)
-                                else:
-                                    _, set_func, convert_func = get_key_weight(self.model, weight_key)
-                                    m.weight_function.append(LowVramPatch(weight_key, self.patches, convert_func, set_func))
-                                    patch_counter += 1
+                                _, set_func, convert_func = get_key_weight(self.model, weight_key)
+                                m.weight_function.append(LowVramPatch(weight_key, self.patches, convert_func, set_func))
+                                patch_counter += 1
                            if bias_key in self.patches:
-                                if force_patch_weights:
-                                    self.patch_weight_to_device(bias_key)
-                                else:
-                                    _, set_func, convert_func = get_key_weight(self.model, bias_key)
-                                    m.bias_function.append(LowVramPatch(bias_key, self.patches, convert_func, set_func))
-                                    patch_counter += 1
+                                _, set_func, convert_func = get_key_weight(self.model, bias_key)
+                                m.bias_function.append(LowVramPatch(bias_key, self.patches, convert_func, set_func))
+                                patch_counter += 1
                            cast_weight = True

                        if cast_weight:
@@ -909,13 +937,9 @@ class ModelPatcher:
                        memory_freed += module_mem
                        logging.debug("freed {}".format(n))

-                        for param in params:
-                            self.pin_weight_to_device("{}.{}".format(n, param))
-
            self.model.model_lowvram = True
            self.model.lowvram_patch_counter += patch_counter
            self.model.model_loaded_weight_memory -= memory_freed
-            logging.info("loaded partially: {:.2f} MB loaded, lowvram patches: {}".format(self.model.model_loaded_weight_memory / (1024 * 1024), self.model.lowvram_patch_counter))
            return memory_freed

    def partially_load(self, device_to, extra_memory=0, force_patch_weights=False):
@@ -928,9 +952,6 @@ class ModelPatcher:
                extra_memory += (used - self.model.model_loaded_weight_memory)

            self.patch_model(load_weights=False)
-            if extra_memory < 0 and not unpatch_weights:
-                self.partially_unload(self.offload_device, -extra_memory, force_patch_weights=force_patch_weights)
-                return 0
            full_load = False
            if self.model.model_lowvram == False and self.model.model_loaded_weight_memory > 0:
                self.apply_hooks(self.forced_hooks, force_apply=True)
@@ -1042,7 +1063,7 @@ class ModelPatcher:
        return self.additional_models.get(key, [])

    def get_additional_models(self):
-        all_models = []
+        all_models: list[ModelPatcher] = []
        for models in self.additional_models.values():
            all_models.extend(models)
        return all_models
@@ -1096,9 +1117,13 @@ class ModelPatcher:
        for callback in self.get_all_callbacks(CallbacksMP.ON_PRE_RUN):
            callback(self)

-    def prepare_state(self, timestep):
+    def prepare_state(self, timestep, model_options, ignore_multigpu=False):
        for callback in self.get_all_callbacks(CallbacksMP.ON_PREPARE_STATE):
-            callback(self, timestep)
+            callback(self, timestep, model_options, ignore_multigpu)
+        if not ignore_multigpu and "multigpu_clones" in model_options:
+            for p in model_options["multigpu_clones"].values():
+                p: ModelPatcher
+                p.prepare_state(timestep, model_options, ignore_multigpu=True)

    def restore_hook_patches(self):
        if self.hook_patches_backup is not None:
@@ -1111,12 +1136,18 @@ class ModelPatcher:
    def prepare_hook_patches_current_keyframe(self, t: torch.Tensor, hook_group: comfy.hooks.HookGroup, model_options: dict[str]):
        curr_t = t[0]
        reset_current_hooks = False
+        multigpu_kf_changed_cache = None
        transformer_options = model_options.get("transformer_options", {})
        for hook in hook_group.hooks:
            changed = hook.hook_keyframe.prepare_current_keyframe(curr_t=curr_t, transformer_options=transformer_options)
            # if keyframe changed, remove any cached HookGroups that contain hook with the same hook_ref;
            # this will cause the weights to be recalculated when sampling
            if changed:
+                # cache changed hooks for multigpu usage
+                if "multigpu_clones" in model_options:
+                    if multigpu_kf_changed_cache is None:
+                        multigpu_kf_changed_cache = []
+                    multigpu_kf_changed_cache.append(hook)
                # reset current_hooks if contains hook that changed
                if self.current_hooks is not None:
                    for current_hook in self.current_hooks.hooks:
@@ -1128,6 +1159,28 @@ class ModelPatcher:
                        self.cached_hook_patches.pop(cached_group)
        if reset_current_hooks:
            self.patch_hooks(None)
+        if "multigpu_clones" in model_options:
+            for p in model_options["multigpu_clones"].values():
+                p: ModelPatcher
+                p._handle_changed_hook_keyframes(multigpu_kf_changed_cache)
+
+    def _handle_changed_hook_keyframes(self, kf_changed_cache: list[comfy.hooks.Hook]):
+        'Used to handle multigpu behavior inside prepare_hook_patches_current_keyframe.'
+        if kf_changed_cache is None:
+            return
+        reset_current_hooks = False
+        # reset current_hooks if contains hook that changed
+        for hook in kf_changed_cache:
+            if self.current_hooks is not None:
+                for current_hook in self.current_hooks.hooks:
+                    if current_hook == hook:
+                        reset_current_hooks = True
+                        break
+            for cached_group in list(self.cached_hook_patches.keys()):
+                if cached_group.contains(hook):
+                    self.cached_hook_patches.pop(cached_group)
+        if reset_current_hooks:
+            self.patch_hooks(None)

    def register_all_hook_patches(self, hooks: comfy.hooks.HookGroup, target_dict: dict[str], model_options: dict=None,
                                  registered: comfy.hooks.HookGroup = None):
@@ -1318,6 +1371,5 @@ class ModelPatcher:
        self.clear_cached_hook_weights()

    def __del__(self):
-        self.unpin_all_weights()
        self.detach(unpatch_all=False)

--- a/comfy/multigpu.py
+++ b/comfy/multigpu.py
@@ -0,0 +1,167 @@
+from __future__ import annotations
+import torch
+import logging
+
+from collections import namedtuple
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from comfy.model_patcher import ModelPatcher
+import comfy.utils
+import comfy.patcher_extension
+import comfy.model_management
+
+
+class GPUOptions:
+    def __init__(self, device_index: int, relative_speed: float):
+        self.device_index = device_index
+        self.relative_speed = relative_speed
+
+    def clone(self):
+        return GPUOptions(self.device_index, self.relative_speed)
+
+    def create_dict(self):
+        return {
+            "relative_speed": self.relative_speed
+        }
+
+class GPUOptionsGroup:
+    def __init__(self):
+        self.options: dict[int, GPUOptions] = {}
+
+    def add(self, info: GPUOptions):
+        self.options[info.device_index] = info
+
+    def clone(self):
+        c = GPUOptionsGroup()
+        for opt in self.options.values():
+            c.add(opt)
+        return c
+
+    def register(self, model: ModelPatcher):
+        opts_dict = {}
+        # get devices that are valid for this model
+        devices: list[torch.device] = [model.load_device]
+        for extra_model in model.get_additional_models_with_key("multigpu"):
+            extra_model: ModelPatcher
+            devices.append(extra_model.load_device)
+        # create dictionary with actual device mapped to its GPUOptions
+        device_opts_list: list[GPUOptions] = []
+        for device in devices:
+            device_opts = self.options.get(device.index, GPUOptions(device_index=device.index, relative_speed=1.0))
+            opts_dict[device] = device_opts.create_dict()
+            device_opts_list.append(device_opts)
+        # normalize relative_speed so that the slowest device is 1.0
+        min_speed = min([x.relative_speed for x in device_opts_list])
+        for value in opts_dict.values():
+            value['relative_speed'] /= min_speed
+        model.model_options['multigpu_options'] = opts_dict
+
+
+def create_multigpu_deepclones(model: ModelPatcher, max_gpus: int, gpu_options: GPUOptionsGroup=None, reuse_loaded=False):
+    'Prepare ModelPatcher to contain deepclones of its BaseModel and related properties.'
+    model = model.clone()
+    # if multigpu clones already exist, collect their load devices so they can be excluded
+    skip_devices = set()
+    multigpu_models = model.get_additional_models_with_key("multigpu")
+    if len(multigpu_models) > 0:
+        for mm in multigpu_models:
+            skip_devices.add(mm.load_device)
+    skip_devices = list(skip_devices)
+
+    full_extra_devices = comfy.model_management.get_all_torch_devices(exclude_current=True)
+    limit_extra_devices = full_extra_devices[:max_gpus-1]
+    extra_devices = limit_extra_devices.copy()
+    # exclude skipped devices
+    for skip in skip_devices:
+        if skip in extra_devices:
+            extra_devices.remove(skip)
+    # create new deepclones
+    if len(extra_devices) > 0:
+        for device in extra_devices:
+            device_patcher = None
+            if reuse_loaded:
+                # check if there are any ModelPatchers currently loaded that could be referenced here after a clone
+                loaded_models: list[ModelPatcher] = comfy.model_management.loaded_models()
+                for lm in loaded_models:
+                    if lm.model is not None and lm.clone_base_uuid == model.clone_base_uuid and lm.load_device == device:
+                        device_patcher = lm.clone()
+                        logging.info(f"Reusing loaded deepclone of {device_patcher.model.__class__.__name__} for {device}")
+                        break
+            if device_patcher is None:
+                device_patcher = model.deepclone_multigpu(new_load_device=device)
+                device_patcher.is_multigpu_base_clone = True
+            multigpu_models = model.get_additional_models_with_key("multigpu")
+            multigpu_models.append(device_patcher)
+            model.set_additional_models("multigpu", multigpu_models)
+        model.match_multigpu_clones()
+        if gpu_options is None:
+            gpu_options = GPUOptionsGroup()
+        gpu_options.register(model)
+    else:
+        logging.info("No extra torch devices need initialization, skipping initializing MultiGPU Work Units.")
+    # TODO: only keep model clones that don't go 'past' the intended max_gpu count
+    # multigpu_models = model.get_additional_models_with_key("multigpu")
+    # new_multigpu_models = []
+    # for m in multigpu_models:
+    #     if m.load_device in limit_extra_devices:
+    #         new_multigpu_models.append(m)
+    # model.set_additional_models("multigpu", new_multigpu_models)
+    # persist skip_devices for use in sampling code
+    # if len(skip_devices) > 0 or "multigpu_skip_devices" in model.model_options:
+    #     model.model_options["multigpu_skip_devices"] = skip_devices
+    return model
+
+
+LoadBalance = namedtuple('LoadBalance', ['work_per_device', 'idle_time'])
+def load_balance_devices(model_options: dict[str], total_work: int, return_idle_time=False, work_normalized: int=None):
+    'Optimize work assigned to different devices, accounting for their relative speeds and splittable work.'
+    opts_dict = model_options['multigpu_options']
+    devices = list(model_options['multigpu_clones'].keys())
+    speed_per_device = []
+    work_per_device = []
+    # get sum of each device's relative_speed
+    total_speed = 0.0
+    for opts in opts_dict.values():
+        total_speed += opts['relative_speed']
+    # get relative work for each device;
+    # obtained by w = (W*r)/R, where W is total_work, r the device's relative_speed, and R the summed total_speed
+    for device in devices:
+        relative_speed = opts_dict[device]['relative_speed']
+        relative_work = (total_work*relative_speed) / total_speed
+        speed_per_device.append(relative_speed)
+        work_per_device.append(relative_work)
+    # relative work must be expressed in whole numbers, but likely is a decimal;
+    # perform rounding while maintaining total sum equal to total work (sum of relative works)
+    work_per_device = round_preserved(work_per_device)
+    dict_work_per_device = {}
+    for device, relative_work in zip(devices, work_per_device):
+        dict_work_per_device[device] = relative_work
+    if not return_idle_time:
+        return LoadBalance(dict_work_per_device, None)
+    # divide relative work by relative speed to get estimated completion time of said work by each device;
+    # time here is relative and does not correspond to real-world units
+    completion_time = [w/r for w,r in zip(work_per_device, speed_per_device)]
+    # calculate relative time spent by the devices waiting on each other after their work is completed
+    idle_time = abs(min(completion_time) - max(completion_time))
+    # to compare idle times across different amounts of total work, normalize to a common total work
+    if work_normalized:
+        idle_time *= (work_normalized/total_work)
+
+    return LoadBalance(dict_work_per_device, idle_time)
+
+def round_preserved(values: list[float]):
+    'Round all values in a list, preserving the combined sum of values.'
+    # get floor of values; int() truncates toward zero, which matches floor for the non-negative values expected here
+    floored = [int(x) for x in values]
+    total_floored = sum(floored)
+    # get remainder to distribute
+    remainder = round(sum(values)) - total_floored
+    # pair values with fractional portions
+    fractional = [(i, x-floored[i]) for i, x in enumerate(values)]
+    # sort by fractional part in descending order
+    fractional.sort(key=lambda x: x[1], reverse=True)
+    # distribute the remainder
+    for i in range(remainder):
+        index = fractional[i][0]
+        floored[index] += 1
+    return floored
--- a/comfy/nested_tensor.py
+++ b/comfy/nested_tensor.py
@@ -1,91 +0,0 @@
-import torch
-
-class NestedTensor:
-    def __init__(self, tensors):
-        self.tensors = list(tensors)
-        self.is_nested = True
-
-    def _copy(self):
-        return NestedTensor(self.tensors)
-
-    def apply_operation(self, other, operation):
-        o = self._copy()
-        if isinstance(other, NestedTensor):
-            for i, t in enumerate(o.tensors):
-                o.tensors[i] = operation(t, other.tensors[i])
-        else:
-            for i, t in enumerate(o.tensors):
-                o.tensors[i] = operation(t, other)
-        return o
-
-    def __add__(self, b):
-        return self.apply_operation(b, lambda x, y: x + y)
-
-    def __sub__(self, b):
-        return self.apply_operation(b, lambda x, y: x - y)
-
-    def __mul__(self, b):
-        return self.apply_operation(b, lambda x, y: x * y)
-
-    # def __itruediv__(self, b):
-    #     return self.apply_operation(b, lambda x, y: x / y)
-
-    def __truediv__(self, b):
-        return self.apply_operation(b, lambda x, y: x / y)
-
-    def __getitem__(self, *args, **kwargs):
-        return self.apply_operation(None, lambda x, y: x.__getitem__(*args, **kwargs))
-
-    def unbind(self):
-        return self.tensors
-
-    def to(self, *args, **kwargs):
-        o = self._copy()
-        for i, t in enumerate(o.tensors):
-            o.tensors[i] = t.to(*args, **kwargs)
-        return o
-
-    def new_ones(self, *args, **kwargs):
-        return self.tensors[0].new_ones(*args, **kwargs)
-
-    def float(self):
-        return self.to(dtype=torch.float)
-
-    def chunk(self, *args, **kwargs):
-        return self.apply_operation(None, lambda x, y: x.chunk(*args, **kwargs))
-
-    def size(self):
-        return self.tensors[0].size()
-
-    @property
-    def shape(self):
-        return self.tensors[0].shape
-
-    @property
-    def ndim(self):
-        dims = 0
-        for t in self.tensors:
-            dims = max(t.ndim, dims)
-        return dims
-
-    @property
-    def device(self):
-        return self.tensors[0].device
-
-    @property
-    def dtype(self):
-        return self.tensors[0].dtype
-
-    @property
-    def layout(self):
-        return self.tensors[0].layout
-
-
-def cat_nested(tensors, *args, **kwargs):
-    cated_tensors = []
-    for i in range(len(tensors[0].tensors)):
-        tens = []
-        for j in range(len(tensors)):
-            tens.append(tensors[j].tensors[i])
-        cated_tensors.append(torch.cat(tens, *args, **kwargs))
-    return NestedTensor(cated_tensors)
--- a/comfy/ops.py
+++ b/comfy/ops.py
@@ -25,9 +25,6 @@ import comfy.rmsnorm
 import contextlib

 def run_every_op():
-    if torch.compiler.is_compiling():
-        return
-
    comfy.model_management.throw_exception_if_processing_interrupted()

 def scaled_dot_product_attention(q, k, v, *args, **kwargs):
@@ -35,7 +32,7 @@ def scaled_dot_product_attention(q, k, v, *args, **kwargs):


 try:
-    if torch.cuda.is_available() and comfy.model_management.WINDOWS:
+    if torch.cuda.is_available():
        from torch.nn.attention import SDPBackend, sdpa_kernel
        import inspect
        if "set_priority" in inspect.signature(sdpa_kernel).parameters:
@@ -55,90 +52,49 @@ try:
 except (ModuleNotFoundError, TypeError):
    logging.warning("Could not set sdpa backend priority.")

-NVIDIA_MEMORY_CONV_BUG_WORKAROUND = False
-try:
-    if comfy.model_management.is_nvidia():
-        cudnn_version = torch.backends.cudnn.version()
-        if (cudnn_version >= 91002 and cudnn_version < 91500) and comfy.model_management.torch_version_numeric >= (2, 9) and comfy.model_management.torch_version_numeric <= (2, 10):
-            #TODO: change upper bound version once it's fixed'
-            NVIDIA_MEMORY_CONV_BUG_WORKAROUND = True
-            logging.info("working around nvidia conv3d memory bug.")
-except:
-    pass
-
 cast_to = comfy.model_management.cast_to #TODO: remove once no more references

+if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
+    torch.backends.cudnn.benchmark = True
+
 def cast_to_input(weight, input, non_blocking=False, copy=True):
    return comfy.model_management.cast_to(weight, input.dtype, input.device, non_blocking=non_blocking, copy=copy)

-
-def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None, offloadable=False):
-    # NOTE: offloadable=False is a a legacy and if you are a custom node author reading this please pass
-    # offloadable=True and call uncast_bias_weight() after your last usage of the weight/bias. This
-    # will add async-offload support to your cast and improve performance.
+def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None):
    if input is not None:
        if dtype is None:
-            if isinstance(input, QuantizedTensor):
-                dtype = input._layout_params["orig_dtype"]
-            else:
-                dtype = input.dtype
+            dtype = input.dtype
        if bias_dtype is None:
            bias_dtype = dtype
        if device is None:
            device = input.device

-    if offloadable and (device != s.weight.device or
-                        (s.bias is not None and device != s.bias.device)):
-        offload_stream = comfy.model_management.get_offload_stream(device)
-    else:
-        offload_stream = None
-
+    offload_stream = comfy.model_management.get_offload_stream(device)
    if offload_stream is not None:
        wf_context = offload_stream
    else:
        wf_context = contextlib.nullcontext()

-    non_blocking = comfy.model_management.device_supports_non_blocking(device)
-
-    weight_has_function = len(s.weight_function) > 0
-    bias_has_function = len(s.bias_function) > 0
-
-    weight = comfy.model_management.cast_to(s.weight, None, device, non_blocking=non_blocking, copy=weight_has_function, stream=offload_stream)
-
    bias = None
+    non_blocking = comfy.model_management.device_supports_non_blocking(device)
    if s.bias is not None:
-        bias = comfy.model_management.cast_to(s.bias, bias_dtype, device, non_blocking=non_blocking, copy=bias_has_function, stream=offload_stream)
+        has_function = len(s.bias_function) > 0
+        bias = comfy.model_management.cast_to(s.bias, bias_dtype, device, non_blocking=non_blocking, copy=has_function, stream=offload_stream)

-        if bias_has_function:
+        if has_function:
            with wf_context:
                for f in s.bias_function:
                    bias = f(bias)

-    if weight_has_function or weight.dtype != dtype:
+    has_function = len(s.weight_function) > 0
+    weight = comfy.model_management.cast_to(s.weight, dtype, device, non_blocking=non_blocking, copy=has_function, stream=offload_stream)
+    if has_function:
        with wf_context:
-            weight = weight.to(dtype=dtype)
            for f in s.weight_function:
                weight = f(weight)

    comfy.model_management.sync_stream(device, offload_stream)
-    if offloadable:
-        return weight, bias, offload_stream
-    else:
-        #Legacy function signature
-        return weight, bias
-
-
-def uncast_bias_weight(s, weight, bias, offload_stream):
-    if offload_stream is None:
-        return
-    if weight is not None:
-        device = weight.device
-    else:
-        if bias is None:
-            return
-        device = bias.device
-    offload_stream.wait_stream(comfy.model_management.current_stream(device))
-
+    return weight, bias

 class CastWeightBiasOp:
    comfy_cast_weights = False
@@ -151,10 +107,8 @@ class disable_weight_init:
            return None

        def forward_comfy_cast_weights(self, input):
-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = torch.nn.functional.linear(input, weight, bias)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            weight, bias = cast_bias_weight(self, input)
+            return torch.nn.functional.linear(input, weight, bias)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -168,10 +122,8 @@ class disable_weight_init:
            return None

        def forward_comfy_cast_weights(self, input):
-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = self._conv_forward(input, weight, bias)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            weight, bias = cast_bias_weight(self, input)
+            return self._conv_forward(input, weight, bias)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -185,10 +137,8 @@ class disable_weight_init:
            return None

        def forward_comfy_cast_weights(self, input):
-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = self._conv_forward(input, weight, bias)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            weight, bias = cast_bias_weight(self, input)
+            return self._conv_forward(input, weight, bias)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -201,20 +151,9 @@ class disable_weight_init:
        def reset_parameters(self):
            return None

-        def _conv_forward(self, input, weight, bias, *args, **kwargs):
-            if NVIDIA_MEMORY_CONV_BUG_WORKAROUND and weight.dtype in (torch.float16, torch.bfloat16):
-                out = torch.cudnn_convolution(input, weight, self.padding, self.stride, self.dilation, self.groups, benchmark=False, deterministic=False, allow_tf32=True)
-                if bias is not None:
-                    out += bias.reshape((1, -1) + (1,) * (out.ndim - 2))
-                return out
-            else:
-                return super()._conv_forward(input, weight, bias, *args, **kwargs)
-
        def forward_comfy_cast_weights(self, input):
-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = self._conv_forward(input, weight, bias)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            weight, bias = cast_bias_weight(self, input)
+            return self._conv_forward(input, weight, bias)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -228,10 +167,8 @@ class disable_weight_init:
            return None

        def forward_comfy_cast_weights(self, input):
-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = torch.nn.functional.group_norm(input, self.num_groups, weight, bias, self.eps)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            weight, bias = cast_bias_weight(self, input)
+            return torch.nn.functional.group_norm(input, self.num_groups, weight, bias, self.eps)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -246,14 +183,11 @@ class disable_weight_init:

        def forward_comfy_cast_weights(self, input):
            if self.weight is not None:
-                weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
+                weight, bias = cast_bias_weight(self, input)
            else:
                weight = None
                bias = None
-                offload_stream = None
-            x = torch.nn.functional.layer_norm(input, self.normalized_shape, weight, bias, self.eps)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            return torch.nn.functional.layer_norm(input, self.normalized_shape, weight, bias, self.eps)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -269,15 +203,11 @@ class disable_weight_init:

        def forward_comfy_cast_weights(self, input):
            if self.weight is not None:
-                weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
+                weight, bias = cast_bias_weight(self, input)
            else:
                weight = None
-                bias = None
-                offload_stream = None
-            x = comfy.rmsnorm.rms_norm(input, weight, self.eps)  # TODO: switch to commented out line when old torch is deprecated
-            # x = torch.nn.functional.rms_norm(input, self.normalized_shape, weight, self.eps)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            return comfy.rmsnorm.rms_norm(input, weight, self.eps)  # TODO: switch to commented out line when old torch is deprecated
+            # return torch.nn.functional.rms_norm(input, self.normalized_shape, weight, self.eps)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -296,12 +226,10 @@ class disable_weight_init:
                input, output_size, self.stride, self.padding, self.kernel_size,
                num_spatial_dims, self.dilation)

-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = torch.nn.functional.conv_transpose2d(
+            weight, bias = cast_bias_weight(self, input)
+            return torch.nn.functional.conv_transpose2d(
                input, weight, bias, self.stride, self.padding,
                output_padding, self.groups, self.dilation)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -320,12 +248,10 @@ class disable_weight_init:
                input, output_size, self.stride, self.padding, self.kernel_size,
                num_spatial_dims, self.dilation)

-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = torch.nn.functional.conv_transpose1d(
+            weight, bias = cast_bias_weight(self, input)
+            return torch.nn.functional.conv_transpose1d(
                input, weight, bias, self.stride, self.padding,
                output_padding, self.groups, self.dilation)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -343,11 +269,8 @@ class disable_weight_init:
            output_dtype = out_dtype
            if self.weight.dtype == torch.float16 or self.weight.dtype == torch.bfloat16:
                out_dtype = None
-            weight, bias, offload_stream = cast_bias_weight(self, device=input.device, dtype=out_dtype, offloadable=True)
-            x = torch.nn.functional.embedding(input, weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse).to(dtype=output_dtype)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
-
+            weight, bias = cast_bias_weight(self, device=input.device, dtype=out_dtype)
+            return torch.nn.functional.embedding(input, weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse).to(dtype=output_dtype)

        def forward(self, *args, **kwargs):
            run_every_op()
@@ -401,18 +324,20 @@ class manual_cast(disable_weight_init):


 def fp8_linear(self, input):
-    """
-    Legacy FP8 linear function for backward compatibility.
-    Uses QuantizedTensor subclass for dispatch.
-    """
    dtype = self.weight.dtype
    if dtype not in [torch.float8_e4m3fn]:
        return None

-    input_dtype = input.dtype
+    tensor_2d = False
+    if len(input.shape) == 2:
+        tensor_2d = True
+        input = input.unsqueeze(1)

-    if input.ndim == 3 or input.ndim == 2:
-        w, bias, offload_stream = cast_bias_weight(self, input, dtype=dtype, bias_dtype=input_dtype, offloadable=True)
+    input_shape = input.shape
+    input_dtype = input.dtype
+    if len(input.shape) == 3:
+        w, bias = cast_bias_weight(self, input, dtype=dtype, bias_dtype=input_dtype)
+        w = w.t()

        scale_weight = self.scale_weight
        scale_input = self.scale_input
@@ -424,20 +349,23 @@ def fp8_linear(self, input):
        if scale_input is None:
            scale_input = torch.ones((), device=input.device, dtype=torch.float32)
            input = torch.clamp(input, min=-448, max=448, out=input)
-            layout_params_weight = {'scale': scale_input, 'orig_dtype': input_dtype}
-            quantized_input = QuantizedTensor(input.to(dtype).contiguous(), "TensorCoreFP8Layout", layout_params_weight)
+            input = input.reshape(-1, input_shape[2]).to(dtype).contiguous()
        else:
            scale_input = scale_input.to(input.device)
-            quantized_input = QuantizedTensor.from_float(input, "TensorCoreFP8Layout", scale=scale_input, dtype=dtype)
+            input = (input * (1.0 / scale_input).to(input_dtype)).reshape(-1, input_shape[2]).to(dtype).contiguous()

-        # Wrap weight in QuantizedTensor - this enables unified dispatch
-        # Call F.linear - __torch_dispatch__ routes to fp8_linear handler in quant_ops.py!
-        layout_params_weight = {'scale': scale_weight, 'orig_dtype': input_dtype}
-        quantized_weight = QuantizedTensor(w, "TensorCoreFP8Layout", layout_params_weight)
-        o = torch.nn.functional.linear(quantized_input, quantized_weight, bias)
+        if bias is not None:
+            o = torch._scaled_mm(input, w, out_dtype=input_dtype, bias=bias, scale_a=scale_input, scale_b=scale_weight)
+        else:
+            o = torch._scaled_mm(input, w, out_dtype=input_dtype, scale_a=scale_input, scale_b=scale_weight)

-        uncast_bias_weight(self, w, bias, offload_stream)
-        return o
+        if isinstance(o, tuple):
+            o = o[0]
+
+        if tensor_2d:
+            return o.reshape(input_shape[0], -1)
+
+        return o.reshape((-1, input_shape[1], self.weight.shape[0]))

    return None

@@ -457,10 +385,8 @@ class fp8_ops(manual_cast):
                except Exception as e:
                    logging.info("Exception during fp8 op: {}".format(e))

-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = torch.nn.functional.linear(input, weight, bias)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
+            weight, bias = cast_bias_weight(self, input)
+            return torch.nn.functional.linear(input, weight, bias)

 def scaled_fp8_ops(fp8_matrix_mult=False, scale_input=False, override_dtype=None):
    logging.info("Using scaled fp8: fp8 matrix mult: {}, scale input: {}".format(fp8_matrix_mult, scale_input))
@@ -488,14 +414,12 @@ def scaled_fp8_ops(fp8_matrix_mult=False, scale_input=False, override_dtype=None
                    if out is not None:
                        return out

-                weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
+                weight, bias = cast_bias_weight(self, input)

                if weight.numel() < input.numel(): #TODO: optimize
-                    x = torch.nn.functional.linear(input, weight * self.scale_weight.to(device=weight.device, dtype=weight.dtype), bias)
+                    return torch.nn.functional.linear(input, weight * self.scale_weight.to(device=weight.device, dtype=weight.dtype), bias)
                else:
-                    x = torch.nn.functional.linear(input * self.scale_weight.to(device=weight.device, dtype=weight.dtype), weight, bias)
-                uncast_bias_weight(self, weight, bias, offload_stream)
-                return x
+                    return torch.nn.functional.linear(input * self.scale_weight.to(device=weight.device, dtype=weight.dtype), weight, bias)

            def convert_weight(self, weight, inplace=False, **kwargs):
                if inplace:
@@ -534,120 +458,7 @@ if CUBLAS_IS_AVAILABLE:
            def forward(self, *args, **kwargs):
                return super().forward(*args, **kwargs)

-
-# ==============================================================================
-# Mixed Precision Operations
-# ==============================================================================
-from .quant_ops import QuantizedTensor, QUANT_ALGOS
-
-class MixedPrecisionOps(disable_weight_init):
-    _layer_quant_config = {}
-    _compute_dtype = torch.bfloat16
-
-    class Linear(torch.nn.Module, CastWeightBiasOp):
-        def __init__(
-            self,
-            in_features: int,
-            out_features: int,
-            bias: bool = True,
-            device=None,
-            dtype=None,
-        ) -> None:
-            super().__init__()
-
-            self.factory_kwargs = {"device": device, "dtype": MixedPrecisionOps._compute_dtype}
-            # self.factory_kwargs = {"device": device, "dtype": dtype}
-
-            self.in_features = in_features
-            self.out_features = out_features
-            if bias:
-                self.bias = torch.nn.Parameter(torch.empty(out_features, **self.factory_kwargs))
-            else:
-                self.register_parameter("bias", None)
-
-            self.tensor_class = None
-
-        def reset_parameters(self):
-            return None
-
-        def _load_from_state_dict(self, state_dict, prefix, local_metadata,
-                                  strict, missing_keys, unexpected_keys, error_msgs):
-
-            device = self.factory_kwargs["device"]
-            layer_name = prefix.rstrip('.')
-            weight_key = f"{prefix}weight"
-            weight = state_dict.pop(weight_key, None)
-            if weight is None:
-                raise ValueError(f"Missing weight for layer {layer_name}")
-
-            manually_loaded_keys = [weight_key]
-
-            if layer_name not in MixedPrecisionOps._layer_quant_config:
-                self.weight = torch.nn.Parameter(weight.to(device=device, dtype=MixedPrecisionOps._compute_dtype), requires_grad=False)
-            else:
-                quant_format = MixedPrecisionOps._layer_quant_config[layer_name].get("format", None)
-                if quant_format is None:
-                    raise ValueError(f"Unknown quantization format for layer {layer_name}")
-
-                qconfig = QUANT_ALGOS[quant_format]
-                self.layout_type = qconfig["comfy_tensor_layout"]
-
-                weight_scale_key = f"{prefix}weight_scale"
-                layout_params = {
-                    'scale': state_dict.pop(weight_scale_key, None),
-                    'orig_dtype': MixedPrecisionOps._compute_dtype,
-                    'block_size': qconfig.get("group_size", None),
-                }
-                if layout_params['scale'] is not None:
-                    manually_loaded_keys.append(weight_scale_key)
-
-                self.weight = torch.nn.Parameter(
-                    QuantizedTensor(weight.to(device=device), self.layout_type, layout_params),
-                    requires_grad=False
-                )
-
-                for param_name in qconfig["parameters"]:
-                    param_key = f"{prefix}{param_name}"
-                    _v = state_dict.pop(param_key, None)
-                    if _v is None:
-                        continue
-                    setattr(self, param_name, torch.nn.Parameter(_v.to(device=device), requires_grad=False))
-                    manually_loaded_keys.append(param_key)
-
-            super()._load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
-
-            for key in manually_loaded_keys:
-                if key in missing_keys:
-                    missing_keys.remove(key)
-
-        def _forward(self, input, weight, bias):
-            return torch.nn.functional.linear(input, weight, bias)
-
-        def forward_comfy_cast_weights(self, input):
-            weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
-            x = self._forward(input, weight, bias)
-            uncast_bias_weight(self, weight, bias, offload_stream)
-            return x
-
-        def forward(self, input, *args, **kwargs):
-            run_every_op()
-
-            if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
-                return self.forward_comfy_cast_weights(input, *args, **kwargs)
-            if (getattr(self, 'layout_type', None) is not None and
-                getattr(self, 'input_scale', None) is not None and
-                not isinstance(input, QuantizedTensor)):
-                input = QuantizedTensor.from_float(input, self.layout_type, scale=self.input_scale, dtype=self.weight.dtype)
-            return self._forward(input, self.weight, self.bias)
-
-
-def pick_operations(weight_dtype, compute_dtype, load_device=None, disable_fast_fp8=False, fp8_optimizations=False, scaled_fp8=None, model_config=None):
-    if model_config and hasattr(model_config, 'layer_quant_config') and model_config.layer_quant_config:
-        MixedPrecisionOps._layer_quant_config = model_config.layer_quant_config
-        MixedPrecisionOps._compute_dtype = compute_dtype
-        logging.info(f"Using mixed precision operations: {len(model_config.layer_quant_config)} quantized layers")
-        return MixedPrecisionOps
-
+def pick_operations(weight_dtype, compute_dtype, load_device=None, disable_fast_fp8=False, fp8_optimizations=False, scaled_fp8=None):
    fp8_compute = comfy.model_management.supports_fp8_compute(load_device)
    if scaled_fp8 is not None:
        return scaled_fp8_ops(fp8_matrix_mult=fp8_compute and fp8_optimizations, scale_input=fp8_optimizations, override_dtype=scaled_fp8)
--- a/comfy/patcher_extension.py
+++ b/comfy/patcher_extension.py
@@ -3,6 +3,8 @@ from typing import Callable

 class CallbacksMP:
    ON_CLONE = "on_clone"
+    ON_DEEPCLONE_MULTIGPU = "on_deepclone_multigpu"
+    ON_MATCH_MULTIGPU_CLONES = "on_match_multigpu_clones"
    ON_LOAD = "on_load_after"
    ON_DETACH = "on_detach_after"
    ON_CLEANUP = "on_cleanup"
--- a/comfy/quant_ops.py
+++ b/comfy/quant_ops.py
@@ -1,545 +0,0 @@
-import torch
-import logging
-from typing import Tuple, Dict
-
-_LAYOUT_REGISTRY = {}
-_GENERIC_UTILS = {}
-
-
-def register_layout_op(torch_op, layout_type):
-    """
-    Decorator to register a layout-specific operation handler.
-    Args:
-        torch_op: PyTorch operation (e.g., torch.ops.aten.linear.default)
-        layout_type: Layout class (e.g., TensorCoreFP8Layout)
-    Example:
-        @register_layout_op(torch.ops.aten.linear.default, TensorCoreFP8Layout)
-        def fp8_linear(func, args, kwargs):
-            # FP8-specific linear implementation
-            ...
-    """
-    def decorator(handler_func):
-        if torch_op not in _LAYOUT_REGISTRY:
-            _LAYOUT_REGISTRY[torch_op] = {}
-        _LAYOUT_REGISTRY[torch_op][layout_type] = handler_func
-        return handler_func
-    return decorator
-
-
-def register_generic_util(torch_op):
-    """
-    Decorator to register a generic utility that works for all layouts.
-    Args:
-        torch_op: PyTorch operation (e.g., torch.ops.aten.detach.default)
-
-    Example:
-        @register_generic_util(torch.ops.aten.detach.default)
-        def generic_detach(func, args, kwargs):
-            # Works for any layout
-            ...
-    """
-    def decorator(handler_func):
-        _GENERIC_UTILS[torch_op] = handler_func
-        return handler_func
-    return decorator
-
-
-def _get_layout_from_args(args):
-    for arg in args:
-        if isinstance(arg, QuantizedTensor):
-            return arg._layout_type
-        elif isinstance(arg, (list, tuple)):
-            for item in arg:
-                if isinstance(item, QuantizedTensor):
-                    return item._layout_type
-    return None
-
-
-def _move_layout_params_to_device(params, device):
-    new_params = {}
-    for k, v in params.items():
-        if isinstance(v, torch.Tensor):
-            new_params[k] = v.to(device=device)
-        else:
-            new_params[k] = v
-    return new_params
-
-
-def _copy_layout_params(params):
-    new_params = {}
-    for k, v in params.items():
-        if isinstance(v, torch.Tensor):
-            new_params[k] = v.clone()
-        else:
-            new_params[k] = v
-    return new_params
-
-def _copy_layout_params_inplace(src, dst, non_blocking=False):
-    for k, v in src.items():
-        if isinstance(v, torch.Tensor):
-            dst[k].copy_(v, non_blocking=non_blocking)
-        else:
-            dst[k] = v
-
-class QuantizedLayout:
-    """
-    Base class for quantization layouts.
-
-    A layout encapsulates the format-specific logic for quantization/dequantization
-    and provides a uniform interface for extracting raw tensors needed for computation.
-
-    New quantization formats should subclass this and implement the required methods.
-    """
-    @classmethod
-    def quantize(cls, tensor, **kwargs) -> Tuple[torch.Tensor, Dict]:
-        raise NotImplementedError(f"{cls.__name__} must implement quantize()")
-
-    @staticmethod
-    def dequantize(qdata, **layout_params) -> torch.Tensor:
-        raise NotImplementedError("TensorLayout must implement dequantize()")
-
-    @classmethod
-    def get_plain_tensors(cls, qtensor) -> torch.Tensor:
-        raise NotImplementedError(f"{cls.__name__} must implement get_plain_tensors()")
-
-
-class QuantizedTensor(torch.Tensor):
-    """
-    Universal quantized tensor that works with any layout.
-
-    This tensor subclass uses a pluggable layout system to support multiple
-    quantization formats (FP8, INT4, INT8, etc.) without code duplication.
-
-    The layout_type determines format-specific behavior, while common operations
-    (detach, clone, to) are handled generically.
-
-    Attributes:
-        _qdata: The quantized tensor data
-        _layout_type: Layout class (e.g., TensorCoreFP8Layout)
-        _layout_params: Dict with layout-specific params (scale, zero_point, etc.)
-    """
-
-    @staticmethod
-    def __new__(cls, qdata, layout_type, layout_params):
-        """
-        Create a quantized tensor.
-
-        Args:
-            qdata: The quantized data tensor
-            layout_type: Layout class (subclass of QuantizedLayout)
-            layout_params: Dict with layout-specific parameters
-        """
-        return torch.Tensor._make_wrapper_subclass(cls, qdata.shape, device=qdata.device, dtype=qdata.dtype, requires_grad=False)
-
-    def __init__(self, qdata, layout_type, layout_params):
-        self._qdata = qdata
-        self._layout_type = layout_type
-        self._layout_params = layout_params
-
-    def __repr__(self):
-        layout_name = self._layout_type
-        param_str = ", ".join(f"{k}={v}" for k, v in list(self._layout_params.items())[:2])
-        return f"QuantizedTensor(shape={self.shape}, layout={layout_name}, {param_str})"
-
-    @property
-    def layout_type(self):
-        return self._layout_type
-
-    def __tensor_flatten__(self):
-        """
-        Tensor flattening protocol for proper device movement.
-        """
-        inner_tensors = ["_qdata"]
-        ctx = {
-            "layout_type": self._layout_type,
-        }
-
-        tensor_params = {}
-        non_tensor_params = {}
-        for k, v in self._layout_params.items():
-            if isinstance(v, torch.Tensor):
-                tensor_params[k] = v
-            else:
-                non_tensor_params[k] = v
-
-        ctx["tensor_param_keys"] = list(tensor_params.keys())
-        ctx["non_tensor_params"] = non_tensor_params
-
-        for k, v in tensor_params.items():
-            attr_name = f"_layout_param_{k}"
-            object.__setattr__(self, attr_name, v)
-            inner_tensors.append(attr_name)
-
-        return inner_tensors, ctx
-
-    @staticmethod
-    def __tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride):
-        """
-        Tensor unflattening protocol for proper device movement.
-        Reconstructs the QuantizedTensor after device movement.
-        """
-        layout_type = ctx["layout_type"]
-        layout_params = dict(ctx["non_tensor_params"])
-
-        for key in ctx["tensor_param_keys"]:
-            attr_name = f"_layout_param_{key}"
-            layout_params[key] = inner_tensors[attr_name]
-
-        return QuantizedTensor(inner_tensors["_qdata"], layout_type, layout_params)
-
-    @classmethod
-    def from_float(cls, tensor, layout_type, **quantize_kwargs) -> 'QuantizedTensor':
-        qdata, layout_params = LAYOUTS[layout_type].quantize(tensor, **quantize_kwargs)
-        return cls(qdata, layout_type, layout_params)
-
-    def dequantize(self) -> torch.Tensor:
-        return LAYOUTS[self._layout_type].dequantize(self._qdata, **self._layout_params)
-
-    @classmethod
-    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
-        kwargs = kwargs or {}
-
-        # Step 1: Check generic utilities first (detach, clone, to, etc.)
-        if func in _GENERIC_UTILS:
-            return _GENERIC_UTILS[func](func, args, kwargs)
-
-        # Step 2: Check layout-specific handlers (linear, matmul, etc.)
-        layout_type = _get_layout_from_args(args)
-        if layout_type and func in _LAYOUT_REGISTRY:
-            handler = _LAYOUT_REGISTRY[func].get(layout_type)
-            if handler:
-                return handler(func, args, kwargs)
-
-        # Step 3: Fallback to dequantization
-        if isinstance(args[0] if args else None, QuantizedTensor):
-            logging.info(f"QuantizedTensor: Unhandled operation {func}, falling back to dequantization. kwargs={kwargs}")
-        return cls._dequant_and_fallback(func, args, kwargs)
-
-    @classmethod
-    def _dequant_and_fallback(cls, func, args, kwargs):
-        def dequant_arg(arg):
-            if isinstance(arg, QuantizedTensor):
-                return arg.dequantize()
-            elif isinstance(arg, (list, tuple)):
-                return type(arg)(dequant_arg(a) for a in arg)
-            return arg
-
-        new_args = dequant_arg(args)
-        new_kwargs = dequant_arg(kwargs)
-        return func(*new_args, **new_kwargs)
-
-
-# ==============================================================================
-# Generic Utilities (Layout-Agnostic Operations)
-# ==============================================================================
-
-def _create_transformed_qtensor(qt, transform_fn):
-    new_data = transform_fn(qt._qdata)
-    new_params = _copy_layout_params(qt._layout_params)
-    return QuantizedTensor(new_data, qt._layout_type, new_params)
-
-
-def _handle_device_transfer(qt, target_device, target_dtype=None, target_layout=None, op_name="to"):
-    if target_dtype is not None and target_dtype != qt.dtype:
-        logging.warning(
-            f"QuantizedTensor: dtype conversion requested to {target_dtype}, "
-            f"but not supported for quantized tensors. Ignoring dtype."
-        )
-
-    if target_layout is not None and target_layout != torch.strided:
-        logging.warning(
-            f"QuantizedTensor: layout change requested to {target_layout}, "
-            f"but not supported. Ignoring layout."
-        )
-
-    # Handle device transfer
-    current_device = qt._qdata.device
-    if target_device is not None:
-        # Normalize device for comparison
-        if isinstance(target_device, str):
-            target_device = torch.device(target_device)
-        if isinstance(current_device, str):
-            current_device = torch.device(current_device)
-
-        if target_device != current_device:
-            logging.debug(f"QuantizedTensor.{op_name}: Moving from {current_device} to {target_device}")
-            new_q_data = qt._qdata.to(device=target_device)
-            new_params = _move_layout_params_to_device(qt._layout_params, target_device)
-            new_qt = QuantizedTensor(new_q_data, qt._layout_type, new_params)
-            logging.debug(f"QuantizedTensor.{op_name}: Created new tensor on {target_device}")
-            return new_qt
-
-    logging.debug(f"QuantizedTensor.{op_name}: No device change needed, returning original")
-    return qt
-
-
-@register_generic_util(torch.ops.aten.detach.default)
-def generic_detach(func, args, kwargs):
-    """Detach operation - creates a detached copy of the quantized tensor."""
-    qt = args[0]
-    if isinstance(qt, QuantizedTensor):
-        return _create_transformed_qtensor(qt, lambda x: x.detach())
-    return func(*args, **kwargs)
-
-
-@register_generic_util(torch.ops.aten.clone.default)
-def generic_clone(func, args, kwargs):
-    """Clone operation - creates a deep copy of the quantized tensor."""
-    qt = args[0]
-    if isinstance(qt, QuantizedTensor):
-        return _create_transformed_qtensor(qt, lambda x: x.clone())
-    return func(*args, **kwargs)
-
-
-@register_generic_util(torch.ops.aten._to_copy.default)
-def generic_to_copy(func, args, kwargs):
-    """Device/dtype transfer operation - handles .to(device) calls."""
-    qt = args[0]
-    if isinstance(qt, QuantizedTensor):
-        return _handle_device_transfer(
-            qt,
-            target_device=kwargs.get('device', None),
-            target_dtype=kwargs.get('dtype', None),
-            op_name="_to_copy"
-        )
-    return func(*args, **kwargs)
-
-
-@register_generic_util(torch.ops.aten.to.dtype_layout)
-def generic_to_dtype_layout(func, args, kwargs):
-    """Handle .to(device) calls using the dtype_layout variant."""
-    qt = args[0]
-    if isinstance(qt, QuantizedTensor):
-        return _handle_device_transfer(
-            qt,
-            target_device=kwargs.get('device', None),
-            target_dtype=kwargs.get('dtype', None),
-            target_layout=kwargs.get('layout', None),
-            op_name="to"
-        )
-    return func(*args, **kwargs)
-
-
-@register_generic_util(torch.ops.aten.copy_.default)
-def generic_copy_(func, args, kwargs):
-    qt_dest = args[0]
-    src = args[1]
-    non_blocking = args[2] if len(args) > 2 else False
-    if isinstance(qt_dest, QuantizedTensor):
-        if isinstance(src, QuantizedTensor):
-            # Copy from another quantized tensor
-            qt_dest._qdata.copy_(src._qdata, non_blocking=non_blocking)
-            qt_dest._layout_type = src._layout_type
-            _copy_layout_params_inplace(src._layout_params, qt_dest._layout_params, non_blocking=non_blocking)
-        else:
-            # Copy from regular tensor - just copy raw data
-            qt_dest._qdata.copy_(src)
-        return qt_dest
-    return func(*args, **kwargs)
-
-
-@register_generic_util(torch.ops.aten._has_compatible_shallow_copy_type.default)
-def generic_has_compatible_shallow_copy_type(func, args, kwargs):
-    return True
-
-
-@register_generic_util(torch.ops.aten.empty_like.default)
-def generic_empty_like(func, args, kwargs):
-    """Empty_like operation - creates an empty tensor with the same quantized structure."""
-    qt = args[0]
-    if isinstance(qt, QuantizedTensor):
-        # Create empty tensor with same shape and dtype as the quantized data
-        hp_dtype = kwargs.pop('dtype', qt._layout_params["orig_dtype"])
-        new_qdata = torch.empty_like(qt._qdata, **kwargs)
-
-        # Handle device transfer for layout params
-        target_device = kwargs.get('device', new_qdata.device)
-        new_params = _move_layout_params_to_device(qt._layout_params, target_device)
-
-        # Update orig_dtype if dtype is specified
-        new_params['orig_dtype'] = hp_dtype
-
-        return QuantizedTensor(new_qdata, qt._layout_type, new_params)
-    return func(*args, **kwargs)
-
-# ==============================================================================
-# FP8 Layout + Operation Handlers
-# ==============================================================================
-class TensorCoreFP8Layout(QuantizedLayout):
-    """
-    Storage format:
-    - qdata: FP8 tensor (torch.float8_e4m3fn or torch.float8_e5m2)
-    - scale: Scalar tensor (float32) for dequantization
-    - orig_dtype: Original dtype before quantization (for casting back)
-    """
-    @classmethod
-    def quantize(cls, tensor, scale=None, dtype=torch.float8_e4m3fn):
-        orig_dtype = tensor.dtype
-
-        if scale is None:
-            scale = torch.amax(tensor.abs()) / torch.finfo(dtype).max
-
-        if not isinstance(scale, torch.Tensor):
-            scale = torch.tensor(scale)
-        scale = scale.to(device=tensor.device, dtype=torch.float32)
-
-        tensor_scaled = tensor * (1.0 / scale).to(tensor.dtype)
-        # TODO: uncomment this if it's actually needed because the clamp has a small performance penality'
-        # lp_amax = torch.finfo(dtype).max
-        # torch.clamp(tensor_scaled, min=-lp_amax, max=lp_amax, out=tensor_scaled)
-        qdata = tensor_scaled.to(dtype, memory_format=torch.contiguous_format)
-
-        layout_params = {
-            'scale': scale,
-            'orig_dtype': orig_dtype
-        }
-        return qdata, layout_params
-
-    @staticmethod
-    def dequantize(qdata, scale, orig_dtype, **kwargs):
-        plain_tensor = torch.ops.aten._to_copy.default(qdata, dtype=orig_dtype)
-        return plain_tensor * scale
-
-    @classmethod
-    def get_plain_tensors(cls, qtensor):
-        return qtensor._qdata, qtensor._layout_params['scale']
-
-QUANT_ALGOS = {
-    "float8_e4m3fn": {
-        "storage_t": torch.float8_e4m3fn,
-        "parameters": {"weight_scale", "input_scale"},
-        "comfy_tensor_layout": "TensorCoreFP8Layout",
-    },
-}
-
-LAYOUTS = {
-    "TensorCoreFP8Layout": TensorCoreFP8Layout,
-}
-
-
-@register_layout_op(torch.ops.aten.linear.default, "TensorCoreFP8Layout")
-def fp8_linear(func, args, kwargs):
-    input_tensor = args[0]
-    weight = args[1]
-    bias = args[2] if len(args) > 2 else None
-
-    if isinstance(input_tensor, QuantizedTensor) and isinstance(weight, QuantizedTensor):
-        plain_input, scale_a = TensorCoreFP8Layout.get_plain_tensors(input_tensor)
-        plain_weight, scale_b = TensorCoreFP8Layout.get_plain_tensors(weight)
-
-        out_dtype = kwargs.get("out_dtype")
-        if out_dtype is None:
-            out_dtype = input_tensor._layout_params['orig_dtype']
-
-        weight_t = plain_weight.t()
-
-        tensor_2d = False
-        if len(plain_input.shape) == 2:
-            tensor_2d = True
-            plain_input = plain_input.unsqueeze(1)
-
-        input_shape = plain_input.shape
-        if len(input_shape) != 3:
-            return None
-
-        try:
-            output = torch._scaled_mm(
-                plain_input.reshape(-1, input_shape[2]).contiguous(),
-                weight_t,
-                bias=bias,
-                scale_a=scale_a,
-                scale_b=scale_b,
-                out_dtype=out_dtype,
-            )
-
-            if isinstance(output, tuple):  # TODO: remove when we drop support for torch 2.4
-                output = output[0]
-
-            if not tensor_2d:
-                output = output.reshape((-1, input_shape[1], weight.shape[0]))
-
-            if output.dtype in [torch.float8_e4m3fn, torch.float8_e5m2]:
-                output_scale = scale_a * scale_b
-                output_params = {
-                    'scale': output_scale,
-                    'orig_dtype': input_tensor._layout_params['orig_dtype']
-                }
-                return QuantizedTensor(output, "TensorCoreFP8Layout", output_params)
-            else:
-                return output
-
-        except Exception as e:
-            raise RuntimeError(f"FP8 _scaled_mm failed, falling back to dequantization: {e}")
-
-    # Case 2: DQ Fallback
-    if isinstance(weight, QuantizedTensor):
-        weight = weight.dequantize()
-    if isinstance(input_tensor, QuantizedTensor):
-        input_tensor = input_tensor.dequantize()
-
-    return torch.nn.functional.linear(input_tensor, weight, bias)
-
-def fp8_mm_(input_tensor, weight, bias=None, out_dtype=None):
-    if out_dtype is None:
-        out_dtype = input_tensor._layout_params['orig_dtype']
-
-    plain_input, scale_a = TensorCoreFP8Layout.get_plain_tensors(input_tensor)
-    plain_weight, scale_b = TensorCoreFP8Layout.get_plain_tensors(weight)
-
-    output = torch._scaled_mm(
-        plain_input.contiguous(),
-        plain_weight,
-        bias=bias,
-        scale_a=scale_a,
-        scale_b=scale_b,
-        out_dtype=out_dtype,
-    )
-
-    if isinstance(output, tuple):  # TODO: remove when we drop support for torch 2.4
-        output = output[0]
-    return output
-
-@register_layout_op(torch.ops.aten.addmm.default, "TensorCoreFP8Layout")
-def fp8_addmm(func, args, kwargs):
-    input_tensor = args[1]
-    weight = args[2]
-    bias = args[0]
-
-    if isinstance(input_tensor, QuantizedTensor) and isinstance(weight, QuantizedTensor):
-        return fp8_mm_(input_tensor, weight, bias=bias, out_dtype=kwargs.get("out_dtype", None))
-
-    a = list(args)
-    if isinstance(args[0], QuantizedTensor):
-        a[0] = args[0].dequantize()
-    if isinstance(args[1], QuantizedTensor):
-        a[1] = args[1].dequantize()
-    if isinstance(args[2], QuantizedTensor):
-        a[2] = args[2].dequantize()
-
-    return func(*a, **kwargs)
-
-@register_layout_op(torch.ops.aten.mm.default, "TensorCoreFP8Layout")
-def fp8_mm(func, args, kwargs):
-    input_tensor = args[0]
-    weight = args[1]
-
-    if isinstance(input_tensor, QuantizedTensor) and isinstance(weight, QuantizedTensor):
-        return fp8_mm_(input_tensor, weight, bias=None, out_dtype=kwargs.get("out_dtype", None))
-
-    a = list(args)
-    if isinstance(args[0], QuantizedTensor):
-        a[0] = args[0].dequantize()
-    if isinstance(args[1], QuantizedTensor):
-        a[1] = args[1].dequantize()
-    return func(*a, **kwargs)
-
-@register_layout_op(torch.ops.aten.view.default, "TensorCoreFP8Layout")
-@register_layout_op(torch.ops.aten.t.default, "TensorCoreFP8Layout")
-def fp8_func(func, args, kwargs):
-    input_tensor = args[0]
-    if isinstance(input_tensor, QuantizedTensor):
-        plain_input, scale_a = TensorCoreFP8Layout.get_plain_tensors(input_tensor)
-        ar = list(args)
-        ar[0] = plain_input
-        return QuantizedTensor(func(*ar, **kwargs), "TensorCoreFP8Layout", input_tensor._layout_params)
-    return func(*args, **kwargs)
--- a/comfy/sample.py
+++ b/comfy/sample.py
@@ -4,9 +4,13 @@ import comfy.samplers
 import comfy.utils
 import numpy as np
 import logging
-import comfy.nested_tensor

-def prepare_noise_inner(latent_image, generator, noise_inds=None):
+def prepare_noise(latent_image, seed, noise_inds=None):
+    """
+    creates random noise given a latent image and a seed.
+    optional arg noise_inds selects which noise generations to keep for a given seed; generations at other indices are produced and discarded
+    """
+    generator = torch.manual_seed(seed)
    if noise_inds is None:
        return torch.randn(latent_image.size(), dtype=latent_image.dtype, layout=latent_image.layout, generator=generator, device="cpu")

@@ -17,29 +21,10 @@ def prepare_noise_inner(latent_image, generator, noise_inds=None):
        if i in unique_inds:
            noises.append(noise)
    noises = [noises[i] for i in inverse]
-    return torch.cat(noises, axis=0)
-
-def prepare_noise(latent_image, seed, noise_inds=None):
-    """
-    creates random noise given a latent image and a seed.
-    optional arg skip can be used to skip and discard x number of noise generations for a given seed
-    """
-    generator = torch.manual_seed(seed)
-
-    if latent_image.is_nested:
-        tensors = latent_image.unbind()
-        noises = []
-        for t in tensors:
-            noises.append(prepare_noise_inner(t, generator, noise_inds))
-        noises = comfy.nested_tensor.NestedTensor(noises)
-    else:
-        noises = prepare_noise_inner(latent_image, generator, noise_inds)
-
+    noises = torch.cat(noises, axis=0)
    return noises

 def fix_empty_latent_channels(model, latent_image):
-    if latent_image.is_nested:
-        return latent_image
    latent_format = model.get_model_object("latent_format") #Resize the empty latent image so it has the right number of channels
    if latent_format.latent_channels != latent_image.shape[1] and torch.count_nonzero(latent_image) == 0:
        latent_image = comfy.utils.repeat_to_batch_size(latent_image, latent_format.latent_channels, dim=1)
--- a/comfy/sampler_helpers.py
+++ b/comfy/sampler_helpers.py
@@ -1,16 +1,17 @@
 from __future__ import annotations
+import torch
 import uuid
 import math
 import collections
 import comfy.model_management
 import comfy.conds
+import comfy.model_patcher
 import comfy.utils
 import comfy.hooks
 import comfy.patcher_extension
 from typing import TYPE_CHECKING
 if TYPE_CHECKING:
    from comfy.model_patcher import ModelPatcher
-    from comfy.model_base import BaseModel
    from comfy.controlnet import ControlBase

 def prepare_mask(noise_mask, shape, device):
@@ -106,6 +107,47 @@ def cleanup_additional_models(models):
        if hasattr(m, 'cleanup'):
            m.cleanup()

+def preprocess_multigpu_conds(conds: dict[str, list[dict[str]]], model: ModelPatcher, model_options: dict[str]):
+    '''If multigpu acceleration is enabled, creates deepclones of ControlNets for each extra device (GLIGEN is not handled yet).'''
+    multigpu_models: list[ModelPatcher] = model.get_additional_models_with_key("multigpu")
+    if len(multigpu_models) == 0:
+        return
+    extra_devices = [x.load_device for x in multigpu_models]
+    # handle controlnets
+    controlnets: set[ControlBase] = set()
+    for k in conds:
+        for kk in conds[k]:
+            if 'control' in kk:
+                controlnets.add(kk['control'])
+    if len(controlnets) > 0:
+        # first, unload all controlnet clones
+        for cnet in list(controlnets):
+            cnet_models = cnet.get_models()
+            for cm in cnet_models:
+                comfy.model_management.unload_model_and_clones(cm, unload_additional_models=True)
+
+        # next, make sure each controlnet has a deepclone for all relevant devices
+        for cnet in controlnets:
+            curr_cnet = cnet
+            while curr_cnet is not None:
+                for device in extra_devices:
+                    if device not in curr_cnet.multigpu_clones:
+                        curr_cnet.deepclone_multigpu(device, autoregister=True)
+                curr_cnet = curr_cnet.previous_controlnet
+        # since all device clones are now present, recreate the linked list for cloned cnets per device
+        for cnet in controlnets:
+            curr_cnet = cnet
+            while curr_cnet is not None:
+                prev_cnet = curr_cnet.previous_controlnet
+                for device in extra_devices:
+                    device_cnet = curr_cnet.get_instance_for_device(device)
+                    prev_device_cnet = None
+                    if prev_cnet is not None:
+                        prev_device_cnet = prev_cnet.get_instance_for_device(device)
+                    device_cnet.set_previous_controlnet(prev_device_cnet)
+                curr_cnet = prev_cnet
+    # potentially handle gligen - since not widely used, ignored for now
+
 def estimate_memory(model, noise_shape, conds):
    cond_shapes = collections.defaultdict(list)
    cond_shapes_min = {}
@@ -130,7 +172,8 @@ def prepare_sampling(model: ModelPatcher, noise_shape, conds, model_options=None
    return executor.execute(model, noise_shape, conds, model_options=model_options)

 def _prepare_sampling(model: ModelPatcher, noise_shape, conds, model_options=None):
-    real_model: BaseModel = None
+    model.match_multigpu_clones()
+    preprocess_multigpu_conds(conds, model, model_options)
    models, inference_memory = get_additional_models(conds, model.model_dtype())
    models += get_additional_models_from_model_options(model_options)
    models += model.get_nested_additional_models()  # TODO: does this require inference_memory update?
@@ -182,3 +225,18 @@ def prepare_model_patcher(model: ModelPatcher, conds, model_options: dict):
        comfy.patcher_extension.merge_nested_dicts(to_load_options.setdefault(wc_name, {}), model_options["transformer_options"][wc_name],
                                                    copy_dict1=False)
    return to_load_options
+
+def prepare_model_patcher_multigpu_clones(model_patcher: ModelPatcher, loaded_models: list[ModelPatcher], model_options: dict):
+    '''
+    In case multigpu acceleration is enabled, prep ModelPatchers for each device.
+    '''
+    multigpu_patchers: list[ModelPatcher] = [x for x in loaded_models if x.is_multigpu_base_clone]
+    if len(multigpu_patchers) > 0:
+        multigpu_dict: dict[torch.device, ModelPatcher] = {}
+        multigpu_dict[model_patcher.load_device] = model_patcher
+        for x in multigpu_patchers:
+            x.hook_patches = comfy.model_patcher.create_hook_patches_clone(model_patcher.hook_patches, copy_tuples=True)
+            x.hook_mode = model_patcher.hook_mode # match main model's hook_mode
+            multigpu_dict[x.load_device] = x
+        model_options["multigpu_clones"] = multigpu_dict
+    return multigpu_patchers
--- a/comfy/samplers.py
+++ b/comfy/samplers.py
@@ -1,7 +1,9 @@
 from __future__ import annotations
+
+import comfy.model_management
 from .k_diffusion import sampling as k_diffusion_sampling
 from .extra_samplers import uni_pc
-from typing import TYPE_CHECKING, Callable, NamedTuple
+from typing import TYPE_CHECKING, Callable, NamedTuple, Any
 if TYPE_CHECKING:
    from comfy.model_patcher import ModelPatcher
    from comfy.model_base import BaseModel
@@ -20,6 +22,7 @@ import comfy.context_windows
 import comfy.utils
 import scipy.stats
 import numpy
+import threading


 def add_area_dims(area, num_dims):
@@ -142,7 +145,7 @@ def can_concat_cond(c1, c2):

    return cond_equal_size(c1.conditioning, c2.conditioning)

-def cond_cat(c_list):
+def cond_cat(c_list, device=None):
    temp = {}
    for x in c_list:
        for k in x:
@@ -154,6 +157,8 @@ def cond_cat(c_list):
    for k in temp:
        conds = temp[k]
        out[k] = conds[0].concat(conds[1:])
+        if device is not None and hasattr(out[k], 'to'):
+            out[k] = out[k].to(device)

    return out

@@ -213,7 +218,9 @@ def _calc_cond_batch_outer(model: BaseModel, conds: list[list[dict]], x_in: torc
    )
    return executor.execute(model, conds, x_in, timestep, model_options)

-def _calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep, model_options):
+def _calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]):
+    if 'multigpu_clones' in model_options:
+        return _calc_cond_batch_multigpu(model, conds, x_in, timestep, model_options)
    out_conds = []
    out_counts = []
    # separate conds by matching hooks
@@ -245,7 +252,7 @@ def _calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tens
    if has_default_conds:
        finalize_default_conds(model, hooked_to_run, default_conds, x_in, timestep, model_options)

-    model.current_patcher.prepare_state(timestep)
+    model.current_patcher.prepare_state(timestep, model_options)

    # run every hooked_to_run separately
    for hooks, to_run in hooked_to_run.items():
@@ -346,6 +353,196 @@ def _calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tens

    return out_conds

+def _calc_cond_batch_multigpu(model: BaseModel, conds: list[list[dict]], x_in: torch.Tensor, timestep: torch.Tensor, model_options: dict[str]):
+    out_conds = []
+    out_counts = []
+    # separate conds by matching hooks
+    hooked_to_run: dict[comfy.hooks.HookGroup,list[tuple[tuple,int]]] = {}
+    default_conds = []
+    has_default_conds = False
+
+    output_device = x_in.device
+
+    for i in range(len(conds)):
+        out_conds.append(torch.zeros_like(x_in))
+        out_counts.append(torch.ones_like(x_in) * 1e-37)
+
+        cond = conds[i]
+        default_c = []
+        if cond is not None:
+            for x in cond:
+                if 'default' in x:
+                    default_c.append(x)
+                    has_default_conds = True
+                    continue
+                p = get_area_and_mult(x, x_in, timestep)
+                if p is None:
+                    continue
+                if p.hooks is not None:
+                    model.current_patcher.prepare_hook_patches_current_keyframe(timestep, p.hooks, model_options)
+                hooked_to_run.setdefault(p.hooks, list())
+                hooked_to_run[p.hooks] += [(p, i)]
+        default_conds.append(default_c)
+
+    if has_default_conds:
+        finalize_default_conds(model, hooked_to_run, default_conds, x_in, timestep, model_options)
+
+    model.current_patcher.prepare_state(timestep, model_options)
+
+    devices = list(model_options['multigpu_clones'].keys())
+    device_batched_hooked_to_run: dict[torch.device, list[tuple[comfy.hooks.HookGroup, tuple]]] = {}
+
+    total_conds = 0
+    for to_run in hooked_to_run.values():
+        total_conds += len(to_run)
+    conds_per_device = max(1, math.ceil(total_conds / len(devices)))
+    index_device = 0
+    current_device = devices[index_device]
+    # schedule every hooked_to_run group onto devices, at most conds_per_device conds per device
+    for hooks, to_run in hooked_to_run.items():
+        while len(to_run) > 0:
+            current_device = devices[index_device % len(devices)]
+            batched_to_run = device_batched_hooked_to_run.setdefault(current_device, [])
+            # keep track of conds currently scheduled onto this device
+            batched_to_run_length = 0
+            for btr in batched_to_run:
+                batched_to_run_length += len(btr[1])
+
+            first = to_run[0]
+            first_shape = first[0][0].shape
+            to_batch_temp = []
+            # make sure not over conds_per_device limit when creating temp batch
+            for x in range(len(to_run)):
+                if can_concat_cond(to_run[x][0], first[0]) and len(to_batch_temp) < (conds_per_device - batched_to_run_length):
+                    to_batch_temp += [x]
+
+            to_batch_temp.reverse()
+            to_batch = to_batch_temp[:1]
+
+            free_memory = model_management.get_free_memory(current_device)
+            for i in range(1, len(to_batch_temp) + 1):
+                batch_amount = to_batch_temp[:len(to_batch_temp)//i]
+                input_shape = [len(batch_amount) * first_shape[0]] + list(first_shape)[1:]
+                if model.memory_required(input_shape) * 1.5 < free_memory:
+                    to_batch = batch_amount
+                    break
+            conds_to_batch = []
+            for x in to_batch:
+                conds_to_batch.append(to_run.pop(x))
+            batched_to_run_length += len(conds_to_batch)
+
+            batched_to_run.append((hooks, conds_to_batch))
+            if batched_to_run_length >= conds_per_device:
+                index_device += 1
+
+    class thread_result(NamedTuple):
+        output: Any
+        mult: Any
+        area: Any
+        batch_chunks: int
+        cond_or_uncond: Any
+        error: Exception = None
+
+    def _handle_batch(device: torch.device, batch_tuple: tuple[comfy.hooks.HookGroup, tuple], results: list[thread_result]):
+        try:
+            model_current: BaseModel = model_options["multigpu_clones"][device].model
+            # run every hooked_to_run separately
+            with torch.no_grad():
+                for hooks, to_batch in batch_tuple:
+                    input_x = []
+                    mult = []
+                    c = []
+                    cond_or_uncond = []
+                    uuids = []
+                    area = []
+                    control: ControlBase = None
+                    patches = None
+                    for x in to_batch:
+                        o = x
+                        p = o[0]
+                        input_x.append(p.input_x)
+                        mult.append(p.mult)
+                        c.append(p.conditioning)
+                        area.append(p.area)
+                        cond_or_uncond.append(o[1])
+                        uuids.append(p.uuid)
+                        control = p.control
+                        patches = p.patches
+
+                    batch_chunks = len(cond_or_uncond)
+                    input_x = torch.cat(input_x).to(device)
+                    c = cond_cat(c, device=device)
+                    timestep_ = torch.cat([timestep.to(device)] * batch_chunks)
+
+                    transformer_options = model_current.current_patcher.apply_hooks(hooks=hooks)
+                    if 'transformer_options' in model_options:
+                        transformer_options = comfy.patcher_extension.merge_nested_dicts(transformer_options,
+                                                                                        model_options['transformer_options'],
+                                                                                        copy_dict1=False)
+
+                    if patches is not None:
+                        transformer_options["patches"] = comfy.patcher_extension.merge_nested_dicts(
+                            transformer_options.get("patches", {}),
+                            patches
+                        )
+
+                    transformer_options["cond_or_uncond"] = cond_or_uncond[:]
+                    transformer_options["uuids"] = uuids[:]
+                    transformer_options["sigmas"] = timestep
+                    transformer_options["sample_sigmas"] = transformer_options["sample_sigmas"].to(device)
+                    transformer_options["multigpu_thread_device"] = device
+
+                    cast_transformer_options(transformer_options, device=device)
+                    c['transformer_options'] = transformer_options
+
+                    if control is not None:
+                        device_control = control.get_instance_for_device(device)
+                        c['control'] = device_control.get_control(input_x, timestep_, c, len(cond_or_uncond), transformer_options)
+
+                    if 'model_function_wrapper' in model_options:
+                        output = model_options['model_function_wrapper'](model_current.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).to(output_device).chunk(batch_chunks)
+                    else:
+                        output = model_current.apply_model(input_x, timestep_, **c).to(output_device).chunk(batch_chunks)
+                    results.append(thread_result(output, mult, area, batch_chunks, cond_or_uncond))
+        except Exception as e:
+            results.append(thread_result(None, None, None, None, None, error=e))
+            raise
+
+
+    results: list[thread_result] = []
+    threads: list[threading.Thread] = []
+    for device, batch_tuple in device_batched_hooked_to_run.items():
+        new_thread = threading.Thread(target=_handle_batch, args=(device, batch_tuple, results))
+        threads.append(new_thread)
+        new_thread.start()
+
+    for thread in threads:
+        thread.join()
+
+    for output, mult, area, batch_chunks, cond_or_uncond, error in results:
+        if error is not None:
+            raise error
+        for o in range(batch_chunks):
+            cond_index = cond_or_uncond[o]
+            a = area[o]
+            if a is None:
+                out_conds[cond_index] += output[o] * mult[o]
+                out_counts[cond_index] += mult[o]
+            else:
+                out_c = out_conds[cond_index]
+                out_cts = out_counts[cond_index]
+                dims = len(a) // 2
+                for i in range(dims):
+                    out_c = out_c.narrow(i + 2, a[i + dims], a[i])
+                    out_cts = out_cts.narrow(i + 2, a[i + dims], a[i])
+                out_c += output[o] * mult[o]
+                out_cts += mult[o]
+
+    for i in range(len(out_conds)):
+        out_conds[i] /= out_counts[i]
+
+    return out_conds
+
 def calc_cond_uncond_batch(model, cond, uncond, x_in, timestep, model_options): #TODO: remove
    logging.warning("WARNING: The comfy.samplers.calc_cond_uncond_batch function is deprecated please use the calc_cond_batch one instead.")
    return tuple(calc_cond_batch(model, [cond, uncond], x_in, timestep, model_options))
@@ -650,6 +847,8 @@ def pre_run_control(model, conds):
        percent_to_timestep_function = lambda a: s.percent_to_sigma(a)
        if 'control' in x:
            x['control'].pre_run(model, percent_to_timestep_function)
+            for device_cnet in x['control'].multigpu_clones.values():
+                device_cnet.pre_run(model, percent_to_timestep_function)

 def apply_empty_x_to_equal_area(conds, uncond, name, uncond_fill_func):
    cond_cnets = []
@@ -782,7 +981,7 @@ def ksampler(sampler_name, extra_options={}, inpaint_options={}):
    return KSAMPLER(sampler_function, extra_options, inpaint_options)


-def process_conds(model, noise, conds, device, latent_image=None, denoise_mask=None, seed=None, latent_shapes=None):
+def process_conds(model, noise, conds, device, latent_image=None, denoise_mask=None, seed=None):
    for k in conds:
        conds[k] = conds[k][:]
        resolve_areas_and_cond_masks_multidim(conds[k], noise.shape[2:], device)
@@ -792,7 +991,7 @@ def process_conds(model, noise, conds, device, latent_image=None, denoise_mask=N

    if hasattr(model, 'extra_conds'):
        for k in conds:
-            conds[k] = encode_model_conds(model.extra_conds, conds[k], noise, device, k, latent_image=latent_image, denoise_mask=denoise_mask, seed=seed, latent_shapes=latent_shapes)
+            conds[k] = encode_model_conds(model.extra_conds, conds[k], noise, device, k, latent_image=latent_image, denoise_mask=denoise_mask, seed=seed)

    #make sure each cond area has an opposite one with the same area
    for k in conds:
@@ -892,7 +1091,9 @@ def cast_to_load_options(model_options: dict[str], device=None, dtype=None):
    to_load_options = model_options.get("to_load_options", None)
    if to_load_options is None:
        return
+    cast_transformer_options(to_load_options, device, dtype)

+def cast_transformer_options(transformer_options: dict[str], device=None, dtype=None):
    casts = []
    if device is not None:
        casts.append(device)
@@ -901,18 +1102,17 @@ def cast_to_load_options(model_options: dict[str], device=None, dtype=None):
    # if nothing to apply, do nothing
    if len(casts) == 0:
        return
-
    # try to call .to on patches
-    if "patches" in to_load_options:
-        patches = to_load_options["patches"]
+    if "patches" in transformer_options:
+        patches = transformer_options["patches"]
        for name in patches:
            patch_list = patches[name]
            for i in range(len(patch_list)):
                if hasattr(patch_list[i], "to"):
                    for cast in casts:
                        patch_list[i] = patch_list[i].to(cast)
-    if "patches_replace" in to_load_options:
-        patches = to_load_options["patches_replace"]
+    if "patches_replace" in transformer_options:
+        patches = transformer_options["patches_replace"]
        for name in patches:
            patch_list = patches[name]
            for k in patch_list:
@@ -922,8 +1122,8 @@ def cast_to_load_options(model_options: dict[str], device=None, dtype=None):
    # try to call .to on any wrappers/callbacks
    wrappers_and_callbacks = ["wrappers", "callbacks"]
    for wc_name in wrappers_and_callbacks:
-        if wc_name in to_load_options:
-            wc: dict[str, list] = to_load_options[wc_name]
+        if wc_name in transformer_options:
+            wc: dict[str, list] = transformer_options[wc_name]
            for wc_dict in wc.values():
                for wc_list in wc_dict.values():
                    for i in range(len(wc_list)):
@@ -931,7 +1131,6 @@ def cast_to_load_options(model_options: dict[str], device=None, dtype=None):
                            for cast in casts:
                                wc_list[i] = wc_list[i].to(cast)

-
 class CFGGuider:
    def __init__(self, model_patcher: ModelPatcher):
        self.model_patcher = model_patcher
@@ -962,11 +1161,11 @@ class CFGGuider:
    def predict_noise(self, x, timestep, model_options={}, seed=None):
        return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)

-    def inner_sample(self, noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=None):
+    def inner_sample(self, noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed):
        if latent_image is not None and torch.count_nonzero(latent_image) > 0: #Don't shift the empty latent image.
            latent_image = self.inner_model.process_latent_in(latent_image)

-        self.conds = process_conds(self.inner_model, noise, self.conds, device, latent_image, denoise_mask, seed, latent_shapes=latent_shapes)
+        self.conds = process_conds(self.inner_model, noise, self.conds, device, latent_image, denoise_mask, seed)

        extra_model_options = comfy.model_patcher.create_model_options_clone(self.model_options)
        extra_model_options.setdefault("transformer_options", {})["sample_sigmas"] = sigmas
@@ -980,10 +1179,12 @@ class CFGGuider:
        samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
        return self.inner_model.process_latent_out(samples.to(torch.float32))

-    def outer_sample(self, noise, latent_image, sampler, sigmas, denoise_mask=None, callback=None, disable_pbar=False, seed=None, latent_shapes=None):
+    def outer_sample(self, noise, latent_image, sampler, sigmas, denoise_mask=None, callback=None, disable_pbar=False, seed=None):
        self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
        device = self.model_patcher.load_device

+        multigpu_patchers = comfy.sampler_helpers.prepare_model_patcher_multigpu_clones(self.model_patcher, self.loaded_models, self.model_options)
+
        if denoise_mask is not None:
            denoise_mask = comfy.sampler_helpers.prepare_mask(denoise_mask, noise.shape, device)

@@ -994,9 +1195,13 @@ class CFGGuider:

        try:
            self.model_patcher.pre_run()
-            output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
+            for multigpu_patcher in multigpu_patchers:
+                multigpu_patcher.pre_run()
+            output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
        finally:
            self.model_patcher.cleanup()
+            for multigpu_patcher in multigpu_patchers:
+                multigpu_patcher.cleanup()

        comfy.sampler_helpers.cleanup_models(self.conds, self.loaded_models)
        del self.inner_model
@@ -1007,12 +1212,6 @@ class CFGGuider:
        if sigmas.shape[-1] == 0:
            return latent_image

-        if latent_image.is_nested:
-            latent_image, latent_shapes = comfy.utils.pack_latents(latent_image.unbind())
-            noise, _ = comfy.utils.pack_latents(noise.unbind())
-        else:
-            latent_shapes = [latent_image.shape]
-
        self.conds = {}
        for k in self.original_conds:
            self.conds[k] = list(map(lambda a: a.copy(), self.original_conds[k]))
@@ -1032,7 +1231,7 @@ class CFGGuider:
                self,
                comfy.patcher_extension.get_all_wrappers(comfy.patcher_extension.WrappersMP.OUTER_SAMPLE, self.model_options, is_model_options=True)
            )
-            output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
+            output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
        finally:
            cast_to_load_options(self.model_options, device=self.model_patcher.offload_device)
            self.model_options = orig_model_options
@@ -1040,9 +1239,6 @@ class CFGGuider:
            self.model_patcher.restore_hook_patches()

        del self.conds
-
-        if len(latent_shapes) > 1:
-            output = comfy.nested_tensor.NestedTensor(comfy.utils.unpack_latents(output, latent_shapes))
        return output


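Reviewer sketch (not part of the patch): the `_handle_batch` fan-out in `_calc_cond_batch_multigpu` above starts one thread per device, joins them all, and only then re-raises any error recorded in the shared `results` list. The minimal example below illustrates that pattern in isolation. The names `ThreadResult`, `_worker`, and `run_on_devices` are hypothetical, the device keys are plain strings, and summing a list stands in for the model call. Unlike the patch, the worker here records the exception without re-raising it inside the thread.

```python
import threading
from typing import Any, NamedTuple, Optional


class ThreadResult(NamedTuple):
    # Hypothetical, reduced version of the patch's thread_result tuple.
    output: Any
    error: Optional[Exception] = None


def _worker(device: str, batch: list, results: list) -> None:
    """Process one device's batch and record either the output or the error."""
    try:
        # Stand-in for the real model call: just sum the batch.
        results.append(ThreadResult(output=(device, sum(batch))))
    except Exception as e:
        # Record the failure so the caller can re-raise it after join().
        results.append(ThreadResult(output=None, error=e))


def run_on_devices(batches_by_device: dict) -> dict:
    """Start one thread per device, wait for all, then surface errors."""
    results: list = []
    threads = []
    for device, batch in batches_by_device.items():
        t = threading.Thread(target=_worker, args=(device, batch, results))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

    combined = {}
    for output, error in results:
        if error is not None:
            # Re-raise on the calling thread, as the patch does after join().
            raise error
        device, total = output
        combined[device] = total
    return combined
```

Appending to a shared list from several threads relies on `list.append` being atomic in CPython; the order of `results` is not deterministic, which is why the sketch keys the combined output by device instead of by position.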
--- a/comfy/sd.py
+++ b/comfy/sd.py
@@ -143,9 +143,6 @@ class CLIP:
        n.apply_hooks_to_conds = self.apply_hooks_to_conds
        return n

-    def get_ram_usage(self):
-        return self.patcher.get_ram_usage()
-
    def add_patches(self, patches, strength_patch=1.0, strength_model=1.0):
        return self.patcher.add_patches(patches, strength_patch, strength_model)

@@ -296,7 +293,6 @@ class VAE:
        self.working_dtypes = [torch.bfloat16, torch.float32]
        self.disable_offload = False
        self.not_video = False
-        self.size = None

        self.downscale_index_formula = None
        self.upscale_index_formula = None
@@ -441,20 +437,20 @@ class VAE:
            elif "decoder.conv_in.conv.weight" in sd and sd['decoder.conv_in.conv.weight'].shape[1] == 32:
                ddconfig = {"block_out_channels": [128, 256, 512, 1024, 1024], "in_channels": 3, "out_channels": 3, "num_res_blocks": 2, "ffactor_spatial": 16, "ffactor_temporal": 4, "downsample_match_channel": True, "upsample_match_channel": True}
                ddconfig['z_channels'] = sd["decoder.conv_in.conv.weight"].shape[1]
-                self.latent_channels = 32
+                self.latent_channels = 64
                self.upscale_ratio = (lambda a: max(0, a * 4 - 3), 16, 16)
                self.upscale_index_formula = (4, 16, 16)
                self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 16, 16)
                self.downscale_index_formula = (4, 16, 16)
                self.latent_dim = 3
-                self.not_video = False
+                self.not_video = True
                self.working_dtypes = [torch.float16, torch.bfloat16, torch.float32]
                self.first_stage_model = AutoencodingEngine(regularizer_config={'target': "comfy.ldm.models.autoencoder.EmptyRegularizer"},
                                                            encoder_config={'target': "comfy.ldm.hunyuan_video.vae_refiner.Encoder", 'params': ddconfig},
                                                            decoder_config={'target': "comfy.ldm.hunyuan_video.vae_refiner.Decoder", 'params': ddconfig})

-                self.memory_used_encode = lambda shape, dtype: (1400 * 9 * shape[-2] * shape[-1]) * model_management.dtype_size(dtype)
-                self.memory_used_decode = lambda shape, dtype: (2800 * 4 * shape[-2] * shape[-1] * 16 * 16) * model_management.dtype_size(dtype)
+                self.memory_used_encode = lambda shape, dtype: (1400 * shape[-2] * shape[-1]) * model_management.dtype_size(dtype)
+                self.memory_used_decode = lambda shape, dtype: (1400 * shape[-3] * shape[-2] * shape[-1] * 16 * 16) * model_management.dtype_size(dtype)
            elif "decoder.conv_in.conv.weight" in sd:
                ddconfig = {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0}
                ddconfig["conv3d"] = True
@@ -599,16 +595,6 @@ class VAE:

        self.patcher = comfy.model_patcher.ModelPatcher(self.first_stage_model, load_device=self.device, offload_device=offload_device)
        logging.info("VAE load device: {}, offload device: {}, dtype: {}".format(self.device, offload_device, self.vae_dtype))
-        self.model_size()
-
-    def model_size(self):
-        if self.size is not None:
-            return self.size
-        self.size = comfy.model_management.module_size(self.first_stage_model)
-        return self.size
-
-    def get_ram_usage(self):
-        return self.model_size()

    def throw_exception_if_invalid(self):
        if self.first_stage_model is None:
@@ -911,7 +897,6 @@ class CLIPType(Enum):
    OMNIGEN2 = 17
    QWEN_IMAGE = 18
    HUNYUAN_IMAGE = 19
-    HUNYUAN_VIDEO_15 = 20


 def load_clip(ckpt_paths, embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
@@ -1127,9 +1112,6 @@ def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip
        elif clip_type == CLIPType.HUNYUAN_IMAGE:
            clip_target.clip = comfy.text_encoders.hunyuan_image.te(**llama_detect(clip_data))
            clip_target.tokenizer = comfy.text_encoders.hunyuan_image.HunyuanImageTokenizer
-        elif clip_type == CLIPType.HUNYUAN_VIDEO_15:
-            clip_target.clip = comfy.text_encoders.hunyuan_image.te(**llama_detect(clip_data))
-            clip_target.tokenizer = comfy.text_encoders.hunyuan_video.HunyuanVideo15Tokenizer
        else:
            clip_target.clip = sdxl_clip.SDXLClipModel
            clip_target.tokenizer = sdxl_clip.SDXLTokenizer
@@ -1280,7 +1262,7 @@ def load_state_dict_guess_config(sd, output_vae=True, output_clip=True, output_c
    return (model_patcher, clip, vae, clipvision)


-def load_diffusion_model_state_dict(sd, model_options={}, metadata=None):
+def load_diffusion_model_state_dict(sd, model_options={}):
    """
    Loads a UNet diffusion model from a state dictionary, supporting both diffusers and regular formats.

@@ -1314,7 +1296,7 @@ def load_diffusion_model_state_dict(sd, model_options={}, metadata=None):
    weight_dtype = comfy.utils.weight_dtype(sd)

    load_device = model_management.get_torch_device()
-    model_config = model_detection.model_config_from_unet(sd, "", metadata=metadata)
+    model_config = model_detection.model_config_from_unet(sd, "")

    if model_config is not None:
        new_sd = sd
@@ -1348,10 +1330,7 @@ def load_diffusion_model_state_dict(sd, model_options={}, metadata=None):
    else:
        unet_dtype = dtype

-    if model_config.layer_quant_config is not None:
-        manual_cast_dtype = model_management.unet_manual_cast(None, load_device, model_config.supported_inference_dtypes)
-    else:
-        manual_cast_dtype = model_management.unet_manual_cast(unet_dtype, load_device, model_config.supported_inference_dtypes)
+    manual_cast_dtype = model_management.unet_manual_cast(unet_dtype, load_device, model_config.supported_inference_dtypes)
    model_config.set_inference_dtype(unet_dtype, manual_cast_dtype)
    model_config.custom_operations = model_options.get("custom_operations", model_config.custom_operations)
    if model_options.get("fp8_optimizations", False):
@@ -1367,8 +1346,8 @@ def load_diffusion_model_state_dict(sd, model_options={}, metadata=None):


 def load_diffusion_model(unet_path, model_options={}):
-    sd, metadata = comfy.utils.load_torch_file(unet_path, return_metadata=True)
-    model = load_diffusion_model_state_dict(sd, model_options=model_options, metadata=metadata)
+    sd = comfy.utils.load_torch_file(unet_path)
+    model = load_diffusion_model_state_dict(sd, model_options=model_options)
    if model is None:
        logging.error("ERROR UNSUPPORTED DIFFUSION MODEL {}".format(unet_path))
        raise RuntimeError("ERROR: Could not detect model type of: {}\n{}".format(unet_path, model_detection_error_hint(unet_path, sd)))
--- a/comfy/sd1_clip.py
+++ b/comfy/sd1_clip.py
@@ -460,7 +460,7 @@ def load_embed(embedding_name, embedding_directory, embedding_size, embed_key=No
    return embed_out

 class SDTokenizer:
-    def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, has_end_token=True, pad_to_max_length=True, min_length=None, pad_token=None, end_token=None, min_padding=None, pad_left=False, tokenizer_data={}, tokenizer_args={}):
+    def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, has_end_token=True, pad_to_max_length=True, min_length=None, pad_token=None, end_token=None, min_padding=None, tokenizer_data={}, tokenizer_args={}):
        if tokenizer_path is None:
            tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "sd1_tokenizer")
        self.tokenizer = tokenizer_class.from_pretrained(tokenizer_path, **tokenizer_args)
@@ -468,7 +468,6 @@ class SDTokenizer:
        self.min_length = tokenizer_data.get("{}_min_length".format(embedding_key), min_length)
        self.end_token = None
        self.min_padding = min_padding
-        self.pad_left = pad_left

        empty = self.tokenizer('')["input_ids"]
        self.tokenizer_adds_end_token = has_end_token
@@ -523,12 +522,6 @@ class SDTokenizer:
                return (embed, "{} {}".format(embedding_name[len(stripped):], leftover))
        return (embed, leftover)

-    def pad_tokens(self, tokens, amount):
-        if self.pad_left:
-            for i in range(amount):
-                tokens.insert(0, (self.pad_token, 1.0, 0))
-        else:
-            tokens.extend([(self.pad_token, 1.0, 0)] * amount)

    def tokenize_with_weights(self, text:str, return_word_ids=False, tokenizer_options={}, **kwargs):
        '''
@@ -607,7 +600,7 @@ class SDTokenizer:
                        if self.end_token is not None:
                            batch.append((self.end_token, 1.0, 0))
                        if self.pad_to_max_length:
-                            self.pad_tokens(batch, remaining_length)
+                            batch.extend([(self.pad_token, 1.0, 0)] * (remaining_length))
                    #start new batch
                    batch = []
                    if self.start_token is not None:
@@ -621,11 +614,11 @@ class SDTokenizer:
        if self.end_token is not None:
            batch.append((self.end_token, 1.0, 0))
        if min_padding is not None:
-            self.pad_tokens(batch, min_padding)
+            batch.extend([(self.pad_token, 1.0, 0)] * min_padding)
        if self.pad_to_max_length and len(batch) < self.max_length:
-            self.pad_tokens(batch, self.max_length - len(batch))
+            batch.extend([(self.pad_token, 1.0, 0)] * (self.max_length - len(batch)))
        if min_length is not None and len(batch) < min_length:
-            self.pad_tokens(batch, min_length - len(batch))
+            batch.extend([(self.pad_token, 1.0, 0)] * (min_length - len(batch)))

        if not return_word_ids:
            batched_tokens = [[(t, w) for t, w,_ in x] for x in batched_tokens]
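Reviewer sketch (not part of the patch): with `pad_tokens` and `pad_left` removed, every padding site in `SDTokenizer` now right-pads inline with `batch.extend([(self.pad_token, 1.0, 0)] * n)`. The helper below, with the hypothetical name `pad_right` and made-up token IDs, shows the same operation in isolation, including the fact that a non-positive count adds nothing.

```python
def pad_right(batch: list, pad_token: int, target_length: int) -> list:
    """Append (pad_token, 1.0, 0) tuples until batch reaches target_length."""
    missing = target_length - len(batch)
    # Multiplying a list by a non-positive number yields an empty list,
    # so a batch that is already long enough is left unchanged.
    batch.extend([(pad_token, 1.0, 0)] * missing)
    return batch


# Token IDs here are illustrative only.
tokens = [(101, 1.0, 0), (102, 1.0, 0)]
padded = pad_right(tokens, pad_token=0, target_length=4)
```

Left padding is no longer expressible through the tokenizer after this change, so any caller that passed `pad_left=True` would need its own handling.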
--- a/comfy/supported_models.py
+++ b/comfy/supported_models.py
@@ -1374,54 +1374,6 @@ class HunyuanImage21Refiner(HunyuanVideo):
        out = model_base.HunyuanImage21Refiner(self, device=device)
        return out

-class HunyuanVideo15(HunyuanVideo):
-    unet_config = {
-        "image_model": "hunyuan_video",
-        "vision_in_dim": 1152,
-    }
-
-    sampling_settings = {
-        "shift": 7.0,
-    }
-    memory_usage_factor = 4.0 #TODO
-    supported_inference_dtypes = [torch.float16, torch.bfloat16, torch.float32]
-
-    latent_format = latent_formats.HunyuanVideo15
-
-    def get_model(self, state_dict, prefix="", device=None):
-        out = model_base.HunyuanVideo15(self, device=device)
-        return out
-
-    def clip_target(self, state_dict={}):
-        pref = self.text_encoder_key_prefix[0]
-        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_7b.transformer.".format(pref))
-        return supported_models_base.ClipTarget(comfy.text_encoders.hunyuan_video.HunyuanVideo15Tokenizer, comfy.text_encoders.hunyuan_image.te(**hunyuan_detect))
-
-
-class HunyuanVideo15_SR_Distilled(HunyuanVideo):
-    unet_config = {
-        "image_model": "hunyuan_video",
-        "vision_in_dim": 1152,
-        "in_channels": 98,
-    }
-
-    sampling_settings = {
-        "shift": 2.0,
-    }
-    memory_usage_factor = 4.0 #TODO
-    supported_inference_dtypes = [torch.float16, torch.bfloat16, torch.float32]
-
-    latent_format = latent_formats.HunyuanVideo15
-
-    def get_model(self, state_dict, prefix="", device=None):
-        out = model_base.HunyuanVideo15_SR_Distilled(self, device=device)
-        return out
-
-    def clip_target(self, state_dict={}):
-        pref = self.text_encoder_key_prefix[0]
-        hunyuan_detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen25_7b.transformer.".format(pref))
-        return supported_models_base.ClipTarget(comfy.text_encoders.hunyuan_video.HunyuanVideo15Tokenizer, comfy.text_encoders.hunyuan_image.te(**hunyuan_detect))
-
-models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanVideo15_SR_Distilled, HunyuanVideo15, HunyuanImage21Refiner, HunyuanImage21, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, WAN22_Camera, WAN22_S2V, WAN21_HuMo, WAN22_Animate, Hunyuan3Dv2mini, Hunyuan3Dv2, Hunyuan3Dv2_1, HiDream, Chroma, ChromaRadiance, ACEStep, Omnigen2, QwenImage]
+models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, HunyuanImage21Refiner, HunyuanImage21, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, WAN22_Camera, WAN22_S2V, WAN21_HuMo, WAN22_Animate, Hunyuan3Dv2mini, Hunyuan3Dv2, Hunyuan3Dv2_1, HiDream, Chroma, ChromaRadiance, ACEStep, Omnigen2, QwenImage]

 models += [SVD_img2vid]
--- a/comfy/supported_models_base.py
+++ b/comfy/supported_models_base.py
@@ -50,7 +50,6 @@ class BASE:
    manual_cast_dtype = None
    custom_operations = None
    scaled_fp8 = None
-    layer_quant_config = None  # Per-layer quantization configuration for mixed precision
    optimizations = {"fp8": False}

    @classmethod
--- a/comfy/text_encoders/hunyuan_video.py
+++ b/comfy/text_encoders/hunyuan_video.py
@@ -1,7 +1,6 @@
 from comfy import sd1_clip
 import comfy.model_management
 import comfy.text_encoders.llama
-from .hunyuan_image import HunyuanImageTokenizer
 from transformers import LlamaTokenizerFast
 import torch
 import os
@@ -74,14 +73,6 @@ class HunyuanVideoTokenizer:
        return {}


-class HunyuanVideo15Tokenizer(HunyuanImageTokenizer):
-    def __init__(self, embedding_directory=None, tokenizer_data={}):
-        super().__init__(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data)
-        self.llama_template = "<|im_start|>system\nYou are a helpful assistant. Describe the video by detailing the following aspects:\n1. The main content and theme of the video.\n2. The color, shape, size, texture, quantity, text, and spatial relationships of the objects.\n3. Actions, events, behaviors temporal relationships, physical movement changes of the objects.\n4. background environment, light, style and atmosphere.\n5. camera angles, movements, and transitions used in the video.<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n"
-
-    def tokenize_with_weights(self, text:str, return_word_ids=False, **kwargs):
-        return super().tokenize_with_weights(text, return_word_ids, prevent_empty_text=True, **kwargs)
-
 class HunyuanVideoClipModel(torch.nn.Module):
    def __init__(self, dtype_llama=None, device="cpu", dtype=None, model_options={}):
        super().__init__()
--- a/comfy/text_encoders/llama.py
+++ b/comfy/text_encoders/llama.py
@@ -32,7 +32,6 @@ class Llama2Config:
    q_norm = None
    k_norm = None
    rope_scale = None
-    final_norm: bool = True

 @dataclass
 class Qwen25_3BConfig:
@@ -54,7 +53,6 @@ class Qwen25_3BConfig:
    q_norm = None
    k_norm = None
    rope_scale = None
-    final_norm: bool = True

 @dataclass
 class Qwen25_7BVLI_Config:
@@ -76,7 +74,6 @@ class Qwen25_7BVLI_Config:
    q_norm = None
    k_norm = None
    rope_scale = None
-    final_norm: bool = True

 @dataclass
 class Gemma2_2B_Config:
@@ -99,7 +96,6 @@ class Gemma2_2B_Config:
    k_norm = None
    sliding_attention = None
    rope_scale = None
-    final_norm: bool = True

 @dataclass
 class Gemma3_4B_Config:
@@ -122,7 +118,6 @@ class Gemma3_4B_Config:
    k_norm = "gemma3"
    sliding_attention = [False, False, False, False, False, 1024]
    rope_scale = [1.0, 8.0]
-    final_norm: bool = True

 class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5, add=False, device=None, dtype=None):
@@ -371,12 +366,7 @@ class Llama2_(nn.Module):
            transformer(config, index=i, device=device, dtype=dtype, ops=ops)
            for i in range(config.num_hidden_layers)
        ])
-
-        if config.final_norm:
-            self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps, add=config.rms_norm_add, device=device, dtype=dtype)
-        else:
-            self.norm = None
-
+        self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps, add=config.rms_norm_add, device=device, dtype=dtype)
        # self.lm_head = ops.Linear(config.hidden_size, config.vocab_size, bias=False, device=device, dtype=dtype)

    def forward(self, x, attention_mask=None, embeds=None, num_tokens=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=None, position_ids=None, embeds_info=[]):
@@ -431,16 +421,14 @@ class Llama2_(nn.Module):
            if i == intermediate_output:
                intermediate = x.clone()

-        if self.norm is not None:
-            x = self.norm(x)
-
+        x = self.norm(x)
        if all_intermediate is not None:
            all_intermediate.append(x.unsqueeze(1).clone())

        if all_intermediate is not None:
            intermediate = torch.cat(all_intermediate, dim=1)

-        if intermediate is not None and final_layer_norm_intermediate and self.norm is not None:
+        if intermediate is not None and final_layer_norm_intermediate:
            intermediate = self.norm(intermediate)

        return x, intermediate
--- a/comfy/text_encoders/qwen_image.py
+++ b/comfy/text_encoders/qwen_image.py
@@ -17,14 +17,12 @@ class QwenImageTokenizer(sd1_clip.SD1Tokenizer):
        self.llama_template = "<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n"
        self.llama_template_images = "<|im_start|>system\nDescribe the key features of the input image (color, shape, size, texture, objects, background), then explain how the user's text instruction should alter or modify the image. Generate a new image that meets the user's requirements while maintaining consistency with the original input where appropriate.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>{}<|im_end|>\n<|im_start|>assistant\n"

-    def tokenize_with_weights(self, text, return_word_ids=False, llama_template=None, images=[], prevent_empty_text=False, **kwargs):
+    def tokenize_with_weights(self, text, return_word_ids=False, llama_template=None, images=[], **kwargs):
        skip_template = False
        if text.startswith('<|im_start|>'):
            skip_template = True
        if text.startswith('<|start_header_id|>'):
            skip_template = True
-        if prevent_empty_text and text == '':
-            text = ' '

        if skip_template:
            llama_text = text
--- a/comfy/utils.py
+++ b/comfy/utils.py
@@ -1106,25 +1106,3 @@ def upscale_dit_mask(mask: torch.Tensor, img_size_in, img_size_out):
            dim=1
        )
        return out
-
-def pack_latents(latents):
-    latent_shapes = []
-    tensors = []
-    for tensor in latents:
-        latent_shapes.append(tensor.shape)
-        tensors.append(tensor.reshape(tensor.shape[0], 1, -1))
-
-    latent = torch.cat(tensors, dim=-1)
-    return latent, latent_shapes
-
-def unpack_latents(combined_latent, latent_shapes):
-    if len(latent_shapes) > 1:
-        output_tensors = []
-        for shape in latent_shapes:
-            cut = math.prod(shape[1:])
-            tens = combined_latent[:, :, :cut]
-            combined_latent = combined_latent[:, :, cut:]
-            output_tensors.append(tens.reshape([tens.shape[0]] + list(shape)[1:]))
-    else:
-        output_tensors = combined_latent
-    return output_tensors
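
Reviewer note: the removed `pack_latents` / `unpack_latents` pair flattened several latents into one tensor and split it back by the product of each recorded shape. The sketch below is a pure-Python analogue of that bookkeeping, offered only as an illustration for anyone auditing callers of the removed helpers. It is not the removed code: the names `pack_flat` and `unpack_flat` are hypothetical, plain lists stand in for tensors, and the batch dimension that the originals kept is omitted here.

```python
import math


def pack_flat(latents):
    """Concatenate flat value lists and remember each shape.

    `latents` is a list of (shape, values) pairs, where `values` is a flat
    list of length math.prod(shape). Hypothetical stand-in for tensors.
    """
    shapes = [shape for shape, _ in latents]
    combined = [v for _, values in latents for v in values]
    return combined, shapes


def unpack_flat(combined, shapes):
    """Split the combined list back into per-latent chunks."""
    out = []
    offset = 0
    for shape in shapes:
        cut = math.prod(shape)  # number of elements owned by this latent
        out.append((shape, combined[offset:offset + cut]))
        offset += cut
    return out
```

One difference worth keeping in mind: when only a single shape was recorded, the removed `unpack_latents` returned the combined tensor unchanged rather than a list, whereas this sketch always returns a list.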
--- a/comfy_api/latest/__init__.py
+++ b/comfy_api/latest/__init__.py
@@ -7,7 +7,7 @@ from comfy_api.internal.singleton import ProxiedSingleton
 from comfy_api.internal.async_to_sync import create_sync_class
 from comfy_api.latest._input import ImageInput, AudioInput, MaskInput, LatentInput, VideoInput
 from comfy_api.latest._input_impl import VideoFromFile, VideoFromComponents
-from comfy_api.latest._util import VideoCodec, VideoContainer, VideoComponents, MESH, VOXEL
+from comfy_api.latest._util import VideoCodec, VideoContainer, VideoComponents
 from . import _io as io
 from . import _ui as ui
 # from comfy_api.latest._resources import _RESOURCES as resources  #noqa: F401
@@ -104,8 +104,6 @@ class Types:
    VideoCodec = VideoCodec
    VideoContainer = VideoContainer
    VideoComponents = VideoComponents
-    MESH = MESH
-    VOXEL = VOXEL

 ComfyAPI = ComfyAPI_latest

--- a/comfy_api/latest/_io.py
+++ b/comfy_api/latest/_io.py
@@ -27,7 +27,6 @@ from comfy_api.internal import (_ComfyNodeInternal, _NodeOutputInternal, classpr
    prune_dict, shallow_clone_class)
 from comfy_api.latest._resources import Resources, ResourcesLocal
 from comfy_execution.graph_utils import ExecutionBlocker
-from ._util import MESH, VOXEL

 # from comfy_extras.nodes_images import SVG as SVG_ # NOTE: needs to be moved before can be imported due to circular reference

@@ -629,10 +628,6 @@ class UpscaleModel(ComfyTypeIO):
    if TYPE_CHECKING:
        Type = ImageModelDescriptor

-@comfytype(io_type="LATENT_UPSCALE_MODEL")
-class LatentUpscaleModel(ComfyTypeIO):
-    Type = Any
-
 @comfytype(io_type="AUDIO")
 class Audio(ComfyTypeIO):
    class AudioDict(TypedDict):
@@ -661,11 +656,11 @@ class LossMap(ComfyTypeIO):

 @comfytype(io_type="VOXEL")
 class Voxel(ComfyTypeIO):
-    Type = VOXEL
+    Type = Any # TODO: VOXEL class is defined in comfy_extras/nodes_hunyuan3d.py; should be moved to somewhere else before referenced directly in v3

 @comfytype(io_type="MESH")
 class Mesh(ComfyTypeIO):
-    Type = MESH
+    Type = Any # TODO: MESH class is defined in comfy_extras/nodes_hunyuan3d.py; should be moved to somewhere else before referenced directly in v3

 @comfytype(io_type="HOOKS")
 class Hooks(ComfyTypeIO):
--- a/comfy_api/latest/_util/__init__.py
+++ b/comfy_api/latest/_util/__init__.py
@@ -1,11 +1,8 @@
 from .video_types import VideoContainer, VideoCodec, VideoComponents
-from .geometry_types import VOXEL, MESH

 __all__ = [
    # Utility Types
    "VideoContainer",
    "VideoCodec",
    "VideoComponents",
-    "VOXEL",
-    "MESH",
 ]
--- a/comfy_api/latest/_util/geometry_types.py
+++ b/comfy_api/latest/_util/geometry_types.py
@@ -1,12 +0,0 @@
-import torch
-
-
-class VOXEL:
-    def __init__(self, data: torch.Tensor):
-        self.data = data
-
-
-class MESH:
-    def __init__(self, vertices: torch.Tensor, faces: torch.Tensor):
-        self.vertices = vertices
-        self.faces = faces
--- a/comfy_api_nodes/apinode_utils.py
+++ b/comfy_api_nodes/apinode_utils.py
@@ -0,0 +1,718 @@
+from __future__ import annotations
+import aiohttp
+import io
+import logging
+import mimetypes
+import os
+from typing import Optional, Union
+from comfy.utils import common_upscale
+from comfy_api.input_impl import VideoFromFile
+from comfy_api.util import VideoContainer, VideoCodec
+from comfy_api.input.video_types import VideoInput
+from comfy_api.input.basic_types import AudioInput
+from comfy_api_nodes.apis.client import (
+    ApiClient,
+    ApiEndpoint,
+    HttpMethod,
+    SynchronousOperation,
+    UploadRequest,
+    UploadResponse,
+)
+from server import PromptServer
+from comfy.cli_args import args
+
+import numpy as np
+from PIL import Image
+import torch
+import math
+import base64
+import uuid
+from io import BytesIO
+import av
+
+
+async def download_url_to_video_output(
+    video_url: str, timeout: int = None, auth_kwargs: Optional[dict[str, str]] = None
+) -> VideoFromFile:
+    """Downloads a video from a URL and returns a `VIDEO` output.
+
+    Args:
+        video_url: The URL of the video to download.
+
+    Returns:
+        A Comfy node `VIDEO` output.
+    """
+    video_io = await download_url_to_bytesio(video_url, timeout, auth_kwargs=auth_kwargs)
+    if video_io is None:
+        error_msg = f"Failed to download video from {video_url}"
+        logging.error(error_msg)
+        raise ValueError(error_msg)
+    return VideoFromFile(video_io)
+
+
+def downscale_image_tensor(image, total_pixels=1536 * 1024) -> torch.Tensor:
+    """Downscale input image tensor to roughly the specified total pixels."""
+    samples = image.movedim(-1, 1)
+    total = int(total_pixels)
+    scale_by = math.sqrt(total / (samples.shape[3] * samples.shape[2]))
+    if scale_by >= 1:
+        return image
+    width = round(samples.shape[3] * scale_by)
+    height = round(samples.shape[2] * scale_by)
+
+    s = common_upscale(samples, width, height, "lanczos", "disabled")
+    s = s.movedim(1, -1)
+    return s
+
+
+async def validate_and_cast_response(
+    response, timeout: int = None, node_id: Union[str, None] = None
+) -> torch.Tensor:
+    """Validates and casts a response to a torch.Tensor.
+
+    Args:
+        response: The response to validate and cast.
+        timeout: Request timeout in seconds. Defaults to None (no timeout).
+
+    Returns:
+        A torch.Tensor representing the image (1, H, W, C).
+
+    Raises:
+        ValueError: If the response is not valid.
+    """
+    # validate raw JSON response
+    data = response.data
+    if not data or len(data) == 0:
+        raise ValueError("No images returned from API endpoint")
+
+    # Initialize list to store image tensors
+    image_tensors: list[torch.Tensor] = []
+
+    # Process each image in the data array
+    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=timeout)) as session:
+        for img_data in data:
+            img_bytes: bytes
+            if img_data.b64_json:
+                img_bytes = base64.b64decode(img_data.b64_json)
+            elif img_data.url:
+                if node_id:
+                    PromptServer.instance.send_progress_text(f"Result URL: {img_data.url}", node_id)
+                async with session.get(img_data.url) as resp:
+                    if resp.status != 200:
+                        raise ValueError("Failed to download generated image")
+                    img_bytes = await resp.read()
+            else:
+                raise ValueError("Invalid image payload – neither URL nor base64 data present.")
+
+            pil_img = Image.open(BytesIO(img_bytes)).convert("RGBA")
+            arr = np.asarray(pil_img).astype(np.float32) / 255.0
+            image_tensors.append(torch.from_numpy(arr))
+
+    return torch.stack(image_tensors, dim=0)
+
+
+def validate_aspect_ratio(
+    aspect_ratio: str,
+    minimum_ratio: float,
+    maximum_ratio: float,
+    minimum_ratio_str: str,
+    maximum_ratio_str: str,
+) -> float:
+    """Validates and casts an aspect ratio string to a float.
+
+    Args:
+        aspect_ratio: The aspect ratio string to validate.
+        minimum_ratio: The minimum aspect ratio.
+        maximum_ratio: The maximum aspect ratio.
+        minimum_ratio_str: The minimum aspect ratio string.
+        maximum_ratio_str: The maximum aspect ratio string.
+
+    Returns:
+        The validated and cast aspect ratio.
+
+    Raises:
+        TypeError: If the aspect ratio is malformed or outside the allowed range.
+    """
+    # get ratio values
+    numbers = aspect_ratio.split(":")
+    if len(numbers) != 2:
+        raise TypeError(
+            f"Aspect ratio must be in the format X:Y, such as 16:9, but was {aspect_ratio}."
+        )
+    try:
+        numerator = int(numbers[0])
+        denominator = int(numbers[1])
+    except ValueError as exc:
+        raise TypeError(
+            f"Aspect ratio must contain numbers separated by ':', such as 16:9, but was {aspect_ratio}."
+        ) from exc
+    calculated_ratio = numerator / denominator
+    # only enforce the bounds when the ratio is not approximately equal to either limit
+    if not math.isclose(calculated_ratio, minimum_ratio) and not math.isclose(
+        calculated_ratio, maximum_ratio
+    ):
+        if calculated_ratio < minimum_ratio:
+            raise TypeError(
+                f"Aspect ratio cannot reduce to any less than {minimum_ratio_str} ({minimum_ratio}), but was {aspect_ratio} ({calculated_ratio})."
+            )
+        if calculated_ratio > maximum_ratio:
+            raise TypeError(
+                f"Aspect ratio cannot reduce to any greater than {maximum_ratio_str} ({maximum_ratio}), but was {aspect_ratio} ({calculated_ratio})."
+            )
+    return aspect_ratio
+
+
+def mimetype_to_extension(mime_type: str) -> str:
+    """Converts a MIME type to a file extension."""
+    return mime_type.split("/")[-1].lower()
+
+
+async def download_url_to_bytesio(
+    url: str, timeout: int = None, auth_kwargs: Optional[dict[str, str]] = None
+) -> BytesIO:
+    """Downloads content from a URL using aiohttp and returns it as BytesIO.
+
+    Args:
+        url: The URL to download.
+        timeout: Request timeout in seconds. Defaults to None (no timeout).
+
+    Returns:
+        BytesIO object containing the downloaded content.
+    """
+    headers = {}
+    if url.startswith("/proxy/"):
+        url = str(args.comfy_api_base).rstrip("/") + url
+        auth_token = (auth_kwargs or {}).get("auth_token")
+        comfy_api_key = (auth_kwargs or {}).get("comfy_api_key")
+        if auth_token:
+            headers["Authorization"] = f"Bearer {auth_token}"
+        elif comfy_api_key:
+            headers["X-API-KEY"] = comfy_api_key
+    timeout_cfg = aiohttp.ClientTimeout(total=timeout) if timeout else None
+    async with aiohttp.ClientSession(timeout=timeout_cfg) as session:
+        async with session.get(url, headers=headers) as resp:
+            resp.raise_for_status()  # Raises aiohttp.ClientResponseError for bad responses (4XX or 5XX)
+            return BytesIO(await resp.read())
+
+
+def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch.Tensor:
+    """Converts image data from BytesIO to a torch.Tensor.
+
+    Args:
+        image_bytesio: BytesIO object containing the image data.
+        mode: The PIL mode to convert the image to (e.g., "RGB", "RGBA").
+
+    Returns:
+        A torch.Tensor representing the image (1, H, W, C).
+
+    Raises:
+        PIL.UnidentifiedImageError: If the image data cannot be identified.
+        ValueError: If the specified mode is invalid.
+    """
+    image = Image.open(image_bytesio)
+    image = image.convert(mode)
+    image_array = np.array(image).astype(np.float32) / 255.0
+    return torch.from_numpy(image_array).unsqueeze(0)
+
+
+async def download_url_to_image_tensor(url: str, timeout: int = None) -> torch.Tensor:
+    """Downloads an image from a URL and returns a [B, H, W, C] tensor."""
+    image_bytesio = await download_url_to_bytesio(url, timeout)
+    return bytesio_to_image_tensor(image_bytesio)
+
+
+def process_image_response(response_content: bytes | str) -> torch.Tensor:
+    """Uses content from a Response object and converts it to a torch.Tensor"""
+    return bytesio_to_image_tensor(BytesIO(response_content))
+
+
+def _tensor_to_pil(image: torch.Tensor, total_pixels: int = 2048 * 2048) -> Image.Image:
+    """Converts a single torch.Tensor image [H, W, C] to a PIL Image, optionally downscaling."""
+    if len(image.shape) > 3:
+        image = image[0]
+    # TODO: remove alpha if not allowed and present
+    input_tensor = image.cpu()
+    input_tensor = downscale_image_tensor(
+        input_tensor.unsqueeze(0), total_pixels=total_pixels
+    ).squeeze()
+    image_np = (input_tensor.numpy() * 255).astype(np.uint8)
+    img = Image.fromarray(image_np)
+    return img
+
+
+def _pil_to_bytesio(img: Image.Image, mime_type: str = "image/png") -> BytesIO:
+    """Converts a PIL Image to a BytesIO object."""
+    if not mime_type:
+        mime_type = "image/png"
+
+    img_byte_arr = io.BytesIO()
+    # Derive PIL format from MIME type (e.g., 'image/png' -> 'PNG')
+    pil_format = mime_type.split("/")[-1].upper()
+    if pil_format == "JPG":
+        pil_format = "JPEG"
+    img.save(img_byte_arr, format=pil_format)
+    img_byte_arr.seek(0)
+    return img_byte_arr
+
+
+def tensor_to_bytesio(
+    image: torch.Tensor,
+    name: Optional[str] = None,
+    total_pixels: int = 2048 * 2048,
+    mime_type: str = "image/png",
+) -> BytesIO:
+    """Converts a torch.Tensor image to a named BytesIO object.
+
+    Args:
+        image: Input torch.Tensor image.
+        name: Optional filename for the BytesIO object.
+        total_pixels: Maximum total pixels for potential downscaling.
+        mime_type: Target image MIME type (e.g., 'image/png', 'image/jpeg', 'image/webp').
+
+    Returns:
+        Named BytesIO object containing the image data, with pointer set to the start of buffer.
+    """
+    if not mime_type:
+        mime_type = "image/png"
+
+    pil_image = _tensor_to_pil(image, total_pixels=total_pixels)
+    img_binary = _pil_to_bytesio(pil_image, mime_type=mime_type)
+    img_binary.name = (
+        f"{name if name else uuid.uuid4()}.{mimetype_to_extension(mime_type)}"
+    )
+    return img_binary
+
+
+def tensor_to_base64_string(
+    image_tensor: torch.Tensor,
+    total_pixels: int = 2048 * 2048,
+    mime_type: str = "image/png",
+) -> str:
+    """Convert [B, H, W, C] or [H, W, C] tensor to a base64 string.
+
+    Args:
+        image_tensor: Input torch.Tensor image.
+        total_pixels: Maximum total pixels for potential downscaling.
+        mime_type: Target image MIME type (e.g., 'image/png', 'image/jpeg', 'image/webp').
+
+    Returns:
+        Base64 encoded string of the image.
+    """
+    pil_image = _tensor_to_pil(image_tensor, total_pixels=total_pixels)
+    img_byte_arr = _pil_to_bytesio(pil_image, mime_type=mime_type)
+    img_bytes = img_byte_arr.getvalue()
+    # Encode bytes to base64 string
+    base64_encoded_string = base64.b64encode(img_bytes).decode("utf-8")
+    return base64_encoded_string
+
+
+def tensor_to_data_uri(
+    image_tensor: torch.Tensor,
+    total_pixels: int = 2048 * 2048,
+    mime_type: str = "image/png",
+) -> str:
+    """Converts a tensor image to a Data URI string.
+
+    Args:
+        image_tensor: Input torch.Tensor image.
+        total_pixels: Maximum total pixels for potential downscaling.
+        mime_type: Target image MIME type (e.g., 'image/png', 'image/jpeg', 'image/webp').
+
+    Returns:
+        Data URI string (e.g., 'data:image/png;base64,...').
+    """
+    base64_string = tensor_to_base64_string(image_tensor, total_pixels, mime_type)
+    return f"data:{mime_type};base64,{base64_string}"
+
+
+def text_filepath_to_base64_string(filepath: str) -> str:
+    """Converts a text file to a base64 string."""
+    with open(filepath, "rb") as f:
+        file_content = f.read()
+    return base64.b64encode(file_content).decode("utf-8")
+
+
+def text_filepath_to_data_uri(filepath: str) -> str:
+    """Converts a text file to a data URI."""
+    base64_string = text_filepath_to_base64_string(filepath)
+    mime_type, _ = mimetypes.guess_type(filepath)
+    if mime_type is None:
+        mime_type = "application/octet-stream"
+    return f"data:{mime_type};base64,{base64_string}"
+
+
+async def upload_file_to_comfyapi(
+    file_bytes_io: BytesIO,
+    filename: str,
+    upload_mime_type: Optional[str],
+    auth_kwargs: Optional[dict[str, str]] = None,
+) -> str:
+    """
+    Uploads a single file to ComfyUI API and returns its download URL.
+
+    Args:
+        file_bytes_io: BytesIO object containing the file data.
+        filename: The filename of the file.
+        upload_mime_type: MIME type of the file.
+        auth_kwargs: Optional authentication token(s).
+
+    Returns:
+        The download URL for the uploaded file.
+    """
+    if upload_mime_type is None:
+        request_object = UploadRequest(file_name=filename)
+    else:
+        request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
+    operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path="/customers/storage",
+            method=HttpMethod.POST,
+            request_model=UploadRequest,
+            response_model=UploadResponse,
+        ),
+        request=request_object,
+        auth_kwargs=auth_kwargs,
+    )
+
+    response: UploadResponse = await operation.execute()
+    await ApiClient.upload_file(response.upload_url, file_bytes_io, content_type=upload_mime_type)
+    return response.download_url
+
+
+def video_to_base64_string(
+    video: VideoInput,
+    container_format: VideoContainer = None,
+    codec: VideoCodec = None
+) -> str:
+    """
+    Converts a video input to a base64 string.
+
+    Args:
+        video: The video input to convert
+        container_format: Optional container format to use (defaults to video.container if available)
+        codec: Optional codec to use (defaults to video.codec if available)
+    """
+    video_bytes_io = io.BytesIO()
+
+    # Use provided format/codec if specified, otherwise use video's own if available
+    format_to_use = container_format if container_format is not None else getattr(video, 'container', VideoContainer.MP4)
+    codec_to_use = codec if codec is not None else getattr(video, 'codec', VideoCodec.H264)
+
+    video.save_to(video_bytes_io, format=format_to_use, codec=codec_to_use)
+    video_bytes_io.seek(0)
+    return base64.b64encode(video_bytes_io.getvalue()).decode("utf-8")
+
+
+async def upload_video_to_comfyapi(
+    video: VideoInput,
+    auth_kwargs: Optional[dict[str, str]] = None,
+    container: VideoContainer = VideoContainer.MP4,
+    codec: VideoCodec = VideoCodec.H264,
+    max_duration: Optional[int] = None,
+) -> str:
+    """
+    Uploads a single video to ComfyUI API and returns its download URL.
+    Uses the specified container and codec for saving the video before upload.
+
+    Args:
+        video: VideoInput object (Comfy VIDEO type).
+        auth_kwargs: Optional authentication token(s).
+        container: The video container format to use (default: MP4).
+        codec: The video codec to use (default: H264).
+        max_duration: Optional maximum duration of the video in seconds. If the video is longer than this, an error will be raised.
+
+    Returns:
+        The download URL for the uploaded video file.
+    """
+    if max_duration is not None:
+        try:
+            actual_duration = video.duration_seconds
+            if actual_duration is not None and actual_duration > max_duration:
+                raise ValueError(
+                    f"Video duration ({actual_duration:.2f}s) exceeds the maximum allowed ({max_duration}s)."
+                )
+        except Exception as e:
+            logging.error("Error getting video duration: %s", str(e))
+            raise ValueError(f"Could not verify video duration from source: {e}") from e
+
+    upload_mime_type = f"video/{container.value.lower()}"
+    filename = f"uploaded_video.{container.value.lower()}"
+
+    # Convert VideoInput to BytesIO using specified container/codec
+    video_bytes_io = io.BytesIO()
+    video.save_to(video_bytes_io, format=container, codec=codec)
+    video_bytes_io.seek(0)
+
+    return await upload_file_to_comfyapi(video_bytes_io, filename, upload_mime_type, auth_kwargs)
+
+
+def audio_tensor_to_contiguous_ndarray(waveform: torch.Tensor) -> np.ndarray:
+    """
+    Prepares audio waveform for av library by converting to a contiguous numpy array.
+
+    Args:
+        waveform: a tensor of shape (1, channels, samples) derived from a Comfy `AUDIO` type.
+
+    Returns:
+        Contiguous numpy array of the audio waveform. If the audio was batched,
+            the first item is taken.
+    """
+    if waveform.ndim != 3 or waveform.shape[0] != 1:
+        raise ValueError("Expected waveform tensor shape (1, channels, samples)")
+
+    # If batch is > 1, take first item
+    if waveform.shape[0] > 1:
+        waveform = waveform[0]
+
+    # Prepare for av: remove batch dim, move to CPU, make contiguous, convert to numpy array
+    audio_data_np = waveform.squeeze(0).cpu().contiguous().numpy()
+    if audio_data_np.dtype != np.float32:
+        audio_data_np = audio_data_np.astype(np.float32)
+
+    return audio_data_np
+
+
+def audio_ndarray_to_bytesio(
+    audio_data_np: np.ndarray,
+    sample_rate: int,
+    container_format: str = "mp4",
+    codec_name: str = "aac",
+) -> BytesIO:
+    """
+    Encodes a numpy array of audio data into a BytesIO object.
+    """
+    audio_bytes_io = io.BytesIO()
+    with av.open(audio_bytes_io, mode="w", format=container_format) as output_container:
+        audio_stream = output_container.add_stream(codec_name, rate=sample_rate)
+        frame = av.AudioFrame.from_ndarray(
+            audio_data_np,
+            format="fltp",
+            layout="stereo" if audio_data_np.shape[0] > 1 else "mono",
+        )
+        frame.sample_rate = sample_rate
+        frame.pts = 0
+
+        for packet in audio_stream.encode(frame):
+            output_container.mux(packet)
+
+        # Flush stream
+        for packet in audio_stream.encode(None):
+            output_container.mux(packet)
+
+    audio_bytes_io.seek(0)
+    return audio_bytes_io
+
+
+async def upload_audio_to_comfyapi(
+    audio: AudioInput,
+    auth_kwargs: Optional[dict[str, str]] = None,
+    container_format: str = "mp4",
+    codec_name: str = "aac",
+    mime_type: str = "audio/mp4",
+    filename: str = "uploaded_audio.mp4",
+) -> str:
+    """
+    Uploads a single audio input to ComfyUI API and returns its download URL.
+    Encodes the raw waveform into the specified format before uploading.
+
+    Args:
+        audio: a Comfy `AUDIO` type (contains waveform tensor and sample_rate)
+        auth_kwargs: Optional authentication token(s).
+
+    Returns:
+        The download URL for the uploaded audio file.
+    """
+    sample_rate: int = audio["sample_rate"]
+    waveform: torch.Tensor = audio["waveform"]
+    audio_data_np = audio_tensor_to_contiguous_ndarray(waveform)
+    audio_bytes_io = audio_ndarray_to_bytesio(
+        audio_data_np, sample_rate, container_format, codec_name
+    )
+
+    return await upload_file_to_comfyapi(audio_bytes_io, filename, mime_type, auth_kwargs)
+
+
+def f32_pcm(wav: torch.Tensor) -> torch.Tensor:
+    """Convert audio to float 32 bits PCM format. Copy-paste from nodes_audio.py file."""
+    if wav.dtype.is_floating_point:
+        return wav
+    elif wav.dtype == torch.int16:
+        return wav.float() / (2 ** 15)
+    elif wav.dtype == torch.int32:
+        return wav.float() / (2 ** 31)
+    raise ValueError(f"Unsupported wav dtype: {wav.dtype}")
+
+
+def audio_bytes_to_audio_input(audio_bytes: bytes,) -> dict:
+    """
+    Decode any common audio container from bytes using PyAV and return
+    a Comfy AUDIO dict: {"waveform": [1, C, T] float32, "sample_rate": int}.
+    """
+    with av.open(io.BytesIO(audio_bytes)) as af:
+        if not af.streams.audio:
+            raise ValueError("No audio stream found in response.")
+        stream = af.streams.audio[0]
+
+        in_sr = int(stream.codec_context.sample_rate)
+        out_sr = in_sr
+
+        frames: list[torch.Tensor] = []
+        n_channels = stream.channels or 1
+
+        for frame in af.decode(streams=stream.index):
+            arr = frame.to_ndarray()  # shape can be [C, T] or [T, C] or [T]
+            buf = torch.from_numpy(arr)
+            if buf.ndim == 1:
+                buf = buf.unsqueeze(0)  # [T] -> [1, T]
+            elif buf.shape[0] != n_channels and buf.shape[-1] == n_channels:
+                buf = buf.transpose(0, 1).contiguous()  # [T, C] -> [C, T]
+            elif buf.shape[0] != n_channels:
+                buf = buf.reshape(-1, n_channels).t().contiguous()  # fallback to [C, T]
+            frames.append(buf)
+
+    if not frames:
+        raise ValueError("Decoded zero audio frames.")
+
+    wav = torch.cat(frames, dim=1)  # [C, T]
+    wav = f32_pcm(wav)
+    return {"waveform": wav.unsqueeze(0).contiguous(), "sample_rate": out_sr}
+
+
+def audio_input_to_mp3(audio: AudioInput) -> io.BytesIO:
+    waveform = audio["waveform"].cpu()
+
+    output_buffer = io.BytesIO()
+    output_container = av.open(output_buffer, mode='w', format="mp3")
+
+    out_stream = output_container.add_stream("libmp3lame", rate=audio["sample_rate"])
+    out_stream.bit_rate = 320000
+
+    frame = av.AudioFrame.from_ndarray(waveform.movedim(0, 1).reshape(1, -1).float().numpy(), format='flt', layout='mono' if waveform.shape[0] == 1 else 'stereo')
+    frame.sample_rate = audio["sample_rate"]
+    frame.pts = 0
+    output_container.mux(out_stream.encode(frame))
+    output_container.mux(out_stream.encode(None))
+    output_container.close()
+    output_buffer.seek(0)
+    return output_buffer
+
+
+def audio_to_base64_string(
+    audio: AudioInput, container_format: str = "mp4", codec_name: str = "aac"
+) -> str:
+    """Converts an audio input to a base64 string."""
+    sample_rate: int = audio["sample_rate"]
+    waveform: torch.Tensor = audio["waveform"]
+    audio_data_np = audio_tensor_to_contiguous_ndarray(waveform)
+    audio_bytes_io = audio_ndarray_to_bytesio(
+        audio_data_np, sample_rate, container_format, codec_name
+    )
+    audio_bytes = audio_bytes_io.getvalue()
+    return base64.b64encode(audio_bytes).decode("utf-8")
+
+
+async def upload_images_to_comfyapi(
+    image: torch.Tensor,
+    max_images=8,
+    auth_kwargs: Optional[dict[str, str]] = None,
+    mime_type: Optional[str] = None,
+) -> list[str]:
+    """
+    Uploads images to ComfyUI API and returns download URLs.
+    To upload multiple images, stack them in the batch dimension first.
+
+    Args:
+        image: Input torch.Tensor image.
+        max_images: Maximum number of images to upload.
+        auth_kwargs: Optional authentication token(s).
+        mime_type: Optional MIME type for the image.
+    """
+    # if batch, try to upload each file if max_images is greater than 0
+    download_urls: list[str] = []
+    is_batch = len(image.shape) > 3
+    batch_len = image.shape[0] if is_batch else 1
+
+    for idx in range(min(batch_len, max_images)):
+        tensor = image[idx] if is_batch else image
+        img_io = tensor_to_bytesio(tensor, mime_type=mime_type)
+        url = await upload_file_to_comfyapi(img_io, img_io.name, mime_type, auth_kwargs)
+        download_urls.append(url)
+    return download_urls
+
+
+def resize_mask_to_image(
+    mask: torch.Tensor,
+    image: torch.Tensor,
+    upscale_method="nearest-exact",
+    crop="disabled",
+    allow_gradient=True,
+    add_channel_dim=False,
+):
+    """
+    Resize mask to be the same dimensions as an image, while maintaining proper format for API calls.
+    """
+    _, H, W, _ = image.shape
+    mask = mask.unsqueeze(-1)
+    mask = mask.movedim(-1, 1)
+    mask = common_upscale(
+        mask, width=W, height=H, upscale_method=upscale_method, crop=crop
+    )
+    mask = mask.movedim(1, -1)
+    if not add_channel_dim:
+        mask = mask.squeeze(-1)
+    if not allow_gradient:
+        mask = (mask > 0.5).float()
+    return mask
+
+
+def validate_string(
+    string: str,
+    strip_whitespace=True,
+    field_name="prompt",
+    min_length=None,
+    max_length=None,
+):
+    if string is None:
+        raise Exception(f"Field '{field_name}' cannot be empty.")
+    if strip_whitespace:
+        string = string.strip()
+    if min_length and len(string) < min_length:
+        raise Exception(
+            f"Field '{field_name}' cannot be shorter than {min_length} characters; was {len(string)} characters long."
+        )
+    if max_length and len(string) > max_length:
+        raise Exception(
+            f"Field '{field_name}' cannot be longer than {max_length} characters; was {len(string)} characters long."
+        )
+
+
+def image_tensor_pair_to_batch(
+    image1: torch.Tensor, image2: torch.Tensor
+) -> torch.Tensor:
+    """
+    Converts a pair of image tensors to a batch tensor.
+    If the images are not the same size, the second image is resized
+    (bilinear, center crop) to match the first image.
+    """
+    if image1.shape[1:] != image2.shape[1:]:
+        image2 = common_upscale(
+            image2.movedim(-1, 1),
+            image1.shape[2],
+            image1.shape[1],
+            "bilinear",
+            "center",
+        ).movedim(1, -1)
+    return torch.cat((image1, image2), dim=0)
+
+
+def get_size(path_or_object: Union[str, io.BytesIO]) -> int:
+    if isinstance(path_or_object, str):
+        return os.path.getsize(path_or_object)
+    return len(path_or_object.getvalue())
+
+
+def validate_container_format_is_mp4(video: VideoInput) -> None:
+    """Validates video container format is MP4."""
+    container_format = video.get_container_format()
+    if container_format not in ["mp4", "mov,mp4,m4a,3gp,3g2,mj2"]:
+        raise ValueError(f"Only MP4 container format supported. Got: {container_format}")
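
Reviewer note: `validate_aspect_ratio` above parses an `X:Y` string and compares the resulting ratio against a minimum and maximum, using `math.isclose` so that values sitting on a limit are not rejected by floating-point noise. A minimal stand-alone sketch of that idea follows. It is an illustration under assumptions, not the function from this patch: the name `check_aspect_ratio` is hypothetical, it raises `ValueError` rather than `TypeError`, it returns the numeric ratio, and it adds a zero-denominator check that the patch does not have.

```python
import math


def check_aspect_ratio(aspect_ratio: str, minimum_ratio: float, maximum_ratio: float) -> float:
    """Parse 'X:Y' and return X / Y if it lies within [minimum_ratio, maximum_ratio]."""
    parts = aspect_ratio.split(":")
    if len(parts) != 2:
        raise ValueError(f"Aspect ratio must look like 16:9, but was {aspect_ratio!r}.")
    try:
        numerator, denominator = int(parts[0]), int(parts[1])
    except ValueError as exc:
        raise ValueError(f"Aspect ratio must contain integers, but was {aspect_ratio!r}.") from exc
    if denominator == 0:
        raise ValueError("Aspect ratio denominator must not be zero.")

    ratio = numerator / denominator
    # Each bound is enforced only when the ratio is not approximately equal to it.
    if ratio < minimum_ratio and not math.isclose(ratio, minimum_ratio):
        raise ValueError(f"Aspect ratio {aspect_ratio} is below the minimum {minimum_ratio}.")
    if ratio > maximum_ratio and not math.isclose(ratio, maximum_ratio):
        raise ValueError(f"Aspect ratio {aspect_ratio} is above the maximum {maximum_ratio}.")
    return ratio
```

Checking each bound separately keeps the tolerance local to the limit it protects, so a ratio that happens to be close to the maximum still gets compared against the minimum.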
--- a/comfy_api_nodes/apis/PixverseController.py
+++ b/comfy_api_nodes/apis/PixverseController.py
@@ -0,0 +1,17 @@
+# generated by datamodel-codegen:
+#   filename:  filtered-openapi.yaml
+#   timestamp: 2025-04-29T23:44:54+00:00
+
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel
+
+from . import PixverseDto
+
+
+class ResponseData(BaseModel):
+    ErrCode: Optional[int] = None
+    ErrMsg: Optional[str] = None
+    Resp: Optional[PixverseDto.V2OpenAPII2VResp] = None
--- a/comfy_api_nodes/apis/PixverseDto.py
+++ b/comfy_api_nodes/apis/PixverseDto.py
@@ -0,0 +1,57 @@
+# generated by datamodel-codegen:
+#   filename:  filtered-openapi.yaml
+#   timestamp: 2025-04-29T23:44:54+00:00
+
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+
+class V2OpenAPII2VResp(BaseModel):
+    video_id: Optional[int] = Field(None, description='Video_id')
+
+
+class V2OpenAPIT2VReq(BaseModel):
+    aspect_ratio: str = Field(
+        ..., description='Aspect ratio (16:9, 4:3, 1:1, 3:4, 9:16)', examples=['16:9']
+    )
+    duration: int = Field(
+        ...,
+        description='Video duration (5, 8 seconds, --model=v3.5 only allows 5,8; --quality=1080p does not support 8s)',
+        examples=[5],
+    )
+    model: str = Field(
+        ..., description='Model version (only supports v3.5)', examples=['v3.5']
+    )
+    motion_mode: Optional[str] = Field(
+        'normal',
+        description='Motion mode (normal, fast, --fast only available when duration=5; --quality=1080p does not support fast)',
+        examples=['normal'],
+    )
+    negative_prompt: Optional[str] = Field(
+        None, description='Negative prompt\n', max_length=2048
+    )
+    prompt: str = Field(..., description='Prompt', max_length=2048)
+    quality: str = Field(
+        ...,
+        description='Video quality ("360p"(Turbo model), "540p", "720p", "1080p")',
+        examples=['540p'],
+    )
+    seed: Optional[int] = Field(None, description='Random seed, range: 0 - 2147483647')
+    style: Optional[str] = Field(
+        None,
+        description='Style (effective when model=v3.5, "anime", "3d_animation", "clay", "comic", "cyberpunk") Do not include style parameter unless needed',
+        examples=['anime'],
+    )
+    template_id: Optional[int] = Field(
+        None,
+        description='Template ID (template_id must be activated before use)',
+        examples=[302325299692608],
+    )
+    water_mark: Optional[bool] = Field(
+        False,
+        description='Watermark (true: add watermark, false: no watermark)',
+        examples=[False],
+    )
--- a/comfy_api_nodes/apis/bfl_api.py
+++ b/comfy_api_nodes/apis/bfl_api.py
@@ -50,6 +50,44 @@ class BFLFluxFillImageRequest(BaseModel):
    mask: str = Field(None, description='A Base64-encoded string representing the mask of the areas you with to modify.')


+class BFLFluxCannyImageRequest(BaseModel):
+    prompt: str = Field(..., description='Text prompt for image generation')
+    prompt_upsampling: Optional[bool] = Field(
+        None, description='Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation.'
+    )
+    canny_low_threshold: Optional[int] = Field(None, description='Low threshold for Canny edge detection')
+    canny_high_threshold: Optional[int] = Field(None, description='High threshold for Canny edge detection')
+    seed: Optional[int] = Field(None, description='The seed value for reproducibility.')
+    steps: conint(ge=15, le=50) = Field(..., description='Number of steps for the image generation process')
+    guidance: confloat(ge=1, le=100) = Field(..., description='Guidance strength for the image generation process')
+    safety_tolerance: Optional[conint(ge=0, le=6)] = Field(
+        6, description='Tolerance level for input and output moderation. Between 0 and 6, 0 being most strict, 6 being least strict. Defaults to 2.'
+    )
+    output_format: Optional[BFLOutputFormat] = Field(
+        BFLOutputFormat.png, description="Output format for the generated image. Can be 'jpeg' or 'png'.", examples=['png']
+    )
+    control_image: Optional[str] = Field(None, description='Base64 encoded image to use as control input if no preprocessed image is provided')
+    preprocessed_image: Optional[str] = Field(None, description='Optional pre-processed image that will bypass the control preprocessing step')
+
+
+class BFLFluxDepthImageRequest(BaseModel):
+    prompt: str = Field(..., description='Text prompt for image generation')
+    prompt_upsampling: Optional[bool] = Field(
+        None, description='Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation.'
+    )
+    seed: Optional[int] = Field(None, description='The seed value for reproducibility.')
+    steps: conint(ge=15, le=50) = Field(..., description='Number of steps for the image generation process')
+    guidance: confloat(ge=1, le=100) = Field(..., description='Guidance strength for the image generation process')
+    safety_tolerance: Optional[conint(ge=0, le=6)] = Field(
+        6, description='Tolerance level for input and output moderation. Between 0 and 6, 0 being most strict, 6 being least strict. Defaults to 2.'
+    )
+    output_format: Optional[BFLOutputFormat] = Field(
+        BFLOutputFormat.png, description="Output format for the generated image. Can be 'jpeg' or 'png'.", examples=['png']
+    )
+    control_image: Optional[str] = Field(None, description='Base64 encoded image to use as control input if no preprocessed image is provided')
+    preprocessed_image: Optional[str] = Field(None, description='Optional pre-processed image that will bypass the control preprocessing step')
+
+
 class BFLFluxProGenerateRequest(BaseModel):
    prompt: str = Field(..., description='The text prompt for image generation.')
    prompt_upsampling: Optional[bool] = Field(
@@ -122,8 +160,15 @@ class BFLStatus(str, Enum):
    error = "Error"


-class BFLFluxStatusResponse(BaseModel):
+class BFLFluxProStatusResponse(BaseModel):
    id: str = Field(..., description="The unique identifier for the generation task.")
    status: BFLStatus = Field(..., description="The status of the task.")
-    result: Optional[Dict[str, Any]] = Field(None, description="The result of the task (null if not completed).")
-    progress: Optional[float] = Field(None, description="The progress of the task (0.0 to 1.0).", ge=0.0, le=1.0)
+    result: Optional[Dict[str, Any]] = Field(
+        None, description="The result of the task (null if not completed)."
+    )
+    progress: confloat(ge=0.0, le=1.0) = Field(
+        ..., description="The progress of the task (0.0 to 1.0)."
+    )
+    details: Optional[Dict[str, Any]] = Field(
+        None, description="Additional details about the task (null if not available)."
+    )
--- a/comfy_api_nodes/apis/client.py
+++ b/comfy_api_nodes/apis/client.py
@@ -0,0 +1,981 @@
+"""
+API Client Framework for api.comfy.org.
+
+This module provides a flexible framework for making API requests from ComfyUI nodes.
+It supports both synchronous and asynchronous API operations with proper type validation.
+
+Key Components:
+--------------
+1. ApiClient - Handles HTTP requests with authentication and error handling
+2. ApiEndpoint - Defines a single HTTP endpoint with its request/response models
+3. SynchronousOperation - Executes a single synchronous API operation
+
+Usage Examples:
+--------------
+
+# Example 1: Synchronous API Operation
+# ------------------------------------
+# For a simple API call that returns the result immediately:
+
+# 1. Create the API client
+api_client = ApiClient(
+    base_url="https://api.example.com",
+    auth_token="your_auth_token_here",
+    comfy_api_key="your_comfy_api_key_here",
+    timeout=30.0,
+    verify_ssl=True
+)
+
+# 2. Define the endpoint
+user_info_endpoint = ApiEndpoint(
+    path="/v1/users/me",
+    method=HttpMethod.GET,
+    request_model=EmptyRequest,  # No request body needed
+    response_model=UserProfile,   # Pydantic model for the response
+    query_params=None
+)
+
+# 3. Create the request object
+request = EmptyRequest()
+
+# 4. Create and execute the operation
+operation = SynchronousOperation(
+    endpoint=user_info_endpoint,
+    request=request
+)
+user_profile = await operation.execute(client=api_client)  # Returns immediately with the result
+
+
+# Example 2: Asynchronous API Operation with Polling
+# -------------------------------------------------
+# For an API that starts a task and requires polling for completion:
+
+# 1. Define the endpoints (initial request and polling)
+generate_image_endpoint = ApiEndpoint(
+    path="/v1/images/generate",
+    method=HttpMethod.POST,
+    request_model=ImageGenerationRequest,
+    response_model=TaskCreatedResponse,
+    query_params=None
+)
+
+check_task_endpoint = ApiEndpoint(
+    path="/v1/tasks/{task_id}",
+    method=HttpMethod.GET,
+    request_model=EmptyRequest,
+    response_model=ImageGenerationResult,
+    query_params=None
+)
+
+# 2. Create the request object
+request = ImageGenerationRequest(
+    prompt="a beautiful sunset over mountains",
+    width=1024,
+    height=1024,
+    num_images=1
+)
+
+# 3. Create and execute the polling operation
+operation = PollingOperation(
+    initial_endpoint=generate_image_endpoint,
+    initial_request=request,
+    poll_endpoint=check_task_endpoint,
+    task_id_field="task_id",
+    status_field="status",
+    completed_statuses=["completed"],
+    failed_statuses=["failed", "error"]
+)
+
+# This will make the initial request and then poll until completion
+result = await operation.execute(client=api_client)  # Returns the final ImageGenerationResult when done
+"""
+
+from __future__ import annotations
+import aiohttp
+import asyncio
+import logging
+import io
+import os
+import socket
+from aiohttp.client_exceptions import ClientError, ClientResponseError
+from typing import Type, Optional, Any, TypeVar, Generic, Callable
+from enum import Enum
+import json
+from urllib.parse import urljoin, urlparse
+from pydantic import BaseModel, Field
+import uuid  # For generating unique operation IDs
+
+from server import PromptServer
+from comfy.cli_args import args
+from comfy import utils
+from . import request_logger
+
+T = TypeVar("T", bound=BaseModel)
+R = TypeVar("R", bound=BaseModel)
+P = TypeVar("P", bound=BaseModel)  # For poll response
+
+PROGRESS_BAR_MAX = 100
+
+
+class NetworkError(Exception):
+    """Base exception for network-related errors with diagnostic information."""
+    pass
+
+
+class LocalNetworkError(NetworkError):
+    """Exception raised when local network connectivity issues are detected."""
+    pass
+
+
+class ApiServerError(NetworkError):
+    """Exception raised when the API server is unreachable but internet is working."""
+    pass
+
+
+class EmptyRequest(BaseModel):
+    """Base class for empty request bodies.
+    For GET requests, fields will be sent as query parameters."""
+
+    pass
+
+
+class UploadRequest(BaseModel):
+    file_name: str = Field(..., description="Filename to upload")
+    content_type: Optional[str] = Field(
+        None,
+        description="Mime type of the file. For example: image/png, image/jpeg, video/mp4, etc.",
+    )
+
+
+class UploadResponse(BaseModel):
+    download_url: str = Field(..., description="URL to GET uploaded file")
+    upload_url: str = Field(..., description="URL to PUT file to upload")
+
+
+class HttpMethod(str, Enum):
+    GET = "GET"
+    POST = "POST"
+    PUT = "PUT"
+    DELETE = "DELETE"
+    PATCH = "PATCH"
+
+
+class ApiClient:
+    """
+    Client for making HTTP requests to an API with authentication, error handling, and retry logic.
+    """
+
+    def __init__(
+        self,
+        base_url: str,
+        auth_token: Optional[str] = None,
+        comfy_api_key: Optional[str] = None,
+        timeout: float = 3600.0,
+        verify_ssl: bool = True,
+        max_retries: int = 3,
+        retry_delay: float = 1.0,
+        retry_backoff_factor: float = 2.0,
+        retry_status_codes: Optional[tuple[int, ...]] = None,
+        session: Optional[aiohttp.ClientSession] = None,
+    ):
+        self.base_url = base_url
+        self.auth_token = auth_token
+        self.comfy_api_key = comfy_api_key
+        self.timeout = timeout
+        self.verify_ssl = verify_ssl
+        self.max_retries = max_retries
+        self.retry_delay = retry_delay
+        self.retry_backoff_factor = retry_backoff_factor
+        # Default retry status codes: 408 (Request Timeout), 429 (Too Many Requests),
+        # 500, 502, 503, 504 (Server Errors)
+        self.retry_status_codes = retry_status_codes or (408, 429, 500, 502, 503, 504)
+        self._session: Optional[aiohttp.ClientSession] = session
+        self._owns_session = session is None  # Track if we have to close it
+
+    @staticmethod
+    def _generate_operation_id(path: str) -> str:
+        """Generates a unique operation ID for logging."""
+        return f"{path.strip('/').replace('/', '_')}_{uuid.uuid4().hex[:8]}"
+
+    @staticmethod
+    def _create_json_payload_args(
+        data: Optional[dict[str, Any]] = None,
+        headers: Optional[dict[str, str]] = None,
+    ) -> dict[str, Any]:
+        return {
+            "json": data,
+            "headers": headers,
+        }
+
+    def _create_form_data_args(
+        self,
+        data: dict[str, Any] | None,
+        files: dict[str, Any] | None,
+        headers: Optional[dict[str, str]] = None,
+        multipart_parser: Callable | None = None,
+    ) -> dict[str, Any]:
+        if headers and "Content-Type" in headers:
+            del headers["Content-Type"]
+
+        if multipart_parser and data:
+            data = multipart_parser(data)
+
+        if isinstance(data, aiohttp.FormData):
+            form = data  # If the parser already returned a FormData, pass it through
+        else:
+            form = aiohttp.FormData(default_to_multipart=True)
+            if data:  # regular text fields
+                for k, v in data.items():
+                    if v is None:
+                        continue  # aiohttp fails to serialize "None" values
+                    # aiohttp expects strings or bytes; convert enums etc.
+                    form.add_field(k, str(v) if not isinstance(v, (bytes, bytearray)) else v)
+
+        if files:
+            file_iter = files if isinstance(files, list) else files.items()
+            for field_name, file_obj in file_iter:
+                if file_obj is None:
+                    continue  # aiohttp fails to serialize "None" values
+                # file_obj can be (filename, bytes/io.BytesIO, content_type) tuple
+                if isinstance(file_obj, tuple):
+                    filename, file_value, content_type = self._unpack_tuple(file_obj)
+                else:
+                    file_value = file_obj
+                    filename = getattr(file_obj, "name", field_name)
+                    content_type = "application/octet-stream"
+
+                form.add_field(
+                    name=field_name,
+                    value=file_value,
+                    filename=filename,
+                    content_type=content_type,
+                )
+        return {"data": form, "headers": headers or {}}
+
+    @staticmethod
+    def _create_urlencoded_form_data_args(
+        data: dict[str, Any],
+        headers: Optional[dict[str, str]] = None,
+    ) -> dict[str, Any]:
+        headers = headers or {}
+        headers["Content-Type"] = "application/x-www-form-urlencoded"
+        return {
+            "data": data,
+            "headers": headers,
+        }
+
+    def get_headers(self) -> dict[str, str]:
+        """Get headers for API requests, including authentication if available"""
+        headers = {"Content-Type": "application/json", "Accept": "application/json"}
+
+        if self.auth_token:
+            headers["Authorization"] = f"Bearer {self.auth_token}"
+        elif self.comfy_api_key:
+            headers["X-API-KEY"] = self.comfy_api_key
+
+        return headers
+
+    async def _check_connectivity(self, target_url: str) -> dict[str, bool]:
+        """
+        Check connectivity to determine if network issues are local or server-related.
+
+        Args:
+            target_url: URL to check connectivity to
+
+        Returns:
+            Dictionary with connectivity status details
+        """
+        results = {
+            "internet_accessible": False,
+            "api_accessible": False,
+            "is_local_issue": False,
+            "is_api_issue": False,
+        }
+        timeout = aiohttp.ClientTimeout(total=5.0)
+        async with aiohttp.ClientSession(timeout=timeout) as session:
+            try:
+                async with session.get("https://www.google.com", ssl=self.verify_ssl) as resp:
+                    results["internet_accessible"] = resp.status < 500
+            except (ClientError, asyncio.TimeoutError, socket.gaierror):
+                results["is_local_issue"] = True
+                return results  # cannot reach the internet – early exit
+
+            # Now check API health endpoint
+            parsed = urlparse(target_url)
+            health_url = f"{parsed.scheme}://{parsed.netloc}/health"
+            try:
+                async with session.get(health_url, ssl=self.verify_ssl) as resp:
+                    results["api_accessible"] = resp.status < 500
+            except (ClientError, asyncio.TimeoutError, socket.gaierror):
+                pass  # leave as False
+
+        results["is_api_issue"] = results["internet_accessible"] and not results["api_accessible"]
+        return results
+
+    async def request(
+        self,
+        method: str,
+        path: str,
+        params: Optional[dict[str, Any]] = None,
+        data: Optional[dict[str, Any]] = None,
+        files: Optional[dict[str, Any] | list[tuple[str, Any]]] = None,
+        headers: Optional[dict[str, str]] = None,
+        content_type: str = "application/json",
+        multipart_parser: Callable | None = None,
+        retry_count: int = 0,  # Used internally for tracking retries
+    ) -> dict[str, Any]:
+        """
+        Make an HTTP request to the API with automatic retries for transient errors.
+
+        Args:
+            method: HTTP method (GET, POST, etc.)
+            path: API endpoint path (will be joined with base_url)
+            params: Query parameters
+            data: body data
+            files: Files to upload
+            headers: Additional headers
+            content_type: Content type of the request. Defaults to application/json.
+            retry_count: Internal parameter for tracking retries, do not set manually
+
+        Returns:
+            Parsed JSON response
+
+        Raises:
+            LocalNetworkError: If local network connectivity issues are detected
+            ApiServerError: If the API server is unreachable but internet is working
+            Exception: For other request failures
+        """
+
+        # Build full URL and merge headers
+        relative_path = path.lstrip("/")
+        url = urljoin(self.base_url, relative_path)
+        self._check_auth(self.auth_token, self.comfy_api_key)
+
+        request_headers = self.get_headers()
+        if headers:
+            request_headers.update(headers)
+        if files:
+            request_headers.pop("Content-Type", None)
+        if params:
+            params = {k: v for k, v in params.items() if v is not None}  # aiohttp fails to serialize None values
+
+        logging.debug("[DEBUG] Request Headers: %s", request_headers)
+        logging.debug("[DEBUG] Files: %s", files)
+        logging.debug("[DEBUG] Params: %s", params)
+        logging.debug("[DEBUG] Data: %s", data)
+
+        if content_type == "application/x-www-form-urlencoded":
+            payload_args = self._create_urlencoded_form_data_args(data or {}, request_headers)
+        elif content_type == "multipart/form-data":
+            payload_args = self._create_form_data_args(data, files, request_headers, multipart_parser)
+        else:
+            payload_args = self._create_json_payload_args(data, request_headers)
+
+        operation_id = self._generate_operation_id(path)
+        request_logger.log_request_response(
+            operation_id=operation_id,
+            request_method=method,
+            request_url=url,
+            request_headers=request_headers,
+            request_params=params,
+            request_data=data if content_type == "application/json" else "[form-data or other]",
+        )
+
+        session = await self._get_session()
+        try:
+            async with session.request(
+                method,
+                url,
+                params=params,
+                ssl=self.verify_ssl,
+                **payload_args,
+            ) as resp:
+                if resp.status >= 400:
+                    try:
+                        error_data = await resp.json()
+                    except (aiohttp.ContentTypeError, json.JSONDecodeError):
+                        error_data = await resp.text()
+
+                    return await self._handle_http_error(
+                        ClientResponseError(resp.request_info, resp.history, status=resp.status, message=error_data),
+                        operation_id,
+                        method,
+                        url,
+                        params,
+                        data,
+                        files,
+                        headers,
+                        content_type,
+                        multipart_parser,
+                        retry_count=retry_count,
+                        response_content=error_data,
+                    )
+
+                # Success – parse JSON (safely) and log
+                try:
+                    payload = await resp.json()
+                    response_content_to_log = payload
+                except (aiohttp.ContentTypeError, json.JSONDecodeError):
+                    payload = {}
+                    response_content_to_log = await resp.text()
+
+                request_logger.log_request_response(
+                    operation_id=operation_id,
+                    request_method=method,
+                    request_url=url,
+                    response_status_code=resp.status,
+                    response_headers=dict(resp.headers),
+                    response_content=response_content_to_log,
+                )
+                return payload
+
+        except (ClientError, asyncio.TimeoutError, socket.gaierror) as e:
+            # Treat as *connection* problem – optionally retry, else escalate
+            if retry_count < self.max_retries:
+                delay = self.retry_delay * (self.retry_backoff_factor ** retry_count)
+                logging.warning("Connection error. Retrying in %.2fs (%s/%s): %s", delay, retry_count + 1,
+                                self.max_retries, str(e))
+                await asyncio.sleep(delay)
+                return await self.request(
+                    method,
+                    path,
+                    params=params,
+                    data=data,
+                    files=files,
+                    headers=headers,
+                    content_type=content_type,
+                    multipart_parser=multipart_parser,
+                    retry_count=retry_count + 1,
+                )
+            # One final connectivity check for diagnostics
+            connectivity = await self._check_connectivity(self.base_url)
+            if connectivity["is_local_issue"]:
+                raise LocalNetworkError(
+                    "Unable to connect to the API server due to local network issues. "
+                    "Please check your internet connection and try again."
+                ) from e
+            raise ApiServerError(
+                f"The API server at {self.base_url} is currently unreachable. "
+                f"The service may be experiencing issues. Please try again later."
+            ) from e
+
+    @staticmethod
+    def _check_auth(auth_token, comfy_api_key):
+        """Verify that either an auth token or a comfy_api_key is present."""
+        if auth_token is None and comfy_api_key is None:
+            raise Exception("Unauthorized: Please login first to use this node.")
+        return auth_token or comfy_api_key
+
+    @staticmethod
+    async def upload_file(
+        upload_url: str,
+        file: io.BytesIO | str,
+        content_type: str | None = None,
+        max_retries: int = 3,
+        retry_delay: float = 1.0,
+        retry_backoff_factor: float = 2.0,
+    ) -> aiohttp.ClientResponse:
+        """Upload a file to the API with retry logic.
+
+        Args:
+            upload_url: The URL to upload to
+            file: Either a file path string or a BytesIO object
+            content_type: Optional mime type to set for the upload
+            max_retries: Maximum number of retry attempts
+            retry_delay: Initial delay between retries in seconds
+            retry_backoff_factor: Multiplier for the delay after each retry
+        """
+        headers: dict[str, str] = {}
+        skip_auto_headers: set[str] = set()
+        if content_type:
+            headers["Content-Type"] = content_type
+        else:
+            # tell aiohttp not to add Content-Type that will break the request signature and result in a 403 status.
+            skip_auto_headers.add("Content-Type")
+
+        # Extract file bytes
+        if isinstance(file, io.BytesIO):
+            file.seek(0)
+            data = file.read()
+        elif isinstance(file, str):
+            with open(file, "rb") as f:
+                data = f.read()
+        else:
+            raise ValueError("File must be BytesIO or str path")
+
+        parsed = urlparse(upload_url)
+        basename = os.path.basename(parsed.path) or parsed.netloc or "upload"
+        operation_id = f"upload_{basename}_{uuid.uuid4().hex[:8]}"
+        request_logger.log_request_response(
+            operation_id=operation_id,
+            request_method="PUT",
+            request_url=upload_url,
+            request_headers=headers,
+            request_data=f"[File data {len(data)} bytes]",
+        )
+
+        delay = retry_delay
+        for attempt in range(max_retries + 1):
+            try:
+                timeout = aiohttp.ClientTimeout(total=None)  # honour server side timeouts
+                async with aiohttp.ClientSession(timeout=timeout) as session:
+                    async with session.put(
+                        upload_url, data=data, headers=headers, skip_auto_headers=skip_auto_headers,
+                    ) as resp:
+                        resp.raise_for_status()
+                        request_logger.log_request_response(
+                            operation_id=operation_id,
+                            request_method="PUT",
+                            request_url=upload_url,
+                            response_status_code=resp.status,
+                            response_headers=dict(resp.headers),
+                            response_content="File uploaded successfully.",
+                        )
+                        return resp
+            except (ClientError, asyncio.TimeoutError) as e:
+                request_logger.log_request_response(
+                    operation_id=operation_id,
+                    request_method="PUT",
+                    request_url=upload_url,
+                    response_status_code=e.status if hasattr(e, "status") else None,
+                    response_headers=dict(e.headers) if hasattr(e, "headers") else None,
+                    response_content=None,
+                    error_message=f"{type(e).__name__}: {str(e)}",
+                )
+                if attempt < max_retries:
+                    logging.warning(
+                        "Upload failed (%s/%s). Retrying in %.2fs. %s", attempt + 1, max_retries, delay, str(e)
+                    )
+                    await asyncio.sleep(delay)
+                    delay *= retry_backoff_factor
+                else:
+                    raise NetworkError(f"Failed to upload file after {max_retries + 1} attempts: {e}") from e
+
+    async def _handle_http_error(
+        self,
+        exc: ClientResponseError,
+        operation_id: str,
+        *req_meta,
+        retry_count: int,
+        response_content: dict | str = "",
+    ) -> dict[str, Any]:
+        status_code = exc.status
+        if status_code == 401:
+            user_friendly = "Unauthorized: Please login first to use this node."
+        elif status_code == 402:
+            user_friendly = "Payment Required: Please add credits to your account to use this node."
+        elif status_code == 409:
+            user_friendly = "There is a problem with your account. Please contact support@comfy.org."
+        elif status_code == 429:
+            user_friendly = "Rate Limit Exceeded: Please try again later."
+        else:
+            if isinstance(response_content, dict):
+                if "error" in response_content and "message" in response_content["error"]:
+                    user_friendly = f"API Error: {response_content['error']['message']}"
+                    if "type" in response_content["error"]:
+                        user_friendly += f" (Type: {response_content['error']['type']})"
+                else:  # Handle cases where error is just a JSON dict with unknown format
+                    user_friendly = f"API Error: {json.dumps(response_content)}"
+            else:
+                if len(response_content) < 200:  # Arbitrary limit for display
+                    user_friendly = f"API Error (raw): {response_content}"
+                else:
+                    user_friendly = f"API Error (raw, status {status_code})"
+
+        request_logger.log_request_response(
+            operation_id=operation_id,
+            request_method=req_meta[0],
+            request_url=req_meta[1],
+            response_status_code=exc.status,
+            response_headers=dict(exc.headers) if exc.headers else None,
+            response_content=response_content,
+            error_message=f"HTTP Error {exc.status}",
+        )
+
+        logging.debug("[DEBUG] API Error: %s (Status: %s)", user_friendly, status_code)
+        if response_content:
+            logging.debug("[DEBUG] Response content: %s", response_content)
+
+        # Retry if eligible
+        if status_code in self.retry_status_codes and retry_count < self.max_retries:
+            delay = self.retry_delay * (self.retry_backoff_factor ** retry_count)
+            logging.warning(
+                "HTTP error %s. Retrying in %.2fs (%s/%s)",
+                status_code,
+                delay,
+                retry_count + 1,
+                self.max_retries,
+            )
+            await asyncio.sleep(delay)
+            return await self.request(
+                req_meta[0],  # method
+                req_meta[1].replace(self.base_url, ""),  # path
+                params=req_meta[2],
+                data=req_meta[3],
+                files=req_meta[4],
+                headers=req_meta[5],
+                content_type=req_meta[6],
+                multipart_parser=req_meta[7],
+                retry_count=retry_count + 1,
+            )
+
+        raise Exception(user_friendly) from exc
+
+    @staticmethod
+    def _unpack_tuple(t):
+        """Helper to normalise (filename, file, content_type) tuples."""
+        if len(t) == 3:
+            return t
+        elif len(t) == 2:
+            return t[0], t[1], "application/octet-stream"
+        else:
+            raise ValueError("files tuple must be (filename, file[, content_type])")
+
+    async def _get_session(self) -> aiohttp.ClientSession:
+        if self._session is None or self._session.closed:
+            timeout = aiohttp.ClientTimeout(total=self.timeout)
+            self._session = aiohttp.ClientSession(timeout=timeout)
+            self._owns_session = True
+        return self._session
+
+    async def close(self) -> None:
+        if self._owns_session and self._session and not self._session.closed:
+            await self._session.close()
+
+    async def __aenter__(self) -> "ApiClient":
+        """Allow usage as an async context manager; ensures clean teardown."""
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb):
+        await self.close()
+
+
+class ApiEndpoint(Generic[T, R]):
+    """Defines an API endpoint with its request and response types"""
+
+    def __init__(
+        self,
+        path: str,
+        method: HttpMethod,
+        request_model: Type[T],
+        response_model: Type[R],
+        query_params: Optional[dict[str, Any]] = None,
+    ):
+        """Initialize an API endpoint definition.
+
+        Args:
+            path: The URL path for this endpoint, can include placeholders like {id}
+            method: The HTTP method to use (GET, POST, etc.)
+            request_model: Pydantic model class that defines the structure and validation rules for API requests to this endpoint
+            response_model: Pydantic model class that defines the structure and validation rules for API responses from this endpoint
+            query_params: Optional dictionary of query parameters to include in the request
+        """
+        self.path = path
+        self.method = method
+        self.request_model = request_model
+        self.response_model = response_model
+        self.query_params = query_params or {}
+
+
+class SynchronousOperation(Generic[T, R]):
+    """Represents a single synchronous API operation."""
+
+    def __init__(
+        self,
+        endpoint: ApiEndpoint[T, R],
+        request: T,
+        files: Optional[dict[str, Any] | list[tuple[str, Any]]] = None,
+        api_base: str | None = None,
+        auth_token: Optional[str] = None,
+        comfy_api_key: Optional[str] = None,
+        auth_kwargs: Optional[dict[str, str]] = None,
+        timeout: float = 7200.0,
+        verify_ssl: bool = True,
+        content_type: str = "application/json",
+        multipart_parser: Callable | None = None,
+        max_retries: int = 3,
+        retry_delay: float = 1.0,
+        retry_backoff_factor: float = 2.0,
+    ) -> None:
+        self.endpoint = endpoint
+        self.request = request
+        self.files = files
+        self.api_base: str = api_base or args.comfy_api_base
+        self.auth_token = auth_token
+        self.comfy_api_key = comfy_api_key
+        if auth_kwargs is not None:
+            self.auth_token = auth_kwargs.get("auth_token", self.auth_token)
+            self.comfy_api_key = auth_kwargs.get("comfy_api_key", self.comfy_api_key)
+        self.timeout = timeout
+        self.verify_ssl = verify_ssl
+        self.content_type = content_type
+        self.multipart_parser = multipart_parser
+        self.max_retries = max_retries
+        self.retry_delay = retry_delay
+        self.retry_backoff_factor = retry_backoff_factor
+
+    async def execute(self, client: Optional[ApiClient] = None) -> R:
+        owns_client = client is None
+        if owns_client:
+            client = ApiClient(
+                base_url=self.api_base,
+                auth_token=self.auth_token,
+                comfy_api_key=self.comfy_api_key,
+                timeout=self.timeout,
+                verify_ssl=self.verify_ssl,
+                max_retries=self.max_retries,
+                retry_delay=self.retry_delay,
+                retry_backoff_factor=self.retry_backoff_factor,
+            )
+
+        try:
+            request_dict: Optional[dict[str, Any]]
+            if isinstance(self.request, EmptyRequest):
+                request_dict = None
+            else:
+                request_dict = self.request.model_dump(exclude_none=True)
+                for k, v in list(request_dict.items()):
+                    if isinstance(v, Enum):
+                        request_dict[k] = v.value
+
+            logging.debug("[DEBUG] API Request: %s %s", self.endpoint.method.value, self.endpoint.path)
+            logging.debug("[DEBUG] Request Data: %s", json.dumps(request_dict, indent=2))
+            logging.debug("[DEBUG] Query Params: %s", self.endpoint.query_params)
+
+            response_json = await client.request(
+                self.endpoint.method.value,
+                self.endpoint.path,
+                params=self.endpoint.query_params,
+                data=request_dict,
+                files=self.files,
+                content_type=self.content_type,
+                multipart_parser=self.multipart_parser,
+            )
+
+            logging.debug("=" * 50)
+            logging.debug("[DEBUG] RESPONSE DETAILS:")
+            logging.debug("[DEBUG] Status Code: 200 (Success)")
+            logging.debug("[DEBUG] Response Body: %s", json.dumps(response_json, indent=2))
+            logging.debug("=" * 50)
+
+            parsed_response = self.endpoint.response_model.model_validate(response_json)
+            logging.debug("[DEBUG] Parsed Response: %s", parsed_response)
+            return parsed_response
+        finally:
+            if owns_client:
+                await client.close()
+
+
+class TaskStatus(str, Enum):
+    """Enum for task status values"""
+
+    COMPLETED = "completed"
+    FAILED = "failed"
+    PENDING = "pending"
+
+
+class PollingOperation(Generic[T, R]):
+    """Represents an asynchronous API operation that requires polling for completion."""
+
+    def __init__(
+        self,
+        poll_endpoint: ApiEndpoint[EmptyRequest, R],
+        completed_statuses: list[str],
+        failed_statuses: list[str],
+        *,
+        status_extractor: Callable[[R], Optional[str]],
+        progress_extractor: Callable[[R], Optional[float]] | None = None,
+        result_url_extractor: Callable[[R], Optional[str]] | None = None,
+        price_extractor: Callable[[R], Optional[float]] | None = None,
+        request: Optional[T] = None,
+        api_base: str | None = None,
+        auth_token: Optional[str] = None,
+        comfy_api_key: Optional[str] = None,
+        auth_kwargs: Optional[dict[str, str]] = None,
+        poll_interval: float = 5.0,
+        max_poll_attempts: int = 120,  # Default max polling attempts (10 minutes with 5s interval)
+        max_retries: int = 3,  # Max retries per individual API call
+        retry_delay: float = 1.0,
+        retry_backoff_factor: float = 2.0,
+        estimated_duration: Optional[float] = None,
+        node_id: Optional[str] = None,
+    ) -> None:
+        self.poll_endpoint = poll_endpoint
+        self.request = request
+        self.api_base: str = api_base or args.comfy_api_base
+        self.auth_token = auth_token
+        self.comfy_api_key = comfy_api_key
+        if auth_kwargs is not None:
+            self.auth_token = auth_kwargs.get("auth_token", self.auth_token)
+            self.comfy_api_key = auth_kwargs.get("comfy_api_key", self.comfy_api_key)
+        self.poll_interval = poll_interval
+        self.max_poll_attempts = max_poll_attempts
+        self.max_retries = max_retries
+        self.retry_delay = retry_delay
+        self.retry_backoff_factor = retry_backoff_factor
+        self.estimated_duration = estimated_duration
+        self.status_extractor = status_extractor or (lambda x: getattr(x, "status", None))
+        self.progress_extractor = progress_extractor
+        self.result_url_extractor = result_url_extractor
+        self.price_extractor = price_extractor
+        self.node_id = node_id
+        self.completed_statuses = completed_statuses
+        self.failed_statuses = failed_statuses
+        self.final_response: Optional[R] = None
+        self.extracted_price: Optional[float] = None
+
+    async def execute(self, client: Optional[ApiClient] = None) -> R:
+        owns_client = client is None
+        if owns_client:
+            client = ApiClient(
+                base_url=self.api_base,
+                auth_token=self.auth_token,
+                comfy_api_key=self.comfy_api_key,
+                max_retries=self.max_retries,
+                retry_delay=self.retry_delay,
+                retry_backoff_factor=self.retry_backoff_factor,
+            )
+        try:
+            return await self._poll_until_complete(client)
+        finally:
+            if owns_client:
+                await client.close()
+
+    def _display_text_on_node(self, text: str):
+        if not self.node_id:
+            return
+        if self.extracted_price is not None:
+            text = f"Price: ${self.extracted_price}\n{text}"
+        PromptServer.instance.send_progress_text(text, self.node_id)
+
+    def _display_time_progress_on_node(self, time_completed: int | float):
+        if not self.node_id:
+            return
+        if self.estimated_duration is not None:
+            remaining = max(0, int(self.estimated_duration) - time_completed)
+            message = f"Task in progress: {time_completed}s (~{remaining}s remaining)"
+        else:
+            message = f"Task in progress: {time_completed}s"
+        self._display_text_on_node(message)
+
+    def _check_task_status(self, response: R) -> TaskStatus:
+        try:
+            status = self.status_extractor(response)
+            if status in self.completed_statuses:
+                return TaskStatus.COMPLETED
+            if status in self.failed_statuses:
+                return TaskStatus.FAILED
+            return TaskStatus.PENDING
+        except Exception as e:
+            logging.error("Error extracting status: %s", e)
+            return TaskStatus.PENDING
+
+    async def _poll_until_complete(self, client: ApiClient) -> R:
+        """Poll until the task is complete"""
+        consecutive_errors = 0
+        max_consecutive_errors = min(5, self.max_retries * 2)  # Limit consecutive errors
+
+        if self.progress_extractor:
+            progress = utils.ProgressBar(PROGRESS_BAR_MAX)
+
+        status = TaskStatus.PENDING
+        for poll_count in range(1, self.max_poll_attempts + 1):
+            try:
+                logging.debug("[DEBUG] Polling attempt #%s", poll_count)
+
+                request_dict = None if self.request is None else self.request.model_dump(exclude_none=True)
+
+                if poll_count == 1:
+                    logging.debug(
+                        "[DEBUG] Poll Request: %s %s",
+                        self.poll_endpoint.method.value,
+                        self.poll_endpoint.path,
+                    )
+                    logging.debug(
+                        "[DEBUG] Poll Request Data: %s",
+                        json.dumps(request_dict, indent=2) if request_dict else "None",
+                    )
+
+                # Query task status
+                resp = await client.request(
+                    self.poll_endpoint.method.value,
+                    self.poll_endpoint.path,
+                    params=self.poll_endpoint.query_params,
+                    data=request_dict,
+                )
+                consecutive_errors = 0  # reset on success
+                response_obj: R = self.poll_endpoint.response_model.model_validate(resp)
+
+                # Check if task is complete
+                status = self._check_task_status(response_obj)
+                logging.debug("[DEBUG] Task Status: %s", status)
+
+                # If progress extractor is provided, extract progress
+                if self.progress_extractor:
+                    new_progress = self.progress_extractor(response_obj)
+                    if new_progress is not None:
+                        progress.update_absolute(new_progress, total=PROGRESS_BAR_MAX)
+
+                if self.price_extractor:
+                    price = self.price_extractor(response_obj)
+                    if price is not None:
+                        self.extracted_price = price
+
+                if status == TaskStatus.COMPLETED:
+                    message = "Task completed successfully"
+                    if self.result_url_extractor:
+                        result_url = self.result_url_extractor(response_obj)
+                        if result_url:
+                            message = f"Result URL: {result_url}"
+                    logging.debug("[DEBUG] %s", message)
+                    self._display_text_on_node(message)
+                    self.final_response = response_obj
+                    if self.progress_extractor:
+                        progress.update(100)
+                    return self.final_response
+                if status == TaskStatus.FAILED:
+                    message = f"Task failed: {json.dumps(resp)}"
+                    logging.error("%s", message)
+                    raise Exception(message)
+                logging.debug("[DEBUG] Task still pending, continuing to poll...")
+                # Task pending – wait
+                for i in range(max(1, int(self.poll_interval))):  # sleep at least 1s so a sub-second interval cannot busy-loop
+                    self._display_time_progress_on_node((poll_count - 1) * self.poll_interval + i)
+                    await asyncio.sleep(1)
+
+            except (LocalNetworkError, ApiServerError, NetworkError) as e:
+                consecutive_errors += 1
+                if consecutive_errors >= max_consecutive_errors:
+                    raise Exception(
+                        f"Polling aborted after {consecutive_errors} network errors: {str(e)}"
+                    ) from e
+                logging.warning(
+                    "Network error (%s/%s): %s",
+                    consecutive_errors,
+                    max_consecutive_errors,
+                    str(e),
+                )
+                await asyncio.sleep(self.poll_interval)
+            except Exception as e:
+                # For other errors, increment count and potentially abort
+                consecutive_errors += 1
+                if consecutive_errors >= max_consecutive_errors or status == TaskStatus.FAILED:
+                    raise Exception(
+                        f"Polling aborted after {consecutive_errors} consecutive errors: {str(e)}"
+                    ) from e
+
+                logging.error("Polling error: %s", str(e))
+                logging.warning(
+                    "Error during polling (attempt %s/%s): %s. Will retry in %s seconds.",
+                    poll_count,
+                    self.max_poll_attempts,
+                    str(e),
+                    self.poll_interval,
+                )
+                await asyncio.sleep(self.poll_interval)
+
+        # If we've exhausted all polling attempts
+        raise Exception(
+            f"Polling timed out after {self.max_poll_attempts} attempts ({self.max_poll_attempts * self.poll_interval} seconds). "
+            "The operation may still be running on the server but is taking longer than expected."
+        )
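Reviewer note: the polling loop added above can be hard to follow inside the diff, so here is a minimal standalone sketch of the same pattern, using only the standard library. It is not the code from this change: `fetch`, `classify`, and `poll_until_complete` are hypothetical names introduced for illustration, `ConnectionError` stands in for the project's network error types, and the progress bar, price, and node-display handling are omitted. It assumes the response is a plain dict with a `status` key.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Optional


class TaskStatus(str, Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    PENDING = "pending"


def classify(status: Optional[str], completed: list[str], failed: list[str]) -> TaskStatus:
    """Map a provider-specific status string onto the three generic states."""
    if status in completed:
        return TaskStatus.COMPLETED
    if status in failed:
        return TaskStatus.FAILED
    return TaskStatus.PENDING


async def poll_until_complete(
    fetch: Callable[[], Awaitable[dict]],  # hypothetical stand-in for the HTTP call
    completed: list[str],
    failed: list[str],
    poll_interval: float = 5.0,
    max_poll_attempts: int = 120,
    max_consecutive_errors: int = 5,
) -> dict:
    consecutive_errors = 0
    for _ in range(max_poll_attempts):
        try:
            resp = await fetch()
        except ConnectionError as e:
            # Transient failure: count it, give up only after too many in a row.
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                raise RuntimeError(
                    f"Polling aborted after {consecutive_errors} network errors"
                ) from e
            await asyncio.sleep(poll_interval)
            continue

        consecutive_errors = 0  # any successful call resets the error budget
        state = classify(resp.get("status"), completed, failed)
        if state is TaskStatus.COMPLETED:
            return resp
        if state is TaskStatus.FAILED:
            raise RuntimeError(f"Task failed: {resp}")
        await asyncio.sleep(poll_interval)  # still pending: wait, then poll again

    raise TimeoutError(f"Polling timed out after {max_poll_attempts} attempts")
```

One deliberate difference from the diff: in this sketch a failed task is raised outside the `try`, so it is never counted as a retryable error. The change above instead raises inside the `try` and relies on the `status == TaskStatus.FAILED` check in the generic handler to abort.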
--- a/comfy_api_nodes/apis/gemini_api.py
+++ b/comfy_api_nodes/apis/gemini_api.py
@@ -1,230 +1,22 @@
-from datetime import date
-from enum import Enum
-from typing import Any
+from typing import Optional

-from pydantic import BaseModel, Field
-
-
-class GeminiSafetyCategory(str, Enum):
-    HARM_CATEGORY_SEXUALLY_EXPLICIT = "HARM_CATEGORY_SEXUALLY_EXPLICIT"
-    HARM_CATEGORY_HATE_SPEECH = "HARM_CATEGORY_HATE_SPEECH"
-    HARM_CATEGORY_HARASSMENT = "HARM_CATEGORY_HARASSMENT"
-    HARM_CATEGORY_DANGEROUS_CONTENT = "HARM_CATEGORY_DANGEROUS_CONTENT"
-
-
-class GeminiSafetyThreshold(str, Enum):
-    OFF = "OFF"
-    BLOCK_NONE = "BLOCK_NONE"
-    BLOCK_LOW_AND_ABOVE = "BLOCK_LOW_AND_ABOVE"
-    BLOCK_MEDIUM_AND_ABOVE = "BLOCK_MEDIUM_AND_ABOVE"
-    BLOCK_ONLY_HIGH = "BLOCK_ONLY_HIGH"
-
-
-class GeminiSafetySetting(BaseModel):
-    category: GeminiSafetyCategory
-    threshold: GeminiSafetyThreshold
-
-
-class GeminiRole(str, Enum):
-    user = "user"
-    model = "model"
-
-
-class GeminiMimeType(str, Enum):
-    application_pdf = "application/pdf"
-    audio_mpeg = "audio/mpeg"
-    audio_mp3 = "audio/mp3"
-    audio_wav = "audio/wav"
-    image_png = "image/png"
-    image_jpeg = "image/jpeg"
-    image_webp = "image/webp"
-    text_plain = "text/plain"
-    video_mov = "video/mov"
-    video_mpeg = "video/mpeg"
-    video_mp4 = "video/mp4"
-    video_mpg = "video/mpg"
-    video_avi = "video/avi"
-    video_wmv = "video/wmv"
-    video_mpegps = "video/mpegps"
-    video_flv = "video/flv"
-
-
-class GeminiInlineData(BaseModel):
-    data: str | None = Field(
-        None,
-        description="The base64 encoding of the image, PDF, or video to include inline in the prompt. "
-        "When including media inline, you must also specify the media type (mimeType) of the data. Size limit: 20MB",
-    )
-    mimeType: GeminiMimeType | None = Field(None)
-
-
-class GeminiPart(BaseModel):
-    inlineData: GeminiInlineData | None = Field(None)
-    text: str | None = Field(None)
-
-
-class GeminiTextPart(BaseModel):
-    text: str | None = Field(None)
-
-
-class GeminiContent(BaseModel):
-    parts: list[GeminiPart] = Field([])
-    role: GeminiRole = Field(..., examples=["user"])
-
-
-class GeminiSystemInstructionContent(BaseModel):
-    parts: list[GeminiTextPart] = Field(
-        ...,
-        description="A list of ordered parts that make up a single message. "
-        "Different parts may have different IANA MIME types.",
-    )
-    role: GeminiRole = Field(
-        ...,
-        description="The identity of the entity that creates the message. "
-        "The following values are supported: "
-        "user: This indicates that the message is sent by a real person, typically a user-generated message. "
-        "model: This indicates that the message is generated by the model. "
-        "The model value is used to insert messages from model into the conversation during multi-turn conversations. "
-        "For non-multi-turn conversations, this field can be left blank or unset.",
-    )
-
-
-class GeminiFunctionDeclaration(BaseModel):
-    description: str | None = Field(None)
-    name: str = Field(...)
-    parameters: dict[str, Any] = Field(..., description="JSON schema for the function parameters")
-
-
-class GeminiTool(BaseModel):
-    functionDeclarations: list[GeminiFunctionDeclaration] | None = Field(None)
-
-
-class GeminiOffset(BaseModel):
-    nanos: int | None = Field(None, ge=0, le=999999999)
-    seconds: int | None = Field(None, ge=-315576000000, le=315576000000)
-
-
-class GeminiVideoMetadata(BaseModel):
-    endOffset: GeminiOffset | None = Field(None)
-    startOffset: GeminiOffset | None = Field(None)
-
-
-class GeminiGenerationConfig(BaseModel):
-    maxOutputTokens: int | None = Field(None, ge=16, le=8192)
-    seed: int | None = Field(None)
-    stopSequences: list[str] | None = Field(None)
-    temperature: float | None = Field(1, ge=0.0, le=2.0)
-    topK: int | None = Field(40, ge=1)
-    topP: float | None = Field(0.95, ge=0.0, le=1.0)
+from comfy_api_nodes.apis import GeminiGenerationConfig, GeminiContent, GeminiSafetySetting, GeminiSystemInstructionContent, GeminiTool, GeminiVideoMetadata
+from pydantic import BaseModel


 class GeminiImageConfig(BaseModel):
-    aspectRatio: str | None = Field(None)
-    imageSize: str | None = Field(None)
+    aspectRatio: Optional[str] = None


 class GeminiImageGenerationConfig(GeminiGenerationConfig):
-    responseModalities: list[str] | None = Field(None)
-    imageConfig: GeminiImageConfig | None = Field(None)
+    responseModalities: Optional[list[str]] = None
+    imageConfig: Optional[GeminiImageConfig] = None


 class GeminiImageGenerateContentRequest(BaseModel):
-    contents: list[GeminiContent] = Field(...)
-    generationConfig: GeminiImageGenerationConfig | None = Field(None)
-    safetySettings: list[GeminiSafetySetting] | None = Field(None)
-    systemInstruction: GeminiSystemInstructionContent | None = Field(None)
-    tools: list[GeminiTool] | None = Field(None)
-    videoMetadata: GeminiVideoMetadata | None = Field(None)
-
-
-class GeminiGenerateContentRequest(BaseModel):
-    contents: list[GeminiContent] = Field(...)
-    generationConfig: GeminiGenerationConfig | None = Field(None)
-    safetySettings: list[GeminiSafetySetting] | None = Field(None)
-    systemInstruction: GeminiSystemInstructionContent | None = Field(None)
-    tools: list[GeminiTool] | None = Field(None)
-    videoMetadata: GeminiVideoMetadata | None = Field(None)
-
-
-class Modality(str, Enum):
-    MODALITY_UNSPECIFIED = "MODALITY_UNSPECIFIED"
-    TEXT = "TEXT"
-    IMAGE = "IMAGE"
-    VIDEO = "VIDEO"
-    AUDIO = "AUDIO"
-    DOCUMENT = "DOCUMENT"
-
-
-class ModalityTokenCount(BaseModel):
-    modality: Modality | None = None
-    tokenCount: int | None = Field(None, description="Number of tokens for the given modality.")
-
-
-class Probability(str, Enum):
-    NEGLIGIBLE = "NEGLIGIBLE"
-    LOW = "LOW"
-    MEDIUM = "MEDIUM"
-    HIGH = "HIGH"
-    UNKNOWN = "UNKNOWN"
-
-
-class GeminiSafetyRating(BaseModel):
-    category: GeminiSafetyCategory | None = None
-    probability: Probability | None = Field(
-        None,
-        description="The probability that the content violates the specified safety category",
-    )
-
-
-class GeminiCitation(BaseModel):
-    authors: list[str] | None = None
-    endIndex: int | None = None
-    license: str | None = None
-    publicationDate: date | None = None
-    startIndex: int | None = None
-    title: str | None = None
-    uri: str | None = None
-
-
-class GeminiCitationMetadata(BaseModel):
-    citations: list[GeminiCitation] | None = None
-
-
-class GeminiCandidate(BaseModel):
-    citationMetadata: GeminiCitationMetadata | None = None
-    content: GeminiContent | None = None
-    finishReason: str | None = None
-    safetyRatings: list[GeminiSafetyRating] | None = None
-
-
-class GeminiPromptFeedback(BaseModel):
-    blockReason: str | None = None
-    blockReasonMessage: str | None = None
-    safetyRatings: list[GeminiSafetyRating] | None = None
-
-
-class GeminiUsageMetadata(BaseModel):
-    cachedContentTokenCount: int | None = Field(
-        None,
-        description="Output only. Number of tokens in the cached part in the input (the cached content).",
-    )
-    candidatesTokenCount: int | None = Field(None, description="Number of tokens in the response(s).")
-    candidatesTokensDetails: list[ModalityTokenCount] | None = Field(
-        None, description="Breakdown of candidate tokens by modality."
-    )
-    promptTokenCount: int | None = Field(
-        None,
-        description="Number of tokens in the request. When cachedContent is set, this is still the total effective prompt size meaning this includes the number of tokens in the cached content.",
-    )
-    promptTokensDetails: list[ModalityTokenCount] | None = Field(
-        None, description="Breakdown of prompt tokens by modality."
-    )
-    thoughtsTokenCount: int | None = Field(None, description="Number of tokens present in thoughts output.")
-    toolUsePromptTokenCount: int | None = Field(None, description="Number of tokens present in tool-use prompt(s).")
-
-
-class GeminiGenerateContentResponse(BaseModel):
-    candidates: list[GeminiCandidate] | None = Field(None)
-    promptFeedback: GeminiPromptFeedback | None = Field(None)
-    usageMetadata: GeminiUsageMetadata | None = Field(None)
-    modelVersion: str | None = Field(None)
+    contents: list[GeminiContent]
+    generationConfig: Optional[GeminiImageGenerationConfig] = None
+    safetySettings: Optional[list[GeminiSafetySetting]] = None
+    systemInstruction: Optional[GeminiSystemInstructionContent] = None
+    tools: Optional[list[GeminiTool]] = None
+    videoMetadata: Optional[GeminiVideoMetadata] = None
--- a/comfy_api_nodes/apis/minimax_api.py
+++ b/comfy_api_nodes/apis/minimax_api.py
@@ -1,120 +0,0 @@
-from enum import Enum
-from typing import Optional
-
-from pydantic import BaseModel, Field
-
-
-class MinimaxBaseResponse(BaseModel):
-    status_code: int = Field(
-        ...,
-        description='Status code. 0 indicates success, other values indicate errors.',
-    )
-    status_msg: str = Field(
-        ..., description='Specific error details or success message.'
-    )
-
-
-class File(BaseModel):
-    bytes: Optional[int] = Field(None, description='File size in bytes')
-    created_at: Optional[int] = Field(
-        None, description='Unix timestamp when the file was created, in seconds'
-    )
-    download_url: Optional[str] = Field(
-        None, description='The URL to download the video'
-    )
-    backup_download_url: Optional[str] = Field(
-        None, description='The backup URL to download the video'
-    )
-
-    file_id: Optional[int] = Field(None, description='Unique identifier for the file')
-    filename: Optional[str] = Field(None, description='The name of the file')
-    purpose: Optional[str] = Field(None, description='The purpose of using the file')
-
-
-class MinimaxFileRetrieveResponse(BaseModel):
-    base_resp: MinimaxBaseResponse
-    file: File
-
-
-class MiniMaxModel(str, Enum):
-    T2V_01_Director = 'T2V-01-Director'
-    I2V_01_Director = 'I2V-01-Director'
-    S2V_01 = 'S2V-01'
-    I2V_01 = 'I2V-01'
-    I2V_01_live = 'I2V-01-live'
-    T2V_01 = 'T2V-01'
-    Hailuo_02 = 'MiniMax-Hailuo-02'
-
-
-class Status6(str, Enum):
-    Queueing = 'Queueing'
-    Preparing = 'Preparing'
-    Processing = 'Processing'
-    Success = 'Success'
-    Fail = 'Fail'
-
-
-class MinimaxTaskResultResponse(BaseModel):
-    base_resp: MinimaxBaseResponse
-    file_id: Optional[str] = Field(
-        None,
-        description='After the task status changes to Success, this field returns the file ID corresponding to the generated video.',
-    )
-    status: Status6 = Field(
-        ...,
-        description="Task status: 'Queueing' (in queue), 'Preparing' (task is preparing), 'Processing' (generating), 'Success' (task completed successfully), or 'Fail' (task failed).",
-    )
-    task_id: str = Field(..., description='The task ID being queried.')
-
-
-class SubjectReferenceItem(BaseModel):
-    image: Optional[str] = Field(
-        None, description='URL or base64 encoding of the subject reference image.'
-    )
-    mask: Optional[str] = Field(
-        None,
-        description='URL or base64 encoding of the mask for the subject reference image.',
-    )
-
-
-class MinimaxVideoGenerationRequest(BaseModel):
-    callback_url: Optional[str] = Field(
-        None,
-        description='Optional. URL to receive real-time status updates about the video generation task.',
-    )
-    first_frame_image: Optional[str] = Field(
-        None,
-        description='URL or base64 encoding of the first frame image. Required when model is I2V-01, I2V-01-Director, or I2V-01-live.',
-    )
-    model: MiniMaxModel = Field(
-        ...,
-        description='Required. ID of model. Options: T2V-01-Director, I2V-01-Director, S2V-01, I2V-01, I2V-01-live, T2V-01',
-    )
-    prompt: Optional[str] = Field(
-        None,
-        description='Description of the video. Should be less than 2000 characters. Supports camera movement instructions in [brackets].',
-        max_length=2000,
-    )
-    prompt_optimizer: Optional[bool] = Field(
-        True,
-        description='If true (default), the model will automatically optimize the prompt. Set to false for more precise control.',
-    )
-    subject_reference: Optional[list[SubjectReferenceItem]] = Field(
-        None,
-        description='Only available when model is S2V-01. The model will generate a video based on the subject uploaded through this parameter.',
-    )
-    duration: Optional[int] = Field(
-        None,
-        description="The length of the output video in seconds."
-    )
-    resolution: Optional[str] = Field(
-        None,
-        description="The dimensions of the video display. 1080p corresponds to 1920 x 1080 pixels, 768p corresponds to 1366 x 768 pixels."
-    )
-
-
-class MinimaxVideoGenerationResponse(BaseModel):
-    base_resp: MinimaxBaseResponse
-    task_id: str = Field(
-        ..., description='The task ID for the asynchronous video generation task.'
-    )
--- a/comfy_api_nodes/apis/pika_defs.py
+++ b/comfy_api_nodes/apis/pika_defs.py
--- a/comfy_api_nodes/apis/request_logger.py
+++ b/comfy_api_nodes/apis/request_logger.py
@@ -1,11 +1,11 @@
 from __future__ import annotations

+import os
 import datetime
-import hashlib
 import json
 import logging
-import os
 import re
+import hashlib
 from typing import Any

 import folder_paths
--- a/comfy_api_nodes/apis/topaz_api.py
+++ b/comfy_api_nodes/apis/topaz_api.py
@@ -1,133 +0,0 @@
-from typing import Optional, Union
-
-from pydantic import BaseModel, Field
-
-
-class ImageEnhanceRequest(BaseModel):
-    model: str = Field("Reimagine")
-    output_format: str = Field("jpeg")
-    subject_detection: str = Field("All")
-    face_enhancement: bool = Field(True)
-    face_enhancement_creativity: float = Field(0, description="Is ignored if face_enhancement is false")
-    face_enhancement_strength: float = Field(0.8, description="Is ignored if face_enhancement is false")
-    source_url: str = Field(...)
-    output_width: Optional[int] = Field(None)
-    output_height: Optional[int] = Field(None)
-    crop_to_fill: bool = Field(False)
-    prompt: Optional[str] = Field(None, description="Text prompt for creative upscaling guidance")
-    creativity: int = Field(3, description="Creativity settings range from 1 to 9")
-    face_preservation: str = Field("true", description="To preserve the identity of characters")
-    color_preservation: str = Field("true", description="To preserve the original color")
-
-
-class ImageAsyncTaskResponse(BaseModel):
-    process_id: str = Field(...)
-
-
-class ImageStatusResponse(BaseModel):
-    process_id: str = Field(...)
-    status: str = Field(...)
-    progress: Optional[int] = Field(None)
-    credits: int = Field(...)
-
-
-class ImageDownloadResponse(BaseModel):
-    download_url: str = Field(...)
-    expiry: int = Field(...)
-
-
-class Resolution(BaseModel):
-    width: int = Field(...)
-    height: int = Field(...)
-
-
-class CreateCreateVideoRequestSource(BaseModel):
-    container: str = Field(...)
-    size: int = Field(..., description="Size of the video file in bytes")
-    duration: int = Field(..., description="Duration of the video file in seconds")
-    frameCount: int = Field(..., description="Total number of frames in the video")
-    frameRate: int = Field(...)
-    resolution: Resolution = Field(...)
-
-
-class VideoFrameInterpolationFilter(BaseModel):
-    model: str = Field(...)
-    slowmo: Optional[int] = Field(None)
-    fps: int = Field(...)
-    duplicate: bool = Field(...)
-    duplicate_threshold: float = Field(...)
-
-
-class VideoEnhancementFilter(BaseModel):
-    model: str = Field(...)
-    auto: Optional[str] = Field(None, description="Auto, Manual, Relative")
-    focusFixLevel: Optional[str] = Field(None, description="Downscales video input for correction of blurred subjects")
-    compression: Optional[float] = Field(None, description="Strength of compression recovery")
-    details: Optional[float] = Field(None, description="Amount of detail reconstruction")
-    prenoise: Optional[float] = Field(None, description="Amount of noise to add to input to reduce over-smoothing")
-    noise: Optional[float] = Field(None, description="Amount of noise reduction")
-    halo: Optional[float] = Field(None, description="Amount of halo reduction")
-    preblur: Optional[float] = Field(None, description="Anti-aliasing and deblurring strength")
-    blur: Optional[float] = Field(None, description="Amount of sharpness applied")
-    grain: Optional[float] = Field(None, description="Grain after AI model processing")
-    grainSize: Optional[float] = Field(None, description="Size of generated grain")
-    recoverOriginalDetailValue: Optional[float] = Field(None, description="Source details into the output video")
-    creativity: Optional[str] = Field(None, description="Creativity level(high, low) for slc-1 only")
-    isOptimizedMode: Optional[bool] = Field(None, description="Set to true for Starlight Creative (slc-1) only")
-
-
-class OutputInformationVideo(BaseModel):
-    resolution: Resolution = Field(...)
-    frameRate: int = Field(...)
-    audioCodec: Optional[str] = Field(..., description="Required if audioTransfer is Copy or Convert")
-    audioTransfer: str = Field(..., description="Copy, Convert, None")
-    dynamicCompressionLevel: str = Field(..., description="Low, Mid, High")
-
-
-class Overrides(BaseModel):
-    isPaidDiffusion: bool = Field(True)
-
-
-class CreateVideoRequest(BaseModel):
-    source: CreateCreateVideoRequestSource = Field(...)
-    filters: list[Union[VideoFrameInterpolationFilter, VideoEnhancementFilter]] = Field(...)
-    output: OutputInformationVideo = Field(...)
-    overrides: Overrides = Field(Overrides(isPaidDiffusion=True))
-
-
-class CreateVideoResponse(BaseModel):
-    requestId: str = Field(...)
-
-
-class VideoAcceptResponse(BaseModel):
-    uploadId: str = Field(...)
-    urls: list[str] = Field(...)
-
-
-class VideoCompleteUploadRequestPart(BaseModel):
-    partNum: int = Field(...)
-    eTag: str = Field(...)
-
-
-class VideoCompleteUploadRequest(BaseModel):
-    uploadResults: list[VideoCompleteUploadRequestPart] = Field(...)
-
-
-class VideoCompleteUploadResponse(BaseModel):
-    message: str = Field(..., description="Confirmation message")
-
-
-class VideoStatusResponseEstimates(BaseModel):
-    cost: list[int] = Field(...)
-
-
-class VideoStatusResponseDownloadUrl(BaseModel):
-    url: str = Field(...)
-
-
-class VideoStatusResponse(BaseModel):
-    status: str = Field(...)
-    estimates: Optional[VideoStatusResponseEstimates] = Field(None)
-    progress: Optional[float] = Field(None)
-    message: Optional[str] = Field("")
-    download: Optional[VideoStatusResponseDownloadUrl] = Field(None)
--- a/comfy_api_nodes/apis/tripo_api.py
+++ b/comfy_api_nodes/apis/tripo_api.py
@@ -1,20 +1,13 @@
 from __future__ import annotations
+from comfy_api_nodes.apis import (
+    TripoModelVersion,
+    TripoTextureQuality,
+)
 from enum import Enum
 from typing import Optional, List, Dict, Any, Union

 from pydantic import BaseModel, Field, RootModel

-class TripoModelVersion(str, Enum):
-    v2_5_20250123 = 'v2.5-20250123'
-    v2_0_20240919 = 'v2.0-20240919'
-    v1_4_20240625 = 'v1.4-20240625'
-
-
-class TripoTextureQuality(str, Enum):
-    standard = 'standard'
-    detailed = 'detailed'
-
-
 class TripoStyle(str, Enum):
    PERSON_TO_CARTOON = "person:person2cartoon"
    ANIMAL_VENOM = "animal:venom"
--- a/comfy_api_nodes/apis/veo_api.py
+++ b/comfy_api_nodes/apis/veo_api.py
@@ -1,111 +0,0 @@
-from typing import Optional, Union
-from enum import Enum
-
-from pydantic import BaseModel, Field
-
-
-class Image2(BaseModel):
-    bytesBase64Encoded: str
-    gcsUri: Optional[str] = None
-    mimeType: Optional[str] = None
-
-
-class Image3(BaseModel):
-    bytesBase64Encoded: Optional[str] = None
-    gcsUri: str
-    mimeType: Optional[str] = None
-
-
-class Instance1(BaseModel):
-    image: Optional[Union[Image2, Image3]] = Field(
-        None, description='Optional image to guide video generation'
-    )
-    prompt: str = Field(..., description='Text description of the video')
-
-
-class PersonGeneration1(str, Enum):
-    ALLOW = 'ALLOW'
-    BLOCK = 'BLOCK'
-
-
-class Parameters1(BaseModel):
-    aspectRatio: Optional[str] = Field(None, examples=['16:9'])
-    durationSeconds: Optional[int] = None
-    enhancePrompt: Optional[bool] = None
-    generateAudio: Optional[bool] = Field(
-        None,
-        description='Generate audio for the video. Only supported by veo 3 models.',
-    )
-    negativePrompt: Optional[str] = None
-    personGeneration: Optional[PersonGeneration1] = None
-    sampleCount: Optional[int] = None
-    seed: Optional[int] = None
-    storageUri: Optional[str] = Field(
-        None, description='Optional Cloud Storage URI to upload the video'
-    )
-
-
-class VeoGenVidRequest(BaseModel):
-    instances: Optional[list[Instance1]] = None
-    parameters: Optional[Parameters1] = None
-
-
-class VeoGenVidResponse(BaseModel):
-    name: str = Field(
-        ...,
-        description='Operation resource name',
-        examples=[
-            'projects/PROJECT_ID/locations/us-central1/publishers/google/models/MODEL_ID/operations/a1b07c8e-7b5a-4aba-bb34-3e1ccb8afcc8'
-        ],
-    )
-
-
-class VeoGenVidPollRequest(BaseModel):
-    operationName: str = Field(
-        ...,
-        description='Full operation name (from predict response)',
-        examples=[
-            'projects/PROJECT_ID/locations/us-central1/publishers/google/models/MODEL_ID/operations/OPERATION_ID'
-        ],
-    )
-
-
-class Video(BaseModel):
-    bytesBase64Encoded: Optional[str] = Field(
-        None, description='Base64-encoded video content'
-    )
-    gcsUri: Optional[str] = Field(None, description='Cloud Storage URI of the video')
-    mimeType: Optional[str] = Field(None, description='Video MIME type')
-
-
-class Error1(BaseModel):
-    code: Optional[int] = Field(None, description='Error code')
-    message: Optional[str] = Field(None, description='Error message')
-
-
-class Response1(BaseModel):
-    field_type: Optional[str] = Field(
-        None,
-        alias='@type',
-        examples=[
-            'type.googleapis.com/cloud.ai.large_models.vision.GenerateVideoResponse'
-        ],
-    )
-    raiMediaFilteredCount: Optional[int] = Field(
-        None, description='Count of media filtered by responsible AI policies'
-    )
-    raiMediaFilteredReasons: Optional[list[str]] = Field(
-        None, description='Reasons why media was filtered by responsible AI policies'
-    )
-    videos: Optional[list[Video]] = None
-
-
-class VeoGenVidPollResponse(BaseModel):
-    done: Optional[bool] = None
-    error: Optional[Error1] = Field(
-        None, description='Error details if operation failed'
-    )
-    name: Optional[str] = None
-    response: Optional[Response1] = Field(
-        None, description='The actual prediction response if done is true'
-    )
--- a/comfy_api_nodes/nodes_bfl.py
+++ b/comfy_api_nodes/nodes_bfl.py
@@ -1,46 +1,146 @@
+import asyncio
+import io
 from inspect import cleandoc
-from typing import Optional
-
-import torch
+from typing import Union, Optional
 from typing_extensions import override
-
-from comfy_api.latest import IO, ComfyExtension
+from comfy_api.latest import ComfyExtension, IO
 from comfy_api_nodes.apis.bfl_api import (
+    BFLStatus,
    BFLFluxExpandImageRequest,
    BFLFluxFillImageRequest,
-    BFLFluxKontextProGenerateRequest,
+    BFLFluxCannyImageRequest,
+    BFLFluxDepthImageRequest,
    BFLFluxProGenerateRequest,
-    BFLFluxProGenerateResponse,
+    BFLFluxKontextProGenerateRequest,
    BFLFluxProUltraGenerateRequest,
-    BFLFluxStatusResponse,
-    BFLStatus,
+    BFLFluxProGenerateResponse,
 )
-from comfy_api_nodes.util import (
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    download_url_to_image_tensor,
-    poll_op,
+    HttpMethod,
+    SynchronousOperation,
+)
+from comfy_api_nodes.apinode_utils import (
+    downscale_image_tensor,
+    validate_aspect_ratio,
+    process_image_response,
    resize_mask_to_image,
-    sync_op,
-    tensor_to_base64_string,
-    validate_aspect_ratio_string,
    validate_string,
 )

+import numpy as np
+from PIL import Image
+import aiohttp
+import torch
+import base64
+import time
+from server import PromptServer
+

 def convert_mask_to_image(mask: torch.Tensor):
    """
    Make mask have the expected amount of dims (4) and channels (3) to be recognized as an image.
    """
    mask = mask.unsqueeze(-1)
-    mask = torch.cat([mask] * 3, dim=-1)
+    mask = torch.cat([mask]*3, dim=-1)
    return mask


+async def handle_bfl_synchronous_operation(
+    operation: SynchronousOperation,
+    timeout_bfl_calls=360,
+    node_id: Union[str, None] = None,
+):
+    response_api: BFLFluxProGenerateResponse = await operation.execute()
+    return await _poll_until_generated(
+        response_api.polling_url, timeout=timeout_bfl_calls, node_id=node_id
+    )
+
+
+async def _poll_until_generated(
+    polling_url: str, timeout=360, node_id: Union[str, None] = None
+):
+    # used bfl-comfy-nodes to verify code implementation:
+    # https://github.com/black-forest-labs/bfl-comfy-nodes/tree/main
+    start_time = time.time()
+    retries_404 = 0
+    max_retries_404 = 5
+    retry_404_seconds = 2
+    retry_202_seconds = 2
+    retry_pending_seconds = 1
+
+    async with aiohttp.ClientSession() as session:
+        # NOTE: should True loop be replaced with checking if workflow has been interrupted?
+        while True:
+            if node_id:
+                time_elapsed = time.time() - start_time
+                PromptServer.instance.send_progress_text(
+                    f"Generating ({time_elapsed:.0f}s)", node_id
+                )
+
+            async with session.get(polling_url) as response:
+                if response.status == 200:
+                    result = await response.json()
+                    if result["status"] == BFLStatus.ready:
+                        img_url = result["result"]["sample"]
+                        if node_id:
+                            PromptServer.instance.send_progress_text(
+                                f"Result URL: {img_url}", node_id
+                            )
+                        async with session.get(img_url) as img_resp:
+                            return process_image_response(await img_resp.content.read())
+                    elif result["status"] in [
+                        BFLStatus.request_moderated,
+                        BFLStatus.content_moderated,
+                    ]:
+                        status = result["status"]
+                        raise Exception(
+                            f"BFL API did not return an image due to: {status}."
+                        )
+                    elif result["status"] == BFLStatus.error:
+                        raise Exception(f"BFL API encountered an error: {result}.")
+                    elif result["status"] == BFLStatus.pending:
+                        await asyncio.sleep(retry_pending_seconds)
+                        continue
+                elif response.status == 404:
+                    if retries_404 < max_retries_404:
+                        retries_404 += 1
+                        await asyncio.sleep(retry_404_seconds)
+                        continue
+                    raise Exception(
+                        f"BFL API could not find task after {max_retries_404} tries."
+                    )
+                elif response.status == 202:
+                    await asyncio.sleep(retry_202_seconds)
+                elif time.time() - start_time > timeout:
+                    raise Exception(
+                        f"BFL API experienced a timeout; could not return request under {timeout} seconds."
+                    )
+                else:
+                    raise Exception(f"BFL API encountered an error: {await response.text()}")
+
+def convert_image_to_base64(image: torch.Tensor):
+    scaled_image = downscale_image_tensor(image, total_pixels=2048 * 2048)
+    # remove batch dimension if present
+    if len(scaled_image.shape) > 3:
+        scaled_image = scaled_image[0]
+    image_np = (scaled_image.numpy() * 255).astype(np.uint8)
+    img = Image.fromarray(image_np)
+    img_byte_arr = io.BytesIO()
+    img.save(img_byte_arr, format="PNG")
+    return base64.b64encode(img_byte_arr.getvalue()).decode()
+
+
 class FluxProUltraImageNode(IO.ComfyNode):
    """
    Generates images using Flux Pro 1.1 Ultra via api based on prompt and resolution.
    """

+    MINIMUM_RATIO = 1 / 4
+    MAXIMUM_RATIO = 4 / 1
+    MINIMUM_RATIO_STR = "1:4"
+    MAXIMUM_RATIO_STR = "4:1"
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
@@ -58,9 +158,7 @@ class FluxProUltraImageNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "prompt_upsampling",
                    default=False,
-                    tooltip="Whether to perform upsampling on the prompt. "
-                    "If active, automatically modifies the prompt for more creative generation, "
-                    "but results are nondeterministic (same seed will not produce exactly the same result).",
+                    tooltip="Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation, but results are nondeterministic (same seed will not produce exactly the same result).",
                ),
                IO.Int.Input(
                    "seed",
@@ -105,7 +203,16 @@ class FluxProUltraImageNode(IO.ComfyNode):

    @classmethod
    def validate_inputs(cls, aspect_ratio: str):
-        validate_aspect_ratio_string(aspect_ratio, (1, 4), (4, 1))
+        try:
+            validate_aspect_ratio(
+                aspect_ratio,
+                minimum_ratio=cls.MINIMUM_RATIO,
+                maximum_ratio=cls.MAXIMUM_RATIO,
+                minimum_ratio_str=cls.MINIMUM_RATIO_STR,
+                maximum_ratio_str=cls.MAXIMUM_RATIO_STR,
+            )
+        except Exception as e:
+            return str(e)
        return True

    @classmethod
@@ -113,44 +220,49 @@ class FluxProUltraImageNode(IO.ComfyNode):
        cls,
        prompt: str,
        aspect_ratio: str,
-        prompt_upsampling: bool = False,
-        raw: bool = False,
-        seed: int = 0,
-        image_prompt: Optional[torch.Tensor] = None,
-        image_prompt_strength: float = 0.1,
+        prompt_upsampling=False,
+        raw=False,
+        seed=0,
+        image_prompt=None,
+        image_prompt_strength=0.1,
    ) -> IO.NodeOutput:
        if image_prompt is None:
            validate_string(prompt, strip_whitespace=False)
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/bfl/flux-pro-1.1-ultra/generate", method="POST"),
-            response_model=BFLFluxProGenerateResponse,
-            data=BFLFluxProUltraGenerateRequest(
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/bfl/flux-pro-1.1-ultra/generate",
+                method=HttpMethod.POST,
+                request_model=BFLFluxProUltraGenerateRequest,
+                response_model=BFLFluxProGenerateResponse,
+            ),
+            request=BFLFluxProUltraGenerateRequest(
                prompt=prompt,
                prompt_upsampling=prompt_upsampling,
                seed=seed,
-                aspect_ratio=aspect_ratio,
+                aspect_ratio=validate_aspect_ratio(
+                    aspect_ratio,
+                    minimum_ratio=cls.MINIMUM_RATIO,
+                    maximum_ratio=cls.MAXIMUM_RATIO,
+                    minimum_ratio_str=cls.MINIMUM_RATIO_STR,
+                    maximum_ratio_str=cls.MAXIMUM_RATIO_STR,
+                ),
                raw=raw,
-                image_prompt=(image_prompt if image_prompt is None else tensor_to_base64_string(image_prompt)),
-                image_prompt_strength=(None if image_prompt is None else round(image_prompt_strength, 2)),
+                image_prompt=(
+                    image_prompt
+                    if image_prompt is None
+                    else convert_image_to_base64(image_prompt)
+                ),
+                image_prompt_strength=(
+                    None if image_prompt is None else round(image_prompt_strength, 2)
+                ),
            ),
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
-        response = await poll_op(
-            cls,
-            ApiEndpoint(initial_response.polling_url),
-            response_model=BFLFluxStatusResponse,
-            status_extractor=lambda r: r.status,
-            progress_extractor=lambda r: r.progress,
-            completed_statuses=[BFLStatus.ready],
-            failed_statuses=[
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-                BFLStatus.error,
-                BFLStatus.task_not_found,
-            ],
-            queued_statuses=[],
-        )
-        return IO.NodeOutput(await download_url_to_image_tensor(response.result["sample"]))
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=cls.hidden.unique_id)
+        return IO.NodeOutput(output_image)


 class FluxKontextProImageNode(IO.ComfyNode):
@@ -158,6 +270,11 @@ class FluxKontextProImageNode(IO.ComfyNode):
    Edits images using Flux.1 Kontext [pro] via api based on prompt and aspect ratio.
    """

+    MINIMUM_RATIO = 1 / 4
+    MAXIMUM_RATIO = 4 / 1
+    MINIMUM_RATIO_STR = "1:4"
+    MAXIMUM_RATIO_STR = "4:1"
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
@@ -230,43 +347,46 @@ class FluxKontextProImageNode(IO.ComfyNode):
        aspect_ratio: str,
        guidance: float,
        steps: int,
-        input_image: Optional[torch.Tensor] = None,
+        input_image: Optional[torch.Tensor]=None,
        seed=0,
        prompt_upsampling=False,
    ) -> IO.NodeOutput:
-        validate_aspect_ratio_string(aspect_ratio, (1, 4), (4, 1))
+        aspect_ratio = validate_aspect_ratio(
+            aspect_ratio,
+            minimum_ratio=cls.MINIMUM_RATIO,
+            maximum_ratio=cls.MAXIMUM_RATIO,
+            minimum_ratio_str=cls.MINIMUM_RATIO_STR,
+            maximum_ratio_str=cls.MAXIMUM_RATIO_STR,
+        )
        if input_image is None:
            validate_string(prompt, strip_whitespace=False)
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path=cls.BFL_PATH, method="POST"),
-            response_model=BFLFluxProGenerateResponse,
-            data=BFLFluxKontextProGenerateRequest(
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=cls.BFL_PATH,
+                method=HttpMethod.POST,
+                request_model=BFLFluxKontextProGenerateRequest,
+                response_model=BFLFluxProGenerateResponse,
+            ),
+            request=BFLFluxKontextProGenerateRequest(
                prompt=prompt,
                prompt_upsampling=prompt_upsampling,
                guidance=round(guidance, 1),
                steps=steps,
                seed=seed,
                aspect_ratio=aspect_ratio,
-                input_image=(input_image if input_image is None else tensor_to_base64_string(input_image)),
+                input_image=(
+                    input_image
+                    if input_image is None
+                    else convert_image_to_base64(input_image)
+                )
            ),
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
-        response = await poll_op(
-            cls,
-            ApiEndpoint(initial_response.polling_url),
-            response_model=BFLFluxStatusResponse,
-            status_extractor=lambda r: r.status,
-            progress_extractor=lambda r: r.progress,
-            completed_statuses=[BFLStatus.ready],
-            failed_statuses=[
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-                BFLStatus.error,
-                BFLStatus.task_not_found,
-            ],
-            queued_statuses=[],
-        )
-        return IO.NodeOutput(await download_url_to_image_tensor(response.result["sample"]))
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=cls.hidden.unique_id)
+        return IO.NodeOutput(output_image)


 class FluxKontextMaxImageNode(FluxKontextProImageNode):
@@ -302,9 +422,7 @@ class FluxProImageNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "prompt_upsampling",
                    default=False,
-                    tooltip="Whether to perform upsampling on the prompt. "
-                    "If active, automatically modifies the prompt for more creative generation, "
-                    "but results are nondeterministic (same seed will not produce exactly the same result).",
+                    tooltip="Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation, but results are nondeterministic (same seed will not produce exactly the same result).",
                ),
                IO.Int.Input(
                    "width",
@@ -363,15 +481,20 @@ class FluxProImageNode(IO.ComfyNode):
        image_prompt=None,
        # image_prompt_strength=0.1,
    ) -> IO.NodeOutput:
-        image_prompt = image_prompt if image_prompt is None else tensor_to_base64_string(image_prompt)
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(
+        image_prompt = (
+                    image_prompt
+                    if image_prompt is None
+                    else convert_image_to_base64(image_prompt)
+                )
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
                path="/proxy/bfl/flux-pro-1.1/generate",
-                method="POST",
+                method=HttpMethod.POST,
+                request_model=BFLFluxProGenerateRequest,
+                response_model=BFLFluxProGenerateResponse,
            ),
-            response_model=BFLFluxProGenerateResponse,
-            data=BFLFluxProGenerateRequest(
+            request=BFLFluxProGenerateRequest(
                prompt=prompt,
                prompt_upsampling=prompt_upsampling,
                width=width,
@@ -379,23 +502,13 @@ class FluxProImageNode(IO.ComfyNode):
                seed=seed,
                image_prompt=image_prompt,
            ),
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
-        response = await poll_op(
-            cls,
-            ApiEndpoint(initial_response.polling_url),
-            response_model=BFLFluxStatusResponse,
-            status_extractor=lambda r: r.status,
-            progress_extractor=lambda r: r.progress,
-            completed_statuses=[BFLStatus.ready],
-            failed_statuses=[
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-                BFLStatus.error,
-                BFLStatus.task_not_found,
-            ],
-            queued_statuses=[],
-        )
-        return IO.NodeOutput(await download_url_to_image_tensor(response.result["sample"]))
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=cls.hidden.unique_id)
+        return IO.NodeOutput(output_image)


 class FluxProExpandNode(IO.ComfyNode):
@@ -421,9 +534,7 @@ class FluxProExpandNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "prompt_upsampling",
                    default=False,
-                    tooltip="Whether to perform upsampling on the prompt. "
-                    "If active, automatically modifies the prompt for more creative generation, "
-                    "but results are nondeterministic (same seed will not produce exactly the same result).",
+                    tooltip="Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation, but results are nondeterministic (same seed will not produce exactly the same result).",
                ),
                IO.Int.Input(
                    "top",
@@ -499,11 +610,16 @@ class FluxProExpandNode(IO.ComfyNode):
        guidance: float,
        seed=0,
    ) -> IO.NodeOutput:
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/bfl/flux-pro-1.0-expand/generate", method="POST"),
-            response_model=BFLFluxProGenerateResponse,
-            data=BFLFluxExpandImageRequest(
+        image = convert_image_to_base64(image)
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/bfl/flux-pro-1.0-expand/generate",
+                method=HttpMethod.POST,
+                request_model=BFLFluxExpandImageRequest,
+                response_model=BFLFluxProGenerateResponse,
+            ),
+            request=BFLFluxExpandImageRequest(
                prompt=prompt,
                prompt_upsampling=prompt_upsampling,
                top=top,
@@ -513,25 +629,16 @@ class FluxProExpandNode(IO.ComfyNode):
                steps=steps,
                guidance=guidance,
                seed=seed,
-                image=tensor_to_base64_string(image),
+                image=image,
            ),
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
-        response = await poll_op(
-            cls,
-            ApiEndpoint(initial_response.polling_url),
-            response_model=BFLFluxStatusResponse,
-            status_extractor=lambda r: r.status,
-            progress_extractor=lambda r: r.progress,
-            completed_statuses=[BFLStatus.ready],
-            failed_statuses=[
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-                BFLStatus.error,
-                BFLStatus.task_not_found,
-            ],
-            queued_statuses=[],
-        )
-        return IO.NodeOutput(await download_url_to_image_tensor(response.result["sample"]))
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=cls.hidden.unique_id)
+        return IO.NodeOutput(output_image)
+


 class FluxProFillNode(IO.ComfyNode):
@@ -558,9 +665,7 @@ class FluxProFillNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "prompt_upsampling",
                    default=False,
-                    tooltip="Whether to perform upsampling on the prompt. "
-                    "If active, automatically modifies the prompt for more creative generation, "
-                    "but results are nondeterministic (same seed will not produce exactly the same result).",
+                    tooltip="Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation, but results are nondeterministic (same seed will not produce exactly the same result).",
                ),
                IO.Float.Input(
                    "guidance",
@@ -607,37 +712,272 @@ class FluxProFillNode(IO.ComfyNode):
    ) -> IO.NodeOutput:
        # prepare mask
        mask = resize_mask_to_image(mask, image)
-        mask = tensor_to_base64_string(convert_mask_to_image(mask))
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/bfl/flux-pro-1.0-fill/generate", method="POST"),
-            response_model=BFLFluxProGenerateResponse,
-            data=BFLFluxFillImageRequest(
+        mask = convert_image_to_base64(convert_mask_to_image(mask))
+        # make sure image will have alpha channel removed
+        image = convert_image_to_base64(image[:, :, :, :3])
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/bfl/flux-pro-1.0-fill/generate",
+                method=HttpMethod.POST,
+                request_model=BFLFluxFillImageRequest,
+                response_model=BFLFluxProGenerateResponse,
+            ),
+            request=BFLFluxFillImageRequest(
                prompt=prompt,
                prompt_upsampling=prompt_upsampling,
                steps=steps,
                guidance=guidance,
                seed=seed,
-                image=tensor_to_base64_string(image[:, :, :, :3]),  # make sure image will have alpha channel removed
+                image=image,
                mask=mask,
            ),
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
-        response = await poll_op(
-            cls,
-            ApiEndpoint(initial_response.polling_url),
-            response_model=BFLFluxStatusResponse,
-            status_extractor=lambda r: r.status,
-            progress_extractor=lambda r: r.progress,
-            completed_statuses=[BFLStatus.ready],
-            failed_statuses=[
-                BFLStatus.request_moderated,
-                BFLStatus.content_moderated,
-                BFLStatus.error,
-                BFLStatus.task_not_found,
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=cls.hidden.unique_id)
+        return IO.NodeOutput(output_image)
+
+
+class FluxProCannyNode(IO.ComfyNode):
+    """
+    Generate image using a control image (canny).
+    """
+
+    @classmethod
+    def define_schema(cls) -> IO.Schema:
+        return IO.Schema(
+            node_id="FluxProCannyNode",
+            display_name="Flux.1 Canny Control Image",
+            category="api node/image/BFL",
+            description=cleandoc(cls.__doc__ or ""),
+            inputs=[
+                IO.Image.Input("control_image"),
+                IO.String.Input(
+                    "prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Prompt for the image generation",
+                ),
+                IO.Boolean.Input(
+                    "prompt_upsampling",
+                    default=False,
+                    tooltip="Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation, but results are nondeterministic (same seed will not produce exactly the same result).",
+                ),
+                IO.Float.Input(
+                    "canny_low_threshold",
+                    default=0.1,
+                    min=0.01,
+                    max=0.99,
+                    step=0.01,
+                    tooltip="Low threshold for Canny edge detection; ignored if skip_preprocessing is True",
+                ),
+                IO.Float.Input(
+                    "canny_high_threshold",
+                    default=0.4,
+                    min=0.01,
+                    max=0.99,
+                    step=0.01,
+                    tooltip="High threshold for Canny edge detection; ignored if skip_preprocessing is True",
+                ),
+                IO.Boolean.Input(
+                    "skip_preprocessing",
+                    default=False,
+                    tooltip="Whether to skip preprocessing; set to True if control_image already is canny-fied, False if it is a raw image.",
+                ),
+                IO.Float.Input(
+                    "guidance",
+                    default=30,
+                    min=1,
+                    max=100,
+                    tooltip="Guidance strength for the image generation process",
+                ),
+                IO.Int.Input(
+                    "steps",
+                    default=50,
+                    min=15,
+                    max=50,
+                    tooltip="Number of steps for the image generation process",
+                ),
+                IO.Int.Input(
+                    "seed",
+                    default=0,
+                    min=0,
+                    max=0xFFFFFFFFFFFFFFFF,
+                    control_after_generate=True,
+                    tooltip="The random seed used for creating the noise.",
+                ),
            ],
-            queued_statuses=[],
+            outputs=[IO.Image.Output()],
+            hidden=[
+                IO.Hidden.auth_token_comfy_org,
+                IO.Hidden.api_key_comfy_org,
+                IO.Hidden.unique_id,
+            ],
+            is_api_node=True,
        )
-        return IO.NodeOutput(await download_url_to_image_tensor(response.result["sample"]))
+
+    @classmethod
+    async def execute(
+        cls,
+        control_image: torch.Tensor,
+        prompt: str,
+        prompt_upsampling: bool,
+        canny_low_threshold: float,
+        canny_high_threshold: float,
+        skip_preprocessing: bool,
+        steps: int,
+        guidance: float,
+        seed=0,
+    ) -> IO.NodeOutput:
+        control_image = convert_image_to_base64(control_image[:, :, :, :3])
+        preprocessed_image = None
+
+        # scale canny threshold between 0-500, to match BFL's API
+        def scale_value(value: float, min_val=0, max_val=500):
+            return min_val + value * (max_val - min_val)
+        canny_low_threshold = int(round(scale_value(canny_low_threshold)))
+        canny_high_threshold = int(round(scale_value(canny_high_threshold)))
+
+
+        if skip_preprocessing:
+            preprocessed_image = control_image
+            control_image = None
+            canny_low_threshold = None
+            canny_high_threshold = None
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/bfl/flux-pro-1.0-canny/generate",
+                method=HttpMethod.POST,
+                request_model=BFLFluxCannyImageRequest,
+                response_model=BFLFluxProGenerateResponse,
+            ),
+            request=BFLFluxCannyImageRequest(
+                prompt=prompt,
+                prompt_upsampling=prompt_upsampling,
+                steps=steps,
+                guidance=guidance,
+                seed=seed,
+                control_image=control_image,
+                canny_low_threshold=canny_low_threshold,
+                canny_high_threshold=canny_high_threshold,
+                preprocessed_image=preprocessed_image,
+            ),
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+        )
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=cls.hidden.unique_id)
+        return IO.NodeOutput(output_image)
+
+
+class FluxProDepthNode(IO.ComfyNode):
+    """
+    Generate image using a control image (depth).
+    """
+
+    @classmethod
+    def define_schema(cls) -> IO.Schema:
+        return IO.Schema(
+            node_id="FluxProDepthNode",
+            display_name="Flux.1 Depth Control Image",
+            category="api node/image/BFL",
+            description=cleandoc(cls.__doc__ or ""),
+            inputs=[
+                IO.Image.Input("control_image"),
+                IO.String.Input(
+                    "prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Prompt for the image generation",
+                ),
+                IO.Boolean.Input(
+                    "prompt_upsampling",
+                    default=False,
+                    tooltip="Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation, but results are nondeterministic (same seed will not produce exactly the same result).",
+                ),
+                IO.Boolean.Input(
+                    "skip_preprocessing",
+                    default=False,
+                    tooltip="Whether to skip preprocessing; set to True if control_image already is depth-ified, False if it is a raw image.",
+                ),
+                IO.Float.Input(
+                    "guidance",
+                    default=15,
+                    min=1,
+                    max=100,
+                    tooltip="Guidance strength for the image generation process",
+                ),
+                IO.Int.Input(
+                    "steps",
+                    default=50,
+                    min=15,
+                    max=50,
+                    tooltip="Number of steps for the image generation process",
+                ),
+                IO.Int.Input(
+                    "seed",
+                    default=0,
+                    min=0,
+                    max=0xFFFFFFFFFFFFFFFF,
+                    control_after_generate=True,
+                    tooltip="The random seed used for creating the noise.",
+                ),
+            ],
+            outputs=[IO.Image.Output()],
+            hidden=[
+                IO.Hidden.auth_token_comfy_org,
+                IO.Hidden.api_key_comfy_org,
+                IO.Hidden.unique_id,
+            ],
+            is_api_node=True,
+        )
+
+    @classmethod
+    async def execute(
+        cls,
+        control_image: torch.Tensor,
+        prompt: str,
+        prompt_upsampling: bool,
+        skip_preprocessing: bool,
+        steps: int,
+        guidance: float,
+        seed=0,
+    ) -> IO.NodeOutput:
+        control_image = convert_image_to_base64(control_image[:, :, :, :3])
+        preprocessed_image = None
+
+        if skip_preprocessing:
+            preprocessed_image = control_image
+            control_image = None
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/bfl/flux-pro-1.0-depth/generate",
+                method=HttpMethod.POST,
+                request_model=BFLFluxDepthImageRequest,
+                response_model=BFLFluxProGenerateResponse,
+            ),
+            request=BFLFluxDepthImageRequest(
+                prompt=prompt,
+                prompt_upsampling=prompt_upsampling,
+                steps=steps,
+                guidance=guidance,
+                seed=seed,
+                control_image=control_image,
+                preprocessed_image=preprocessed_image,
+            ),
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+        )
+        output_image = await handle_bfl_synchronous_operation(operation, node_id=cls.hidden.unique_id)
+        return IO.NodeOutput(output_image)


 class BFLExtension(ComfyExtension):
@@ -650,6 +990,8 @@ class BFLExtension(ComfyExtension):
            FluxKontextMaxImageNode,
            FluxProExpandNode,
            FluxProFillNode,
+            FluxProCannyNode,
+            FluxProDepthNode,
        ]


--- a/comfy_api_nodes/nodes_bytedance.py
+++ b/comfy_api_nodes/nodes_bytedance.py
@@ -1,27 +1,35 @@
 import logging
 import math
 from enum import Enum
-from typing import Literal, Optional, Union
+from typing import Literal, Optional, Type, Union
+from typing_extensions import override

 import torch
 from pydantic import BaseModel, Field
-from typing_extensions import override

-from comfy_api.latest import IO, ComfyExtension
-from comfy_api_nodes.util import (
+from comfy_api.latest import ComfyExtension, IO
+from comfy_api_nodes.util.validation_utils import (
+    validate_image_aspect_ratio_range,
+    get_number_of_images,
+    validate_image_dimensions,
+)
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
+    EmptyRequest,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    T,
+)
+from comfy_api_nodes.apinode_utils import (
    download_url_to_image_tensor,
    download_url_to_video_output,
-    get_number_of_images,
-    image_tensor_pair_to_batch,
-    poll_op,
-    sync_op,
    upload_images_to_comfyapi,
-    validate_image_aspect_ratio,
-    validate_image_dimensions,
    validate_string,
+    image_tensor_pair_to_batch,
 )

+
 BYTEPLUS_IMAGE_ENDPOINT = "/proxy/byteplus/api/v3/images/generations"

 # Long-running tasks endpoints(e.g., video)
@@ -38,14 +46,13 @@ class Image2ImageModelName(str, Enum):


 class Text2VideoModelName(str, Enum):
-    seedance_1_pro = "seedance-1-0-pro-250528"
+    seedance_1_pro  = "seedance-1-0-pro-250528"
    seedance_1_lite = "seedance-1-0-lite-t2v-250428"


 class Image2VideoModelName(str, Enum):
    """note(August 31): Pro model only supports FirstFrame: https://docs.byteplus.com/en/docs/ModelArk/1520757"""
-
-    seedance_1_pro = "seedance-1-0-pro-250528"
+    seedance_1_pro  = "seedance-1-0-pro-250528"
    seedance_1_lite = "seedance-1-0-lite-i2v-250428"


@@ -201,6 +208,35 @@ def get_video_url_from_task_status(response: TaskStatusResponse) -> Union[str, N
    return None


+async def poll_until_finished(
+    auth_kwargs: dict[str, str],
+    task_id: str,
+    estimated_duration: Optional[int] = None,
+    node_id: Optional[str] = None,
+) -> TaskStatusResponse:
+    """Polls the ByteDance API endpoint until the task reaches a terminal state, then returns the response."""
+    return await PollingOperation(
+        poll_endpoint=ApiEndpoint(
+            path=f"{BYTEPLUS_TASK_STATUS_ENDPOINT}/{task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=TaskStatusResponse,
+        ),
+        completed_statuses=[
+            "succeeded",
+        ],
+        failed_statuses=[
+            "cancelled",
+            "failed",
+        ],
+        status_extractor=lambda response: response.status,
+        auth_kwargs=auth_kwargs,
+        result_url_extractor=get_video_url_from_task_status,
+        estimated_duration=estimated_duration,
+        node_id=node_id,
+    ).execute()
+
+
 class ByteDanceImageNode(IO.ComfyNode):

    @classmethod
@@ -267,7 +303,7 @@ class ByteDanceImageNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the image',
+                    tooltip="Whether to add an \"AI generated\" watermark to the image",
                    optional=True,
                ),
            ],
@@ -305,7 +341,8 @@ class ByteDanceImageNode(IO.ComfyNode):
            w, h = width, height
            if not (512 <= w <= 2048) or not (512 <= h <= 2048):
                raise ValueError(
-                    f"Custom size out of range: {w}x{h}. " "Both width and height must be between 512 and 2048 pixels."
+                    f"Custom size out of range: {w}x{h}. "
+                    "Both width and height must be between 512 and 2048 pixels."
                )

        payload = Text2ImageTaskCreationRequest(
@@ -316,12 +353,20 @@ class ByteDanceImageNode(IO.ComfyNode):
            guidance_scale=guidance_scale,
            watermark=watermark,
        )
-        response = await sync_op(
-            cls,
-            ApiEndpoint(path=BYTEPLUS_IMAGE_ENDPOINT, method="POST"),
-            data=payload,
-            response_model=ImageTaskCreationResponse,
-        )
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        response = await SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=BYTEPLUS_IMAGE_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=Text2ImageTaskCreationRequest,
+                response_model=ImageTaskCreationResponse,
+            ),
+            request=payload,
+            auth_kwargs=auth_kwargs,
+        ).execute()
        return IO.NodeOutput(await download_url_to_image_tensor(get_image_url_from_response(response)))


@@ -375,7 +420,7 @@ class ByteDanceImageEditNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the image',
+                    tooltip="Whether to add an \"AI generated\" watermark to the image",
                    optional=True,
                ),
            ],
@@ -403,8 +448,17 @@ class ByteDanceImageEditNode(IO.ComfyNode):
        validate_string(prompt, strip_whitespace=True, min_length=1)
        if get_number_of_images(image) != 1:
            raise ValueError("Exactly one input image is required.")
-        validate_image_aspect_ratio(image, (1, 3), (3, 1))
-        source_url = (await upload_images_to_comfyapi(cls, image, max_images=1, mime_type="image/png"))[0]
+        validate_image_aspect_ratio_range(image, (1, 3), (3, 1))
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        source_url = (await upload_images_to_comfyapi(
+            image,
+            max_images=1,
+            mime_type="image/png",
+            auth_kwargs=auth_kwargs,
+        ))[0]
        payload = Image2ImageTaskCreationRequest(
            model=model,
            prompt=prompt,
@@ -413,12 +467,16 @@ class ByteDanceImageEditNode(IO.ComfyNode):
            guidance_scale=guidance_scale,
            watermark=watermark,
        )
-        response = await sync_op(
-            cls,
-            ApiEndpoint(path=BYTEPLUS_IMAGE_ENDPOINT, method="POST"),
-            data=payload,
-            response_model=ImageTaskCreationResponse,
-        )
+        response = await SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=BYTEPLUS_IMAGE_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=Image2ImageTaskCreationRequest,
+                response_model=ImageTaskCreationResponse,
+            ),
+            request=payload,
+            auth_kwargs=auth_kwargs,
+        ).execute()
        return IO.NodeOutput(await download_url_to_image_tensor(get_image_url_from_response(response)))


@@ -446,7 +504,7 @@ class ByteDanceSeedreamNode(IO.ComfyNode):
                IO.Image.Input(
                    "image",
                    tooltip="Input image(s) for image-to-image generation. "
-                    "List of 1-10 images for single or multi-reference generation.",
+                            "List of 1-10 images for single or multi-reference generation.",
                    optional=True,
                ),
                IO.Combo.Input(
@@ -476,9 +534,9 @@ class ByteDanceSeedreamNode(IO.ComfyNode):
                    "sequential_image_generation",
                    options=["disabled", "auto"],
                    tooltip="Group image generation mode. "
-                    "'disabled' generates a single image. "
-                    "'auto' lets the model decide whether to generate multiple related images "
-                    "(e.g., story scenes, character variations).",
+                            "'disabled' generates a single image. "
+                            "'auto' lets the model decide whether to generate multiple related images "
+                            "(e.g., story scenes, character variations).",
                    optional=True,
                ),
                IO.Int.Input(
@@ -489,7 +547,7 @@ class ByteDanceSeedreamNode(IO.ComfyNode):
                    step=1,
                    display_mode=IO.NumberDisplay.number,
                    tooltip="Maximum number of images to generate when sequential_image_generation='auto'. "
-                    "Total images (input + generated) cannot exceed 15.",
+                            "Total images (input + generated) cannot exceed 15.",
                    optional=True,
                ),
                IO.Int.Input(
@@ -506,7 +564,7 @@ class ByteDanceSeedreamNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the image.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the image.",
                    optional=True,
                ),
                IO.Boolean.Input(
@@ -553,7 +611,8 @@ class ByteDanceSeedreamNode(IO.ComfyNode):
            w, h = width, height
            if not (1024 <= w <= 4096) or not (1024 <= h <= 4096):
                raise ValueError(
-                    f"Custom size out of range: {w}x{h}. " "Both width and height must be between 1024 and 4096 pixels."
+                    f"Custom size out of range: {w}x{h}. "
+                    "Both width and height must be between 1024 and 4096 pixels."
                )
        n_input_images = get_number_of_images(image) if image is not None else 0
        if n_input_images > 10:
@@ -562,31 +621,41 @@ class ByteDanceSeedreamNode(IO.ComfyNode):
            raise ValueError(
                "The maximum number of generated images plus the number of reference images cannot exceed 15."
            )
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        reference_images_urls = []
        if n_input_images:
            for i in image:
-                validate_image_aspect_ratio(i, (1, 3), (3, 1))
-            reference_images_urls = await upload_images_to_comfyapi(
-                cls,
+                validate_image_aspect_ratio_range(i, (1, 3), (3, 1))
+            reference_images_urls = (await upload_images_to_comfyapi(
                image,
                max_images=n_input_images,
                mime_type="image/png",
-            )
-        response = await sync_op(
-            cls,
-            ApiEndpoint(path=BYTEPLUS_IMAGE_ENDPOINT, method="POST"),
-            response_model=ImageTaskCreationResponse,
-            data=Seedream4TaskCreationRequest(
-                model=model,
-                prompt=prompt,
-                image=reference_images_urls,
-                size=f"{w}x{h}",
-                seed=seed,
-                sequential_image_generation=sequential_image_generation,
-                sequential_image_generation_options=Seedream4Options(max_images=max_images),
-                watermark=watermark,
-            ),
+                auth_kwargs=auth_kwargs,
+            ))
+        payload = Seedream4TaskCreationRequest(
+            model=model,
+            prompt=prompt,
+            image=reference_images_urls,
+            size=f"{w}x{h}",
+            seed=seed,
+            sequential_image_generation=sequential_image_generation,
+            sequential_image_generation_options=Seedream4Options(max_images=max_images),
+            watermark=watermark,
        )
+        response = await SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=BYTEPLUS_IMAGE_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=Seedream4TaskCreationRequest,
+                response_model=ImageTaskCreationResponse,
+            ),
+            request=payload,
+            auth_kwargs=auth_kwargs,
+        ).execute()
+
        if len(response.data) == 1:
            return IO.NodeOutput(await download_url_to_image_tensor(get_image_url_from_response(response)))
        urls = [str(d["url"]) for d in response.data if isinstance(d, dict) and "url" in d]
@@ -650,13 +719,13 @@ class ByteDanceTextToVideoNode(IO.ComfyNode):
                    "camera_fixed",
                    default=False,
                    tooltip="Specifies whether to fix the camera. The platform appends an instruction "
-                    "to fix the camera to your prompt, but does not guarantee the actual effect.",
+                            "to fix the camera to your prompt, but does not guarantee the actual effect.",
                    optional=True,
                ),
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the video.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the video.",
                    optional=True,
                ),
            ],
@@ -695,9 +764,19 @@ class ByteDanceTextToVideoNode(IO.ComfyNode):
            f"--camerafixed {str(camera_fixed).lower()} "
            f"--watermark {str(watermark).lower()}"
        )
+
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        return await process_video_task(
-            cls,
-            payload=Text2VideoTaskCreationRequest(model=model, content=[TaskTextContent(text=prompt)]),
+            request_model=Text2VideoTaskCreationRequest,
+            payload=Text2VideoTaskCreationRequest(
+                model=model,
+                content=[TaskTextContent(text=prompt)],
+            ),
+            auth_kwargs=auth_kwargs,
+            node_id=cls.hidden.unique_id,
            estimated_duration=max(1, math.ceil(VIDEO_TASKS_EXECUTION_TIME[model][resolution] * (duration / 10.0))),
        )

@@ -761,13 +840,13 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
                    "camera_fixed",
                    default=False,
                    tooltip="Specifies whether to fix the camera. The platform appends an instruction "
-                    "to fix the camera to your prompt, but does not guarantee the actual effect.",
+                            "to fix the camera to your prompt, but does not guarantee the actual effect.",
                    optional=True,
                ),
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the video.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the video.",
                    optional=True,
                ),
            ],
@@ -798,9 +877,15 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
        validate_string(prompt, strip_whitespace=True, min_length=1)
        raise_if_text_params(prompt, ["resolution", "ratio", "duration", "seed", "camerafixed", "watermark"])
        validate_image_dimensions(image, min_width=300, min_height=300, max_width=6000, max_height=6000)
-        validate_image_aspect_ratio(image, (2, 5), (5, 2), strict=False)  # 0.4 to 2.5
+        validate_image_aspect_ratio_range(image, (2, 5), (5, 2), strict=False)  # 0.4 to 2.5
+
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
+        image_url = (await upload_images_to_comfyapi(image, max_images=1, auth_kwargs=auth_kwargs))[0]

-        image_url = (await upload_images_to_comfyapi(cls, image, max_images=1))[0]
        prompt = (
            f"{prompt} "
            f"--resolution {resolution} "
@@ -812,11 +897,13 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
        )

        return await process_video_task(
-            cls,
+            request_model=Image2VideoTaskCreationRequest,
            payload=Image2VideoTaskCreationRequest(
                model=model,
                content=[TaskTextContent(text=prompt), TaskImageContent(image_url=TaskImageContentUrl(url=image_url))],
            ),
+            auth_kwargs=auth_kwargs,
+            node_id=cls.hidden.unique_id,
            estimated_duration=max(1, math.ceil(VIDEO_TASKS_EXECUTION_TIME[model][resolution] * (duration / 10.0))),
        )

@@ -884,13 +971,13 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
                    "camera_fixed",
                    default=False,
                    tooltip="Specifies whether to fix the camera. The platform appends an instruction "
-                    "to fix the camera to your prompt, but does not guarantee the actual effect.",
+                            "to fix the camera to your prompt, but does not guarantee the actual effect.",
                    optional=True,
                ),
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the video.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the video.",
                    optional=True,
                ),
            ],
@@ -923,13 +1010,18 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
        raise_if_text_params(prompt, ["resolution", "ratio", "duration", "seed", "camerafixed", "watermark"])
        for i in (first_frame, last_frame):
            validate_image_dimensions(i, min_width=300, min_height=300, max_width=6000, max_height=6000)
-            validate_image_aspect_ratio(i, (2, 5), (5, 2), strict=False)  # 0.4 to 2.5
+            validate_image_aspect_ratio_range(i, (2, 5), (5, 2), strict=False)  # 0.4 to 2.5
+
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }

        download_urls = await upload_images_to_comfyapi(
-            cls,
            image_tensor_pair_to_batch(first_frame, last_frame),
            max_images=2,
            mime_type="image/png",
+            auth_kwargs=auth_kwargs,
        )

        prompt = (
@@ -943,7 +1035,7 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
        )

        return await process_video_task(
-            cls,
+            request_model=Image2VideoTaskCreationRequest,
            payload=Image2VideoTaskCreationRequest(
                model=model,
                content=[
@@ -952,6 +1044,8 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
                    TaskImageContent(image_url=TaskImageContentUrl(url=str(download_urls[1])), role="last_frame"),
                ],
            ),
+            auth_kwargs=auth_kwargs,
+            node_id=cls.hidden.unique_id,
            estimated_duration=max(1, math.ceil(VIDEO_TASKS_EXECUTION_TIME[model][resolution] * (duration / 10.0))),
        )

@@ -1014,7 +1108,7 @@ class ByteDanceImageReferenceNode(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the video.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the video.",
                    optional=True,
                ),
            ],
@@ -1045,9 +1139,17 @@ class ByteDanceImageReferenceNode(IO.ComfyNode):
        raise_if_text_params(prompt, ["resolution", "ratio", "duration", "seed", "watermark"])
        for image in images:
            validate_image_dimensions(image, min_width=300, min_height=300, max_width=6000, max_height=6000)
-            validate_image_aspect_ratio(image, (2, 5), (5, 2), strict=False)  # 0.4 to 2.5
+            validate_image_aspect_ratio_range(image, (2, 5), (5, 2), strict=False)  # 0.4 to 2.5
+
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
+        image_urls = await upload_images_to_comfyapi(
+            images, max_images=4, mime_type="image/png", auth_kwargs=auth_kwargs
+        )

-        image_urls = await upload_images_to_comfyapi(cls, images, max_images=4, mime_type="image/png")
        prompt = (
            f"{prompt} "
            f"--resolution {resolution} "
@@ -1058,32 +1160,42 @@ class ByteDanceImageReferenceNode(IO.ComfyNode):
        )
        x = [
            TaskTextContent(text=prompt),
-            *[TaskImageContent(image_url=TaskImageContentUrl(url=str(i)), role="reference_image") for i in image_urls],
+            *[TaskImageContent(image_url=TaskImageContentUrl(url=str(i)), role="reference_image") for i in image_urls]
        ]
        return await process_video_task(
-            cls,
-            payload=Image2VideoTaskCreationRequest(model=model, content=x),
+            request_model=Image2VideoTaskCreationRequest,
+            payload=Image2VideoTaskCreationRequest(
+                model=model,
+                content=x,
+            ),
+            auth_kwargs=auth_kwargs,
+            node_id=cls.hidden.unique_id,
            estimated_duration=max(1, math.ceil(VIDEO_TASKS_EXECUTION_TIME[model][resolution] * (duration / 10.0))),
        )


 async def process_video_task(
-    cls: type[IO.ComfyNode],
+    request_model: Type[T],
    payload: Union[Text2VideoTaskCreationRequest, Image2VideoTaskCreationRequest],
+    auth_kwargs: dict,
+    node_id: str,
    estimated_duration: Optional[int],
 ) -> IO.NodeOutput:
-    initial_response = await sync_op(
-        cls,
-        ApiEndpoint(path=BYTEPLUS_TASK_ENDPOINT, method="POST"),
-        data=payload,
-        response_model=TaskCreationResponse,
-    )
-    response = await poll_op(
-        cls,
-        ApiEndpoint(path=f"{BYTEPLUS_TASK_STATUS_ENDPOINT}/{initial_response.id}"),
-        status_extractor=lambda r: r.status,
+    initial_response = await SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=BYTEPLUS_TASK_ENDPOINT,
+            method=HttpMethod.POST,
+            request_model=request_model,
+            response_model=TaskCreationResponse,
+        ),
+        request=payload,
+        auth_kwargs=auth_kwargs,
+    ).execute()
+    response = await poll_until_finished(
+        auth_kwargs,
+        initial_response.id,
        estimated_duration=estimated_duration,
-        response_model=TaskStatusResponse,
+        node_id=node_id,
    )
    return IO.NodeOutput(await download_url_to_video_output(get_video_url_from_task_status(response)))

@@ -1109,6 +1221,5 @@ class ByteDanceExtension(ComfyExtension):
            ByteDanceImageReferenceNode,
        ]

-
 async def comfy_entrypoint() -> ByteDanceExtension:
    return ByteDanceExtension()
--- a/comfy_api_nodes/nodes_gemini.py
+++ b/comfy_api_nodes/nodes_gemini.py
--- a/comfy_api_nodes/nodes_ideogram.py
+++ b/comfy_api_nodes/nodes_ideogram.py
@@ -1,6 +1,6 @@
 from io import BytesIO
 from typing_extensions import override
-from comfy_api.latest import IO, ComfyExtension
+from comfy_api.latest import ComfyExtension, IO
 from PIL import Image
 import numpy as np
 import torch
@@ -11,14 +11,20 @@ from comfy_api_nodes.apis import (
    IdeogramV3Request,
    IdeogramV3EditRequest,
 )
-from comfy_api_nodes.util import (
+
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    bytesio_to_image_tensor,
-    download_url_as_bytesio,
-    resize_mask_to_image,
-    sync_op,
+    HttpMethod,
+    SynchronousOperation,
 )

+from comfy_api_nodes.apinode_utils import (
+    download_url_to_bytesio,
+    bytesio_to_image_tensor,
+    resize_mask_to_image,
+)
+from server import PromptServer
+
 V1_V1_RES_MAP = {
  "Auto":"AUTO",
  "512 x 1536":"RESOLUTION_512_1536",
@@ -214,7 +220,7 @@ async def download_and_process_images(image_urls):

    for image_url in image_urls:
        # Using functions from apinode_utils.py to handle downloading and processing
-        image_bytesio = await download_url_as_bytesio(image_url)  # Download image content to BytesIO
+        image_bytesio = await download_url_to_bytesio(image_url)  # Download image content to BytesIO
        img_tensor = bytesio_to_image_tensor(image_bytesio, mode="RGB")  # Convert to torch.Tensor with RGB mode
        image_tensors.append(img_tensor)

@@ -227,6 +233,19 @@ async def download_and_process_images(image_urls):
    return stacked_tensors


+def display_image_urls_on_node(image_urls, node_id):
+    if node_id and image_urls:
+        if len(image_urls) == 1:
+            PromptServer.instance.send_progress_text(
+                f"Generated Image URL:\n{image_urls[0]}", node_id
+            )
+        else:
+            urls_text = "Generated Image URLs:\n" + "\n".join(
+                f"{i+1}. {url}" for i, url in enumerate(image_urls)
+            )
+            PromptServer.instance.send_progress_text(urls_text, node_id)
+
+
 class IdeogramV1(IO.ComfyNode):

    @classmethod
@@ -315,30 +334,44 @@ class IdeogramV1(IO.ComfyNode):
        aspect_ratio = V1_V2_RATIO_MAP.get(aspect_ratio, None)
        model = "V_1_TURBO" if turbo else "V_1"

-        response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/ideogram/generate", method="POST"),
-            response_model=IdeogramGenerateResponse,
-            data=IdeogramGenerateRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/ideogram/generate",
+                method=HttpMethod.POST,
+                request_model=IdeogramGenerateRequest,
+                response_model=IdeogramGenerateResponse,
+            ),
+            request=IdeogramGenerateRequest(
                image_request=ImageRequest(
                    prompt=prompt,
                    model=model,
                    num_images=num_images,
                    seed=seed,
                    aspect_ratio=aspect_ratio if aspect_ratio != "ASPECT_1_1" else None,
-                    magic_prompt_option=(magic_prompt_option if magic_prompt_option != "AUTO" else None),
+                    magic_prompt_option=(
+                        magic_prompt_option if magic_prompt_option != "AUTO" else None
+                    ),
                    negative_prompt=negative_prompt if negative_prompt else None,
                )
            ),
-            max_retries=1,
+            auth_kwargs=auth,
        )

+        response = await operation.execute()
+
        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")

        image_urls = [image_data.url for image_data in response.data if image_data.url]
+
        if not image_urls:
            raise Exception("No image URLs were generated in the response")
+
+        display_image_urls_on_node(image_urls, cls.hidden.unique_id)
        return IO.NodeOutput(await download_and_process_images(image_urls))


@@ -467,11 +500,18 @@ class IdeogramV2(IO.ComfyNode):
        else:
            final_aspect_ratio = aspect_ratio if aspect_ratio != "ASPECT_1_1" else None

-        response = await sync_op(
-            cls,
-            endpoint=ApiEndpoint(path="/proxy/ideogram/generate", method="POST"),
-            response_model=IdeogramGenerateResponse,
-            data=IdeogramGenerateRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/ideogram/generate",
+                method=HttpMethod.POST,
+                request_model=IdeogramGenerateRequest,
+                response_model=IdeogramGenerateResponse,
+            ),
+            request=IdeogramGenerateRequest(
                image_request=ImageRequest(
                    prompt=prompt,
                    model=model,
@@ -479,20 +519,28 @@ class IdeogramV2(IO.ComfyNode):
                    seed=seed,
                    aspect_ratio=final_aspect_ratio,
                    resolution=final_resolution,
-                    magic_prompt_option=(magic_prompt_option if magic_prompt_option != "AUTO" else None),
+                    magic_prompt_option=(
+                        magic_prompt_option if magic_prompt_option != "AUTO" else None
+                    ),
                    style_type=style_type if style_type != "NONE" else None,
                    negative_prompt=negative_prompt if negative_prompt else None,
                    color_palette=color_palette if color_palette else None,
                )
            ),
-            max_retries=1,
+            auth_kwargs=auth,
        )
+
+        response = await operation.execute()
+
        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")

        image_urls = [image_data.url for image_data in response.data if image_data.url]
+
        if not image_urls:
            raise Exception("No image URLs were generated in the response")
+
+        display_image_urls_on_node(image_urls, cls.hidden.unique_id)
        return IO.NodeOutput(await download_and_process_images(image_urls))


@@ -608,6 +656,10 @@ class IdeogramV3(IO.ComfyNode):
        character_image=None,
        character_mask=None,
    ):
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        if rendering_speed == "BALANCED":  # for backward compatibility
            rendering_speed = "DEFAULT"

@@ -642,6 +694,9 @@ class IdeogramV3(IO.ComfyNode):

        # Check if both image and mask are provided for editing mode
        if image is not None and mask is not None:
+            # Edit mode
+            path = "/proxy/ideogram/ideogram-v3/edit"
+
            # Process image and mask
            input_tensor = image.squeeze().cpu()
            # Resize mask to match image dimension
@@ -694,20 +749,27 @@ class IdeogramV3(IO.ComfyNode):
            if character_mask_binary:
                files["character_mask_binary"] = character_mask_binary

-            response = await sync_op(
-                cls,
-                ApiEndpoint(path="/proxy/ideogram/ideogram-v3/edit", method="POST"),
-                response_model=IdeogramGenerateResponse,
-                data=edit_request,
+            # Build the operation for edit mode (executed after this branch)
+            operation = SynchronousOperation(
+                endpoint=ApiEndpoint(
+                    path=path,
+                    method=HttpMethod.POST,
+                    request_model=IdeogramV3EditRequest,
+                    response_model=IdeogramGenerateResponse,
+                ),
+                request=edit_request,
                files=files,
                content_type="multipart/form-data",
-                max_retries=1,
+                auth_kwargs=auth,
            )

        elif image is not None or mask is not None:
            # If only one of image or mask is provided, raise an error
            raise Exception("Ideogram V3 image editing requires both an image AND a mask")
        else:
+            # Generation mode
+            path = "/proxy/ideogram/ideogram-v3/generate"
+
            # Create generation request
            gen_request = IdeogramV3Request(
                prompt=prompt,
@@ -738,22 +800,32 @@ class IdeogramV3(IO.ComfyNode):
            if files:
                gen_request.style_type = "AUTO"

-            response = await sync_op(
-                cls,
-                endpoint=ApiEndpoint(path="/proxy/ideogram/ideogram-v3/generate", method="POST"),
-                response_model=IdeogramGenerateResponse,
-                data=gen_request,
+            # Build the operation for generation mode (executed after this branch)
+            operation = SynchronousOperation(
+                endpoint=ApiEndpoint(
+                    path=path,
+                    method=HttpMethod.POST,
+                    request_model=IdeogramV3Request,
+                    response_model=IdeogramGenerateResponse,
+                ),
+                request=gen_request,
                files=files if files else None,
                content_type="multipart/form-data",
-                max_retries=1,
+                auth_kwargs=auth,
            )

+        # Execute the operation and process response
+        response = await operation.execute()
+
        if not response.data or len(response.data) == 0:
            raise Exception("No images were generated in the response")

        image_urls = [image_data.url for image_data in response.data if image_data.url]
+
        if not image_urls:
            raise Exception("No image URLs were generated in the response")
+
+        display_image_urls_on_node(image_urls, cls.hidden.unique_id)
        return IO.NodeOutput(await download_and_process_images(image_urls))


@@ -766,6 +838,5 @@ class IdeogramExtension(ComfyExtension):
            IdeogramV3,
        ]

-
 async def comfy_entrypoint() -> IdeogramExtension:
    return IdeogramExtension()
--- a/comfy_api_nodes/nodes_kling.py
+++ b/comfy_api_nodes/nodes_kling.py
@@ -5,7 +5,8 @@ For source of truth on the allowed permutations of request fields, please refere
 """

 from __future__ import annotations
-from typing import Optional, TypeVar
+from typing import Optional, TypeVar, Any
+from collections.abc import Callable
 import math
 import logging

@@ -14,6 +15,7 @@ from typing_extensions import override
 import torch

 from comfy_api_nodes.apis import (
+    KlingTaskStatus,
    KlingCameraControl,
    KlingCameraConfig,
    KlingCameraControlType,
@@ -50,20 +52,26 @@ from comfy_api_nodes.apis import (
    KlingCharacterEffectModelName,
    KlingSingleImageEffectModelName,
 )
-from comfy_api_nodes.util import (
+from comfy_api_nodes.apis.client import (
+    ApiEndpoint,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import (
+    tensor_to_base64_string,
+    download_url_to_video_output,
+    upload_video_to_comfyapi,
+    upload_audio_to_comfyapi,
+    download_url_to_image_tensor,
+    validate_string,
+)
+from comfy_api_nodes.util.validation_utils import (
    validate_image_dimensions,
    validate_image_aspect_ratio,
    validate_video_dimensions,
    validate_video_duration,
-    tensor_to_base64_string,
-    validate_string,
-    upload_audio_to_comfyapi,
-    download_url_to_image_tensor,
-    upload_video_to_comfyapi,
-    download_url_to_video_output,
-    sync_op,
-    ApiEndpoint,
-    poll_op,
 )
 from comfy_api.input_impl import VideoFromFile
 from comfy_api.input.basic_types import AudioInput
@@ -206,6 +214,34 @@ VOICES_CONFIG = {
 }


+async def poll_until_finished(
+    auth_kwargs: dict[str, str],
+    api_endpoint: ApiEndpoint[Any, R],
+    result_url_extractor: Optional[Callable[[R], str]] = None,
+    estimated_duration: Optional[int] = None,
+    node_id: Optional[str] = None,
+) -> R:
+    """Polls the Kling API endpoint until the task reaches a terminal state, then returns the response."""
+    return await PollingOperation(
+        poll_endpoint=api_endpoint,
+        completed_statuses=[
+            KlingTaskStatus.succeed.value,
+        ],
+        failed_statuses=[KlingTaskStatus.failed.value],
+        status_extractor=lambda response: (
+            response.data.task_status.value
+            if response.data and response.data.task_status
+            else None
+        ),
+        auth_kwargs=auth_kwargs,
+        result_url_extractor=result_url_extractor,
+        estimated_duration=estimated_duration,
+        node_id=node_id,
+        poll_interval=16.0,
+        max_poll_attempts=256,
+    ).execute()
+
+
 def is_valid_camera_control_configs(configs: list[float]) -> bool:
    """Verifies that at least one camera control configuration is non-zero."""
    return any(not math.isclose(value, 0.0) for value in configs)
@@ -282,7 +318,7 @@ def validate_input_image(image: torch.Tensor) -> None:
    See: https://app.klingai.com/global/dev/document-api/apiReference/model/imageToVideo
    """
    validate_image_dimensions(image, min_width=300, min_height=300)
-    validate_image_aspect_ratio(image, (1, 2.5), (2.5, 1))
+    validate_image_aspect_ratio(image, min_aspect_ratio=1 / 2.5, max_aspect_ratio=2.5)


 def get_video_from_response(response) -> KlingVideoResult:
@@ -341,7 +377,8 @@ async def image_result_to_node_output(


 async def execute_text2video(
-    cls: type[IO.ComfyNode],
+    auth_kwargs: dict[str, str],
+    node_id: str,
    prompt: str,
    negative_prompt: str,
    cfg_scale: float,
@@ -352,11 +389,14 @@ async def execute_text2video(
    camera_control: Optional[KlingCameraControl] = None,
 ) -> IO.NodeOutput:
    validate_prompts(prompt, negative_prompt, MAX_PROMPT_LENGTH_T2V)
-    task_creation_response = await sync_op(
-        cls,
-        ApiEndpoint(path=PATH_TEXT_TO_VIDEO, method="POST"),
-        response_model=KlingText2VideoResponse,
-        data=KlingText2VideoRequest(
+    initial_operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=PATH_TEXT_TO_VIDEO,
+            method=HttpMethod.POST,
+            request_model=KlingText2VideoRequest,
+            response_model=KlingText2VideoResponse,
+        ),
+        request=KlingText2VideoRequest(
            prompt=prompt if prompt else None,
            negative_prompt=negative_prompt if negative_prompt else None,
            duration=KlingVideoGenDuration(duration),
@@ -366,17 +406,24 @@ async def execute_text2video(
            aspect_ratio=KlingVideoGenAspectRatio(aspect_ratio),
            camera_control=camera_control,
        ),
+        auth_kwargs=auth_kwargs,
    )

+    task_creation_response = await initial_operation.execute()
    validate_task_creation_response(task_creation_response)

    task_id = task_creation_response.data.task_id
-    final_response = await poll_op(
-        cls,
-        ApiEndpoint(path=f"{PATH_TEXT_TO_VIDEO}/{task_id}"),
-        response_model=KlingText2VideoResponse,
+    final_response = await poll_until_finished(
+        auth_kwargs,
+        ApiEndpoint(
+            path=f"{PATH_TEXT_TO_VIDEO}/{task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=KlingText2VideoResponse,
+        ),
+        result_url_extractor=get_video_url_from_response,
        estimated_duration=AVERAGE_DURATION_T2V,
-        status_extractor=lambda r: (r.data.task_status.value if r.data and r.data.task_status else None),
+        node_id=node_id,
    )
    validate_video_result_response(final_response)

@@ -385,7 +432,8 @@ async def execute_text2video(


 async def execute_image2video(
-    cls: type[IO.ComfyNode],
+    auth_kwargs: dict[str, str],
+    node_id: str,
    start_frame: torch.Tensor,
    prompt: str,
    negative_prompt: str,
@@ -407,11 +455,14 @@ async def execute_image2video(
    if model_mode == "std" and model_name == KlingVideoGenModelName.kling_v2_5_turbo.value:
        model_mode = "pro"  # October 5: currently "std" mode is not supported for this model

-    task_creation_response = await sync_op(
-        cls,
-        ApiEndpoint(path=PATH_IMAGE_TO_VIDEO, method="POST"),
-        response_model=KlingImage2VideoResponse,
-        data=KlingImage2VideoRequest(
+    initial_operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=PATH_IMAGE_TO_VIDEO,
+            method=HttpMethod.POST,
+            request_model=KlingImage2VideoRequest,
+            response_model=KlingImage2VideoResponse,
+        ),
+        request=KlingImage2VideoRequest(
            model_name=KlingVideoGenModelName(model_name),
            image=tensor_to_base64_string(start_frame),
            image_tail=(
@@ -426,17 +477,24 @@ async def execute_image2video(
            duration=KlingVideoGenDuration(duration),
            camera_control=camera_control,
        ),
+        auth_kwargs=auth_kwargs,
    )

+    task_creation_response = await initial_operation.execute()
    validate_task_creation_response(task_creation_response)
    task_id = task_creation_response.data.task_id

-    final_response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"{PATH_IMAGE_TO_VIDEO}/{task_id}"),
-            response_model=KlingImage2VideoResponse,
+    final_response = await poll_until_finished(
+            auth_kwargs,
+            ApiEndpoint(
+                path=f"{PATH_IMAGE_TO_VIDEO}/{task_id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=KlingImage2VideoResponse,
+            ),
+            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_I2V,
-            status_extractor=lambda r: (r.data.task_status.value if r.data and r.data.task_status else None),
+            node_id=node_id,
        )
    validate_video_result_response(final_response)

@@ -445,7 +503,8 @@ async def execute_image2video(


 async def execute_video_effect(
-    cls: type[IO.ComfyNode],
+    auth_kwargs: dict[str, str],
+    node_id: str,
    dual_character: bool,
    effect_scene: KlingDualCharacterEffectsScene | KlingSingleImageEffectsScene,
    model_name: str,
@@ -471,25 +530,35 @@ async def execute_video_effect(
            duration=duration,
        )

-    task_creation_response = await sync_op(
-        cls,
-        endpoint=ApiEndpoint(path=PATH_VIDEO_EFFECTS, method="POST"),
-        response_model=KlingVideoEffectsResponse,
-        data=KlingVideoEffectsRequest(
+    initial_operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=PATH_VIDEO_EFFECTS,
+            method=HttpMethod.POST,
+            request_model=KlingVideoEffectsRequest,
+            response_model=KlingVideoEffectsResponse,
+        ),
+        request=KlingVideoEffectsRequest(
            effect_scene=effect_scene,
            input=request_input_field,
        ),
+        auth_kwargs=auth_kwargs,
    )

+    task_creation_response = await initial_operation.execute()
    validate_task_creation_response(task_creation_response)
    task_id = task_creation_response.data.task_id

-    final_response = await poll_op(
-        cls,
-        ApiEndpoint(path=f"{PATH_VIDEO_EFFECTS}/{task_id}"),
-        response_model=KlingVideoEffectsResponse,
+    final_response = await poll_until_finished(
+        auth_kwargs,
+        ApiEndpoint(
+            path=f"{PATH_VIDEO_EFFECTS}/{task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=KlingVideoEffectsResponse,
+        ),
+        result_url_extractor=get_video_url_from_response,
        estimated_duration=AVERAGE_DURATION_VIDEO_EFFECTS,
-        status_extractor=lambda r: (r.data.task_status.value if r.data and r.data.task_status else None),
+        node_id=node_id,
    )
    validate_video_result_response(final_response)

@@ -498,7 +567,8 @@ async def execute_video_effect(


 async def execute_lipsync(
-    cls: type[IO.ComfyNode],
+    auth_kwargs: dict[str, str],
+    node_id: str,
    video: VideoInput,
    audio: Optional[AudioInput] = None,
    voice_language: Optional[str] = None,
@@ -513,23 +583,24 @@ async def execute_lipsync(
    validate_video_duration(video, 2, 10)

    # Upload video to Comfy API and get download URL
-    video_url = await upload_video_to_comfyapi(cls, video)
+    video_url = await upload_video_to_comfyapi(video, auth_kwargs=auth_kwargs)
    logging.info("Uploaded video to Comfy API. URL: %s", video_url)

    # Upload the audio file to Comfy API and get download URL
    if audio:
-        audio_url = await upload_audio_to_comfyapi(
-            cls, audio, container_format="mp3", codec_name="libmp3lame", mime_type="audio/mpeg", filename="output.mp3"
-        )
+        audio_url = await upload_audio_to_comfyapi(audio, auth_kwargs=auth_kwargs)
        logging.info("Uploaded audio to Comfy API. URL: %s", audio_url)
    else:
        audio_url = None

-    task_creation_response = await sync_op(
-        cls,
-        ApiEndpoint(PATH_LIP_SYNC, "POST"),
-        response_model=KlingLipSyncResponse,
-        data=KlingLipSyncRequest(
+    initial_operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=PATH_LIP_SYNC,
+            method=HttpMethod.POST,
+            request_model=KlingLipSyncRequest,
+            response_model=KlingLipSyncResponse,
+        ),
+        request=KlingLipSyncRequest(
            input=KlingLipSyncInputObject(
                video_url=video_url,
                mode=model_mode,
@@ -541,17 +612,24 @@ async def execute_lipsync(
                voice_id=voice_id,
            ),
        ),
+        auth_kwargs=auth_kwargs,
    )

+    task_creation_response = await initial_operation.execute()
    validate_task_creation_response(task_creation_response)
    task_id = task_creation_response.data.task_id

-    final_response = await poll_op(
-        cls,
-        ApiEndpoint(path=f"{PATH_LIP_SYNC}/{task_id}"),
-        response_model=KlingLipSyncResponse,
+    final_response = await poll_until_finished(
+        auth_kwargs,
+        ApiEndpoint(
+            path=f"{PATH_LIP_SYNC}/{task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=KlingLipSyncResponse,
+        ),
+        result_url_extractor=get_video_url_from_response,
        estimated_duration=AVERAGE_DURATION_LIP_SYNC,
-        status_extractor=lambda r: (r.data.task_status.value if r.data and r.data.task_status else None),
+        node_id=node_id,
    )
    validate_video_result_response(final_response)

@@ -729,7 +807,11 @@ class KlingTextToVideoNode(IO.ComfyNode):
    ) -> IO.NodeOutput:
        model_mode, duration, model_name = MODE_TEXT2VIDEO[mode]
        return await execute_text2video(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            prompt=prompt,
            negative_prompt=negative_prompt,
            cfg_scale=cfg_scale,
@@ -790,7 +872,11 @@ class KlingCameraControlT2VNode(IO.ComfyNode):
        camera_control: Optional[KlingCameraControl] = None,
    ) -> IO.NodeOutput:
        return await execute_text2video(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            model_name=KlingVideoGenModelName.kling_v1,
            cfg_scale=cfg_scale,
            model_mode=KlingVideoGenMode.std,
@@ -858,7 +944,11 @@ class KlingImage2VideoNode(IO.ComfyNode):
        end_frame: Optional[torch.Tensor] = None,
    ) -> IO.NodeOutput:
        return await execute_image2video(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            start_frame=start_frame,
            prompt=prompt,
            negative_prompt=negative_prompt,
@@ -927,7 +1017,11 @@ class KlingCameraControlI2VNode(IO.ComfyNode):
        camera_control: KlingCameraControl,
    ) -> IO.NodeOutput:
        return await execute_image2video(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            model_name=KlingVideoGenModelName.kling_v1_5,
            start_frame=start_frame,
            cfg_scale=cfg_scale,
@@ -1003,7 +1097,11 @@ class KlingStartEndFrameNode(IO.ComfyNode):
    ) -> IO.NodeOutput:
        mode, duration, model_name = MODE_START_END_FRAME[mode]
        return await execute_image2video(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            prompt=prompt,
            negative_prompt=negative_prompt,
            model_name=model_name,
@@ -1064,27 +1162,41 @@ class KlingVideoExtendNode(IO.ComfyNode):
        video_id: str,
    ) -> IO.NodeOutput:
        validate_prompts(prompt, negative_prompt, MAX_PROMPT_LENGTH_T2V)
-        task_creation_response = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_VIDEO_EXTEND, method="POST"),
-            response_model=KlingVideoExtendResponse,
-            data=KlingVideoExtendRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_VIDEO_EXTEND,
+                method=HttpMethod.POST,
+                request_model=KlingVideoExtendRequest,
+                response_model=KlingVideoExtendResponse,
+            ),
+            request=KlingVideoExtendRequest(
                prompt=prompt if prompt else None,
                negative_prompt=negative_prompt if negative_prompt else None,
                cfg_scale=cfg_scale,
                video_id=video_id,
            ),
+            auth_kwargs=auth,
        )

+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"{PATH_VIDEO_EXTEND}/{task_id}"),
-            response_model=KlingVideoExtendResponse,
+        final_response = await poll_until_finished(
+            auth,
+            ApiEndpoint(
+                path=f"{PATH_VIDEO_EXTEND}/{task_id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=KlingVideoExtendResponse,
+            ),
+            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_VIDEO_EXTEND,
-            status_extractor=lambda r: (r.data.task_status.value if r.data and r.data.task_status else None),
+            node_id=cls.hidden.unique_id,
        )
        validate_video_result_response(final_response)

@@ -1147,7 +1259,11 @@ class KlingDualCharacterVideoEffectNode(IO.ComfyNode):
        duration: KlingVideoGenDuration,
    ) -> IO.NodeOutput:
        video, _, duration = await execute_video_effect(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            dual_character=True,
            effect_scene=effect_scene,
            model_name=model_name,
@@ -1208,7 +1324,11 @@ class KlingSingleImageVideoEffectNode(IO.ComfyNode):
        return IO.NodeOutput(
            *(
                await execute_video_effect(
-                    cls,
+                    auth_kwargs={
+                        "auth_token": cls.hidden.auth_token_comfy_org,
+                        "comfy_api_key": cls.hidden.api_key_comfy_org,
+                    },
+                    node_id=cls.hidden.unique_id,
                    dual_character=False,
                    effect_scene=effect_scene,
                    model_name=model_name,
@@ -1259,7 +1379,11 @@ class KlingLipSyncAudioToVideoNode(IO.ComfyNode):
        voice_language: str,
    ) -> IO.NodeOutput:
        return await execute_lipsync(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            video=video,
            audio=audio,
            voice_language=voice_language,
@@ -1321,7 +1445,11 @@ class KlingLipSyncTextToVideoNode(IO.ComfyNode):
    ) -> IO.NodeOutput:
        voice_id, voice_language = VOICES_CONFIG[voice]
        return await execute_lipsync(
-            cls,
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            video=video,
            text=text,
            voice_language=voice_language,
@@ -1368,26 +1496,40 @@ class KlingVirtualTryOnNode(IO.ComfyNode):
        cloth_image: torch.Tensor,
        model_name: KlingVirtualTryOnModelName,
    ) -> IO.NodeOutput:
-        task_creation_response = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_VIRTUAL_TRY_ON, method="POST"),
-            response_model=KlingVirtualTryOnResponse,
-            data=KlingVirtualTryOnRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_VIRTUAL_TRY_ON,
+                method=HttpMethod.POST,
+                request_model=KlingVirtualTryOnRequest,
+                response_model=KlingVirtualTryOnResponse,
+            ),
+            request=KlingVirtualTryOnRequest(
                human_image=tensor_to_base64_string(human_image),
                cloth_image=tensor_to_base64_string(cloth_image),
                model_name=model_name,
            ),
+            auth_kwargs=auth,
        )

+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"{PATH_VIRTUAL_TRY_ON}/{task_id}"),
-            response_model=KlingVirtualTryOnResponse,
+        final_response = await poll_until_finished(
+            auth,
+            ApiEndpoint(
+                path=f"{PATH_VIRTUAL_TRY_ON}/{task_id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=KlingVirtualTryOnResponse,
+            ),
+            result_url_extractor=get_images_urls_from_response,
            estimated_duration=AVERAGE_DURATION_VIRTUAL_TRY_ON,
-            status_extractor=lambda r: (r.data.task_status.value if r.data and r.data.task_status else None),
+            node_id=cls.hidden.unique_id,
        )
        validate_image_result_response(final_response)

@@ -1483,11 +1625,18 @@ class KlingImageGenerationNode(IO.ComfyNode):
        else:
            image = tensor_to_base64_string(image)

-        task_creation_response = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_IMAGE_GENERATIONS, method="POST"),
-            response_model=KlingImageGenerationsResponse,
-            data=KlingImageGenerationsRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_IMAGE_GENERATIONS,
+                method=HttpMethod.POST,
+                request_model=KlingImageGenerationsRequest,
+                response_model=KlingImageGenerationsResponse,
+            ),
+            request=KlingImageGenerationsRequest(
                model_name=model_name,
                prompt=prompt,
                negative_prompt=negative_prompt,
@@ -1498,17 +1647,24 @@ class KlingImageGenerationNode(IO.ComfyNode):
                n=n,
                aspect_ratio=aspect_ratio,
            ),
+            auth_kwargs=auth,
        )

+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
        task_id = task_creation_response.data.task_id

-        final_response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"{PATH_IMAGE_GENERATIONS}/{task_id}"),
-            response_model=KlingImageGenerationsResponse,
+        final_response = await poll_until_finished(
+            auth,
+            ApiEndpoint(
+                path=f"{PATH_IMAGE_GENERATIONS}/{task_id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=KlingImageGenerationsResponse,
+            ),
+            result_url_extractor=get_images_urls_from_response,
            estimated_duration=AVERAGE_DURATION_IMAGE_GEN,
-            status_extractor=lambda r: (r.data.task_status.value if r.data and r.data.task_status else None),
+            node_id=cls.hidden.unique_id,
        )
        validate_image_result_response(final_response)

--- a/comfy_api_nodes/nodes_ltxv.py
+++ b/comfy_api_nodes/nodes_ltxv.py
@@ -1,199 +0,0 @@
-from io import BytesIO
-from typing import Optional
-
-import torch
-from pydantic import BaseModel, Field
-from typing_extensions import override
-
-from comfy_api.input_impl import VideoFromFile
-from comfy_api.latest import IO, ComfyExtension
-from comfy_api_nodes.util import (
-    ApiEndpoint,
-    get_number_of_images,
-    sync_op_raw,
-    upload_images_to_comfyapi,
-    validate_string,
-)
-
-MODELS_MAP = {
-    "LTX-2 (Pro)": "ltx-2-pro",
-    "LTX-2 (Fast)": "ltx-2-fast",
-}
-
-
-class ExecuteTaskRequest(BaseModel):
-    prompt: str = Field(...)
-    model: str = Field(...)
-    duration: int = Field(...)
-    resolution: str = Field(...)
-    fps: Optional[int] = Field(25)
-    generate_audio: Optional[bool] = Field(True)
-    image_uri: Optional[str] = Field(None)
-
-
-class TextToVideoNode(IO.ComfyNode):
-    @classmethod
-    def define_schema(cls):
-        return IO.Schema(
-            node_id="LtxvApiTextToVideo",
-            display_name="LTXV Text To Video",
-            category="api node/video/LTXV",
-            description="Professional-quality videos with customizable duration and resolution.",
-            inputs=[
-                IO.Combo.Input("model", options=list(MODELS_MAP.keys())),
-                IO.String.Input(
-                    "prompt",
-                    multiline=True,
-                    default="",
-                ),
-                IO.Combo.Input("duration", options=[6, 8, 10, 12, 14, 16, 18, 20], default=8),
-                IO.Combo.Input(
-                    "resolution",
-                    options=[
-                        "1920x1080",
-                        "2560x1440",
-                        "3840x2160",
-                    ],
-                ),
-                IO.Combo.Input("fps", options=[25, 50], default=25),
-                IO.Boolean.Input(
-                    "generate_audio",
-                    default=False,
-                    optional=True,
-                    tooltip="When true, the generated video will include AI-generated audio matching the scene.",
-                ),
-            ],
-            outputs=[
-                IO.Video.Output(),
-            ],
-            hidden=[
-                IO.Hidden.auth_token_comfy_org,
-                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
-            ],
-            is_api_node=True,
-        )
-
-    @classmethod
-    async def execute(
-        cls,
-        model: str,
-        prompt: str,
-        duration: int,
-        resolution: str,
-        fps: int = 25,
-        generate_audio: bool = False,
-    ) -> IO.NodeOutput:
-        validate_string(prompt, min_length=1, max_length=10000)
-        if duration > 10 and (model != "LTX-2 (Fast)" or resolution != "1920x1080" or fps != 25):
-            raise ValueError(
-                "Durations over 10s are only available for the Fast model at 1920x1080 resolution and 25 FPS."
-            )
-        response = await sync_op_raw(
-            cls,
-            ApiEndpoint("/proxy/ltx/v1/text-to-video", "POST"),
-            data=ExecuteTaskRequest(
-                prompt=prompt,
-                model=MODELS_MAP[model],
-                duration=duration,
-                resolution=resolution,
-                fps=fps,
-                generate_audio=generate_audio,
-            ),
-            as_binary=True,
-            max_retries=1,
-        )
-        return IO.NodeOutput(VideoFromFile(BytesIO(response)))
-
-
-class ImageToVideoNode(IO.ComfyNode):
-    @classmethod
-    def define_schema(cls):
-        return IO.Schema(
-            node_id="LtxvApiImageToVideo",
-            display_name="LTXV Image To Video",
-            category="api node/video/LTXV",
-            description="Professional-quality videos with customizable duration and resolution based on start image.",
-            inputs=[
-                IO.Image.Input("image", tooltip="First frame to be used for the video."),
-                IO.Combo.Input("model", options=list(MODELS_MAP.keys())),
-                IO.String.Input(
-                    "prompt",
-                    multiline=True,
-                    default="",
-                ),
-                IO.Combo.Input("duration", options=[6, 8, 10, 12, 14, 16, 18, 20], default=8),
-                IO.Combo.Input(
-                    "resolution",
-                    options=[
-                        "1920x1080",
-                        "2560x1440",
-                        "3840x2160",
-                    ],
-                ),
-                IO.Combo.Input("fps", options=[25, 50], default=25),
-                IO.Boolean.Input(
-                    "generate_audio",
-                    default=False,
-                    optional=True,
-                    tooltip="When true, the generated video will include AI-generated audio matching the scene.",
-                ),
-            ],
-            outputs=[
-                IO.Video.Output(),
-            ],
-            hidden=[
-                IO.Hidden.auth_token_comfy_org,
-                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
-            ],
-            is_api_node=True,
-        )
-
-    @classmethod
-    async def execute(
-        cls,
-        image: torch.Tensor,
-        model: str,
-        prompt: str,
-        duration: int,
-        resolution: str,
-        fps: int = 25,
-        generate_audio: bool = False,
-    ) -> IO.NodeOutput:
-        validate_string(prompt, min_length=1, max_length=10000)
-        if duration > 10 and (model != "LTX-2 (Fast)" or resolution != "1920x1080" or fps != 25):
-            raise ValueError(
-                "Durations over 10s are only available for the Fast model at 1920x1080 resolution and 25 FPS."
-            )
-        if get_number_of_images(image) != 1:
-            raise ValueError("Currently only one input image is supported.")
-        response = await sync_op_raw(
-            cls,
-            ApiEndpoint("/proxy/ltx/v1/image-to-video", "POST"),
-            data=ExecuteTaskRequest(
-                image_uri=(await upload_images_to_comfyapi(cls, image, max_images=1, mime_type="image/png"))[0],
-                prompt=prompt,
-                model=MODELS_MAP[model],
-                duration=duration,
-                resolution=resolution,
-                fps=fps,
-                generate_audio=generate_audio,
-            ),
-            as_binary=True,
-            max_retries=1,
-        )
-        return IO.NodeOutput(VideoFromFile(BytesIO(response)))
-
-
-class LtxvApiExtension(ComfyExtension):
-    @override
-    async def get_node_list(self) -> list[type[IO.ComfyNode]]:
-        return [
-            TextToVideoNode,
-            ImageToVideoNode,
-        ]
-
-
-async def comfy_entrypoint() -> LtxvApiExtension:
-    return LtxvApiExtension()
--- a/comfy_api_nodes/nodes_luma.py
+++ b/comfy_api_nodes/nodes_luma.py
@@ -1,51 +1,69 @@
+from __future__ import annotations
+from inspect import cleandoc
 from typing import Optional
-
-import torch
 from typing_extensions import override
-
-from comfy_api.latest import IO, ComfyExtension
+from comfy_api.latest import ComfyExtension, IO
+from comfy_api.input_impl.video_types import VideoFromFile
 from comfy_api_nodes.apis.luma_api import (
-    LumaAspectRatio,
-    LumaCharacterRef,
-    LumaConceptChain,
-    LumaGeneration,
-    LumaGenerationRequest,
-    LumaImageGenerationRequest,
-    LumaImageIdentity,
    LumaImageModel,
-    LumaImageReference,
-    LumaIO,
-    LumaKeyframes,
+    LumaVideoModel,
+    LumaVideoOutputResolution,
+    LumaVideoModelOutputDuration,
+    LumaAspectRatio,
+    LumaState,
+    LumaImageGenerationRequest,
+    LumaGenerationRequest,
+    LumaGeneration,
+    LumaCharacterRef,
    LumaModifyImageRef,
+    LumaImageIdentity,
    LumaReference,
    LumaReferenceChain,
-    LumaVideoModel,
-    LumaVideoModelOutputDuration,
-    LumaVideoOutputResolution,
+    LumaImageReference,
+    LumaKeyframes,
+    LumaConceptChain,
+    LumaIO,
    get_luma_concepts,
 )
-from comfy_api_nodes.util import (
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    download_url_to_image_tensor,
-    download_url_to_video_output,
-    poll_op,
-    sync_op,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import (
    upload_images_to_comfyapi,
+    process_image_response,
    validate_string,
 )
+from server import PromptServer
+
+import aiohttp
+import torch
+from io import BytesIO

 LUMA_T2V_AVERAGE_DURATION = 105
 LUMA_I2V_AVERAGE_DURATION = 100

+def image_result_url_extractor(response: LumaGeneration):
+    return response.assets.image if hasattr(response, "assets") and hasattr(response.assets, "image") else None
+
+def video_result_url_extractor(response: LumaGeneration):
+    return response.assets.video if hasattr(response, "assets") and hasattr(response.assets, "video") else None

 class LumaReferenceNode(IO.ComfyNode):
+    """
+    Holds an image and weight for use with Luma Generate Image node.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="LumaReferenceNode",
            display_name="Luma Reference",
            category="api node/image/Luma",
-            description="Holds an image and weight for use with Luma Generate Image node.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.Image.Input(
                    "image",
@@ -65,10 +83,17 @@ class LumaReferenceNode(IO.ComfyNode):
                ),
            ],
            outputs=[IO.Custom(LumaIO.LUMA_REF).Output(display_name="luma_ref")],
+            hidden=[
+                IO.Hidden.auth_token_comfy_org,
+                IO.Hidden.api_key_comfy_org,
+                IO.Hidden.unique_id,
+            ],
        )

    @classmethod
-    def execute(cls, image: torch.Tensor, weight: float, luma_ref: LumaReferenceChain = None) -> IO.NodeOutput:
+    def execute(
+        cls, image: torch.Tensor, weight: float, luma_ref: LumaReferenceChain = None
+    ) -> IO.NodeOutput:
        if luma_ref is not None:
            luma_ref = luma_ref.clone()
        else:
@@ -78,13 +103,17 @@ class LumaReferenceNode(IO.ComfyNode):


 class LumaConceptsNode(IO.ComfyNode):
+    """
+    Holds one or more Camera Concepts for use with Luma Text to Video and Luma Image to Video nodes.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="LumaConceptsNode",
            display_name="Luma Concepts",
            category="api node/video/Luma",
-            description="Camera Concepts for use with Luma Text to Video and Luma Image to Video nodes.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.Combo.Input(
                    "concept1",
@@ -109,6 +138,11 @@ class LumaConceptsNode(IO.ComfyNode):
                ),
            ],
            outputs=[IO.Custom(LumaIO.LUMA_CONCEPTS).Output(display_name="luma_concepts")],
+            hidden=[
+                IO.Hidden.auth_token_comfy_org,
+                IO.Hidden.api_key_comfy_org,
+                IO.Hidden.unique_id,
+            ],
        )

    @classmethod
@@ -127,13 +161,17 @@ class LumaConceptsNode(IO.ComfyNode):


 class LumaImageGenerationNode(IO.ComfyNode):
+    """
+    Generates images synchronously based on prompt and aspect ratio.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="LumaImageNode",
            display_name="Luma Text to Image",
            category="api node/image/Luma",
-            description="Generates images synchronously based on prompt and aspect ratio.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -199,30 +237,45 @@ class LumaImageGenerationNode(IO.ComfyNode):
        aspect_ratio: str,
        seed,
        style_image_weight: float,
-        image_luma_ref: Optional[LumaReferenceChain] = None,
-        style_image: Optional[torch.Tensor] = None,
-        character_image: Optional[torch.Tensor] = None,
+        image_luma_ref: LumaReferenceChain = None,
+        style_image: torch.Tensor = None,
+        character_image: torch.Tensor = None,
    ) -> IO.NodeOutput:
        validate_string(prompt, strip_whitespace=True, min_length=3)
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        # handle image_luma_ref
        api_image_ref = None
        if image_luma_ref is not None:
-            api_image_ref = await cls._convert_luma_refs(image_luma_ref, max_refs=4)
+            api_image_ref = await cls._convert_luma_refs(
+                image_luma_ref, max_refs=4, auth_kwargs=auth_kwargs,
+            )
        # handle style_luma_ref
        api_style_ref = None
        if style_image is not None:
-            api_style_ref = await cls._convert_style_image(style_image, weight=style_image_weight)
+            api_style_ref = await cls._convert_style_image(
+                style_image, weight=style_image_weight, auth_kwargs=auth_kwargs,
+            )
        # handle character_ref images
        character_ref = None
        if character_image is not None:
-            download_urls = await upload_images_to_comfyapi(cls, character_image, max_images=4)
-            character_ref = LumaCharacterRef(identity0=LumaImageIdentity(images=download_urls))
+            download_urls = await upload_images_to_comfyapi(
+                character_image, max_images=4, auth_kwargs=auth_kwargs,
+            )
+            character_ref = LumaCharacterRef(
+                identity0=LumaImageIdentity(images=download_urls)
+            )

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/luma/generations/image", method="POST"),
-            response_model=LumaGeneration,
-            data=LumaImageGenerationRequest(
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/luma/generations/image",
+                method=HttpMethod.POST,
+                request_model=LumaImageGenerationRequest,
+                response_model=LumaGeneration,
+            ),
+            request=LumaImageGenerationRequest(
                prompt=prompt,
                model=model,
                aspect_ratio=aspect_ratio,
@@ -230,21 +283,41 @@ class LumaImageGenerationNode(IO.ComfyNode):
                style_ref=api_style_ref,
                character_ref=character_ref,
            ),
+            auth_kwargs=auth_kwargs,
        )
-        response_poll = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/luma/generations/{response_api.id}"),
-            response_model=LumaGeneration,
+        response_api: LumaGeneration = await operation.execute()
+
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/luma/generations/{response_api.id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=LumaGeneration,
+            ),
+            completed_statuses=[LumaState.completed],
+            failed_statuses=[LumaState.failed],
            status_extractor=lambda x: x.state,
+            result_url_extractor=image_result_url_extractor,
+            node_id=cls.hidden.unique_id,
+            auth_kwargs=auth_kwargs,
        )
-        return IO.NodeOutput(await download_url_to_image_tensor(response_poll.assets.image))
+        response_poll = await operation.execute()
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
+        return IO.NodeOutput(img)

    @classmethod
-    async def _convert_luma_refs(cls, luma_ref: LumaReferenceChain, max_refs: int):
+    async def _convert_luma_refs(
+        cls, luma_ref: LumaReferenceChain, max_refs: int, auth_kwargs: Optional[dict[str, str]] = None
+    ):
        luma_urls = []
        ref_count = 0
        for ref in luma_ref.refs:
-            download_urls = await upload_images_to_comfyapi(cls, ref.image, max_images=1)
+            download_urls = await upload_images_to_comfyapi(
+                ref.image, max_images=1, auth_kwargs=auth_kwargs
+            )
            luma_urls.append(download_urls[0])
            ref_count += 1
            if ref_count >= max_refs:
@@ -252,19 +325,27 @@ class LumaImageGenerationNode(IO.ComfyNode):
        return luma_ref.create_api_model(download_urls=luma_urls, max_refs=max_refs)

    @classmethod
-    async def _convert_style_image(cls, style_image: torch.Tensor, weight: float):
-        chain = LumaReferenceChain(first_ref=LumaReference(image=style_image, weight=weight))
-        return await cls._convert_luma_refs(chain, max_refs=1)
+    async def _convert_style_image(
+        cls, style_image: torch.Tensor, weight: float, auth_kwargs: Optional[dict[str, str]] = None
+    ):
+        chain = LumaReferenceChain(
+            first_ref=LumaReference(image=style_image, weight=weight)
+        )
+        return await cls._convert_luma_refs(chain, max_refs=1, auth_kwargs=auth_kwargs)


 class LumaImageModifyNode(IO.ComfyNode):
+    """
+    Modifies images synchronously based on prompt and aspect ratio.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="LumaImageModifyNode",
            display_name="Luma Image to Image",
            category="api node/image/Luma",
-            description="Modifies images synchronously based on prompt and aspect ratio.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.Image.Input(
                    "image",
@@ -314,37 +395,68 @@ class LumaImageModifyNode(IO.ComfyNode):
        image_weight: float,
        seed,
    ) -> IO.NodeOutput:
-        download_urls = await upload_images_to_comfyapi(cls, image, max_images=1)
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        # first, upload image
+        download_urls = await upload_images_to_comfyapi(
+            image, max_images=1, auth_kwargs=auth_kwargs,
+        )
        image_url = download_urls[0]
-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/luma/generations/image", method="POST"),
-            response_model=LumaGeneration,
-            data=LumaImageGenerationRequest(
+        # next, make Luma call with download url provided
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/luma/generations/image",
+                method=HttpMethod.POST,
+                request_model=LumaImageGenerationRequest,
+                response_model=LumaGeneration,
+            ),
+            request=LumaImageGenerationRequest(
                prompt=prompt,
                model=model,
                modify_image_ref=LumaModifyImageRef(
-                    url=image_url, weight=round(max(min(1.0 - image_weight, 0.98), 0.0), 2)
+                    url=image_url, weight=round(max(min(1.0-image_weight, 0.98), 0.0), 2)
                ),
            ),
+            auth_kwargs=auth_kwargs,
        )
-        response_poll = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/luma/generations/{response_api.id}"),
-            response_model=LumaGeneration,
+        response_api: LumaGeneration = await operation.execute()
+
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/luma/generations/{response_api.id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=LumaGeneration,
+            ),
+            completed_statuses=[LumaState.completed],
+            failed_statuses=[LumaState.failed],
            status_extractor=lambda x: x.state,
+            result_url_extractor=image_result_url_extractor,
+            node_id=cls.hidden.unique_id,
+            auth_kwargs=auth_kwargs,
        )
-        return IO.NodeOutput(await download_url_to_image_tensor(response_poll.assets.image))
+        response_poll = await operation.execute()
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.image) as img_response:
+                img = process_image_response(await img_response.content.read())
+        return IO.NodeOutput(img)


 class LumaTextToVideoGenerationNode(IO.ComfyNode):
+    """
+    Generates videos synchronously based on prompt and output_size.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="LumaVideoNode",
            display_name="Luma Text to Video",
            category="api node/video/Luma",
-            description="Generates videos synchronously based on prompt and output_size.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -386,7 +498,7 @@ class LumaTextToVideoGenerationNode(IO.ComfyNode):
                    "luma_concepts",
                    tooltip="Optional Camera Concepts to dictate camera motion via the Luma Concepts node.",
                    optional=True,
-                ),
+                )
            ],
            outputs=[IO.Video.Output()],
            hidden=[
@@ -407,17 +519,24 @@ class LumaTextToVideoGenerationNode(IO.ComfyNode):
        duration: str,
        loop: bool,
        seed,
-        luma_concepts: Optional[LumaConceptChain] = None,
+        luma_concepts: LumaConceptChain = None,
    ) -> IO.NodeOutput:
        validate_string(prompt, strip_whitespace=False, min_length=3)
        duration = duration if model != LumaVideoModel.ray_1_6 else None
        resolution = resolution if model != LumaVideoModel.ray_1_6 else None

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/luma/generations", method="POST"),
-            response_model=LumaGeneration,
-            data=LumaGenerationRequest(
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/luma/generations",
+                method=HttpMethod.POST,
+                request_model=LumaGenerationRequest,
+                response_model=LumaGeneration,
+            ),
+            request=LumaGenerationRequest(
                prompt=prompt,
                model=model,
                resolution=resolution,
@@ -426,25 +545,47 @@ class LumaTextToVideoGenerationNode(IO.ComfyNode):
                loop=loop,
                concepts=luma_concepts.create_api_model() if luma_concepts else None,
            ),
+            auth_kwargs=auth_kwargs,
        )
-        response_poll = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/luma/generations/{response_api.id}"),
-            response_model=LumaGeneration,
+        response_api: LumaGeneration = await operation.execute()
+
+        if cls.hidden.unique_id:
+            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", cls.hidden.unique_id)
+
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/luma/generations/{response_api.id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=LumaGeneration,
+            ),
+            completed_statuses=[LumaState.completed],
+            failed_statuses=[LumaState.failed],
            status_extractor=lambda x: x.state,
+            result_url_extractor=video_result_url_extractor,
+            node_id=cls.hidden.unique_id,
            estimated_duration=LUMA_T2V_AVERAGE_DURATION,
+            auth_kwargs=auth_kwargs,
        )
-        return IO.NodeOutput(await download_url_to_video_output(response_poll.assets.video))
+        response_poll = await operation.execute()
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return IO.NodeOutput(VideoFromFile(BytesIO(await vid_response.content.read())))


 class LumaImageToVideoGenerationNode(IO.ComfyNode):
+    """
+    Generates videos synchronously based on prompt, input images, and output_size.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="LumaImageToVideoNode",
            display_name="Luma Image to Video",
            category="api node/video/Luma",
-            description="Generates videos synchronously based on prompt, input images, and output_size.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -496,7 +637,7 @@ class LumaImageToVideoGenerationNode(IO.ComfyNode):
                    "luma_concepts",
                    tooltip="Optional Camera Concepts to dictate camera motion via the Luma Concepts node.",
                    optional=True,
-                ),
+                )
            ],
            outputs=[IO.Video.Output()],
            hidden=[
@@ -521,15 +662,25 @@ class LumaImageToVideoGenerationNode(IO.ComfyNode):
        luma_concepts: LumaConceptChain = None,
    ) -> IO.NodeOutput:
        if first_image is None and last_image is None:
-            raise Exception("At least one of first_image and last_image requires an input.")
-        keyframes = await cls._convert_to_keyframes(first_image, last_image)
+            raise Exception(
+                "At least one of first_image and last_image requires an input."
+            )
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        keyframes = await cls._convert_to_keyframes(first_image, last_image, auth_kwargs=auth_kwargs)
        duration = duration if model != LumaVideoModel.ray_1_6 else None
        resolution = resolution if model != LumaVideoModel.ray_1_6 else None
-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/luma/generations", method="POST"),
-            response_model=LumaGeneration,
-            data=LumaGenerationRequest(
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/luma/generations",
+                method=HttpMethod.POST,
+                request_model=LumaGenerationRequest,
+                response_model=LumaGeneration,
+            ),
+            request=LumaGenerationRequest(
                prompt=prompt,
                model=model,
                aspect_ratio=LumaAspectRatio.ratio_16_9,  # ignored, but still needed by the API for some reason
@@ -539,31 +690,54 @@ class LumaImageToVideoGenerationNode(IO.ComfyNode):
                keyframes=keyframes,
                concepts=luma_concepts.create_api_model() if luma_concepts else None,
            ),
+            auth_kwargs=auth_kwargs,
        )
-        response_poll = await poll_op(
-            cls,
-            poll_endpoint=ApiEndpoint(path=f"/proxy/luma/generations/{response_api.id}"),
-            response_model=LumaGeneration,
+        response_api: LumaGeneration = await operation.execute()
+
+        if cls.hidden.unique_id:
+            PromptServer.instance.send_progress_text(f"Luma video generation started: {response_api.id}", cls.hidden.unique_id)
+
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/luma/generations/{response_api.id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=LumaGeneration,
+            ),
+            completed_statuses=[LumaState.completed],
+            failed_statuses=[LumaState.failed],
            status_extractor=lambda x: x.state,
+            result_url_extractor=video_result_url_extractor,
+            node_id=cls.hidden.unique_id,
            estimated_duration=LUMA_I2V_AVERAGE_DURATION,
+            auth_kwargs=auth_kwargs,
        )
-        return IO.NodeOutput(await download_url_to_video_output(response_poll.assets.video))
+        response_poll = await operation.execute()
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.assets.video) as vid_response:
+                return IO.NodeOutput(VideoFromFile(BytesIO(await vid_response.content.read())))

    @classmethod
    async def _convert_to_keyframes(
        cls,
        first_image: torch.Tensor = None,
        last_image: torch.Tensor = None,
+        auth_kwargs: Optional[dict[str, str]] = None,
    ):
        if first_image is None and last_image is None:
            return None
        frame0 = None
        frame1 = None
        if first_image is not None:
-            download_urls = await upload_images_to_comfyapi(cls, first_image, max_images=1)
+            download_urls = await upload_images_to_comfyapi(
+                first_image, max_images=1, auth_kwargs=auth_kwargs,
+            )
            frame0 = LumaImageReference(type="image", url=download_urls[0])
        if last_image is not None:
-            download_urls = await upload_images_to_comfyapi(cls, last_image, max_images=1)
+            download_urls = await upload_images_to_comfyapi(
+                last_image, max_images=1, auth_kwargs=auth_kwargs,
+            )
            frame1 = LumaImageReference(type="image", url=download_urls[0])
        return LumaKeyframes(frame0=frame0, frame1=frame1)

--- a/comfy_api_nodes/nodes_minimax.py
+++ b/comfy_api_nodes/nodes_minimax.py
@@ -1,57 +1,71 @@
+from inspect import cleandoc
 from typing import Optional
-
+import logging
 import torch
-from typing_extensions import override

-from comfy_api.latest import IO, ComfyExtension
-from comfy_api_nodes.apis.minimax_api import (
-    MinimaxFileRetrieveResponse,
-    MiniMaxModel,
-    MinimaxTaskResultResponse,
+from typing_extensions import override
+from comfy_api.latest import ComfyExtension, IO
+from comfy_api.input_impl.video_types import VideoFromFile
+from comfy_api_nodes.apis import (
    MinimaxVideoGenerationRequest,
    MinimaxVideoGenerationResponse,
+    MinimaxFileRetrieveResponse,
+    MinimaxTaskResultResponse,
    SubjectReferenceItem,
+    MiniMaxModel,
 )
-from comfy_api_nodes.util import (
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    download_url_to_video_output,
-    poll_op,
-    sync_op,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import (
+    download_url_to_bytesio,
    upload_images_to_comfyapi,
    validate_string,
 )
+from server import PromptServer
+

 I2V_AVERAGE_DURATION = 114
 T2V_AVERAGE_DURATION = 234


 async def _generate_mm_video(
-    cls: type[IO.ComfyNode],
    *,
+    auth: dict[str, str],
+    node_id: str,
    prompt_text: str,
    seed: int,
    model: str,
-    image: Optional[torch.Tensor] = None,  # used for ImageToVideo
-    subject: Optional[torch.Tensor] = None,  # used for SubjectToVideo
+    image: Optional[torch.Tensor] = None,   # used for ImageToVideo
+    subject: Optional[torch.Tensor] = None, # used for SubjectToVideo
    average_duration: Optional[int] = None,
 ) -> IO.NodeOutput:
    if image is None:
        validate_string(prompt_text, field_name="prompt_text")
+    # upload image, if passed in
    image_url = None
    if image is not None:
-        image_url = (await upload_images_to_comfyapi(cls, image, max_images=1))[0]
+        image_url = (await upload_images_to_comfyapi(image, max_images=1, auth_kwargs=auth))[0]

    # TODO: figure out how to deal with subject properly, API returns invalid params when using S2V-01 model
    subject_reference = None
    if subject is not None:
-        subject_url = (await upload_images_to_comfyapi(cls, subject, max_images=1))[0]
+        subject_url = (await upload_images_to_comfyapi(subject, max_images=1, auth_kwargs=auth))[0]
        subject_reference = [SubjectReferenceItem(image=subject_url)]

-    response = await sync_op(
-        cls,
-        ApiEndpoint(path="/proxy/minimax/video_generation", method="POST"),
-        response_model=MinimaxVideoGenerationResponse,
-        data=MinimaxVideoGenerationRequest(
+
+    video_generate_operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path="/proxy/minimax/video_generation",
+            method=HttpMethod.POST,
+            request_model=MinimaxVideoGenerationRequest,
+            response_model=MinimaxVideoGenerationResponse,
+        ),
+        request=MinimaxVideoGenerationRequest(
            model=MiniMaxModel(model),
            prompt=prompt_text,
            callback_url=None,
@@ -59,50 +73,81 @@ async def _generate_mm_video(
            subject_reference=subject_reference,
            prompt_optimizer=None,
        ),
+        auth_kwargs=auth,
    )
+    response = await video_generate_operation.execute()

    task_id = response.task_id
    if not task_id:
        raise Exception(f"MiniMax generation failed: {response.base_resp}")

-    task_result = await poll_op(
-        cls,
-        ApiEndpoint(path="/proxy/minimax/query/video_generation", query_params={"task_id": task_id}),
-        response_model=MinimaxTaskResultResponse,
+    video_generate_operation = PollingOperation(
+        poll_endpoint=ApiEndpoint(
+            path="/proxy/minimax/query/video_generation",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=MinimaxTaskResultResponse,
+            query_params={"task_id": task_id},
+        ),
+        completed_statuses=["Success"],
+        failed_statuses=["Fail"],
        status_extractor=lambda x: x.status.value,
        estimated_duration=average_duration,
+        node_id=node_id,
+        auth_kwargs=auth,
    )
+    task_result = await video_generate_operation.execute()

    file_id = task_result.file_id
    if file_id is None:
        raise Exception("Request was not successful. Missing file ID.")
-    file_result = await sync_op(
-        cls,
-        ApiEndpoint(path="/proxy/minimax/files/retrieve", query_params={"file_id": int(file_id)}),
-        response_model=MinimaxFileRetrieveResponse,
+    file_retrieve_operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path="/proxy/minimax/files/retrieve",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=MinimaxFileRetrieveResponse,
+            query_params={"file_id": int(file_id)},
+        ),
+        request=EmptyRequest(),
+        auth_kwargs=auth,
    )
+    file_result = await file_retrieve_operation.execute()

    file_url = file_result.file.download_url
    if file_url is None:
-        raise Exception(f"No video was found in the response. Full response: {file_result.model_dump()}")
-    if file_result.file.backup_download_url:
-        try:
-            return IO.NodeOutput(await download_url_to_video_output(file_url, timeout=10, max_retries=2))
-        except Exception:  # if we have a second URL to retrieve the result, try again using that one
-            return IO.NodeOutput(
-                await download_url_to_video_output(file_result.file.backup_download_url, max_retries=3)
-            )
-    return IO.NodeOutput(await download_url_to_video_output(file_url))
+        raise Exception(
+            f"No video was found in the response. Full response: {file_result.model_dump()}"
+        )
+    logging.info("Generated video URL: %s", file_url)
+    if node_id:
+        if hasattr(file_result.file, "backup_download_url"):
+            message = f"Result URL: {file_url}\nBackup URL: {file_result.file.backup_download_url}"
+        else:
+            message = f"Result URL: {file_url}"
+        PromptServer.instance.send_progress_text(message, node_id)
+
+    # Download and return as VideoFromFile
+    video_io = await download_url_to_bytesio(file_url)
+    if video_io is None:
+        error_msg = f"Failed to download video from {file_url}"
+        logging.error(error_msg)
+        raise Exception(error_msg)
+    return IO.NodeOutput(VideoFromFile(video_io))


 class MinimaxTextToVideoNode(IO.ComfyNode):
+    """
+    Generates videos synchronously based on a prompt and optional parameters, using MiniMax's API.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="MinimaxTextToVideoNode",
            display_name="MiniMax Text to Video",
            category="api node/video/MiniMax",
-            description="Generates videos synchronously based on a prompt, and optional parameters.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.String.Input(
                    "prompt_text",
@@ -144,7 +189,11 @@ class MinimaxTextToVideoNode(IO.ComfyNode):
        seed: int = 0,
    ) -> IO.NodeOutput:
        return await _generate_mm_video(
-            cls,
+            auth={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            prompt_text=prompt_text,
            seed=seed,
            model=model,
@@ -155,13 +204,17 @@ class MinimaxTextToVideoNode(IO.ComfyNode):


 class MinimaxImageToVideoNode(IO.ComfyNode):
+    """
+    Generates videos synchronously based on an image, a prompt, and optional parameters, using MiniMax's API.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="MinimaxImageToVideoNode",
            display_name="MiniMax Image to Video",
            category="api node/video/MiniMax",
-            description="Generates videos synchronously based on an image and prompt, and optional parameters.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.Image.Input(
                    "image",
@@ -208,7 +261,11 @@ class MinimaxImageToVideoNode(IO.ComfyNode):
        seed: int = 0,
    ) -> IO.NodeOutput:
        return await _generate_mm_video(
-            cls,
+            auth={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            prompt_text=prompt_text,
            seed=seed,
            model=model,
@@ -219,13 +276,17 @@ class MinimaxImageToVideoNode(IO.ComfyNode):


 class MinimaxSubjectToVideoNode(IO.ComfyNode):
+    """
+    Generates videos synchronously based on an image, a prompt, and optional parameters, using MiniMax's API.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="MinimaxSubjectToVideoNode",
            display_name="MiniMax Subject to Video",
            category="api node/video/MiniMax",
-            description="Generates videos synchronously based on an image and prompt, and optional parameters.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.Image.Input(
                    "subject",
@@ -272,7 +333,11 @@ class MinimaxSubjectToVideoNode(IO.ComfyNode):
        seed: int = 0,
    ) -> IO.NodeOutput:
        return await _generate_mm_video(
-            cls,
+            auth={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            node_id=cls.hidden.unique_id,
            prompt_text=prompt_text,
            seed=seed,
            model=model,
@@ -283,13 +348,15 @@ class MinimaxSubjectToVideoNode(IO.ComfyNode):


 class MinimaxHailuoVideoNode(IO.ComfyNode):
+    """Generates videos from a prompt, with an optional start frame, using the new MiniMax Hailuo-02 model."""
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="MinimaxHailuoVideoNode",
            display_name="MiniMax Hailuo Video",
            category="api node/video/MiniMax",
-            description="Generates videos from prompt, with optional start frame using the new MiniMax Hailuo-02 model.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.String.Input(
                    "prompt_text",
@@ -353,6 +420,10 @@ class MinimaxHailuoVideoNode(IO.ComfyNode):
        resolution: str = "768P",
        model: str = "MiniMax-Hailuo-02",
    ) -> IO.NodeOutput:
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        if first_frame_image is None:
            validate_string(prompt_text, field_name="prompt_text")

@@ -364,13 +435,16 @@ class MinimaxHailuoVideoNode(IO.ComfyNode):
        # upload image, if passed in
        image_url = None
        if first_frame_image is not None:
-            image_url = (await upload_images_to_comfyapi(cls, first_frame_image, max_images=1))[0]
+            image_url = (await upload_images_to_comfyapi(first_frame_image, max_images=1, auth_kwargs=auth))[0]

-        response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/minimax/video_generation", method="POST"),
-            response_model=MinimaxVideoGenerationResponse,
-            data=MinimaxVideoGenerationRequest(
+        video_generate_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/minimax/video_generation",
+                method=HttpMethod.POST,
+                request_model=MinimaxVideoGenerationRequest,
+                response_model=MinimaxVideoGenerationResponse,
+            ),
+            request=MinimaxVideoGenerationRequest(
                model=MiniMaxModel(model),
                prompt=prompt_text,
                callback_url=None,
@@ -379,42 +453,67 @@ class MinimaxHailuoVideoNode(IO.ComfyNode):
                duration=duration,
                resolution=resolution,
            ),
+            auth_kwargs=auth,
        )
+        response = await video_generate_operation.execute()

        task_id = response.task_id
        if not task_id:
            raise Exception(f"MiniMax generation failed: {response.base_resp}")

        average_duration = 120 if resolution == "768P" else 240
-        task_result = await poll_op(
-            cls,
-            ApiEndpoint(path="/proxy/minimax/query/video_generation", query_params={"task_id": task_id}),
-            response_model=MinimaxTaskResultResponse,
+        video_generate_operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path="/proxy/minimax/query/video_generation",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=MinimaxTaskResultResponse,
+                query_params={"task_id": task_id},
+            ),
+            completed_statuses=["Success"],
+            failed_statuses=["Fail"],
            status_extractor=lambda x: x.status.value,
            estimated_duration=average_duration,
+            node_id=cls.hidden.unique_id,
+            auth_kwargs=auth,
        )
+        task_result = await video_generate_operation.execute()

        file_id = task_result.file_id
        if file_id is None:
            raise Exception("Request was not successful. Missing file ID.")
-        file_result = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/minimax/files/retrieve", query_params={"file_id": int(file_id)}),
-            response_model=MinimaxFileRetrieveResponse,
+        file_retrieve_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/minimax/files/retrieve",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=MinimaxFileRetrieveResponse,
+                query_params={"file_id": int(file_id)},
+            ),
+            request=EmptyRequest(),
+            auth_kwargs=auth,
        )
+        file_result = await file_retrieve_operation.execute()

        file_url = file_result.file.download_url
        if file_url is None:
-            raise Exception(f"No video was found in the response. Full response: {file_result.model_dump()}")
+            raise Exception(
+                f"No video was found in the response. Full response: {file_result.model_dump()}"
+            )
+        logging.info("Generated video URL: %s", file_url)
+        if cls.hidden.unique_id:
+            if hasattr(file_result.file, "backup_download_url"):
+                message = f"Result URL: {file_url}\nBackup URL: {file_result.file.backup_download_url}"
+            else:
+                message = f"Result URL: {file_url}"
+            PromptServer.instance.send_progress_text(message, cls.hidden.unique_id)

-        if file_result.file.backup_download_url:
-            try:
-                return IO.NodeOutput(await download_url_to_video_output(file_url, timeout=10, max_retries=2))
-            except Exception:  # if we have a second URL to retrieve the result, try again using that one
-                return IO.NodeOutput(
-                    await download_url_to_video_output(file_result.file.backup_download_url, max_retries=3)
-                )
-        return IO.NodeOutput(await download_url_to_video_output(file_url))
+        video_io = await download_url_to_bytesio(file_url)
+        if video_io is None:
+            error_msg = f"Failed to download video from {file_url}"
+            logging.error(error_msg)
+            raise Exception(error_msg)
+        return IO.NodeOutput(VideoFromFile(video_io))


 class MinimaxExtension(ComfyExtension):
--- a/comfy_api_nodes/nodes_moonvalley.py
+++ b/comfy_api_nodes/nodes_moonvalley.py
@@ -1,31 +1,35 @@
 import logging
-from typing import Optional
-
+from typing import Any, Callable, Optional, TypeVar
 import torch
 from typing_extensions import override
+from comfy_api_nodes.util.validation_utils import validate_image_dimensions

-from comfy_api.input import VideoInput
-from comfy_api.latest import IO, ComfyExtension
 from comfy_api_nodes.apis import (
-    MoonvalleyPromptResponse,
-    MoonvalleyTextToVideoInferenceParams,
    MoonvalleyTextToVideoRequest,
+    MoonvalleyTextToVideoInferenceParams,
    MoonvalleyVideoToVideoInferenceParams,
    MoonvalleyVideoToVideoRequest,
+    MoonvalleyPromptResponse,
 )
-from comfy_api_nodes.util import (
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import (
    download_url_to_video_output,
-    poll_op,
-    sync_op,
-    trim_video,
    upload_images_to_comfyapi,
    upload_video_to_comfyapi,
    validate_container_format_is_mp4,
-    validate_image_dimensions,
-    validate_string,
 )

+from comfy_api.input import VideoInput
+from comfy_api.latest import ComfyExtension, InputImpl, IO
+import av
+import io
+
 API_UPLOADS_ENDPOINT = "/proxy/moonvalley/uploads"
 API_PROMPTS_ENDPOINT = "/proxy/moonvalley/prompts"
 API_VIDEO2VIDEO_ENDPOINT = "/proxy/moonvalley/prompts/video-to-video"
@@ -47,6 +51,13 @@ MAX_VID_HEIGHT = 10000
 MAX_VIDEO_SIZE = 1024 * 1024 * 1024  # 1 GB max for in-memory video processing

 MOONVALLEY_MAREY_MAX_PROMPT_LENGTH = 5000
+R = TypeVar("R")
+
+
+class MoonvalleyApiError(Exception):
+    """Base exception for Moonvalley API errors."""
+
+    pass


 def is_valid_task_creation_response(response: MoonvalleyPromptResponse) -> bool:
@@ -58,7 +69,64 @@ def validate_task_creation_response(response) -> None:
    if not is_valid_task_creation_response(response):
        error_msg = f"Moonvalley Marey API: Initial request failed. Code: {response.code}, Message: {response.message}, Data: {response}"
        logging.error(error_msg)
-        raise RuntimeError(error_msg)
+        raise MoonvalleyApiError(error_msg)
+
+
+def get_video_from_response(response):
+    video = response.output_url
+    logging.info(
+        "Moonvalley Marey API: Task %s succeeded. Video URL: %s", response.id, video
+    )
+    return video
+
+
+def get_video_url_from_response(response) -> Optional[str]:
+    """Returns the first video url from the Moonvalley video generation task result.
+    Will not raise an error if the response is not valid.
+    """
+    if response:
+        return str(get_video_from_response(response))
+    else:
+        return None
+
+
+async def poll_until_finished(
+    auth_kwargs: dict[str, str],
+    api_endpoint: ApiEndpoint[Any, R],
+    result_url_extractor: Optional[Callable[[R], str]] = None,
+    node_id: Optional[str] = None,
+) -> R:
+    """Polls the Moonvalley API endpoint until the task reaches a terminal state, then returns the response."""
+    return await PollingOperation(
+        poll_endpoint=api_endpoint,
+        completed_statuses=[
+            "completed",
+        ],
+        max_poll_attempts=240,  # 64 minutes with 16s interval
+        poll_interval=16.0,
+        failed_statuses=["error"],
+        status_extractor=lambda response: (
+            response.status if response and response.status else None
+        ),
+        auth_kwargs=auth_kwargs,
+        result_url_extractor=result_url_extractor,
+        node_id=node_id,
+    ).execute()
+
+
+def validate_prompts(
+    prompt: str, negative_prompt: str, max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH
+):
+    """Verifies that the prompt isn't empty and that neither prompt is too long."""
+    if not prompt:
+        raise ValueError("Positive prompt is empty")
+    if len(prompt) > max_length:
+        raise ValueError(f"Positive prompt is too long: {len(prompt)} characters")
+    if negative_prompt and len(negative_prompt) > max_length:
+        raise ValueError(
+            f"Negative prompt is too long: {len(negative_prompt)} characters"
+        )
+    return True


 def validate_video_to_video_input(video: VideoInput) -> VideoInput:
@@ -102,8 +170,12 @@ def _validate_video_dimensions(width: int, height: int) -> None:
    }

    if (width, height) not in supported_resolutions:
-        supported_list = ", ".join([f"{w}x{h}" for w, h in sorted(supported_resolutions)])
-        raise ValueError(f"Resolution {width}x{height} not supported. Supported: {supported_list}")
+        supported_list = ", ".join(
+            [f"{w}x{h}" for w, h in sorted(supported_resolutions)]
+        )
+        raise ValueError(
+            f"Resolution {width}x{height} not supported. Supported: {supported_list}"
+        )


 def _validate_and_trim_duration(video: VideoInput) -> VideoInput:
@@ -116,7 +188,7 @@ def _validate_and_trim_duration(video: VideoInput) -> VideoInput:
 def _validate_minimum_duration(duration: float) -> None:
    """Ensures video is at least 5 seconds long."""
    if duration < 5:
-        raise ValueError("Input video must be at least 5 seconds long.")
+        raise MoonvalleyApiError("Input video must be at least 5 seconds long.")


 def _trim_if_too_long(video: VideoInput, duration: float) -> VideoInput:
@@ -126,6 +198,123 @@ def _trim_if_too_long(video: VideoInput, duration: float) -> VideoInput:
    return video


+def trim_video(video: VideoInput, duration_sec: float) -> VideoInput:
+    """
+    Returns a new VideoInput object trimmed from the beginning to the specified duration,
+    using av to avoid loading entire video into memory.
+
+    Args:
+        video: Input video to trim
+        duration_sec: Duration in seconds to keep from the beginning
+
+    Returns:
+        VideoFromFile object that owns the output buffer
+    """
+    output_buffer = io.BytesIO()
+
+    input_container = None
+    output_container = None
+
+    try:
+        # Get the stream source - this avoids loading entire video into memory
+        # when the source is already a file path
+        input_source = video.get_stream_source()
+
+        # Open containers
+        input_container = av.open(input_source, mode="r")
+        output_container = av.open(output_buffer, mode="w", format="mp4")
+
+        # Set up output streams for re-encoding
+        video_stream = None
+        audio_stream = None
+
+        for stream in input_container.streams:
+            logging.info("Found stream: type=%s, class=%s", stream.type, type(stream))
+            if isinstance(stream, av.VideoStream):
+                # Create output video stream with same parameters
+                video_stream = output_container.add_stream(
+                    "h264", rate=stream.average_rate
+                )
+                video_stream.width = stream.width
+                video_stream.height = stream.height
+                video_stream.pix_fmt = "yuv420p"
+                logging.info(
+                    "Added video stream: %sx%s @ %sfps", stream.width, stream.height, stream.average_rate
+                )
+            elif isinstance(stream, av.AudioStream):
+                # Create output audio stream with same parameters
+                audio_stream = output_container.add_stream(
+                    "aac", rate=stream.sample_rate
+                )
+                audio_stream.sample_rate = stream.sample_rate
+                audio_stream.layout = stream.layout
+                logging.info("Added audio stream: %sHz, %s channels", stream.sample_rate, stream.channels)
+
+        # Calculate target frame count that's divisible by 16
+        fps = input_container.streams.video[0].average_rate
+        estimated_frames = int(duration_sec * fps)
+        target_frames = (
+            estimated_frames // 16
+        ) * 16  # Round down to nearest multiple of 16
+
+        if target_frames == 0:
+            raise ValueError("Video too short: need at least 16 frames for Moonvalley")
+
+        frame_count = 0
+        audio_frame_count = 0
+
+        # Decode and re-encode video frames
+        if video_stream:
+            for frame in input_container.decode(video=0):
+                if frame_count >= target_frames:
+                    break
+
+                # Re-encode frame
+                for packet in video_stream.encode(frame):
+                    output_container.mux(packet)
+                frame_count += 1
+
+            # Flush encoder
+            for packet in video_stream.encode():
+                output_container.mux(packet)
+
+            logging.info("Encoded %s video frames (target: %s)", frame_count, target_frames)
+
+        # Decode and re-encode audio frames
+        if audio_stream:
+            input_container.seek(0)  # Reset to beginning for audio
+            for frame in input_container.decode(audio=0):
+                if frame.time >= duration_sec:
+                    break
+
+                # Re-encode frame
+                for packet in audio_stream.encode(frame):
+                    output_container.mux(packet)
+                audio_frame_count += 1
+
+            # Flush encoder
+            for packet in audio_stream.encode():
+                output_container.mux(packet)
+
+            logging.info("Encoded %s audio frames", audio_frame_count)
+
+        # Close containers
+        output_container.close()
+        input_container.close()
+
+        # Return as VideoFromFile using the buffer
+        output_buffer.seek(0)
+        return InputImpl.VideoFromFile(output_buffer)
+
+    except Exception as e:
+        # Clean up on error
+        if input_container is not None:
+            input_container.close()
+        if output_container is not None:
+            output_container.close()
+        raise RuntimeError(f"Failed to trim video: {str(e)}") from e
+
+
 def parse_width_height_from_res(resolution: str):
    # Accepts a string like "16:9 (1920 x 1080)" and returns width, height as a dict
    res_map = {
@@ -149,14 +338,19 @@ def parse_control_parameter(value):
    return control_map.get(value, control_map["Motion Transfer"])


-async def get_response(cls: type[IO.ComfyNode], task_id: str) -> MoonvalleyPromptResponse:
-    return await poll_op(
-        cls,
-        ApiEndpoint(path=f"{API_PROMPTS_ENDPOINT}/{task_id}"),
-        response_model=MoonvalleyPromptResponse,
-        status_extractor=lambda r: (r.status if r and r.status else None),
-        poll_interval=16.0,
-        max_poll_attempts=240,
+async def get_response(
+    task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None
+) -> MoonvalleyPromptResponse:
+    return await poll_until_finished(
+        auth_kwargs,
+        ApiEndpoint(
+            path=f"{API_PROMPTS_ENDPOINT}/{task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=MoonvalleyPromptResponse,
+        ),
+        result_url_extractor=get_video_url_from_response,
+        node_id=node_id,
    )


@@ -250,10 +444,14 @@ class MoonvalleyImg2VideoNode(IO.ComfyNode):
        steps: int,
    ) -> IO.NodeOutput:
        validate_image_dimensions(image, min_width=300, min_height=300, max_height=MAX_HEIGHT, max_width=MAX_WIDTH)
-        validate_string(prompt, min_length=1, max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
-        validate_string(negative_prompt, field_name="negative_prompt", max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
+        validate_prompts(prompt, negative_prompt, MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
        width_height = parse_width_height_from_res(resolution)

+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
        inference_params = MoonvalleyTextToVideoInferenceParams(
            negative_prompt=negative_prompt,
            steps=steps,
@@ -266,17 +464,33 @@ class MoonvalleyImg2VideoNode(IO.ComfyNode):

        # Get MIME type from tensor - assuming PNG format for image tensors
        mime_type = "image/png"
-        image_url = (await upload_images_to_comfyapi(cls, image, max_images=1, mime_type=mime_type))[0]
-        task_creation_response = await sync_op(
-            cls,
-            endpoint=ApiEndpoint(path=API_IMG2VIDEO_ENDPOINT, method="POST"),
-            response_model=MoonvalleyPromptResponse,
-            data=MoonvalleyTextToVideoRequest(
-                image_url=image_url, prompt_text=prompt, inference_params=inference_params
-            ),
+
+        image_url = (
+            await upload_images_to_comfyapi(
+                image, max_images=1, auth_kwargs=auth, mime_type=mime_type
+            )
+        )[0]
+
+        request = MoonvalleyTextToVideoRequest(
+            image_url=image_url, prompt_text=prompt, inference_params=inference_params
        )
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=API_IMG2VIDEO_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=MoonvalleyTextToVideoRequest,
+                response_model=MoonvalleyPromptResponse,
+            ),
+            request=request,
+            auth_kwargs=auth,
+        )
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
-        final_response = await get_response(cls, task_creation_response.id)
+        task_id = task_creation_response.id
+
+        final_response = await get_response(
+            task_id, auth_kwargs=auth, node_id=cls.hidden.unique_id
+        )
        video = await download_url_to_video_output(final_response.output_url)
        return IO.NodeOutput(video)

@@ -368,10 +582,15 @@ class MoonvalleyVideo2VideoNode(IO.ComfyNode):
        steps=33,
        prompt_adherence=4.5,
    ) -> IO.NodeOutput:
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
        validated_video = validate_video_to_video_input(video)
-        video_url = await upload_video_to_comfyapi(cls, validated_video)
-        validate_string(prompt, min_length=1, max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
-        validate_string(negative_prompt, field_name="negative_prompt", max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
+        video_url = await upload_video_to_comfyapi(validated_video, auth_kwargs=auth)
+
+        validate_prompts(prompt, negative_prompt)

        # Only include motion_intensity for Motion Transfer
        control_params = {}
@@ -386,20 +605,35 @@ class MoonvalleyVideo2VideoNode(IO.ComfyNode):
            guidance_scale=prompt_adherence,
        )

-        task_creation_response = await sync_op(
-            cls,
-            endpoint=ApiEndpoint(path=API_VIDEO2VIDEO_ENDPOINT, method="POST"),
-            response_model=MoonvalleyPromptResponse,
-            data=MoonvalleyVideoToVideoRequest(
-                control_type=parse_control_parameter(control_type),
-                video_url=video_url,
-                prompt_text=prompt,
-                inference_params=inference_params,
-            ),
+        control = parse_control_parameter(control_type)
+
+        request = MoonvalleyVideoToVideoRequest(
+            control_type=control,
+            video_url=video_url,
+            prompt_text=prompt,
+            inference_params=inference_params,
        )
+
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=API_VIDEO2VIDEO_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=MoonvalleyVideoToVideoRequest,
+                response_model=MoonvalleyPromptResponse,
+            ),
+            request=request,
+            auth_kwargs=auth,
+        )
+        task_creation_response = await initial_operation.execute()
        validate_task_creation_response(task_creation_response)
-        final_response = await get_response(cls, task_creation_response.id)
-        return IO.NodeOutput(await download_url_to_video_output(final_response.output_url))
+        task_id = task_creation_response.id
+
+        final_response = await get_response(
+            task_id, auth_kwargs=auth, node_id=cls.hidden.unique_id
+        )
+
+        video = await download_url_to_video_output(final_response.output_url)
+        return IO.NodeOutput(video)


 class MoonvalleyTxt2VideoNode(IO.ComfyNode):
@@ -486,10 +720,14 @@ class MoonvalleyTxt2VideoNode(IO.ComfyNode):
        seed: int,
        steps: int,
    ) -> IO.NodeOutput:
-        validate_string(prompt, min_length=1, max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
-        validate_string(negative_prompt, field_name="negative_prompt", max_length=MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
+        validate_prompts(prompt, negative_prompt, MOONVALLEY_MAREY_MAX_PROMPT_LENGTH)
        width_height = parse_width_height_from_res(resolution)

+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
        inference_params = MoonvalleyTextToVideoInferenceParams(
            negative_prompt=negative_prompt,
            steps=steps,
@@ -499,16 +737,30 @@ class MoonvalleyTxt2VideoNode(IO.ComfyNode):
            width=width_height["width"],
            height=width_height["height"],
        )
-
-        task_creation_response = await sync_op(
-            cls,
-            endpoint=ApiEndpoint(path=API_TXT2VIDEO_ENDPOINT, method="POST"),
-            response_model=MoonvalleyPromptResponse,
-            data=MoonvalleyTextToVideoRequest(prompt_text=prompt, inference_params=inference_params),
+        request = MoonvalleyTextToVideoRequest(
+            prompt_text=prompt, inference_params=inference_params
        )
+
+        init_op = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=API_TXT2VIDEO_ENDPOINT,
+                method=HttpMethod.POST,
+                request_model=MoonvalleyTextToVideoRequest,
+                response_model=MoonvalleyPromptResponse,
+            ),
+            request=request,
+            auth_kwargs=auth,
+        )
+        task_creation_response = await init_op.execute()
        validate_task_creation_response(task_creation_response)
-        final_response = await get_response(cls, task_creation_response.id)
-        return IO.NodeOutput(await download_url_to_video_output(final_response.output_url))
+        task_id = task_creation_response.id
+
+        final_response = await get_response(
+            task_id, auth_kwargs=auth, node_id=cls.hidden.unique_id
+        )
+
+        video = await download_url_to_video_output(final_response.output_url)
+        return IO.NodeOutput(video)


 class MoonvalleyExtension(ComfyExtension):
--- a/comfy_api_nodes/nodes_openai.py
+++ b/comfy_api_nodes/nodes_openai.py
--- a/comfy_api_nodes/nodes_pika.py
+++ b/comfy_api_nodes/nodes_pika.py
@@ -7,23 +7,28 @@ from __future__ import annotations

 from io import BytesIO
 import logging
-from typing import Optional
+from typing import Optional, TypeVar

 import torch

 from typing_extensions import override
 from comfy_api.latest import ComfyExtension, IO
 from comfy_api.input_impl.video_types import VideoCodec, VideoContainer, VideoInput
-from comfy_api_nodes.apis import pika_api as pika_defs
-from comfy_api_nodes.util import (
-    validate_string,
+from comfy_api_nodes.apinode_utils import (
    download_url_to_video_output,
    tensor_to_bytesio,
+    validate_string,
+)
+from comfy_api_nodes.apis import pika_defs
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    sync_op,
-    poll_op,
+    EmptyRequest,
+    HttpMethod,
+    PollingOperation,
+    SynchronousOperation,
 )

+R = TypeVar("R")

 PATH_PIKADDITIONS = "/proxy/pika/generate/pikadditions"
 PATH_PIKASWAPS = "/proxy/pika/generate/pikaswaps"
@@ -39,18 +44,28 @@ PATH_VIDEO_GET = "/proxy/pika/videos"


 async def execute_task(
-    task_id: str,
-    cls: type[IO.ComfyNode],
+    initial_operation: SynchronousOperation[R, pika_defs.PikaGenerateResponse],
+    auth_kwargs: Optional[dict[str, str]] = None,
+    node_id: Optional[str] = None,
 ) -> IO.NodeOutput:
-    final_response: pika_defs.PikaVideoResponse = await poll_op(
-        cls,
-        ApiEndpoint(path=f"{PATH_VIDEO_GET}/{task_id}"),
-        response_model=pika_defs.PikaVideoResponse,
+    task_id = (await initial_operation.execute()).video_id
+    final_response: pika_defs.PikaVideoResponse = await PollingOperation(
+        poll_endpoint=ApiEndpoint(
+            path=f"{PATH_VIDEO_GET}/{task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=pika_defs.PikaVideoResponse,
+        ),
+        completed_statuses=["finished"],
+        failed_statuses=["failed", "cancelled"],
        status_extractor=lambda response: (response.status.value if response.status else None),
        progress_extractor=lambda response: (response.progress if hasattr(response, "progress") else None),
+        auth_kwargs=auth_kwargs,
+        result_url_extractor=lambda response: (response.url if hasattr(response, "url") else None),
+        node_id=node_id,
        estimated_duration=60,
        max_poll_attempts=240,
-    )
+    ).execute()
    if not final_response.url:
        error_msg = f"Pika task {task_id} succeeded but no video data found in response:\n{final_response}"
        logging.error(error_msg)
@@ -113,15 +128,23 @@ class PikaImageToVideo(IO.ComfyNode):
            resolution=resolution,
            duration=duration,
        )
-        initial_operation = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_IMAGE_TO_VIDEO, method="POST"),
-            response_model=pika_defs.PikaGenerateResponse,
-            data=pika_request_data,
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_IMAGE_TO_VIDEO,
+                method=HttpMethod.POST,
+                request_model=pika_defs.PikaBodyGenerate22I2vGenerate22I2vPost,
+                response_model=pika_defs.PikaGenerateResponse,
+            ),
+            request=pika_request_data,
            files=pika_files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
-        return await execute_task(initial_operation.video_id, cls)
+        return await execute_task(initial_operation, auth_kwargs=auth, node_id=cls.hidden.unique_id)


 class PikaTextToVideoNode(IO.ComfyNode):
@@ -164,11 +187,18 @@ class PikaTextToVideoNode(IO.ComfyNode):
        duration: int,
        aspect_ratio: float,
    ) -> IO.NodeOutput:
-        initial_operation = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_TEXT_TO_VIDEO, method="POST"),
-            response_model=pika_defs.PikaGenerateResponse,
-            data=pika_defs.PikaBodyGenerate22T2vGenerate22T2vPost(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_TEXT_TO_VIDEO,
+                method=HttpMethod.POST,
+                request_model=pika_defs.PikaBodyGenerate22T2vGenerate22T2vPost,
+                response_model=pika_defs.PikaGenerateResponse,
+            ),
+            request=pika_defs.PikaBodyGenerate22T2vGenerate22T2vPost(
                promptText=prompt_text,
                negativePrompt=negative_prompt,
                seed=seed,
@@ -176,9 +206,10 @@ class PikaTextToVideoNode(IO.ComfyNode):
                duration=duration,
                aspectRatio=aspect_ratio,
            ),
+            auth_kwargs=auth,
            content_type="application/x-www-form-urlencoded",
        )
-        return await execute_task(initial_operation.video_id, cls)
+        return await execute_task(initial_operation, auth_kwargs=auth, node_id=cls.hidden.unique_id)


 class PikaScenes(IO.ComfyNode):
@@ -282,16 +313,24 @@ class PikaScenes(IO.ComfyNode):
            duration=duration,
            aspectRatio=aspect_ratio,
        )
-        initial_operation = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_PIKASCENES, method="POST"),
-            response_model=pika_defs.PikaGenerateResponse,
-            data=pika_request_data,
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_PIKASCENES,
+                method=HttpMethod.POST,
+                request_model=pika_defs.PikaBodyGenerate22C2vGenerate22PikascenesPost,
+                response_model=pika_defs.PikaGenerateResponse,
+            ),
+            request=pika_request_data,
            files=pika_files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )

-        return await execute_task(initial_operation.video_id, cls)
+        return await execute_task(initial_operation, auth_kwargs=auth, node_id=cls.hidden.unique_id)


 class PikAdditionsNode(IO.ComfyNode):
@@ -348,16 +387,24 @@ class PikAdditionsNode(IO.ComfyNode):
            negativePrompt=negative_prompt,
            seed=seed,
        )
-        initial_operation = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_PIKADDITIONS, method="POST"),
-            response_model=pika_defs.PikaGenerateResponse,
-            data=pika_request_data,
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_PIKADDITIONS,
+                method=HttpMethod.POST,
+                request_model=pika_defs.PikaBodyGeneratePikadditionsGeneratePikadditionsPost,
+                response_model=pika_defs.PikaGenerateResponse,
+            ),
+            request=pika_request_data,
            files=pika_files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )

-        return await execute_task(initial_operation.video_id, cls)
+        return await execute_task(initial_operation, auth_kwargs=auth, node_id=cls.hidden.unique_id)


 class PikaSwapsNode(IO.ComfyNode):
@@ -429,15 +476,23 @@ class PikaSwapsNode(IO.ComfyNode):
            seed=seed,
            modifyRegionRoi=region_to_modify if region_to_modify else None,
        )
-        initial_operation = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_PIKASWAPS, method="POST"),
-            response_model=pika_defs.PikaGenerateResponse,
-            data=pika_request_data,
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_PIKASWAPS,
+                method=HttpMethod.POST,
+                request_model=pika_defs.PikaBodyGeneratePikaswapsGeneratePikaswapsPost,
+                response_model=pika_defs.PikaGenerateResponse,
+            ),
+            request=pika_request_data,
            files=pika_files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
-        return await execute_task(initial_operation.video_id, cls)
+        return await execute_task(initial_operation, auth_kwargs=auth, node_id=cls.hidden.unique_id)


 class PikaffectsNode(IO.ComfyNode):
@@ -477,11 +532,18 @@ class PikaffectsNode(IO.ComfyNode):
        negative_prompt: str,
        seed: int,
    ) -> IO.NodeOutput:
-        initial_operation = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_PIKAFFECTS, method="POST"),
-            response_model=pika_defs.PikaGenerateResponse,
-            data=pika_defs.PikaBodyGeneratePikaffectsGeneratePikaffectsPost(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_PIKAFFECTS,
+                method=HttpMethod.POST,
+                request_model=pika_defs.PikaBodyGeneratePikaffectsGeneratePikaffectsPost,
+                response_model=pika_defs.PikaGenerateResponse,
+            ),
+            request=pika_defs.PikaBodyGeneratePikaffectsGeneratePikaffectsPost(
                pikaffect=pikaffect,
                promptText=prompt_text,
                negativePrompt=negative_prompt,
@@ -489,8 +551,9 @@ class PikaffectsNode(IO.ComfyNode):
            ),
            files={"image": ("image.png", tensor_to_bytesio(image), "image/png")},
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
-        return await execute_task(initial_operation.video_id, cls)
+        return await execute_task(initial_operation, auth_kwargs=auth, node_id=cls.hidden.unique_id)


 class PikaStartEndFrameNode(IO.ComfyNode):
@@ -533,11 +596,18 @@ class PikaStartEndFrameNode(IO.ComfyNode):
            ("keyFrames", ("image_start.png", tensor_to_bytesio(image_start), "image/png")),
            ("keyFrames", ("image_end.png", tensor_to_bytesio(image_end), "image/png")),
        ]
-        initial_operation = await sync_op(
-            cls,
-            ApiEndpoint(path=PATH_PIKAFRAMES, method="POST"),
-            response_model=pika_defs.PikaGenerateResponse,
-            data=pika_defs.PikaBodyGenerate22KeyframeGenerate22PikaframesPost(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_PIKAFRAMES,
+                method=HttpMethod.POST,
+                request_model=pika_defs.PikaBodyGenerate22KeyframeGenerate22PikaframesPost,
+                response_model=pika_defs.PikaGenerateResponse,
+            ),
+            request=pika_defs.PikaBodyGenerate22KeyframeGenerate22PikaframesPost(
                promptText=prompt_text,
                negativePrompt=negative_prompt,
                seed=seed,
@@ -546,8 +616,9 @@ class PikaStartEndFrameNode(IO.ComfyNode):
            ),
            files=pika_files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
-        return await execute_task(initial_operation.video_id, cls)
+        return await execute_task(initial_operation, auth_kwargs=auth, node_id=cls.hidden.unique_id)


 class PikaApiNodesExtension(ComfyExtension):
--- a/comfy_api_nodes/nodes_pixverse.py
+++ b/comfy_api_nodes/nodes_pixverse.py
@@ -1,6 +1,7 @@
-import torch
+from inspect import cleandoc
+from typing import Optional
 from typing_extensions import override
-from comfy_api.latest import IO, ComfyExtension
+from io import BytesIO
 from comfy_api_nodes.apis.pixverse_api import (
    PixverseTextVideoRequest,
    PixverseImageVideoRequest,
@@ -16,30 +17,59 @@ from comfy_api_nodes.apis.pixverse_api import (
    PixverseIO,
    pixverse_templates,
 )
-from comfy_api_nodes.util import (
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    download_url_to_video_output,
-    poll_op,
-    sync_op,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import (
    tensor_to_bytesio,
    validate_string,
 )
+from comfy_api.input_impl import VideoFromFile
+from comfy_api.latest import ComfyExtension, IO
+
+import torch
+import aiohttp
+

 AVERAGE_DURATION_T2V = 32
 AVERAGE_DURATION_I2V = 30
 AVERAGE_DURATION_T2T = 52


-async def upload_image_to_pixverse(cls: type[IO.ComfyNode], image: torch.Tensor):
-    response_upload = await sync_op(
-        cls,
-        ApiEndpoint(path="/proxy/pixverse/image/upload", method="POST"),
-        response_model=PixverseImageUploadResponse,
-        files={"image": tensor_to_bytesio(image)},
+def get_video_url_from_response(
+    response: PixverseGenerationStatusResponse,
+) -> Optional[str]:
+    if response.Resp is None or response.Resp.url is None:
+        return None
+    return str(response.Resp.url)
+
+
+async def upload_image_to_pixverse(image: torch.Tensor, auth_kwargs=None):
+    # first, upload image to Pixverse and get image id to use in actual generation call
+    files = {"image": tensor_to_bytesio(image)}
+    operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path="/proxy/pixverse/image/upload",
+            method=HttpMethod.POST,
+            request_model=EmptyRequest,
+            response_model=PixverseImageUploadResponse,
+        ),
+        request=EmptyRequest(),
+        files=files,
        content_type="multipart/form-data",
+        auth_kwargs=auth_kwargs,
    )
+    response_upload: PixverseImageUploadResponse = await operation.execute()
+
    if response_upload.Resp is None:
-        raise Exception(f"PixVerse image upload request failed: '{response_upload.ErrMsg}'")
+        raise Exception(
+            f"PixVerse image upload request failed: '{response_upload.ErrMsg}'"
+        )
+
    return response_upload.Resp.img_id


@@ -65,17 +95,22 @@ class PixverseTemplateNode(IO.ComfyNode):
        template_id = pixverse_templates.get(template, None)
        if template_id is None:
            raise Exception(f"Template '{template}' is not recognized.")
+        # just return the integer
        return IO.NodeOutput(template_id)


 class PixverseTextToVideoNode(IO.ComfyNode):
+    """
+    Generates videos based on prompt and output_size.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="PixverseTextToVideoNode",
            display_name="PixVerse Text to Video",
            category="api node/video/PixVerse",
-            description="Generates videos based on prompt and output_size.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -142,7 +177,7 @@ class PixverseTextToVideoNode(IO.ComfyNode):
        negative_prompt: str = None,
        pixverse_template: int = None,
    ) -> IO.NodeOutput:
-        validate_string(prompt, strip_whitespace=False, min_length=1)
+        validate_string(prompt, strip_whitespace=False)
        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
        if quality == PixverseQuality.res_1080p:
@@ -151,11 +186,18 @@ class PixverseTextToVideoNode(IO.ComfyNode):
        elif duration_seconds != PixverseDuration.dur_5:
            motion_mode = PixverseMotionMode.normal

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/pixverse/video/text/generate", method="POST"),
-            response_model=PixverseVideoResponse,
-            data=PixverseTextVideoRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/pixverse/video/text/generate",
+                method=HttpMethod.POST,
+                request_model=PixverseTextVideoRequest,
+                response_model=PixverseVideoResponse,
+            ),
+            request=PixverseTextVideoRequest(
                prompt=prompt,
                aspect_ratio=aspect_ratio,
                quality=quality,
@@ -165,14 +207,20 @@ class PixverseTextToVideoNode(IO.ComfyNode):
                template_id=pixverse_template,
                seed=seed,
            ),
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()
+
        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")

-        response_poll = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/pixverse/video/result/{response_api.Resp.video_id}"),
-            response_model=PixverseGenerationStatusResponse,
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/pixverse/video/result/{response_api.Resp.video_id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=PixverseGenerationStatusResponse,
+            ),
            completed_statuses=[PixverseStatus.successful],
            failed_statuses=[
                PixverseStatus.contents_moderation,
@@ -180,19 +228,30 @@ class PixverseTextToVideoNode(IO.ComfyNode):
                PixverseStatus.deleted,
            ],
            status_extractor=lambda x: x.Resp.status,
+            auth_kwargs=auth,
+            node_id=cls.hidden.unique_id,
+            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        return IO.NodeOutput(await download_url_to_video_output(response_poll.Resp.url))
+        response_poll = await operation.execute()
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return IO.NodeOutput(VideoFromFile(BytesIO(await vid_response.content.read())))


 class PixverseImageToVideoNode(IO.ComfyNode):
+    """
+    Generates videos based on prompt and output_size.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="PixverseImageToVideoNode",
            display_name="PixVerse Image to Video",
            category="api node/video/PixVerse",
-            description="Generates videos based on prompt and output_size.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.Image.Input("image"),
                IO.String.Input(
@@ -257,7 +316,11 @@ class PixverseImageToVideoNode(IO.ComfyNode):
        pixverse_template: int = None,
    ) -> IO.NodeOutput:
        validate_string(prompt, strip_whitespace=False)
-        img_id = await upload_image_to_pixverse(cls, image)
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        img_id = await upload_image_to_pixverse(image, auth_kwargs=auth)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -267,11 +330,14 @@ class PixverseImageToVideoNode(IO.ComfyNode):
        elif duration_seconds != PixverseDuration.dur_5:
            motion_mode = PixverseMotionMode.normal

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/pixverse/video/img/generate", method="POST"),
-            response_model=PixverseVideoResponse,
-            data=PixverseImageVideoRequest(
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/pixverse/video/img/generate",
+                method=HttpMethod.POST,
+                request_model=PixverseImageVideoRequest,
+                response_model=PixverseVideoResponse,
+            ),
+            request=PixverseImageVideoRequest(
                img_id=img_id,
                prompt=prompt,
                quality=quality,
@@ -281,15 +347,20 @@ class PixverseImageToVideoNode(IO.ComfyNode):
                template_id=pixverse_template,
                seed=seed,
            ),
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")

-        response_poll = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/pixverse/video/result/{response_api.Resp.video_id}"),
-            response_model=PixverseGenerationStatusResponse,
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/pixverse/video/result/{response_api.Resp.video_id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=PixverseGenerationStatusResponse,
+            ),
            completed_statuses=[PixverseStatus.successful],
            failed_statuses=[
                PixverseStatus.contents_moderation,
@@ -297,19 +368,30 @@ class PixverseImageToVideoNode(IO.ComfyNode):
                PixverseStatus.deleted,
            ],
            status_extractor=lambda x: x.Resp.status,
+            auth_kwargs=auth,
+            node_id=cls.hidden.unique_id,
+            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_I2V,
        )
-        return IO.NodeOutput(await download_url_to_video_output(response_poll.Resp.url))
+        response_poll = await operation.execute()
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return IO.NodeOutput(VideoFromFile(BytesIO(await vid_response.content.read())))


 class PixverseTransitionVideoNode(IO.ComfyNode):
+    """
+    Generates videos based on prompt and output_size.
+    """
+
    @classmethod
    def define_schema(cls) -> IO.Schema:
        return IO.Schema(
            node_id="PixverseTransitionVideoNode",
            display_name="PixVerse Transition Video",
            category="api node/video/PixVerse",
-            description="Generates videos based on prompt and output_size.",
+            description=cleandoc(cls.__doc__ or ""),
            inputs=[
                IO.Image.Input("first_frame"),
                IO.Image.Input("last_frame"),
@@ -370,8 +452,12 @@ class PixverseTransitionVideoNode(IO.ComfyNode):
        negative_prompt: str = None,
    ) -> IO.NodeOutput:
        validate_string(prompt, strip_whitespace=False)
-        first_frame_id = await upload_image_to_pixverse(cls, first_frame)
-        last_frame_id = await upload_image_to_pixverse(cls, last_frame)
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        first_frame_id = await upload_image_to_pixverse(first_frame, auth_kwargs=auth)
+        last_frame_id = await upload_image_to_pixverse(last_frame, auth_kwargs=auth)

        # 1080p is limited to 5 seconds duration
        # only normal motion_mode supported for 1080p or for non-5 second duration
@@ -381,11 +467,14 @@ class PixverseTransitionVideoNode(IO.ComfyNode):
        elif duration_seconds != PixverseDuration.dur_5:
            motion_mode = PixverseMotionMode.normal

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/pixverse/video/transition/generate", method="POST"),
-            response_model=PixverseVideoResponse,
-            data=PixverseTransitionVideoRequest(
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/pixverse/video/transition/generate",
+                method=HttpMethod.POST,
+                request_model=PixverseTransitionVideoRequest,
+                response_model=PixverseVideoResponse,
+            ),
+            request=PixverseTransitionVideoRequest(
                first_frame_img=first_frame_id,
                last_frame_img=last_frame_id,
                prompt=prompt,
@@ -395,15 +484,20 @@ class PixverseTransitionVideoNode(IO.ComfyNode):
                negative_prompt=negative_prompt if negative_prompt else None,
                seed=seed,
            ),
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()

        if response_api.Resp is None:
            raise Exception(f"PixVerse request failed: '{response_api.ErrMsg}'")

-        response_poll = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/pixverse/video/result/{response_api.Resp.video_id}"),
-            response_model=PixverseGenerationStatusResponse,
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/pixverse/video/result/{response_api.Resp.video_id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=PixverseGenerationStatusResponse,
+            ),
            completed_statuses=[PixverseStatus.successful],
            failed_statuses=[
                PixverseStatus.contents_moderation,
@@ -411,9 +505,16 @@ class PixverseTransitionVideoNode(IO.ComfyNode):
                PixverseStatus.deleted,
            ],
            status_extractor=lambda x: x.Resp.status,
+            auth_kwargs=auth,
+            node_id=cls.hidden.unique_id,
+            result_url_extractor=get_video_url_from_response,
            estimated_duration=AVERAGE_DURATION_T2V,
        )
-        return IO.NodeOutput(await download_url_to_video_output(response_poll.Resp.url))
+        response_poll = await operation.execute()
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(response_poll.Resp.url) as vid_response:
+                return IO.NodeOutput(VideoFromFile(BytesIO(await vid_response.content.read())))


 class PixVerseExtension(ComfyExtension):
--- a/comfy_api_nodes/nodes_recraft.py
+++ b/comfy_api_nodes/nodes_recraft.py
--- a/comfy_api_nodes/nodes_rodin.py
+++ b/comfy_api_nodes/nodes_rodin.py
@@ -5,9 +5,12 @@ Rodin API docs: https://developer.hyper3d.ai/

 """

+from __future__ import annotations
 from inspect import cleandoc
 import folder_paths as comfy_paths
+import aiohttp
 import os
+import asyncio
 import logging
 import math
 from typing import Optional
@@ -23,11 +26,11 @@ from comfy_api_nodes.apis.rodin_api import (
    Rodin3DDownloadResponse,
    JobStatus,
 )
-from comfy_api_nodes.util import (
-    sync_op,
-    poll_op,
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    download_url_to_bytesio,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
 )
 from comfy_api.latest import ComfyExtension, IO

@@ -118,31 +121,35 @@ def tensor_to_filelike(tensor, max_pixels: int = 2048*2048):


 async def create_generate_task(
-    cls: type[IO.ComfyNode],
    images=None,
    seed=1,
    material="PBR",
    quality_override=18000,
    tier="Regular",
    mesh_mode="Quad",
-    ta_pose: bool = False,
+    TAPose = False,
+    auth_kwargs: Optional[dict[str, str]] = None,
 ):
    if images is None:
        raise Exception("Rodin 3D generate requires at least 1 image.")
    if len(images) > 5:
        raise Exception("Rodin 3D generate requires up to 5 image.")

-    response = await sync_op(
-        cls,
-        ApiEndpoint(path="/proxy/rodin/api/v2/rodin", method="POST"),
-        response_model=Rodin3DGenerateResponse,
-        data=Rodin3DGenerateRequest(
+    path = "/proxy/rodin/api/v2/rodin"
+    operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=path,
+            method=HttpMethod.POST,
+            request_model=Rodin3DGenerateRequest,
+            response_model=Rodin3DGenerateResponse,
+        ),
+        request=Rodin3DGenerateRequest(
            seed=seed,
            tier=tier,
            material=material,
            quality_override=quality_override,
            mesh_mode=mesh_mode,
-            TAPose=ta_pose,
+            TAPose=TAPose,
        ),
        files=[
            (
@@ -152,8 +159,11 @@ async def create_generate_task(
            for image in images if image is not None
        ],
        content_type="multipart/form-data",
+        auth_kwargs=auth_kwargs,
    )

+    response = await operation.execute()
+
    if hasattr(response, "error"):
        error_message = f"Rodin3D Create 3D generate Task Failed. Message: {response.message}, error: {response.error}"
        logging.error(error_message)
@@ -177,46 +187,75 @@ def check_rodin_status(response: Rodin3DCheckStatusResponse) -> str:
        return "DONE"
    return "Generating"

-def extract_progress(response: Rodin3DCheckStatusResponse) -> Optional[int]:
-    if not response.jobs:
-        return None
-    completed_count = sum(1 for job in response.jobs if job.status == JobStatus.Done)
-    return int((completed_count / len(response.jobs)) * 100)

-
-async def poll_for_task_status(subscription_key: str, cls: type[IO.ComfyNode]) -> Rodin3DCheckStatusResponse:
-    logging.info("[ Rodin3D API - CheckStatus ] Generate Start!")
-    return await poll_op(
-        cls,
-        ApiEndpoint(path="/proxy/rodin/api/v2/status", method="POST"),
-        response_model=Rodin3DCheckStatusResponse,
-        data=Rodin3DCheckStatusRequest(subscription_key=subscription_key),
+async def poll_for_task_status(
+    subscription_key, auth_kwargs: Optional[dict[str, str]] = None,
+) -> Rodin3DCheckStatusResponse:
+    poll_operation = PollingOperation(
+        poll_endpoint=ApiEndpoint(
+            path="/proxy/rodin/api/v2/status",
+            method=HttpMethod.POST,
+            request_model=Rodin3DCheckStatusRequest,
+            response_model=Rodin3DCheckStatusResponse,
+        ),
+        request=Rodin3DCheckStatusRequest(subscription_key=subscription_key),
+        completed_statuses=["DONE"],
+        failed_statuses=["FAILED"],
        status_extractor=check_rodin_status,
-        progress_extractor=extract_progress,
+        poll_interval=3.0,
+        auth_kwargs=auth_kwargs,
    )
+    logging.info("[ Rodin3D API - CheckStatus ] Generate Start!")
+    return await poll_operation.execute()


-async def get_rodin_download_list(uuid: str, cls: type[IO.ComfyNode]) -> Rodin3DDownloadResponse:
+async def get_rodin_download_list(uuid, auth_kwargs: Optional[dict[str, str]] = None) -> Rodin3DDownloadResponse:
    logging.info("[ Rodin3D API - Downloading ] Generate Successfully!")
-    return await sync_op(
-        cls,
-        ApiEndpoint(path="/proxy/rodin/api/v2/download", method="POST"),
-        response_model=Rodin3DDownloadResponse,
-        data=Rodin3DDownloadRequest(task_uuid=uuid),
-        monitor_progress=False,
+    operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path="/proxy/rodin/api/v2/download",
+            method=HttpMethod.POST,
+            request_model=Rodin3DDownloadRequest,
+            response_model=Rodin3DDownloadResponse,
+        ),
+        request=Rodin3DDownloadRequest(task_uuid=uuid),
+        auth_kwargs=auth_kwargs,
    )
+    return await operation.execute()


-async def download_files(url_list, task_uuid: str):
-    result_folder_name = f"Rodin3D_{task_uuid}"
-    save_path = os.path.join(comfy_paths.get_output_directory(), result_folder_name)
+async def download_files(url_list, task_uuid):
+    save_path = os.path.join(comfy_paths.get_output_directory(), f"Rodin3D_{task_uuid}")
    os.makedirs(save_path, exist_ok=True)
    model_file_path = None
-    for i in url_list.list:
-        file_path = os.path.join(save_path, i.name)
-        if file_path.endswith(".glb"):
-            model_file_path = os.path.join(result_folder_name, i.name)
-        await download_url_to_bytesio(i.url, file_path)
+    async with aiohttp.ClientSession() as session:
+        for i in url_list.list:
+            url = i.url
+            file_name = i.name
+            file_path = os.path.join(save_path, file_name)
+            if file_path.endswith(".glb"):
+                model_file_path = file_path
+            logging.info("[ Rodin3D API - download_files ] Downloading file: %s", file_path)
+            max_retries = 5
+            for attempt in range(max_retries):
+                try:
+                    async with session.get(url) as resp:
+                        resp.raise_for_status()
+                        with open(file_path, "wb") as f:
+                            async for chunk in resp.content.iter_chunked(32 * 1024):
+                                f.write(chunk)
+                    break
+                except Exception as e:
+                    logging.info("[ Rodin3D API - download_files ] Error downloading %s:%s", file_path, str(e))
+                    if attempt < max_retries - 1:
+                        logging.info("Retrying...")
+                        await asyncio.sleep(2)
+                    else:
+                        logging.info(
+                            "[ Rodin3D API - download_files ] Failed to download %s after %s attempts.",
+                            file_path,
+                            max_retries,
+                        )
    return model_file_path


@@ -238,7 +277,6 @@ class Rodin3D_Regular(IO.ComfyNode):
            hidden=[
                IO.Hidden.auth_token_comfy_org,
                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
            ],
            is_api_node=True,
        )
@@ -257,17 +295,21 @@ class Rodin3D_Regular(IO.ComfyNode):
        for i in range(num_images):
            m_images.append(Images[i])
        mesh_mode, quality_override = get_quality_mode(Polygon_count)
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        task_uuid, subscription_key = await create_generate_task(
-            cls,
            images=m_images,
            seed=Seed,
            material=Material_Type,
            quality_override=quality_override,
            tier=tier,
            mesh_mode=mesh_mode,
+            auth_kwargs=auth,
        )
-        await poll_for_task_status(subscription_key, cls)
-        download_list = await get_rodin_download_list(task_uuid, cls)
+        await poll_for_task_status(subscription_key, auth_kwargs=auth)
+        download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
        model = await download_files(download_list, task_uuid)

        return IO.NodeOutput(model)
@@ -291,7 +333,6 @@ class Rodin3D_Detail(IO.ComfyNode):
            hidden=[
                IO.Hidden.auth_token_comfy_org,
                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
            ],
            is_api_node=True,
        )
@@ -310,17 +351,21 @@ class Rodin3D_Detail(IO.ComfyNode):
        for i in range(num_images):
            m_images.append(Images[i])
        mesh_mode, quality_override = get_quality_mode(Polygon_count)
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        task_uuid, subscription_key = await create_generate_task(
-            cls,
            images=m_images,
            seed=Seed,
            material=Material_Type,
            quality_override=quality_override,
            tier=tier,
            mesh_mode=mesh_mode,
+            auth_kwargs=auth,
        )
-        await poll_for_task_status(subscription_key, cls)
-        download_list = await get_rodin_download_list(task_uuid, cls)
+        await poll_for_task_status(subscription_key, auth_kwargs=auth)
+        download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
        model = await download_files(download_list, task_uuid)

        return IO.NodeOutput(model)
@@ -344,7 +389,6 @@ class Rodin3D_Smooth(IO.ComfyNode):
            hidden=[
                IO.Hidden.auth_token_comfy_org,
                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
            ],
            is_api_node=True,
        )
@@ -357,22 +401,27 @@ class Rodin3D_Smooth(IO.ComfyNode):
        Material_Type,
        Polygon_count,
    ) -> IO.NodeOutput:
+        tier = "Smooth"
        num_images = Images.shape[0]
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
        mesh_mode, quality_override = get_quality_mode(Polygon_count)
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        task_uuid, subscription_key = await create_generate_task(
-            cls,
            images=m_images,
            seed=Seed,
            material=Material_Type,
            quality_override=quality_override,
-            tier="Smooth",
+            tier=tier,
            mesh_mode=mesh_mode,
+            auth_kwargs=auth,
        )
-        await poll_for_task_status(subscription_key, cls)
-        download_list = await get_rodin_download_list(task_uuid, cls)
+        await poll_for_task_status(subscription_key, auth_kwargs=auth)
+        download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
        model = await download_files(download_list, task_uuid)

        return IO.NodeOutput(model)
@@ -403,7 +452,6 @@ class Rodin3D_Sketch(IO.ComfyNode):
            hidden=[
                IO.Hidden.auth_token_comfy_org,
                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
            ],
            is_api_node=True,
        )
@@ -414,21 +462,29 @@ class Rodin3D_Sketch(IO.ComfyNode):
        Images,
        Seed,
    ) -> IO.NodeOutput:
+        tier = "Sketch"
        num_images = Images.shape[0]
        m_images = []
        for i in range(num_images):
            m_images.append(Images[i])
+        material_type = "PBR"
+        quality_override = 18000
+        mesh_mode = "Quad"
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        task_uuid, subscription_key = await create_generate_task(
-            cls,
            images=m_images,
            seed=Seed,
-            material="PBR",
-            quality_override=18000,
-            tier="Sketch",
-            mesh_mode="Quad",
+            material=material_type,
+            quality_override=quality_override,
+            tier=tier,
+            mesh_mode=mesh_mode,
+            auth_kwargs=auth,
        )
-        await poll_for_task_status(subscription_key, cls)
-        download_list = await get_rodin_download_list(task_uuid, cls)
+        await poll_for_task_status(subscription_key, auth_kwargs=auth)
+        download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
        model = await download_files(download_list, task_uuid)

        return IO.NodeOutput(model)
@@ -467,7 +523,6 @@ class Rodin3D_Gen2(IO.ComfyNode):
            hidden=[
                IO.Hidden.auth_token_comfy_org,
                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
            ],
            is_api_node=True,
        )
@@ -487,18 +542,22 @@ class Rodin3D_Gen2(IO.ComfyNode):
        for i in range(num_images):
            m_images.append(Images[i])
        mesh_mode, quality_override = get_quality_mode(Polygon_count)
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        task_uuid, subscription_key = await create_generate_task(
-            cls,
            images=m_images,
            seed=Seed,
            material=Material_Type,
            quality_override=quality_override,
            tier=tier,
            mesh_mode=mesh_mode,
-            ta_pose=TAPose,
+            TAPose=TAPose,
+            auth_kwargs=auth,
        )
-        await poll_for_task_status(subscription_key, cls)
-        download_list = await get_rodin_download_list(task_uuid, cls)
+        await poll_for_task_status(subscription_key, auth_kwargs=auth)
+        download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
        model = await download_files(download_list, task_uuid)

        return IO.NodeOutput(model)
--- a/comfy_api_nodes/nodes_runway.py
+++ b/comfy_api_nodes/nodes_runway.py
@@ -11,7 +11,7 @@ User Guides:

 """

-from typing import Union, Optional
+from typing import Union, Optional, Any
 from typing_extensions import override
 from enum import Enum

@@ -21,6 +21,7 @@ from comfy_api_nodes.apis import (
    RunwayImageToVideoRequest,
    RunwayImageToVideoResponse,
    RunwayTaskStatusResponse as TaskStatusResponse,
+    RunwayTaskStatusEnum as TaskStatus,
    RunwayModelEnum as Model,
    RunwayDurationEnum as Duration,
    RunwayAspectRatioEnum as AspectRatio,
@@ -32,20 +33,23 @@ from comfy_api_nodes.apis import (
    ReferenceImage,
    RunwayTextToImageAspectRatioEnum,
 )
-from comfy_api_nodes.util import (
-    image_tensor_pair_to_batch,
-    validate_string,
-    validate_image_dimensions,
-    validate_image_aspect_ratio,
+from comfy_api_nodes.apis.client import (
+    ApiEndpoint,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import (
    upload_images_to_comfyapi,
    download_url_to_video_output,
+    image_tensor_pair_to_batch,
+    validate_string,
    download_url_to_image_tensor,
-    ApiEndpoint,
-    sync_op,
-    poll_op,
 )
 from comfy_api.input_impl import VideoFromFile
 from comfy_api.latest import ComfyExtension, IO
+from comfy_api_nodes.util.validation_utils import validate_image_dimensions, validate_image_aspect_ratio

 PATH_IMAGE_TO_VIDEO = "/proxy/runway/image_to_video"
 PATH_TEXT_TO_IMAGE = "/proxy/runway/text_to_image"
@@ -87,6 +91,31 @@ def get_video_url_from_task_status(response: TaskStatusResponse) -> Union[str, N
    return None


+async def poll_until_finished(
+    auth_kwargs: dict[str, str],
+    api_endpoint: ApiEndpoint[Any, TaskStatusResponse],
+    estimated_duration: Optional[int] = None,
+    node_id: Optional[str] = None,
+) -> TaskStatusResponse:
+    """Polls the Runway API endpoint until the task reaches a terminal state, then returns the response."""
+    return await PollingOperation(
+        poll_endpoint=api_endpoint,
+        completed_statuses=[
+            TaskStatus.SUCCEEDED.value,
+        ],
+        failed_statuses=[
+            TaskStatus.FAILED.value,
+            TaskStatus.CANCELLED.value,
+        ],
+        status_extractor=lambda response: response.status.value,
+        auth_kwargs=auth_kwargs,
+        result_url_extractor=get_video_url_from_task_status,
+        estimated_duration=estimated_duration,
+        node_id=node_id,
+        progress_extractor=extract_progress_from_task_status,
+    ).execute()
+
+
 def extract_progress_from_task_status(
    response: TaskStatusResponse,
 ) -> Union[float, None]:
@@ -103,32 +132,42 @@ def get_image_url_from_task_status(response: TaskStatusResponse) -> Union[str, N


 async def get_response(
-    cls: type[IO.ComfyNode], task_id: str, estimated_duration: Optional[int] = None
+    task_id: str, auth_kwargs: dict[str, str], node_id: Optional[str] = None, estimated_duration: Optional[int] = None
 ) -> TaskStatusResponse:
    """Poll the task status until it is finished then get the response."""
-    return await poll_op(
-        cls,
-        ApiEndpoint(path=f"{PATH_GET_TASK_STATUS}/{task_id}"),
-        response_model=TaskStatusResponse,
-        status_extractor=lambda r: r.status.value,
+    return await poll_until_finished(
+        auth_kwargs,
+        ApiEndpoint(
+            path=f"{PATH_GET_TASK_STATUS}/{task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=TaskStatusResponse,
+        ),
        estimated_duration=estimated_duration,
-        progress_extractor=extract_progress_from_task_status,
+        node_id=node_id,
    )


 async def generate_video(
-    cls: type[IO.ComfyNode],
    request: RunwayImageToVideoRequest,
+    auth_kwargs: dict[str, str],
+    node_id: Optional[str] = None,
    estimated_duration: Optional[int] = None,
 ) -> VideoFromFile:
-    initial_response = await sync_op(
-        cls,
-        endpoint=ApiEndpoint(path=PATH_IMAGE_TO_VIDEO, method="POST"),
-        response_model=RunwayImageToVideoResponse,
-        data=request,
+    initial_operation = SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=PATH_IMAGE_TO_VIDEO,
+            method=HttpMethod.POST,
+            request_model=RunwayImageToVideoRequest,
+            response_model=RunwayImageToVideoResponse,
+        ),
+        request=request,
+        auth_kwargs=auth_kwargs,
    )

-    final_response = await get_response(cls, initial_response.id, estimated_duration)
+    initial_response = await initial_operation.execute()
+
+    final_response = await get_response(initial_response.id, auth_kwargs, node_id, estimated_duration)
    if not final_response.output:
        raise RunwayApiError("Runway task succeeded but no video data found in response.")

@@ -145,9 +184,9 @@ class RunwayImageToVideoNodeGen3a(IO.ComfyNode):
            display_name="Runway Image to Video (Gen3a Turbo)",
            category="api node/video/Runway",
            description="Generate a video from a single starting frame using Gen3a Turbo model. "
-            "Before diving in, review these best practices to ensure that "
-            "your input selections will set your generation up for success: "
-            "https://help.runwayml.com/hc/en-us/articles/33927968552339-Creating-with-Act-One-on-Gen-3-Alpha-and-Turbo.",
+                        "Before diving in, review these best practices to ensure that "
+                        "your input selections will set your generation up for success: "
+                        "https://help.runwayml.com/hc/en-us/articles/33927968552339-Creating-with-Act-One-on-Gen-3-Alpha-and-Turbo.",
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -200,18 +239,22 @@ class RunwayImageToVideoNodeGen3a(IO.ComfyNode):
    ) -> IO.NodeOutput:
        validate_string(prompt, min_length=1)
        validate_image_dimensions(start_frame, max_width=7999, max_height=7999)
-        validate_image_aspect_ratio(start_frame, (1, 2), (2, 1))
+        validate_image_aspect_ratio(start_frame, min_aspect_ratio=0.5, max_aspect_ratio=2.0)
+
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }

        download_urls = await upload_images_to_comfyapi(
-            cls,
            start_frame,
            max_images=1,
            mime_type="image/png",
+            auth_kwargs=auth_kwargs,
        )

        return IO.NodeOutput(
            await generate_video(
-                cls,
                RunwayImageToVideoRequest(
                    promptText=prompt,
                    seed=seed,
@@ -219,9 +262,15 @@ class RunwayImageToVideoNodeGen3a(IO.ComfyNode):
                    duration=Duration(duration),
                    ratio=AspectRatio(ratio),
                    promptImage=RunwayPromptImageObject(
-                        root=[RunwayPromptImageDetailedObject(uri=str(download_urls[0]), position="first")]
+                        root=[
+                            RunwayPromptImageDetailedObject(
+                                uri=str(download_urls[0]), position="first"
+                            )
+                        ]
                    ),
                ),
+                auth_kwargs=auth_kwargs,
+                node_id=cls.hidden.unique_id,
            )
        )

@@ -235,9 +284,9 @@ class RunwayImageToVideoNodeGen4(IO.ComfyNode):
            display_name="Runway Image to Video (Gen4 Turbo)",
            category="api node/video/Runway",
            description="Generate a video from a single starting frame using Gen4 Turbo model. "
-            "Before diving in, review these best practices to ensure that "
-            "your input selections will set your generation up for success: "
-            "https://help.runwayml.com/hc/en-us/articles/37327109429011-Creating-with-Gen-4-Video.",
+                        "Before diving in, review these best practices to ensure that "
+                        "your input selections will set your generation up for success: "
+                        "https://help.runwayml.com/hc/en-us/articles/37327109429011-Creating-with-Gen-4-Video.",
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -290,18 +339,22 @@ class RunwayImageToVideoNodeGen4(IO.ComfyNode):
    ) -> IO.NodeOutput:
        validate_string(prompt, min_length=1)
        validate_image_dimensions(start_frame, max_width=7999, max_height=7999)
-        validate_image_aspect_ratio(start_frame, (1, 2), (2, 1))
+        validate_image_aspect_ratio(start_frame, min_aspect_ratio=0.5, max_aspect_ratio=2.0)
+
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }

        download_urls = await upload_images_to_comfyapi(
-            cls,
            start_frame,
            max_images=1,
            mime_type="image/png",
+            auth_kwargs=auth_kwargs,
        )

        return IO.NodeOutput(
            await generate_video(
-                cls,
                RunwayImageToVideoRequest(
                    promptText=prompt,
                    seed=seed,
@@ -309,9 +362,15 @@ class RunwayImageToVideoNodeGen4(IO.ComfyNode):
                    duration=Duration(duration),
                    ratio=AspectRatio(ratio),
                    promptImage=RunwayPromptImageObject(
-                        root=[RunwayPromptImageDetailedObject(uri=str(download_urls[0]), position="first")]
+                        root=[
+                            RunwayPromptImageDetailedObject(
+                                uri=str(download_urls[0]), position="first"
+                            )
+                        ]
                    ),
                ),
+                auth_kwargs=auth_kwargs,
+                node_id=cls.hidden.unique_id,
                estimated_duration=AVERAGE_DURATION_FLF_SECONDS,
            )
        )
@@ -326,12 +385,12 @@ class RunwayFirstLastFrameNode(IO.ComfyNode):
            display_name="Runway First-Last-Frame to Video",
            category="api node/video/Runway",
            description="Upload first and last keyframes, draft a prompt, and generate a video. "
-            "More complex transitions, such as cases where the Last frame is completely different "
-            "from the First frame, may benefit from the longer 10s duration. "
-            "This would give the generation more time to smoothly transition between the two inputs. "
-            "Before diving in, review these best practices to ensure that your input selections "
-            "will set your generation up for success: "
-            "https://help.runwayml.com/hc/en-us/articles/34170748696595-Creating-with-Keyframes-on-Gen-3.",
+                        "More complex transitions, such as cases where the Last frame is completely different "
+                        "from the First frame, may benefit from the longer 10s duration. "
+                        "This would give the generation more time to smoothly transition between the two inputs. "
+                        "Before diving in, review these best practices to ensure that your input selections "
+                        "will set your generation up for success: "
+                        "https://help.runwayml.com/hc/en-us/articles/34170748696595-Creating-with-Keyframes-on-Gen-3.",
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -390,22 +449,26 @@ class RunwayFirstLastFrameNode(IO.ComfyNode):
        validate_string(prompt, min_length=1)
        validate_image_dimensions(start_frame, max_width=7999, max_height=7999)
        validate_image_dimensions(end_frame, max_width=7999, max_height=7999)
-        validate_image_aspect_ratio(start_frame, (1, 2), (2, 1))
-        validate_image_aspect_ratio(end_frame, (1, 2), (2, 1))
+        validate_image_aspect_ratio(start_frame, min_aspect_ratio=0.5, max_aspect_ratio=2.0)
+        validate_image_aspect_ratio(end_frame, min_aspect_ratio=0.5, max_aspect_ratio=2.0)
+
+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }

        stacked_input_images = image_tensor_pair_to_batch(start_frame, end_frame)
        download_urls = await upload_images_to_comfyapi(
-            cls,
            stacked_input_images,
            max_images=2,
            mime_type="image/png",
+            auth_kwargs=auth_kwargs,
        )
        if len(download_urls) != 2:
            raise RunwayApiError("Failed to upload one or more images to comfy api.")

        return IO.NodeOutput(
            await generate_video(
-                cls,
                RunwayImageToVideoRequest(
                    promptText=prompt,
                    seed=seed,
@@ -414,11 +477,17 @@ class RunwayFirstLastFrameNode(IO.ComfyNode):
                    ratio=AspectRatio(ratio),
                    promptImage=RunwayPromptImageObject(
                        root=[
-                            RunwayPromptImageDetailedObject(uri=str(download_urls[0]), position="first"),
-                            RunwayPromptImageDetailedObject(uri=str(download_urls[1]), position="last"),
+                            RunwayPromptImageDetailedObject(
+                                uri=str(download_urls[0]), position="first"
+                            ),
+                            RunwayPromptImageDetailedObject(
+                                uri=str(download_urls[1]), position="last"
+                            ),
                        ]
                    ),
                ),
+                auth_kwargs=auth_kwargs,
+                node_id=cls.hidden.unique_id,
                estimated_duration=AVERAGE_DURATION_FLF_SECONDS,
            )
        )
@@ -433,7 +502,7 @@ class RunwayTextToImageNode(IO.ComfyNode):
            display_name="Runway Text to Image",
            category="api node/image/Runway",
            description="Generate an image from a text prompt using Runway's Gen 4 model. "
-            "You can also include reference image to guide the generation.",
+                        "You can also include reference image to guide the generation.",
            inputs=[
                IO.String.Input(
                    "prompt",
@@ -471,34 +540,49 @@ class RunwayTextToImageNode(IO.ComfyNode):
    ) -> IO.NodeOutput:
        validate_string(prompt, min_length=1)

+        auth_kwargs = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
        # Prepare reference images if provided
        reference_images = None
        if reference_image is not None:
            validate_image_dimensions(reference_image, max_width=7999, max_height=7999)
-            validate_image_aspect_ratio(reference_image, (1, 2), (2, 1))
+            validate_image_aspect_ratio(reference_image, min_aspect_ratio=0.5, max_aspect_ratio=2.0)
            download_urls = await upload_images_to_comfyapi(
-                cls,
                reference_image,
                max_images=1,
                mime_type="image/png",
+                auth_kwargs=auth_kwargs,
            )
            reference_images = [ReferenceImage(uri=str(download_urls[0]))]

-        initial_response = await sync_op(
-            cls,
-            endpoint=ApiEndpoint(path=PATH_TEXT_TO_IMAGE, method="POST"),
-            response_model=RunwayTextToImageResponse,
-            data=RunwayTextToImageRequest(
-                promptText=prompt,
-                model=Model4.gen4_image,
-                ratio=ratio,
-                referenceImages=reference_images,
-            ),
+        request = RunwayTextToImageRequest(
+            promptText=prompt,
+            model=Model4.gen4_image,
+            ratio=ratio,
+            referenceImages=reference_images,
        )

+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=PATH_TEXT_TO_IMAGE,
+                method=HttpMethod.POST,
+                request_model=RunwayTextToImageRequest,
+                response_model=RunwayTextToImageResponse,
+            ),
+            request=request,
+            auth_kwargs=auth_kwargs,
+        )
+
+        initial_response = await initial_operation.execute()
+
+        # Poll for completion
        final_response = await get_response(
-            cls,
            initial_response.id,
+            auth_kwargs=auth_kwargs,
+            node_id=cls.hidden.unique_id,
            estimated_duration=AVERAGE_DURATION_T2I_SECONDS,
        )
        if not final_response.output:
@@ -517,6 +601,5 @@ class RunwayExtension(ComfyExtension):
            RunwayTextToImageNode,
        ]

-
 async def comfy_entrypoint() -> RunwayExtension:
    return RunwayExtension()
--- a/comfy_api_nodes/nodes_sora.py
+++ b/comfy_api_nodes/nodes_sora.py
@@ -1,20 +1,23 @@
 from typing import Optional
+from typing_extensions import override

 import torch
 from pydantic import BaseModel, Field
-from typing_extensions import override
-
-from comfy_api.latest import IO, ComfyExtension
-from comfy_api_nodes.util import (
+from comfy_api.latest import ComfyExtension, IO
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.util.validation_utils import get_number_of_images
+
+from comfy_api_nodes.apinode_utils import (
    download_url_to_video_output,
-    get_number_of_images,
-    poll_op,
-    sync_op,
    tensor_to_bytesio,
 )

-
 class Sora2GenerationRequest(BaseModel):
    prompt: str = Field(...)
    model: str = Field(...)
@@ -77,7 +80,7 @@ class OpenAIVideoSora2(IO.ComfyNode):
                    control_after_generate=True,
                    optional=True,
                    tooltip="Seed to determine if node should re-run; "
-                    "actual results are nondeterministic regardless of seed.",
+                            "actual results are nondeterministic regardless of seed.",
                ),
            ],
            outputs=[
@@ -108,34 +111,55 @@ class OpenAIVideoSora2(IO.ComfyNode):
            if get_number_of_images(image) != 1:
                raise ValueError("Currently only one input image is supported.")
            files_input = {"input_reference": ("image.png", tensor_to_bytesio(image), "image/png")}
-        initial_response = await sync_op(
-            cls,
-            endpoint=ApiEndpoint(path="/proxy/openai/v1/videos", method="POST"),
-            data=Sora2GenerationRequest(
-                model=model,
-                prompt=prompt,
-                seconds=str(duration),
-                size=size,
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        payload = Sora2GenerationRequest(
+            model=model,
+            prompt=prompt,
+            seconds=str(duration),
+            size=size,
+        )
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/openai/v1/videos",
+                method=HttpMethod.POST,
+                request_model=Sora2GenerationRequest,
+                response_model=Sora2GenerationResponse
            ),
+            request=payload,
            files=files_input,
-            response_model=Sora2GenerationResponse,
+            auth_kwargs=auth,
            content_type="multipart/form-data",
        )
+        initial_response = await initial_operation.execute()
        if initial_response.error:
-            raise Exception(initial_response.error["message"])
+            raise Exception(initial_response.error.message)

        model_time_multiplier = 1 if model == "sora-2" else 2
-        await poll_op(
-            cls,
-            poll_endpoint=ApiEndpoint(path=f"/proxy/openai/v1/videos/{initial_response.id}"),
-            response_model=Sora2GenerationResponse,
+        poll_operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/openai/v1/videos/{initial_response.id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=Sora2GenerationResponse
+            ),
+            completed_statuses=["completed"],
+            failed_statuses=["failed"],
            status_extractor=lambda x: x.status,
+            auth_kwargs=auth,
            poll_interval=8.0,
            max_poll_attempts=160,
-            estimated_duration=int(45 * (duration / 4) * model_time_multiplier),
+            node_id=cls.hidden.unique_id,
+            estimated_duration=45 * (duration / 4) * model_time_multiplier,
        )
+        await poll_operation.execute()
        return IO.NodeOutput(
-            await download_url_to_video_output(f"/proxy/openai/v1/videos/{initial_response.id}/content", cls=cls),
+            await download_url_to_video_output(
+                f"/proxy/openai/v1/videos/{initial_response.id}/content",
+                auth_kwargs=auth,
+            )
        )


--- a/comfy_api_nodes/nodes_stability.py
+++ b/comfy_api_nodes/nodes_stability.py
@@ -20,17 +20,21 @@ from comfy_api_nodes.apis.stability_api import (
    StabilityAudioInpaintRequest,
    StabilityAudioResponse,
 )
-from comfy_api_nodes.util import (
-    validate_audio_duration,
-    validate_string,
-    audio_input_to_mp3,
+from comfy_api_nodes.apis.client import (
+    ApiEndpoint,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import (
    bytesio_to_image_tensor,
    tensor_to_bytesio,
+    validate_string,
    audio_bytes_to_audio_input,
-    sync_op,
-    poll_op,
-    ApiEndpoint,
+    audio_input_to_mp3,
 )
+from comfy_api_nodes.util.validation_utils import validate_audio_duration

 import torch
 import base64
@@ -157,11 +161,19 @@ class StabilityStableImageUltraNode(IO.ComfyNode):
            "image": image_binary
        }

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/stability/v2beta/stable-image/generate/ultra", method="POST"),
-            response_model=StabilityStableUltraResponse,
-            data=StabilityStableUltraRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/stable-image/generate/ultra",
+                method=HttpMethod.POST,
+                request_model=StabilityStableUltraRequest,
+                response_model=StabilityStableUltraResponse,
+            ),
+            request=StabilityStableUltraRequest(
                prompt=prompt,
                negative_prompt=negative_prompt,
                aspect_ratio=aspect_ratio,
@@ -171,7 +183,9 @@ class StabilityStableImageUltraNode(IO.ComfyNode):
            ),
            files=files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stable Image Ultra generation failed: {response_api.finish_reason}.")
@@ -299,11 +313,19 @@ class StabilityStableImageSD_3_5Node(IO.ComfyNode):
            "image": image_binary
        }

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/stability/v2beta/stable-image/generate/sd3", method="POST"),
-            response_model=StabilityStableUltraResponse,
-            data=StabilityStable3_5Request(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/stable-image/generate/sd3",
+                method=HttpMethod.POST,
+                request_model=StabilityStable3_5Request,
+                response_model=StabilityStableUltraResponse,
+            ),
+            request=StabilityStable3_5Request(
                prompt=prompt,
                negative_prompt=negative_prompt,
                aspect_ratio=aspect_ratio,
@@ -316,7 +338,9 @@ class StabilityStableImageSD_3_5Node(IO.ComfyNode):
            ),
            files=files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stable Diffusion 3.5 Image generation failed: {response_api.finish_reason}.")
@@ -403,11 +427,19 @@ class StabilityUpscaleConservativeNode(IO.ComfyNode):
            "image": image_binary
        }

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/stability/v2beta/stable-image/upscale/conservative", method="POST"),
-            response_model=StabilityStableUltraResponse,
-            data=StabilityUpscaleConservativeRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/stable-image/upscale/conservative",
+                method=HttpMethod.POST,
+                request_model=StabilityUpscaleConservativeRequest,
+                response_model=StabilityStableUltraResponse,
+            ),
+            request=StabilityUpscaleConservativeRequest(
                prompt=prompt,
                negative_prompt=negative_prompt,
                creativity=round(creativity,2),
@@ -415,7 +447,9 @@ class StabilityUpscaleConservativeNode(IO.ComfyNode):
            ),
            files=files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Conservative generation failed: {response_api.finish_reason}.")
@@ -510,11 +544,19 @@ class StabilityUpscaleCreativeNode(IO.ComfyNode):
            "image": image_binary
        }

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/stability/v2beta/stable-image/upscale/creative", method="POST"),
-            response_model=StabilityAsyncResponse,
-            data=StabilityUpscaleCreativeRequest(
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/stable-image/upscale/creative",
+                method=HttpMethod.POST,
+                request_model=StabilityUpscaleCreativeRequest,
+                response_model=StabilityAsyncResponse,
+            ),
+            request=StabilityUpscaleCreativeRequest(
                prompt=prompt,
                negative_prompt=negative_prompt,
                creativity=round(creativity,2),
@@ -523,15 +565,25 @@ class StabilityUpscaleCreativeNode(IO.ComfyNode):
            ),
            files=files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()

-        response_poll = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/stability/v2beta/results/{response_api.id}"),
-            response_model=StabilityResultsGetResponse,
+        operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/stability/v2beta/results/{response_api.id}",
+                method=HttpMethod.GET,
+                request_model=EmptyRequest,
+                response_model=StabilityResultsGetResponse,
+            ),
            poll_interval=3,
+            completed_statuses=[StabilityPollStatus.finished],
+            failed_statuses=[StabilityPollStatus.failed],
            status_extractor=lambda x: get_async_dummy_status(x),
+            auth_kwargs=auth,
+            node_id=cls.hidden.unique_id,
        )
+        response_poll: StabilityResultsGetResponse = await operation.execute()

        if response_poll.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Creative generation failed: {response_poll.finish_reason}.")
@@ -576,13 +628,24 @@ class StabilityUpscaleFastNode(IO.ComfyNode):
            "image": image_binary
        }

-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/stability/v2beta/stable-image/upscale/fast", method="POST"),
-            response_model=StabilityStableUltraResponse,
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/stable-image/upscale/fast",
+                method=HttpMethod.POST,
+                request_model=EmptyRequest,
+                response_model=StabilityStableUltraResponse,
+            ),
+            request=EmptyRequest(),
            files=files,
            content_type="multipart/form-data",
+            auth_kwargs=auth,
        )
+        response_api = await operation.execute()

        if response_api.finish_reason != "SUCCESS":
            raise Exception(f"Stability Upscale Fast failed: {response_api.finish_reason}.")
@@ -654,13 +717,21 @@ class StabilityTextToAudio(IO.ComfyNode):
    async def execute(cls, model: str, prompt: str, duration: int, seed: int, steps: int) -> IO.NodeOutput:
        validate_string(prompt, max_length=10000)
        payload = StabilityTextToAudioRequest(prompt=prompt, model=model, duration=duration, seed=seed, steps=steps)
-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/stability/v2beta/audio/stable-audio-2/text-to-audio", method="POST"),
-            response_model=StabilityAudioResponse,
-            data=payload,
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/audio/stable-audio-2/text-to-audio",
+                method=HttpMethod.POST,
+                request_model=StabilityTextToAudioRequest,
+                response_model=StabilityAudioResponse,
+            ),
+            request=payload,
            content_type="multipart/form-data",
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
+        response_api = await operation.execute()
        if not response_api.audio:
            raise ValueError("No audio file was received in response.")
        return IO.NodeOutput(audio_bytes_to_audio_input(base64.b64decode(response_api.audio)))
@@ -743,14 +814,22 @@ class StabilityAudioToAudio(IO.ComfyNode):
        payload = StabilityAudioToAudioRequest(
            prompt=prompt, model=model, duration=duration, seed=seed, steps=steps, strength=strength
        )
-        response_api = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/stability/v2beta/audio/stable-audio-2/audio-to-audio", method="POST"),
-            response_model=StabilityAudioResponse,
-            data=payload,
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/audio/stable-audio-2/audio-to-audio",
+                method=HttpMethod.POST,
+                request_model=StabilityAudioToAudioRequest,
+                response_model=StabilityAudioResponse,
+            ),
+            request=payload,
            content_type="multipart/form-data",
            files={"audio": audio_input_to_mp3(audio)},
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
+        response_api = await operation.execute()
        if not response_api.audio:
            raise ValueError("No audio file was received in response.")
        return IO.NodeOutput(audio_bytes_to_audio_input(base64.b64decode(response_api.audio)))
@@ -856,14 +935,22 @@ class StabilityAudioInpaint(IO.ComfyNode):
            mask_start=mask_start,
            mask_end=mask_end,
        )
-        response_api = await sync_op(
-            cls,
-            endpoint=ApiEndpoint(path="/proxy/stability/v2beta/audio/stable-audio-2/inpaint", method="POST"),
-            response_model=StabilityAudioResponse,
-            data=payload,
+        operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path="/proxy/stability/v2beta/audio/stable-audio-2/inpaint",
+                method=HttpMethod.POST,
+                request_model=StabilityAudioInpaintRequest,
+                response_model=StabilityAudioResponse,
+            ),
+            request=payload,
            content_type="multipart/form-data",
            files={"audio": audio_input_to_mp3(audio)},
+            auth_kwargs={
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
        )
+        response_api = await operation.execute()
        if not response_api.audio:
            raise ValueError("No audio file was received in response.")
        return IO.NodeOutput(audio_bytes_to_audio_input(base64.b64decode(response_api.audio)))
--- a/comfy_api_nodes/nodes_topaz.py
+++ b/comfy_api_nodes/nodes_topaz.py
@@ -1,421 +0,0 @@
-import builtins
-from io import BytesIO
-
-import aiohttp
-import torch
-from typing_extensions import override
-
-from comfy_api.input.video_types import VideoInput
-from comfy_api.latest import IO, ComfyExtension
-from comfy_api_nodes.apis import topaz_api
-from comfy_api_nodes.util import (
-    ApiEndpoint,
-    download_url_to_image_tensor,
-    download_url_to_video_output,
-    get_fs_object_size,
-    get_number_of_images,
-    poll_op,
-    sync_op,
-    upload_images_to_comfyapi,
-    validate_container_format_is_mp4,
-)
-
-UPSCALER_MODELS_MAP = {
-    "Starlight (Astra) Fast": "slf-1",
-    "Starlight (Astra) Creative": "slc-1",
-}
-UPSCALER_VALUES_MAP = {
-    "FullHD (1080p)": 1920,
-    "4K (2160p)": 3840,
-}
-
-
-class TopazImageEnhance(IO.ComfyNode):
-    @classmethod
-    def define_schema(cls):
-        return IO.Schema(
-            node_id="TopazImageEnhance",
-            display_name="Topaz Image Enhance",
-            category="api node/image/Topaz",
-            description="Industry-standard upscaling and image enhancement.",
-            inputs=[
-                IO.Combo.Input("model", options=["Reimagine"]),
-                IO.Image.Input("image"),
-                IO.String.Input(
-                    "prompt",
-                    multiline=True,
-                    default="",
-                    tooltip="Optional text prompt for creative upscaling guidance.",
-                    optional=True,
-                ),
-                IO.Combo.Input(
-                    "subject_detection",
-                    options=["All", "Foreground", "Background"],
-                    optional=True,
-                ),
-                IO.Boolean.Input(
-                    "face_enhancement",
-                    default=True,
-                    optional=True,
-                    tooltip="Enhance faces (if present) during processing.",
-                ),
-                IO.Float.Input(
-                    "face_enhancement_creativity",
-                    default=0.0,
-                    min=0.0,
-                    max=1.0,
-                    step=0.01,
-                    display_mode=IO.NumberDisplay.number,
-                    optional=True,
-                    tooltip="Set the creativity level for face enhancement.",
-                ),
-                IO.Float.Input(
-                    "face_enhancement_strength",
-                    default=1.0,
-                    min=0.0,
-                    max=1.0,
-                    step=0.01,
-                    display_mode=IO.NumberDisplay.number,
-                    optional=True,
-                    tooltip="Controls how sharp enhanced faces are relative to the background.",
-                ),
-                IO.Boolean.Input(
-                    "crop_to_fill",
-                    default=False,
-                    optional=True,
-                    tooltip="By default, the image is letterboxed when the output aspect ratio differs. "
-                    "Enable to crop the image to fill the output dimensions.",
-                ),
-                IO.Int.Input(
-                    "output_width",
-                    default=0,
-                    min=0,
-                    max=32000,
-                    step=1,
-                    display_mode=IO.NumberDisplay.number,
-                    optional=True,
-                    tooltip="Zero value means to calculate automatically (usually it will be original size or output_height if specified).",
-                ),
-                IO.Int.Input(
-                    "output_height",
-                    default=0,
-                    min=0,
-                    max=32000,
-                    step=1,
-                    display_mode=IO.NumberDisplay.number,
-                    optional=True,
-                    tooltip="Zero value means to output in the same height as original or output width.",
-                ),
-                IO.Int.Input(
-                    "creativity",
-                    default=3,
-                    min=1,
-                    max=9,
-                    step=1,
-                    display_mode=IO.NumberDisplay.slider,
-                    optional=True,
-                ),
-                IO.Boolean.Input(
-                    "face_preservation",
-                    default=True,
-                    optional=True,
-                    tooltip="Preserve subjects' facial identity.",
-                ),
-                IO.Boolean.Input(
-                    "color_preservation",
-                    default=True,
-                    optional=True,
-                    tooltip="Preserve the original colors.",
-                ),
-            ],
-            outputs=[
-                IO.Image.Output(),
-            ],
-            hidden=[
-                IO.Hidden.auth_token_comfy_org,
-                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
-            ],
-            is_api_node=True,
-        )
-
-    @classmethod
-    async def execute(
-        cls,
-        model: str,
-        image: torch.Tensor,
-        prompt: str = "",
-        subject_detection: str = "All",
-        face_enhancement: bool = True,
-        face_enhancement_creativity: float = 1.0,
-        face_enhancement_strength: float = 0.8,
-        crop_to_fill: bool = False,
-        output_width: int = 0,
-        output_height: int = 0,
-        creativity: int = 3,
-        face_preservation: bool = True,
-        color_preservation: bool = True,
-    ) -> IO.NodeOutput:
-        if get_number_of_images(image) != 1:
-            raise ValueError("Only one input image is supported.")
-        download_url = await upload_images_to_comfyapi(cls, image, max_images=1, mime_type="image/png")
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/topaz/image/v1/enhance-gen/async", method="POST"),
-            response_model=topaz_api.ImageAsyncTaskResponse,
-            data=topaz_api.ImageEnhanceRequest(
-                model=model,
-                prompt=prompt,
-                subject_detection=subject_detection,
-                face_enhancement=face_enhancement,
-                face_enhancement_creativity=face_enhancement_creativity,
-                face_enhancement_strength=face_enhancement_strength,
-                crop_to_fill=crop_to_fill,
-                output_width=output_width if output_width else None,
-                output_height=output_height if output_height else None,
-                creativity=creativity,
-                face_preservation=str(face_preservation).lower(),
-                color_preservation=str(color_preservation).lower(),
-                source_url=download_url[0],
-                output_format="png",
-            ),
-            content_type="multipart/form-data",
-        )
-
-        await poll_op(
-            cls,
-            poll_endpoint=ApiEndpoint(path=f"/proxy/topaz/image/v1/status/{initial_response.process_id}"),
-            response_model=topaz_api.ImageStatusResponse,
-            status_extractor=lambda x: x.status,
-            progress_extractor=lambda x: getattr(x, "progress", 0),
-            price_extractor=lambda x: x.credits * 0.08,
-            poll_interval=8.0,
-            max_poll_attempts=160,
-            estimated_duration=60,
-        )
-
-        results = await sync_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/topaz/image/v1/download/{initial_response.process_id}"),
-            response_model=topaz_api.ImageDownloadResponse,
-            monitor_progress=False,
-        )
-        return IO.NodeOutput(await download_url_to_image_tensor(results.download_url))
-
-
-class TopazVideoEnhance(IO.ComfyNode):
-    @classmethod
-    def define_schema(cls):
-        return IO.Schema(
-            node_id="TopazVideoEnhance",
-            display_name="Topaz Video Enhance",
-            category="api node/video/Topaz",
-            description="Breathe new life into video with powerful upscaling and recovery technology.",
-            inputs=[
-                IO.Video.Input("video"),
-                IO.Boolean.Input("upscaler_enabled", default=True),
-                IO.Combo.Input("upscaler_model", options=list(UPSCALER_MODELS_MAP.keys())),
-                IO.Combo.Input("upscaler_resolution", options=list(UPSCALER_VALUES_MAP.keys())),
-                IO.Combo.Input(
-                    "upscaler_creativity",
-                    options=["low", "middle", "high"],
-                    default="low",
-                    tooltip="Creativity level (applies only to Starlight (Astra) Creative).",
-                    optional=True,
-                ),
-                IO.Boolean.Input("interpolation_enabled", default=False, optional=True),
-                IO.Combo.Input("interpolation_model", options=["apo-8"], default="apo-8", optional=True),
-                IO.Int.Input(
-                    "interpolation_slowmo",
-                    default=1,
-                    min=1,
-                    max=16,
-                    display_mode=IO.NumberDisplay.number,
-                    tooltip="Slow-motion factor applied to the input video. "
-                    "For example, 2 makes the output twice as slow and doubles the duration.",
-                    optional=True,
-                ),
-                IO.Int.Input(
-                    "interpolation_frame_rate",
-                    default=60,
-                    min=15,
-                    max=240,
-                    display_mode=IO.NumberDisplay.number,
-                    tooltip="Output frame rate.",
-                    optional=True,
-                ),
-                IO.Boolean.Input(
-                    "interpolation_duplicate",
-                    default=False,
-                    tooltip="Analyze the input for duplicate frames and remove them.",
-                    optional=True,
-                ),
-                IO.Float.Input(
-                    "interpolation_duplicate_threshold",
-                    default=0.01,
-                    min=0.001,
-                    max=0.1,
-                    step=0.001,
-                    display_mode=IO.NumberDisplay.number,
-                    tooltip="Detection sensitivity for duplicate frames.",
-                    optional=True,
-                ),
-                IO.Combo.Input(
-                    "dynamic_compression_level",
-                    options=["Low", "Mid", "High"],
-                    default="Low",
-                    tooltip="CQP level.",
-                    optional=True,
-                ),
-            ],
-            outputs=[
-                IO.Video.Output(),
-            ],
-            hidden=[
-                IO.Hidden.auth_token_comfy_org,
-                IO.Hidden.api_key_comfy_org,
-                IO.Hidden.unique_id,
-            ],
-            is_api_node=True,
-        )
-
-    @classmethod
-    async def execute(
-        cls,
-        video: VideoInput,
-        upscaler_enabled: bool,
-        upscaler_model: str,
-        upscaler_resolution: str,
-        upscaler_creativity: str = "low",
-        interpolation_enabled: bool = False,
-        interpolation_model: str = "apo-8",
-        interpolation_slowmo: int = 1,
-        interpolation_frame_rate: int = 60,
-        interpolation_duplicate: bool = False,
-        interpolation_duplicate_threshold: float = 0.01,
-        dynamic_compression_level: str = "Low",
-    ) -> IO.NodeOutput:
-        if upscaler_enabled is False and interpolation_enabled is False:
-            raise ValueError("There is nothing to do: both upscaling and interpolation are disabled.")
-        src_width, src_height = video.get_dimensions()
-        video_components = video.get_components()
-        src_frame_rate = int(video_components.frame_rate)
-        duration_sec = video.get_duration()
-        estimated_frames = int(duration_sec * src_frame_rate)
-        validate_container_format_is_mp4(video)
-        src_video_stream = video.get_stream_source()
-        target_width = src_width
-        target_height = src_height
-        target_frame_rate = src_frame_rate
-        filters = []
-        if upscaler_enabled:
-            target_width = UPSCALER_VALUES_MAP[upscaler_resolution]
-            target_height = UPSCALER_VALUES_MAP[upscaler_resolution]
-            filters.append(
-                topaz_api.VideoEnhancementFilter(
-                    model=UPSCALER_MODELS_MAP[upscaler_model],
-                    creativity=(upscaler_creativity if UPSCALER_MODELS_MAP[upscaler_model] == "slc-1" else None),
-                    isOptimizedMode=(True if UPSCALER_MODELS_MAP[upscaler_model] == "slc-1" else None),
-                ),
-            )
-        if interpolation_enabled:
-            target_frame_rate = interpolation_frame_rate
-            filters.append(
-                topaz_api.VideoFrameInterpolationFilter(
-                    model=interpolation_model,
-                    slowmo=interpolation_slowmo,
-                    fps=interpolation_frame_rate,
-                    duplicate=interpolation_duplicate,
-                    duplicate_threshold=interpolation_duplicate_threshold,
-                ),
-            )
-        initial_res = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/topaz/video/", method="POST"),
-            response_model=topaz_api.CreateVideoResponse,
-            data=topaz_api.CreateVideoRequest(
-                source=topaz_api.CreateCreateVideoRequestSource(
-                    container="mp4",
-                    size=get_fs_object_size(src_video_stream),
-                    duration=int(duration_sec),
-                    frameCount=estimated_frames,
-                    frameRate=src_frame_rate,
-                    resolution=topaz_api.Resolution(width=src_width, height=src_height),
-                ),
-                filters=filters,
-                output=topaz_api.OutputInformationVideo(
-                    resolution=topaz_api.Resolution(width=target_width, height=target_height),
-                    frameRate=target_frame_rate,
-                    audioCodec="AAC",
-                    audioTransfer="Copy",
-                    dynamicCompressionLevel=dynamic_compression_level,
-                ),
-            ),
-            wait_label="Creating task",
-            final_label_on_success="Task created",
-        )
-        upload_res = await sync_op(
-            cls,
-            ApiEndpoint(
-                path=f"/proxy/topaz/video/{initial_res.requestId}/accept",
-                method="PATCH",
-            ),
-            response_model=topaz_api.VideoAcceptResponse,
-            wait_label="Preparing upload",
-            final_label_on_success="Upload started",
-        )
-        if len(upload_res.urls) > 1:
-            raise NotImplementedError(
-                "Large files are not currently supported. Please open an issue in the ComfyUI repository."
-            )
-        async with aiohttp.ClientSession(headers={"Content-Type": "video/mp4"}) as session:
-            if isinstance(src_video_stream, BytesIO):
-                src_video_stream.seek(0)
-                async with session.put(upload_res.urls[0], data=src_video_stream, raise_for_status=True) as res:
-                    upload_etag = res.headers["Etag"]
-            else:
-                with builtins.open(src_video_stream, "rb") as video_file:
-                    async with session.put(upload_res.urls[0], data=video_file, raise_for_status=True) as res:
-                        upload_etag = res.headers["Etag"]
-        await sync_op(
-            cls,
-            ApiEndpoint(
-                path=f"/proxy/topaz/video/{initial_res.requestId}/complete-upload",
-                method="PATCH",
-            ),
-            response_model=topaz_api.VideoCompleteUploadResponse,
-            data=topaz_api.VideoCompleteUploadRequest(
-                uploadResults=[
-                    topaz_api.VideoCompleteUploadRequestPart(
-                        partNum=1,
-                        eTag=upload_etag,
-                    ),
-                ],
-            ),
-            wait_label="Finalizing upload",
-            final_label_on_success="Upload completed",
-        )
-        final_response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/topaz/video/{initial_res.requestId}/status"),
-            response_model=topaz_api.VideoStatusResponse,
-            status_extractor=lambda x: x.status,
-            progress_extractor=lambda x: getattr(x, "progress", 0),
-            price_extractor=lambda x: (x.estimates.cost[0] * 0.08 if x.estimates and x.estimates.cost[0] else None),
-            poll_interval=10.0,
-            max_poll_attempts=320,
-        )
-        return IO.NodeOutput(await download_url_to_video_output(final_response.download.url))
-
-
-class TopazExtension(ComfyExtension):
-    @override
-    async def get_node_list(self) -> list[type[IO.ComfyNode]]:
-        return [
-            TopazImageEnhance,
-            TopazVideoEnhance,
-        ]
-
-
-async def comfy_entrypoint() -> TopazExtension:
-    return TopazExtension()
--- a/comfy_api_nodes/nodes_tripo.py
+++ b/comfy_api_nodes/nodes_tripo.py
--- a/comfy_api_nodes/nodes_veo2.py
+++ b/comfy_api_nodes/nodes_veo2.py
@@ -1,21 +1,28 @@
+import logging
 import base64
+import aiohttp
+import torch
 from io import BytesIO
-
+from typing import Optional
 from typing_extensions import override

+from comfy_api.latest import ComfyExtension, IO
 from comfy_api.input_impl.video_types import VideoFromFile
-from comfy_api.latest import IO, ComfyExtension
-from comfy_api_nodes.apis.veo_api import (
-    VeoGenVidPollRequest,
-    VeoGenVidPollResponse,
+from comfy_api_nodes.apis import (
    VeoGenVidRequest,
    VeoGenVidResponse,
+    VeoGenVidPollRequest,
+    VeoGenVidPollResponse,
 )
-from comfy_api_nodes.util import (
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    download_url_to_video_output,
-    poll_op,
-    sync_op,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+)
+
+from comfy_api_nodes.apinode_utils import (
+    downscale_image_tensor,
    tensor_to_base64_string,
 )

@@ -28,6 +35,28 @@ MODELS_MAP = {
    "veo-3.0-fast-generate-001": "veo-3.0-fast-generate-001",
 }

+def convert_image_to_base64(image: torch.Tensor):
+    if image is None:
+        return None
+
+    scaled_image = downscale_image_tensor(image, total_pixels=2048*2048)
+    return tensor_to_base64_string(scaled_image)
+
+
+def get_video_url_from_response(poll_response: VeoGenVidPollResponse) -> Optional[str]:
+    if (
+        poll_response.response
+        and hasattr(poll_response.response, "videos")
+        and poll_response.response.videos
+        and len(poll_response.response.videos) > 0
+    ):
+        video = poll_response.response.videos[0]
+    else:
+        return None
+    if hasattr(video, "gcsUri") and video.gcsUri:
+        return str(video.gcsUri)
+    return None
+

 class VeoVideoGenerationNode(IO.ComfyNode):
    """
@@ -140,13 +169,18 @@ class VeoVideoGenerationNode(IO.ComfyNode):
        # Prepare the instances for the request
        instances = []

-        instance = {"prompt": prompt}
+        instance = {
+            "prompt": prompt
+        }

        # Add image if provided
        if image is not None:
-            image_base64 = tensor_to_base64_string(image)
+            image_base64 = convert_image_to_base64(image)
            if image_base64:
-                instance["image"] = {"bytesBase64Encoded": image_base64, "mimeType": "image/png"}
+                instance["image"] = {
+                    "bytesBase64Encoded": image_base64,
+                    "mimeType": "image/png"
+                }

        instances.append(instance)

@@ -164,77 +198,119 @@ class VeoVideoGenerationNode(IO.ComfyNode):
        if seed > 0:
            parameters["seed"] = seed
        # Only add generateAudio for Veo 3 models
-        if model.find("veo-2.0") == -1:
+        if "veo-3.0" in model:
            parameters["generateAudio"] = generate_audio

-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/veo/{model}/generate", method="POST"),
-            response_model=VeoGenVidResponse,
-            data=VeoGenVidRequest(
-                instances=instances,
-                parameters=parameters,
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        # Initial request to start video generation
+        initial_operation = SynchronousOperation(
+            endpoint=ApiEndpoint(
+                path=f"/proxy/veo/{model}/generate",
+                method=HttpMethod.POST,
+                request_model=VeoGenVidRequest,
+                response_model=VeoGenVidResponse
            ),
+            request=VeoGenVidRequest(
+                instances=instances,
+                parameters=parameters
+            ),
+            auth_kwargs=auth,
        )

+        initial_response = await initial_operation.execute()
+        operation_name = initial_response.name
+
+        logging.info("Veo generation started with operation name: %s", operation_name)
+
+        # Define status extractor function
        def status_extractor(response):
            # Only return "completed" if the operation is done, regardless of success or failure
            # We'll check for errors after polling completes
            return "completed" if response.done else "pending"

-        poll_response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/veo/{model}/poll", method="POST"),
-            response_model=VeoGenVidPollResponse,
-            status_extractor=status_extractor,
-            data=VeoGenVidPollRequest(
-                operationName=initial_response.name,
+        # Define progress extractor function
+        def progress_extractor(response):
+            # Could be enhanced if the API provides progress information
+            return None
+
+        # Define the polling operation
+        poll_operation = PollingOperation(
+            poll_endpoint=ApiEndpoint(
+                path=f"/proxy/veo/{model}/poll",
+                method=HttpMethod.POST,
+                request_model=VeoGenVidPollRequest,
+                response_model=VeoGenVidPollResponse
            ),
+            completed_statuses=["completed"],
+            failed_statuses=[],  # No failed statuses, we'll handle errors after polling
+            status_extractor=status_extractor,
+            progress_extractor=progress_extractor,
+            request=VeoGenVidPollRequest(
+                operationName=operation_name
+            ),
+            auth_kwargs=auth,
            poll_interval=5.0,
+            result_url_extractor=get_video_url_from_response,
+            node_id=cls.hidden.unique_id,
            estimated_duration=AVERAGE_DURATION_VIDEO_GEN,
        )

+        # Execute the polling operation
+        poll_response = await poll_operation.execute()
+
        # Now check for errors in the final response
        # Check for error in poll response
-        if poll_response.error:
-            raise Exception(f"Veo API error: {poll_response.error.message} (code: {poll_response.error.code})")
+        if hasattr(poll_response, 'error') and poll_response.error:
+            error_message = f"Veo API error: {poll_response.error.message} (code: {poll_response.error.code})"
+            logging.error(error_message)
+            raise Exception(error_message)

        # Check for RAI filtered content
-        if (
-            hasattr(poll_response.response, "raiMediaFilteredCount")
-            and poll_response.response.raiMediaFilteredCount > 0
-        ):
+        if (hasattr(poll_response.response, 'raiMediaFilteredCount') and
+            poll_response.response.raiMediaFilteredCount > 0):

            # Extract reason message if available
-            if (
-                hasattr(poll_response.response, "raiMediaFilteredReasons")
-                and poll_response.response.raiMediaFilteredReasons
-            ):
+            if (hasattr(poll_response.response, 'raiMediaFilteredReasons') and
+                poll_response.response.raiMediaFilteredReasons):
                reason = poll_response.response.raiMediaFilteredReasons[0]
                error_message = f"Content filtered by Google's Responsible AI practices: {reason} ({poll_response.response.raiMediaFilteredCount} videos filtered.)"
            else:
                error_message = f"Content filtered by Google's Responsible AI practices ({poll_response.response.raiMediaFilteredCount} videos filtered.)"

+            logging.error(error_message)
            raise Exception(error_message)

        # Extract video data
-        if (
-            poll_response.response
-            and hasattr(poll_response.response, "videos")
-            and poll_response.response.videos
-            and len(poll_response.response.videos) > 0
-        ):
+        if poll_response.response and hasattr(poll_response.response, 'videos') and poll_response.response.videos and len(poll_response.response.videos) > 0:
            video = poll_response.response.videos[0]

            # Check if video is provided as base64 or URL
-            if hasattr(video, "bytesBase64Encoded") and video.bytesBase64Encoded:
-                return IO.NodeOutput(VideoFromFile(BytesIO(base64.b64decode(video.bytesBase64Encoded))))
+            if hasattr(video, 'bytesBase64Encoded') and video.bytesBase64Encoded:
+                # Decode base64 string to bytes
+                video_data = base64.b64decode(video.bytesBase64Encoded)
+            elif hasattr(video, 'gcsUri') and video.gcsUri:
+                # Download from URL
+                async with aiohttp.ClientSession() as session:
+                    async with session.get(video.gcsUri) as video_response:
+                        video_data = await video_response.content.read()
+            else:
+                raise Exception("Video returned but no data or URL was provided")
+        else:
+            raise Exception("Video generation completed but no video was returned")

-            if hasattr(video, "gcsUri") and video.gcsUri:
-                return IO.NodeOutput(await download_url_to_video_output(video.gcsUri))
+        if not video_data:
+            raise Exception("No video data was returned")

-            raise Exception("Video returned but no data or URL was provided")
-        raise Exception("Video generation completed but no video was returned")
+        logging.info("Video generation completed successfully")
+
+        # Convert video data to BytesIO object
+        video_io = BytesIO(video_data)
+
+        # Return VideoFromFile object
+        return IO.NodeOutput(VideoFromFile(video_io))


 class Veo3VideoGenerationNode(VeoVideoGenerationNode):
@@ -317,12 +393,7 @@ class Veo3VideoGenerationNode(VeoVideoGenerationNode):
                ),
                IO.Combo.Input(
                    "model",
-                    options=[
-                        "veo-3.1-generate",
-                        "veo-3.1-fast-generate",
-                        "veo-3.0-generate-001",
-                        "veo-3.0-fast-generate-001",
-                    ],
+                    options=list(MODELS_MAP.keys()),
                    default="veo-3.0-generate-001",
                    tooltip="Veo 3 model to use for video generation",
                    optional=True,
@@ -354,6 +425,5 @@ class VeoExtension(ComfyExtension):
            Veo3VideoGenerationNode,
        ]

-
 async def comfy_entrypoint() -> VeoExtension:
    return VeoExtension()
--- a/comfy_api_nodes/nodes_vidu.py
+++ b/comfy_api_nodes/nodes_vidu.py
@@ -1,23 +1,27 @@
 import logging
 from enum import Enum
-from typing import Literal, Optional, TypeVar
+from typing import Any, Callable, Optional, Literal, TypeVar
+from typing_extensions import override

 import torch
 from pydantic import BaseModel, Field
-from typing_extensions import override

-from comfy_api.latest import IO, ComfyExtension
-from comfy_api_nodes.util import (
-    ApiEndpoint,
-    download_url_to_video_output,
-    get_number_of_images,
-    poll_op,
-    sync_op,
-    upload_images_to_comfyapi,
-    validate_image_aspect_ratio,
+from comfy_api.latest import ComfyExtension, IO
+from comfy_api_nodes.util.validation_utils import (
+    validate_aspect_ratio_closeness,
    validate_image_dimensions,
-    validate_images_aspect_ratio_closeness,
+    validate_image_aspect_ratio_range,
+    get_number_of_images,
 )
+from comfy_api_nodes.apis.client import (
+    ApiEndpoint,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+)
+from comfy_api_nodes.apinode_utils import download_url_to_video_output, upload_images_to_comfyapi
+

 VIDU_TEXT_TO_VIDEO = "/proxy/vidu/text2video"
 VIDU_IMAGE_TO_VIDEO = "/proxy/vidu/img2video"
@@ -27,9 +31,8 @@ VIDU_GET_GENERATION_STATUS = "/proxy/vidu/tasks/%s/creations"

 R = TypeVar("R")

-
 class VideoModelName(str, Enum):
-    vidu_q1 = "viduq1"
+    vidu_q1 = 'viduq1'


 class AspectRatio(str, Enum):
@@ -60,9 +63,17 @@ class TaskCreationRequest(BaseModel):
    images: Optional[list[str]] = Field(None, description="Base64 encoded string or image URL")


+class TaskStatus(str, Enum):
+    created = "created"
+    queueing = "queueing"
+    processing = "processing"
+    success = "success"
+    failed = "failed"
+
+
 class TaskCreationResponse(BaseModel):
    task_id: str = Field(...)
-    state: str = Field(...)
+    state: TaskStatus = Field(...)
    created_at: str = Field(...)
    code: Optional[int] = Field(None, description="Error code")

@@ -74,11 +85,32 @@ class TaskResult(BaseModel):


 class TaskStatusResponse(BaseModel):
-    state: str = Field(...)
+    state: TaskStatus = Field(...)
    err_code: Optional[str] = Field(None)
    creations: list[TaskResult] = Field(..., description="Generated results")


+async def poll_until_finished(
+    auth_kwargs: dict[str, str],
+    api_endpoint: ApiEndpoint[Any, R],
+    result_url_extractor: Optional[Callable[[R], str]] = None,
+    estimated_duration: Optional[int] = None,
+    node_id: Optional[str] = None,
+) -> R:
+    return await PollingOperation(
+        poll_endpoint=api_endpoint,
+        completed_statuses=[TaskStatus.success.value],
+        failed_statuses=[TaskStatus.failed.value],
+        status_extractor=lambda response: response.state.value,
+        auth_kwargs=auth_kwargs,
+        result_url_extractor=result_url_extractor,
+        estimated_duration=estimated_duration,
+        node_id=node_id,
+        poll_interval=16.0,
+        max_poll_attempts=256,
+    ).execute()
+
+
 def get_video_url_from_response(response) -> Optional[str]:
    if response.creations:
        return response.creations[0].url
@@ -95,27 +127,37 @@ def get_video_from_response(response) -> TaskResult:


 async def execute_task(
-    cls: type[IO.ComfyNode],
    vidu_endpoint: str,
+    auth_kwargs: Optional[dict[str, str]],
    payload: TaskCreationRequest,
    estimated_duration: int,
+    node_id: str,
 ) -> R:
-    response = await sync_op(
-        cls,
-        endpoint=ApiEndpoint(path=vidu_endpoint, method="POST"),
-        response_model=TaskCreationResponse,
-        data=payload,
-    )
-    if response.state == "failed":
+    response = await SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=vidu_endpoint,
+            method=HttpMethod.POST,
+            request_model=TaskCreationRequest,
+            response_model=TaskCreationResponse,
+        ),
+        request=payload,
+        auth_kwargs=auth_kwargs,
+    ).execute()
+    if response.state == TaskStatus.failed:
        error_msg = f"Vidu request failed. Code: {response.code}"
        logging.error(error_msg)
        raise RuntimeError(error_msg)
-    return await poll_op(
-        cls,
-        ApiEndpoint(path=VIDU_GET_GENERATION_STATUS % response.task_id),
-        response_model=TaskStatusResponse,
-        status_extractor=lambda r: r.state,
+    return await poll_until_finished(
+        auth_kwargs,
+        ApiEndpoint(
+            path=VIDU_GET_GENERATION_STATUS % response.task_id,
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=TaskStatusResponse,
+        ),
+        result_url_extractor=get_video_url_from_response,
        estimated_duration=estimated_duration,
+        node_id=node_id,
    )


@@ -216,7 +258,11 @@ class ViduTextToVideoNode(IO.ComfyNode):
            resolution=resolution,
            movement_amplitude=movement_amplitude,
        )
-        results = await execute_task(cls, VIDU_TEXT_TO_VIDEO, payload, 320)
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
+        results = await execute_task(VIDU_TEXT_TO_VIDEO, auth, payload, 320, cls.hidden.unique_id)
        return IO.NodeOutput(await download_url_to_video_output(get_video_from_response(results).url))


@@ -307,7 +353,7 @@ class ViduImageToVideoNode(IO.ComfyNode):
    ) -> IO.NodeOutput:
        if get_number_of_images(image) > 1:
            raise ValueError("Only one input image is allowed.")
-        validate_image_aspect_ratio(image, (1, 4), (4, 1))
+        validate_image_aspect_ratio_range(image, (1, 4), (4, 1))
        payload = TaskCreationRequest(
            model_name=model,
            prompt=prompt,
@@ -316,13 +362,17 @@ class ViduImageToVideoNode(IO.ComfyNode):
            resolution=resolution,
            movement_amplitude=movement_amplitude,
        )
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        payload.images = await upload_images_to_comfyapi(
-            cls,
            image,
            max_images=1,
            mime_type="image/png",
+            auth_kwargs=auth,
        )
-        results = await execute_task(cls, VIDU_IMAGE_TO_VIDEO, payload, 120)
+        results = await execute_task(VIDU_IMAGE_TO_VIDEO, auth, payload, 120, cls.hidden.unique_id)
        return IO.NodeOutput(await download_url_to_video_output(get_video_from_response(results).url))


@@ -423,7 +473,7 @@ class ViduReferenceVideoNode(IO.ComfyNode):
        if a > 7:
            raise ValueError("Too many images, maximum allowed is 7.")
        for image in images:
-            validate_image_aspect_ratio(image, (1, 4), (4, 1))
+            validate_image_aspect_ratio_range(image, (1, 4), (4, 1))
            validate_image_dimensions(image, min_width=128, min_height=128)
        payload = TaskCreationRequest(
            model_name=model,
@@ -434,13 +484,17 @@ class ViduReferenceVideoNode(IO.ComfyNode):
            resolution=resolution,
            movement_amplitude=movement_amplitude,
        )
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        payload.images = await upload_images_to_comfyapi(
-            cls,
            images,
            max_images=7,
            mime_type="image/png",
+            auth_kwargs=auth,
        )
-        results = await execute_task(cls, VIDU_REFERENCE_VIDEO, payload, 120)
+        results = await execute_task(VIDU_REFERENCE_VIDEO, auth, payload, 120, cls.hidden.unique_id)
        return IO.NodeOutput(await download_url_to_video_output(get_video_from_response(results).url))


@@ -533,7 +587,7 @@ class ViduStartEndToVideoNode(IO.ComfyNode):
        resolution: str,
        movement_amplitude: str,
    ) -> IO.NodeOutput:
-        validate_images_aspect_ratio_closeness(first_frame, end_frame, min_rel=0.8, max_rel=1.25, strict=False)
+        validate_aspect_ratio_closeness(first_frame, end_frame, min_rel=0.8, max_rel=1.25, strict=False)
        payload = TaskCreationRequest(
            model_name=model,
            prompt=prompt,
@@ -542,11 +596,15 @@ class ViduStartEndToVideoNode(IO.ComfyNode):
            resolution=resolution,
            movement_amplitude=movement_amplitude,
        )
+        auth = {
+            "auth_token": cls.hidden.auth_token_comfy_org,
+            "comfy_api_key": cls.hidden.api_key_comfy_org,
+        }
        payload.images = [
-            (await upload_images_to_comfyapi(cls, frame, max_images=1, mime_type="image/png"))[0]
+            (await upload_images_to_comfyapi(frame, max_images=1, mime_type="image/png", auth_kwargs=auth))[0]
            for frame in (first_frame, end_frame)
        ]
-        results = await execute_task(cls, VIDU_START_END_VIDEO, payload, 96)
+        results = await execute_task(VIDU_START_END_VIDEO, auth, payload, 96, cls.hidden.unique_id)
        return IO.NodeOutput(await download_url_to_video_output(get_video_from_response(results).url))


@@ -560,6 +618,5 @@ class ViduExtension(ComfyExtension):
            ViduStartEndToVideoNode,
        ]

-
 async def comfy_entrypoint() -> ViduExtension:
    return ViduExtension()
--- a/comfy_api_nodes/nodes_wan.py
+++ b/comfy_api_nodes/nodes_wan.py
@@ -1,24 +1,28 @@
 import re
-from typing import Optional
+from typing import Optional, Type, Union
+from typing_extensions import override

 import torch
 from pydantic import BaseModel, Field
-from typing_extensions import override
-
-from comfy_api.latest import IO, ComfyExtension, Input
-from comfy_api_nodes.util import (
+from comfy_api.latest import ComfyExtension, Input, IO
+from comfy_api_nodes.apis.client import (
    ApiEndpoint,
-    audio_to_base64_string,
+    HttpMethod,
+    SynchronousOperation,
+    PollingOperation,
+    EmptyRequest,
+    R,
+    T,
+)
+from comfy_api_nodes.util.validation_utils import get_number_of_images, validate_audio_duration
+
+from comfy_api_nodes.apinode_utils import (
    download_url_to_image_tensor,
    download_url_to_video_output,
-    get_number_of_images,
-    poll_op,
-    sync_op,
    tensor_to_base64_string,
-    validate_audio_duration,
+    audio_to_base64_string,
 )

-
 class Text2ImageInputField(BaseModel):
    prompt: str = Field(...)
    negative_prompt: Optional[str] = Field(None)
@@ -142,7 +146,53 @@ class VideoTaskStatusResponse(BaseModel):
    request_id: str = Field(...)


-RES_IN_PARENS = re.compile(r"\((\d+)\s*[x×]\s*(\d+)\)")
+RES_IN_PARENS = re.compile(r'\((\d+)\s*[x×]\s*(\d+)\)')
+
+
+async def process_task(
+    auth_kwargs: dict[str, str],
+    url: str,
+    request_model: Type[T],
+    response_model: Type[R],
+    payload: Union[
+        Text2ImageTaskCreationRequest,
+        Image2ImageTaskCreationRequest,
+        Text2VideoTaskCreationRequest,
+        Image2VideoTaskCreationRequest,
+    ],
+    node_id: str,
+    estimated_duration: int,
+    poll_interval: int,
+) -> Type[R]:
+    initial_response = await SynchronousOperation(
+        endpoint=ApiEndpoint(
+            path=url,
+            method=HttpMethod.POST,
+            request_model=request_model,
+            response_model=TaskCreationResponse,
+        ),
+        request=payload,
+        auth_kwargs=auth_kwargs,
+    ).execute()
+
+    if not initial_response.output:
+        raise Exception(f"Unknown error occurred: {initial_response.code} - {initial_response.message}")
+
+    return await PollingOperation(
+        poll_endpoint=ApiEndpoint(
+            path=f"/proxy/wan/api/v1/tasks/{initial_response.output.task_id}",
+            method=HttpMethod.GET,
+            request_model=EmptyRequest,
+            response_model=response_model,
+        ),
+        completed_statuses=["SUCCEEDED"],
+        failed_statuses=["FAILED", "CANCELED", "UNKNOWN"],
+        status_extractor=lambda x: x.output.task_status,
+        estimated_duration=estimated_duration,
+        poll_interval=poll_interval,
+        node_id=node_id,
+        auth_kwargs=auth_kwargs,
+    ).execute()


 class WanTextToImageApi(IO.ComfyNode):
@@ -209,7 +259,7 @@ class WanTextToImageApi(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the result.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the result.",
                    optional=True,
                ),
            ],
@@ -236,28 +286,26 @@ class WanTextToImageApi(IO.ComfyNode):
        prompt_extend: bool = True,
        watermark: bool = True,
    ):
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/wan/api/v1/services/aigc/text2image/image-synthesis", method="POST"),
-            response_model=TaskCreationResponse,
-            data=Text2ImageTaskCreationRequest(
-                model=model,
-                input=Text2ImageInputField(prompt=prompt, negative_prompt=negative_prompt),
-                parameters=Txt2ImageParametersField(
-                    size=f"{width}*{height}",
-                    seed=seed,
-                    prompt_extend=prompt_extend,
-                    watermark=watermark,
-                ),
+        payload = Text2ImageTaskCreationRequest(
+            model=model,
+            input=Text2ImageInputField(prompt=prompt, negative_prompt=negative_prompt),
+            parameters=Txt2ImageParametersField(
+                size=f"{width}*{height}",
+                seed=seed,
+                prompt_extend=prompt_extend,
+                watermark=watermark,
            ),
        )
-        if not initial_response.output:
-            raise Exception(f"Unknown error occurred: {initial_response.code} - {initial_response.message}")
-        response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/wan/api/v1/tasks/{initial_response.output.task_id}"),
+        response = await process_task(
+            {
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            "/proxy/wan/api/v1/services/aigc/text2image/image-synthesis",
+            request_model=Text2ImageTaskCreationRequest,
            response_model=ImageTaskStatusResponse,
-            status_extractor=lambda x: x.output.task_status,
+            payload=payload,
+            node_id=cls.hidden.unique_id,
            estimated_duration=9,
            poll_interval=3,
        )
@@ -272,7 +320,7 @@ class WanImageToImageApi(IO.ComfyNode):
            display_name="Wan Image to Image",
            category="api node/image/Wan",
            description="Generates an image from one or two input images and a text prompt. "
-            "The output image is currently fixed at 1.6 MP; its aspect ratio matches the input image(s).",
+                        "The output image is currently fixed at 1.6 MP; its aspect ratio matches the input image(s).",
            inputs=[
                IO.Combo.Input(
                    "model",
@@ -328,7 +376,7 @@ class WanImageToImageApi(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the result.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the result.",
                    optional=True,
                ),
            ],
@@ -360,30 +408,28 @@ class WanImageToImageApi(IO.ComfyNode):
            raise ValueError(f"Expected 1 or 2 input images, got {n_images}.")
        images = []
        for i in image:
-            images.append("data:image/png;base64," + tensor_to_base64_string(i, total_pixels=4096 * 4096))
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/wan/api/v1/services/aigc/image2image/image-synthesis", method="POST"),
-            response_model=TaskCreationResponse,
-            data=Image2ImageTaskCreationRequest(
-                model=model,
-                input=Image2ImageInputField(prompt=prompt, negative_prompt=negative_prompt, images=images),
-                parameters=Image2ImageParametersField(
-                    # size=f"{width}*{height}",
-                    seed=seed,
-                    watermark=watermark,
-                ),
+            images.append("data:image/png;base64," + tensor_to_base64_string(i, total_pixels=4096*4096))
+        payload = Image2ImageTaskCreationRequest(
+            model=model,
+            input=Image2ImageInputField(prompt=prompt, negative_prompt=negative_prompt, images=images),
+            parameters=Image2ImageParametersField(
+                # size=f"{width}*{height}",
+                seed=seed,
+                watermark=watermark,
            ),
        )
-        if not initial_response.output:
-            raise Exception(f"Unknown error occurred: {initial_response.code} - {initial_response.message}")
-        response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/wan/api/v1/tasks/{initial_response.output.task_id}"),
+        response = await process_task(
+            {
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            "/proxy/wan/api/v1/services/aigc/image2image/image-synthesis",
+            request_model=Image2ImageTaskCreationRequest,
            response_model=ImageTaskStatusResponse,
-            status_extractor=lambda x: x.output.task_status,
+            payload=payload,
+            node_id=cls.hidden.unique_id,
            estimated_duration=42,
-            poll_interval=4,
+            poll_interval=3,
        )
        return IO.NodeOutput(await download_url_to_image_tensor(str(response.output.results[0].url)))

@@ -477,7 +523,7 @@ class WanTextToVideoApi(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the result.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the result.",
                    optional=True,
                ),
            ],
@@ -511,31 +557,28 @@ class WanTextToVideoApi(IO.ComfyNode):
        if audio is not None:
            validate_audio_duration(audio, 3.0, 29.0)
            audio_url = "data:audio/mp3;base64," + audio_to_base64_string(audio, "mp3", "libmp3lame")
-
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/wan/api/v1/services/aigc/video-generation/video-synthesis", method="POST"),
-            response_model=TaskCreationResponse,
-            data=Text2VideoTaskCreationRequest(
-                model=model,
-                input=Text2VideoInputField(prompt=prompt, negative_prompt=negative_prompt, audio_url=audio_url),
-                parameters=Text2VideoParametersField(
-                    size=f"{width}*{height}",
-                    duration=duration,
-                    seed=seed,
-                    audio=generate_audio,
-                    prompt_extend=prompt_extend,
-                    watermark=watermark,
-                ),
+        payload = Text2VideoTaskCreationRequest(
+            model=model,
+            input=Text2VideoInputField(prompt=prompt, negative_prompt=negative_prompt, audio_url=audio_url),
+            parameters=Text2VideoParametersField(
+                size=f"{width}*{height}",
+                duration=duration,
+                seed=seed,
+                audio=generate_audio,
+                prompt_extend=prompt_extend,
+                watermark=watermark,
            ),
        )
-        if not initial_response.output:
-            raise Exception(f"Unknown error occurred: {initial_response.code} - {initial_response.message}")
-        response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/wan/api/v1/tasks/{initial_response.output.task_id}"),
+        response = await process_task(
+            {
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            "/proxy/wan/api/v1/services/aigc/video-generation/video-synthesis",
+            request_model=Text2VideoTaskCreationRequest,
            response_model=VideoTaskStatusResponse,
-            status_extractor=lambda x: x.output.task_status,
+            payload=payload,
+            node_id=cls.hidden.unique_id,
            estimated_duration=120 * int(duration / 5),
            poll_interval=6,
        )
@@ -624,7 +667,7 @@ class WanImageToVideoApi(IO.ComfyNode):
                IO.Boolean.Input(
                    "watermark",
                    default=True,
-                    tooltip='Whether to add an "AI generated" watermark to the result.',
+                    tooltip="Whether to add an \"AI generated\" watermark to the result.",
                    optional=True,
                ),
            ],
@@ -656,37 +699,35 @@ class WanImageToVideoApi(IO.ComfyNode):
    ):
        if get_number_of_images(image) != 1:
            raise ValueError("Exactly one input image is required.")
-        image_url = "data:image/png;base64," + tensor_to_base64_string(image, total_pixels=2000 * 2000)
+        image_url = "data:image/png;base64," + tensor_to_base64_string(image, total_pixels=2000*2000)
        audio_url = None
        if audio is not None:
            validate_audio_duration(audio, 3.0, 29.0)
            audio_url = "data:audio/mp3;base64," + audio_to_base64_string(audio, "mp3", "libmp3lame")
-        initial_response = await sync_op(
-            cls,
-            ApiEndpoint(path="/proxy/wan/api/v1/services/aigc/video-generation/video-synthesis", method="POST"),
-            response_model=TaskCreationResponse,
-            data=Image2VideoTaskCreationRequest(
-                model=model,
-                input=Image2VideoInputField(
-                    prompt=prompt, negative_prompt=negative_prompt, img_url=image_url, audio_url=audio_url
-                ),
-                parameters=Image2VideoParametersField(
-                    resolution=resolution,
-                    duration=duration,
-                    seed=seed,
-                    audio=generate_audio,
-                    prompt_extend=prompt_extend,
-                    watermark=watermark,
-                ),
+        payload = Image2VideoTaskCreationRequest(
+            model=model,
+            input=Image2VideoInputField(
+                prompt=prompt, negative_prompt=negative_prompt, img_url=image_url, audio_url=audio_url
+            ),
+            parameters=Image2VideoParametersField(
+                resolution=resolution,
+                duration=duration,
+                seed=seed,
+                audio=generate_audio,
+                prompt_extend=prompt_extend,
+                watermark=watermark,
            ),
        )
-        if not initial_response.output:
-            raise Exception(f"Unknown error occurred: {initial_response.code} - {initial_response.message}")
-        response = await poll_op(
-            cls,
-            ApiEndpoint(path=f"/proxy/wan/api/v1/tasks/{initial_response.output.task_id}"),
+        response = await process_task(
+            {
+                "auth_token": cls.hidden.auth_token_comfy_org,
+                "comfy_api_key": cls.hidden.api_key_comfy_org,
+            },
+            "/proxy/wan/api/v1/services/aigc/video-generation/video-synthesis",
+            request_model=Image2VideoTaskCreationRequest,
            response_model=VideoTaskStatusResponse,
-            status_extractor=lambda x: x.output.task_status,
+            payload=payload,
+            node_id=cls.hidden.unique_id,
            estimated_duration=120 * int(duration / 5),
            poll_interval=6,
        )
--- a/comfy_api_nodes/util/__init__.py
+++ b/comfy_api_nodes/util/__init__.py
@@ -1,97 +0,0 @@
-from ._helpers import get_fs_object_size
-from .client import (
-    ApiEndpoint,
-    poll_op,
-    poll_op_raw,
-    sync_op,
-    sync_op_raw,
-)
-from .conversions import (
-    audio_bytes_to_audio_input,
-    audio_input_to_mp3,
-    audio_to_base64_string,
-    bytesio_to_image_tensor,
-    downscale_image_tensor,
-    image_tensor_pair_to_batch,
-    pil_to_bytesio,
-    resize_mask_to_image,
-    tensor_to_base64_string,
-    tensor_to_bytesio,
-    tensor_to_pil,
-    text_filepath_to_base64_string,
-    text_filepath_to_data_uri,
-    trim_video,
-    video_to_base64_string,
-)
-from .download_helpers import (
-    download_url_as_bytesio,
-    download_url_to_bytesio,
-    download_url_to_image_tensor,
-    download_url_to_video_output,
-)
-from .upload_helpers import (
-    upload_audio_to_comfyapi,
-    upload_file_to_comfyapi,
-    upload_images_to_comfyapi,
-    upload_video_to_comfyapi,
-)
-from .validation_utils import (
-    get_number_of_images,
-    validate_aspect_ratio_string,
-    validate_audio_duration,
-    validate_container_format_is_mp4,
-    validate_image_aspect_ratio,
-    validate_image_dimensions,
-    validate_images_aspect_ratio_closeness,
-    validate_string,
-    validate_video_dimensions,
-    validate_video_duration,
-)
-
-__all__ = [
-    # API client
-    "ApiEndpoint",
-    "poll_op",
-    "poll_op_raw",
-    "sync_op",
-    "sync_op_raw",
-    # Upload helpers
-    "upload_audio_to_comfyapi",
-    "upload_file_to_comfyapi",
-    "upload_images_to_comfyapi",
-    "upload_video_to_comfyapi",
-    # Download helpers
-    "download_url_as_bytesio",
-    "download_url_to_bytesio",
-    "download_url_to_image_tensor",
-    "download_url_to_video_output",
-    # Conversions
-    "audio_bytes_to_audio_input",
-    "audio_input_to_mp3",
-    "audio_to_base64_string",
-    "bytesio_to_image_tensor",
-    "downscale_image_tensor",
-    "image_tensor_pair_to_batch",
-    "pil_to_bytesio",
-    "resize_mask_to_image",
-    "tensor_to_base64_string",
-    "tensor_to_bytesio",
-    "tensor_to_pil",
-    "text_filepath_to_base64_string",
-    "text_filepath_to_data_uri",
-    "trim_video",
-    "video_to_base64_string",
-    # Validation utilities
-    "get_number_of_images",
-    "validate_aspect_ratio_string",
-    "validate_audio_duration",
-    "validate_container_format_is_mp4",
-    "validate_image_aspect_ratio",
-    "validate_image_dimensions",
-    "validate_images_aspect_ratio_closeness",
-    "validate_string",
-    "validate_video_dimensions",
-    "validate_video_duration",
-    # Misc functions
-    "get_fs_object_size",
-]
--- a/comfy_api_nodes/util/_helpers.py
+++ b/comfy_api_nodes/util/_helpers.py
@@ -1,71 +0,0 @@
-import asyncio
-import contextlib
-import os
-import time
-from io import BytesIO
-from typing import Callable, Optional, Union
-
-from comfy.cli_args import args
-from comfy.model_management import processing_interrupted
-from comfy_api.latest import IO
-
-from .common_exceptions import ProcessingInterrupted
-
-
-def is_processing_interrupted() -> bool:
-    """Return True if user/runtime requested interruption."""
-    return processing_interrupted()
-
-
-def get_node_id(node_cls: type[IO.ComfyNode]) -> str:
-    return node_cls.hidden.unique_id
-
-
-def get_auth_header(node_cls: type[IO.ComfyNode]) -> dict[str, str]:
-    if node_cls.hidden.auth_token_comfy_org:
-        return {"Authorization": f"Bearer {node_cls.hidden.auth_token_comfy_org}"}
-    if node_cls.hidden.api_key_comfy_org:
-        return {"X-API-KEY": node_cls.hidden.api_key_comfy_org}
-    return {}
-
-
-def default_base_url() -> str:
-    return getattr(args, "comfy_api_base", "https://api.comfy.org")
-
-
-async def sleep_with_interrupt(
-    seconds: float,
-    node_cls: Optional[type[IO.ComfyNode]],
-    label: Optional[str] = None,
-    start_ts: Optional[float] = None,
-    estimated_total: Optional[int] = None,
-    *,
-    display_callback: Optional[Callable[[type[IO.ComfyNode], str, int, Optional[int]], None]] = None,
-):
-    """
-    Sleep in 1s slices while:
-      - Checking for interruption (raises ProcessingInterrupted).
-      - Optionally emitting time progress via display_callback (if provided).
-    """
-    end = time.monotonic() + seconds
-    while True:
-        if is_processing_interrupted():
-            raise ProcessingInterrupted("Task cancelled")
-        now = time.monotonic()
-        if start_ts is not None and label and display_callback:
-            with contextlib.suppress(Exception):
-                display_callback(node_cls, label, int(now - start_ts), estimated_total)
-        if now >= end:
-            break
-        await asyncio.sleep(min(1.0, end - now))
-
-
-def mimetype_to_extension(mime_type: str) -> str:
-    """Converts a MIME type to a file extension."""
-    return mime_type.split("/")[-1].lower()
-
-
-def get_fs_object_size(path_or_object: Union[str, BytesIO]) -> int:
-    if isinstance(path_or_object, str):
-        return os.path.getsize(path_or_object)
-    return len(path_or_object.getvalue())
--- a/comfy_api_nodes/util/client.py
+++ b/comfy_api_nodes/util/client.py
@@ -1,946 +0,0 @@
-import asyncio
-import contextlib
-import json
-import logging
-import time
-import uuid
-from dataclasses import dataclass
-from enum import Enum
-from io import BytesIO
-from typing import Any, Callable, Iterable, Literal, Optional, Type, TypeVar, Union
-from urllib.parse import urljoin, urlparse
-
-import aiohttp
-from aiohttp.client_exceptions import ClientError, ContentTypeError
-from pydantic import BaseModel
-
-from comfy import utils
-from comfy_api.latest import IO
-from server import PromptServer
-
-from . import request_logger
-from ._helpers import (
-    default_base_url,
-    get_auth_header,
-    get_node_id,
-    is_processing_interrupted,
-    sleep_with_interrupt,
-)
-from .common_exceptions import ApiServerError, LocalNetworkError, ProcessingInterrupted
-
-M = TypeVar("M", bound=BaseModel)
-
-
-class ApiEndpoint:
-    def __init__(
-        self,
-        path: str,
-        method: Literal["GET", "POST", "PUT", "DELETE", "PATCH"] = "GET",
-        *,
-        query_params: Optional[dict[str, Any]] = None,
-        headers: Optional[dict[str, str]] = None,
-    ):
-        self.path = path
-        self.method = method
-        self.query_params = query_params or {}
-        self.headers = headers or {}
-
-
-@dataclass
-class _RequestConfig:
-    node_cls: type[IO.ComfyNode]
-    endpoint: ApiEndpoint
-    timeout: float
-    content_type: str
-    data: Optional[dict[str, Any]]
-    files: Optional[Union[dict[str, Any], list[tuple[str, Any]]]]
-    multipart_parser: Optional[Callable]
-    max_retries: int
-    retry_delay: float
-    retry_backoff: float
-    wait_label: str = "Waiting"
-    monitor_progress: bool = True
-    estimated_total: Optional[int] = None
-    final_label_on_success: Optional[str] = "Completed"
-    progress_origin_ts: Optional[float] = None
-    price_extractor: Optional[Callable[[dict[str, Any]], Optional[float]]] = None
-
-
-@dataclass
-class _PollUIState:
-    started: float
-    status_label: str = "Queued"
-    is_queued: bool = True
-    price: Optional[float] = None
-    estimated_duration: Optional[int] = None
-    base_processing_elapsed: float = 0.0  # sum of completed active intervals
-    active_since: Optional[float] = None  # start time of current active interval (None if queued)
-
-
-_RETRY_STATUS = {408, 429, 500, 502, 503, 504}
-COMPLETED_STATUSES = ["succeeded", "succeed", "success", "completed", "finished", "done", "complete"]
-FAILED_STATUSES = ["cancelled", "canceled", "canceling", "fail", "failed", "error"]
-QUEUED_STATUSES = ["created", "queued", "queueing", "submitted", "initializing"]
-
-
-async def sync_op(
-    cls: type[IO.ComfyNode],
-    endpoint: ApiEndpoint,
-    *,
-    response_model: Type[M],
-    price_extractor: Optional[Callable[[M], Optional[float]]] = None,
-    data: Optional[BaseModel] = None,
-    files: Optional[Union[dict[str, Any], list[tuple[str, Any]]]] = None,
-    content_type: str = "application/json",
-    timeout: float = 3600.0,
-    multipart_parser: Optional[Callable] = None,
-    max_retries: int = 3,
-    retry_delay: float = 1.0,
-    retry_backoff: float = 2.0,
-    wait_label: str = "Waiting for server",
-    estimated_duration: Optional[int] = None,
-    final_label_on_success: Optional[str] = "Completed",
-    progress_origin_ts: Optional[float] = None,
-    monitor_progress: bool = True,
-) -> M:
-    raw = await sync_op_raw(
-        cls,
-        endpoint,
-        price_extractor=_wrap_model_extractor(response_model, price_extractor),
-        data=data,
-        files=files,
-        content_type=content_type,
-        timeout=timeout,
-        multipart_parser=multipart_parser,
-        max_retries=max_retries,
-        retry_delay=retry_delay,
-        retry_backoff=retry_backoff,
-        wait_label=wait_label,
-        estimated_duration=estimated_duration,
-        as_binary=False,
-        final_label_on_success=final_label_on_success,
-        progress_origin_ts=progress_origin_ts,
-        monitor_progress=monitor_progress,
-    )
-    if not isinstance(raw, dict):
-        raise Exception("Expected JSON response to validate into a Pydantic model, got non-JSON (binary or text).")
-    return _validate_or_raise(response_model, raw)
-
-
-async def poll_op(
-    cls: type[IO.ComfyNode],
-    poll_endpoint: ApiEndpoint,
-    *,
-    response_model: Type[M],
-    status_extractor: Callable[[M], Optional[Union[str, int]]],
-    progress_extractor: Optional[Callable[[M], Optional[int]]] = None,
-    price_extractor: Optional[Callable[[M], Optional[float]]] = None,
-    completed_statuses: Optional[list[Union[str, int]]] = None,
-    failed_statuses: Optional[list[Union[str, int]]] = None,
-    queued_statuses: Optional[list[Union[str, int]]] = None,
-    data: Optional[BaseModel] = None,
-    poll_interval: float = 5.0,
-    max_poll_attempts: int = 120,
-    timeout_per_poll: float = 120.0,
-    max_retries_per_poll: int = 3,
-    retry_delay_per_poll: float = 1.0,
-    retry_backoff_per_poll: float = 2.0,
-    estimated_duration: Optional[int] = None,
-    cancel_endpoint: Optional[ApiEndpoint] = None,
-    cancel_timeout: float = 10.0,
-) -> M:
-    raw = await poll_op_raw(
-        cls,
-        poll_endpoint=poll_endpoint,
-        status_extractor=_wrap_model_extractor(response_model, status_extractor),
-        progress_extractor=_wrap_model_extractor(response_model, progress_extractor),
-        price_extractor=_wrap_model_extractor(response_model, price_extractor),
-        completed_statuses=completed_statuses,
-        failed_statuses=failed_statuses,
-        queued_statuses=queued_statuses,
-        data=data,
-        poll_interval=poll_interval,
-        max_poll_attempts=max_poll_attempts,
-        timeout_per_poll=timeout_per_poll,
-        max_retries_per_poll=max_retries_per_poll,
-        retry_delay_per_poll=retry_delay_per_poll,
-        retry_backoff_per_poll=retry_backoff_per_poll,
-        estimated_duration=estimated_duration,
-        cancel_endpoint=cancel_endpoint,
-        cancel_timeout=cancel_timeout,
-    )
-    if not isinstance(raw, dict):
-        raise Exception("Expected JSON response to validate into a Pydantic model, got non-JSON (binary or text).")
-    return _validate_or_raise(response_model, raw)
-
-
-async def sync_op_raw(
-    cls: type[IO.ComfyNode],
-    endpoint: ApiEndpoint,
-    *,
-    price_extractor: Optional[Callable[[dict[str, Any]], Optional[float]]] = None,
-    data: Optional[Union[dict[str, Any], BaseModel]] = None,
-    files: Optional[Union[dict[str, Any], list[tuple[str, Any]]]] = None,
-    content_type: str = "application/json",
-    timeout: float = 3600.0,
-    multipart_parser: Optional[Callable] = None,
-    max_retries: int = 3,
-    retry_delay: float = 1.0,
-    retry_backoff: float = 2.0,
-    wait_label: str = "Waiting for server",
-    estimated_duration: Optional[int] = None,
-    as_binary: bool = False,
-    final_label_on_success: Optional[str] = "Completed",
-    progress_origin_ts: Optional[float] = None,
-    monitor_progress: bool = True,
-) -> Union[dict[str, Any], bytes]:
-    """
-    Make a single network request.
-      - If as_binary=False (default): returns JSON dict (or {'_raw': '<text>'} if non-JSON).
-      - If as_binary=True: returns bytes.
-    """
-    if isinstance(data, BaseModel):
-        data = data.model_dump(exclude_none=True)
-        for k, v in list(data.items()):
-            if isinstance(v, Enum):
-                data[k] = v.value
-    cfg = _RequestConfig(
-        node_cls=cls,
-        endpoint=endpoint,
-        timeout=timeout,
-        content_type=content_type,
-        data=data,
-        files=files,
-        multipart_parser=multipart_parser,
-        max_retries=max_retries,
-        retry_delay=retry_delay,
-        retry_backoff=retry_backoff,
-        wait_label=wait_label,
-        monitor_progress=monitor_progress,
-        estimated_total=estimated_duration,
-        final_label_on_success=final_label_on_success,
-        progress_origin_ts=progress_origin_ts,
-        price_extractor=price_extractor,
-    )
-    return await _request_base(cfg, expect_binary=as_binary)
-
-
-async def poll_op_raw(
-    cls: type[IO.ComfyNode],
-    poll_endpoint: ApiEndpoint,
-    *,
-    status_extractor: Callable[[dict[str, Any]], Optional[Union[str, int]]],
-    progress_extractor: Optional[Callable[[dict[str, Any]], Optional[int]]] = None,
-    price_extractor: Optional[Callable[[dict[str, Any]], Optional[float]]] = None,
-    completed_statuses: Optional[list[Union[str, int]]] = None,
-    failed_statuses: Optional[list[Union[str, int]]] = None,
-    queued_statuses: Optional[list[Union[str, int]]] = None,
-    data: Optional[Union[dict[str, Any], BaseModel]] = None,
-    poll_interval: float = 5.0,
-    max_poll_attempts: int = 120,
-    timeout_per_poll: float = 120.0,
-    max_retries_per_poll: int = 3,
-    retry_delay_per_poll: float = 1.0,
-    retry_backoff_per_poll: float = 2.0,
-    estimated_duration: Optional[int] = None,
-    cancel_endpoint: Optional[ApiEndpoint] = None,
-    cancel_timeout: float = 10.0,
-) -> dict[str, Any]:
-    """
-    Polls an endpoint until the task reaches a terminal state. Displays time while queued/processing,
-    checks interruption every second, and calls Cancel endpoint (if provided) on interruption.
-
-    Uses default complete, failed and queued states assumption.
-
-    Returns the final JSON response from the poll endpoint.
-    """
-    completed_states = _normalize_statuses(COMPLETED_STATUSES if completed_statuses is None else completed_statuses)
-    failed_states = _normalize_statuses(FAILED_STATUSES if failed_statuses is None else failed_statuses)
-    queued_states = _normalize_statuses(QUEUED_STATUSES if queued_statuses is None else queued_statuses)
-    started = time.monotonic()
-    consumed_attempts = 0  # counts only non-queued polls
-
-    progress_bar = utils.ProgressBar(100) if progress_extractor else None
-    last_progress: Optional[int] = None
-
-    state = _PollUIState(started=started, estimated_duration=estimated_duration)
-    stop_ticker = asyncio.Event()
-
-    async def _ticker():
-        """Emit a UI update every second while polling is in progress."""
-        try:
-            while not stop_ticker.is_set():
-                if is_processing_interrupted():
-                    break
-                now = time.monotonic()
-                proc_elapsed = state.base_processing_elapsed + (
-                    (now - state.active_since) if state.active_since is not None else 0.0
-                )
-                _display_time_progress(
-                    cls,
-                    status=state.status_label,
-                    elapsed_seconds=int(now - state.started),
-                    estimated_total=state.estimated_duration,
-                    price=state.price,
-                    is_queued=state.is_queued,
-                    processing_elapsed_seconds=int(proc_elapsed),
-                )
-                await asyncio.sleep(1.0)
-        except Exception as exc:
-            logging.debug("Polling ticker exited: %s", exc)
-
-    ticker_task = asyncio.create_task(_ticker())
-    try:
-        while consumed_attempts < max_poll_attempts:
-            try:
-                resp_json = await sync_op_raw(
-                    cls,
-                    poll_endpoint,
-                    data=data,
-                    timeout=timeout_per_poll,
-                    max_retries=max_retries_per_poll,
-                    retry_delay=retry_delay_per_poll,
-                    retry_backoff=retry_backoff_per_poll,
-                    wait_label="Checking",
-                    estimated_duration=None,
-                    as_binary=False,
-                    final_label_on_success=None,
-                    monitor_progress=False,
-                )
-                if not isinstance(resp_json, dict):
-                    raise Exception("Polling endpoint returned non-JSON response.")
-            except ProcessingInterrupted:
-                if cancel_endpoint:
-                    with contextlib.suppress(Exception):
-                        await sync_op_raw(
-                            cls,
-                            cancel_endpoint,
-                            timeout=cancel_timeout,
-                            max_retries=0,
-                            wait_label="Cancelling task",
-                            estimated_duration=None,
-                            as_binary=False,
-                            final_label_on_success=None,
-                            monitor_progress=False,
-                        )
-                raise
-
-            try:
-                status = _normalize_status_value(status_extractor(resp_json))
-            except Exception as e:
-                logging.error("Status extraction failed: %s", e)
-                status = None
-
-            if price_extractor:
-                new_price = price_extractor(resp_json)
-                if new_price is not None:
-                    state.price = new_price
-
-            if progress_extractor:
-                new_progress = progress_extractor(resp_json)
-                if new_progress is not None and last_progress != new_progress:
-                    progress_bar.update_absolute(new_progress, total=100)
-                    last_progress = new_progress
-
-            now_ts = time.monotonic()
-            is_queued = status in queued_states
-
-            if is_queued:
-                if state.active_since is not None:  # If we just moved from active -> queued, close the active interval
-                    state.base_processing_elapsed += now_ts - state.active_since
-                    state.active_since = None
-            else:
-                if state.active_since is None:  # If we just moved from queued -> active, open a new active interval
-                    state.active_since = now_ts
-
-            state.is_queued = is_queued
-            state.status_label = status or ("Queued" if is_queued else "Processing")
-            if status in completed_states:
-                if state.active_since is not None:
-                    state.base_processing_elapsed += now_ts - state.active_since
-                    state.active_since = None
-                stop_ticker.set()
-                with contextlib.suppress(Exception):
-                    await ticker_task
-
-                if progress_bar and last_progress != 100:
-                    progress_bar.update_absolute(100, total=100)
-
-                _display_time_progress(
-                    cls,
-                    status=status if status else "Completed",
-                    elapsed_seconds=int(now_ts - started),
-                    estimated_total=estimated_duration,
-                    price=state.price,
-                    is_queued=False,
-                    processing_elapsed_seconds=int(state.base_processing_elapsed),
-                )
-                return resp_json
-
-            if status in failed_states:
-                msg = f"Task failed: {json.dumps(resp_json)}"
-                logging.error(msg)
-                raise Exception(msg)
-
-            try:
-                await sleep_with_interrupt(poll_interval, cls, None, None, None)
-            except ProcessingInterrupted:
-                if cancel_endpoint:
-                    with contextlib.suppress(Exception):
-                        await sync_op_raw(
-                            cls,
-                            cancel_endpoint,
-                            timeout=cancel_timeout,
-                            max_retries=0,
-                            wait_label="Cancelling task",
-                            estimated_duration=None,
-                            as_binary=False,
-                            final_label_on_success=None,
-                            monitor_progress=False,
-                        )
-                raise
-            if not is_queued:
-                consumed_attempts += 1
-
-        raise Exception(
-            f"Polling timed out after {max_poll_attempts} non-queued attempts "
-            f"(~{int(max_poll_attempts * poll_interval)}s of active polling)."
-        )
-    except ProcessingInterrupted:
-        raise
-    except (LocalNetworkError, ApiServerError):
-        raise
-    except Exception as e:
-        raise Exception(f"Polling aborted due to error: {e}") from e
-    finally:
-        stop_ticker.set()
-        with contextlib.suppress(Exception):
-            await ticker_task
-
-
-def _display_text(
-    node_cls: type[IO.ComfyNode],
-    text: Optional[str],
-    *,
-    status: Optional[Union[str, int]] = None,
-    price: Optional[float] = None,
-) -> None:
-    display_lines: list[str] = []
-    if status:
-        display_lines.append(f"Status: {status.capitalize() if isinstance(status, str) else status}")
-    if price is not None:
-        p = f"{float(price):,.4f}".rstrip("0").rstrip(".")
-        if p != "0":
-            display_lines.append(f"Price: ${p}")
-    if text is not None:
-        display_lines.append(text)
-    if display_lines:
-        PromptServer.instance.send_progress_text("\n".join(display_lines), get_node_id(node_cls))
-
-
-def _display_time_progress(
-    node_cls: type[IO.ComfyNode],
-    status: Optional[Union[str, int]],
-    elapsed_seconds: int,
-    estimated_total: Optional[int] = None,
-    *,
-    price: Optional[float] = None,
-    is_queued: Optional[bool] = None,
-    processing_elapsed_seconds: Optional[int] = None,
-) -> None:
-    if estimated_total is not None and estimated_total > 0 and is_queued is False:
-        pe = processing_elapsed_seconds if processing_elapsed_seconds is not None else elapsed_seconds
-        remaining = max(0, int(estimated_total) - int(pe))
-        time_line = f"Time elapsed: {int(elapsed_seconds)}s (~{remaining}s remaining)"
-    else:
-        time_line = f"Time elapsed: {int(elapsed_seconds)}s"
-    _display_text(node_cls, time_line, status=status, price=price)
-
-
-async def _diagnose_connectivity() -> dict[str, bool]:
-    """Best-effort connectivity diagnostics to distinguish local vs. server issues."""
-    results = {
-        "internet_accessible": False,
-        "api_accessible": False,
-    }
-    timeout = aiohttp.ClientTimeout(total=5.0)
-    async with aiohttp.ClientSession(timeout=timeout) as session:
-        with contextlib.suppress(ClientError, OSError):
-            async with session.get("https://www.google.com") as resp:
-                results["internet_accessible"] = resp.status < 500
-        if not results["internet_accessible"]:
-            return results
-
-        parsed = urlparse(default_base_url())
-        health_url = f"{parsed.scheme}://{parsed.netloc}/health"
-        with contextlib.suppress(ClientError, OSError):
-            async with session.get(health_url) as resp:
-                results["api_accessible"] = resp.status < 500
-    return results
-
-
-def _unpack_tuple(t: tuple) -> tuple[str, Any, str]:
-    """Normalize (filename, value, content_type)."""
-    if len(t) == 2:
-        return t[0], t[1], "application/octet-stream"
-    if len(t) == 3:
-        return t[0], t[1], t[2]
-    raise ValueError("files tuple must be (filename, file[, content_type])")
-
-
-def _merge_params(endpoint_params: dict[str, Any], method: str, data: Optional[dict[str, Any]]) -> dict[str, Any]:
-    params = dict(endpoint_params or {})
-    if method.upper() == "GET" and data:
-        for k, v in data.items():
-            if v is not None:
-                params[k] = v
-    return params
-
-
-def _friendly_http_message(status: int, body: Any) -> str:
-    if status == 401:
-        return "Unauthorized: Please login first to use this node."
-    if status == 402:
-        return "Payment Required: Please add credits to your account to use this node."
-    if status == 409:
-        return "There is a problem with your account. Please contact support@comfy.org."
-    if status == 429:
-        return "Rate Limit Exceeded: Please try again later."
-    try:
-        if isinstance(body, dict):
-            err = body.get("error")
-            if isinstance(err, dict):
-                msg = err.get("message")
-                typ = err.get("type")
-                if msg and typ:
-                    return f"API Error: {msg} (Type: {typ})"
-                if msg:
-                    return f"API Error: {msg}"
-            return f"API Error: {json.dumps(body)}"
-        else:
-            txt = str(body)
-            if len(txt) <= 200:
-                return f"API Error (raw): {txt}"
-            return f"API Error (status {status})"
-    except Exception:
-        return f"HTTP {status}: Unknown error"
-
-
-def _generate_operation_id(method: str, path: str, attempt: int) -> str:
-    slug = path.strip("/").replace("/", "_") or "op"
-    return f"{method}_{slug}_try{attempt}_{uuid.uuid4().hex[:8]}"
-
-
-def _snapshot_request_body_for_logging(
-    content_type: str,
-    method: str,
-    data: Optional[dict[str, Any]],
-    files: Optional[Union[dict[str, Any], list[tuple[str, Any]]]],
-) -> Optional[Union[dict[str, Any], str]]:
-    if method.upper() == "GET":
-        return None
-    if content_type == "multipart/form-data":
-        form_fields = sorted([k for k, v in (data or {}).items() if v is not None])
-        file_fields: list[dict[str, str]] = []
-        if files:
-            file_iter = files if isinstance(files, list) else list(files.items())
-            for field_name, file_obj in file_iter:
-                if file_obj is None:
-                    continue
-                if isinstance(file_obj, tuple):
-                    filename = file_obj[0]
-                else:
-                    filename = getattr(file_obj, "name", field_name)
-                file_fields.append({"field": field_name, "filename": str(filename or "")})
-        return {"_multipart": True, "form_fields": form_fields, "file_fields": file_fields}
-    if content_type == "application/x-www-form-urlencoded":
-        return data or {}
-    return data or {}
-
-
-async def _request_base(cfg: _RequestConfig, expect_binary: bool):
-    """Core request with retries, per-second interruption monitoring, true cancellation, and friendly errors."""
-    url = cfg.endpoint.path
-    parsed_url = urlparse(url)
-    if not parsed_url.scheme and not parsed_url.netloc:  # is URL relative?
-        url = urljoin(default_base_url().rstrip("/") + "/", url.lstrip("/"))
-
-    method = cfg.endpoint.method
-    params = _merge_params(cfg.endpoint.query_params, method, cfg.data if method == "GET" else None)
-
-    async def _monitor(stop_evt: asyncio.Event, start_ts: float):
-        """Every second: update elapsed time and signal interruption."""
-        try:
-            while not stop_evt.is_set():
-                if is_processing_interrupted():
-                    return
-                if cfg.monitor_progress:
-                    _display_time_progress(
-                        cfg.node_cls, cfg.wait_label, int(time.monotonic() - start_ts), cfg.estimated_total
-                    )
-                await asyncio.sleep(1.0)
-        except asyncio.CancelledError:
-            return  # normal shutdown
-
-    start_time = cfg.progress_origin_ts if cfg.progress_origin_ts is not None else time.monotonic()
-    attempt = 0
-    delay = cfg.retry_delay
-    operation_succeeded: bool = False
-    final_elapsed_seconds: Optional[int] = None
-    extracted_price: Optional[float] = None
-    while True:
-        attempt += 1
-        stop_event = asyncio.Event()
-        monitor_task: Optional[asyncio.Task] = None
-        sess: Optional[aiohttp.ClientSession] = None
-
-        operation_id = _generate_operation_id(method, cfg.endpoint.path, attempt)
-        logging.debug("[DEBUG] HTTP %s %s (attempt %d)", method, url, attempt)
-
-        payload_headers = {"Accept": "*/*"} if expect_binary else {"Accept": "application/json"}
-        if not parsed_url.scheme and not parsed_url.netloc:  # is URL relative?
-            payload_headers.update(get_auth_header(cfg.node_cls))
-        if cfg.endpoint.headers:
-            payload_headers.update(cfg.endpoint.headers)
-
-        payload_kw: dict[str, Any] = {"headers": payload_headers}
-        if method == "GET":
-            payload_headers.pop("Content-Type", None)
-        request_body_log = _snapshot_request_body_for_logging(cfg.content_type, method, cfg.data, cfg.files)
-        try:
-            if cfg.monitor_progress:
-                monitor_task = asyncio.create_task(_monitor(stop_event, start_time))
-
-            timeout = aiohttp.ClientTimeout(total=cfg.timeout)
-            sess = aiohttp.ClientSession(timeout=timeout)
-
-            if cfg.content_type == "multipart/form-data" and method != "GET":
-                # aiohttp will set Content-Type boundary; remove any fixed Content-Type
-                payload_headers.pop("Content-Type", None)
-                if cfg.multipart_parser and cfg.data:
-                    form = cfg.multipart_parser(cfg.data)
-                    if not isinstance(form, aiohttp.FormData):
-                        raise ValueError("multipart_parser must return aiohttp.FormData")
-                else:
-                    form = aiohttp.FormData(default_to_multipart=True)
-                    if cfg.data:
-                        for k, v in cfg.data.items():
-                            if v is None:
-                                continue
-                            form.add_field(k, str(v) if not isinstance(v, (bytes, bytearray)) else v)
-                if cfg.files:
-                    file_iter = cfg.files if isinstance(cfg.files, list) else cfg.files.items()
-                    for field_name, file_obj in file_iter:
-                        if file_obj is None:
-                            continue
-                        if isinstance(file_obj, tuple):
-                            filename, file_value, content_type = _unpack_tuple(file_obj)
-                        else:
-                            filename = getattr(file_obj, "name", field_name)
-                            file_value = file_obj
-                            content_type = "application/octet-stream"
-                        # Attempt to rewind BytesIO for retries
-                        if isinstance(file_value, BytesIO):
-                            with contextlib.suppress(Exception):
-                                file_value.seek(0)
-                        form.add_field(field_name, file_value, filename=filename, content_type=content_type)
-                payload_kw["data"] = form
-            elif cfg.content_type == "application/x-www-form-urlencoded" and method != "GET":
-                payload_headers["Content-Type"] = "application/x-www-form-urlencoded"
-                payload_kw["data"] = cfg.data or {}
-            elif method != "GET":
-                payload_headers["Content-Type"] = "application/json"
-                payload_kw["json"] = cfg.data or {}
-
-            try:
-                request_logger.log_request_response(
-                    operation_id=operation_id,
-                    request_method=method,
-                    request_url=url,
-                    request_headers=dict(payload_headers) if payload_headers else None,
-                    request_params=dict(params) if params else None,
-                    request_data=request_body_log,
-                )
-            except Exception as _log_e:
-                logging.debug("[DEBUG] request logging failed: %s", _log_e)
-
-            req_coro = sess.request(method, url, params=params, **payload_kw)
-            req_task = asyncio.create_task(req_coro)
-
-            # Race: request vs. monitor (interruption)
-            tasks = {req_task}
-            if monitor_task:
-                tasks.add(monitor_task)
-            done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
-
-            if monitor_task and monitor_task in done:
-                # Interrupted – cancel the request and abort
-                if req_task in pending:
-                    req_task.cancel()
-                raise ProcessingInterrupted("Task cancelled")
-
-            # Otherwise, request finished
-            resp = await req_task
-            async with resp:
-                if resp.status >= 400:
-                    try:
-                        body = await resp.json()
-                    except (ContentTypeError, json.JSONDecodeError):
-                        body = await resp.text()
-                    if resp.status in _RETRY_STATUS and attempt <= cfg.max_retries:
-                        logging.warning(
-                            "HTTP %s %s -> %s. Retrying in %.2fs (retry %d of %d).",
-                            method,
-                            url,
-                            resp.status,
-                            delay,
-                            attempt,
-                            cfg.max_retries,
-                        )
-                        try:
-                            request_logger.log_request_response(
-                                operation_id=operation_id,
-                                request_method=method,
-                                request_url=url,
-                                response_status_code=resp.status,
-                                response_headers=dict(resp.headers),
-                                response_content=body,
-                                error_message=_friendly_http_message(resp.status, body),
-                            )
-                        except Exception as _log_e:
-                            logging.debug("[DEBUG] response logging failed: %s", _log_e)
-
-                        await sleep_with_interrupt(
-                            delay,
-                            cfg.node_cls,
-                            cfg.wait_label if cfg.monitor_progress else None,
-                            start_time if cfg.monitor_progress else None,
-                            cfg.estimated_total,
-                            display_callback=_display_time_progress if cfg.monitor_progress else None,
-                        )
-                        delay *= cfg.retry_backoff
-                        continue
-                    msg = _friendly_http_message(resp.status, body)
-                    try:
-                        request_logger.log_request_response(
-                            operation_id=operation_id,
-                            request_method=method,
-                            request_url=url,
-                            response_status_code=resp.status,
-                            response_headers=dict(resp.headers),
-                            response_content=body,
-                            error_message=msg,
-                        )
-                    except Exception as _log_e:
-                        logging.debug("[DEBUG] response logging failed: %s", _log_e)
-                    raise Exception(msg)
-
-                if expect_binary:
-                    buff = bytearray()
-                    last_tick = time.monotonic()
-                    async for chunk in resp.content.iter_chunked(64 * 1024):
-                        buff.extend(chunk)
-                        now = time.monotonic()
-                        if now - last_tick >= 1.0:
-                            last_tick = now
-                            if is_processing_interrupted():
-                                raise ProcessingInterrupted("Task cancelled")
-                            if cfg.monitor_progress:
-                                _display_time_progress(
-                                    cfg.node_cls, cfg.wait_label, int(now - start_time), cfg.estimated_total
-                                )
-                    bytes_payload = bytes(buff)
-                    operation_succeeded = True
-                    final_elapsed_seconds = int(time.monotonic() - start_time)
-                    try:
-                        request_logger.log_request_response(
-                            operation_id=operation_id,
-                            request_method=method,
-                            request_url=url,
-                            response_status_code=resp.status,
-                            response_headers=dict(resp.headers),
-                            response_content=bytes_payload,
-                        )
-                    except Exception as _log_e:
-                        logging.debug("[DEBUG] response logging failed: %s", _log_e)
-                    return bytes_payload
-                else:
-                    try:
-                        payload = await resp.json()
-                        response_content_to_log: Any = payload
-                    except (ContentTypeError, json.JSONDecodeError):
-                        text = await resp.text()
-                        try:
-                            payload = json.loads(text) if text else {}
-                        except json.JSONDecodeError:
-                            payload = {"_raw": text}
-                        response_content_to_log = payload if isinstance(payload, dict) else text
-                    with contextlib.suppress(Exception):
-                        extracted_price = cfg.price_extractor(payload) if cfg.price_extractor else None
-                    operation_succeeded = True
-                    final_elapsed_seconds = int(time.monotonic() - start_time)
-                    try:
-                        request_logger.log_request_response(
-                            operation_id=operation_id,
-                            request_method=method,
-                            request_url=url,
-                            response_status_code=resp.status,
-                            response_headers=dict(resp.headers),
-                            response_content=response_content_to_log,
-                        )
-                    except Exception as _log_e:
-                        logging.debug("[DEBUG] response logging failed: %s", _log_e)
-                    return payload
-
-        except ProcessingInterrupted:
-            logging.debug("Polling was interrupted by user")
-            raise
-        except (ClientError, OSError) as e:
-            if attempt <= cfg.max_retries:
-                logging.warning(
-                    "Connection error calling %s %s. Retrying in %.2fs (%d/%d): %s",
-                    method,
-                    url,
-                    delay,
-                    attempt,
-                    cfg.max_retries,
-                    str(e),
-                )
-                try:
-                    request_logger.log_request_response(
-                        operation_id=operation_id,
-                        request_method=method,
-                        request_url=url,
-                        request_headers=dict(payload_headers) if payload_headers else None,
-                        request_params=dict(params) if params else None,
-                        request_data=request_body_log,
-                        error_message=f"{type(e).__name__}: {str(e)} (will retry)",
-                    )
-                except Exception as _log_e:
-                    logging.debug("[DEBUG] request error logging failed: %s", _log_e)
-                await sleep_with_interrupt(
-                    delay,
-                    cfg.node_cls,
-                    cfg.wait_label if cfg.monitor_progress else None,
-                    start_time if cfg.monitor_progress else None,
-                    cfg.estimated_total,
-                    display_callback=_display_time_progress if cfg.monitor_progress else None,
-                )
-                delay *= cfg.retry_backoff
-                continue
-            diag = await _diagnose_connectivity()
-            if not diag["internet_accessible"]:
-                try:
-                    request_logger.log_request_response(
-                        operation_id=operation_id,
-                        request_method=method,
-                        request_url=url,
-                        request_headers=dict(payload_headers) if payload_headers else None,
-                        request_params=dict(params) if params else None,
-                        request_data=request_body_log,
-                        error_message=f"LocalNetworkError: {str(e)}",
-                    )
-                except Exception as _log_e:
-                    logging.debug("[DEBUG] final error logging failed: %s", _log_e)
-                raise LocalNetworkError(
-                    "Unable to connect to the API server due to local network issues. "
-                    "Please check your internet connection and try again."
-                ) from e
-            try:
-                request_logger.log_request_response(
-                    operation_id=operation_id,
-                    request_method=method,
-                    request_url=url,
-                    request_headers=dict(payload_headers) if payload_headers else None,
-                    request_params=dict(params) if params else None,
-                    request_data=request_body_log,
-                    error_message=f"ApiServerError: {str(e)}",
-                )
-            except Exception as _log_e:
-                logging.debug("[DEBUG] final error logging failed: %s", _log_e)
-            raise ApiServerError(
-                f"The API server at {default_base_url()} is currently unreachable. "
-                f"The service may be experiencing issues."
-            ) from e
-        finally:
-            stop_event.set()
-            if monitor_task:
-                monitor_task.cancel()
-                with contextlib.suppress(Exception):
-                    await monitor_task
-            if sess:
-                with contextlib.suppress(Exception):
-                    await sess.close()
-            if operation_succeeded and cfg.monitor_progress and cfg.final_label_on_success:
-                _display_time_progress(
-                    cfg.node_cls,
-                    status=cfg.final_label_on_success,
-                    elapsed_seconds=(
-                        final_elapsed_seconds
-                        if final_elapsed_seconds is not None
-                        else int(time.monotonic() - start_time)
-                    ),
-                    estimated_total=cfg.estimated_total,
-                    price=extracted_price,
-                    is_queued=False,
-                    processing_elapsed_seconds=final_elapsed_seconds,
-                )
-
-
-def _validate_or_raise(response_model: Type[M], payload: Any) -> M:
-    try:
-        return response_model.model_validate(payload)
-    except Exception as e:
-        logging.error(
-            "Response validation failed for %s: %s",
-            getattr(response_model, "__name__", response_model),
-            e,
-        )
-        raise Exception(
-            f"Response validation failed for {getattr(response_model, '__name__', response_model)}: {e}"
-        ) from e
-
-
-def _wrap_model_extractor(
-    response_model: Type[M],
-    extractor: Optional[Callable[[M], Any]],
-) -> Optional[Callable[[dict[str, Any]], Any]]:
-    """Wrap a typed extractor so it can be used by the dict-based poller.
-    Validates the dict into `response_model` before invoking `extractor`.
-    Uses a small per-wrapper cache keyed by `id(dict)` to avoid re-validating
-    the same response for multiple extractors in a single poll attempt.
-    """
-    if extractor is None:
-        return None
-    _cache: dict[int, M] = {}
-
-    def _wrapped(d: dict[str, Any]) -> Any:
-        try:
-            key = id(d)
-            model = _cache.get(key)
-            if model is None:
-                model = response_model.model_validate(d)
-                _cache[key] = model
-            return extractor(model)
-        except Exception as e:
-            logging.error("Extractor failed (typed -> dict wrapper): %s", e)
-            raise
-
-    return _wrapped
-
-
-def _normalize_statuses(values: Optional[Iterable[Union[str, int]]]) -> set[Union[str, int]]:
-    if not values:
-        return set()
-    out: set[Union[str, int]] = set()
-    for v in values:
-        nv = _normalize_status_value(v)
-        if nv is not None:
-            out.add(nv)
-    return out
-
-
-def _normalize_status_value(val: Union[str, int, None]) -> Union[str, int, None]:
-    if isinstance(val, str):
-        return val.strip().lower()
-    return val
--- a/comfy_api_nodes/util/common_exceptions.py
+++ b/comfy_api_nodes/util/common_exceptions.py
@@ -1,14 +0,0 @@
-class NetworkError(Exception):
-    """Base exception for network-related errors with diagnostic information."""
-
-
-class LocalNetworkError(NetworkError):
-    """Exception raised when local network connectivity issues are detected."""
-
-
-class ApiServerError(NetworkError):
-    """Exception raised when the API server is unreachable but internet is working."""
-
-
-class ProcessingInterrupted(Exception):
-    """Operation was interrupted by user/runtime via processing_interrupted()."""
--- a/comfy_api_nodes/util/conversions.py
+++ b/comfy_api_nodes/util/conversions.py
@@ -1,470 +0,0 @@
-import base64
-import logging
-import math
-import mimetypes
-import uuid
-from io import BytesIO
-from typing import Optional
-
-import av
-import numpy as np
-import torch
-from PIL import Image
-
-from comfy.utils import common_upscale
-from comfy_api.latest import Input, InputImpl
-from comfy_api.util import VideoCodec, VideoContainer
-
-from ._helpers import mimetype_to_extension
-
-
-def bytesio_to_image_tensor(image_bytesio: BytesIO, mode: str = "RGBA") -> torch.Tensor:
-    """Converts image data from BytesIO to a torch.Tensor.
-
-    Args:
-        image_bytesio: BytesIO object containing the image data.
-        mode: The PIL mode to convert the image to (e.g., "RGB", "RGBA").
-
-    Returns:
-        A torch.Tensor representing the image (1, H, W, C).
-
-    Raises:
-        PIL.UnidentifiedImageError: If the image data cannot be identified.
-        ValueError: If the specified mode is invalid.
-    """
-    image = Image.open(image_bytesio)
-    image = image.convert(mode)
-    image_array = np.array(image).astype(np.float32) / 255.0
-    return torch.from_numpy(image_array).unsqueeze(0)
-
-
-def image_tensor_pair_to_batch(image1: torch.Tensor, image2: torch.Tensor) -> torch.Tensor:
-    """
-    Converts a pair of image tensors to a batch tensor.
-    If the images are not the same size, the smaller image is resized to
-    match the larger image.
-    """
-    if image1.shape[1:] != image2.shape[1:]:
-        image2 = common_upscale(
-            image2.movedim(-1, 1),
-            image1.shape[2],
-            image1.shape[1],
-            "bilinear",
-            "center",
-        ).movedim(1, -1)
-    return torch.cat((image1, image2), dim=0)
-
-
-def tensor_to_bytesio(
-    image: torch.Tensor,
-    name: Optional[str] = None,
-    total_pixels: int = 2048 * 2048,
-    mime_type: str = "image/png",
-) -> BytesIO:
-    """Converts a torch.Tensor image to a named BytesIO object.
-
-    Args:
-        image: Input torch.Tensor image.
-        name: Optional filename for the BytesIO object.
-        total_pixels: Maximum total pixels for potential downscaling.
-        mime_type: Target image MIME type (e.g., 'image/png', 'image/jpeg', 'image/webp').
-
-    Returns:
-        Named BytesIO object containing the image data, with pointer set to the start of buffer.
-    """
-    if not mime_type:
-        mime_type = "image/png"
-
-    pil_image = tensor_to_pil(image, total_pixels=total_pixels)
-    img_binary = pil_to_bytesio(pil_image, mime_type=mime_type)
-    img_binary.name = f"{name if name else uuid.uuid4()}.{mimetype_to_extension(mime_type)}"
-    return img_binary
-
-
-def tensor_to_pil(image: torch.Tensor, total_pixels: int = 2048 * 2048) -> Image.Image:
-    """Converts a single torch.Tensor image [H, W, C] to a PIL Image, optionally downscaling."""
-    if len(image.shape) > 3:
-        image = image[0]
-    # TODO: remove alpha if not allowed and present
-    input_tensor = image.cpu()
-    input_tensor = downscale_image_tensor(input_tensor.unsqueeze(0), total_pixels=total_pixels).squeeze()
-    image_np = (input_tensor.numpy() * 255).astype(np.uint8)
-    img = Image.fromarray(image_np)
-    return img
-
-
-def tensor_to_base64_string(
-    image_tensor: torch.Tensor,
-    total_pixels: int = 2048 * 2048,
-    mime_type: str = "image/png",
-) -> str:
-    """Convert [B, H, W, C] or [H, W, C] tensor to a base64 string.
-
-    Args:
-        image_tensor: Input torch.Tensor image.
-        total_pixels: Maximum total pixels for potential downscaling.
-        mime_type: Target image MIME type (e.g., 'image/png', 'image/jpeg', 'image/webp').
-
-    Returns:
-        Base64 encoded string of the image.
-    """
-    pil_image = tensor_to_pil(image_tensor, total_pixels=total_pixels)
-    img_byte_arr = pil_to_bytesio(pil_image, mime_type=mime_type)
-    img_bytes = img_byte_arr.getvalue()
-    # Encode bytes to base64 string
-    base64_encoded_string = base64.b64encode(img_bytes).decode("utf-8")
-    return base64_encoded_string
-
-
-def pil_to_bytesio(img: Image.Image, mime_type: str = "image/png") -> BytesIO:
-    """Converts a PIL Image to a BytesIO object."""
-    if not mime_type:
-        mime_type = "image/png"
-
-    img_byte_arr = BytesIO()
-    # Derive PIL format from MIME type (e.g., 'image/png' -> 'PNG')
-    pil_format = mime_type.split("/")[-1].upper()
-    if pil_format == "JPG":
-        pil_format = "JPEG"
-    img.save(img_byte_arr, format=pil_format)
-    img_byte_arr.seek(0)
-    return img_byte_arr
-
-
-def downscale_image_tensor(image, total_pixels=1536 * 1024) -> torch.Tensor:
-    """Downscale input image tensor to roughly the specified total pixels."""
-    samples = image.movedim(-1, 1)
-    total = int(total_pixels)
-    scale_by = math.sqrt(total / (samples.shape[3] * samples.shape[2]))
-    if scale_by >= 1:
-        return image
-    width = round(samples.shape[3] * scale_by)
-    height = round(samples.shape[2] * scale_by)
-
-    s = common_upscale(samples, width, height, "lanczos", "disabled")
-    s = s.movedim(1, -1)
-    return s
-
-
-def tensor_to_data_uri(
-    image_tensor: torch.Tensor,
-    total_pixels: int = 2048 * 2048,
-    mime_type: str = "image/png",
-) -> str:
-    """Converts a tensor image to a Data URI string.
-
-    Args:
-        image_tensor: Input torch.Tensor image.
-        total_pixels: Maximum total pixels for potential downscaling.
-        mime_type: Target image MIME type (e.g., 'image/png', 'image/jpeg', 'image/webp').
-
-    Returns:
-        Data URI string (e.g., 'data:image/png;base64,...').
-    """
-    base64_string = tensor_to_base64_string(image_tensor, total_pixels, mime_type)
-    return f"data:{mime_type};base64,{base64_string}"
-
-
-def audio_to_base64_string(audio: Input.Audio, container_format: str = "mp4", codec_name: str = "aac") -> str:
-    """Converts an audio input to a base64 string."""
-    sample_rate: int = audio["sample_rate"]
-    waveform: torch.Tensor = audio["waveform"]
-    audio_data_np = audio_tensor_to_contiguous_ndarray(waveform)
-    audio_bytes_io = audio_ndarray_to_bytesio(audio_data_np, sample_rate, container_format, codec_name)
-    audio_bytes = audio_bytes_io.getvalue()
-    return base64.b64encode(audio_bytes).decode("utf-8")
-
-
-def video_to_base64_string(
-    video: Input.Video,
-    container_format: VideoContainer = None,
-    codec: VideoCodec = None
-) -> str:
-    """
-    Converts a video input to a base64 string.
-
-    Args:
-        video: The video input to convert
-        container_format: Optional container format to use (defaults to video.container if available)
-        codec: Optional codec to use (defaults to video.codec if available)
-    """
-    video_bytes_io = BytesIO()
-
-    # Use provided format/codec if specified, otherwise use video's own if available
-    format_to_use = container_format if container_format is not None else getattr(video, 'container', VideoContainer.MP4)
-    codec_to_use = codec if codec is not None else getattr(video, 'codec', VideoCodec.H264)
-
-    video.save_to(video_bytes_io, format=format_to_use, codec=codec_to_use)
-    video_bytes_io.seek(0)
-    return base64.b64encode(video_bytes_io.getvalue()).decode("utf-8")
-
-
-def audio_ndarray_to_bytesio(
-    audio_data_np: np.ndarray,
-    sample_rate: int,
-    container_format: str = "mp4",
-    codec_name: str = "aac",
-) -> BytesIO:
-    """
-    Encodes a numpy array of audio data into a BytesIO object.
-    """
-    audio_bytes_io = BytesIO()
-    with av.open(audio_bytes_io, mode="w", format=container_format) as output_container:
-        audio_stream = output_container.add_stream(codec_name, rate=sample_rate)
-        frame = av.AudioFrame.from_ndarray(
-            audio_data_np,
-            format="fltp",
-            layout="stereo" if audio_data_np.shape[0] > 1 else "mono",
-        )
-        frame.sample_rate = sample_rate
-        frame.pts = 0
-
-        for packet in audio_stream.encode(frame):
-            output_container.mux(packet)
-
-        # Flush stream
-        for packet in audio_stream.encode(None):
-            output_container.mux(packet)
-
-    audio_bytes_io.seek(0)
-    return audio_bytes_io
-
-
-def audio_tensor_to_contiguous_ndarray(waveform: torch.Tensor) -> np.ndarray:
-    """
-    Prepares audio waveform for av library by converting to a contiguous numpy array.
-
-    Args:
-        waveform: a tensor of shape (1, channels, samples) derived from a Comfy `AUDIO` type.
-
-    Returns:
-        Contiguous numpy array of the audio waveform. If the audio was batched,
-            the first item is taken.
-    """
-    if waveform.ndim != 3 or waveform.shape[0] != 1:
-        raise ValueError("Expected waveform tensor shape (1, channels, samples)")
-
-    # If batch is > 1, take first item
-    if waveform.shape[0] > 1:
-        waveform = waveform[0]
-
-    # Prepare for av: remove batch dim, move to CPU, make contiguous, convert to numpy array
-    audio_data_np = waveform.squeeze(0).cpu().contiguous().numpy()
-    if audio_data_np.dtype != np.float32:
-        audio_data_np = audio_data_np.astype(np.float32)
-
-    return audio_data_np
-
-
-def audio_input_to_mp3(audio: Input.Audio) -> BytesIO:
-    waveform = audio["waveform"].cpu()
-
-    output_buffer = BytesIO()
-    output_container = av.open(output_buffer, mode="w", format="mp3")
-
-    out_stream = output_container.add_stream("libmp3lame", rate=audio["sample_rate"])
-    out_stream.bit_rate = 320000
-
-    frame = av.AudioFrame.from_ndarray(
-        waveform.movedim(0, 1).reshape(1, -1).float().numpy(),
-        format="flt",
-        layout="mono" if waveform.shape[0] == 1 else "stereo",
-    )
-    frame.sample_rate = audio["sample_rate"]
-    frame.pts = 0
-    output_container.mux(out_stream.encode(frame))
-    output_container.mux(out_stream.encode(None))
-    output_container.close()
-    output_buffer.seek(0)
-    return output_buffer
-
-
-def trim_video(video: Input.Video, duration_sec: float) -> Input.Video:
-    """
-    Returns a new VideoInput object trimmed from the beginning to the specified duration,
-    using av to avoid loading entire video into memory.
-
-    Args:
-        video: Input video to trim
-        duration_sec: Duration in seconds to keep from the beginning
-
-    Returns:
-        VideoFromFile object that owns the output buffer
-    """
-    output_buffer = BytesIO()
-    input_container = None
-    output_container = None
-
-    try:
-        # Get the stream source - this avoids loading entire video into memory
-        # when the source is already a file path
-        input_source = video.get_stream_source()
-
-        # Open containers
-        input_container = av.open(input_source, mode="r")
-        output_container = av.open(output_buffer, mode="w", format="mp4")
-
-        # Set up output streams for re-encoding
-        video_stream = None
-        audio_stream = None
-
-        for stream in input_container.streams:
-            logging.info("Found stream: type=%s, class=%s", stream.type, type(stream))
-            if isinstance(stream, av.VideoStream):
-                # Create output video stream with same parameters
-                video_stream = output_container.add_stream("h264", rate=stream.average_rate)
-                video_stream.width = stream.width
-                video_stream.height = stream.height
-                video_stream.pix_fmt = "yuv420p"
-                logging.info("Added video stream: %sx%s @ %sfps", stream.width, stream.height, stream.average_rate)
-            elif isinstance(stream, av.AudioStream):
-                # Create output audio stream with same parameters
-                audio_stream = output_container.add_stream("aac", rate=stream.sample_rate)
-                audio_stream.sample_rate = stream.sample_rate
-                audio_stream.layout = stream.layout
-                logging.info("Added audio stream: %sHz, %s channels", stream.sample_rate, stream.channels)
-
-        # Calculate target frame count that's divisible by 16
-        fps = input_container.streams.video[0].average_rate
-        estimated_frames = int(duration_sec * fps)
-        target_frames = (estimated_frames // 16) * 16  # Round down to nearest multiple of 16
-
-        if target_frames == 0:
-            raise ValueError("Video too short: need at least 16 frames for Moonvalley")
-
-        frame_count = 0
-        audio_frame_count = 0
-
-        # Decode and re-encode video frames
-        if video_stream:
-            for frame in input_container.decode(video=0):
-                if frame_count >= target_frames:
-                    break
-
-                # Re-encode frame
-                for packet in video_stream.encode(frame):
-                    output_container.mux(packet)
-                frame_count += 1
-
-            # Flush encoder
-            for packet in video_stream.encode():
-                output_container.mux(packet)
-
-            logging.info("Encoded %s video frames (target: %s)", frame_count, target_frames)
-
-        # Decode and re-encode audio frames
-        if audio_stream:
-            input_container.seek(0)  # Reset to beginning for audio
-            for frame in input_container.decode(audio=0):
-                if frame.time >= duration_sec:
-                    break
-
-                # Re-encode frame
-                for packet in audio_stream.encode(frame):
-                    output_container.mux(packet)
-                audio_frame_count += 1
-
-            # Flush encoder
-            for packet in audio_stream.encode():
-                output_container.mux(packet)
-
-            logging.info("Encoded %s audio frames", audio_frame_count)
-
-        # Close containers
-        output_container.close()
-        input_container.close()
-
-        # Return as VideoFromFile using the buffer
-        output_buffer.seek(0)
-        return InputImpl.VideoFromFile(output_buffer)
-
-    except Exception as e:
-        # Clean up on error
-        if input_container is not None:
-            input_container.close()
-        if output_container is not None:
-            output_container.close()
-        raise RuntimeError(f"Failed to trim video: {str(e)}") from e
-
-
-def _f32_pcm(wav: torch.Tensor) -> torch.Tensor:
-    """Convert audio to 32-bit float PCM format. Copied from nodes_audio.py."""
-    if wav.dtype.is_floating_point:
-        return wav
-    elif wav.dtype == torch.int16:
-        return wav.float() / (2**15)
-    elif wav.dtype == torch.int32:
-        return wav.float() / (2**31)
-    raise ValueError(f"Unsupported wav dtype: {wav.dtype}")
-
-
-def audio_bytes_to_audio_input(audio_bytes: bytes) -> dict:
-    """
-    Decode any common audio container from bytes using PyAV and return
-    a Comfy AUDIO dict: {"waveform": [1, C, T] float32, "sample_rate": int}.
-    """
-    with av.open(BytesIO(audio_bytes)) as af:
-        if not af.streams.audio:
-            raise ValueError("No audio stream found in response.")
-        stream = af.streams.audio[0]
-
-        in_sr = int(stream.codec_context.sample_rate)
-        out_sr = in_sr
-
-        frames: list[torch.Tensor] = []
-        n_channels = stream.channels or 1
-
-        for frame in af.decode(streams=stream.index):
-            arr = frame.to_ndarray()  # shape can be [C, T] or [T, C] or [T]
-            buf = torch.from_numpy(arr)
-            if buf.ndim == 1:
-                buf = buf.unsqueeze(0)  # [T] -> [1, T]
-            elif buf.shape[0] != n_channels and buf.shape[-1] == n_channels:
-                buf = buf.transpose(0, 1).contiguous()  # [T, C] -> [C, T]
-            elif buf.shape[0] != n_channels:
-                buf = buf.reshape(-1, n_channels).t().contiguous()  # fallback to [C, T]
-            frames.append(buf)
-
-    if not frames:
-        raise ValueError("Decoded zero audio frames.")
-
-    wav = torch.cat(frames, dim=1)  # [C, T]
-    wav = _f32_pcm(wav)
-    return {"waveform": wav.unsqueeze(0).contiguous(), "sample_rate": out_sr}
-
-
-def resize_mask_to_image(
-    mask: torch.Tensor,
-    image: torch.Tensor,
-    upscale_method="nearest-exact",
-    crop="disabled",
-    allow_gradient=True,
-    add_channel_dim=False,
-):
-    """Resize mask to be the same dimensions as an image, while maintaining proper format for API calls."""
-    _, height, width, _ = image.shape
-    mask = mask.unsqueeze(-1)
-    mask = mask.movedim(-1, 1)
-    mask = common_upscale(mask, width=width, height=height, upscale_method=upscale_method, crop=crop)
-    mask = mask.movedim(1, -1)
-    if not add_channel_dim:
-        mask = mask.squeeze(-1)
-    if not allow_gradient:
-        mask = (mask > 0.5).float()
-    return mask
-
-
-def text_filepath_to_base64_string(filepath: str) -> str:
-    """Converts a text file to a base64 string."""
-    with open(filepath, "rb") as f:
-        file_content = f.read()
-    return base64.b64encode(file_content).decode("utf-8")
-
-
-def text_filepath_to_data_uri(filepath: str) -> str:
-    """Converts a text file to a data URI."""
-    base64_string = text_filepath_to_base64_string(filepath)
-    mime_type, _ = mimetypes.guess_type(filepath)
-    if mime_type is None:
-        mime_type = "application/octet-stream"
-    return f"data:{mime_type};base64,{base64_string}"
--- a/comfy_api_nodes/util/download_helpers.py
+++ b/comfy_api_nodes/util/download_helpers.py
@@ -1,262 +0,0 @@
-import asyncio
-import contextlib
-import uuid
-from io import BytesIO
-from pathlib import Path
-from typing import IO, Optional, Union
-from urllib.parse import urljoin, urlparse
-
-import aiohttp
-import torch
-from aiohttp.client_exceptions import ClientError, ContentTypeError
-
-from comfy_api.input_impl import VideoFromFile
-from comfy_api.latest import IO as COMFY_IO
-
-from . import request_logger
-from ._helpers import (
-    default_base_url,
-    get_auth_header,
-    is_processing_interrupted,
-    sleep_with_interrupt,
-)
-from .client import _diagnose_connectivity
-from .common_exceptions import ApiServerError, LocalNetworkError, ProcessingInterrupted
-from .conversions import bytesio_to_image_tensor
-
-_RETRY_STATUS = {408, 429, 500, 502, 503, 504}
-
-
-async def download_url_to_bytesio(
-    url: str,
-    dest: Optional[Union[BytesIO, IO[bytes], str, Path]],
-    *,
-    timeout: Optional[float] = None,
-    max_retries: int = 5,
-    retry_delay: float = 1.0,
-    retry_backoff: float = 2.0,
-    cls: type[COMFY_IO.ComfyNode] = None,
-) -> None:
-    """Stream-download a URL to `dest`.
-
-    `dest` must be one of:
-      - a BytesIO (rewound to 0 after write),
-      - a file-like object opened in binary write mode (must implement .write()),
-      - a filesystem path (str | pathlib.Path), which will be opened with 'wb'.
-
-    If `url` starts with `/proxy/`, `cls` must be provided so the URL can be expanded
-    to an absolute URL and authentication headers can be applied.
-
-    Raises:
-        ProcessingInterrupted, LocalNetworkError, ApiServerError, Exception (HTTP and other errors)
-    """
-    if not isinstance(dest, (str, Path)) and not hasattr(dest, "write"):
-        raise ValueError("dest must be a path (str|Path) or a binary-writable object providing .write().")
-
-    attempt = 0
-    delay = retry_delay
-    headers: dict[str, str] = {}
-
-    parsed_url = urlparse(url)
-    if not parsed_url.scheme and not parsed_url.netloc:  # is URL relative?
-        if cls is None:
-            raise ValueError("For relative 'cloud' paths, the `cls` parameter is required.")
-        url = urljoin(default_base_url().rstrip("/") + "/", url.lstrip("/"))
-        headers = get_auth_header(cls)
-
-    while True:
-        attempt += 1
-        op_id = _generate_operation_id("GET", url, attempt)
-        timeout_cfg = aiohttp.ClientTimeout(total=timeout)
-
-        is_path_sink = isinstance(dest, (str, Path))
-        fhandle = None
-        session: Optional[aiohttp.ClientSession] = None
-        stop_evt: Optional[asyncio.Event] = None
-        monitor_task: Optional[asyncio.Task] = None
-        req_task: Optional[asyncio.Task] = None
-
-        try:
-            with contextlib.suppress(Exception):
-                request_logger.log_request_response(operation_id=op_id, request_method="GET", request_url=url)
-
-            session = aiohttp.ClientSession(timeout=timeout_cfg)
-            stop_evt = asyncio.Event()
-
-            async def _monitor():
-                try:
-                    while not stop_evt.is_set():
-                        if is_processing_interrupted():
-                            return
-                        await asyncio.sleep(1.0)
-                except asyncio.CancelledError:
-                    return
-
-            monitor_task = asyncio.create_task(_monitor())
-
-            req_task = asyncio.create_task(session.get(url, headers=headers))
-            done, pending = await asyncio.wait({req_task, monitor_task}, return_when=asyncio.FIRST_COMPLETED)
-
-            if monitor_task in done and req_task in pending:
-                req_task.cancel()
-                with contextlib.suppress(Exception):
-                    await req_task
-                raise ProcessingInterrupted("Task cancelled")
-
-            try:
-                resp = await req_task
-            except asyncio.CancelledError:
-                raise ProcessingInterrupted("Task cancelled") from None
-
-            async with resp:
-                if resp.status >= 400:
-                    with contextlib.suppress(Exception):
-                        try:
-                            body = await resp.json()
-                        except (ContentTypeError, ValueError):
-                            text = await resp.text()
-                            body = text if len(text) <= 4096 else f"[text {len(text)} bytes]"
-                        request_logger.log_request_response(
-                            operation_id=op_id,
-                            request_method="GET",
-                            request_url=url,
-                            response_status_code=resp.status,
-                            response_headers=dict(resp.headers),
-                            response_content=body,
-                            error_message=f"HTTP {resp.status}",
-                        )
-
-                    if resp.status in _RETRY_STATUS and attempt <= max_retries:
-                        await sleep_with_interrupt(delay, cls, None, None, None)
-                        delay *= retry_backoff
-                        continue
-                    raise Exception(f"Failed to download (HTTP {resp.status}).")
-
-                if is_path_sink:
-                    p = Path(str(dest))
-                    with contextlib.suppress(Exception):
-                        p.parent.mkdir(parents=True, exist_ok=True)
-                    fhandle = open(p, "wb")
-                    sink = fhandle
-                else:
-                    sink = dest  # BytesIO or file-like
-
-                written = 0
-                while True:
-                    try:
-                        chunk = await asyncio.wait_for(resp.content.read(1024 * 1024), timeout=1.0)
-                    except asyncio.TimeoutError:
-                        chunk = b""
-                    except asyncio.CancelledError:
-                        raise ProcessingInterrupted("Task cancelled") from None
-
-                    if is_processing_interrupted():
-                        raise ProcessingInterrupted("Task cancelled")
-
-                    if not chunk:
-                        if resp.content.at_eof():
-                            break
-                        continue
-
-                    sink.write(chunk)
-                    written += len(chunk)
-
-                if isinstance(dest, BytesIO):
-                    with contextlib.suppress(Exception):
-                        dest.seek(0)
-
-                with contextlib.suppress(Exception):
-                    request_logger.log_request_response(
-                        operation_id=op_id,
-                        request_method="GET",
-                        request_url=url,
-                        response_status_code=resp.status,
-                        response_headers=dict(resp.headers),
-                        response_content=f"[streamed {written} bytes to dest]",
-                    )
-                return
-        except asyncio.CancelledError:
-            raise ProcessingInterrupted("Task cancelled") from None
-        except (ClientError, OSError) as e:
-            if attempt <= max_retries:
-                with contextlib.suppress(Exception):
-                    request_logger.log_request_response(
-                        operation_id=op_id,
-                        request_method="GET",
-                        request_url=url,
-                        error_message=f"{type(e).__name__}: {str(e)} (will retry)",
-                    )
-                await sleep_with_interrupt(delay, cls, None, None, None)
-                delay *= retry_backoff
-                continue
-
-            diag = await _diagnose_connectivity()
-            if not diag["internet_accessible"]:
-                raise LocalNetworkError(
-                    "Unable to connect to the network. Please check your internet connection and try again."
-                ) from e
-            raise ApiServerError("The remote service appears unreachable at this time.") from e
-        finally:
-            if stop_evt is not None:
-                stop_evt.set()
-            if monitor_task:
-                monitor_task.cancel()
-                with contextlib.suppress(Exception):
-                    await monitor_task
-            if req_task and not req_task.done():
-                req_task.cancel()
-                with contextlib.suppress(Exception):
-                    await req_task
-            if session:
-                with contextlib.suppress(Exception):
-                    await session.close()
-            if fhandle:
-                with contextlib.suppress(Exception):
-                    fhandle.flush()
-                    fhandle.close()
-
-
-async def download_url_to_image_tensor(
-    url: str,
-    *,
-    timeout: float = None,
-    cls: type[COMFY_IO.ComfyNode] = None,
-) -> torch.Tensor:
-    """Downloads an image from a URL and returns a [B, H, W, C] tensor."""
-    result = BytesIO()
-    await download_url_to_bytesio(url, result, timeout=timeout, cls=cls)
-    return bytesio_to_image_tensor(result)
-
-
-async def download_url_to_video_output(
-    video_url: str,
-    *,
-    timeout: float = None,
-    max_retries: int = 5,
-    cls: type[COMFY_IO.ComfyNode] = None,
-) -> VideoFromFile:
-    """Downloads a video from a URL and returns a `VIDEO` output."""
-    result = BytesIO()
-    await download_url_to_bytesio(video_url, result, timeout=timeout, max_retries=max_retries, cls=cls)
-    return VideoFromFile(result)
-
-
-async def download_url_as_bytesio(
-    url: str,
-    *,
-    timeout: float = None,
-    cls: type[COMFY_IO.ComfyNode] = None,
-) -> BytesIO:
-    """Downloads content from a URL and returns a new BytesIO (rewound to 0)."""
-    result = BytesIO()
-    await download_url_to_bytesio(url, result, timeout=timeout, cls=cls)
-    return result
-
-
-def _generate_operation_id(method: str, url: str, attempt: int) -> str:
-    try:
-        parsed = urlparse(url)
-        slug = (parsed.path.rsplit("/", 1)[-1] or parsed.netloc or "download").strip("/").replace("/", "_")
-    except Exception:
-        slug = "download"
-    return f"{method}_{slug}_try{attempt}_{uuid.uuid4().hex[:8]}"
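Review note on the removed `download_url_to_bytesio`: it treated a URL with neither scheme nor netloc as relative and joined it onto the API base URL before adding auth headers. The sketch below isolates that check. `ASSUMED_BASE_URL` and `expand_relative_url` are hypothetical names used only for illustration; the removed code obtained the base from `default_base_url()`.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical base URL, standing in for default_base_url() in the removed module.
ASSUMED_BASE_URL = "https://api.example.com"


def expand_relative_url(url: str, base_url: str = ASSUMED_BASE_URL) -> str:
    """Return an absolute URL; relative paths are joined onto base_url."""
    parsed = urlparse(url)
    # Same relative-URL test as the removed code: no scheme and no netloc.
    if not parsed.scheme and not parsed.netloc:
        return urljoin(base_url.rstrip("/") + "/", url.lstrip("/"))
    return url


print(expand_relative_url("/proxy/files/a.png"))
print(expand_relative_url("https://cdn.example.org/x.mp4"))
```

Callers that relied on relative `/proxy/...` paths will probably need an equivalent of this expansion wherever the download logic now lives.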
--- a/comfy_api_nodes/util/upload_helpers.py
+++ b/comfy_api_nodes/util/upload_helpers.py
@@ -1,338 +0,0 @@
-import asyncio
-import contextlib
-import logging
-import time
-import uuid
-from io import BytesIO
-from typing import Optional, Union
-from urllib.parse import urlparse
-
-import aiohttp
-import torch
-from pydantic import BaseModel, Field
-
-from comfy_api.latest import IO, Input
-from comfy_api.util import VideoCodec, VideoContainer
-
-from . import request_logger
-from ._helpers import is_processing_interrupted, sleep_with_interrupt
-from .client import (
-    ApiEndpoint,
-    _diagnose_connectivity,
-    _display_time_progress,
-    sync_op,
-)
-from .common_exceptions import ApiServerError, LocalNetworkError, ProcessingInterrupted
-from .conversions import (
-    audio_ndarray_to_bytesio,
-    audio_tensor_to_contiguous_ndarray,
-    tensor_to_bytesio,
-)
-
-
-class UploadRequest(BaseModel):
-    file_name: str = Field(..., description="Filename to upload")
-    content_type: Optional[str] = Field(
-        None,
-        description="Mime type of the file. For example: image/png, image/jpeg, video/mp4, etc.",
-    )
-
-
-class UploadResponse(BaseModel):
-    download_url: str = Field(..., description="URL to GET uploaded file")
-    upload_url: str = Field(..., description="URL to PUT file to upload")
-
-
-async def upload_images_to_comfyapi(
-    cls: type[IO.ComfyNode],
-    image: torch.Tensor,
-    *,
-    max_images: int = 8,
-    mime_type: Optional[str] = None,
-    wait_label: Optional[str] = "Uploading",
-) -> list[str]:
-    """
-    Uploads images to ComfyUI API and returns download URLs.
-    To upload multiple images, stack them in the batch dimension first.
-    """
-    # if batch, try to upload each file if max_images is greater than 0
-    download_urls: list[str] = []
-    is_batch = len(image.shape) > 3
-    batch_len = image.shape[0] if is_batch else 1
-
-    for idx in range(min(batch_len, max_images)):
-        tensor = image[idx] if is_batch else image
-        img_io = tensor_to_bytesio(tensor, mime_type=mime_type)
-        url = await upload_file_to_comfyapi(cls, img_io, img_io.name, mime_type, wait_label)
-        download_urls.append(url)
-    return download_urls
-
-
-async def upload_audio_to_comfyapi(
-    cls: type[IO.ComfyNode],
-    audio: Input.Audio,
-    *,
-    container_format: str = "mp4",
-    codec_name: str = "aac",
-    mime_type: str = "audio/mp4",
-    filename: str = "uploaded_audio.mp4",
-) -> str:
-    """
-    Uploads a single audio input to ComfyUI API and returns its download URL.
-    Encodes the raw waveform into the specified format before uploading.
-    """
-    sample_rate: int = audio["sample_rate"]
-    waveform: torch.Tensor = audio["waveform"]
-    audio_data_np = audio_tensor_to_contiguous_ndarray(waveform)
-    audio_bytes_io = audio_ndarray_to_bytesio(audio_data_np, sample_rate, container_format, codec_name)
-    return await upload_file_to_comfyapi(cls, audio_bytes_io, filename, mime_type)
-
-
-async def upload_video_to_comfyapi(
-    cls: type[IO.ComfyNode],
-    video: Input.Video,
-    *,
-    container: VideoContainer = VideoContainer.MP4,
-    codec: VideoCodec = VideoCodec.H264,
-    max_duration: Optional[int] = None,
-) -> str:
-    """
-    Uploads a single video to ComfyUI API and returns its download URL.
-    Uses the specified container and codec for saving the video before upload.
-    """
-    if max_duration is not None:
-        try:
-            actual_duration = video.get_duration()
-            if actual_duration > max_duration:
-                raise ValueError(
-                    f"Video duration ({actual_duration:.2f}s) exceeds the maximum allowed ({max_duration}s)."
-                )
-        except Exception as e:
-            logging.error("Error getting video duration: %s", str(e))
-            raise ValueError(f"Could not verify video duration from source: {e}") from e
-
-    upload_mime_type = f"video/{container.value.lower()}"
-    filename = f"uploaded_video.{container.value.lower()}"
-
-    # Convert VideoInput to BytesIO using specified container/codec
-    video_bytes_io = BytesIO()
-    video.save_to(video_bytes_io, format=container, codec=codec)
-    video_bytes_io.seek(0)
-
-    return await upload_file_to_comfyapi(cls, video_bytes_io, filename, upload_mime_type)
-
-
-async def upload_file_to_comfyapi(
-    cls: type[IO.ComfyNode],
-    file_bytes_io: BytesIO,
-    filename: str,
-    upload_mime_type: Optional[str],
-    wait_label: Optional[str] = "Uploading",
-) -> str:
-    """Uploads a single file to ComfyUI API and returns its download URL."""
-    if upload_mime_type is None:
-        request_object = UploadRequest(file_name=filename)
-    else:
-        request_object = UploadRequest(file_name=filename, content_type=upload_mime_type)
-    create_resp = await sync_op(
-        cls,
-        endpoint=ApiEndpoint(path="/customers/storage", method="POST"),
-        data=request_object,
-        response_model=UploadResponse,
-        final_label_on_success=None,
-        monitor_progress=False,
-    )
-    await upload_file(
-        cls,
-        create_resp.upload_url,
-        file_bytes_io,
-        content_type=upload_mime_type,
-        wait_label=wait_label,
-    )
-    return create_resp.download_url
-
-
-async def upload_file(
-    cls: type[IO.ComfyNode],
-    upload_url: str,
-    file: Union[BytesIO, str],
-    *,
-    content_type: Optional[str] = None,
-    max_retries: int = 3,
-    retry_delay: float = 1.0,
-    retry_backoff: float = 2.0,
-    wait_label: Optional[str] = None,
-) -> None:
-    """
-    Upload a file to a signed URL (e.g., S3 pre-signed PUT) with retries, Comfy progress display, and interruption.
-
-    Args:
-        cls: Node class (provides auth context + UI progress hooks).
-        upload_url: Pre-signed PUT URL.
-        file: BytesIO or path string.
-        content_type: Explicit MIME type. If None, we *suppress* Content-Type.
-        max_retries: Maximum retry attempts.
-        retry_delay: Initial delay in seconds.
-        retry_backoff: Exponential backoff factor.
-        wait_label: Progress label shown in Comfy UI.
-
-    Raises:
-        ProcessingInterrupted, LocalNetworkError, ApiServerError, Exception
-    """
-    if isinstance(file, BytesIO):
-        with contextlib.suppress(Exception):
-            file.seek(0)
-        data = file.read()
-    elif isinstance(file, str):
-        with open(file, "rb") as f:
-            data = f.read()
-    else:
-        raise ValueError("file must be a BytesIO or a filesystem path string")
-
-    headers: dict[str, str] = {}
-    skip_auto_headers: set[str] = set()
-    if content_type:
-        headers["Content-Type"] = content_type
-    else:
-        skip_auto_headers.add("Content-Type")  # Don't let aiohttp add Content-Type, it can break the signed request
-
-    attempt = 0
-    delay = retry_delay
-    start_ts = time.monotonic()
-    op_uuid = uuid.uuid4().hex[:8]
-    while True:
-        attempt += 1
-        operation_id = _generate_operation_id("PUT", upload_url, attempt, op_uuid)
-        timeout = aiohttp.ClientTimeout(total=None)
-        stop_evt = asyncio.Event()
-
-        async def _monitor():
-            try:
-                while not stop_evt.is_set():
-                    if is_processing_interrupted():
-                        return
-                    if wait_label:
-                        _display_time_progress(cls, wait_label, int(time.monotonic() - start_ts), None)
-                    await asyncio.sleep(1.0)
-            except asyncio.CancelledError:
-                return
-
-        monitor_task = asyncio.create_task(_monitor())
-        sess: Optional[aiohttp.ClientSession] = None
-        try:
-            try:
-                request_logger.log_request_response(
-                    operation_id=operation_id,
-                    request_method="PUT",
-                    request_url=upload_url,
-                    request_headers=headers or None,
-                    request_params=None,
-                    request_data=f"[File data {len(data)} bytes]",
-                )
-            except Exception as e:
-                logging.debug("[DEBUG] upload request logging failed: %s", e)
-
-            sess = aiohttp.ClientSession(timeout=timeout)
-            req = sess.put(upload_url, data=data, headers=headers, skip_auto_headers=skip_auto_headers)
-            req_task = asyncio.create_task(req)
-
-            done, pending = await asyncio.wait({req_task, monitor_task}, return_when=asyncio.FIRST_COMPLETED)
-
-            if monitor_task in done and req_task in pending:
-                req_task.cancel()
-                raise ProcessingInterrupted("Upload cancelled")
-
-            try:
-                resp = await req_task
-            except asyncio.CancelledError:
-                raise ProcessingInterrupted("Upload cancelled") from None
-
-            async with resp:
-                if resp.status >= 400:
-                    with contextlib.suppress(Exception):
-                        try:
-                            body = await resp.json()
-                        except Exception:
-                            body = await resp.text()
-                        msg = f"Upload failed with status {resp.status}"
-                        request_logger.log_request_response(
-                            operation_id=operation_id,
-                            request_method="PUT",
-                            request_url=upload_url,
-                            response_status_code=resp.status,
-                            response_headers=dict(resp.headers),
-                            response_content=body,
-                            error_message=msg,
-                        )
-                    if resp.status in {408, 429, 500, 502, 503, 504} and attempt <= max_retries:
-                        await sleep_with_interrupt(
-                            delay,
-                            cls,
-                            wait_label,
-                            start_ts,
-                            None,
-                            display_callback=_display_time_progress if wait_label else None,
-                        )
-                        delay *= retry_backoff
-                        continue
-                    raise Exception(f"Failed to upload (HTTP {resp.status}).")
-                try:
-                    request_logger.log_request_response(
-                        operation_id=operation_id,
-                        request_method="PUT",
-                        request_url=upload_url,
-                        response_status_code=resp.status,
-                        response_headers=dict(resp.headers),
-                        response_content="File uploaded successfully.",
-                    )
-                except Exception as e:
-                    logging.debug("[DEBUG] upload response logging failed: %s", e)
-                return
-        except asyncio.CancelledError:
-            raise ProcessingInterrupted("Task cancelled") from None
-        except (aiohttp.ClientError, OSError) as e:
-            if attempt <= max_retries:
-                with contextlib.suppress(Exception):
-                    request_logger.log_request_response(
-                        operation_id=operation_id,
-                        request_method="PUT",
-                        request_url=upload_url,
-                        request_headers=headers or None,
-                        request_data=f"[File data {len(data)} bytes]",
-                        error_message=f"{type(e).__name__}: {str(e)} (will retry)",
-                    )
-                await sleep_with_interrupt(
-                    delay,
-                    cls,
-                    wait_label,
-                    start_ts,
-                    None,
-                    display_callback=_display_time_progress if wait_label else None,
-                )
-                delay *= retry_backoff
-                continue
-
-            diag = await _diagnose_connectivity()
-            if not diag["internet_accessible"]:
-                raise LocalNetworkError(
-                    "Unable to connect to the network. Please check your internet connection and try again."
-                ) from e
-            raise ApiServerError("The API service appears unreachable at this time.") from e
-        finally:
-            stop_evt.set()
-            if monitor_task:
-                monitor_task.cancel()
-                with contextlib.suppress(Exception):
-                    await monitor_task
-            if sess:
-                with contextlib.suppress(Exception):
-                    await sess.close()
-
-
-def _generate_operation_id(method: str, url: str, attempt: int, op_uuid: str) -> str:
-    try:
-        parsed = urlparse(url)
-        slug = (parsed.path.rsplit("/", 1)[-1] or parsed.netloc or "upload").strip("/").replace("/", "_")
-    except Exception:
-        slug = "upload"
-    return f"{method}_{slug}_{op_uuid}_try{attempt}"
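Review note on the retry loops in the removed `upload_file` and `download_url_to_bytesio`: both start at `retry_delay`, multiply by `retry_backoff` after each failed attempt, and give up once `attempt` exceeds `max_retries`. A small sketch of the resulting sleep schedule follows. `backoff_delays` is a hypothetical helper, not part of the codebase.

```python
def backoff_delays(
    max_retries: int = 3,
    retry_delay: float = 1.0,
    retry_backoff: float = 2.0,
) -> list[float]:
    """Seconds slept before each retry, mirroring `delay *= retry_backoff`."""
    delays: list[float] = []
    delay = retry_delay
    # One sleep per retry; the initial attempt has no preceding sleep.
    for _ in range(max_retries):
        delays.append(delay)
        delay *= retry_backoff
    return delays


print(backoff_delays())               # upload defaults (max_retries=3)
print(backoff_delays(max_retries=5))  # download defaults (max_retries=5)
```

With these defaults an upload makes at most four attempts and a download at most six, so any replacement should keep a similar upper bound on total wait time.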
--- a/comfy_api_nodes/util/validation_utils.py
+++ b/comfy_api_nodes/util/validation_utils.py
@@ -2,8 +2,6 @@ import logging
 from typing import Optional

 import torch
-
-from comfy_api.input.video_types import VideoInput
 from comfy_api.latest import Input


@@ -30,69 +28,76 @@ def validate_image_dimensions(
    if max_width is not None and width > max_width:
        raise ValueError(f"Image width must be at most {max_width}px, got {width}px")
    if min_height is not None and height < min_height:
-        raise ValueError(f"Image height must be at least {min_height}px, got {height}px")
+        raise ValueError(
+            f"Image height must be at least {min_height}px, got {height}px"
+        )
    if max_height is not None and height > max_height:
        raise ValueError(f"Image height must be at most {max_height}px, got {height}px")


 def validate_image_aspect_ratio(
    image: torch.Tensor,
-    min_ratio: Optional[tuple[float, float]] = None,  # e.g. (1, 4)
-    max_ratio: Optional[tuple[float, float]] = None,  # e.g. (4, 1)
+    min_aspect_ratio: Optional[float] = None,
+    max_aspect_ratio: Optional[float] = None,
+):
+    width, height = get_image_dimensions(image)
+    aspect_ratio = width / height
+
+    if min_aspect_ratio is not None and aspect_ratio < min_aspect_ratio:
+        raise ValueError(
+            f"Image aspect ratio must be at least {min_aspect_ratio}, got {aspect_ratio}"
+        )
+    if max_aspect_ratio is not None and aspect_ratio > max_aspect_ratio:
+        raise ValueError(
+            f"Image aspect ratio must be at most {max_aspect_ratio}, got {aspect_ratio}"
+        )
+
+
+def validate_image_aspect_ratio_range(
+    image: torch.Tensor,
+    min_ratio: tuple[float, float],  # e.g. (1, 4)
+    max_ratio: tuple[float, float],  # e.g. (4, 1)
    *,
-    strict: bool = True,  # True -> (min, max); False -> [min, max]
+    strict: bool = True,  # True -> (min, max); False -> [min, max]
 ) -> float:
-    """Validates that image aspect ratio is within min and max. If a bound is None, that side is not checked."""
+    a1, b1 = min_ratio
+    a2, b2 = max_ratio
+    if a1 <= 0 or b1 <= 0 or a2 <= 0 or b2 <= 0:
+        raise ValueError("Ratios must be positive, like (1, 4) or (4, 1).")
+    lo, hi = (a1 / b1), (a2 / b2)
+    if lo > hi:
+        lo, hi = hi, lo
+        a1, b1, a2, b2 = a2, b2, a1, b1  # swap only for error text
    w, h = get_image_dimensions(image)
    if w <= 0 or h <= 0:
        raise ValueError(f"Invalid image dimensions: {w}x{h}")
    ar = w / h
-    _assert_ratio_bounds(ar, min_ratio=min_ratio, max_ratio=max_ratio, strict=strict)
+    ok = (lo < ar < hi) if strict else (lo <= ar <= hi)
+    if not ok:
+        op = "<" if strict else "≤"
+        raise ValueError(f"Image aspect ratio {ar:.6g} is outside allowed range: {a1}:{b1} {op} ratio {op} {a2}:{b2}")
    return ar


-def validate_images_aspect_ratio_closeness(
-    first_image: torch.Tensor,
-    second_image: torch.Tensor,
-    min_rel: float,   # e.g. 0.8
-    max_rel: float,   # e.g. 1.25
+def validate_aspect_ratio_closeness(
+    start_img: torch.Tensor,
+    end_img: torch.Tensor,
+    min_rel: float,
+    max_rel: float,
    *,
-    strict: bool = False,  # True -> (min, max); False -> [min, max]
-) -> float:
-    """
-    Validates that the two images' aspect ratios are 'close'.
-    The closeness factor is C = max(ar1, ar2) / min(ar1, ar2)  (C >= 1).
-    We require C <= limit, where limit = max(max_rel, 1.0 / min_rel).
-
-    Returns the computed closeness factor C.
-    """
-    w1, h1 = get_image_dimensions(first_image)
-    w2, h2 = get_image_dimensions(second_image)
+    strict: bool = False,  # True => exclusive, False => inclusive
+) -> None:
+    w1, h1 = get_image_dimensions(start_img)
+    w2, h2 = get_image_dimensions(end_img)
    if min(w1, h1, w2, h2) <= 0:
        raise ValueError("Invalid image dimensions")
    ar1 = w1 / h1
    ar2 = w2 / h2
+    # Normalize so it is symmetric (no need to check both ar1/ar2 and ar2/ar1)
    closeness = max(ar1, ar2) / min(ar1, ar2)
-    limit = max(max_rel, 1.0 / min_rel)
+    limit = max(max_rel, 1.0 / min_rel)  # for 0.8..1.25 this is 1.25
    if (closeness >= limit) if strict else (closeness > limit):
-        raise ValueError(
-            f"Aspect ratios must be close: ar1/ar2={ar1/ar2:.2g}, "
-            f"allowed range {min_rel}–{max_rel} (limit {limit:.2g})."
-        )
-    return closeness
-
-
-def validate_aspect_ratio_string(
-    aspect_ratio: str,
-    min_ratio: Optional[tuple[float, float]] = None,  # e.g. (1, 4)
-    max_ratio: Optional[tuple[float, float]] = None,  # e.g. (4, 1)
-    *,
-    strict: bool = False,  # True -> (min, max); False -> [min, max]
-) -> float:
-    """Parses 'X:Y' and validates it against optional bounds. Returns the numeric ratio."""
-    ar = _parse_aspect_ratio_string(aspect_ratio)
-    _assert_ratio_bounds(ar, min_ratio=min_ratio, max_ratio=max_ratio, strict=strict)
-    return ar
+        raise ValueError(f"Aspect ratios must be close: start/end={ar1/ar2:.4f}, allowed range {min_rel}–{max_rel}.")


 def validate_video_dimensions(
@@ -113,7 +118,9 @@ def validate_video_dimensions(
    if max_width is not None and width > max_width:
        raise ValueError(f"Video width must be at most {max_width}px, got {width}px")
    if min_height is not None and height < min_height:
-        raise ValueError(f"Video height must be at least {min_height}px, got {height}px")
+        raise ValueError(
+            f"Video height must be at least {min_height}px, got {height}px"
+        )
    if max_height is not None and height > max_height:
        raise ValueError(f"Video height must be at most {max_height}px, got {height}px")

@@ -131,9 +138,13 @@ def validate_video_duration(

    epsilon = 0.0001
    if min_duration is not None and min_duration - epsilon > duration:
-        raise ValueError(f"Video duration must be at least {min_duration}s, got {duration}s")
+        raise ValueError(
+            f"Video duration must be at least {min_duration}s, got {duration}s"
+        )
    if max_duration is not None and duration > max_duration + epsilon:
-        raise ValueError(f"Video duration must be at most {max_duration}s, got {duration}s")
+        raise ValueError(
+            f"Video duration must be at most {max_duration}s, got {duration}s"
+        )


 def get_number_of_images(images):
@@ -154,77 +165,3 @@ def validate_audio_duration(
        raise ValueError(f"Audio duration must be at least {min_duration}s, got {dur + eps:.2f}s")
    if max_duration is not None and dur - eps > max_duration:
        raise ValueError(f"Audio duration must be at most {max_duration}s, got {dur - eps:.2f}s")
-
-
-def validate_string(
-    string: str,
-    strip_whitespace=True,
-    field_name="prompt",
-    min_length=None,
-    max_length=None,
-):
-    if string is None:
-        raise Exception(f"Field '{field_name}' cannot be empty.")
-    if strip_whitespace:
-        string = string.strip()
-    if min_length and len(string) < min_length:
-        raise Exception(
-            f"Field '{field_name}' cannot be shorter than {min_length} characters; was {len(string)} characters long."
-        )
-    if max_length and len(string) > max_length:
-        raise Exception(
-            f" Field '{field_name} cannot be longer than {max_length} characters; was {len(string)} characters long."
-        )
-
-
-def validate_container_format_is_mp4(video: VideoInput) -> None:
-    """Validates video container format is MP4."""
-    container_format = video.get_container_format()
-    if container_format not in ["mp4", "mov,mp4,m4a,3gp,3g2,mj2"]:
-        raise ValueError(f"Only MP4 container format supported. Got: {container_format}")
-
-
-def _ratio_from_tuple(r: tuple[float, float]) -> float:
-    a, b = r
-    if a <= 0 or b <= 0:
-        raise ValueError(f"Ratios must be positive, got {a}:{b}.")
-    return a / b
-
-
-def _assert_ratio_bounds(
-    ar: float,
-    *,
-    min_ratio: Optional[tuple[float, float]] = None,
-    max_ratio: Optional[tuple[float, float]] = None,
-    strict: bool = True,
-) -> None:
-    """Validate a numeric aspect ratio against optional min/max ratio bounds."""
-    lo = _ratio_from_tuple(min_ratio) if min_ratio is not None else None
-    hi = _ratio_from_tuple(max_ratio) if max_ratio is not None else None
-
-    if lo is not None and hi is not None and lo > hi:
-        lo, hi = hi, lo  # normalize order if caller swapped them
-
-    if lo is not None:
-        if (ar <= lo) if strict else (ar < lo):
-            op = "<" if strict else "≤"
-            raise ValueError(f"Aspect ratio `{ar:.2g}` must be {op} {lo:.2g}.")
-    if hi is not None:
-        if (ar >= hi) if strict else (ar > hi):
-            op = "<" if strict else "≤"
-            raise ValueError(f"Aspect ratio `{ar:.2g}` must be {op} {hi:.2g}.")
-
-
-def _parse_aspect_ratio_string(ar_str: str) -> float:
-    """Parse 'X:Y' with integer parts into a positive float ratio X/Y."""
-    parts = ar_str.split(":")
-    if len(parts) != 2:
-        raise ValueError(f"Aspect ratio must be 'X:Y' (e.g., 16:9), got '{ar_str}'.")
-    try:
-        a = int(parts[0].strip())
-        b = int(parts[1].strip())
-    except ValueError as exc:
-        raise ValueError(f"Aspect ratio must contain integers separated by ':', got '{ar_str}'.") from exc
-    if a <= 0 or b <= 0:
-        raise ValueError(f"Aspect ratio parts must be positive integers, got {a}:{b}.")
-    return a / b
--- a/comfy_execution/caching.py
+++ b/comfy_execution/caching.py
@@ -1,9 +1,4 @@
-import bisect
-import gc
 import itertools
-import psutil
-import time
-import torch
 from typing import Sequence, Mapping, Dict
 from comfy_execution.graph import DynamicPrompt
 from abc import ABC, abstractmethod
@@ -53,7 +48,7 @@ class Unhashable:
 def to_hashable(obj):
    # So that we don't infinitely recurse since frozenset and tuples
    # are Sequences.
-    if isinstance(obj, (int, float, str, bool, bytes, type(None))):
+    if isinstance(obj, (int, float, str, bool, type(None))):
        return obj
    elif isinstance(obj, Mapping):
        return frozenset([(to_hashable(k), to_hashable(v)) for k, v in sorted(obj.items())])
@@ -193,9 +188,6 @@ class BasicCache:
        self._clean_cache()
        self._clean_subcaches()

-    def poll(self, **kwargs):
-        pass
-
    def _set_immediate(self, node_id, value):
        assert self.initialized
        cache_key = self.cache_key_set.get_data_key(node_id)
@@ -273,29 +265,6 @@ class HierarchicalCache(BasicCache):
        assert cache is not None
        return await cache._ensure_subcache(node_id, children_ids)

-class NullCache:
-
-    async def set_prompt(self, dynprompt, node_ids, is_changed_cache):
-        pass
-
-    def all_node_ids(self):
-        return []
-
-    def clean_unused(self):
-        pass
-
-    def poll(self, **kwargs):
-        pass
-
-    def get(self, node_id):
-        return None
-
-    def set(self, node_id, value):
-        pass
-
-    async def ensure_subcache_for(self, node_id, children_ids):
-        return self
-
 class LRUCache(BasicCache):
    def __init__(self, key_class, max_size=100):
        super().__init__(key_class)
@@ -349,75 +318,155 @@ class LRUCache(BasicCache):
        return self


-#Iterating the cache for usage analysis might be expensive, so if we trigger make sure
-#to take a chunk out to give breathing space on high-node / low-ram-per-node flows.
-
-RAM_CACHE_HYSTERESIS = 1.1
-
-#This is kinda in GB but not really. It needs to be non-zero for the below heuristic
-#and as long as Multi GB models dwarf this it will approximate OOM scoring OK
-
-RAM_CACHE_DEFAULT_RAM_USAGE = 0.1
-
-#Exponential bias towards evicting older workflows so garbage will be taken out
-#in constantly changing setups.
-
-RAM_CACHE_OLD_WORKFLOW_OOM_MULTIPLIER = 1.3
-
-class RAMPressureCache(LRUCache):
+class DependencyAwareCache(BasicCache):
+    """
+    A cache implementation that tracks dependencies between nodes and manages
+    their execution and caching accordingly. It extends the BasicCache class.
+    Nodes are removed from this cache once all of their descendants have been
+    executed.
+    """

    def __init__(self, key_class):
-        super().__init__(key_class, 0)
-        self.timestamps = {}
+        """
+        Initialize the DependencyAwareCache.

-    def clean_unused(self):
-        self._clean_subcaches()
+        Args:
+            key_class: The class used for generating cache keys.
+        """
+        super().__init__(key_class)
+        self.descendants = {}  # Maps node_id -> set of descendant node_ids
+        self.ancestors = {}    # Maps node_id -> set of ancestor node_ids
+        self.executed_nodes = set()  # Tracks nodes that have been executed
+
+    async def set_prompt(self, dynprompt, node_ids, is_changed_cache):
+        """
+        Clear the entire cache and rebuild the dependency graph.
+
+        Args:
+            dynprompt: The dynamic prompt object containing node information.
+            node_ids: List of node IDs to initialize the cache for.
+            is_changed_cache: Flag indicating if the cache has changed.
+        """
+        # Clear all existing cache data
+        self.cache.clear()
+        self.subcaches.clear()
+        self.descendants.clear()
+        self.ancestors.clear()
+        self.executed_nodes.clear()
+
+        # Call the parent method to initialize the cache with the new prompt
+        await super().set_prompt(dynprompt, node_ids, is_changed_cache)
+
+        # Rebuild the dependency graph
+        self._build_dependency_graph(dynprompt, node_ids)
+
+    def _build_dependency_graph(self, dynprompt, node_ids):
+        """
+        Build the dependency graph for all nodes.
+
+        Args:
+            dynprompt: The dynamic prompt object containing node information.
+            node_ids: List of node IDs to build the graph for.
+        """
+        self.descendants.clear()
+        self.ancestors.clear()
+        for node_id in node_ids:
+            self.descendants[node_id] = set()
+            self.ancestors[node_id] = set()
+
+        for node_id in node_ids:
+            inputs = dynprompt.get_node(node_id)["inputs"]
+            for input_data in inputs.values():
+                if is_link(input_data):  # Check if the input is a link to another node
+                    ancestor_id = input_data[0]
+                    self.descendants[ancestor_id].add(node_id)
+                    self.ancestors[node_id].add(ancestor_id)

    def set(self, node_id, value):
-        self.timestamps[self.cache_key_set.get_data_key(node_id)] = time.time()
-        super().set(node_id, value)
+        """
+        Mark a node as executed and store its value in the cache.
+
+        Args:
+            node_id: The ID of the node to store.
+            value: The value to store for the node.
+        """
+        self._set_immediate(node_id, value)
+        self.executed_nodes.add(node_id)
+        self._cleanup_ancestors(node_id)

    def get(self, node_id):
-        self.timestamps[self.cache_key_set.get_data_key(node_id)] = time.time()
-        return super().get(node_id)
+        """
+        Retrieve the cached value for a node.

-    def poll(self, ram_headroom):
-        def _ram_gb():
-            return psutil.virtual_memory().available / (1024**3)
+        Args:
+            node_id: The ID of the node to retrieve.

-        if _ram_gb() > ram_headroom:
-            return
-        gc.collect()
-        if _ram_gb() > ram_headroom:
-            return
+        Returns:
+            The cached value for the node.
+        """
+        return self._get_immediate(node_id)

-        clean_list = []
+    async def ensure_subcache_for(self, node_id, children_ids):
+        """
+        Ensure a subcache exists for a node and update dependencies.

-        for key, (outputs, _), in self.cache.items():
-            oom_score =  RAM_CACHE_OLD_WORKFLOW_OOM_MULTIPLIER ** (self.generation - self.used_generation[key])
+        Args:
+            node_id: The ID of the parent node.
+            children_ids: List of child node IDs to associate with the parent node.

-            ram_usage = RAM_CACHE_DEFAULT_RAM_USAGE
-            def scan_list_for_ram_usage(outputs):
-                nonlocal ram_usage
-                if outputs is None:
-                    return
-                for output in outputs:
-                    if isinstance(output, list):
-                        scan_list_for_ram_usage(output)
-                    elif isinstance(output, torch.Tensor) and output.device.type == 'cpu':
-                        #score Tensors at a 50% discount for RAM usage as they are likely to
-                        #be high value intermediates
-                        ram_usage += (output.numel() * output.element_size()) * 0.5
-                    elif hasattr(output, "get_ram_usage"):
-                        ram_usage += output.get_ram_usage()
-            scan_list_for_ram_usage(outputs)
+        Returns:
+            The subcache object for the node.
+        """
+        subcache = await super()._ensure_subcache(node_id, children_ids)
+        for child_id in children_ids:
+            self.descendants[node_id].add(child_id)
+            self.ancestors[child_id].add(node_id)
+        return subcache

-            oom_score *= ram_usage
-            #In the case where we have no information on the node ram usage at all,
-            #break OOM score ties on the last touch timestamp (pure LRU)
-            bisect.insort(clean_list, (oom_score, self.timestamps[key], key))
+    def _cleanup_ancestors(self, node_id):
+        """
+        Check if ancestors of a node can be removed from the cache.

-        while _ram_gb() < ram_headroom * RAM_CACHE_HYSTERESIS and clean_list:
-            _, _, key = clean_list.pop()
-            del self.cache[key]
-            gc.collect()
+        Args:
+            node_id: The ID of the node whose ancestors are to be checked.
+        """
+        for ancestor_id in self.ancestors.get(node_id, []):
+            if ancestor_id in self.executed_nodes:
+                # Remove ancestor if all its descendants have been executed
+                if all(descendant in self.executed_nodes for descendant in self.descendants[ancestor_id]):
+                    self._remove_node(ancestor_id)
+
+    def _remove_node(self, node_id):
+        """
+        Remove a node from the cache.
+
+        Args:
+            node_id: The ID of the node to remove.
+        """
+        cache_key = self.cache_key_set.get_data_key(node_id)
+        if cache_key in self.cache:
+            del self.cache[cache_key]
+        subcache_key = self.cache_key_set.get_subcache_key(node_id)
+        if subcache_key in self.subcaches:
+            del self.subcaches[subcache_key]
+
+    def clean_unused(self):
+        """
+        Clean up unused nodes. This is a no-op for this cache implementation.
+        """
+        pass
+
+    def recursive_debug_dump(self):
+        """
+        Dump the cache and dependency graph for debugging.
+
+        Returns:
+            A list containing the cache state and dependency graph.
+        """
+        result = super().recursive_debug_dump()
+        result.append({
+            "descendants": self.descendants,
+            "ancestors": self.ancestors,
+            "executed_nodes": list(self.executed_nodes),
+        })
+        return result
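Note on the `DependencyAwareCache` hunk above: the eviction rule in `set` and `_cleanup_ancestors` can be illustrated with a minimal, self-contained sketch. `TinyDependencyCache` and its `edges` argument are hypothetical names invented for this illustration and are not part of ComfyUI. The sketch keys the cache directly by node ID instead of going through `cache_key_set`, and, like the diff, it records only direct links as "descendants".

```python
class TinyDependencyCache:
    """Hypothetical stand-in for illustration; not the ComfyUI class."""

    def __init__(self, edges):
        # edges: iterable of (ancestor_id, descendant_id) pairs (direct links only)
        self.descendants = {}
        self.ancestors = {}
        for ancestor_id, node_id in edges:
            self.descendants.setdefault(ancestor_id, set()).add(node_id)
            self.descendants.setdefault(node_id, set())
            self.ancestors.setdefault(node_id, set()).add(ancestor_id)
            self.ancestors.setdefault(ancestor_id, set())
        self.cache = {}
        self.executed_nodes = set()

    def set(self, node_id, value):
        # Store the value, mark the node executed, then re-check its ancestors.
        self.cache[node_id] = value
        self.executed_nodes.add(node_id)
        for ancestor_id in self.ancestors.get(node_id, ()):
            if ancestor_id not in self.executed_nodes:
                continue
            # Evict the ancestor once every direct descendant has executed.
            if self.descendants[ancestor_id] <= self.executed_nodes:
                self.cache.pop(ancestor_id, None)

    def get(self, node_id):
        return self.cache.get(node_id)


# Graph: a -> b, a -> c, b -> d
cache = TinyDependencyCache([("a", "b"), ("a", "c"), ("b", "d")])
cache.set("a", 1)
cache.set("b", 2)
a_before_c = cache.get("a")   # "a" is still needed by "c"
cache.set("c", 3)
a_after_c = cache.get("a")    # every direct descendant of "a" has now run
cache.set("d", 4)
b_after_d = cache.get("b")    # "b" is evicted once "d" has run
```

Under this rule a node with no descendants (here `c` and `d`) is never evicted by the cleanup pass, because eviction is only triggered from a descendant's `set` call.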
--- a/comfy_execution/graph.py
+++ b/comfy_execution/graph.py
@@ -153,9 +153,8 @@ class TopologicalSort:
                        continue
                    _, _, input_info = self.get_input_info(unique_id, input_name)
                    is_lazy = input_info is not None and "lazy" in input_info and input_info["lazy"]
-                    if (include_lazy or not is_lazy):
-                        if not self.is_cached(from_node_id):
-                            node_ids.append(from_node_id)
+                    if (include_lazy or not is_lazy) and not self.is_cached(from_node_id):
+                        node_ids.append(from_node_id)
                        links.append((from_node_id, from_socket, unique_id))

        for link in links:
@@ -195,40 +194,10 @@ class ExecutionList(TopologicalSort):
        super().__init__(dynprompt)
        self.output_cache = output_cache
        self.staged_node_id = None
-        self.execution_cache = {}
-        self.execution_cache_listeners = {}

    def is_cached(self, node_id):
        return self.output_cache.get(node_id) is not None

-    def cache_link(self, from_node_id, to_node_id):
-        if not to_node_id in self.execution_cache:
-            self.execution_cache[to_node_id] = {}
-        self.execution_cache[to_node_id][from_node_id] = self.output_cache.get(from_node_id)
-        if not from_node_id in self.execution_cache_listeners:
-            self.execution_cache_listeners[from_node_id] = set()
-        self.execution_cache_listeners[from_node_id].add(to_node_id)
-
-    def get_cache(self, from_node_id, to_node_id):
-        if not to_node_id in self.execution_cache:
-            return None
-        value = self.execution_cache[to_node_id].get(from_node_id)
-        if value is None:
-            return None
-        #Write back to the main cache on touch.
-        self.output_cache.set(from_node_id, value)
-        return value
-
-    def cache_update(self, node_id, value):
-        if node_id in self.execution_cache_listeners:
-            for to_node_id in self.execution_cache_listeners[node_id]:
-                if to_node_id in self.execution_cache:
-                    self.execution_cache[to_node_id][node_id] = value
-
-    def add_strong_link(self, from_node_id, from_socket, to_node_id):
-        super().add_strong_link(from_node_id, from_socket, to_node_id)
-        self.cache_link(from_node_id, to_node_id)
-
    async def stage_node_execution(self):
        assert self.staged_node_id is None
        if self.is_empty():
@@ -308,8 +277,6 @@ class ExecutionList(TopologicalSort):
    def complete_node_execution(self):
        node_id = self.staged_node_id
        self.pop_node(node_id)
-        self.execution_cache.pop(node_id, None)
-        self.execution_cache_listeners.pop(node_id, None)
        self.staged_node_id = None

    def get_nodes_in_cycle(self):
--- a/comfy_extras/nodes_controlnet.py
+++ b/comfy_extras/nodes_controlnet.py
@@ -1,26 +1,20 @@
 from comfy.cldm.control_types import UNION_CONTROLNET_TYPES
 import nodes
 import comfy.utils
-from typing_extensions import override
-from comfy_api.latest import ComfyExtension, io

-class SetUnionControlNetType(io.ComfyNode):
+class SetUnionControlNetType:
    @classmethod
-    def define_schema(cls):
-        return io.Schema(
-            node_id="SetUnionControlNetType",
-            category="conditioning/controlnet",
-            inputs=[
-                io.ControlNet.Input("control_net"),
-                io.Combo.Input("type", options=["auto"] + list(UNION_CONTROLNET_TYPES.keys())),
-            ],
-            outputs=[
-                io.ControlNet.Output(),
-            ],
-        )
+    def INPUT_TYPES(s):
+        return {"required": {"control_net": ("CONTROL_NET", ),
+                             "type": (["auto"] + list(UNION_CONTROLNET_TYPES.keys()),)
+                             }}

-    @classmethod
-    def execute(cls, control_net, type) -> io.NodeOutput:
+    CATEGORY = "conditioning/controlnet"
+    RETURN_TYPES = ("CONTROL_NET",)
+
+    FUNCTION = "set_controlnet_type"
+
+    def set_controlnet_type(self, control_net, type):
        control_net = control_net.copy()
        type_number = UNION_CONTROLNET_TYPES.get(type, -1)
        if type_number >= 0:
@@ -28,36 +22,27 @@ class SetUnionControlNetType(io.ComfyNode):
        else:
            control_net.set_extra_arg("control_type", [])

-        return io.NodeOutput(control_net)
+        return (control_net,)

-    set_controlnet_type = execute  # TODO: remove
-
-
-class ControlNetInpaintingAliMamaApply(io.ComfyNode):
+class ControlNetInpaintingAliMamaApply(nodes.ControlNetApplyAdvanced):
    @classmethod
-    def define_schema(cls):
-        return io.Schema(
-            node_id="ControlNetInpaintingAliMamaApply",
-            category="conditioning/controlnet",
-            inputs=[
-                io.Conditioning.Input("positive"),
-                io.Conditioning.Input("negative"),
-                io.ControlNet.Input("control_net"),
-                io.Vae.Input("vae"),
-                io.Image.Input("image"),
-                io.Mask.Input("mask"),
-                io.Float.Input("strength", default=1.0, min=0.0, max=10.0, step=0.01),
-                io.Float.Input("start_percent", default=0.0, min=0.0, max=1.0, step=0.001),
-                io.Float.Input("end_percent", default=1.0, min=0.0, max=1.0, step=0.001),
-            ],
-            outputs=[
-                io.Conditioning.Output(display_name="positive"),
-                io.Conditioning.Output(display_name="negative"),
-            ],
-        )
+    def INPUT_TYPES(s):
+        return {"required": {"positive": ("CONDITIONING", ),
+                             "negative": ("CONDITIONING", ),
+                             "control_net": ("CONTROL_NET", ),
+                             "vae": ("VAE", ),
+                             "image": ("IMAGE", ),
+                             "mask": ("MASK", ),
+                             "strength": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.01}),
+                             "start_percent": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.001}),
+                             "end_percent": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.001})
+                             }}

-    @classmethod
-    def execute(cls, positive, negative, control_net, vae, image, mask, strength, start_percent, end_percent) -> io.NodeOutput:
+    FUNCTION = "apply_inpaint_controlnet"
+
+    CATEGORY = "conditioning/controlnet"
+
+    def apply_inpaint_controlnet(self, positive, negative, control_net, vae, image, mask, strength, start_percent, end_percent):
        extra_concat = []
        if control_net.concat_mask:
            mask = 1.0 - mask.reshape((-1, 1, mask.shape[-2], mask.shape[-1]))
@@ -65,20 +50,11 @@ class ControlNetInpaintingAliMamaApply(io.ComfyNode):
            image = image * mask_apply.movedim(1, -1).repeat(1, 1, 1, image.shape[3])
            extra_concat = [mask]

-        result = nodes.ControlNetApplyAdvanced().apply_controlnet(positive, negative, control_net, image, strength, start_percent, end_percent, vae=vae, extra_concat=extra_concat)
-        return io.NodeOutput(result[0], result[1])
-
-    apply_inpaint_controlnet = execute  # TODO: remove
+        return self.apply_controlnet(positive, negative, control_net, image, strength, start_percent, end_percent, vae=vae, extra_concat=extra_concat)


-class ControlNetExtension(ComfyExtension):
-    @override
-    async def get_node_list(self) -> list[type[io.ComfyNode]]:
-        return [
-            SetUnionControlNetType,
-            ControlNetInpaintingAliMamaApply,
-        ]

-
-async def comfy_entrypoint() -> ControlNetExtension:
-    return ControlNetExtension()
+NODE_CLASS_MAPPINGS = {
+    "SetUnionControlNetType": SetUnionControlNetType,
+    "ControlNetInpaintingAliMamaApply": ControlNetInpaintingAliMamaApply,
+}
--- a/comfy_extras/nodes_easycache.py
+++ b/comfy_extras/nodes_easycache.py
@@ -11,13 +11,13 @@ if TYPE_CHECKING:

 def easycache_forward_wrapper(executor, *args, **kwargs):
    # get values from args
+    x: torch.Tensor = args[0]
    transformer_options: dict[str] = args[-1]
    if not isinstance(transformer_options, dict):
        transformer_options = kwargs.get("transformer_options")
        if not transformer_options:
            transformer_options = args[-2]
    easycache: EasyCacheHolder = transformer_options["easycache"]
-    x: torch.Tensor = args[0][:, :easycache.output_channels]
    sigmas = transformer_options["sigmas"]
    uuids = transformer_options["uuids"]
    if sigmas is not None and easycache.is_past_end_timestep(sigmas):
@@ -82,13 +82,13 @@ def easycache_forward_wrapper(executor, *args, **kwargs):

 def lazycache_predict_noise_wrapper(executor, *args, **kwargs):
    # get values from args
+    x: torch.Tensor = args[0]
    timestep: float = args[1]
    model_options: dict[str] = args[2]
    easycache: LazyCacheHolder = model_options["transformer_options"]["easycache"]
    if easycache.is_past_end_timestep(timestep):
        return executor(*args, **kwargs)
    # prepare next x_prev
-    x: torch.Tensor = args[0][:, :easycache.output_channels]
    next_x_prev = x
    input_change = None
    do_easycache = easycache.should_do_easycache(timestep)
@@ -173,7 +173,7 @@ def easycache_sample_wrapper(executor, *args, **kwargs):


 class EasyCacheHolder:
-    def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False, output_channels: int=None):
+    def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False):
        self.name = "EasyCache"
        self.reuse_threshold = reuse_threshold
        self.start_percent = start_percent
@@ -202,7 +202,6 @@ class EasyCacheHolder:
        self.allow_mismatch = True
        self.cut_from_start = True
        self.state_metadata = None
-        self.output_channels = output_channels

    def is_past_end_timestep(self, timestep: float) -> bool:
        return not (timestep[0] > self.end_t).item()
@@ -245,8 +244,6 @@ class EasyCacheHolder:
            self.total_steps_skipped += 1
        batch_offset = x.shape[0] // len(uuids)
        for i, uuid in enumerate(uuids):
-            # slice out only what is relevant to this cond
-            batch_slice = [slice(i*batch_offset,(i+1)*batch_offset)]
            # if cached dims don't match x dims, cut off excess and hope for the best (cosmos world2video)
            if x.shape[1:] != self.uuid_cache_diffs[uuid].shape[1:]:
                if not self.allow_mismatch:
@@ -264,8 +261,9 @@ class EasyCacheHolder:
                            slicing.append(slice(None, dim_u))
                    else:
                        slicing.append(slice(None))
-                batch_slice = batch_slice + slicing
-            x[tuple(batch_slice)] += self.uuid_cache_diffs[uuid].to(x.device)
+                slicing = [slice(i*batch_offset,(i+1)*batch_offset)] + slicing
+                x = x[slicing]
+            x += self.uuid_cache_diffs[uuid].to(x.device)
        return x

    def update_cache_diff(self, output: torch.Tensor, x: torch.Tensor, uuids: list[UUID]):
@@ -284,7 +282,7 @@ class EasyCacheHolder:
                else:
                    slicing.append(slice(None))
                skip_dim = False
-            x = x[tuple(slicing)]
+            x = x[slicing]
        diff = output - x
        batch_offset = diff.shape[0] // len(uuids)
        for i, uuid in enumerate(uuids):
@@ -324,7 +322,7 @@ class EasyCacheHolder:
        return self

    def clone(self):
-        return EasyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose, output_channels=self.output_channels)
+        return EasyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose)


 class EasyCacheNode(io.ComfyNode):
@@ -351,7 +349,7 @@ class EasyCacheNode(io.ComfyNode):
    @classmethod
    def execute(cls, model: io.Model.Type, reuse_threshold: float, start_percent: float, end_percent: float, verbose: bool) -> io.NodeOutput:
        model = model.clone()
-        model.model_options["transformer_options"]["easycache"] = EasyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose, output_channels=model.model.latent_format.latent_channels)
+        model.model_options["transformer_options"]["easycache"] = EasyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose)
        model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.OUTER_SAMPLE, "easycache", easycache_sample_wrapper)
        model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.CALC_COND_BATCH, "easycache", easycache_calc_cond_batch_wrapper)
        model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, "easycache", easycache_forward_wrapper)
@@ -359,7 +357,7 @@ class EasyCacheNode(io.ComfyNode):


 class LazyCacheHolder:
-    def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False, output_channels: int=None):
+    def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False):
        self.name = "LazyCache"
        self.reuse_threshold = reuse_threshold
        self.start_percent = start_percent
@@ -383,7 +381,6 @@ class LazyCacheHolder:
        self.approx_output_change_rates = []
        self.total_steps_skipped = 0
        self.state_metadata = None
-        self.output_channels = output_channels

    def has_cache_diff(self) -> bool:
        return self.cache_diff is not None
@@ -458,7 +455,7 @@ class LazyCacheHolder:
        return self

    def clone(self):
-        return LazyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose, output_channels=self.output_channels)
+        return LazyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose)

 class LazyCacheNode(io.ComfyNode):
    @classmethod
@@ -484,7 +481,7 @@ class LazyCacheNode(io.ComfyNode):
    @classmethod
    def execute(cls, model: io.Model.Type, reuse_threshold: float, start_percent: float, end_percent: float, verbose: bool) -> io.NodeOutput:
        model = model.clone()
-        model.model_options["transformer_options"]["easycache"] = LazyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose, output_channels=model.model.latent_format.latent_channels)
+        model.model_options["transformer_options"]["easycache"] = LazyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose)
        model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.OUTER_SAMPLE, "lazycache", easycache_sample_wrapper)
        model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.PREDICT_NOISE, "lazycache", lazycache_predict_noise_wrapper)
        return io.NodeOutput(model)
Author	SHA1	Message	Date
Jedrzej Kosinski	4661d1db5a	Bring patches changes from _calc_cond_batch into _calc_cond_batch_multigpu	2025-10-15 17:34:36 -07:00
Jedrzej Kosinski	b326a544d5	Merge branch 'master' into worksplit-multigpu	2025-10-15 17:33:02 -07:00
Jedrzej Kosinski	d89dd5f0b0	Satisfy ruff	2025-10-13 22:00:34 -07:00
Jedrzej Kosinski	8cbbf0be6c	Merge branch 'master' into worksplit-multigpu	2025-10-13 21:53:14 -07:00
Jedrzej Kosinski	c2115a4bac	Merge branch 'master' into worksplit-multigpu	2025-09-24 23:45:26 -07:00
Jedrzej Kosinski	bb44c2ecb9	Merge branch 'master' into worksplit-multigpu	2025-09-18 14:20:27 -07:00
Jedrzej Kosinski	efcd8280d6	Merge branch 'master' into worksplit-multigpu	2025-09-11 20:59:47 -07:00
Jedrzej Kosinski	9e9c129cd0	Merge remote-tracking branch 'origin/master' into worksplit-multigpu	2025-08-29 23:36:19 -07:00
Jedrzej Kosinski	ac14ee68c0	Merge branch 'master' into worksplit-multigpu	2025-08-18 19:51:24 -07:00
Jedrzej Kosinski	2c8f485434	Merge branch 'master' into worksplit-multigpu	2025-08-18 00:29:52 -07:00
Jedrzej Kosinski	383f9b34cb	Merge branch 'master' into worksplit-multigpu	2025-08-17 16:02:44 -07:00
Jedrzej Kosinski	b0741c7e5b	Merge branch 'master' into worksplit-multigpu	2025-08-15 16:50:04 -07:00
Jedrzej Kosinski	1489399cb5	Merge branch 'master' into worksplit-multigpu	2025-08-13 19:47:08 -07:00
Jedrzej Kosinski	3677943fa5	Merge branch 'master' into worksplit-multigpu	2025-08-13 14:06:09 -07:00
Jedrzej Kosinski	cfb63bfcd7	Merge branch 'worksplit-multigpu' of https://github.com/comfyanonymous/ComfyUI into worksplit-multigpu	2025-08-11 14:09:58 -07:00
Jedrzej Kosinski	962c3c832c	Merge branch 'master' into worksplit-multigpu	2025-08-11 14:09:41 -07:00
Jedrzej Kosinski	6ea69369ce	Merge branch 'master' into worksplit-multigpu	2025-08-07 23:24:02 -07:00
Jedrzej Kosinski	b4f559b34d	Merge branch 'master' into worksplit-multigpu	2025-08-04 20:23:19 -07:00
Jedrzej Kosinski	df122a7dba	Merge branch 'master' into worksplit-multigpu	2025-08-01 12:31:57 -07:00
Jedrzej Kosinski	67e906aa64	Merge branch 'master' into worksplit-multigpu	2025-07-31 04:00:22 -07:00
Jedrzej Kosinski	382f84a826	Merge branch 'master' into worksplit-multigpu	2025-07-29 17:17:29 -07:00
Jedrzej Kosinski	9cca36fa2b	Merge branch 'master' into worksplit-multigpu	2025-07-29 12:47:36 -07:00
Jedrzej Kosinski	5d5024296d	Merge branch 'master' into worksplit-multigpu	2025-07-28 06:17:24 -07:00
Jedrzej Kosinski	3b90a30178	Merge branch 'master' into worksplit-multigpu-wip	2025-07-27 01:03:25 -07:00
Jedrzej Kosinski	3c4104652b	Merge branch 'master' into worksplit-multigpu-wip	2025-07-22 11:42:23 -07:00
kosinkadink1@gmail.com	9855baaab3	Merge branch 'master' into worksplit-multigpu	2025-07-09 03:57:30 -05:00
Jedrzej Kosinski	d53479a197	Merge branch 'master' into worksplit-multigpu	2025-07-01 17:33:05 -05:00
Jedrzej Kosinski	443a795850	Merge branch 'master' into worksplit-multigpu	2025-06-24 00:49:24 -05:00
Jedrzej Kosinski	431dec8e53	Merge branch 'worksplit-multigpu' of https://github.com/comfyanonymous/ComfyUI into worksplit-multigpu	2025-06-24 00:48:58 -05:00
Jedrzej Kosinski	44e053c26d	Improve error handling for multigpu threads	2025-06-24 00:48:51 -05:00
Jedrzej Kosinski	1ae98932f1	Merge branch 'master' into worksplit-multigpu	2025-06-17 04:58:56 -05:00
kosinkadink1@gmail.com	0336b0ace8	Merge branch 'master' into worksplit-multigpu	2025-06-01 02:39:26 -07:00
kosinkadink1@gmail.com	8ae25235ec	Merge branch 'master' into worksplit-multigpu	2025-05-21 12:01:27 -07:00
Jedrzej Kosinski	9726eac475	Merge branch 'master' into worksplit-multigpu	2025-05-12 19:29:13 -05:00
Jedrzej Kosinski	272e8d42c1	Merge branch 'master' into worksplit-multigpu	2025-04-22 22:40:00 -05:00
Jedrzej Kosinski	6211d2be5a	Merge branch 'master' into worksplit-multigpu	2025-04-19 17:36:23 -05:00
Jedrzej Kosinski	8be711715c	Make unload_all_models account for all devices	2025-04-19 17:35:54 -05:00
Jedrzej Kosinski	b5cccf1325	Merge branch 'master' into worksplit-multigpu	2025-04-18 15:39:34 -05:00
Jedrzej Kosinski	2a54a904f4	Merge branch 'master' into worksplit-multigpu	2025-04-16 19:26:48 -05:00
Jedrzej Kosinski	ed6f92c975	Merge branch 'master' into worksplit-multigpu	2025-04-16 16:53:57 -05:00
Jedrzej Kosinski	adc66c0698	Merge branch 'master' into worksplit-multigpu	2025-04-16 14:23:56 -05:00
Jedrzej Kosinski	ccd5c01e5a	Merge branch 'master' into worksplit-multigpu	2025-04-09 09:17:12 -05:00
Jedrzej Kosinski	2fa9affcc1	Merge branch 'master' into worksplit-multigpu	2025-04-08 22:52:17 -05:00
Jedrzej Kosinski	407a5a656f	Rollback core of last commit due to weird behavior	2025-03-28 02:48:11 -05:00
kosinkadink1@gmail.com	9ce9ff8ef8	Allow chained MultiGPU Work Unit nodes to affect max_gpus present on ModelPatcher clone	2025-03-28 15:29:44 +08:00
Jedrzej Kosinski	63567c0ce8	Merge branch 'master' into worksplit-multigpu	2025-03-27 22:36:46 -05:00
Jedrzej Kosinski	a786ce5ead	Merge branch 'master' into worksplit-multigpu	2025-03-26 22:26:26 -05:00
Jedrzej Kosinski	4879b47648	Merge branch 'master' into worksplit-multigpu	2025-03-18 22:19:32 -05:00
Jedrzej Kosinski	5ccec33c22	Merge branch 'worksplit-multigpu' of https://github.com/comfyanonymous/ComfyUI into worksplit-multigpu	2025-03-17 14:27:39 -05:00
Jedrzej Kosinski	219d3cd0d0	Merge branch 'master' into worksplit-multigpu	2025-03-17 14:26:35 -05:00
Jedrzej Kosinski	c4ba399475	Merge branch 'master' into worksplit-multigpu	2025-03-15 09:12:09 -05:00
Jedrzej Kosinski	cc928a786d	Merge branch 'master' into worksplit-multigpu	2025-03-13 20:59:11 -05:00
Jedrzej Kosinski	6e144b98c4	Merge branch 'master' into worksplit-multigpu	2025-03-09 00:00:38 -06:00
Jedrzej Kosinski	6dca17bd2d	Satisfy ruff linting	2025-03-03 23:08:29 -06:00
Jedrzej Kosinski	5080105c23	Merge branch 'master' into worksplit-multigpu	2025-03-03 22:56:53 -06:00
Jedrzej Kosinski	093914a247	Made MultiGPU Work Units node more robust by forcing ModelPatcher clones to match at sample time, reuse loaded MultiGPU clones, finalize MultiGPU Work Units node ID and name, small refactors/cleanup of logging and multigpu-related code	2025-03-03 22:56:13 -06:00
Jedrzej Kosinski	605893d3cf	Merge branch 'master' into worksplit-multigpu	2025-02-24 19:23:16 -06:00
Jedrzej Kosinski	048f4f0b3a	Merge branch 'master' into worksplit-multigpu	2025-02-17 19:35:58 -06:00
Jedrzej Kosinski	d2504fb701	Merge branch 'master' into worksplit-multigpu	2025-02-11 22:34:51 -06:00
Jedrzej Kosinski	b03763bca6	Merge branch 'multigpu_support' into worksplit-multigpu	2025-02-07 13:27:49 -06:00
Jedrzej Kosinski	476aa79b64	Let --cuda-device take in a string to allow multiple devices (or device order) to be chosen, print available devices on startup, potentially support MultiGPU Intel and Ascend setups	2025-02-06 08:44:07 -06:00
Jedrzej Kosinski	441cfd1a7a	Merge branch 'master' into multigpu_support	2025-02-06 08:10:48 -06:00
Jedrzej Kosinski	99a5c1068a	Merge branch 'master' into multigpu_support	2025-02-02 03:19:18 -06:00
Jedrzej Kosinski	02747cde7d	Carry over change from _calc_cond_batch into _calc_cond_batch_multigpu	2025-01-29 11:10:23 -06:00
Jedrzej Kosinski	0b3233b4e2	Merge remote-tracking branch 'origin/master' into multigpu_support	2025-01-28 06:11:07 -06:00
Jedrzej Kosinski	eda866bf51	Extracted multigpu core code into multigpu.py, added load_balance_devices to get subdivision of work based on available devices and splittable work item count, added MultiGPU Options nodes to set relative_speed of specific devices; does not change behavior yet	2025-01-27 06:25:48 -06:00
Jedrzej Kosinski	e3298b84de	Create proper MultiGPU Initialize node, create gpu_options to create scaffolding for asymmetrical GPU support	2025-01-26 09:34:20 -06:00
Jedrzej Kosinski	c7feef9060	Cast transformer_options for multigpu	2025-01-26 05:29:27 -06:00
Jedrzej Kosinski	51af7fa1b4	Fix multigpu ControlBase get_models and cleanup calls to avoid multiple calls of functions on multigpu_clones versions of controlnets	2025-01-25 06:05:01 -06:00
Jedrzej Kosinski	46969c380a	Initial MultiGPU support for controlnets	2025-01-24 05:39:38 -06:00
Jedrzej Kosinski	5db4277449	Make sure additional_models are unloaded as well when perform	2025-01-23 19:06:05 -06:00
Jedrzej Kosinski	02a4d0ad7d	Added unload_model_and_clones to model_management.py to allow unloading only relevant models	2025-01-23 01:20:00 -06:00
Jedrzej Kosinski	ef137ac0b6	Merge branch 'multigpu_support' of https://github.com/kosinkadink/ComfyUI into multigpu_support	2025-01-20 04:34:39 -06:00
Jedrzej Kosinski	328d4f16a9	Make WeightHooks compatible with MultiGPU, clean up some code	2025-01-20 04:34:26 -06:00
Jedrzej Kosinski	bdbcb85b8d	Merge branch 'multigpu_support' of https://github.com/Kosinkadink/ComfyUI into multigpu_support	2025-01-20 00:51:42 -06:00
Jedrzej Kosinski	6c9e94bae7	Merge branch 'master' into multigpu_support	2025-01-20 00:51:37 -06:00
Jedrzej Kosinski	bfce723311	Initial work on multigpu_clone function, which will account for additional_models getting cloned	2025-01-17 03:31:28 -06:00
Jedrzej Kosinski	31f5458938	Merge branch 'master' into multigpu_support	2025-01-16 18:25:05 -06:00
Jedrzej Kosinski	2145a202eb	Merge branch 'master' into multigpu_support	2025-01-15 19:58:28 -06:00
Jedrzej Kosinski	25818dc848	Added a 'max_gpus' input	2025-01-14 13:45:14 -06:00
Jedrzej Kosinski	198953cd08	Add nodes_multigpu.py to loaded nodes	2025-01-14 12:24:55 -06:00
Jedrzej Kosinski	ec16ee2f39	Merge branch 'master' into multigpu_support	2025-01-13 20:21:06 -06:00
Jedrzej Kosinski	d5088072fb	Make test node for multigpu instead of storing it in just a local __init__.py	2025-01-13 20:20:25 -06:00
Jedrzej Kosinski	8d4b50158e	Merge branch 'master' into multigpu_support	2025-01-11 20:16:42 -06:00
Jedrzej Kosinski	e88c6c03ff	Fix cond_cat to not try to cast anything that doesn't have a 'to' function	2025-01-10 23:05:24 -06:00
Jedrzej Kosinski	d3cf2b7b24	Merge branch 'comfyanonymous:master' into multigpu_support	2025-01-10 20:24:37 -06:00
Jedrzej Kosinski	7448f02b7c	Initial proof of concept of giving splitting cond sampling between multiple GPUs	2025-01-08 03:33:05 -06:00
Jedrzej Kosinski	871258aa72	Add get_all_torch_devices to get detected devices intended for current torch hardware device	2025-01-07 21:06:03 -06:00
Jedrzej Kosinski	66838ebd39	Merge branch 'comfyanonymous:master' into multigpu_support	2025-01-07 20:11:27 -06:00
Jedrzej Kosinski	7333281698	Clean up a typehint	2025-01-07 02:58:59 -06:00
Jedrzej Kosinski	3cd4c5cb0a	Rename AddModelsHooks to AdditionalModelsHook, rename SetInjectionsHook to InjectionsHook (not yet implemented, but at least getting the naming figured out)	2025-01-07 02:22:49 -06:00
Jedrzej Kosinski	11c6d56037	Merge branch 'master' into hooks_part2	2025-01-07 01:01:53 -06:00
Jedrzej Kosinski	216fea15ee	Made TransformerOptionsHook contribute to registered hooks properly, added some doc strings and removed a so-far unused variable	2025-01-07 00:59:18 -06:00
Jedrzej Kosinski	58bf8815c8	Add a get_injections function to ModelPatcher	2025-01-06 20:34:30 -06:00
Jedrzej Kosinski	1b38f5bf57	removed 4 whitespace lines to satisfy Ruff,	2025-01-06 17:11:12 -06:00
Jedrzej Kosinski	2724ac4a60	Merge branch 'master' into hooks_part2	2025-01-06 17:04:24 -06:00
Jedrzej Kosinski	f48f90e471	Make hook_scope functional for TransformerOptionsHook	2025-01-06 02:23:04 -06:00
Jedrzej Kosinski	6463c39ce0	Merge branch 'master' into hooks_part2	2025-01-06 01:28:26 -06:00
Jedrzej Kosinski	0a7e2ae787	Filter only registered hooks on self.conds in CFGGuider.sample	2025-01-06 01:04:29 -06:00
Jedrzej Kosinski	03a97b604a	Fix performance of hooks when hooks are appended via Cond Pair Set Props nodes by properly caching between positive and negative conds, make hook_patches_backup behave as intended (in the case that something pre-registers WeightHooks on the ModelPatcher instead of registering it at sample time)	2025-01-06 01:03:59 -06:00
Jedrzej Kosinski	4446c86052	Made hook clone code sane, made clear ObjectPatchHook and SetInjectionsHook are not yet operational	2025-01-05 22:25:51 -06:00
Jedrzej Kosinski	8270ff312f	Refactored 'registered' to be HookGroup instead of a list of Hooks, made AddModelsHook operational and compliant with should_register result, moved TransformerOptionsHook handling out of ModelPatcher.register_all_hook_patches, support patches in TransformerOptionsHook properly by casting any patches/wrappers/hooks to proper device at sample time	2025-01-05 21:07:02 -06:00
Jedrzej Kosinski	db2d7ad9ba	Merge branch 'add_sample_sigmas' into hooks_part2	2025-01-05 15:45:13 -06:00
Jedrzej Kosinski	6620d86318	In inner_sample, change "sigmas" to "sampler_sigmas" in transformer_options to not conflict with the "sigmas" that will overwrite "sigmas" in _calc_cond_batch	2025-01-05 15:26:22 -06:00
Jedrzej Kosinski	111fd0cadf	Refactored HookGroup to also store a dictionary of hooks separated by hook_type, modified necessary code to no longer need to manually separate out hooks by hook_type	2025-01-04 02:04:07 -06:00
Jedrzej Kosinski	776aa734e1	Refactor WrapperHook into TransformerOptionsHook, as there is no need to separate out Wrappers/Callbacks/Patches into different hook types (all affect transformer_options)	2025-01-04 01:02:21 -06:00
Jedrzej Kosinski	5a2ad032cb	Cleaned up hooks.py, refactored Hook.should_register and add_hook_patches to use target_dict instead of target so that more information can be provided about the current execution environment if needed	2025-01-03 20:02:27 -06:00
Jedrzej Kosinski	d44295ef71	Merge branch 'master' into hooks_part2	2025-01-03 18:28:31 -06:00
Jedrzej Kosinski	bf21be066f	Merge branch 'master' into hooks_part2	2024-12-30 14:16:22 -06:00
Jedrzej Kosinski	72bbf49349	Add 'sigmas' to transformer_options so that downstream code can know about the full scope of current sampling run, fix Hook Keyframes' guarantee_steps=1 inconsistent behavior with sampling split across different Sampling nodes/sampling runs by referencing 'sigmas'	2024-12-29 15:49:09 -06:00