Compare commits


44 Commits

Author SHA1 Message Date
comfyanonymous
b5c8be8b1d ComfyUI 0.3.70 2025-11-18 19:37:20 -05:00
Alexander Piskun
24fdb92edf feat(api-nodes): add new Gemini model (#10789) 2025-11-18 14:26:44 -08:00
comfyanonymous
d526974576 Fix hunyuan 3d 2.0 (#10792) 2025-11-18 16:46:19 -05:00
Jukka Seppänen
e1ab6bb394 EasyCache: Fix for mismatch in input/output channels with some models (#10788)
Slices the model input to the output channel count so the caching tracks only the noise channels; resolves the channel mismatch with models like WanVideo I2V

Also fixes a slicing deprecation in PyTorch 2.9
2025-11-18 07:00:21 -08:00
Alexander Piskun
048f49adbd chore(api-nodes): adjusted PR template; set min python version for pylint to 3.10 (#10787) 2025-11-18 03:59:27 -08:00
comfyanonymous
47bfd5a33f Native block swap custom nodes considered harmful. (#10783) 2025-11-18 00:26:44 -05:00
ComfyUI Wiki
fdf49a2861 Fix the portable download link for CUDA 12.6 (#10780) 2025-11-17 22:04:06 -05:00
comfyanonymous
f41e5f398d Update README with new portable download link (#10778) 2025-11-17 19:59:19 -05:00
comfyanonymous
27cbac865e Add release workflow for NVIDIA cu126 (#10777) 2025-11-17 19:04:04 -05:00
comfyanonymous
3d0003c24c ComfyUI version 0.3.69 2025-11-17 17:17:24 -05:00
comfyanonymous
7d6103325e Change ROCm nightly install command to 7.1 (#10764) 2025-11-16 03:01:14 -05:00
Alexander Piskun
2d4a08b717 Revert "chore(api-nodes): mark OpenAIDalle2 and OpenAIDalle3 nodes as deprecated (#10757)" (#10759)
This reverts commit 9a02382568.
2025-11-15 12:37:34 -08:00
Alexander Piskun
9a02382568 chore(api-nodes): mark OpenAIDalle2 and OpenAIDalle3 nodes as deprecated (#10757) 2025-11-15 11:18:49 -08:00
comfyanonymous
bd01d9f7fd Add left padding support to tokenizers. (#10753) 2025-11-15 06:54:40 -05:00
comfyanonymous
443056c401 Fix custom nodes import error. (#10747)
This should fix the import errors but will break if the custom nodes actually try to use the class.
2025-11-14 03:26:05 -05:00
comfyanonymous
f60923590c Use same code for chroma and flux blocks so that optimizations are shared. (#10746) 2025-11-14 01:28:05 -05:00
comfyanonymous
1ef328c007 Better instructions for the portable. (#10743) 2025-11-13 21:32:39 -05:00
rattus
94c298f962 flux: reduce VRAM usage (#10737)
Clean up a bunch of stacked tensors on Flux. This takes me from B=19 to B=22
for 1600x1600 on an RTX 5090.
2025-11-13 16:02:03 -08:00
ric-yu
2fde9597f4 feat: add create_time dict to prompt field in /history and /queue (#10741) 2025-11-13 15:11:52 -08:00
Alexander Piskun
f91078b1ff add PR template for API-Nodes (#10736) 2025-11-13 10:05:26 -08:00
contentis
3b3ef9a77a Quantized Ops fixes (#10715)
* offload support, bug fixes, remove mixins

* add readme
2025-11-12 18:26:52 -05:00
comfyanonymous
8b0b93df51 Update Python 3.14 compatibility notes in README (#10730) 2025-11-12 17:04:41 -05:00
rattus
1c7eaeca10 qwen: reduce VRAM usage (#10725)
Clean up a bunch of stacked and no-longer-needed tensors on the QWEN
VRAM peak (currently FFN).

With this I go from OOMing at B=37x1328x1328 to being able to
successfully run B=47 (RTX5090).
2025-11-12 16:20:53 -05:00
rattus
18e7d6dba5 mm/mp: always unload re-used but modified models (#10724)
The partial unloader path in the model re-use flow skips straight to the
actual unload without any check of the patching UUID. This means that
if you do an upscale flow with a model patch on an existing model, it
will not apply your patches.

Fix by delaying the partial_unload until after the UUID checks. This
is done by making partial_unload a mode of partial_load where extra_mem
is negative.
2025-11-12 16:19:53 -05:00
Qiacheng Li
e1d85e7577 Update README.md for Intel Arc GPU installation, remove IPEX (#10729)
IPEX is no longer needed for Intel Arc GPUs. Removing the instructions to set up IPEX.
2025-11-12 15:21:05 -05:00
comfyanonymous
1199411747 Don't pin tensor if not a torch.nn.parameter.Parameter (#10718) 2025-11-11 19:33:30 -05:00
comfyanonymous
5ebcab3c7d Update CI workflow to remove dead macOS runner. (#10704)
* Update CI workflow to remove dead macOS runner.

* revert

* revert
2025-11-10 15:35:29 -05:00
rattus
c350009236 ops: Put weight cast on the offload stream (#10697)
This needs to be on the offload stream. This reproduced a black screen
with low resolution images on a slow bus when using FP8.
2025-11-09 22:52:11 -05:00
comfyanonymous
dea899f221 Unload weights if vram usage goes up between runs. (#10690) 2025-11-09 18:51:33 -05:00
comfyanonymous
e632e5de28 Add logging for model unloading. (#10692) 2025-11-09 18:06:39 -05:00
comfyanonymous
2abd2b5c20 Make ScaleROPE node work on Flux. (#10686) 2025-11-08 15:52:02 -05:00
comfyanonymous
a1a70362ca Only unpin tensor if it was pinned by ComfyUI (#10677) 2025-11-07 11:15:05 -05:00
rattus
cf97b033ee mm: guard against double pin and unpin explicitly (#10672)
As commented, if you let CUDA be the one to detect double pinning/unpinning
it actually creates an async GPU error.
2025-11-06 21:20:48 -05:00
comfyanonymous
eb1c42f649 Tell users they need to upload their logs in bug reports. (#10671) 2025-11-06 20:24:28 -05:00
comfyanonymous
e05c907126 Clarify release cycle. (#10667) 2025-11-06 04:11:30 -05:00
comfyanonymous
09dc24c8a9 Pinned mem also seems to work on AMD. (#10658) 2025-11-05 19:11:15 -05:00
comfyanonymous
1d69245981 Enable pinned memory by default on Nvidia. (#10656)
Removed the --fast pinned_memory flag.

You can use --disable-pinned-memory to disable it. Please report if it
causes any issues.
2025-11-05 18:08:13 -05:00
comfyanonymous
97f198e421 Fix qwen controlnet regression. (#10657) 2025-11-05 18:07:35 -05:00
Alexander Piskun
bda0eb2448 feat(API-nodes): move Rodin3D nodes to new client; removed old api client.py (#10645) 2025-11-05 02:16:00 -08:00
comfyanonymous
c4a6b389de Lower ltxv mem usage to what it was before previous pr. (#10643)
Bring back qwen behavior to what it was before previous pr.
2025-11-04 22:47:35 -05:00
contentis
4cd881866b Use single apply_rope function across models (#10547) 2025-11-04 20:10:11 -05:00
comfyanonymous
265adad858 ComfyUI version v0.3.68 2025-11-04 19:42:23 -05:00
comfyanonymous
7f3e4d486c Limit amount of pinned memory on windows to prevent issues. (#10638) 2025-11-04 17:37:50 -05:00
rattus
a389ee01bb caching: Handle None outputs tuple case (#10637) 2025-11-04 14:14:10 -08:00
40 changed files with 959 additions and 1540 deletions

View File

@@ -8,13 +8,15 @@ body:
Before submitting a **Bug Report**, please ensure the following:
- **1:** You are running the latest version of ComfyUI.
- **2:** You have looked at the existing bug reports and made sure this isn't already reported.
- **2:** You have your ComfyUI logs and relevant workflow on hand and will post them in this bug report.
- **3:** You confirmed that the bug is not caused by a custom node. You can disable all custom nodes by passing
`--disable-all-custom-nodes` command line argument.
`--disable-all-custom-nodes` command line argument. If you have custom nodes, try updating them to the latest version.
- **4:** This is an actual bug in ComfyUI, not just a support question. A bug is when you can specify exact
steps to replicate what went wrong and others will be able to repeat your steps and see the same issue happen.
If unsure, ask on the [ComfyUI Matrix Space](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) or the [Comfy Org Discord](https://discord.gg/comfyorg) first.
## Very Important
Please make sure that you post ALL your ComfyUI logs in the bug report. A bug report without logs will likely be ignored.
- type: checkboxes
id: custom-nodes-test
attributes:

View File

@@ -0,0 +1,21 @@
<!-- API_NODE_PR_CHECKLIST: do not remove -->
## API Node PR Checklist
### Scope
- [ ] **Is API Node Change**
### Pricing & Billing
- [ ] **Need pricing update**
- [ ] **No pricing update**
If **Need pricing update**:
- [ ] Metronome rate cards updated
- [ ] Autobilling tests updated and passing
### QA
- [ ] **QA done**
- [ ] **QA not required**
### Comms
- [ ] Informed **Kosinkadink**

58
.github/workflows/api-node-template.yml vendored Normal file
View File

@@ -0,0 +1,58 @@
name: Append API Node PR template
on:
pull_request_target:
types: [opened, reopened, synchronize, ready_for_review]
paths:
- 'comfy_api_nodes/**' # only run if these files changed
permissions:
contents: read
pull-requests: write
jobs:
inject:
runs-on: ubuntu-latest
steps:
- name: Ensure template exists and append to PR body
uses: actions/github-script@v7
with:
script: |
const { owner, repo } = context.repo;
const number = context.payload.pull_request.number;
const templatePath = '.github/PULL_REQUEST_TEMPLATE/api-node.md';
const marker = '<!-- API_NODE_PR_CHECKLIST: do not remove -->';
const { data: pr } = await github.rest.pulls.get({ owner, repo, pull_number: number });
let templateText;
try {
const res = await github.rest.repos.getContent({
owner,
repo,
path: templatePath,
ref: pr.base.ref
});
const buf = Buffer.from(res.data.content, res.data.encoding || 'base64');
templateText = buf.toString('utf8');
} catch (e) {
core.setFailed(`Required PR template not found at "${templatePath}" on ${pr.base.ref}. Please add it to the repo.`);
return;
}
// Enforce the presence of the marker inside the template (for idempotence)
if (!templateText.includes(marker)) {
core.setFailed(`Template at "${templatePath}" does not contain the required marker:\n${marker}\nAdd it so we can detect duplicates safely.`);
return;
}
// If the PR already contains the marker, do not append again.
const body = pr.body || '';
if (body.includes(marker)) {
core.info('Template already present in PR body; nothing to inject.');
return;
}
const newBody = (body ? body + '\n\n' : '') + templateText + '\n';
await github.rest.pulls.update({ owner, repo, pull_number: number, body: newBody });
core.notice('API Node template appended to PR description.');

View File

@@ -43,6 +43,23 @@ jobs:
test_release: true
secrets: inherit
release_nvidia_cu126:
permissions:
contents: "write"
packages: "write"
pull-requests: "read"
name: "Release NVIDIA cu126"
uses: ./.github/workflows/stable-release.yml
with:
git_tag: ${{ inputs.git_tag }}
cache_tag: "cu126"
python_minor: "12"
python_patch: "10"
rel_name: "nvidia"
rel_extra_name: "_cu126"
test_release: true
secrets: inherit
release_amd_rocm:
permissions:
contents: "write"

View File

@@ -21,14 +21,15 @@ jobs:
fail-fast: false
matrix:
# os: [macos, linux, windows]
os: [macos, linux]
python_version: ["3.9", "3.10", "3.11", "3.12"]
# os: [macos, linux]
os: [linux]
python_version: ["3.10", "3.11", "3.12"]
cuda_version: ["12.1"]
torch_version: ["stable"]
include:
- os: macos
runner_label: [self-hosted, macOS]
flags: "--use-pytorch-cross-attention"
# - os: macos
# runner_label: [self-hosted, macOS]
# flags: "--use-pytorch-cross-attention"
- os: linux
runner_label: [self-hosted, Linux]
flags: ""
@@ -73,14 +74,15 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [macos, linux]
# os: [macos, linux]
os: [linux]
python_version: ["3.11"]
cuda_version: ["12.1"]
torch_version: ["nightly"]
include:
- os: macos
runner_label: [self-hosted, macOS]
flags: "--use-pytorch-cross-attention"
# - os: macos
# runner_label: [self-hosted, macOS]
# flags: "--use-pytorch-cross-attention"
- os: linux
runner_label: [self-hosted, Linux]
flags: ""

168
QUANTIZATION.md Normal file
View File

@@ -0,0 +1,168 @@
# The Comfy guide to Quantization
## How does quantization work?
Quantization aims to map a high-precision value x_f to a lower-precision format with minimal loss in accuracy. These smaller formats reduce the model's memory footprint and increase throughput by using specialized hardware.
When simply converting a value from FP16 to FP8 using round-to-nearest, we might hit two issues:
- The dynamic range of FP16 (-65,504, 65,504) far exceeds FP8 formats like E4M3 (-448, 448) or E5M2 (-57,344, 57,344), potentially resulting in clipped values
- The original values are concentrated in a small range (e.g. -1 to 1), leaving many FP8 bits "unused"
By using a scaling factor, we aim to map these values into the range of the quantized dtype, making use of the full spectrum. One of the easiest and most common approaches is per-tensor absolute-maximum scaling.
```
absmax = max(abs(tensor))
scale = absmax / max_dynamic_range_low_precision
# Quantization
tensor_q = (tensor / scale).to(low_precision_dtype)
# De-Quantization
tensor_dq = tensor_q.to(fp16) * scale
tensor_dq ~ tensor
```
Given that additional information (the scaling factor) is needed to "interpret" the quantized values, we describe these as derived datatypes.
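As a concrete illustration, here is a minimal, self-contained PyTorch sketch of the per-tensor absmax scheme above. This is not ComfyUI code; the function names are ours and only show the round trip.
```python
import torch

# Minimal sketch of per-tensor absmax scaling to FP8 E4M3 (illustrative, not ComfyUI's implementation).
def quantize_absmax_fp8(tensor: torch.Tensor):
    f8_max = torch.finfo(torch.float8_e4m3fn).max        # 448.0 for E4M3
    scale = tensor.abs().max().float() / f8_max          # map absmax onto the FP8 range
    tensor_q = (tensor / scale).to(torch.float8_e4m3fn)  # quantize
    return tensor_q, scale

def dequantize_fp8(tensor_q: torch.Tensor, scale: torch.Tensor, orig_dtype=torch.float16):
    return tensor_q.to(orig_dtype) * scale               # de-quantize

w = torch.randn(256, 256, dtype=torch.float16)
w_q, s = quantize_absmax_fp8(w)
w_dq = dequantize_fp8(w_q, s)
print((w - w_dq.to(w.dtype)).abs().max())                # small quantization error
```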
## Quantization in Comfy
```
QuantizedTensor (torch.Tensor subclass)
↓ __torch_dispatch__
Two-Level Registry (generic + layout handlers)
MixedPrecisionOps + Metadata Detection
```
### Representation
To represent these derived datatypes, ComfyUI uses a subclass of torch.Tensor: the `QuantizedTensor` class found in `comfy/quant_ops.py`.
A `Layout` class defines how a specific quantization format behaves:
- Required parameters
- Quantize method
- De-Quantize method
```python
from comfy.quant_ops import QuantizedLayout
class MyLayout(QuantizedLayout):
@classmethod
def quantize(cls, tensor, **kwargs):
# Convert to quantized format
qdata = ...
params = {'scale': ..., 'orig_dtype': tensor.dtype}
return qdata, params
@staticmethod
def dequantize(qdata, scale, orig_dtype, **kwargs):
return qdata.to(orig_dtype) * scale
```
To run operations on these QuantizedTensors, we use two registry systems to define the supported operations.
The first is a **generic registry** that handles operations common to all quantized formats (e.g., `.to()`, `.clone()`, `.reshape()`).
The second registry is layout-specific and allows implementing fast paths such as nn.Linear.
```python
from comfy.quant_ops import register_layout_op
@register_layout_op(torch.ops.aten.linear.default, MyLayout)
def my_linear(func, args, kwargs):
# Extract tensors, call optimized kernel
...
```
When `torch.nn.functional.linear()` is called with QuantizedTensor arguments, `__torch_dispatch__` automatically routes to the registered implementation.
For any unsupported operation, QuantizedTensor falls back to calling `dequantize` and dispatching to the high-precision implementation.
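The following toy subclass sketches that fallback idea in isolation. It is not the actual `QuantizedTensor` (which adds the two-level registry, layouts and fast paths); every op here simply takes the dequantize-and-fall-back path.
```python
import torch
from torch.utils._pytree import tree_map

# Toy illustration of the dequantize fallback; not comfy/quant_ops.py.
class ToyQuantTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, qdata, scale, orig_dtype):
        return torch.Tensor._make_wrapper_subclass(cls, qdata.shape, dtype=orig_dtype, device=qdata.device)

    def __init__(self, qdata, scale, orig_dtype):
        self.qdata, self.scale, self.orig_dtype = qdata, scale, orig_dtype

    def dequantize(self):
        return self.qdata.to(self.orig_dtype) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # No registry here: every op is "unsupported", so dequantize and run in high precision.
        unwrap = lambda t: t.dequantize() if isinstance(t, ToyQuantTensor) else t
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

w = torch.randn(8, 8)
wq = ToyQuantTensor((w / w.abs().max() * 448).to(torch.float8_e4m3fn), w.abs().max() / 448, torch.float32)
print(torch.mm(wq, torch.randn(8, 8)).shape)  # falls back to a dequantized matmul
```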
### Mixed Precision
The `MixedPrecisionOps` class (lines 542-648 in `comfy/ops.py`) enables per-layer quantization decisions, allowing different layers in a model to use different precisions. This is activated when a model config contains a `layer_quant_config` dictionary that specifies which layers should be quantized and how.
**Architecture:**
```python
class MixedPrecisionOps(disable_weight_init):
_layer_quant_config = {} # Maps layer names to quantization configs
_compute_dtype = torch.bfloat16 # Default compute / dequantize precision
```
**Key mechanism:**
The custom `Linear._load_from_state_dict()` method inspects each layer during model loading:
- If the layer name is **not** in `_layer_quant_config`: load weight as regular tensor in `_compute_dtype`
- If the layer name **is** in `_layer_quant_config`:
- Load weight as `QuantizedTensor` with the specified layout (e.g., `TensorCoreFP8Layout`)
- Load associated quantization parameters (scales, block_size, etc.)
**Why it's needed:**
Not all layers tolerate quantization equally. Sensitive operations like final projections can be kept in higher precision, while compute-heavy matmuls are quantized. This provides most of the performance benefits while maintaining quality.
The system is selected in `pick_operations()` when `model_config.layer_quant_config` is present, making it the highest-priority operation mode.
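In rough pseudocode, the per-layer decision during loading looks like this. The names are illustrative and reuse the absmax FP8 scheme from above; the real logic lives in `MixedPrecisionOps.Linear._load_from_state_dict`.
```python
import torch

# Illustrative sketch of the per-layer decision, not the actual comfy/ops.py code.
layer_quant_config = {"model.layers.0.mlp.up_proj": "float8_e4m3fn"}
compute_dtype = torch.bfloat16

def load_layer_weight(layer_name: str, weight: torch.Tensor):
    if layer_name not in layer_quant_config:
        # Not listed: keep a regular tensor in the compute dtype.
        return weight.to(compute_dtype), None
    # Listed: store the weight quantized, together with its scaling parameter(s).
    scale = weight.abs().max().float() / torch.finfo(torch.float8_e4m3fn).max
    return (weight / scale).to(torch.float8_e4m3fn), scale

w, s = load_layer_weight("model.layers.0.mlp.up_proj", torch.randn(128, 64))
print(w.dtype, s)  # torch.float8_e4m3fn plus a per-tensor scale
```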
## Checkpoint Format
Quantized checkpoints are stored as standard safetensors files with quantized weight tensors and associated scaling parameters, plus a `_quantization_metadata` JSON entry describing the quantization scheme.
The quantized checkpoint will contain the same layers as the original checkpoint but:
- The weights are stored as quantized values, sometimes using a different storage datatype, e.g. a uint8 container for FP8.
- For each quantized weight, a number of additional scaling parameters are stored alongside it, depending on the recipe.
- The metadata of the final safetensors file contains a `_quantization_metadata` JSON entry describing which layers are quantized and which layout has been used.
### Scaling Parameters details
We define 4 possible scaling parameters that should cover most recipes in the near future:
- **weight_scale**: quantization scalers for the weights
- **weight_scale_2**: global scalers in the context of double scaling
- **pre_quant_scale**: scalers used for smoothing salient weights
- **input_scale**: quantization scalers for the activations
| Format | Storage dtype | weight_scale | weight_scale_2 | pre_quant_scale | input_scale |
|--------|---------------|--------------|----------------|-----------------|-------------|
| float8_e4m3fn | float32 | float32 (scalar) | - | - | float32 (scalar) |
You can find the defined formats in `comfy/quant_ops.py` (QUANT_ALGOS).
### Quantization Metadata
The metadata stored alongside the checkpoint contains:
- **format_version**: String to define a version of the standard
- **layers**: A dictionary mapping layer names to their quantization format. The format string maps to the definitions found in `QUANT_ALGOS`.
Example:
```json
{
"_quantization_metadata": {
"format_version": "1.0",
"layers": {
"model.layers.0.mlp.up_proj": "float8_e4m3fn",
"model.layers.0.mlp.down_proj": "float8_e4m3fn",
"model.layers.1.mlp.up_proj": "float8_e4m3fn"
}
}
}
```
## Creating Quantized Checkpoints
To create compatible checkpoints, use any quantization tool, provided the output follows the checkpoint format described above and uses a layout defined in `QUANT_ALGOS`.
### Weight Quantization
Weight quantization is straightforward - compute the scaling factor directly from the weight tensor using the absolute maximum method described earlier. Each layer's weights are quantized independently and stored with their corresponding `weight_scale` parameter.
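A minimal sketch of producing such a checkpoint with per-tensor absmax weight quantization could look like the following. It uses the `safetensors` library; key names such as `<layer>.weight_scale` follow the table above, but the exact key layout and layer names here are illustrative assumptions, not a reference implementation.
```python
import json
import torch
from safetensors.torch import save_file

# Illustrative sketch: quantize weights with per-tensor absmax, then write the quantized
# weights, their weight_scale parameters and the _quantization_metadata entry.
f8_max = torch.finfo(torch.float8_e4m3fn).max
weights = {"model.layers.0.mlp.up_proj": torch.randn(128, 64, dtype=torch.float16)}

tensors, quant_layers = {}, {}
for name, w in weights.items():
    scale = w.abs().max().float() / f8_max
    tensors[f"{name}.weight"] = (w / scale).to(torch.float8_e4m3fn)
    tensors[f"{name}.weight_scale"] = scale          # float32 scalar, per the table above
    quant_layers[name] = "float8_e4m3fn"

metadata = {"_quantization_metadata": json.dumps({"format_version": "1.0", "layers": quant_layers})}
save_file(tensors, "model_fp8.safetensors", metadata=metadata)
```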
### Calibration (for Activation Quantization)
Activation quantization (e.g., for FP8 Tensor Core operations) requires `input_scale` parameters that cannot be determined from the static weights alone. Since activation values depend on actual inputs, we use **post-training quantization (PTQ)** calibration:
1. **Collect statistics**: Run inference on N representative samples
2. **Track activations**: Record the absolute maximum (`amax`) of inputs to each quantized layer
3. **Compute scales**: Derive `input_scale` from collected statistics
4. **Store in checkpoint**: Save `input_scale` parameters alongside weights
The calibration dataset should be representative of your target use case. For diffusion models, this typically means a diverse set of prompts and generation parameters.
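Below is a minimal sketch of steps 1-3 using forward hooks on a toy model. The model, sample count and scale derivation are illustrative assumptions; real calibration tooling and key names will differ.
```python
import torch
import torch.nn as nn

# Minimal PTQ calibration sketch: record the input amax of each Linear layer with
# forward hooks, then derive an input_scale per layer.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
amax = {}

def make_hook(name):
    def hook(module, inputs, output):
        a = inputs[0].detach().abs().max()
        amax[name] = torch.maximum(amax[name], a) if name in amax else a
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules() if isinstance(m, nn.Linear)]

with torch.no_grad():
    for _ in range(8):                       # "N representative samples"
        model(torch.randn(4, 64))

for h in handles:
    h.remove()

f8_max = torch.finfo(torch.float8_e4m3fn).max
input_scale = {name: a / f8_max for name, a in amax.items()}
print(input_scale)                           # one scale per calibrated layer
```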

View File

@@ -112,10 +112,11 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git
## Release Process
ComfyUI follows a weekly release cycle targeting Friday but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:
ComfyUI follows a weekly release cycle targeting Monday but this regularly changes because of model releases or large changes to the codebase. There are three interconnected repositories:
1. **[ComfyUI Core](https://github.com/comfyanonymous/ComfyUI)**
- Releases a new stable version (e.g., v0.7.0)
- Releases a new stable version (e.g., v0.7.0) roughly every week.
- Commits outside of the stable release tags may be very unstable and break many custom nodes.
- Serves as the foundation for the desktop release
2. **[ComfyUI Desktop](https://github.com/Comfy-Org/desktop)**
@@ -172,7 +173,7 @@ There is a portable standalone build for Windows that should work for running on
### [Direct link to download](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z)
Simply download, extract with [7-Zip](https://7-zip.org) and run. Make sure you put your Stable Diffusion checkpoints/models (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints
Simply download, extract with [7-Zip](https://7-zip.org) or with the windows explorer on recent windows versions and run. For smaller models you normally only need to put the checkpoints (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints but many of the larger models have multiple files. Make sure to follow the instructions to know which subfolder to put them in ComfyUI\models\
If you have trouble extracting it, right click the file -> properties -> unblock
@@ -182,7 +183,9 @@ Update your Nvidia drivers if it doesn't start.
[Experimental portable for AMD GPUs](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_amd.7z)
[Portable with pytorch cuda 12.8 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu128.7z) (Supports Nvidia 10 series and older GPUs).
[Portable with pytorch cuda 12.8 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu128.7z).
[Portable with pytorch cuda 12.6 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu126.7z) (Supports Nvidia 10 series and older GPUs).
#### How do I share models between another UI and ComfyUI?
@@ -199,7 +202,7 @@ comfy install
## Manual Install (Windows, Linux)
Python 3.14 will work if you comment out the `kornia` dependency in the requirements.txt file (breaks the canny node) but it is not recommended.
Python 3.14 works but you may encounter issues with the torch compile node. The free threaded variant is still missing some dependencies.
Python 3.13 is very well supported. If you have trouble with some custom node dependencies on 3.13 you can try 3.12
@@ -220,7 +223,7 @@ AMD users can install rocm and pytorch with pip if you don't have it already ins
This is the command to install the nightly with ROCm 7.0 which might have some performance improvements:
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.0```
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.1```
### AMD GPUs (Experimental: Windows and Linux), RDNA 3, 3.5 and 4 only.
@@ -241,7 +244,7 @@ RDNA 4 (RX 9000 series):
### Intel GPUs (Windows and Linux)
(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
1. To install PyTorch xpu, use the following command:
@@ -251,10 +254,6 @@ This is the command to install the Pytorch xpu nightly which might have some per
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu```
(Option 2) Alternatively, Intel GPUs supported by Intel Extension for PyTorch (IPEX) can leverage IPEX for improved performance.
1. visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.
### NVIDIA
Nvidia users should install stable pytorch using this command:

View File

@@ -145,10 +145,11 @@ class PerformanceFeature(enum.Enum):
Fp8MatrixMultiplication = "fp8_matrix_mult"
CublasOps = "cublas_ops"
AutoTune = "autotune"
PinnedMem = "pinned_memory"
parser.add_argument("--fast", nargs="*", type=PerformanceFeature, help="Enable some untested and potentially quality deteriorating optimizations. This is used to test new features so using it might crash your comfyui. --fast with no arguments enables everything. You can pass a list specific optimizations if you only want to enable specific ones. Current valid optimizations: {}".format(" ".join(map(lambda c: c.value, PerformanceFeature))))
parser.add_argument("--disable-pinned-memory", action="store_true", help="Disable pinned memory use.")
parser.add_argument("--mmap-torch-files", action="store_true", help="Use mmap when loading ckpt/pt files.")
parser.add_argument("--disable-mmap", action="store_true", help="Don't use mmap when loading safetensors.")

View File

@@ -1,15 +1,15 @@
import torch
from torch import Tensor, nn
from comfy.ldm.flux.math import attention
from comfy.ldm.flux.layers import (
MLPEmbedder,
RMSNorm,
QKNorm,
SelfAttention,
ModulationOut,
)
# TODO: remove this in a few months
SingleStreamBlock = None
DoubleStreamBlock = None
class ChromaModulationOut(ModulationOut):
@@ -48,124 +48,6 @@ class Approximator(nn.Module):
return x
class DoubleStreamBlock(nn.Module):
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, flipped_img_txt=False, dtype=None, device=None, operations=None):
super().__init__()
mlp_hidden_dim = int(hidden_size * mlp_ratio)
self.num_heads = num_heads
self.hidden_size = hidden_size
self.img_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
self.img_norm2 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.img_mlp = nn.Sequential(
operations.Linear(hidden_size, mlp_hidden_dim, bias=True, dtype=dtype, device=device),
nn.GELU(approximate="tanh"),
operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
)
self.txt_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
self.txt_norm2 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.txt_mlp = nn.Sequential(
operations.Linear(hidden_size, mlp_hidden_dim, bias=True, dtype=dtype, device=device),
nn.GELU(approximate="tanh"),
operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
)
self.flipped_img_txt = flipped_img_txt
def forward(self, img: Tensor, txt: Tensor, pe: Tensor, vec: Tensor, attn_mask=None, transformer_options={}):
(img_mod1, img_mod2), (txt_mod1, txt_mod2) = vec
# prepare image for attention
img_modulated = torch.addcmul(img_mod1.shift, 1 + img_mod1.scale, self.img_norm1(img))
img_qkv = self.img_attn.qkv(img_modulated)
img_q, img_k, img_v = img_qkv.view(img_qkv.shape[0], img_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
# prepare txt for attention
txt_modulated = torch.addcmul(txt_mod1.shift, 1 + txt_mod1.scale, self.txt_norm1(txt))
txt_qkv = self.txt_attn.qkv(txt_modulated)
txt_q, txt_k, txt_v = txt_qkv.view(txt_qkv.shape[0], txt_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
# run actual attention
attn = attention(torch.cat((txt_q, img_q), dim=2),
torch.cat((txt_k, img_k), dim=2),
torch.cat((txt_v, img_v), dim=2),
pe=pe, mask=attn_mask, transformer_options=transformer_options)
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
# calculate the img bloks
img.addcmul_(img_mod1.gate, self.img_attn.proj(img_attn))
img.addcmul_(img_mod2.gate, self.img_mlp(torch.addcmul(img_mod2.shift, 1 + img_mod2.scale, self.img_norm2(img))))
# calculate the txt bloks
txt.addcmul_(txt_mod1.gate, self.txt_attn.proj(txt_attn))
txt.addcmul_(txt_mod2.gate, self.txt_mlp(torch.addcmul(txt_mod2.shift, 1 + txt_mod2.scale, self.txt_norm2(txt))))
if txt.dtype == torch.float16:
txt = torch.nan_to_num(txt, nan=0.0, posinf=65504, neginf=-65504)
return img, txt
class SingleStreamBlock(nn.Module):
"""
A DiT block with parallel linear layers as described in
https://arxiv.org/abs/2302.05442 and adapted modulation interface.
"""
def __init__(
self,
hidden_size: int,
num_heads: int,
mlp_ratio: float = 4.0,
qk_scale: float = None,
dtype=None,
device=None,
operations=None
):
super().__init__()
self.hidden_dim = hidden_size
self.num_heads = num_heads
head_dim = hidden_size // num_heads
self.scale = qk_scale or head_dim**-0.5
self.mlp_hidden_dim = int(hidden_size * mlp_ratio)
# qkv and mlp_in
self.linear1 = operations.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim, dtype=dtype, device=device)
# proj and mlp_out
self.linear2 = operations.Linear(hidden_size + self.mlp_hidden_dim, hidden_size, dtype=dtype, device=device)
self.norm = QKNorm(head_dim, dtype=dtype, device=device, operations=operations)
self.hidden_size = hidden_size
self.pre_norm = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.mlp_act = nn.GELU(approximate="tanh")
def forward(self, x: Tensor, pe: Tensor, vec: Tensor, attn_mask=None, transformer_options={}) -> Tensor:
mod = vec
x_mod = torch.addcmul(mod.shift, 1 + mod.scale, self.pre_norm(x))
qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
q, k, v = qkv.view(qkv.shape[0], qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
q, k = self.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe, mask=attn_mask, transformer_options=transformer_options)
# compute activation in mlp stream, cat again and run second linear layer
output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
x.addcmul_(mod.gate, output)
if x.dtype == torch.float16:
x = torch.nan_to_num(x, nan=0.0, posinf=65504, neginf=-65504)
return x
class LastLayer(nn.Module):
def __init__(self, hidden_size: int, patch_size: int, out_channels: int, dtype=None, device=None, operations=None):
super().__init__()

View File

@@ -11,12 +11,12 @@ import comfy.ldm.common_dit
from comfy.ldm.flux.layers import (
EmbedND,
timestep_embedding,
DoubleStreamBlock,
SingleStreamBlock,
)
from .layers import (
DoubleStreamBlock,
LastLayer,
SingleStreamBlock,
Approximator,
ChromaModulationOut,
)
@@ -90,6 +90,7 @@ class Chroma(nn.Module):
self.num_heads,
mlp_ratio=params.mlp_ratio,
qkv_bias=params.qkv_bias,
modulation=False,
dtype=dtype, device=device, operations=operations
)
for _ in range(params.depth)
@@ -98,7 +99,7 @@ class Chroma(nn.Module):
self.single_blocks = nn.ModuleList(
[
SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio, dtype=dtype, device=device, operations=operations)
SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio, modulation=False, dtype=dtype, device=device, operations=operations)
for _ in range(params.depth_single_blocks)
]
)

View File

@@ -10,12 +10,10 @@ from torch import Tensor, nn
from einops import repeat
import comfy.ldm.common_dit
from comfy.ldm.flux.layers import EmbedND
from comfy.ldm.flux.layers import EmbedND, DoubleStreamBlock, SingleStreamBlock
from comfy.ldm.chroma.model import Chroma, ChromaParams
from comfy.ldm.chroma.layers import (
DoubleStreamBlock,
SingleStreamBlock,
Approximator,
)
from .layers import (
@@ -89,7 +87,6 @@ class ChromaRadiance(Chroma):
dtype=dtype, device=device, operations=operations
)
self.double_blocks = nn.ModuleList(
[
DoubleStreamBlock(
@@ -97,6 +94,7 @@ class ChromaRadiance(Chroma):
self.num_heads,
mlp_ratio=params.mlp_ratio,
qkv_bias=params.qkv_bias,
modulation=False,
dtype=dtype, device=device, operations=operations
)
for _ in range(params.depth)
@@ -109,6 +107,7 @@ class ChromaRadiance(Chroma):
self.hidden_size,
self.num_heads,
mlp_ratio=params.mlp_ratio,
modulation=False,
dtype=dtype, device=device, operations=operations,
)
for _ in range(params.depth_single_blocks)

View File

@@ -130,13 +130,17 @@ def apply_mod(tensor, m_mult, m_add=None, modulation_dims=None):
class DoubleStreamBlock(nn.Module):
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, flipped_img_txt=False, dtype=None, device=None, operations=None):
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, flipped_img_txt=False, modulation=True, dtype=None, device=None, operations=None):
super().__init__()
mlp_hidden_dim = int(hidden_size * mlp_ratio)
self.num_heads = num_heads
self.hidden_size = hidden_size
self.img_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
self.modulation = modulation
if self.modulation:
self.img_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
self.img_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
@@ -147,7 +151,9 @@ class DoubleStreamBlock(nn.Module):
operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
)
self.txt_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
if self.modulation:
self.txt_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
self.txt_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
@@ -160,46 +166,65 @@ class DoubleStreamBlock(nn.Module):
self.flipped_img_txt = flipped_img_txt
def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor, attn_mask=None, modulation_dims_img=None, modulation_dims_txt=None, transformer_options={}):
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
if self.modulation:
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
else:
(img_mod1, img_mod2), (txt_mod1, txt_mod2) = vec
# prepare image for attention
img_modulated = self.img_norm1(img)
img_modulated = apply_mod(img_modulated, (1 + img_mod1.scale), img_mod1.shift, modulation_dims_img)
img_qkv = self.img_attn.qkv(img_modulated)
del img_modulated
img_q, img_k, img_v = img_qkv.view(img_qkv.shape[0], img_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
del img_qkv
img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
# prepare txt for attention
txt_modulated = self.txt_norm1(txt)
txt_modulated = apply_mod(txt_modulated, (1 + txt_mod1.scale), txt_mod1.shift, modulation_dims_txt)
txt_qkv = self.txt_attn.qkv(txt_modulated)
del txt_modulated
txt_q, txt_k, txt_v = txt_qkv.view(txt_qkv.shape[0], txt_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
del txt_qkv
txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
if self.flipped_img_txt:
q = torch.cat((img_q, txt_q), dim=2)
del img_q, txt_q
k = torch.cat((img_k, txt_k), dim=2)
del img_k, txt_k
v = torch.cat((img_v, txt_v), dim=2)
del img_v, txt_v
# run actual attention
attn = attention(torch.cat((img_q, txt_q), dim=2),
torch.cat((img_k, txt_k), dim=2),
torch.cat((img_v, txt_v), dim=2),
attn = attention(q, k, v,
pe=pe, mask=attn_mask, transformer_options=transformer_options)
del q, k, v
img_attn, txt_attn = attn[:, : img.shape[1]], attn[:, img.shape[1]:]
else:
q = torch.cat((txt_q, img_q), dim=2)
del txt_q, img_q
k = torch.cat((txt_k, img_k), dim=2)
del txt_k, img_k
v = torch.cat((txt_v, img_v), dim=2)
del txt_v, img_v
# run actual attention
attn = attention(torch.cat((txt_q, img_q), dim=2),
torch.cat((txt_k, img_k), dim=2),
torch.cat((txt_v, img_v), dim=2),
attn = attention(q, k, v,
pe=pe, mask=attn_mask, transformer_options=transformer_options)
del q, k, v
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1]:]
# calculate the img bloks
img = img + apply_mod(self.img_attn.proj(img_attn), img_mod1.gate, None, modulation_dims_img)
img = img + apply_mod(self.img_mlp(apply_mod(self.img_norm2(img), (1 + img_mod2.scale), img_mod2.shift, modulation_dims_img)), img_mod2.gate, None, modulation_dims_img)
img += apply_mod(self.img_attn.proj(img_attn), img_mod1.gate, None, modulation_dims_img)
del img_attn
img += apply_mod(self.img_mlp(apply_mod(self.img_norm2(img), (1 + img_mod2.scale), img_mod2.shift, modulation_dims_img)), img_mod2.gate, None, modulation_dims_img)
# calculate the txt bloks
txt += apply_mod(self.txt_attn.proj(txt_attn), txt_mod1.gate, None, modulation_dims_txt)
del txt_attn
txt += apply_mod(self.txt_mlp(apply_mod(self.txt_norm2(txt), (1 + txt_mod2.scale), txt_mod2.shift, modulation_dims_txt)), txt_mod2.gate, None, modulation_dims_txt)
if txt.dtype == torch.float16:
@@ -220,6 +245,7 @@ class SingleStreamBlock(nn.Module):
num_heads: int,
mlp_ratio: float = 4.0,
qk_scale: float = None,
modulation=True,
dtype=None,
device=None,
operations=None
@@ -242,19 +268,29 @@ class SingleStreamBlock(nn.Module):
self.pre_norm = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.mlp_act = nn.GELU(approximate="tanh")
self.modulation = Modulation(hidden_size, double=False, dtype=dtype, device=device, operations=operations)
if modulation:
self.modulation = Modulation(hidden_size, double=False, dtype=dtype, device=device, operations=operations)
else:
self.modulation = None
def forward(self, x: Tensor, vec: Tensor, pe: Tensor, attn_mask=None, modulation_dims=None, transformer_options={}) -> Tensor:
mod, _ = self.modulation(vec)
if self.modulation:
mod, _ = self.modulation(vec)
else:
mod = vec
qkv, mlp = torch.split(self.linear1(apply_mod(self.pre_norm(x), (1 + mod.scale), mod.shift, modulation_dims)), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
q, k, v = qkv.view(qkv.shape[0], qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
del qkv
q, k = self.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe, mask=attn_mask, transformer_options=transformer_options)
del q, k, v
# compute activation in mlp stream, cat again and run second linear layer
output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
mlp = self.mlp_act(mlp)
output = self.linear2(torch.cat((attn, mlp), 2))
x += apply_mod(output, mod.gate, None, modulation_dims)
if x.dtype == torch.float16:
x = torch.nan_to_num(x, nan=0.0, posinf=65504, neginf=-65504)

View File

@@ -7,15 +7,8 @@ import comfy.model_management
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor, mask=None, transformer_options={}) -> Tensor:
q_shape = q.shape
k_shape = k.shape
if pe is not None:
q = q.to(dtype=pe.dtype).reshape(*q.shape[:-1], -1, 1, 2)
k = k.to(dtype=pe.dtype).reshape(*k.shape[:-1], -1, 1, 2)
q = (pe[..., 0] * q[..., 0] + pe[..., 1] * q[..., 1]).reshape(*q_shape).type_as(v)
k = (pe[..., 0] * k[..., 0] + pe[..., 1] * k[..., 1]).reshape(*k_shape).type_as(v)
q, k = apply_rope(q, k, pe)
heads = q.shape[1]
x = optimized_attention(q, k, v, heads, skip_reshape=True, mask=mask, transformer_options=transformer_options)
return x

View File

@@ -210,7 +210,7 @@ class Flux(nn.Module):
img = self.final_layer(img, vec) # (N, T, patch_size ** 2 * out_channels)
return img
def process_img(self, x, index=0, h_offset=0, w_offset=0):
def process_img(self, x, index=0, h_offset=0, w_offset=0, transformer_options={}):
bs, c, h, w = x.shape
patch_size = self.patch_size
x = comfy.ldm.common_dit.pad_to_patch_size(x, (patch_size, patch_size))
@@ -222,10 +222,22 @@ class Flux(nn.Module):
h_offset = ((h_offset + (patch_size // 2)) // patch_size)
w_offset = ((w_offset + (patch_size // 2)) // patch_size)
img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
steps_h = h_len
steps_w = w_len
rope_options = transformer_options.get("rope_options", None)
if rope_options is not None:
h_len = (h_len - 1.0) * rope_options.get("scale_y", 1.0) + 1.0
w_len = (w_len - 1.0) * rope_options.get("scale_x", 1.0) + 1.0
index += rope_options.get("shift_t", 0.0)
h_offset += rope_options.get("shift_y", 0.0)
w_offset += rope_options.get("shift_x", 0.0)
img_ids = torch.zeros((steps_h, steps_w, 3), device=x.device, dtype=x.dtype)
img_ids[:, :, 0] = img_ids[:, :, 1] + index
img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(h_offset, h_len - 1 + h_offset, steps=h_len, device=x.device, dtype=x.dtype).unsqueeze(1)
img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(w_offset, w_len - 1 + w_offset, steps=w_len, device=x.device, dtype=x.dtype).unsqueeze(0)
img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(h_offset, h_len - 1 + h_offset, steps=steps_h, device=x.device, dtype=x.dtype).unsqueeze(1)
img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(w_offset, w_len - 1 + w_offset, steps=steps_w, device=x.device, dtype=x.dtype).unsqueeze(0)
return img, repeat(img_ids, "h w c -> b (h w) c", b=bs)
def forward(self, x, timestep, context, y=None, guidance=None, ref_latents=None, control=None, transformer_options={}, **kwargs):
@@ -241,7 +253,7 @@ class Flux(nn.Module):
h_len = ((h_orig + (patch_size // 2)) // patch_size)
w_len = ((w_orig + (patch_size // 2)) // patch_size)
img, img_ids = self.process_img(x)
img, img_ids = self.process_img(x, transformer_options=transformer_options)
img_tokens = img.shape[1]
if ref_latents is not None:
h = 0

View File

@@ -3,12 +3,11 @@ from torch import nn
import comfy.patcher_extension
import comfy.ldm.modules.attention
import comfy.ldm.common_dit
from einops import rearrange
import math
from typing import Dict, Optional, Tuple
from .symmetric_patchifier import SymmetricPatchifier, latent_to_pixel_coords
from comfy.ldm.flux.math import apply_rope1
def get_timestep_embedding(
timesteps: torch.Tensor,
@@ -238,20 +237,6 @@ class FeedForward(nn.Module):
return self.net(x)
def apply_rotary_emb(input_tensor, freqs_cis): #TODO: remove duplicate funcs and pick the best/fastest one
cos_freqs = freqs_cis[0]
sin_freqs = freqs_cis[1]
t_dup = rearrange(input_tensor, "... (d r) -> ... d r", r=2)
t1, t2 = t_dup.unbind(dim=-1)
t_dup = torch.stack((-t2, t1), dim=-1)
input_tensor_rot = rearrange(t_dup, "... d r -> ... (d r)")
out = input_tensor * cos_freqs + input_tensor_rot * sin_freqs
return out
class CrossAttention(nn.Module):
def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, dropout=0., attn_precision=None, dtype=None, device=None, operations=None):
super().__init__()
@@ -281,8 +266,8 @@ class CrossAttention(nn.Module):
k = self.k_norm(k)
if pe is not None:
q = apply_rotary_emb(q, pe)
k = apply_rotary_emb(k, pe)
q = apply_rope1(q.unsqueeze(1), pe).squeeze(1)
k = apply_rope1(k.unsqueeze(1), pe).squeeze(1)
if mask is None:
out = comfy.ldm.modules.attention.optimized_attention(q, k, v, self.heads, attn_precision=self.attn_precision, transformer_options=transformer_options)
@@ -306,12 +291,17 @@ class BasicTransformerBlock(nn.Module):
def forward(self, x, context=None, attention_mask=None, timestep=None, pe=None, transformer_options={}):
shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (self.scale_shift_table[None, None].to(device=x.device, dtype=x.dtype) + timestep.reshape(x.shape[0], timestep.shape[1], self.scale_shift_table.shape[0], -1)).unbind(dim=2)
x += self.attn1(comfy.ldm.common_dit.rms_norm(x) * (1 + scale_msa) + shift_msa, pe=pe, transformer_options=transformer_options) * gate_msa
attn1_input = comfy.ldm.common_dit.rms_norm(x)
attn1_input = torch.addcmul(attn1_input, attn1_input, scale_msa).add_(shift_msa)
attn1_input = self.attn1(attn1_input, pe=pe, transformer_options=transformer_options)
x.addcmul_(attn1_input, gate_msa)
del attn1_input
x += self.attn2(x, context=context, mask=attention_mask, transformer_options=transformer_options)
y = comfy.ldm.common_dit.rms_norm(x) * (1 + scale_mlp) + shift_mlp
x += self.ff(y) * gate_mlp
y = comfy.ldm.common_dit.rms_norm(x)
y = torch.addcmul(y, y, scale_mlp).add_(shift_mlp)
x.addcmul_(self.ff(y), gate_mlp)
return x
@@ -327,41 +317,35 @@ def get_fractional_positions(indices_grid, max_pos):
def precompute_freqs_cis(indices_grid, dim, out_dtype, theta=10000.0, max_pos=[20, 2048, 2048]):
dtype = torch.float32 #self.dtype
dtype = torch.float32
device = indices_grid.device
# Get fractional positions and compute frequency indices
fractional_positions = get_fractional_positions(indices_grid, max_pos)
indices = theta ** torch.linspace(0, 1, dim // 6, device=device, dtype=dtype) * math.pi / 2
start = 1
end = theta
device = fractional_positions.device
# Compute frequencies and apply cos/sin
freqs = (indices * (fractional_positions.unsqueeze(-1) * 2 - 1)).transpose(-1, -2).flatten(2)
cos_vals = freqs.cos().repeat_interleave(2, dim=-1)
sin_vals = freqs.sin().repeat_interleave(2, dim=-1)
indices = theta ** (
torch.linspace(
math.log(start, theta),
math.log(end, theta),
dim // 6,
device=device,
dtype=dtype,
)
)
indices = indices.to(dtype=dtype)
indices = indices * math.pi / 2
freqs = (
(indices * (fractional_positions.unsqueeze(-1) * 2 - 1))
.transpose(-1, -2)
.flatten(2)
)
cos_freq = freqs.cos().repeat_interleave(2, dim=-1)
sin_freq = freqs.sin().repeat_interleave(2, dim=-1)
# Pad if dim is not divisible by 6
if dim % 6 != 0:
cos_padding = torch.ones_like(cos_freq[:, :, : dim % 6])
sin_padding = torch.zeros_like(cos_freq[:, :, : dim % 6])
cos_freq = torch.cat([cos_padding, cos_freq], dim=-1)
sin_freq = torch.cat([sin_padding, sin_freq], dim=-1)
return cos_freq.to(out_dtype), sin_freq.to(out_dtype)
padding_size = dim % 6
cos_vals = torch.cat([torch.ones_like(cos_vals[:, :, :padding_size]), cos_vals], dim=-1)
sin_vals = torch.cat([torch.zeros_like(sin_vals[:, :, :padding_size]), sin_vals], dim=-1)
# Reshape and extract one value per pair (since repeat_interleave duplicates each value)
cos_vals = cos_vals.reshape(*cos_vals.shape[:2], -1, 2)[..., 0].to(out_dtype) # [B, N, dim//2]
sin_vals = sin_vals.reshape(*sin_vals.shape[:2], -1, 2)[..., 0].to(out_dtype) # [B, N, dim//2]
# Build rotation matrix [[cos, -sin], [sin, cos]] and add heads dimension
freqs_cis = torch.stack([
torch.stack([cos_vals, -sin_vals], dim=-1),
torch.stack([sin_vals, cos_vals], dim=-1)
], dim=-2).unsqueeze(1) # [B, 1, N, dim//2, 2, 2]
return freqs_cis
class LTXVModel(torch.nn.Module):
@@ -501,7 +485,7 @@ class LTXVModel(torch.nn.Module):
shift, scale = scale_shift_values[:, :, 0], scale_shift_values[:, :, 1]
x = self.norm_out(x)
# Modulation
x = x * (1 + scale) + shift
x = torch.addcmul(x, x, scale).add_(shift)
x = self.proj_out(x)
x = self.patchifier.unpatchify(

View File

@@ -44,7 +44,7 @@ class QwenImageControlNetModel(QwenImageTransformer2DModel):
txt_start = round(max(((x.shape[-1] + (self.patch_size // 2)) // self.patch_size) // 2, ((x.shape[-2] + (self.patch_size // 2)) // self.patch_size) // 2))
txt_ids = torch.arange(txt_start, txt_start + context.shape[1], device=x.device).reshape(1, -1, 1).repeat(x.shape[0], 1, 3)
ids = torch.cat((txt_ids, img_ids), dim=1)
image_rotary_emb = self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
image_rotary_emb = self.pe_embedder(ids).to(x.dtype).contiguous()
del ids, txt_ids, img_ids
hidden_states = self.img_in(hidden_states) + self.controlnet_x_embedder(hint)

View File

@@ -10,6 +10,7 @@ from comfy.ldm.modules.attention import optimized_attention_masked
from comfy.ldm.flux.layers import EmbedND
import comfy.ldm.common_dit
import comfy.patcher_extension
from comfy.ldm.flux.math import apply_rope1
class GELU(nn.Module):
def __init__(self, dim_in: int, dim_out: int, approximate: str = "none", bias: bool = True, dtype=None, device=None, operations=None):
@@ -134,33 +135,34 @@ class Attention(nn.Module):
image_rotary_emb: Optional[torch.Tensor] = None,
transformer_options={},
) -> Tuple[torch.Tensor, torch.Tensor]:
batch_size = hidden_states.shape[0]
seq_img = hidden_states.shape[1]
seq_txt = encoder_hidden_states.shape[1]
img_query = self.to_q(hidden_states).unflatten(-1, (self.heads, -1))
img_key = self.to_k(hidden_states).unflatten(-1, (self.heads, -1))
img_value = self.to_v(hidden_states).unflatten(-1, (self.heads, -1))
# Project and reshape to BHND format (batch, heads, seq, dim)
img_query = self.to_q(hidden_states).view(batch_size, seq_img, self.heads, -1).transpose(1, 2).contiguous()
img_key = self.to_k(hidden_states).view(batch_size, seq_img, self.heads, -1).transpose(1, 2).contiguous()
img_value = self.to_v(hidden_states).view(batch_size, seq_img, self.heads, -1).transpose(1, 2)
txt_query = self.add_q_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
txt_key = self.add_k_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
txt_value = self.add_v_proj(encoder_hidden_states).unflatten(-1, (self.heads, -1))
txt_query = self.add_q_proj(encoder_hidden_states).view(batch_size, seq_txt, self.heads, -1).transpose(1, 2).contiguous()
txt_key = self.add_k_proj(encoder_hidden_states).view(batch_size, seq_txt, self.heads, -1).transpose(1, 2).contiguous()
txt_value = self.add_v_proj(encoder_hidden_states).view(batch_size, seq_txt, self.heads, -1).transpose(1, 2)
img_query = self.norm_q(img_query)
img_key = self.norm_k(img_key)
txt_query = self.norm_added_q(txt_query)
txt_key = self.norm_added_k(txt_key)
joint_query = torch.cat([txt_query, img_query], dim=1)
joint_key = torch.cat([txt_key, img_key], dim=1)
joint_value = torch.cat([txt_value, img_value], dim=1)
joint_query = torch.cat([txt_query, img_query], dim=2)
joint_key = torch.cat([txt_key, img_key], dim=2)
joint_value = torch.cat([txt_value, img_value], dim=2)
joint_query = apply_rotary_emb(joint_query, image_rotary_emb)
joint_key = apply_rotary_emb(joint_key, image_rotary_emb)
joint_query = apply_rope1(joint_query, image_rotary_emb)
joint_key = apply_rope1(joint_key, image_rotary_emb)
joint_query = joint_query.flatten(start_dim=2)
joint_key = joint_key.flatten(start_dim=2)
joint_value = joint_value.flatten(start_dim=2)
joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads, attention_mask, transformer_options=transformer_options)
joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads,
attention_mask, transformer_options=transformer_options,
skip_reshape=True)
txt_attn_output = joint_hidden_states[:, :seq_txt, :]
img_attn_output = joint_hidden_states[:, seq_txt:, :]
@@ -234,10 +236,10 @@ class QwenImageTransformerBlock(nn.Module):
img_mod1, img_mod2 = img_mod_params.chunk(2, dim=-1)
txt_mod1, txt_mod2 = txt_mod_params.chunk(2, dim=-1)
img_normed = self.img_norm1(hidden_states)
img_modulated, img_gate1 = self._modulate(img_normed, img_mod1)
txt_normed = self.txt_norm1(encoder_hidden_states)
txt_modulated, txt_gate1 = self._modulate(txt_normed, txt_mod1)
img_modulated, img_gate1 = self._modulate(self.img_norm1(hidden_states), img_mod1)
del img_mod1
txt_modulated, txt_gate1 = self._modulate(self.txt_norm1(encoder_hidden_states), txt_mod1)
del txt_mod1
img_attn_output, txt_attn_output = self.attn(
hidden_states=img_modulated,
@@ -246,16 +248,20 @@ class QwenImageTransformerBlock(nn.Module):
image_rotary_emb=image_rotary_emb,
transformer_options=transformer_options,
)
del img_modulated
del txt_modulated
hidden_states = hidden_states + img_gate1 * img_attn_output
encoder_hidden_states = encoder_hidden_states + txt_gate1 * txt_attn_output
del img_attn_output
del txt_attn_output
del img_gate1
del txt_gate1
img_normed2 = self.img_norm2(hidden_states)
img_modulated2, img_gate2 = self._modulate(img_normed2, img_mod2)
img_modulated2, img_gate2 = self._modulate(self.img_norm2(hidden_states), img_mod2)
hidden_states = torch.addcmul(hidden_states, img_gate2, self.img_mlp(img_modulated2))
txt_normed2 = self.txt_norm2(encoder_hidden_states)
txt_modulated2, txt_gate2 = self._modulate(txt_normed2, txt_mod2)
txt_modulated2, txt_gate2 = self._modulate(self.txt_norm2(encoder_hidden_states), txt_mod2)
encoder_hidden_states = torch.addcmul(encoder_hidden_states, txt_gate2, self.txt_mlp(txt_modulated2))
return encoder_hidden_states, hidden_states
@@ -413,7 +419,7 @@ class QwenImageTransformer2DModel(nn.Module):
txt_start = round(max(((x.shape[-1] + (self.patch_size // 2)) // self.patch_size) // 2, ((x.shape[-2] + (self.patch_size // 2)) // self.patch_size) // 2))
txt_ids = torch.arange(txt_start, txt_start + context.shape[1], device=x.device).reshape(1, -1, 1).repeat(x.shape[0], 1, 3)
ids = torch.cat((txt_ids, img_ids), dim=1)
image_rotary_emb = self.pe_embedder(ids).squeeze(1).unsqueeze(2).to(x.dtype)
image_rotary_emb = self.pe_embedder(ids).to(x.dtype).contiguous()
del ids, txt_ids, img_ids
hidden_states = self.img_in(hidden_states)

View File

@@ -232,6 +232,7 @@ class WanAttentionBlock(nn.Module):
# assert e[0].dtype == torch.float32
# self-attention
x = x.contiguous() # otherwise implicit in LayerNorm
y = self.self_attn(
torch.addcmul(repeat_e(e[0], x), self.norm1(x), 1 + repeat_e(e[1], x)),
freqs, transformer_options=transformer_options)

View File

@@ -504,6 +504,7 @@ class LoadedModel:
if use_more_vram == 0:
use_more_vram = 1e32
self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
real_model = self.model.model
if is_intel_xpu() and not args.disable_ipex_optimize and 'ipex' in globals() and real_model is not None:
@@ -689,7 +690,10 @@ def load_models_gpu(models, memory_required=0, force_patch_weights=False, minimu
current_free_mem = get_free_memory(torch_dev) + loaded_memory
lowvram_model_memory = max(128 * 1024 * 1024, (current_free_mem - minimum_memory_required), min(current_free_mem * MIN_WEIGHT_MEMORY_RATIO, current_free_mem - minimum_inference_memory()))
lowvram_model_memory = max(0.1, lowvram_model_memory - loaded_memory)
lowvram_model_memory = lowvram_model_memory - loaded_memory
if lowvram_model_memory == 0:
lowvram_model_memory = 0.1
if vram_set_state == VRAMState.NO_VRAM:
lowvram_model_memory = 0.1
@@ -1082,32 +1086,75 @@ def cast_to_device(tensor, device, dtype, copy=False):
non_blocking = device_supports_non_blocking(device)
return cast_to(tensor, dtype=dtype, device=device, non_blocking=non_blocking, copy=copy)
PINNED_MEMORY = {}
TOTAL_PINNED_MEMORY = 0
MAX_PINNED_MEMORY = -1
if not args.disable_pinned_memory:
if is_nvidia() or is_amd():
if WINDOWS:
MAX_PINNED_MEMORY = get_total_memory(torch.device("cpu")) * 0.45 # Windows limit is apparently 50%
else:
MAX_PINNED_MEMORY = get_total_memory(torch.device("cpu")) * 0.95
logging.info("Enabled pinned memory {}".format(MAX_PINNED_MEMORY // (1024 * 1024)))
def pin_memory(tensor):
if PerformanceFeature.PinnedMem not in args.fast:
global TOTAL_PINNED_MEMORY
if MAX_PINNED_MEMORY <= 0:
return False
if not is_nvidia():
if type(tensor) is not torch.nn.parameter.Parameter:
return False
if not is_device_cpu(tensor.device):
return False
if torch.cuda.cudart().cudaHostRegister(tensor.data_ptr(), tensor.numel() * tensor.element_size(), 1) == 0:
if tensor.is_pinned():
#NOTE: Cuda does detect when a tensor is already pinned and would
#error below, but there are proven cases where this also queues an error
#on the GPU async. So dont trust the CUDA API and guard here
return False
if not tensor.is_contiguous():
return False
size = tensor.numel() * tensor.element_size()
if (TOTAL_PINNED_MEMORY + size) > MAX_PINNED_MEMORY:
return False
ptr = tensor.data_ptr()
if torch.cuda.cudart().cudaHostRegister(ptr, size, 1) == 0:
PINNED_MEMORY[ptr] = size
TOTAL_PINNED_MEMORY += size
return True
return False
def unpin_memory(tensor):
if PerformanceFeature.PinnedMem not in args.fast:
return False
if not is_nvidia():
global TOTAL_PINNED_MEMORY
if MAX_PINNED_MEMORY <= 0:
return False
if not is_device_cpu(tensor.device):
return False
if torch.cuda.cudart().cudaHostUnregister(tensor.data_ptr()) == 0:
ptr = tensor.data_ptr()
size = tensor.numel() * tensor.element_size()
size_stored = PINNED_MEMORY.get(ptr, None)
if size_stored is None:
logging.warning("Tried to unpin tensor not pinned by ComfyUI")
return False
if size != size_stored:
logging.warning("Size of pinned tensor changed")
return False
if torch.cuda.cudart().cudaHostUnregister(ptr) == 0:
TOTAL_PINNED_MEMORY -= PINNED_MEMORY.pop(ptr)
if len(PINNED_MEMORY) == 0:
TOTAL_PINNED_MEMORY = 0
return True
return False

View File

@@ -843,7 +843,7 @@ class ModelPatcher:
self.object_patches_backup.clear()
def partially_unload(self, device_to, memory_to_free=0):
def partially_unload(self, device_to, memory_to_free=0, force_patch_weights=False):
with self.use_ejected():
hooks_unpatched = False
memory_freed = 0
@@ -887,13 +887,19 @@ class ModelPatcher:
module_mem += move_weight_functions(m, device_to)
if lowvram_possible:
if weight_key in self.patches:
_, set_func, convert_func = get_key_weight(self.model, weight_key)
m.weight_function.append(LowVramPatch(weight_key, self.patches, convert_func, set_func))
patch_counter += 1
if force_patch_weights:
self.patch_weight_to_device(weight_key)
else:
_, set_func, convert_func = get_key_weight(self.model, weight_key)
m.weight_function.append(LowVramPatch(weight_key, self.patches, convert_func, set_func))
patch_counter += 1
if bias_key in self.patches:
_, set_func, convert_func = get_key_weight(self.model, bias_key)
m.bias_function.append(LowVramPatch(bias_key, self.patches, convert_func, set_func))
patch_counter += 1
if force_patch_weights:
self.patch_weight_to_device(bias_key)
else:
_, set_func, convert_func = get_key_weight(self.model, bias_key)
m.bias_function.append(LowVramPatch(bias_key, self.patches, convert_func, set_func))
patch_counter += 1
cast_weight = True
if cast_weight:
@@ -909,6 +915,7 @@ class ModelPatcher:
self.model.model_lowvram = True
self.model.lowvram_patch_counter += patch_counter
self.model.model_loaded_weight_memory -= memory_freed
logging.info("loaded partially: {:.2f} MB loaded, lowvram patches: {}".format(self.model.model_loaded_weight_memory / (1024 * 1024), self.model.lowvram_patch_counter))
return memory_freed
def partially_load(self, device_to, extra_memory=0, force_patch_weights=False):
@@ -921,6 +928,9 @@ class ModelPatcher:
extra_memory += (used - self.model.model_loaded_weight_memory)
self.patch_model(load_weights=False)
if extra_memory < 0 and not unpatch_weights:
self.partially_unload(self.offload_device, -extra_memory, force_patch_weights=force_patch_weights)
return 0
full_load = False
if self.model.model_lowvram == False and self.model.model_loaded_weight_memory > 0:
self.apply_hooks(self.forced_hooks, force_apply=True)

View File

@@ -77,7 +77,10 @@ def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None, of
# will add async-offload support to your cast and improve performance.
if input is not None:
if dtype is None:
dtype = input.dtype
if isinstance(input, QuantizedTensor):
dtype = input._layout_params["orig_dtype"]
else:
dtype = input.dtype
if bias_dtype is None:
bias_dtype = dtype
if device is None:
@@ -110,9 +113,9 @@ def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None, of
for f in s.bias_function:
bias = f(bias)
weight = weight.to(dtype=dtype)
if weight_has_function:
if weight_has_function or weight.dtype != dtype:
with wf_context:
weight = weight.to(dtype=dtype)
for f in s.weight_function:
weight = f(weight)
@@ -534,18 +537,7 @@ if CUBLAS_IS_AVAILABLE:
# ==============================================================================
# Mixed Precision Operations
# ==============================================================================
from .quant_ops import QuantizedTensor
QUANT_FORMAT_MIXINS = {
"float8_e4m3fn": {
"dtype": torch.float8_e4m3fn,
"layout_type": "TensorCoreFP8Layout",
"parameters": {
"weight_scale": torch.nn.Parameter(torch.zeros((), dtype=torch.float32), requires_grad=False),
"input_scale": torch.nn.Parameter(torch.zeros((), dtype=torch.float32), requires_grad=False),
}
}
}
from .quant_ops import QuantizedTensor, QUANT_ALGOS
class MixedPrecisionOps(disable_weight_init):
_layer_quant_config = {}
@@ -596,23 +588,24 @@ class MixedPrecisionOps(disable_weight_init):
if quant_format is None:
raise ValueError(f"Unknown quantization format for layer {layer_name}")
mixin = QUANT_FORMAT_MIXINS[quant_format]
self.layout_type = mixin["layout_type"]
qconfig = QUANT_ALGOS[quant_format]
self.layout_type = qconfig["comfy_tensor_layout"]
scale_key = f"{prefix}weight_scale"
weight_scale_key = f"{prefix}weight_scale"
layout_params = {
'scale': state_dict.pop(scale_key, None),
'orig_dtype': MixedPrecisionOps._compute_dtype
'scale': state_dict.pop(weight_scale_key, None),
'orig_dtype': MixedPrecisionOps._compute_dtype,
'block_size': qconfig.get("group_size", None),
}
if layout_params['scale'] is not None:
manually_loaded_keys.append(scale_key)
manually_loaded_keys.append(weight_scale_key)
self.weight = torch.nn.Parameter(
QuantizedTensor(weight.to(device=device, dtype=mixin["dtype"]), self.layout_type, layout_params),
QuantizedTensor(weight.to(device=device), self.layout_type, layout_params),
requires_grad=False
)
for param_name, param_value in mixin["parameters"].items():
for param_name in qconfig["parameters"]:
param_key = f"{prefix}{param_name}"
_v = state_dict.pop(param_key, None)
if _v is None:
@@ -643,7 +636,7 @@ class MixedPrecisionOps(disable_weight_init):
if (getattr(self, 'layout_type', None) is not None and
getattr(self, 'input_scale', None) is not None and
not isinstance(input, QuantizedTensor)):
input = QuantizedTensor.from_float(input, self.layout_type, scale=self.input_scale, fp8_dtype=self.weight.dtype)
input = QuantizedTensor.from_float(input, self.layout_type, scale=self.input_scale, dtype=self.weight.dtype)
return self._forward(input, self.weight, self.bias)

View File

@@ -74,6 +74,12 @@ def _copy_layout_params(params):
new_params[k] = v
return new_params
def _copy_layout_params_inplace(src, dst, non_blocking=False):
for k, v in src.items():
if isinstance(v, torch.Tensor):
dst[k].copy_(v, non_blocking=non_blocking)
else:
dst[k] = v
class QuantizedLayout:
"""
@@ -318,13 +324,13 @@ def generic_to_dtype_layout(func, args, kwargs):
def generic_copy_(func, args, kwargs):
qt_dest = args[0]
src = args[1]
non_blocking = args[2] if len(args) > 2 else False
if isinstance(qt_dest, QuantizedTensor):
if isinstance(src, QuantizedTensor):
# Copy from another quantized tensor
qt_dest._qdata.copy_(src._qdata)
qt_dest._qdata.copy_(src._qdata, non_blocking=non_blocking)
qt_dest._layout_type = src._layout_type
qt_dest._layout_params = _copy_layout_params(src._layout_params)
_copy_layout_params_inplace(src._layout_params, qt_dest._layout_params, non_blocking=non_blocking)
else:
# Copy from regular tensor - just copy raw data
qt_dest._qdata.copy_(src)
@@ -336,6 +342,26 @@ def generic_copy_(func, args, kwargs):
def generic_has_compatible_shallow_copy_type(func, args, kwargs):
return True
@register_generic_util(torch.ops.aten.empty_like.default)
def generic_empty_like(func, args, kwargs):
"""Empty_like operation - creates an empty tensor with the same quantized structure."""
qt = args[0]
if isinstance(qt, QuantizedTensor):
# Create empty tensor with same shape and dtype as the quantized data
hp_dtype = kwargs.pop('dtype', qt._layout_params["orig_dtype"])
new_qdata = torch.empty_like(qt._qdata, **kwargs)
# Handle device transfer for layout params
target_device = kwargs.get('device', new_qdata.device)
new_params = _move_layout_params_to_device(qt._layout_params, target_device)
# Update orig_dtype if dtype is specified
new_params['orig_dtype'] = hp_dtype
return QuantizedTensor(new_qdata, qt._layout_type, new_params)
return func(*args, **kwargs)
# ==============================================================================
# FP8 Layout + Operation Handlers
# ==============================================================================
@@ -378,6 +404,13 @@ class TensorCoreFP8Layout(QuantizedLayout):
def get_plain_tensors(cls, qtensor):
return qtensor._qdata, qtensor._layout_params['scale']
QUANT_ALGOS = {
"float8_e4m3fn": {
"storage_t": torch.float8_e4m3fn,
"parameters": {"weight_scale", "input_scale"},
"comfy_tensor_layout": "TensorCoreFP8Layout",
},
}
LAYOUTS = {
"TensorCoreFP8Layout": TensorCoreFP8Layout,

View File

@@ -460,7 +460,7 @@ def load_embed(embedding_name, embedding_directory, embedding_size, embed_key=No
return embed_out
class SDTokenizer:
def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, has_end_token=True, pad_to_max_length=True, min_length=None, pad_token=None, end_token=None, min_padding=None, tokenizer_data={}, tokenizer_args={}):
def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, has_end_token=True, pad_to_max_length=True, min_length=None, pad_token=None, end_token=None, min_padding=None, pad_left=False, tokenizer_data={}, tokenizer_args={}):
if tokenizer_path is None:
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "sd1_tokenizer")
self.tokenizer = tokenizer_class.from_pretrained(tokenizer_path, **tokenizer_args)
@@ -468,6 +468,7 @@ class SDTokenizer:
self.min_length = tokenizer_data.get("{}_min_length".format(embedding_key), min_length)
self.end_token = None
self.min_padding = min_padding
self.pad_left = pad_left
empty = self.tokenizer('')["input_ids"]
self.tokenizer_adds_end_token = has_end_token
@@ -522,6 +523,12 @@ class SDTokenizer:
return (embed, "{} {}".format(embedding_name[len(stripped):], leftover))
return (embed, leftover)
def pad_tokens(self, tokens, amount):
if self.pad_left:
for i in range(amount):
tokens.insert(0, (self.pad_token, 1.0, 0))
else:
tokens.extend([(self.pad_token, 1.0, 0)] * amount)
def tokenize_with_weights(self, text:str, return_word_ids=False, tokenizer_options={}, **kwargs):
'''
@@ -600,7 +607,7 @@ class SDTokenizer:
if self.end_token is not None:
batch.append((self.end_token, 1.0, 0))
if self.pad_to_max_length:
batch.extend([(self.pad_token, 1.0, 0)] * (remaining_length))
self.pad_tokens(batch, remaining_length)
#start new batch
batch = []
if self.start_token is not None:
@@ -614,11 +621,11 @@ class SDTokenizer:
if self.end_token is not None:
batch.append((self.end_token, 1.0, 0))
if min_padding is not None:
batch.extend([(self.pad_token, 1.0, 0)] * min_padding)
self.pad_tokens(batch, min_padding)
if self.pad_to_max_length and len(batch) < self.max_length:
batch.extend([(self.pad_token, 1.0, 0)] * (self.max_length - len(batch)))
self.pad_tokens(batch, self.max_length - len(batch))
if min_length is not None and len(batch) < min_length:
batch.extend([(self.pad_token, 1.0, 0)] * (min_length - len(batch)))
self.pad_tokens(batch, min_length - len(batch))
if not return_word_ids:
batched_tokens = [[(t, w) for t, w,_ in x] for x in batched_tokens]
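A small stand-alone illustration (not ComfyUI code) of the left- vs. right-padding behaviour pad_tokens adds above; the token ids and pad token value are made up.
def pad_tokens(tokens, amount, pad_token, pad_left=False):
    if pad_left:
        for _ in range(amount):
            tokens.insert(0, (pad_token, 1.0, 0))
    else:
        tokens.extend([(pad_token, 1.0, 0)] * amount)
    return tokens

print(pad_tokens([(101, 1.0, 1)], 2, pad_token=0, pad_left=True))
# [(0, 1.0, 0), (0, 1.0, 0), (101, 1.0, 1)]
print(pad_tokens([(101, 1.0, 1)], 2, pad_token=0, pad_left=False))
# [(101, 1.0, 1), (0, 1.0, 0), (0, 1.0, 0)]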

View File

@@ -1,17 +0,0 @@
# generated by datamodel-codegen:
# filename: filtered-openapi.yaml
# timestamp: 2025-04-29T23:44:54+00:00
from __future__ import annotations
from typing import Optional
from pydantic import BaseModel
from . import PixverseDto
class ResponseData(BaseModel):
ErrCode: Optional[int] = None
ErrMsg: Optional[str] = None
Resp: Optional[PixverseDto.V2OpenAPII2VResp] = None

View File

@@ -1,57 +0,0 @@
# generated by datamodel-codegen:
# filename: filtered-openapi.yaml
# timestamp: 2025-04-29T23:44:54+00:00
from __future__ import annotations
from typing import Optional
from pydantic import BaseModel, Field
class V2OpenAPII2VResp(BaseModel):
video_id: Optional[int] = Field(None, description='Video_id')
class V2OpenAPIT2VReq(BaseModel):
aspect_ratio: str = Field(
..., description='Aspect ratio (16:9, 4:3, 1:1, 3:4, 9:16)', examples=['16:9']
)
duration: int = Field(
...,
description='Video duration (5, 8 seconds, --model=v3.5 only allows 5,8; --quality=1080p does not support 8s)',
examples=[5],
)
model: str = Field(
..., description='Model version (only supports v3.5)', examples=['v3.5']
)
motion_mode: Optional[str] = Field(
'normal',
description='Motion mode (normal, fast, --fast only available when duration=5; --quality=1080p does not support fast)',
examples=['normal'],
)
negative_prompt: Optional[str] = Field(
None, description='Negative prompt\n', max_length=2048
)
prompt: str = Field(..., description='Prompt', max_length=2048)
quality: str = Field(
...,
description='Video quality ("360p"(Turbo model), "540p", "720p", "1080p")',
examples=['540p'],
)
seed: Optional[int] = Field(None, description='Random seed, range: 0 - 2147483647')
style: Optional[str] = Field(
None,
description='Style (effective when model=v3.5, "anime", "3d_animation", "clay", "comic", "cyberpunk") Do not include style parameter unless needed',
examples=['anime'],
)
template_id: Optional[int] = Field(
None,
description='Template ID (template_id must be activated before use)',
examples=[302325299692608],
)
water_mark: Optional[bool] = Field(
False,
description='Watermark (true: add watermark, false: no watermark)',
examples=[False],
)

View File

@@ -1,981 +0,0 @@
"""
API Client Framework for api.comfy.org.
This module provides a flexible framework for making API requests from ComfyUI nodes.
It supports both synchronous and asynchronous API operations with proper type validation.
Key Components:
--------------
1. ApiClient - Handles HTTP requests with authentication and error handling
2. ApiEndpoint - Defines a single HTTP endpoint with its request/response models
3. SynchronousOperation - Executes a single synchronous API operation
4. PollingOperation - Polls an endpoint until a task completes or fails
Usage Examples:
--------------
# Example 1: Synchronous API Operation
# ------------------------------------
# For a simple API call that returns the result immediately:
# 1. Create the API client
api_client = ApiClient(
base_url="https://api.example.com",
auth_token="your_auth_token_here",
comfy_api_key="your_comfy_api_key_here",
timeout=30.0,
verify_ssl=True
)
# 2. Define the endpoint
user_info_endpoint = ApiEndpoint(
path="/v1/users/me",
method=HttpMethod.GET,
request_model=EmptyRequest, # No request body needed
response_model=UserProfile, # Pydantic model for the response
query_params=None
)
# 3. Create the request object
request = EmptyRequest()
# 4. Create and execute the operation
operation = SynchronousOperation(
endpoint=user_info_endpoint,
request=request
)
user_profile = await operation.execute(client=api_client) # Returns immediately with the result
# Example 2: Asynchronous API Operation with Polling
# -------------------------------------------------
# For an API that starts a task and requires polling for completion:
# 1. Define the endpoints (initial request and polling)
generate_image_endpoint = ApiEndpoint(
path="/v1/images/generate",
method=HttpMethod.POST,
request_model=ImageGenerationRequest,
response_model=TaskCreatedResponse,
query_params=None
)
check_task_endpoint = ApiEndpoint(
path="/v1/tasks/{task_id}",
method=HttpMethod.GET,
request_model=EmptyRequest,
response_model=ImageGenerationResult,
query_params=None
)
# 2. Create the request object
request = ImageGenerationRequest(
prompt="a beautiful sunset over mountains",
width=1024,
height=1024,
num_images=1
)
# 3. Create and execute the polling operation
# (the task itself is assumed to have been started already, e.g. via a
# SynchronousOperation against generate_image_endpoint)
operation = PollingOperation(
poll_endpoint=check_task_endpoint,
completed_statuses=["completed"],
failed_statuses=["failed", "error"],
status_extractor=lambda r: r.status
)
# This will poll the task endpoint until it reports a completed or failed status
result = await operation.execute(client=api_client) # Returns the final ImageGenerationResult when done
"""
from __future__ import annotations
import aiohttp
import asyncio
import logging
import io
import os
import socket
from aiohttp.client_exceptions import ClientError, ClientResponseError
from typing import Type, Optional, Any, TypeVar, Generic, Callable
from enum import Enum
import json
from urllib.parse import urljoin, urlparse
from pydantic import BaseModel, Field
import uuid # For generating unique operation IDs
from server import PromptServer
from comfy.cli_args import args
from comfy import utils
from . import request_logger
T = TypeVar("T", bound=BaseModel)
R = TypeVar("R", bound=BaseModel)
P = TypeVar("P", bound=BaseModel) # For poll response
PROGRESS_BAR_MAX = 100
class NetworkError(Exception):
"""Base exception for network-related errors with diagnostic information."""
pass
class LocalNetworkError(NetworkError):
"""Exception raised when local network connectivity issues are detected."""
pass
class ApiServerError(NetworkError):
"""Exception raised when the API server is unreachable but internet is working."""
pass
class EmptyRequest(BaseModel):
"""Base class for empty request bodies.
For GET requests, fields will be sent as query parameters."""
pass
class UploadRequest(BaseModel):
file_name: str = Field(..., description="Filename to upload")
content_type: Optional[str] = Field(
None,
description="Mime type of the file. For example: image/png, image/jpeg, video/mp4, etc.",
)
class UploadResponse(BaseModel):
download_url: str = Field(..., description="URL to GET uploaded file")
upload_url: str = Field(..., description="URL to PUT file to upload")
class HttpMethod(str, Enum):
GET = "GET"
POST = "POST"
PUT = "PUT"
DELETE = "DELETE"
PATCH = "PATCH"
class ApiClient:
"""
Client for making HTTP requests to an API with authentication, error handling, and retry logic.
"""
def __init__(
self,
base_url: str,
auth_token: Optional[str] = None,
comfy_api_key: Optional[str] = None,
timeout: float = 3600.0,
verify_ssl: bool = True,
max_retries: int = 3,
retry_delay: float = 1.0,
retry_backoff_factor: float = 2.0,
retry_status_codes: Optional[tuple[int, ...]] = None,
session: Optional[aiohttp.ClientSession] = None,
):
self.base_url = base_url
self.auth_token = auth_token
self.comfy_api_key = comfy_api_key
self.timeout = timeout
self.verify_ssl = verify_ssl
self.max_retries = max_retries
self.retry_delay = retry_delay
self.retry_backoff_factor = retry_backoff_factor
# Default retry status codes: 408 (Request Timeout), 429 (Too Many Requests),
# 500, 502, 503, 504 (Server Errors)
self.retry_status_codes = retry_status_codes or (408, 429, 500, 502, 503, 504)
self._session: Optional[aiohttp.ClientSession] = session
self._owns_session = session is None # Track if we have to close it
@staticmethod
def _generate_operation_id(path: str) -> str:
"""Generates a unique operation ID for logging."""
return f"{path.strip('/').replace('/', '_')}_{uuid.uuid4().hex[:8]}"
@staticmethod
def _create_json_payload_args(
data: Optional[dict[str, Any]] = None,
headers: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
return {
"json": data,
"headers": headers,
}
def _create_form_data_args(
self,
data: dict[str, Any] | None,
files: dict[str, Any] | None,
headers: Optional[dict[str, str]] = None,
multipart_parser: Callable | None = None,
) -> dict[str, Any]:
if headers and "Content-Type" in headers:
del headers["Content-Type"]
if multipart_parser and data:
data = multipart_parser(data)
if isinstance(data, aiohttp.FormData):
form = data # If the parser already returned a FormData, pass it through
else:
form = aiohttp.FormData(default_to_multipart=True)
if data: # regular text fields
for k, v in data.items():
if v is None:
continue # aiohttp fails to serialize "None" values
# aiohttp expects strings or bytes; convert enums etc.
form.add_field(k, str(v) if not isinstance(v, (bytes, bytearray)) else v)
if files:
file_iter = files if isinstance(files, list) else files.items()
for field_name, file_obj in file_iter:
if file_obj is None:
continue # aiohttp fails to serialize "None" values
# file_obj can be (filename, bytes/io.BytesIO, content_type) tuple
if isinstance(file_obj, tuple):
filename, file_value, content_type = self._unpack_tuple(file_obj)
else:
file_value = file_obj
filename = getattr(file_obj, "name", field_name)
content_type = "application/octet-stream"
form.add_field(
name=field_name,
value=file_value,
filename=filename,
content_type=content_type,
)
return {"data": form, "headers": headers or {}}
@staticmethod
def _create_urlencoded_form_data_args(
data: dict[str, Any],
headers: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
headers = headers or {}
headers["Content-Type"] = "application/x-www-form-urlencoded"
return {
"data": data,
"headers": headers,
}
def get_headers(self) -> dict[str, str]:
"""Get headers for API requests, including authentication if available"""
headers = {"Content-Type": "application/json", "Accept": "application/json"}
if self.auth_token:
headers["Authorization"] = f"Bearer {self.auth_token}"
elif self.comfy_api_key:
headers["X-API-KEY"] = self.comfy_api_key
return headers
async def _check_connectivity(self, target_url: str) -> dict[str, bool]:
"""
Check connectivity to determine if network issues are local or server-related.
Args:
target_url: URL to check connectivity to
Returns:
Dictionary with connectivity status details
"""
results = {
"internet_accessible": False,
"api_accessible": False,
"is_local_issue": False,
"is_api_issue": False,
}
timeout = aiohttp.ClientTimeout(total=5.0)
async with aiohttp.ClientSession(timeout=timeout) as session:
try:
async with session.get("https://www.google.com", ssl=self.verify_ssl) as resp:
results["internet_accessible"] = resp.status < 500
except (ClientError, asyncio.TimeoutError, socket.gaierror):
results["is_local_issue"] = True
return results # cannot reach the internet; exit early
# Now check API health endpoint
parsed = urlparse(target_url)
health_url = f"{parsed.scheme}://{parsed.netloc}/health"
try:
async with session.get(health_url, ssl=self.verify_ssl) as resp:
results["api_accessible"] = resp.status < 500
except ClientError:
pass # leave as False
results["is_api_issue"] = results["internet_accessible"] and not results["api_accessible"]
return results
async def request(
self,
method: str,
path: str,
params: Optional[dict[str, Any]] = None,
data: Optional[dict[str, Any]] = None,
files: Optional[dict[str, Any] | list[tuple[str, Any]]] = None,
headers: Optional[dict[str, str]] = None,
content_type: str = "application/json",
multipart_parser: Callable | None = None,
retry_count: int = 0, # Used internally for tracking retries
) -> dict[str, Any]:
"""
Make an HTTP request to the API with automatic retries for transient errors.
Args:
method: HTTP method (GET, POST, etc.)
path: API endpoint path (will be joined with base_url)
params: Query parameters
data: body data
files: Files to upload
headers: Additional headers
content_type: Content type of the request. Defaults to application/json.
retry_count: Internal parameter for tracking retries, do not set manually
Returns:
Parsed JSON response
Raises:
LocalNetworkError: If local network connectivity issues are detected
ApiServerError: If the API server is unreachable but internet is working
Exception: For other request failures
"""
# Build full URL and merge headers
relative_path = path.lstrip("/")
url = urljoin(self.base_url, relative_path)
self._check_auth(self.auth_token, self.comfy_api_key)
request_headers = self.get_headers()
if headers:
request_headers.update(headers)
if files:
request_headers.pop("Content-Type", None)
if params:
params = {k: v for k, v in params.items() if v is not None} # aiohttp fails to serialize None values
logging.debug("[DEBUG] Request Headers: %s", request_headers)
logging.debug("[DEBUG] Files: %s", files)
logging.debug("[DEBUG] Params: %s", params)
logging.debug("[DEBUG] Data: %s", data)
if content_type == "application/x-www-form-urlencoded":
payload_args = self._create_urlencoded_form_data_args(data or {}, request_headers)
elif content_type == "multipart/form-data":
payload_args = self._create_form_data_args(data, files, request_headers, multipart_parser)
else:
payload_args = self._create_json_payload_args(data, request_headers)
operation_id = self._generate_operation_id(path)
request_logger.log_request_response(
operation_id=operation_id,
request_method=method,
request_url=url,
request_headers=request_headers,
request_params=params,
request_data=data if content_type == "application/json" else "[form-data or other]",
)
session = await self._get_session()
try:
async with session.request(
method,
url,
params=params,
ssl=self.verify_ssl,
**payload_args,
) as resp:
if resp.status >= 400:
try:
error_data = await resp.json()
except (aiohttp.ContentTypeError, json.JSONDecodeError):
error_data = await resp.text()
return await self._handle_http_error(
ClientResponseError(resp.request_info, resp.history, status=resp.status, message=error_data),
operation_id,
method,
url,
params,
data,
files,
headers,
content_type,
multipart_parser,
retry_count=retry_count,
response_content=error_data,
)
# Success: parse JSON (safely) and log
try:
payload = await resp.json()
response_content_to_log = payload
except (aiohttp.ContentTypeError, json.JSONDecodeError):
payload = {}
response_content_to_log = await resp.text()
request_logger.log_request_response(
operation_id=operation_id,
request_method=method,
request_url=url,
response_status_code=resp.status,
response_headers=dict(resp.headers),
response_content=response_content_to_log,
)
return payload
except (ClientError, asyncio.TimeoutError, socket.gaierror) as e:
# Treat as a *connection* problem; optionally retry, else escalate
if retry_count < self.max_retries:
delay = self.retry_delay * (self.retry_backoff_factor ** retry_count)
logging.warning("Connection error. Retrying in %.2fs (%s/%s): %s", delay, retry_count + 1,
self.max_retries, str(e))
await asyncio.sleep(delay)
return await self.request(
method,
path,
params=params,
data=data,
files=files,
headers=headers,
content_type=content_type,
multipart_parser=multipart_parser,
retry_count=retry_count + 1,
)
# One final connectivity check for diagnostics
connectivity = await self._check_connectivity(self.base_url)
if connectivity["is_local_issue"]:
raise LocalNetworkError(
"Unable to connect to the API server due to local network issues. "
"Please check your internet connection and try again."
) from e
raise ApiServerError(
f"The API server at {self.base_url} is currently unreachable. "
f"The service may be experiencing issues. Please try again later."
) from e
@staticmethod
def _check_auth(auth_token, comfy_api_key):
"""Verify that an auth token is present or comfy_api_key is present"""
if auth_token is None and comfy_api_key is None:
raise Exception("Unauthorized: Please login first to use this node.")
return auth_token or comfy_api_key
@staticmethod
async def upload_file(
upload_url: str,
file: io.BytesIO | str,
content_type: str | None = None,
max_retries: int = 3,
retry_delay: float = 1.0,
retry_backoff_factor: float = 2.0,
) -> aiohttp.ClientResponse:
"""Upload a file to the API with retry logic.
Args:
upload_url: The URL to upload to
file: Either a file path string or a BytesIO object
content_type: Optional mime type to set for the upload
max_retries: Maximum number of retry attempts
retry_delay: Initial delay between retries in seconds
retry_backoff_factor: Multiplier for the delay after each retry
"""
headers: dict[str, str] = {}
skip_auto_headers: set[str] = set()
if content_type:
headers["Content-Type"] = content_type
else:
# tell aiohttp not to add a Content-Type header, which would break the request signature and result in a 403 status.
skip_auto_headers.add("Content-Type")
# Extract file bytes
if isinstance(file, io.BytesIO):
file.seek(0)
data = file.read()
elif isinstance(file, str):
with open(file, "rb") as f:
data = f.read()
else:
raise ValueError("File must be BytesIO or str path")
parsed = urlparse(upload_url)
basename = os.path.basename(parsed.path) or parsed.netloc or "upload"
operation_id = f"upload_{basename}_{uuid.uuid4().hex[:8]}"
request_logger.log_request_response(
operation_id=operation_id,
request_method="PUT",
request_url=upload_url,
request_headers=headers,
request_data=f"[File data {len(data)} bytes]",
)
delay = retry_delay
for attempt in range(max_retries + 1):
try:
timeout = aiohttp.ClientTimeout(total=None) # honour server side timeouts
async with aiohttp.ClientSession(timeout=timeout) as session:
async with session.put(
upload_url, data=data, headers=headers, skip_auto_headers=skip_auto_headers,
) as resp:
resp.raise_for_status()
request_logger.log_request_response(
operation_id=operation_id,
request_method="PUT",
request_url=upload_url,
response_status_code=resp.status,
response_headers=dict(resp.headers),
response_content="File uploaded successfully.",
)
return resp
except (ClientError, asyncio.TimeoutError) as e:
request_logger.log_request_response(
operation_id=operation_id,
request_method="PUT",
request_url=upload_url,
response_status_code=e.status if hasattr(e, "status") else None,
response_headers=dict(e.headers) if hasattr(e, "headers") else None,
response_content=None,
error_message=f"{type(e).__name__}: {str(e)}",
)
if attempt < max_retries:
logging.warning(
"Upload failed (%s/%s). Retrying in %.2fs. %s", attempt + 1, max_retries, delay, str(e)
)
await asyncio.sleep(delay)
delay *= retry_backoff_factor
else:
raise NetworkError(f"Failed to upload file after {max_retries + 1} attempts: {e}") from e
async def _handle_http_error(
self,
exc: ClientResponseError,
operation_id: str,
*req_meta,
retry_count: int,
response_content: dict | str = "",
) -> dict[str, Any]:
status_code = exc.status
if status_code == 401:
user_friendly = "Unauthorized: Please login first to use this node."
elif status_code == 402:
user_friendly = "Payment Required: Please add credits to your account to use this node."
elif status_code == 409:
user_friendly = "There is a problem with your account. Please contact support@comfy.org."
elif status_code == 429:
user_friendly = "Rate Limit Exceeded: Please try again later."
else:
if isinstance(response_content, dict):
if "error" in response_content and "message" in response_content["error"]:
user_friendly = f"API Error: {response_content['error']['message']}"
if "type" in response_content["error"]:
user_friendly += f" (Type: {response_content['error']['type']})"
else: # Handle cases where error is just a JSON dict with unknown format
user_friendly = f"API Error: {json.dumps(response_content)}"
else:
if len(response_content) < 200: # Arbitrary limit for display
user_friendly = f"API Error (raw): {response_content}"
else:
user_friendly = f"API Error (raw, status {response_content})"
request_logger.log_request_response(
operation_id=operation_id,
request_method=req_meta[0],
request_url=req_meta[1],
response_status_code=exc.status,
response_headers=dict(req_meta[5]) if req_meta[5] else None,
response_content=response_content,
error_message=f"HTTP Error {exc.status}",
)
logging.debug("[DEBUG] API Error: %s (Status: %s)", user_friendly, status_code)
if response_content:
logging.debug("[DEBUG] Response content: %s", response_content)
# Retry if eligible
if status_code in self.retry_status_codes and retry_count < self.max_retries:
delay = self.retry_delay * (self.retry_backoff_factor ** retry_count)
logging.warning(
"HTTP error %s. Retrying in %.2fs (%s/%s)",
status_code,
delay,
retry_count + 1,
self.max_retries,
)
await asyncio.sleep(delay)
return await self.request(
req_meta[0], # method
req_meta[1].replace(self.base_url, ""), # path
params=req_meta[2],
data=req_meta[3],
files=req_meta[4],
headers=req_meta[5],
content_type=req_meta[6],
multipart_parser=req_meta[7],
retry_count=retry_count + 1,
)
raise Exception(user_friendly) from exc
@staticmethod
def _unpack_tuple(t):
"""Helper to normalise (filename, file, content_type) tuples."""
if len(t) == 3:
return t
elif len(t) == 2:
return t[0], t[1], "application/octet-stream"
else:
raise ValueError("files tuple must be (filename, file[, content_type])")
async def _get_session(self) -> aiohttp.ClientSession:
if self._session is None or self._session.closed:
timeout = aiohttp.ClientTimeout(total=self.timeout)
self._session = aiohttp.ClientSession(timeout=timeout)
self._owns_session = True
return self._session
async def close(self) -> None:
if self._owns_session and self._session and not self._session.closed:
await self._session.close()
async def __aenter__(self) -> "ApiClient":
"""Allow usage as asynccontextmanager ensures clean teardown"""
return self
async def __aexit__(self, exc_type, exc, tb):
await self.close()
class ApiEndpoint(Generic[T, R]):
"""Defines an API endpoint with its request and response types"""
def __init__(
self,
path: str,
method: HttpMethod,
request_model: Type[T],
response_model: Type[R],
query_params: Optional[dict[str, Any]] = None,
):
"""Initialize an API endpoint definition.
Args:
path: The URL path for this endpoint, can include placeholders like {id}
method: The HTTP method to use (GET, POST, etc.)
request_model: Pydantic model class that defines the structure and validation rules for API requests to this endpoint
response_model: Pydantic model class that defines the structure and validation rules for API responses from this endpoint
query_params: Optional dictionary of query parameters to include in the request
"""
self.path = path
self.method = method
self.request_model = request_model
self.response_model = response_model
self.query_params = query_params or {}
class SynchronousOperation(Generic[T, R]):
"""Represents a single synchronous API operation."""
def __init__(
self,
endpoint: ApiEndpoint[T, R],
request: T,
files: Optional[dict[str, Any] | list[tuple[str, Any]]] = None,
api_base: str | None = None,
auth_token: Optional[str] = None,
comfy_api_key: Optional[str] = None,
auth_kwargs: Optional[dict[str, str]] = None,
timeout: float = 7200.0,
verify_ssl: bool = True,
content_type: str = "application/json",
multipart_parser: Callable | None = None,
max_retries: int = 3,
retry_delay: float = 1.0,
retry_backoff_factor: float = 2.0,
) -> None:
self.endpoint = endpoint
self.request = request
self.files = files
self.api_base: str = api_base or args.comfy_api_base
self.auth_token = auth_token
self.comfy_api_key = comfy_api_key
if auth_kwargs is not None:
self.auth_token = auth_kwargs.get("auth_token", self.auth_token)
self.comfy_api_key = auth_kwargs.get("comfy_api_key", self.comfy_api_key)
self.timeout = timeout
self.verify_ssl = verify_ssl
self.content_type = content_type
self.multipart_parser = multipart_parser
self.max_retries = max_retries
self.retry_delay = retry_delay
self.retry_backoff_factor = retry_backoff_factor
async def execute(self, client: Optional[ApiClient] = None) -> R:
owns_client = client is None
if owns_client:
client = ApiClient(
base_url=self.api_base,
auth_token=self.auth_token,
comfy_api_key=self.comfy_api_key,
timeout=self.timeout,
verify_ssl=self.verify_ssl,
max_retries=self.max_retries,
retry_delay=self.retry_delay,
retry_backoff_factor=self.retry_backoff_factor,
)
try:
request_dict: Optional[dict[str, Any]]
if isinstance(self.request, EmptyRequest):
request_dict = None
else:
request_dict = self.request.model_dump(exclude_none=True)
for k, v in list(request_dict.items()):
if isinstance(v, Enum):
request_dict[k] = v.value
logging.debug("[DEBUG] API Request: %s %s", self.endpoint.method.value, self.endpoint.path)
logging.debug("[DEBUG] Request Data: %s", json.dumps(request_dict, indent=2))
logging.debug("[DEBUG] Query Params: %s", self.endpoint.query_params)
response_json = await client.request(
self.endpoint.method.value,
self.endpoint.path,
params=self.endpoint.query_params,
data=request_dict,
files=self.files,
content_type=self.content_type,
multipart_parser=self.multipart_parser,
)
logging.debug("=" * 50)
logging.debug("[DEBUG] RESPONSE DETAILS:")
logging.debug("[DEBUG] Status Code: 200 (Success)")
logging.debug("[DEBUG] Response Body: %s", json.dumps(response_json, indent=2))
logging.debug("=" * 50)
parsed_response = self.endpoint.response_model.model_validate(response_json)
logging.debug("[DEBUG] Parsed Response: %s", parsed_response)
return parsed_response
finally:
if owns_client:
await client.close()
class TaskStatus(str, Enum):
"""Enum for task status values"""
COMPLETED = "completed"
FAILED = "failed"
PENDING = "pending"
class PollingOperation(Generic[T, R]):
"""Represents an asynchronous API operation that requires polling for completion."""
def __init__(
self,
poll_endpoint: ApiEndpoint[EmptyRequest, R],
completed_statuses: list[str],
failed_statuses: list[str],
*,
status_extractor: Callable[[R], Optional[str]],
progress_extractor: Callable[[R], Optional[float]] | None = None,
result_url_extractor: Callable[[R], Optional[str]] | None = None,
price_extractor: Callable[[R], Optional[float]] | None = None,
request: Optional[T] = None,
api_base: str | None = None,
auth_token: Optional[str] = None,
comfy_api_key: Optional[str] = None,
auth_kwargs: Optional[dict[str, str]] = None,
poll_interval: float = 5.0,
max_poll_attempts: int = 120, # Default max polling attempts (10 minutes with 5s interval)
max_retries: int = 3, # Max retries per individual API call
retry_delay: float = 1.0,
retry_backoff_factor: float = 2.0,
estimated_duration: Optional[float] = None,
node_id: Optional[str] = None,
) -> None:
self.poll_endpoint = poll_endpoint
self.request = request
self.api_base: str = api_base or args.comfy_api_base
self.auth_token = auth_token
self.comfy_api_key = comfy_api_key
if auth_kwargs is not None:
self.auth_token = auth_kwargs.get("auth_token", self.auth_token)
self.comfy_api_key = auth_kwargs.get("comfy_api_key", self.comfy_api_key)
self.poll_interval = poll_interval
self.max_poll_attempts = max_poll_attempts
self.max_retries = max_retries
self.retry_delay = retry_delay
self.retry_backoff_factor = retry_backoff_factor
self.estimated_duration = estimated_duration
self.status_extractor = status_extractor or (lambda x: getattr(x, "status", None))
self.progress_extractor = progress_extractor
self.result_url_extractor = result_url_extractor
self.price_extractor = price_extractor
self.node_id = node_id
self.completed_statuses = completed_statuses
self.failed_statuses = failed_statuses
self.final_response: Optional[R] = None
self.extracted_price: Optional[float] = None
async def execute(self, client: Optional[ApiClient] = None) -> R:
owns_client = client is None
if owns_client:
client = ApiClient(
base_url=self.api_base,
auth_token=self.auth_token,
comfy_api_key=self.comfy_api_key,
max_retries=self.max_retries,
retry_delay=self.retry_delay,
retry_backoff_factor=self.retry_backoff_factor,
)
try:
return await self._poll_until_complete(client)
finally:
if owns_client:
await client.close()
def _display_text_on_node(self, text: str):
if not self.node_id:
return
if self.extracted_price is not None:
text = f"Price: ${self.extracted_price}\n{text}"
PromptServer.instance.send_progress_text(text, self.node_id)
def _display_time_progress_on_node(self, time_completed: int | float):
if not self.node_id:
return
if self.estimated_duration is not None:
remaining = max(0, int(self.estimated_duration) - time_completed)
message = f"Task in progress: {time_completed}s (~{remaining}s remaining)"
else:
message = f"Task in progress: {time_completed}s"
self._display_text_on_node(message)
def _check_task_status(self, response: R) -> TaskStatus:
try:
status = self.status_extractor(response)
if status in self.completed_statuses:
return TaskStatus.COMPLETED
if status in self.failed_statuses:
return TaskStatus.FAILED
return TaskStatus.PENDING
except Exception as e:
logging.error("Error extracting status: %s", e)
return TaskStatus.PENDING
async def _poll_until_complete(self, client: ApiClient) -> R:
"""Poll until the task is complete"""
consecutive_errors = 0
max_consecutive_errors = min(5, self.max_retries * 2) # Limit consecutive errors
if self.progress_extractor:
progress = utils.ProgressBar(PROGRESS_BAR_MAX)
status = TaskStatus.PENDING
for poll_count in range(1, self.max_poll_attempts + 1):
try:
logging.debug("[DEBUG] Polling attempt #%s", poll_count)
request_dict = None if self.request is None else self.request.model_dump(exclude_none=True)
if poll_count == 1:
logging.debug(
"[DEBUG] Poll Request: %s %s",
self.poll_endpoint.method.value,
self.poll_endpoint.path,
)
logging.debug(
"[DEBUG] Poll Request Data: %s",
json.dumps(request_dict, indent=2) if request_dict else "None",
)
# Query task status
resp = await client.request(
self.poll_endpoint.method.value,
self.poll_endpoint.path,
params=self.poll_endpoint.query_params,
data=request_dict,
)
consecutive_errors = 0 # reset on success
response_obj: R = self.poll_endpoint.response_model.model_validate(resp)
# Check if task is complete
status = self._check_task_status(response_obj)
logging.debug("[DEBUG] Task Status: %s", status)
# If progress extractor is provided, extract progress
if self.progress_extractor:
new_progress = self.progress_extractor(response_obj)
if new_progress is not None:
progress.update_absolute(new_progress, total=PROGRESS_BAR_MAX)
if self.price_extractor:
price = self.price_extractor(response_obj)
if price is not None:
self.extracted_price = price
if status == TaskStatus.COMPLETED:
message = "Task completed successfully"
if self.result_url_extractor:
result_url = self.result_url_extractor(response_obj)
if result_url:
message = f"Result URL: {result_url}"
logging.debug("[DEBUG] %s", message)
self._display_text_on_node(message)
self.final_response = response_obj
if self.progress_extractor:
progress.update(100)
return self.final_response
if status == TaskStatus.FAILED:
message = f"Task failed: {json.dumps(resp)}"
logging.error("[DEBUG] %s", message)
raise Exception(message)
logging.debug("[DEBUG] Task still pending, continuing to poll...")
# Task still pending; wait before polling again
for i in range(int(self.poll_interval)):
self._display_time_progress_on_node((poll_count - 1) * self.poll_interval + i)
await asyncio.sleep(1)
except (LocalNetworkError, ApiServerError, NetworkError) as e:
consecutive_errors += 1
if consecutive_errors >= max_consecutive_errors:
raise Exception(
f"Polling aborted after {consecutive_errors} network errors: {str(e)}"
) from e
logging.warning(
"Network error (%s/%s): %s",
consecutive_errors,
max_consecutive_errors,
str(e),
)
await asyncio.sleep(self.poll_interval)
except Exception as e:
# For other errors, increment count and potentially abort
consecutive_errors += 1
if consecutive_errors >= max_consecutive_errors or status == TaskStatus.FAILED:
raise Exception(
f"Polling aborted after {consecutive_errors} consecutive errors: {str(e)}"
) from e
logging.error("[DEBUG] Polling error: %s", str(e))
logging.warning(
"Error during polling (attempt %s/%s): %s. Will retry in %s seconds.",
poll_count,
self.max_poll_attempts,
str(e),
self.poll_interval,
)
await asyncio.sleep(self.poll_interval)
# If we've exhausted all polling attempts
raise Exception(
f"Polling timed out after {self.max_poll_attempts} attempts (" f"{self.max_poll_attempts * self.poll_interval} seconds). "
"The operation may still be running on the server but is taking longer than expected."
)
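A self-contained sketch of the exponential-backoff retry pattern the removed ApiClient used (delay = retry_delay * retry_backoff_factor ** retry_count); the helper and the flaky coroutine below are hypothetical stand-ins, not part of the deleted module.
import asyncio
import random

async def retry_with_backoff(fn, max_retries=3, retry_delay=1.0, backoff_factor=2.0):
    # Retry a coroutine on connection errors, multiplying the wait after each attempt.
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except ConnectionError:
            if attempt == max_retries:
                raise
            await asyncio.sleep(retry_delay * (backoff_factor ** attempt))

async def flaky():
    if random.random() < 0.5:
        raise ConnectionError("transient failure")
    return "ok"

print(asyncio.run(retry_with_backoff(flaky)))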

View File

@@ -1,22 +1,229 @@
from typing import Optional
from datetime import date
from enum import Enum
from typing import Any
from comfy_api_nodes.apis import GeminiGenerationConfig, GeminiContent, GeminiSafetySetting, GeminiSystemInstructionContent, GeminiTool, GeminiVideoMetadata
from pydantic import BaseModel
from pydantic import BaseModel, Field
class GeminiSafetyCategory(str, Enum):
HARM_CATEGORY_SEXUALLY_EXPLICIT = "HARM_CATEGORY_SEXUALLY_EXPLICIT"
HARM_CATEGORY_HATE_SPEECH = "HARM_CATEGORY_HATE_SPEECH"
HARM_CATEGORY_HARASSMENT = "HARM_CATEGORY_HARASSMENT"
HARM_CATEGORY_DANGEROUS_CONTENT = "HARM_CATEGORY_DANGEROUS_CONTENT"
class GeminiSafetyThreshold(str, Enum):
OFF = "OFF"
BLOCK_NONE = "BLOCK_NONE"
BLOCK_LOW_AND_ABOVE = "BLOCK_LOW_AND_ABOVE"
BLOCK_MEDIUM_AND_ABOVE = "BLOCK_MEDIUM_AND_ABOVE"
BLOCK_ONLY_HIGH = "BLOCK_ONLY_HIGH"
class GeminiSafetySetting(BaseModel):
category: GeminiSafetyCategory
threshold: GeminiSafetyThreshold
class GeminiRole(str, Enum):
user = "user"
model = "model"
class GeminiMimeType(str, Enum):
application_pdf = "application/pdf"
audio_mpeg = "audio/mpeg"
audio_mp3 = "audio/mp3"
audio_wav = "audio/wav"
image_png = "image/png"
image_jpeg = "image/jpeg"
image_webp = "image/webp"
text_plain = "text/plain"
video_mov = "video/mov"
video_mpeg = "video/mpeg"
video_mp4 = "video/mp4"
video_mpg = "video/mpg"
video_avi = "video/avi"
video_wmv = "video/wmv"
video_mpegps = "video/mpegps"
video_flv = "video/flv"
class GeminiInlineData(BaseModel):
data: str | None = Field(
None,
description="The base64 encoding of the image, PDF, or video to include inline in the prompt. "
"When including media inline, you must also specify the media type (mimeType) of the data. Size limit: 20MB",
)
mimeType: GeminiMimeType | None = Field(None)
class GeminiPart(BaseModel):
inlineData: GeminiInlineData | None = Field(None)
text: str | None = Field(None)
class GeminiTextPart(BaseModel):
text: str | None = Field(None)
class GeminiContent(BaseModel):
parts: list[GeminiPart] = Field(...)
role: GeminiRole = Field(..., examples=["user"])
class GeminiSystemInstructionContent(BaseModel):
parts: list[GeminiTextPart] = Field(
...,
description="A list of ordered parts that make up a single message. "
"Different parts may have different IANA MIME types.",
)
role: GeminiRole = Field(
...,
description="The identity of the entity that creates the message. "
"The following values are supported: "
"user: This indicates that the message is sent by a real person, typically a user-generated message. "
"model: This indicates that the message is generated by the model. "
"The model value is used to insert messages from model into the conversation during multi-turn conversations. "
"For non-multi-turn conversations, this field can be left blank or unset.",
)
class GeminiFunctionDeclaration(BaseModel):
description: str | None = Field(None)
name: str = Field(...)
parameters: dict[str, Any] = Field(..., description="JSON schema for the function parameters")
class GeminiTool(BaseModel):
functionDeclarations: list[GeminiFunctionDeclaration] | None = Field(None)
class GeminiOffset(BaseModel):
nanos: int | None = Field(None, ge=0, le=999999999)
seconds: int | None = Field(None, ge=-315576000000, le=315576000000)
class GeminiVideoMetadata(BaseModel):
endOffset: GeminiOffset | None = Field(None)
startOffset: GeminiOffset | None = Field(None)
class GeminiGenerationConfig(BaseModel):
maxOutputTokens: int | None = Field(None, ge=16, le=8192)
seed: int | None = Field(None)
stopSequences: list[str] | None = Field(None)
temperature: float | None = Field(1, ge=0.0, le=2.0)
topK: int | None = Field(40, ge=1)
topP: float | None = Field(0.95, ge=0.0, le=1.0)
class GeminiImageConfig(BaseModel):
aspectRatio: Optional[str] = None
aspectRatio: str | None = Field(None)
resolution: str | None = Field(None)
class GeminiImageGenerationConfig(GeminiGenerationConfig):
responseModalities: Optional[list[str]] = None
imageConfig: Optional[GeminiImageConfig] = None
responseModalities: list[str] | None = Field(None)
imageConfig: GeminiImageConfig | None = Field(None)
class GeminiImageGenerateContentRequest(BaseModel):
contents: list[GeminiContent]
generationConfig: Optional[GeminiImageGenerationConfig] = None
safetySettings: Optional[list[GeminiSafetySetting]] = None
systemInstruction: Optional[GeminiSystemInstructionContent] = None
tools: Optional[list[GeminiTool]] = None
videoMetadata: Optional[GeminiVideoMetadata] = None
contents: list[GeminiContent] = Field(...)
generationConfig: GeminiImageGenerationConfig | None = Field(None)
safetySettings: list[GeminiSafetySetting] | None = Field(None)
systemInstruction: GeminiSystemInstructionContent | None = Field(None)
tools: list[GeminiTool] | None = Field(None)
videoMetadata: GeminiVideoMetadata | None = Field(None)
class GeminiGenerateContentRequest(BaseModel):
contents: list[GeminiContent] = Field(...)
generationConfig: GeminiGenerationConfig | None = Field(None)
safetySettings: list[GeminiSafetySetting] | None = Field(None)
systemInstruction: GeminiSystemInstructionContent | None = Field(None)
tools: list[GeminiTool] | None = Field(None)
videoMetadata: GeminiVideoMetadata | None = Field(None)
class Modality(str, Enum):
MODALITY_UNSPECIFIED = "MODALITY_UNSPECIFIED"
TEXT = "TEXT"
IMAGE = "IMAGE"
VIDEO = "VIDEO"
AUDIO = "AUDIO"
DOCUMENT = "DOCUMENT"
class ModalityTokenCount(BaseModel):
modality: Modality | None = None
tokenCount: int | None = Field(None, description="Number of tokens for the given modality.")
class Probability(str, Enum):
NEGLIGIBLE = "NEGLIGIBLE"
LOW = "LOW"
MEDIUM = "MEDIUM"
HIGH = "HIGH"
UNKNOWN = "UNKNOWN"
class GeminiSafetyRating(BaseModel):
category: GeminiSafetyCategory | None = None
probability: Probability | None = Field(
None,
description="The probability that the content violates the specified safety category",
)
class GeminiCitation(BaseModel):
authors: list[str] | None = None
endIndex: int | None = None
license: str | None = None
publicationDate: date | None = None
startIndex: int | None = None
title: str | None = None
uri: str | None = None
class GeminiCitationMetadata(BaseModel):
citations: list[GeminiCitation] | None = None
class GeminiCandidate(BaseModel):
citationMetadata: GeminiCitationMetadata | None = None
content: GeminiContent | None = None
finishReason: str | None = None
safetyRatings: list[GeminiSafetyRating] | None = None
class GeminiPromptFeedback(BaseModel):
blockReason: str | None = None
blockReasonMessage: str | None = None
safetyRatings: list[GeminiSafetyRating] | None = None
class GeminiUsageMetadata(BaseModel):
cachedContentTokenCount: int | None = Field(
None,
description="Output only. Number of tokens in the cached part in the input (the cached content).",
)
candidatesTokenCount: int | None = Field(None, description="Number of tokens in the response(s).")
candidatesTokensDetails: list[ModalityTokenCount] | None = Field(
None, description="Breakdown of candidate tokens by modality."
)
promptTokenCount: int | None = Field(
None,
description="Number of tokens in the request. When cachedContent is set, this is still the total effective prompt size meaning this includes the number of tokens in the cached content.",
)
promptTokensDetails: list[ModalityTokenCount] | None = Field(
None, description="Breakdown of prompt tokens by modality."
)
thoughtsTokenCount: int | None = Field(None, description="Number of tokens present in thoughts output.")
toolUsePromptTokenCount: int | None = Field(None, description="Number of tokens present in tool-use prompt(s).")
class GeminiGenerateContentResponse(BaseModel):
candidates: list[GeminiCandidate] | None = Field(None)
promptFeedback: GeminiPromptFeedback | None = Field(None)
usageMetadata: GeminiUsageMetadata | None = Field(None)
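A hypothetical usage sketch of the request models defined above; it assumes the comfy_api_nodes.apis.gemini_api module is importable with pydantic v2, and the prompt text and config values are purely illustrative.
from comfy_api_nodes.apis.gemini_api import (
    GeminiContent,
    GeminiGenerateContentRequest,
    GeminiGenerationConfig,
    GeminiPart,
    GeminiRole,
)

request = GeminiGenerateContentRequest(
    contents=[
        GeminiContent(role=GeminiRole.user, parts=[GeminiPart(text="Describe this image.")]),
    ],
    generationConfig=GeminiGenerationConfig(temperature=0.7, maxOutputTokens=1024),
)
print(request.model_dump_json(exclude_none=True))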

View File

@@ -3,8 +3,6 @@ API Nodes for Gemini Multimodal LLM Usage via Remote API
See: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
"""
from __future__ import annotations
import base64
import json
import os
@@ -12,7 +10,7 @@ import time
import uuid
from enum import Enum
from io import BytesIO
from typing import Literal, Optional
from typing import Literal
import torch
from typing_extensions import override
@@ -20,18 +18,17 @@ from typing_extensions import override
import folder_paths
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api.util import VideoCodec, VideoContainer
from comfy_api_nodes.apis import (
from comfy_api_nodes.apis.gemini_api import (
GeminiContent,
GeminiGenerateContentRequest,
GeminiGenerateContentResponse,
GeminiInlineData,
GeminiMimeType,
GeminiPart,
)
from comfy_api_nodes.apis.gemini_api import (
GeminiImageConfig,
GeminiImageGenerateContentRequest,
GeminiImageGenerationConfig,
GeminiInlineData,
GeminiMimeType,
GeminiPart,
GeminiRole,
)
from comfy_api_nodes.util import (
ApiEndpoint,
@@ -57,6 +54,7 @@ class GeminiModel(str, Enum):
gemini_2_5_flash_preview_04_17 = "gemini-2.5-flash-preview-04-17"
gemini_2_5_pro = "gemini-2.5-pro"
gemini_2_5_flash = "gemini-2.5-flash"
gemini_3_0_pro = "gemini-3-pro-preview"
class GeminiImageModel(str, Enum):
@@ -103,6 +101,16 @@ def get_parts_by_type(response: GeminiGenerateContentResponse, part_type: Litera
Returns:
List of response parts matching the requested type.
"""
if response.candidates is None:
if response.promptFeedback.blockReason:
feedback = response.promptFeedback
raise ValueError(
f"Gemini API blocked the request. Reason: {feedback.blockReason} ({feedback.blockReasonMessage})"
)
raise NotImplementedError(
"Gemini returned no response candidates. "
"Please report to ComfyUI repository with the example of workflow to reproduce this."
)
parts = []
for part in response.candidates[0].content.parts:
if part_type == "text" and hasattr(part, "text") and part.text:
@@ -272,10 +280,10 @@ class GeminiNode(IO.ComfyNode):
prompt: str,
model: str,
seed: int,
images: Optional[torch.Tensor] = None,
audio: Optional[Input.Audio] = None,
video: Optional[Input.Video] = None,
files: Optional[list[GeminiPart]] = None,
images: torch.Tensor | None = None,
audio: Input.Audio | None = None,
video: Input.Video | None = None,
files: list[GeminiPart] | None = None,
) -> IO.NodeOutput:
validate_string(prompt, strip_whitespace=False)
@@ -300,7 +308,7 @@ class GeminiNode(IO.ComfyNode):
data=GeminiGenerateContentRequest(
contents=[
GeminiContent(
role="user",
role=GeminiRole.user,
parts=parts,
)
]
@@ -308,7 +316,6 @@ class GeminiNode(IO.ComfyNode):
response_model=GeminiGenerateContentResponse,
)
# Get result output
output_text = get_text_from_response(response)
if output_text:
# Not a true chat history like the OpenAI Chat node. It is emulated so the frontend can show a copy button.
@@ -406,7 +413,7 @@ class GeminiInputFiles(IO.ComfyNode):
)
@classmethod
def execute(cls, file: str, GEMINI_INPUT_FILES: Optional[list[GeminiPart]] = None) -> IO.NodeOutput:
def execute(cls, file: str, GEMINI_INPUT_FILES: list[GeminiPart] | None = None) -> IO.NodeOutput:
"""Loads and formats input files for Gemini API."""
if GEMINI_INPUT_FILES is None:
GEMINI_INPUT_FILES = []
@@ -421,7 +428,7 @@ class GeminiImage(IO.ComfyNode):
def define_schema(cls):
return IO.Schema(
node_id="GeminiImageNode",
display_name="Google Gemini Image",
display_name="Nano Banana (Google Gemini Image)",
category="api node/image/Gemini",
description="Edit images synchronously via Google API.",
inputs=[
@@ -488,8 +495,8 @@ class GeminiImage(IO.ComfyNode):
prompt: str,
model: str,
seed: int,
images: Optional[torch.Tensor] = None,
files: Optional[list[GeminiPart]] = None,
images: torch.Tensor | None = None,
files: list[GeminiPart] | None = None,
aspect_ratio: str = "auto",
) -> IO.NodeOutput:
validate_string(prompt, strip_whitespace=True, min_length=1)
@@ -510,7 +517,7 @@ class GeminiImage(IO.ComfyNode):
endpoint=ApiEndpoint(path=f"{GEMINI_BASE_ENDPOINT}/{model}", method="POST"),
data=GeminiImageGenerateContentRequest(
contents=[
GeminiContent(role="user", parts=parts),
GeminiContent(role=GeminiRole.user, parts=parts),
],
generationConfig=GeminiImageGenerationConfig(
responseModalities=["TEXT", "IMAGE"],

View File

@@ -5,12 +5,9 @@ Rodin API docs: https://developer.hyper3d.ai/
"""
from __future__ import annotations
from inspect import cleandoc
import folder_paths as comfy_paths
import aiohttp
import os
import asyncio
import logging
import math
from typing import Optional
@@ -26,11 +23,11 @@ from comfy_api_nodes.apis.rodin_api import (
Rodin3DDownloadResponse,
JobStatus,
)
from comfy_api_nodes.apis.client import (
from comfy_api_nodes.util import (
sync_op,
poll_op,
ApiEndpoint,
HttpMethod,
SynchronousOperation,
PollingOperation,
download_url_to_bytesio,
)
from comfy_api.latest import ComfyExtension, IO
@@ -121,35 +118,31 @@ def tensor_to_filelike(tensor, max_pixels: int = 2048*2048):
async def create_generate_task(
cls: type[IO.ComfyNode],
images=None,
seed=1,
material="PBR",
quality_override=18000,
tier="Regular",
mesh_mode="Quad",
TAPose = False,
auth_kwargs: Optional[dict[str, str]] = None,
ta_pose: bool = False,
):
if images is None:
raise Exception("Rodin 3D generate requires at least 1 image.")
if len(images) > 5:
raise Exception("Rodin 3D generate requires up to 5 image.")
path = "/proxy/rodin/api/v2/rodin"
operation = SynchronousOperation(
endpoint=ApiEndpoint(
path=path,
method=HttpMethod.POST,
request_model=Rodin3DGenerateRequest,
response_model=Rodin3DGenerateResponse,
),
request=Rodin3DGenerateRequest(
response = await sync_op(
cls,
ApiEndpoint(path="/proxy/rodin/api/v2/rodin", method="POST"),
response_model=Rodin3DGenerateResponse,
data=Rodin3DGenerateRequest(
seed=seed,
tier=tier,
material=material,
quality_override=quality_override,
mesh_mode=mesh_mode,
TAPose=TAPose,
TAPose=ta_pose,
),
files=[
(
@@ -159,11 +152,8 @@ async def create_generate_task(
for image in images if image is not None
],
content_type="multipart/form-data",
auth_kwargs=auth_kwargs,
)
response = await operation.execute()
if hasattr(response, "error"):
error_message = f"Rodin3D Create 3D generate Task Failed. Message: {response.message}, error: {response.error}"
logging.error(error_message)
@@ -187,74 +177,46 @@ def check_rodin_status(response: Rodin3DCheckStatusResponse) -> str:
return "DONE"
return "Generating"
def extract_progress(response: Rodin3DCheckStatusResponse) -> Optional[int]:
if not response.jobs:
return None
completed_count = sum(1 for job in response.jobs if job.status == JobStatus.Done)
return int((completed_count / len(response.jobs)) * 100)
async def poll_for_task_status(
subscription_key, auth_kwargs: Optional[dict[str, str]] = None,
) -> Rodin3DCheckStatusResponse:
poll_operation = PollingOperation(
poll_endpoint=ApiEndpoint(
path="/proxy/rodin/api/v2/status",
method=HttpMethod.POST,
request_model=Rodin3DCheckStatusRequest,
response_model=Rodin3DCheckStatusResponse,
),
request=Rodin3DCheckStatusRequest(subscription_key=subscription_key),
completed_statuses=["DONE"],
failed_statuses=["FAILED"],
status_extractor=check_rodin_status,
poll_interval=3.0,
auth_kwargs=auth_kwargs,
)
async def poll_for_task_status(subscription_key: str, cls: type[IO.ComfyNode]) -> Rodin3DCheckStatusResponse:
logging.info("[ Rodin3D API - CheckStatus ] Generate Start!")
return await poll_operation.execute()
async def get_rodin_download_list(uuid, auth_kwargs: Optional[dict[str, str]] = None) -> Rodin3DDownloadResponse:
logging.info("[ Rodin3D API - Downloading ] Generate Successfully!")
operation = SynchronousOperation(
endpoint=ApiEndpoint(
path="/proxy/rodin/api/v2/download",
method=HttpMethod.POST,
request_model=Rodin3DDownloadRequest,
response_model=Rodin3DDownloadResponse,
),
request=Rodin3DDownloadRequest(task_uuid=uuid),
auth_kwargs=auth_kwargs,
return await poll_op(
cls,
ApiEndpoint(path="/proxy/rodin/api/v2/status", method="POST"),
response_model=Rodin3DCheckStatusResponse,
data=Rodin3DCheckStatusRequest(subscription_key=subscription_key),
status_extractor=check_rodin_status,
progress_extractor=extract_progress,
)
return await operation.execute()
async def download_files(url_list, task_uuid):
async def get_rodin_download_list(uuid: str, cls: type[IO.ComfyNode]) -> Rodin3DDownloadResponse:
logging.info("[ Rodin3D API - Downloading ] Generate Successfully!")
return await sync_op(
cls,
ApiEndpoint(path="/proxy/rodin/api/v2/download", method="POST"),
response_model=Rodin3DDownloadResponse,
data=Rodin3DDownloadRequest(task_uuid=uuid),
monitor_progress=False,
)
async def download_files(url_list, task_uuid: str):
result_folder_name = f"Rodin3D_{task_uuid}"
save_path = os.path.join(comfy_paths.get_output_directory(), result_folder_name)
os.makedirs(save_path, exist_ok=True)
model_file_path = None
async with aiohttp.ClientSession() as session:
for i in url_list.list:
file_path = os.path.join(save_path, i.name)
if file_path.endswith(".glb"):
model_file_path = os.path.join(result_folder_name, i.name)
logging.info("[ Rodin3D API - download_files ] Downloading file: %s", file_path)
max_retries = 5
for attempt in range(max_retries):
try:
async with session.get(i.url) as resp:
resp.raise_for_status()
with open(file_path, "wb") as f:
async for chunk in resp.content.iter_chunked(32 * 1024):
f.write(chunk)
break
except Exception as e:
logging.info("[ Rodin3D API - download_files ] Error downloading %s:%s", file_path, str(e))
if attempt < max_retries - 1:
logging.info("Retrying...")
await asyncio.sleep(2)
else:
logging.info(
"[ Rodin3D API - download_files ] Failed to download %s after %s attempts.",
file_path,
max_retries,
)
for i in url_list.list:
file_path = os.path.join(save_path, i.name)
if file_path.endswith(".glb"):
model_file_path = os.path.join(result_folder_name, i.name)
await download_url_to_bytesio(i.url, file_path)
return model_file_path
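The removed retry loop above (5 attempts, a 2-second pause between attempts, 32 KB chunks) is now handled by the shared download_url_to_bytesio helper imported from comfy_api_nodes.util. A minimal standalone sketch of that retry behaviour, assuming only aiohttp; the function name here is hypothetical and not part of the codebase:

# Sketch of the retry behaviour the removed inline loop implemented (5 attempts,
# 2 s pause, 32 KB chunks). The node now delegates this to download_url_to_bytesio;
# this standalone version is illustrative only.
import asyncio
import aiohttp

async def download_with_retries(url: str, file_path: str, max_retries: int = 5) -> None:
    async with aiohttp.ClientSession() as session:
        for attempt in range(max_retries):
            try:
                async with session.get(url) as resp:
                    resp.raise_for_status()
                    with open(file_path, "wb") as f:
                        async for chunk in resp.content.iter_chunked(32 * 1024):
                            f.write(chunk)
                return
            except aiohttp.ClientError:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(2)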
@@ -276,6 +238,7 @@ class Rodin3D_Regular(IO.ComfyNode):
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
)
@@ -294,21 +257,17 @@ class Rodin3D_Regular(IO.ComfyNode):
for i in range(num_images):
m_images.append(Images[i])
mesh_mode, quality_override = get_quality_mode(Polygon_count)
auth = {
"auth_token": cls.hidden.auth_token_comfy_org,
"comfy_api_key": cls.hidden.api_key_comfy_org,
}
task_uuid, subscription_key = await create_generate_task(
cls,
images=m_images,
seed=Seed,
material=Material_Type,
quality_override=quality_override,
tier=tier,
mesh_mode=mesh_mode,
auth_kwargs=auth,
)
await poll_for_task_status(subscription_key, auth_kwargs=auth)
download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
await poll_for_task_status(subscription_key, cls)
download_list = await get_rodin_download_list(task_uuid, cls)
model = await download_files(download_list, task_uuid)
return IO.NodeOutput(model)
@@ -332,6 +291,7 @@ class Rodin3D_Detail(IO.ComfyNode):
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
)
@@ -350,21 +310,17 @@ class Rodin3D_Detail(IO.ComfyNode):
for i in range(num_images):
m_images.append(Images[i])
mesh_mode, quality_override = get_quality_mode(Polygon_count)
auth = {
"auth_token": cls.hidden.auth_token_comfy_org,
"comfy_api_key": cls.hidden.api_key_comfy_org,
}
task_uuid, subscription_key = await create_generate_task(
cls,
images=m_images,
seed=Seed,
material=Material_Type,
quality_override=quality_override,
tier=tier,
mesh_mode=mesh_mode,
auth_kwargs=auth,
)
await poll_for_task_status(subscription_key, auth_kwargs=auth)
download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
await poll_for_task_status(subscription_key, cls)
download_list = await get_rodin_download_list(task_uuid, cls)
model = await download_files(download_list, task_uuid)
return IO.NodeOutput(model)
@@ -388,6 +344,7 @@ class Rodin3D_Smooth(IO.ComfyNode):
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
)
@@ -400,27 +357,22 @@ class Rodin3D_Smooth(IO.ComfyNode):
Material_Type,
Polygon_count,
) -> IO.NodeOutput:
tier = "Smooth"
num_images = Images.shape[0]
m_images = []
for i in range(num_images):
m_images.append(Images[i])
mesh_mode, quality_override = get_quality_mode(Polygon_count)
auth = {
"auth_token": cls.hidden.auth_token_comfy_org,
"comfy_api_key": cls.hidden.api_key_comfy_org,
}
task_uuid, subscription_key = await create_generate_task(
cls,
images=m_images,
seed=Seed,
material=Material_Type,
quality_override=quality_override,
tier=tier,
tier="Smooth",
mesh_mode=mesh_mode,
auth_kwargs=auth,
)
await poll_for_task_status(subscription_key, auth_kwargs=auth)
download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
await poll_for_task_status(subscription_key, cls)
download_list = await get_rodin_download_list(task_uuid, cls)
model = await download_files(download_list, task_uuid)
return IO.NodeOutput(model)
@@ -451,6 +403,7 @@ class Rodin3D_Sketch(IO.ComfyNode):
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
)
@@ -461,29 +414,21 @@ class Rodin3D_Sketch(IO.ComfyNode):
Images,
Seed,
) -> IO.NodeOutput:
tier = "Sketch"
num_images = Images.shape[0]
m_images = []
for i in range(num_images):
m_images.append(Images[i])
material_type = "PBR"
quality_override = 18000
mesh_mode = "Quad"
auth = {
"auth_token": cls.hidden.auth_token_comfy_org,
"comfy_api_key": cls.hidden.api_key_comfy_org,
}
task_uuid, subscription_key = await create_generate_task(
cls,
images=m_images,
seed=Seed,
material=material_type,
quality_override=quality_override,
tier=tier,
mesh_mode=mesh_mode,
auth_kwargs=auth,
material="PBR",
quality_override=18000,
tier="Sketch",
mesh_mode="Quad",
)
await poll_for_task_status(subscription_key, auth_kwargs=auth)
download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
await poll_for_task_status(subscription_key, cls)
download_list = await get_rodin_download_list(task_uuid, cls)
model = await download_files(download_list, task_uuid)
return IO.NodeOutput(model)
@@ -522,6 +467,7 @@ class Rodin3D_Gen2(IO.ComfyNode):
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
)
@@ -541,22 +487,18 @@ class Rodin3D_Gen2(IO.ComfyNode):
for i in range(num_images):
m_images.append(Images[i])
mesh_mode, quality_override = get_quality_mode(Polygon_count)
auth = {
"auth_token": cls.hidden.auth_token_comfy_org,
"comfy_api_key": cls.hidden.api_key_comfy_org,
}
task_uuid, subscription_key = await create_generate_task(
cls,
images=m_images,
seed=Seed,
material=Material_Type,
quality_override=quality_override,
tier=tier,
mesh_mode=mesh_mode,
TAPose=TAPose,
auth_kwargs=auth,
ta_pose=TAPose,
)
await poll_for_task_status(subscription_key, auth_kwargs=auth)
download_list = await get_rodin_download_list(task_uuid, auth_kwargs=auth)
await poll_for_task_status(subscription_key, cls)
download_list = await get_rodin_download_list(task_uuid, cls)
model = await download_files(download_list, task_uuid)
return IO.NodeOutput(model)
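Across the Rodin nodes above, the hand-built SynchronousOperation/PollingOperation objects are replaced by the sync_op/poll_op helpers from comfy_api_nodes.util, which take the node class itself so auth and progress reporting no longer need to be threaded through auth_kwargs. A minimal sketch of the new call shape, using only the keyword arguments visible in the hunks; the endpoint paths and the MyResponse/MyStatusRequest/MyStatusResponse models are hypothetical:

# Illustrative only: the endpoint paths and MyResponse/MyStatusRequest/MyStatusResponse
# are made up; sync_op, poll_op and ApiEndpoint are used as shown in the hunks above.
from comfy_api_nodes.util import ApiEndpoint, poll_op, sync_op

async def run_example_task(cls, request):
    # Submit the job; auth headers and request logging come from the node class.
    created = await sync_op(
        cls,
        ApiEndpoint(path="/proxy/example/create", method="POST"),
        response_model=MyResponse,
        data=request,
    )
    # Poll until the extractor reports a terminal status ("DONE"/"FAILED" style).
    return await poll_op(
        cls,
        ApiEndpoint(path="/proxy/example/status", method="POST"),
        response_model=MyStatusResponse,
        data=MyStatusRequest(task_uuid=created.uuid),
        status_extractor=lambda r: r.status,
    )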

View File

@@ -16,9 +16,9 @@ from pydantic import BaseModel
from comfy import utils
from comfy_api.latest import IO
from comfy_api_nodes.apis import request_logger
from server import PromptServer
from . import request_logger
from ._helpers import (
default_base_url,
get_auth_header,
@@ -77,7 +77,7 @@ class _PollUIState:
_RETRY_STATUS = {408, 429, 500, 502, 503, 504}
COMPLETED_STATUSES = ["succeeded", "succeed", "success", "completed", "finished"]
COMPLETED_STATUSES = ["succeeded", "succeed", "success", "completed", "finished", "done"]
FAILED_STATUSES = ["cancelled", "canceled", "fail", "failed", "error"]
QUEUED_STATUSES = ["created", "queued", "queueing", "submitted"]

View File

@@ -12,8 +12,8 @@ from aiohttp.client_exceptions import ClientError, ContentTypeError
from comfy_api.input_impl import VideoFromFile
from comfy_api.latest import IO as COMFY_IO
from comfy_api_nodes.apis import request_logger
from . import request_logger
from ._helpers import (
default_base_url,
get_auth_header,

View File

@@ -1,11 +1,11 @@
from __future__ import annotations
import os
import datetime
import hashlib
import json
import logging
import os
import re
import hashlib
from typing import Any
import folder_paths

View File

@@ -13,8 +13,8 @@ from pydantic import BaseModel, Field
from comfy_api.latest import IO, Input
from comfy_api.util import VideoCodec, VideoContainer
from comfy_api_nodes.apis import request_logger
from . import request_logger
from ._helpers import is_processing_interrupted, sleep_with_interrupt
from .client import (
ApiEndpoint,

View File

@@ -399,6 +399,8 @@ class RAMPressureCache(LRUCache):
ram_usage = RAM_CACHE_DEFAULT_RAM_USAGE
def scan_list_for_ram_usage(outputs):
nonlocal ram_usage
if outputs is None:
return
for output in outputs:
if isinstance(output, list):
scan_list_for_ram_usage(output)

View File

@@ -11,13 +11,13 @@ if TYPE_CHECKING:
def easycache_forward_wrapper(executor, *args, **kwargs):
# get values from args
x: torch.Tensor = args[0]
transformer_options: dict[str] = args[-1]
if not isinstance(transformer_options, dict):
transformer_options = kwargs.get("transformer_options")
if not transformer_options:
transformer_options = args[-2]
easycache: EasyCacheHolder = transformer_options["easycache"]
x: torch.Tensor = args[0][:, :easycache.output_channels]
sigmas = transformer_options["sigmas"]
uuids = transformer_options["uuids"]
if sigmas is not None and easycache.is_past_end_timestep(sigmas):
@@ -82,13 +82,13 @@ def easycache_forward_wrapper(executor, *args, **kwargs):
def lazycache_predict_noise_wrapper(executor, *args, **kwargs):
# get values from args
x: torch.Tensor = args[0]
timestep: float = args[1]
model_options: dict[str] = args[2]
easycache: LazyCacheHolder = model_options["transformer_options"]["easycache"]
if easycache.is_past_end_timestep(timestep):
return executor(*args, **kwargs)
# prepare next x_prev
x: torch.Tensor = args[0][:, :easycache.output_channels]
next_x_prev = x
input_change = None
do_easycache = easycache.should_do_easycache(timestep)
@@ -173,7 +173,7 @@ def easycache_sample_wrapper(executor, *args, **kwargs):
class EasyCacheHolder:
def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False):
def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False, output_channels: int=None):
self.name = "EasyCache"
self.reuse_threshold = reuse_threshold
self.start_percent = start_percent
@@ -202,6 +202,7 @@ class EasyCacheHolder:
self.allow_mismatch = True
self.cut_from_start = True
self.state_metadata = None
self.output_channels = output_channels
def is_past_end_timestep(self, timestep: float) -> bool:
return not (timestep[0] > self.end_t).item()
@@ -264,7 +265,7 @@ class EasyCacheHolder:
else:
slicing.append(slice(None))
batch_slice = batch_slice + slicing
x[batch_slice] += self.uuid_cache_diffs[uuid].to(x.device)
x[tuple(batch_slice)] += self.uuid_cache_diffs[uuid].to(x.device)
return x
def update_cache_diff(self, output: torch.Tensor, x: torch.Tensor, uuids: list[UUID]):
@@ -283,7 +284,7 @@ class EasyCacheHolder:
else:
slicing.append(slice(None))
skip_dim = False
x = x[slicing]
x = x[tuple(slicing)]
diff = output - x
batch_offset = diff.shape[0] // len(uuids)
for i, uuid in enumerate(uuids):
@@ -323,7 +324,7 @@ class EasyCacheHolder:
return self
def clone(self):
return EasyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose)
return EasyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose, output_channels=self.output_channels)
class EasyCacheNode(io.ComfyNode):
@@ -350,7 +351,7 @@ class EasyCacheNode(io.ComfyNode):
@classmethod
def execute(cls, model: io.Model.Type, reuse_threshold: float, start_percent: float, end_percent: float, verbose: bool) -> io.NodeOutput:
model = model.clone()
model.model_options["transformer_options"]["easycache"] = EasyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose)
model.model_options["transformer_options"]["easycache"] = EasyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose, output_channels=model.model.latent_format.latent_channels)
model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.OUTER_SAMPLE, "easycache", easycache_sample_wrapper)
model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.CALC_COND_BATCH, "easycache", easycache_calc_cond_batch_wrapper)
model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.DIFFUSION_MODEL, "easycache", easycache_forward_wrapper)
@@ -358,7 +359,7 @@ class EasyCacheNode(io.ComfyNode):
class LazyCacheHolder:
def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False):
def __init__(self, reuse_threshold: float, start_percent: float, end_percent: float, subsample_factor: int, offload_cache_diff: bool, verbose: bool=False, output_channels: int=None):
self.name = "LazyCache"
self.reuse_threshold = reuse_threshold
self.start_percent = start_percent
@@ -382,6 +383,7 @@ class LazyCacheHolder:
self.approx_output_change_rates = []
self.total_steps_skipped = 0
self.state_metadata = None
self.output_channels = output_channels
def has_cache_diff(self) -> bool:
return self.cache_diff is not None
@@ -456,7 +458,7 @@ class LazyCacheHolder:
return self
def clone(self):
return LazyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose)
return LazyCacheHolder(self.reuse_threshold, self.start_percent, self.end_percent, self.subsample_factor, self.offload_cache_diff, self.verbose, output_channels=self.output_channels)
class LazyCacheNode(io.ComfyNode):
@classmethod
@@ -482,7 +484,7 @@ class LazyCacheNode(io.ComfyNode):
@classmethod
def execute(cls, model: io.Model.Type, reuse_threshold: float, start_percent: float, end_percent: float, verbose: bool) -> io.NodeOutput:
model = model.clone()
model.model_options["transformer_options"]["easycache"] = LazyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose)
model.model_options["transformer_options"]["easycache"] = LazyCacheHolder(reuse_threshold, start_percent, end_percent, subsample_factor=8, offload_cache_diff=False, verbose=verbose, output_channels=model.model.latent_format.latent_channels)
model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.OUTER_SAMPLE, "lazycache", easycache_sample_wrapper)
model.add_wrapper_with_key(comfy.patcher_extension.WrappersMP.PREDICT_NOISE, "lazycache", lazycache_predict_noise_wrapper)
return io.NodeOutput(model)
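The x[slicing] to x[tuple(slicing)] changes above address the fact that indexing a tensor with a plain Python list of slices is deprecated in recent PyTorch (mirroring NumPy); a tuple of slices performs ordinary multi-dimensional basic indexing. A self-contained illustration:

# Indexing with a list of slices is deprecated (it is treated as fancy indexing);
# converting the accumulated slices to a tuple keeps it plain basic indexing.
import torch

x = torch.randn(2, 16, 8, 8)           # e.g. (batch, channels, height, width)
slicing = [slice(None), slice(0, 4)]    # keep every batch, first 4 channels

y = x[tuple(slicing)]                   # the fixed form used in the diff
print(y.shape)                          # torch.Size([2, 4, 8, 8])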

comfy_extras/nodes_nop.py Normal file (39 lines)
View File

@@ -0,0 +1,39 @@
from comfy_api.latest import ComfyExtension, io
from typing_extensions import override
# If you write a node that is so useless that it breaks ComfyUI it will be featured in this exclusive list
# "native" block swap nodes are placebo at best and break the ComfyUI memory management system.
# They are also considered harmful because instead of reporting issues with the built-in
# memory management, users install these stupid nodes and complain even harder. Now it completely
# breaks with some of the new ComfyUI memory optimizations, so I have made the decision to NOP it
# out of all workflows.
class wanBlockSwap(io.ComfyNode):
@classmethod
def define_schema(cls):
return io.Schema(
node_id="wanBlockSwap",
category="",
description="NOP",
inputs=[
io.Model.Input("model"),
],
outputs=[
io.Model.Output(),
],
is_deprecated=True,
)
@classmethod
def execute(cls, model) -> io.NodeOutput:
return io.NodeOutput(model)
class NopExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[io.ComfyNode]]:
return [
wanBlockSwap
]
async def comfy_entrypoint() -> NopExtension:
return NopExtension()

View File

@@ -1,3 +1,3 @@
# This file is automatically generated by the build process when version is
# updated in pyproject.toml.
__version__ = "0.3.67"
__version__ = "0.3.70"

View File

@@ -2330,6 +2330,7 @@ async def init_builtin_extra_nodes():
"nodes_easycache.py",
"nodes_audio_encoder.py",
"nodes_rope.py",
"nodes_nop.py",
]
import_failed = []

View File

@@ -1,6 +1,6 @@
[project]
name = "ComfyUI"
version = "0.3.67"
version = "0.3.70"
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.9"
@@ -24,7 +24,7 @@ lint.select = [
exclude = ["*.ipynb", "**/generated/*.pyi"]
[tool.pylint]
master.py-version = "3.9"
master.py-version = "3.10"
master.extension-pkg-allow-list = [
"pydantic",
]

View File

@@ -2,6 +2,7 @@ import os
import sys
import asyncio
import traceback
import time
import nodes
import folder_paths
@@ -733,6 +734,7 @@ class PromptServer():
for sensitive_val in execution.SENSITIVE_EXTRA_DATA_KEYS:
if sensitive_val in extra_data:
sensitive[sensitive_val] = extra_data.pop(sensitive_val)
extra_data["create_time"] = int(time.time() * 1000) # timestamp in milliseconds
self.prompt_queue.put((number, prompt_id, prompt, extra_data, outputs_to_execute, sensitive))
response = {"prompt_id": prompt_id, "number": number, "node_errors": valid[3]}
return web.json_response(response)
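The new create_time field is a Unix timestamp in milliseconds attached to each queued prompt's extra_data, which the /history and /queue endpoints then expose. A hedged client-side sketch for converting it back to a datetime; the surrounding response layout is assumed, not taken from the diff:

# Assumes only that `extra_data` carries the millisecond `create_time` added above.
from __future__ import annotations
from datetime import datetime

def prompt_created_at(extra_data: dict) -> datetime | None:
    ts_ms = extra_data.get("create_time")
    if ts_ms is None:
        return None
    return datetime.fromtimestamp(ts_ms / 1000)  # convert ms to seconds since the epoch

print(prompt_created_at({"create_time": 1_700_000_000_000}))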