Merge remote-tracking branch 'origin/master' into feat/cache-provider-api

Deep Mehta committed 2026-01-28 13:30:03 +05:30
165 changed files with 10585 additions and 1620 deletions

View File

@@ -20,7 +20,7 @@ jobs:
git_tag: ${{ inputs.git_tag }}
cache_tag: "cu130"
python_minor: "13"
python_patch: "9"
python_patch: "11"
rel_name: "nvidia"
rel_extra_name: ""
test_release: true
@@ -65,11 +65,11 @@ jobs:
contents: "write"
packages: "write"
pull-requests: "read"
name: "Release AMD ROCm 7.1.1"
name: "Release AMD ROCm 7.2"
uses: ./.github/workflows/stable-release.yml
with:
git_tag: ${{ inputs.git_tag }}
cache_tag: "rocm711"
cache_tag: "rocm72"
python_minor: "12"
python_patch: "10"
rel_name: "amd"

View File

@@ -13,7 +13,7 @@ jobs:
- name: Checkout ComfyUI
uses: actions/checkout@v4
with:
repository: "comfyanonymous/ComfyUI"
repository: "Comfy-Org/ComfyUI"
path: "ComfyUI"
- uses: actions/setup-python@v4
with:

View File

@@ -0,0 +1,59 @@
name: "CI: Update CI Container"
on:
release:
types: [published]
workflow_dispatch:
inputs:
version:
description: 'ComfyUI version (e.g., v0.7.0)'
required: true
type: string
jobs:
update-ci-container:
runs-on: ubuntu-latest
# Skip pre-releases unless manually triggered
if: github.event_name == 'workflow_dispatch' || !github.event.release.prerelease
steps:
- name: Get version
id: version
run: |
if [ "${{ github.event_name }}" = "release" ]; then
VERSION="${{ github.event.release.tag_name }}"
else
VERSION="${{ inputs.version }}"
fi
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Checkout comfyui-ci-container
uses: actions/checkout@v4
with:
repository: comfy-org/comfyui-ci-container
token: ${{ secrets.CI_CONTAINER_PAT }}
- name: Check current version
id: current
run: |
CURRENT=$(grep -oP 'ARG COMFYUI_VERSION=\K.*' Dockerfile || echo "unknown")
echo "current_version=$CURRENT" >> $GITHUB_OUTPUT
- name: Update Dockerfile
run: |
VERSION="${{ steps.version.outputs.version }}"
sed -i "s/^ARG COMFYUI_VERSION=.*/ARG COMFYUI_VERSION=${VERSION}/" Dockerfile
- name: Create Pull Request
id: create-pr
uses: peter-evans/create-pull-request@v7
with:
token: ${{ secrets.CI_CONTAINER_PAT }}
branch: automation/comfyui-${{ steps.version.outputs.version }}
title: "chore: bump ComfyUI to ${{ steps.version.outputs.version }}"
body: |
Updates ComfyUI version from `${{ steps.current.outputs.current_version }}` to `${{ steps.version.outputs.version }}`
**Triggered by:** ${{ github.event_name == 'release' && format('[Release {0}]({1})', github.event.release.tag_name, github.event.release.html_url) || 'Manual workflow dispatch' }}
labels: automation
commit-message: "chore: bump ComfyUI to ${{ steps.version.outputs.version }}"

View File

@@ -108,7 +108,7 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
- [LCM models and Loras](https://comfyanonymous.github.io/ComfyUI_examples/lcm/)
- Latent previews with [TAESD](#how-to-show-high-quality-previews)
- Works fully offline: core will never download anything unless you want to.
- Optional API nodes to use paid models from external providers through the online [Comfy API](https://docs.comfy.org/tutorials/api-nodes/overview).
- Optional API nodes to use paid models from external providers through the online [Comfy API](https://docs.comfy.org/tutorials/api-nodes/overview); disable with `--disable-api-nodes`
- [Config file](extra_model_paths.yaml.example) to set the search paths for models.
Workflow examples can be found on the [Examples page](https://comfyanonymous.github.io/ComfyUI_examples/)
@@ -183,7 +183,7 @@ Simply download, extract with [7-Zip](https://7-zip.org) or with the windows exp
If you have trouble extracting it, right click the file -> properties -> unblock
Update your Nvidia drivers if it doesn't start.
The portable above currently comes with python 3.13 and pytorch cuda 13.0. Update your Nvidia drivers if it doesn't start.
#### Alternative Downloads:
@@ -208,11 +208,11 @@ comfy install
## Manual Install (Windows, Linux)
Python 3.14 works but you may encounter issues with the torch compile node. The free threaded variant is still missing some dependencies.
Python 3.14 works but some custom nodes may have issues. The free threaded variant works but some dependencies will enable the GIL so it's not fully supported.
Python 3.13 is very well supported. If you have trouble with some custom node dependencies on 3.13 you can try 3.12
torch 2.4 and above is supported but some features might only work on newer versions. We generally recommend using the latest major version of pytorch unless it is less than 2 weeks old.
torch 2.4 and above is supported but some features and optimizations might only work on newer versions. We generally recommend using the latest major version of pytorch with the latest cuda version unless it is less than 2 weeks old.
### Instructions:
@@ -229,7 +229,7 @@ AMD users can install rocm and pytorch with pip if you don't have it already ins
```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4```
This is the command to install the nightly with ROCm 7.0 which might have some performance improvements:
This is the command to install the nightly with ROCm 7.1 which might have some performance improvements:
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.1```
@@ -240,7 +240,7 @@ These have less hardware support than the builds above but they work on windows.
RDNA 3 (RX 7000 series):
```pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-dgpu/```
```pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/```
RDNA 3.5 (Strix halo/Ryzen AI Max+ 365):

View File

@@ -92,14 +92,23 @@ def seed_from_paths_batch(
session.execute(ins_asset, chunk)
# try to claim AssetCacheState (file_path)
winners_by_path: set[str] = set()
# Insert with ON CONFLICT DO NOTHING, then query to find which paths were actually inserted
ins_state = (
sqlite.insert(AssetCacheState)
.on_conflict_do_nothing(index_elements=[AssetCacheState.file_path])
.returning(AssetCacheState.file_path)
)
for chunk in _iter_chunks(state_rows, _rows_per_stmt(3)):
winners_by_path.update((session.execute(ins_state, chunk)).scalars().all())
session.execute(ins_state, chunk)
# Query to find which of our paths won (were actually inserted)
winners_by_path: set[str] = set()
for chunk in _iter_chunks(path_list, MAX_BIND_PARAMS):
result = session.execute(
sqlalchemy.select(AssetCacheState.file_path)
.where(AssetCacheState.file_path.in_(chunk))
.where(AssetCacheState.asset_id.in_([path_to_asset[p] for p in chunk]))
)
winners_by_path.update(result.scalars().all())
all_paths_set = set(path_list)
losers_by_path = all_paths_set - winners_by_path
@@ -112,16 +121,23 @@ def seed_from_paths_batch(
return {"inserted_infos": 0, "won_states": 0, "lost_states": len(losers_by_path)}
# insert AssetInfo only for winners
# Insert with ON CONFLICT DO NOTHING, then query to find which were actually inserted
winner_info_rows = [asset_to_info[path_to_asset[p]] for p in winners_by_path]
ins_info = (
sqlite.insert(AssetInfo)
.on_conflict_do_nothing(index_elements=[AssetInfo.asset_id, AssetInfo.owner_id, AssetInfo.name])
.returning(AssetInfo.id)
)
inserted_info_ids: set[str] = set()
for chunk in _iter_chunks(winner_info_rows, _rows_per_stmt(9)):
inserted_info_ids.update((session.execute(ins_info, chunk)).scalars().all())
session.execute(ins_info, chunk)
# Query to find which info rows were actually inserted (by matching our generated IDs)
all_info_ids = [row["id"] for row in winner_info_rows]
inserted_info_ids: set[str] = set()
for chunk in _iter_chunks(all_info_ids, MAX_BIND_PARAMS):
result = session.execute(
sqlalchemy.select(AssetInfo.id).where(AssetInfo.id.in_(chunk))
)
inserted_info_ids.update(result.scalars().all())
# build and insert tag + meta rows for the AssetInfo
tag_rows: list[dict] = []
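The hunk above swaps `RETURNING` on the batched `ON CONFLICT DO NOTHING` inserts for a follow-up `SELECT`. A minimal sketch of the claiming step, assuming SQLAlchemy's sqlite dialect and the models shown (chunking omitted):

```python
from sqlalchemy import select
from sqlalchemy.dialects import sqlite

# Step 1: attempt the insert; rows whose file_path already exists are skipped.
session.execute(
    sqlite.insert(AssetCacheState)
    .on_conflict_do_nothing(index_elements=[AssetCacheState.file_path]),
    state_rows,
)
# Step 2: re-select our own keys. Filtering on both file_path and asset_id
# ensures a path claimed by a concurrent writer under a different asset is
# not miscounted as our win.
winners_by_path = set(
    session.execute(
        select(AssetCacheState.file_path)
        .where(AssetCacheState.file_path.in_(path_list))
        .where(AssetCacheState.asset_id.in_(list(path_to_asset.values())))
    ).scalars()
)
```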

View File

@@ -10,6 +10,7 @@ import hashlib
class Source:
custom_node = "custom_node"
templates = "templates"
class SubgraphEntry(TypedDict):
source: str
@@ -38,6 +39,18 @@ class CustomNodeSubgraphEntryInfo(TypedDict):
class SubgraphManager:
def __init__(self):
self.cached_custom_node_subgraphs: dict[str, SubgraphEntry] | None = None
self.cached_blueprint_subgraphs: dict[str, SubgraphEntry] | None = None
def _create_entry(self, file: str, source: str, node_pack: str) -> tuple[str, SubgraphEntry]:
"""Create a subgraph entry from a file path. Expects normalized path (forward slashes)."""
entry_id = hashlib.sha256(f"{source}{file}".encode()).hexdigest()
entry: SubgraphEntry = {
"source": source,
"name": os.path.splitext(os.path.basename(file))[0],
"path": file,
"info": {"node_pack": node_pack},
}
return entry_id, entry
async def load_entry_data(self, entry: SubgraphEntry):
with open(entry['path'], 'r') as f:
@@ -60,53 +73,60 @@ class SubgraphManager:
return entries
async def get_custom_node_subgraphs(self, loadedModules, force_reload=False):
# if not forced to reload and cached, return cache
"""Load subgraphs from custom nodes."""
if not force_reload and self.cached_custom_node_subgraphs is not None:
return self.cached_custom_node_subgraphs
# Load subgraphs from custom nodes
subfolder = "subgraphs"
subgraphs_dict: dict[str, SubgraphEntry] = {}
subgraphs_dict: dict[str, SubgraphEntry] = {}
for folder in folder_paths.get_folder_paths("custom_nodes"):
pattern = os.path.join(folder, f"*/{subfolder}/*.json")
matched_files = glob.glob(pattern)
for file in matched_files:
# replace backslashes with forward slashes
pattern = os.path.join(folder, "*/subgraphs/*.json")
for file in glob.glob(pattern):
file = file.replace('\\', '/')
info: CustomNodeSubgraphEntryInfo = {
"node_pack": "custom_nodes." + file.split('/')[-3]
}
source = Source.custom_node
# hash source + path to make sure id will be as unique as possible, but
# reproducible across backend reloads
id = hashlib.sha256(f"{source}{file}".encode()).hexdigest()
entry: SubgraphEntry = {
"source": Source.custom_node,
"name": os.path.splitext(os.path.basename(file))[0],
"path": file,
"info": info,
}
subgraphs_dict[id] = entry
node_pack = "custom_nodes." + file.split('/')[-3]
entry_id, entry = self._create_entry(file, Source.custom_node, node_pack)
subgraphs_dict[entry_id] = entry
self.cached_custom_node_subgraphs = subgraphs_dict
return subgraphs_dict
async def get_custom_node_subgraph(self, id: str, loadedModules):
subgraphs = await self.get_custom_node_subgraphs(loadedModules)
entry: SubgraphEntry = subgraphs.get(id, None)
if entry is not None and entry.get('data', None) is None:
async def get_blueprint_subgraphs(self, force_reload=False):
"""Load subgraphs from the blueprints directory."""
if not force_reload and self.cached_blueprint_subgraphs is not None:
return self.cached_blueprint_subgraphs
subgraphs_dict: dict[str, SubgraphEntry] = {}
blueprints_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'blueprints')
if os.path.exists(blueprints_dir):
for file in glob.glob(os.path.join(blueprints_dir, "*.json")):
file = file.replace('\\', '/')
entry_id, entry = self._create_entry(file, Source.templates, "comfyui")
subgraphs_dict[entry_id] = entry
self.cached_blueprint_subgraphs = subgraphs_dict
return subgraphs_dict
async def get_all_subgraphs(self, loadedModules, force_reload=False):
"""Get all subgraphs from all sources (custom nodes and blueprints)."""
custom_node_subgraphs = await self.get_custom_node_subgraphs(loadedModules, force_reload)
blueprint_subgraphs = await self.get_blueprint_subgraphs(force_reload)
return {**custom_node_subgraphs, **blueprint_subgraphs}
async def get_subgraph(self, id: str, loadedModules):
"""Get a specific subgraph by ID from any source."""
entry = (await self.get_all_subgraphs(loadedModules)).get(id)
if entry is not None and entry.get('data') is None:
await self.load_entry_data(entry)
return entry
def add_routes(self, routes, loadedModules):
@routes.get("/global_subgraphs")
async def get_global_subgraphs(request):
subgraphs_dict = await self.get_custom_node_subgraphs(loadedModules)
# NOTE: we may want to include other sources of global subgraphs such as templates in the future;
# that's the reasoning for the current implementation
subgraphs_dict = await self.get_all_subgraphs(loadedModules)
return web.json_response(await self.sanitize_entries(subgraphs_dict, remove_data=True))
@routes.get("/global_subgraphs/{id}")
async def get_global_subgraph(request):
id = request.match_info.get("id", None)
subgraph = await self.get_custom_node_subgraph(id, loadedModules)
subgraph = await self.get_subgraph(id, loadedModules)
return web.json_response(await self.sanitize_entry(subgraph))
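`_create_entry` derives ids by hashing source plus path, so ids stay stable across backend reloads and never collide between sources that ship an identically named file. A quick illustration (the path is hypothetical):

```python
import hashlib

path = "custom_nodes/my_pack/subgraphs/upscale.json"
id_custom = hashlib.sha256(f"custom_node{path}".encode()).hexdigest()
id_template = hashlib.sha256(f"templates{path}".encode()).hexdigest()
assert id_custom != id_template  # same file name, distinct stable ids
```

Note that `get_all_subgraphs` merges with `{**custom_node_subgraphs, **blueprint_subgraphs}`, so on an (unlikely) id collision the blueprint entry wins.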

View File

@@ -1,6 +1,7 @@
import torch
from comfy.ldm.modules.attention import optimized_attention_for_device
import comfy.ops
import math
def clip_preprocess(image, size=224, mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711], crop=True):
image = image[:, :, :, :3] if image.shape[3] > 3 else image
@@ -21,6 +22,39 @@ def clip_preprocess(image, size=224, mean=[0.48145466, 0.4578275, 0.40821073], s
image = torch.clip((255. * image), 0, 255).round() / 255.0
return (image - mean.view([3,1,1])) / std.view([3,1,1])
def siglip2_flex_calc_resolution(oh, ow, patch_size, max_num_patches, eps=1e-5):
def scale_dim(size, scale):
scaled = math.ceil(size * scale / patch_size) * patch_size
return max(patch_size, int(scaled))
# Binary search for optimal scale
lo, hi = eps / 10, 100.0
while hi - lo >= eps:
mid = (lo + hi) / 2
h, w = scale_dim(oh, mid), scale_dim(ow, mid)
if (h // patch_size) * (w // patch_size) <= max_num_patches:
lo = mid
else:
hi = mid
return scale_dim(oh, lo), scale_dim(ow, lo)
def siglip2_preprocess(image, size, patch_size, num_patches, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], crop=True):
if size > 0:
return clip_preprocess(image, size=size, mean=mean, std=std, crop=crop)
image = image[:, :, :, :3] if image.shape[3] > 3 else image
mean = torch.tensor(mean, device=image.device, dtype=image.dtype)
std = torch.tensor(std, device=image.device, dtype=image.dtype)
image = image.movedim(-1, 1)
b, c, h, w = image.shape
h, w = siglip2_flex_calc_resolution(h, w, patch_size, num_patches)
image = torch.nn.functional.interpolate(image, size=(h, w), mode="bilinear", antialias=True)
image = torch.clip((255. * image), 0, 255).round() / 255.0
return (image - mean.view([3, 1, 1])) / std.view([3, 1, 1])
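`siglip2_flex_calc_resolution` binary-searches for the largest scale whose patch-aligned size still fits the patch budget. A worked example, assuming the function above:

```python
# A 480x640 input with patch_size=16 and max_num_patches=256: the search
# converges to scale ~0.45, giving a 14x18 patch grid.
h, w = siglip2_flex_calc_resolution(480, 640, patch_size=16, max_num_patches=256)
assert (h, w) == (224, 288)          # both multiples of the patch size
assert (h // 16) * (w // 16) == 252  # 252 <= 256 patches
```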
class CLIPAttention(torch.nn.Module):
def __init__(self, embed_dim, heads, dtype, device, operations):
super().__init__()
@@ -175,6 +209,27 @@ class CLIPTextModel(torch.nn.Module):
out = self.text_projection(x[2])
return (x[0], x[1], out, x[2])
def siglip2_pos_embed(embed_weight, embeds, orig_shape):
embed_weight_len = round(embed_weight.shape[0] ** 0.5)
embed_weight = comfy.ops.cast_to_input(embed_weight, embeds).movedim(1, 0).reshape(1, -1, embed_weight_len, embed_weight_len)
embed_weight = torch.nn.functional.interpolate(embed_weight, size=orig_shape, mode="bilinear", align_corners=False, antialias=True)
embed_weight = embed_weight.reshape(-1, embed_weight.shape[-2] * embed_weight.shape[-1]).movedim(0, 1)
return embeds + embed_weight
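`siglip2_pos_embed` bilinearly resamples the learned square grid of position embeddings to the image's actual patch grid before adding it to the patch embeddings. A standalone sketch of the same resize, with illustrative sizes:

```python
import torch

num_patches, dim = 256, 32             # learned grid: 16x16
weight = torch.randn(num_patches, dim)
grid = weight.movedim(1, 0).reshape(1, dim, 16, 16)
resized = torch.nn.functional.interpolate(
    grid, size=(14, 18), mode="bilinear", align_corners=False, antialias=True)
pos = resized.reshape(dim, 14 * 18).movedim(0, 1)  # [252, dim], one per patch
```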
class Siglip2Embeddings(torch.nn.Module):
def __init__(self, embed_dim, num_channels=3, patch_size=14, image_size=224, model_type="", num_patches=None, dtype=None, device=None, operations=None):
super().__init__()
self.patch_embedding = operations.Linear(num_channels * patch_size * patch_size, embed_dim, dtype=dtype, device=device)
self.position_embedding = operations.Embedding(num_patches, embed_dim, dtype=dtype, device=device)
self.patch_size = patch_size
def forward(self, pixel_values):
b, c, h, w = pixel_values.shape
img = pixel_values.movedim(1, -1).reshape(b, h // self.patch_size, self.patch_size, w // self.patch_size, self.patch_size, c)
img = img.permute(0, 1, 3, 2, 4, 5)
img = img.reshape(b, img.shape[1] * img.shape[2], -1)
img = self.patch_embedding(img)
return siglip2_pos_embed(self.position_embedding.weight, img, (h // self.patch_size, w // self.patch_size))
class CLIPVisionEmbeddings(torch.nn.Module):
def __init__(self, embed_dim, num_channels=3, patch_size=14, image_size=224, model_type="", dtype=None, device=None, operations=None):
@@ -218,8 +273,11 @@ class CLIPVision(torch.nn.Module):
intermediate_activation = config_dict["hidden_act"]
model_type = config_dict["model_type"]
self.embeddings = CLIPVisionEmbeddings(embed_dim, config_dict["num_channels"], config_dict["patch_size"], config_dict["image_size"], model_type=model_type, dtype=dtype, device=device, operations=operations)
if model_type == "siglip_vision_model":
if model_type in ["siglip2_vision_model"]:
self.embeddings = Siglip2Embeddings(embed_dim, config_dict["num_channels"], config_dict["patch_size"], config_dict["image_size"], model_type=model_type, num_patches=config_dict.get("num_patches", None), dtype=dtype, device=device, operations=operations)
else:
self.embeddings = CLIPVisionEmbeddings(embed_dim, config_dict["num_channels"], config_dict["patch_size"], config_dict["image_size"], model_type=model_type, dtype=dtype, device=device, operations=operations)
if model_type in ["siglip_vision_model", "siglip2_vision_model"]:
self.pre_layrnorm = lambda a: a
self.output_layernorm = True
else:

View File

@@ -21,6 +21,7 @@ clip_preprocess = comfy.clip_model.clip_preprocess # Prevent some stuff from br
IMAGE_ENCODERS = {
"clip_vision_model": comfy.clip_model.CLIPVisionModelProjection,
"siglip_vision_model": comfy.clip_model.CLIPVisionModelProjection,
"siglip2_vision_model": comfy.clip_model.CLIPVisionModelProjection,
"dinov2": comfy.image_encoders.dino2.Dinov2Model,
}
@@ -32,9 +33,10 @@ class ClipVisionModel():
self.image_size = config.get("image_size", 224)
self.image_mean = config.get("image_mean", [0.48145466, 0.4578275, 0.40821073])
self.image_std = config.get("image_std", [0.26862954, 0.26130258, 0.27577711])
model_type = config.get("model_type", "clip_vision_model")
model_class = IMAGE_ENCODERS.get(model_type)
if model_type == "siglip_vision_model":
self.model_type = config.get("model_type", "clip_vision_model")
self.config = config.copy()
model_class = IMAGE_ENCODERS.get(self.model_type)
if self.model_type == "siglip_vision_model":
self.return_all_hidden_states = True
else:
self.return_all_hidden_states = False
@@ -55,12 +57,16 @@ class ClipVisionModel():
def encode_image(self, image, crop=True):
comfy.model_management.load_model_gpu(self.patcher)
pixel_values = comfy.clip_model.clip_preprocess(image.to(self.load_device), size=self.image_size, mean=self.image_mean, std=self.image_std, crop=crop).float()
if self.model_type == "siglip2_vision_model":
pixel_values = comfy.clip_model.siglip2_preprocess(image.to(self.load_device), size=self.image_size, patch_size=self.config.get("patch_size", 16), num_patches=self.config.get("num_patches", 256), mean=self.image_mean, std=self.image_std, crop=crop).float()
else:
pixel_values = comfy.clip_model.clip_preprocess(image.to(self.load_device), size=self.image_size, mean=self.image_mean, std=self.image_std, crop=crop).float()
out = self.model(pixel_values=pixel_values, intermediate_output='all' if self.return_all_hidden_states else -2)
outputs = Output()
outputs["last_hidden_state"] = out[0].to(comfy.model_management.intermediate_device())
outputs["image_embeds"] = out[2].to(comfy.model_management.intermediate_device())
outputs["image_sizes"] = [pixel_values.shape[1:]] * pixel_values.shape[0]
if self.return_all_hidden_states:
all_hs = out[1].to(comfy.model_management.intermediate_device())
outputs["penultimate_hidden_states"] = all_hs[:, -2]
@@ -107,10 +113,14 @@ def load_clipvision_from_sd(sd, prefix="", convert_keys=False):
elif "vision_model.encoder.layers.22.layer_norm1.weight" in sd:
embed_shape = sd["vision_model.embeddings.position_embedding.weight"].shape[0]
if sd["vision_model.encoder.layers.0.layer_norm1.weight"].shape[0] == 1152:
if embed_shape == 729:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_siglip_384.json")
elif embed_shape == 1024:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_siglip_512.json")
patch_embedding_shape = sd["vision_model.embeddings.patch_embedding.weight"].shape
if len(patch_embedding_shape) == 2:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_siglip2_base_naflex.json")
else:
if embed_shape == 729:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_siglip_384.json")
elif embed_shape == 1024:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_siglip_512.json")
elif embed_shape == 577:
if "multi_modal_projector.linear_1.bias" in sd:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_config_vitl_336_llava.json")

View File

@@ -0,0 +1,14 @@
{
"num_channels": 3,
"hidden_act": "gelu_pytorch_tanh",
"hidden_size": 1152,
"image_size": -1,
"intermediate_size": 4304,
"model_type": "siglip2_vision_model",
"num_attention_heads": 16,
"num_hidden_layers": 27,
"patch_size": 16,
"num_patches": 256,
"image_mean": [0.5, 0.5, 0.5],
"image_std": [0.5, 0.5, 0.5]
}

View File

@@ -236,6 +236,8 @@ class ComfyNodeABC(ABC):
"""Flags a node as experimental, informing users that it may change or not work as expected."""
DEPRECATED: bool
"""Flags a node as deprecated, indicating to users that they should find alternatives to this node."""
DEV_ONLY: bool
"""Flags a node as dev-only, hiding it from search/menus unless dev mode is enabled."""
API_NODE: Optional[bool]
"""Flags a node as an API node. See: https://docs.comfy.org/tutorials/api-nodes/overview."""

View File

@@ -65,3 +65,147 @@ def stochastic_rounding(value, dtype, seed=0):
return output
return value.to(dtype=dtype)
# TODO: improve this?
def stochastic_float_to_fp4_e2m1(x, generator):
orig_shape = x.shape
sign = torch.signbit(x).to(torch.uint8)
exp = torch.floor(torch.log2(x.abs()) + 1.0).clamp(0, 3)
x += (torch.rand(x.size(), dtype=x.dtype, layout=x.layout, device=x.device, generator=generator) - 0.5) * (2 ** (exp - 2.0)) * 1.25
x = x.abs()
exp = torch.floor(torch.log2(x) + 1.1925).clamp(0, 3)
mantissa = torch.where(
exp > 0,
(x / (2.0 ** (exp - 1)) - 1.0) * 2.0,
(x * 2.0),
out=x
).round().to(torch.uint8)
del x
exp = exp.to(torch.uint8)
fp4 = (sign << 3) | (exp << 1) | mantissa
del sign, exp, mantissa
fp4_flat = fp4.view(-1)
packed = (fp4_flat[0::2] << 4) | fp4_flat[1::2]
return packed.reshape(list(orig_shape)[:-1] + [-1])
def to_blocked(input_matrix, flatten: bool = True) -> torch.Tensor:
"""
Rearrange a large matrix by breaking it into blocks and applying the rearrangement pattern.
See:
https://docs.nvidia.com/cuda/cublas/index.html#d-block-scaling-factors-layout
Args:
input_matrix: Input tensor of shape (H, W)
Returns:
Rearranged tensor of shape (32*ceil_div(H,128), 16*ceil_div(W,4))
"""
def ceil_div(a, b):
return (a + b - 1) // b
rows, cols = input_matrix.shape
n_row_blocks = ceil_div(rows, 128)
n_col_blocks = ceil_div(cols, 4)
# Calculate the padded shape
padded_rows = n_row_blocks * 128
padded_cols = n_col_blocks * 4
padded = input_matrix
if (rows, cols) != (padded_rows, padded_cols):
padded = torch.zeros(
(padded_rows, padded_cols),
device=input_matrix.device,
dtype=input_matrix.dtype,
)
padded[:rows, :cols] = input_matrix
# Rearrange the blocks
blocks = padded.view(n_row_blocks, 128, n_col_blocks, 4).permute(0, 2, 1, 3)
rearranged = blocks.reshape(-1, 4, 32, 4).transpose(1, 2).reshape(-1, 32, 16)
if flatten:
return rearranged.flatten()
return rearranged.reshape(padded_rows, padded_cols)
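`to_blocked` rearranges a scale matrix into the 128x4-tiled layout cuBLAS expects for block-scaling factors, padding only when the shape requires it. A quick shape check, assuming the function above:

```python
import torch

scales = torch.randn(256, 64)       # 2 row blocks of 128, 16 col blocks of 4
blocked = to_blocked(scales)        # flattened by default
assert blocked.numel() == 256 * 64  # no padding needed at these sizes
```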
def stochastic_round_quantize_nvfp4_block(x, per_tensor_scale, generator):
F4_E2M1_MAX = 6.0
F8_E4M3_MAX = 448.0
orig_shape = x.shape
block_size = 16
x = x.reshape(orig_shape[0], -1, block_size)
scaled_block_scales_fp8 = torch.clamp(((torch.amax(torch.abs(x), dim=-1)) / F4_E2M1_MAX) / per_tensor_scale.to(x.dtype), max=F8_E4M3_MAX).to(torch.float8_e4m3fn)
x = x / (per_tensor_scale.to(x.dtype) * scaled_block_scales_fp8.to(x.dtype)).unsqueeze(-1)
x = x.view(orig_shape).nan_to_num()
data_lp = stochastic_float_to_fp4_e2m1(x, generator=generator)
return data_lp, scaled_block_scales_fp8
def stochastic_round_quantize_nvfp4(x, per_tensor_scale, pad_16x, seed=0):
def roundup(x: int, multiple: int) -> int:
"""Round up x to the nearest multiple."""
return ((x + multiple - 1) // multiple) * multiple
generator = torch.Generator(device=x.device)
generator.manual_seed(seed)
# Handle padding
if pad_16x:
rows, cols = x.shape
padded_rows = roundup(rows, 16)
padded_cols = roundup(cols, 16)
if padded_rows != rows or padded_cols != cols:
x = torch.nn.functional.pad(x, (0, padded_cols - cols, 0, padded_rows - rows))
x, blocked_scaled = stochastic_round_quantize_nvfp4_block(x, per_tensor_scale, generator)
return x, to_blocked(blocked_scaled, flatten=False)
def stochastic_round_quantize_nvfp4_by_block(x, per_tensor_scale, pad_16x, seed=0, block_size=4096 * 4096):
def roundup(x: int, multiple: int) -> int:
"""Round up x to the nearest multiple."""
return ((x + multiple - 1) // multiple) * multiple
orig_shape = x.shape
# Handle padding
if pad_16x:
rows, cols = x.shape
padded_rows = roundup(rows, 16)
padded_cols = roundup(cols, 16)
if padded_rows != rows or padded_cols != cols:
x = torch.nn.functional.pad(x, (0, padded_cols - cols, 0, padded_rows - rows))
# Note: We update orig_shape because the output tensor logic below assumes x.shape matches
# what we want to produce. If we pad here, we want the padded output.
orig_shape = x.shape
orig_shape = list(orig_shape)
output_fp4 = torch.empty(orig_shape[:-1] + [orig_shape[-1] // 2], dtype=torch.uint8, device=x.device)
output_block = torch.empty(orig_shape[:-1] + [orig_shape[-1] // 16], dtype=torch.float8_e4m3fn, device=x.device)
generator = torch.Generator(device=x.device)
generator.manual_seed(seed)
num_slices = max(1, (x.numel() / block_size))
slice_size = max(1, (round(x.shape[0] / num_slices)))
for i in range(0, x.shape[0], slice_size):
fp4, block = stochastic_round_quantize_nvfp4_block(x[i: i + slice_size], per_tensor_scale, generator=generator)
output_fp4[i:i + slice_size].copy_(fp4)
output_block[i:i + slice_size].copy_(block)
return output_fp4, to_blocked(output_block, flatten=False)
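The packed tensor stores two E2M1 codes per byte: bit 3 is the sign, bits 2-1 the exponent, bit 0 the mantissa, and the even-indexed element lands in the high nibble. A hypothetical decoder (not part of the diff) to illustrate the format:

```python
import torch

def unpack_fp4_e2m1(packed: torch.Tensor) -> torch.Tensor:
    """Decode bytes packed as (even_code << 4) | odd_code back to floats."""
    codes = torch.stack(((packed >> 4) & 0xF, packed & 0xF), dim=-1).flatten(-2)
    sign = 1.0 - 2.0 * ((codes >> 3) & 0x1).float()
    exp = ((codes >> 1) & 0x3).float()
    mant = (codes & 0x1).float()
    # exp == 0 is subnormal (0 or 0.5); otherwise 2^(exp - 1) * (1 + mant / 2).
    mag = torch.where(exp > 0, 2.0 ** (exp - 1) * (1 + mant / 2), mant * 0.5)
    return sign * mag  # representable magnitudes: 0, .5, 1, 1.5, 2, 3, 4, 6
```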

View File

@@ -8,6 +8,7 @@ class LatentFormat:
latent_rgb_factors_bias = None
latent_rgb_factors_reshape = None
taesd_decoder_name = None
spacial_downscale_ratio = 8
def process_in(self, latent):
return latent * self.scale_factor
@@ -181,6 +182,7 @@ class Flux(SD3):
class Flux2(LatentFormat):
latent_channels = 128
spacial_downscale_ratio = 16
def __init__(self):
self.latent_rgb_factors =[
@@ -592,6 +594,7 @@ class Wan22(Wan21):
class HunyuanImage21(LatentFormat):
latent_channels = 64
latent_dimensions = 2
spacial_downscale_ratio = 32
scale_factor = 0.75289
latent_rgb_factors = [
@@ -725,6 +728,7 @@ class HunyuanVideo15(LatentFormat):
latent_rgb_factors_bias = [ 0.0456, -0.0202, -0.0644]
latent_channels = 32
latent_dimensions = 3
spacial_downscale_ratio = 16
scale_factor = 1.03682
taesd_decoder_name = "lighttaehy1_5"
@@ -749,6 +753,7 @@ class ACEAudio(LatentFormat):
class ChromaRadiance(LatentFormat):
latent_channels = 3
spacial_downscale_ratio = 1
def __init__(self):
self.latent_rgb_factors = [

comfy/ldm/anima/model.py (new file, 202 lines)
View File

@@ -0,0 +1,202 @@
from comfy.ldm.cosmos.predict2 import MiniTrainDIT
import torch
from torch import nn
import torch.nn.functional as F
def rotate_half(x):
x1 = x[..., : x.shape[-1] // 2]
x2 = x[..., x.shape[-1] // 2 :]
return torch.cat((-x2, x1), dim=-1)
def apply_rotary_pos_emb(x, cos, sin, unsqueeze_dim=1):
cos = cos.unsqueeze(unsqueeze_dim)
sin = sin.unsqueeze(unsqueeze_dim)
x_embed = (x * cos) + (rotate_half(x) * sin)
return x_embed
class RotaryEmbedding(nn.Module):
def __init__(self, head_dim):
super().__init__()
self.rope_theta = 10000
inv_freq = 1.0 / (self.rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.int64).to(dtype=torch.float) / head_dim))
self.register_buffer("inv_freq", inv_freq, persistent=False)
@torch.no_grad()
def forward(self, x, position_ids):
inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device)
position_ids_expanded = position_ids[:, None, :].float()
device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu"
with torch.autocast(device_type=device_type, enabled=False): # Force float32
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
emb = torch.cat((freqs, freqs), dim=-1)
cos = emb.cos()
sin = emb.sin()
return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
class Attention(nn.Module):
def __init__(self, query_dim, context_dim, n_heads, head_dim, device=None, dtype=None, operations=None):
super().__init__()
inner_dim = head_dim * n_heads
self.n_heads = n_heads
self.head_dim = head_dim
self.query_dim = query_dim
self.context_dim = context_dim
self.q_proj = operations.Linear(query_dim, inner_dim, bias=False, device=device, dtype=dtype)
self.q_norm = operations.RMSNorm(self.head_dim, eps=1e-6, device=device, dtype=dtype)
self.k_proj = operations.Linear(context_dim, inner_dim, bias=False, device=device, dtype=dtype)
self.k_norm = operations.RMSNorm(self.head_dim, eps=1e-6, device=device, dtype=dtype)
self.v_proj = operations.Linear(context_dim, inner_dim, bias=False, device=device, dtype=dtype)
self.o_proj = operations.Linear(inner_dim, query_dim, bias=False, device=device, dtype=dtype)
def forward(self, x, mask=None, context=None, position_embeddings=None, position_embeddings_context=None):
context = x if context is None else context
input_shape = x.shape[:-1]
q_shape = (*input_shape, self.n_heads, self.head_dim)
context_shape = context.shape[:-1]
kv_shape = (*context_shape, self.n_heads, self.head_dim)
query_states = self.q_norm(self.q_proj(x).view(q_shape)).transpose(1, 2)
key_states = self.k_norm(self.k_proj(context).view(kv_shape)).transpose(1, 2)
value_states = self.v_proj(context).view(kv_shape).transpose(1, 2)
if position_embeddings is not None:
assert position_embeddings_context is not None
cos, sin = position_embeddings
query_states = apply_rotary_pos_emb(query_states, cos, sin)
cos, sin = position_embeddings_context
key_states = apply_rotary_pos_emb(key_states, cos, sin)
attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, attn_mask=mask)
attn_output = attn_output.transpose(1, 2).reshape(*input_shape, -1).contiguous()
attn_output = self.o_proj(attn_output)
return attn_output
def init_weights(self):
torch.nn.init.zeros_(self.o_proj.weight)
class TransformerBlock(nn.Module):
def __init__(self, source_dim, model_dim, num_heads=16, mlp_ratio=4.0, use_self_attn=False, layer_norm=False, device=None, dtype=None, operations=None):
super().__init__()
self.use_self_attn = use_self_attn
if self.use_self_attn:
self.norm_self_attn = operations.LayerNorm(model_dim, device=device, dtype=dtype) if layer_norm else operations.RMSNorm(model_dim, eps=1e-6, device=device, dtype=dtype)
self.self_attn = Attention(
query_dim=model_dim,
context_dim=model_dim,
n_heads=num_heads,
head_dim=model_dim//num_heads,
device=device,
dtype=dtype,
operations=operations,
)
self.norm_cross_attn = operations.LayerNorm(model_dim, device=device, dtype=dtype) if layer_norm else operations.RMSNorm(model_dim, eps=1e-6, device=device, dtype=dtype)
self.cross_attn = Attention(
query_dim=model_dim,
context_dim=source_dim,
n_heads=num_heads,
head_dim=model_dim//num_heads,
device=device,
dtype=dtype,
operations=operations,
)
self.norm_mlp = operations.LayerNorm(model_dim, device=device, dtype=dtype) if layer_norm else operations.RMSNorm(model_dim, eps=1e-6, device=device, dtype=dtype)
self.mlp = nn.Sequential(
operations.Linear(model_dim, int(model_dim * mlp_ratio), device=device, dtype=dtype),
nn.GELU(),
operations.Linear(int(model_dim * mlp_ratio), model_dim, device=device, dtype=dtype)
)
def forward(self, x, context, target_attention_mask=None, source_attention_mask=None, position_embeddings=None, position_embeddings_context=None):
if self.use_self_attn:
normed = self.norm_self_attn(x)
attn_out = self.self_attn(normed, mask=target_attention_mask, position_embeddings=position_embeddings, position_embeddings_context=position_embeddings)
x = x + attn_out
normed = self.norm_cross_attn(x)
attn_out = self.cross_attn(normed, mask=source_attention_mask, context=context, position_embeddings=position_embeddings, position_embeddings_context=position_embeddings_context)
x = x + attn_out
x = x + self.mlp(self.norm_mlp(x))
return x
def init_weights(self):
torch.nn.init.zeros_(self.mlp[2].weight)
self.cross_attn.init_weights()
class LLMAdapter(nn.Module):
def __init__(
self,
source_dim=1024,
target_dim=1024,
model_dim=1024,
num_layers=6,
num_heads=16,
use_self_attn=True,
layer_norm=False,
device=None,
dtype=None,
operations=None,
):
super().__init__()
self.embed = operations.Embedding(32128, target_dim, device=device, dtype=dtype)
if model_dim != target_dim:
self.in_proj = operations.Linear(target_dim, model_dim, device=device, dtype=dtype)
else:
self.in_proj = nn.Identity()
self.rotary_emb = RotaryEmbedding(model_dim//num_heads)
self.blocks = nn.ModuleList([
TransformerBlock(source_dim, model_dim, num_heads=num_heads, use_self_attn=use_self_attn, layer_norm=layer_norm, device=device, dtype=dtype, operations=operations) for _ in range(num_layers)
])
self.out_proj = operations.Linear(model_dim, target_dim, device=device, dtype=dtype)
self.norm = operations.RMSNorm(target_dim, eps=1e-6, device=device, dtype=dtype)
def forward(self, source_hidden_states, target_input_ids, target_attention_mask=None, source_attention_mask=None):
if target_attention_mask is not None:
target_attention_mask = target_attention_mask.to(torch.bool)
if target_attention_mask.ndim == 2:
target_attention_mask = target_attention_mask.unsqueeze(1).unsqueeze(1)
if source_attention_mask is not None:
source_attention_mask = source_attention_mask.to(torch.bool)
if source_attention_mask.ndim == 2:
source_attention_mask = source_attention_mask.unsqueeze(1).unsqueeze(1)
x = self.in_proj(self.embed(target_input_ids))
context = source_hidden_states
position_ids = torch.arange(x.shape[1], device=x.device).unsqueeze(0)
position_ids_context = torch.arange(context.shape[1], device=x.device).unsqueeze(0)
position_embeddings = self.rotary_emb(x, position_ids)
position_embeddings_context = self.rotary_emb(x, position_ids_context)
for block in self.blocks:
x = block(x, context, target_attention_mask=target_attention_mask, source_attention_mask=source_attention_mask, position_embeddings=position_embeddings, position_embeddings_context=position_embeddings_context)
return self.norm(self.out_proj(x))
class Anima(MiniTrainDIT):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.llm_adapter = LLMAdapter(device=kwargs.get("device"), dtype=kwargs.get("dtype"), operations=kwargs.get("operations"))
def preprocess_text_embeds(self, text_embeds, text_ids):
if text_ids is not None:
return self.llm_adapter(text_embeds, text_ids)
else:
return text_embeds
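`LLMAdapter` embeds T5-style token ids (vocab 32128), refines them with self- and cross-attention over the source hidden states, and returns embeddings at `target_dim`. A shape-only sketch, assuming `comfy.ops.disable_weight_init` supplies the `Linear`/`RMSNorm`/`Embedding` ops used above (weights are left uninitialized here, so only the shapes are meaningful):

```python
import torch
import comfy.ops

adapter = LLMAdapter(source_dim=1024, target_dim=1024, model_dim=1024,
                     num_layers=6, num_heads=16,
                     operations=comfy.ops.disable_weight_init)
src = torch.randn(1, 77, 1024)           # source hidden states
ids = torch.randint(0, 32128, (1, 128))  # target token ids
out = adapter(src, ids)                  # -> torch.Size([1, 128, 1024])
```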

View File

@@ -11,6 +11,69 @@ from comfy.ldm.lightricks.model import (
from comfy.ldm.lightricks.symmetric_patchifier import AudioPatchifier
import comfy.ldm.common_dit
class CompressedTimestep:
"""Store video timestep embeddings in compressed form using per-frame indexing."""
__slots__ = ('data', 'batch_size', 'num_frames', 'patches_per_frame', 'feature_dim')
def __init__(self, tensor: torch.Tensor, patches_per_frame: int):
"""
tensor: [batch_size, num_tokens, feature_dim] tensor where num_tokens = num_frames * patches_per_frame
patches_per_frame: Number of spatial patches per frame (height * width in latent space), or None to disable compression
"""
self.batch_size, num_tokens, self.feature_dim = tensor.shape
# Check if compression is valid (num_tokens must be divisible by patches_per_frame)
if patches_per_frame is not None and num_tokens % patches_per_frame == 0 and num_tokens >= patches_per_frame:
self.patches_per_frame = patches_per_frame
self.num_frames = num_tokens // patches_per_frame
# Reshape to [batch, frames, patches_per_frame, feature_dim] and store one value per frame
# All patches in a frame are identical, so we only keep the first one
reshaped = tensor.view(self.batch_size, self.num_frames, patches_per_frame, self.feature_dim)
self.data = reshaped[:, :, 0, :].contiguous() # [batch, frames, feature_dim]
else:
# Not divisible or too small - store directly without compression
self.patches_per_frame = 1
self.num_frames = num_tokens
self.data = tensor
def expand(self):
"""Expand back to original tensor."""
if self.patches_per_frame == 1:
return self.data
# [batch, frames, feature_dim] -> [batch, frames, patches_per_frame, feature_dim] -> [batch, tokens, feature_dim]
expanded = self.data.unsqueeze(2).expand(self.batch_size, self.num_frames, self.patches_per_frame, self.feature_dim)
return expanded.reshape(self.batch_size, -1, self.feature_dim)
def expand_for_computation(self, scale_shift_table: torch.Tensor, batch_size: int, indices: slice = slice(None, None)):
"""Compute ada values on compressed per-frame data, then expand spatially."""
num_ada_params = scale_shift_table.shape[0]
# No compression - compute directly
if self.patches_per_frame == 1:
num_tokens = self.data.shape[1]
dim_per_param = self.feature_dim // num_ada_params
reshaped = self.data.reshape(batch_size, num_tokens, num_ada_params, dim_per_param)[:, :, indices, :]
table_values = scale_shift_table[indices].unsqueeze(0).unsqueeze(0).to(device=self.data.device, dtype=self.data.dtype)
ada_values = (table_values + reshaped).unbind(dim=2)
return ada_values
# Compressed: compute on per-frame data then expand spatially
# Reshape: [batch, frames, feature_dim] -> [batch, frames, num_ada_params, dim_per_param]
frame_reshaped = self.data.reshape(batch_size, self.num_frames, num_ada_params, -1)[:, :, indices, :]
table_values = scale_shift_table[indices].unsqueeze(0).unsqueeze(0).to(
device=self.data.device, dtype=self.data.dtype
)
frame_ada = (table_values + frame_reshaped).unbind(dim=2)
# Expand each ada parameter spatially: [batch, frames, dim] -> [batch, frames, patches, dim] -> [batch, tokens, dim]
return tuple(
frame_val.unsqueeze(2).expand(batch_size, self.num_frames, self.patches_per_frame, -1)
.reshape(batch_size, -1, frame_val.shape[-1])
for frame_val in frame_ada
)
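A round-trip sketch for `CompressedTimestep`, assuming the class above: when every spatial patch in a frame carries the same embedding, storage drops to one vector per frame and `expand()` reconstructs the original exactly:

```python
import torch

frames, patches, dim = 8, 16, 64
per_frame = torch.randn(1, frames, 1, dim)
tokens = per_frame.expand(1, frames, patches, dim).reshape(1, -1, dim)
ct = CompressedTimestep(tokens, patches_per_frame=patches)
assert ct.data.shape == (1, frames, dim)  # patches-fold smaller
assert torch.equal(ct.expand(), tokens)   # lossless reconstruction
```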
class BasicAVTransformerBlock(nn.Module):
def __init__(
self,
@@ -119,6 +182,9 @@ class BasicAVTransformerBlock(nn.Module):
def get_ada_values(
self, scale_shift_table: torch.Tensor, batch_size: int, timestep: torch.Tensor, indices: slice = slice(None, None)
):
if isinstance(timestep, CompressedTimestep):
return timestep.expand_for_computation(scale_shift_table, batch_size, indices)
num_ada_params = scale_shift_table.shape[0]
ada_values = (
@@ -146,28 +212,12 @@ class BasicAVTransformerBlock(nn.Module):
gate_timestep,
)
scale_shift_chunks = [t.squeeze(2) for t in scale_shift_ada_values]
gate_ada_values = [t.squeeze(2) for t in gate_ada_values]
return (*scale_shift_chunks, *gate_ada_values)
return (*scale_shift_ada_values, *gate_ada_values)
def forward(
self,
x: Tuple[torch.Tensor, torch.Tensor],
v_context=None,
a_context=None,
attention_mask=None,
v_timestep=None,
a_timestep=None,
v_pe=None,
a_pe=None,
v_cross_pe=None,
a_cross_pe=None,
v_cross_scale_shift_timestep=None,
a_cross_scale_shift_timestep=None,
v_cross_gate_timestep=None,
a_cross_gate_timestep=None,
transformer_options=None,
self, x: Tuple[torch.Tensor, torch.Tensor], v_context=None, a_context=None, attention_mask=None, v_timestep=None, a_timestep=None,
v_pe=None, a_pe=None, v_cross_pe=None, a_cross_pe=None, v_cross_scale_shift_timestep=None, a_cross_scale_shift_timestep=None,
v_cross_gate_timestep=None, a_cross_gate_timestep=None, transformer_options=None,
) -> Tuple[torch.Tensor, torch.Tensor]:
run_vx = transformer_options.get("run_vx", True)
run_ax = transformer_options.get("run_ax", True)
@@ -177,144 +227,102 @@ class BasicAVTransformerBlock(nn.Module):
run_a2v = run_vx and transformer_options.get("a2v_cross_attn", True) and ax.numel() > 0
run_v2a = run_ax and transformer_options.get("v2a_cross_attn", True)
# video
if run_vx:
vshift_msa, vscale_msa, vgate_msa = (
self.get_ada_values(self.scale_shift_table, vx.shape[0], v_timestep, slice(0, 3))
)
# video self-attention
vshift_msa, vscale_msa = (self.get_ada_values(self.scale_shift_table, vx.shape[0], v_timestep, slice(0, 2)))
norm_vx = comfy.ldm.common_dit.rms_norm(vx) * (1 + vscale_msa) + vshift_msa
vx += self.attn1(norm_vx, pe=v_pe, transformer_options=transformer_options) * vgate_msa
vx += self.attn2(
comfy.ldm.common_dit.rms_norm(vx),
context=v_context,
mask=attention_mask,
transformer_options=transformer_options,
)
del vshift_msa, vscale_msa, vgate_msa
del vshift_msa, vscale_msa
attn1_out = self.attn1(norm_vx, pe=v_pe, transformer_options=transformer_options)
del norm_vx
# video cross-attention
vgate_msa = self.get_ada_values(self.scale_shift_table, vx.shape[0], v_timestep, slice(2, 3))[0]
vx.addcmul_(attn1_out, vgate_msa)
del vgate_msa, attn1_out
vx.add_(self.attn2(comfy.ldm.common_dit.rms_norm(vx), context=v_context, mask=attention_mask, transformer_options=transformer_options))
# audio
if run_ax:
ashift_msa, ascale_msa, agate_msa = (
self.get_ada_values(self.audio_scale_shift_table, ax.shape[0], a_timestep, slice(0, 3))
)
# audio self-attention
ashift_msa, ascale_msa = (self.get_ada_values(self.audio_scale_shift_table, ax.shape[0], a_timestep, slice(0, 2)))
norm_ax = comfy.ldm.common_dit.rms_norm(ax) * (1 + ascale_msa) + ashift_msa
ax += (
self.audio_attn1(norm_ax, pe=a_pe, transformer_options=transformer_options)
* agate_msa
)
ax += self.audio_attn2(
comfy.ldm.common_dit.rms_norm(ax),
context=a_context,
mask=attention_mask,
transformer_options=transformer_options,
)
del ashift_msa, ascale_msa
attn1_out = self.audio_attn1(norm_ax, pe=a_pe, transformer_options=transformer_options)
del norm_ax
# audio cross-attention
agate_msa = self.get_ada_values(self.audio_scale_shift_table, ax.shape[0], a_timestep, slice(2, 3))[0]
ax.addcmul_(attn1_out, agate_msa)
del agate_msa, attn1_out
ax.add_(self.audio_attn2(comfy.ldm.common_dit.rms_norm(ax), context=a_context, mask=attention_mask, transformer_options=transformer_options))
del ashift_msa, ascale_msa, agate_msa
# Audio - Video cross attention.
# video - audio cross attention.
if run_a2v or run_v2a:
# norm3
vx_norm3 = comfy.ldm.common_dit.rms_norm(vx)
ax_norm3 = comfy.ldm.common_dit.rms_norm(ax)
(
scale_ca_audio_hidden_states_a2v,
shift_ca_audio_hidden_states_a2v,
scale_ca_audio_hidden_states_v2a,
shift_ca_audio_hidden_states_v2a,
gate_out_v2a,
) = self.get_av_ca_ada_values(
self.scale_shift_table_a2v_ca_audio,
ax.shape[0],
a_cross_scale_shift_timestep,
a_cross_gate_timestep,
)
(
scale_ca_video_hidden_states_a2v,
shift_ca_video_hidden_states_a2v,
scale_ca_video_hidden_states_v2a,
shift_ca_video_hidden_states_v2a,
gate_out_a2v,
) = self.get_av_ca_ada_values(
self.scale_shift_table_a2v_ca_video,
vx.shape[0],
v_cross_scale_shift_timestep,
v_cross_gate_timestep,
)
# audio to video cross attention
if run_a2v:
vx_scaled = (
vx_norm3 * (1 + scale_ca_video_hidden_states_a2v)
+ shift_ca_video_hidden_states_a2v
)
ax_scaled = (
ax_norm3 * (1 + scale_ca_audio_hidden_states_a2v)
+ shift_ca_audio_hidden_states_a2v
)
vx += (
self.audio_to_video_attn(
vx_scaled,
context=ax_scaled,
pe=v_cross_pe,
k_pe=a_cross_pe,
transformer_options=transformer_options,
)
* gate_out_a2v
)
scale_ca_audio_hidden_states_a2v, shift_ca_audio_hidden_states_a2v = self.get_ada_values(
self.scale_shift_table_a2v_ca_audio[:4, :], ax.shape[0], a_cross_scale_shift_timestep)[:2]
scale_ca_video_hidden_states_a2v_v, shift_ca_video_hidden_states_a2v_v = self.get_ada_values(
self.scale_shift_table_a2v_ca_video[:4, :], vx.shape[0], v_cross_scale_shift_timestep)[:2]
del gate_out_a2v
del scale_ca_video_hidden_states_a2v,\
shift_ca_video_hidden_states_a2v,\
scale_ca_audio_hidden_states_a2v,\
shift_ca_audio_hidden_states_a2v,\
vx_scaled = vx_norm3 * (1 + scale_ca_video_hidden_states_a2v_v) + shift_ca_video_hidden_states_a2v_v
ax_scaled = ax_norm3 * (1 + scale_ca_audio_hidden_states_a2v) + shift_ca_audio_hidden_states_a2v
del scale_ca_video_hidden_states_a2v_v, shift_ca_video_hidden_states_a2v_v, scale_ca_audio_hidden_states_a2v, shift_ca_audio_hidden_states_a2v
a2v_out = self.audio_to_video_attn(vx_scaled, context=ax_scaled, pe=v_cross_pe, k_pe=a_cross_pe, transformer_options=transformer_options)
del vx_scaled, ax_scaled
gate_out_a2v = self.get_ada_values(self.scale_shift_table_a2v_ca_video[4:, :], vx.shape[0], v_cross_gate_timestep)[0]
vx.addcmul_(a2v_out, gate_out_a2v)
del gate_out_a2v, a2v_out
# video to audio cross attention
if run_v2a:
ax_scaled = (
ax_norm3 * (1 + scale_ca_audio_hidden_states_v2a)
+ shift_ca_audio_hidden_states_v2a
)
vx_scaled = (
vx_norm3 * (1 + scale_ca_video_hidden_states_v2a)
+ shift_ca_video_hidden_states_v2a
)
ax += (
self.video_to_audio_attn(
ax_scaled,
context=vx_scaled,
pe=a_cross_pe,
k_pe=v_cross_pe,
transformer_options=transformer_options,
)
* gate_out_v2a
)
scale_ca_audio_hidden_states_v2a, shift_ca_audio_hidden_states_v2a = self.get_ada_values(
self.scale_shift_table_a2v_ca_audio[:4, :], ax.shape[0], a_cross_scale_shift_timestep)[2:4]
scale_ca_video_hidden_states_v2a, shift_ca_video_hidden_states_v2a = self.get_ada_values(
self.scale_shift_table_a2v_ca_video[:4, :], vx.shape[0], v_cross_scale_shift_timestep)[2:4]
del gate_out_v2a
del scale_ca_video_hidden_states_v2a,\
shift_ca_video_hidden_states_v2a,\
scale_ca_audio_hidden_states_v2a,\
shift_ca_audio_hidden_states_v2a
ax_scaled = ax_norm3 * (1 + scale_ca_audio_hidden_states_v2a) + shift_ca_audio_hidden_states_v2a
vx_scaled = vx_norm3 * (1 + scale_ca_video_hidden_states_v2a) + shift_ca_video_hidden_states_v2a
del scale_ca_video_hidden_states_v2a, shift_ca_video_hidden_states_v2a, scale_ca_audio_hidden_states_v2a, shift_ca_audio_hidden_states_v2a
v2a_out = self.video_to_audio_attn(ax_scaled, context=vx_scaled, pe=a_cross_pe, k_pe=v_cross_pe, transformer_options=transformer_options)
del ax_scaled, vx_scaled
gate_out_v2a = self.get_ada_values(self.scale_shift_table_a2v_ca_audio[4:, :], ax.shape[0], a_cross_gate_timestep)[0]
ax.addcmul_(v2a_out, gate_out_v2a)
del gate_out_v2a, v2a_out
del vx_norm3, ax_norm3
# video feedforward
if run_vx:
vshift_mlp, vscale_mlp, vgate_mlp = (
self.get_ada_values(self.scale_shift_table, vx.shape[0], v_timestep, slice(3, None))
)
vshift_mlp, vscale_mlp = self.get_ada_values(self.scale_shift_table, vx.shape[0], v_timestep, slice(3, 5))
vx_scaled = comfy.ldm.common_dit.rms_norm(vx) * (1 + vscale_mlp) + vshift_mlp
vx += self.ff(vx_scaled) * vgate_mlp
del vshift_mlp, vscale_mlp, vgate_mlp
del vshift_mlp, vscale_mlp
ff_out = self.ff(vx_scaled)
del vx_scaled
vgate_mlp = self.get_ada_values(self.scale_shift_table, vx.shape[0], v_timestep, slice(5, 6))[0]
vx.addcmul_(ff_out, vgate_mlp)
del vgate_mlp, ff_out
# audio feedforward
if run_ax:
ashift_mlp, ascale_mlp, agate_mlp = (
self.get_ada_values(self.audio_scale_shift_table, ax.shape[0], a_timestep, slice(3, None))
)
ashift_mlp, ascale_mlp = self.get_ada_values(self.audio_scale_shift_table, ax.shape[0], a_timestep, slice(3, 5))
ax_scaled = comfy.ldm.common_dit.rms_norm(ax) * (1 + ascale_mlp) + ashift_mlp
ax += self.audio_ff(ax_scaled) * agate_mlp
del ashift_mlp, ascale_mlp
del ashift_mlp, ascale_mlp, agate_mlp
ff_out = self.audio_ff(ax_scaled)
del ax_scaled
agate_mlp = self.get_ada_values(self.audio_scale_shift_table, ax.shape[0], a_timestep, slice(5, 6))[0]
ax.addcmul_(ff_out, agate_mlp)
del agate_mlp, ff_out
return vx, ax
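The rewrites above fetch each gate only once the attention or feedforward output exists, then apply it with in-place `addcmul_`, so the shift/scale/gate tensors never coexist with the attention activations. A minimal demo of the fused update, with illustrative shapes:

```python
import torch

x = torch.randn(2, 16, 64)
out = torch.randn_like(x)
gate = torch.randn(2, 1, 64)  # broadcasts over the token dim
ref = x + out * gate
x.addcmul_(out, gate)         # x += out * gate without a temporary
assert torch.allclose(x, ref)
```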
@@ -526,9 +534,20 @@ class LTXAVModel(LTXVModel):
audio_length = kwargs.get("audio_length", 0)
# Separate audio and video latents
vx, ax = self.separate_audio_and_video_latents(x, audio_length)
has_spatial_mask = False
if denoise_mask is not None:
# check if any frame has spatial variation (inpainting)
for frame_idx in range(denoise_mask.shape[2]):
frame_mask = denoise_mask[0, 0, frame_idx]
if frame_mask.numel() > 0 and frame_mask.min() != frame_mask.max():
has_spatial_mask = True
break
[vx, v_pixel_coords, additional_args] = super()._process_input(
vx, keyframe_idxs, denoise_mask, **kwargs
)
additional_args["has_spatial_mask"] = has_spatial_mask
ax, a_latent_coords = self.a_patchifier.patchify(ax)
ax = self.audio_patchify_proj(ax)
@@ -543,72 +562,82 @@ class LTXAVModel(LTXVModel):
if grid_mask is not None:
timestep = timestep[:, grid_mask]
timestep = timestep * self.timestep_scale_multiplier
timestep_scaled = timestep * self.timestep_scale_multiplier
v_timestep, v_embedded_timestep = self.adaln_single(
timestep.flatten(),
timestep_scaled.flatten(),
{"resolution": None, "aspect_ratio": None},
batch_size=batch_size,
hidden_dtype=hidden_dtype,
)
# Second dimension is 1 or number of tokens (if timestep_per_token)
v_timestep = v_timestep.view(batch_size, -1, v_timestep.shape[-1])
v_embedded_timestep = v_embedded_timestep.view(
batch_size, -1, v_embedded_timestep.shape[-1]
)
# Calculate patches_per_frame from orig_shape: [batch, channels, frames, height, width]
# Video tokens are arranged as (frames * height * width), so patches_per_frame = height * width
orig_shape = kwargs.get("orig_shape")
has_spatial_mask = kwargs.get("has_spatial_mask", None)
v_patches_per_frame = None
if not has_spatial_mask and orig_shape is not None and len(orig_shape) == 5:
# orig_shape[3] = height, orig_shape[4] = width (in latent space)
v_patches_per_frame = orig_shape[3] * orig_shape[4]
# Reshape to [batch_size, num_tokens, dim] and compress for storage
v_timestep = CompressedTimestep(v_timestep.view(batch_size, -1, v_timestep.shape[-1]), v_patches_per_frame)
v_embedded_timestep = CompressedTimestep(v_embedded_timestep.view(batch_size, -1, v_embedded_timestep.shape[-1]), v_patches_per_frame)
# Prepare audio timestep
a_timestep = kwargs.get("a_timestep")
if a_timestep is not None:
a_timestep = a_timestep * self.timestep_scale_multiplier
a_timestep_scaled = a_timestep * self.timestep_scale_multiplier
a_timestep_flat = a_timestep_scaled.flatten()
timestep_flat = timestep_scaled.flatten()
av_ca_factor = self.av_ca_timestep_scale_multiplier / self.timestep_scale_multiplier
# Cross-attention timesteps - compress these too
av_ca_audio_scale_shift_timestep, _ = self.av_ca_audio_scale_shift_adaln_single(
a_timestep.flatten(),
a_timestep_flat,
{"resolution": None, "aspect_ratio": None},
batch_size=batch_size,
hidden_dtype=hidden_dtype,
)
av_ca_video_scale_shift_timestep, _ = self.av_ca_video_scale_shift_adaln_single(
timestep.flatten(),
timestep_flat,
{"resolution": None, "aspect_ratio": None},
batch_size=batch_size,
hidden_dtype=hidden_dtype,
)
av_ca_a2v_gate_noise_timestep, _ = self.av_ca_a2v_gate_adaln_single(
timestep.flatten() * av_ca_factor,
timestep_flat * av_ca_factor,
{"resolution": None, "aspect_ratio": None},
batch_size=batch_size,
hidden_dtype=hidden_dtype,
)
av_ca_v2a_gate_noise_timestep, _ = self.av_ca_v2a_gate_adaln_single(
a_timestep.flatten() * av_ca_factor,
a_timestep_flat * av_ca_factor,
{"resolution": None, "aspect_ratio": None},
batch_size=batch_size,
hidden_dtype=hidden_dtype,
)
# Compress cross-attention timesteps (only video side, audio is too small to benefit)
# v_patches_per_frame is None for spatial masks, set for temporal masks or no mask
cross_av_timestep_ss = [
av_ca_audio_scale_shift_timestep.view(batch_size, -1, av_ca_audio_scale_shift_timestep.shape[-1]),
CompressedTimestep(av_ca_video_scale_shift_timestep.view(batch_size, -1, av_ca_video_scale_shift_timestep.shape[-1]), v_patches_per_frame), # video - compressed if possible
CompressedTimestep(av_ca_a2v_gate_noise_timestep.view(batch_size, -1, av_ca_a2v_gate_noise_timestep.shape[-1]), v_patches_per_frame), # video - compressed if possible
av_ca_v2a_gate_noise_timestep.view(batch_size, -1, av_ca_v2a_gate_noise_timestep.shape[-1]),
]
a_timestep, a_embedded_timestep = self.audio_adaln_single(
a_timestep.flatten(),
a_timestep_flat,
{"resolution": None, "aspect_ratio": None},
batch_size=batch_size,
hidden_dtype=hidden_dtype,
)
# Audio timesteps
a_timestep = a_timestep.view(batch_size, -1, a_timestep.shape[-1])
a_embedded_timestep = a_embedded_timestep.view(
batch_size, -1, a_embedded_timestep.shape[-1]
)
cross_av_timestep_ss = [
av_ca_audio_scale_shift_timestep,
av_ca_video_scale_shift_timestep,
av_ca_a2v_gate_noise_timestep,
av_ca_v2a_gate_noise_timestep,
]
cross_av_timestep_ss = list(
[t.view(batch_size, -1, t.shape[-1]) for t in cross_av_timestep_ss]
)
a_embedded_timestep = a_embedded_timestep.view(batch_size, -1, a_embedded_timestep.shape[-1])
else:
a_timestep = timestep
a_timestep = timestep_scaled
a_embedded_timestep = kwargs.get("embedded_timestep")
cross_av_timestep_ss = []
@@ -767,6 +796,11 @@ class LTXAVModel(LTXVModel):
ax = x[1]
v_embedded_timestep = embedded_timestep[0]
a_embedded_timestep = embedded_timestep[1]
# Expand compressed video timestep if needed
if isinstance(v_embedded_timestep, CompressedTimestep):
v_embedded_timestep = v_embedded_timestep.expand()
vx = super()._process_output(vx, v_embedded_timestep, keyframe_idxs, **kwargs)
# Process audio output

View File

@@ -103,20 +103,10 @@ class AudioPreprocessor:
return waveform
return torchaudio.functional.resample(waveform, source_rate, self.target_sample_rate)
@staticmethod
def normalize_amplitude(
waveform: torch.Tensor, max_amplitude: float = 0.5, eps: float = 1e-5
) -> torch.Tensor:
waveform = waveform - waveform.mean(dim=2, keepdim=True)
peak = torch.max(torch.abs(waveform)) + eps
scale = peak.clamp(max=max_amplitude) / peak
return waveform * scale
def waveform_to_mel(
self, waveform: torch.Tensor, waveform_sample_rate: int, device
) -> torch.Tensor:
waveform = self.resample(waveform, waveform_sample_rate)
waveform = self.normalize_amplitude(waveform)
mel_transform = torchaudio.transforms.MelSpectrogram(
sample_rate=self.target_sample_rate,
@@ -189,9 +179,12 @@ class AudioVAE(torch.nn.Module):
waveform = self.device_manager.move_to_load_device(waveform)
expected_channels = self.autoencoder.encoder.in_channels
if waveform.shape[1] != expected_channels:
raise ValueError(
f"Input audio must have {expected_channels} channels, got {waveform.shape[1]}"
)
if waveform.shape[1] == 1:
waveform = waveform.expand(-1, expected_channels, *waveform.shape[2:])
else:
raise ValueError(
f"Input audio must have {expected_channels} channels, got {waveform.shape[1]}"
)
mel_spec = self.preprocessor.waveform_to_mel(
waveform, waveform_sample_rate, device=self.device_manager.load_device

View File

@@ -1,11 +1,11 @@
from typing import Tuple, Union
import threading
import torch
import torch.nn as nn
import comfy.ops
ops = comfy.ops.disable_weight_init
class CausalConv3d(nn.Module):
def __init__(
self,
@@ -42,23 +42,34 @@ class CausalConv3d(nn.Module):
padding_mode=spatial_padding_mode,
groups=groups,
)
self.temporal_cache_state={}
def forward(self, x, causal: bool = True):
if causal:
first_frame_pad = x[:, :, :1, :, :].repeat(
(1, 1, self.time_kernel_size - 1, 1, 1)
)
x = torch.concatenate((first_frame_pad, x), dim=2)
else:
first_frame_pad = x[:, :, :1, :, :].repeat(
(1, 1, (self.time_kernel_size - 1) // 2, 1, 1)
)
last_frame_pad = x[:, :, -1:, :, :].repeat(
(1, 1, (self.time_kernel_size - 1) // 2, 1, 1)
)
x = torch.concatenate((first_frame_pad, x, last_frame_pad), dim=2)
x = self.conv(x)
return x
tid = threading.get_ident()
cached, is_end = self.temporal_cache_state.get(tid, (None, False))
if cached is None:
padding_length = self.time_kernel_size - 1
if not causal:
padding_length = padding_length // 2
if x.shape[2] == 0:
return x
cached = x[:, :, :1, :, :].repeat((1, 1, padding_length, 1, 1))
pieces = [ cached, x ]
if is_end and not causal:
pieces.append(x[:, :, -1:, :, :].repeat((1, 1, (self.time_kernel_size - 1) // 2, 1, 1)))
needs_caching = not is_end
if needs_caching and x.shape[2] >= self.time_kernel_size - 1:
needs_caching = False
self.temporal_cache_state[tid] = (x[:, :, -(self.time_kernel_size - 1):, :, :], False)
x = torch.cat(pieces, dim=2)
if needs_caching:
self.temporal_cache_state[tid] = (x[:, :, -(self.time_kernel_size - 1):, :, :], False)
return self.conv(x) if x.shape[2] >= self.time_kernel_size else x[:, :, :0, :, :]
@property
def weight(self):

View File

@@ -1,4 +1,5 @@
from __future__ import annotations
import threading
import torch
from torch import nn
from functools import partial
@@ -6,12 +7,35 @@ import math
from einops import rearrange
from typing import List, Optional, Tuple, Union
from .conv_nd_factory import make_conv_nd, make_linear_nd
from .causal_conv3d import CausalConv3d
from .pixel_norm import PixelNorm
from ..model import PixArtAlphaCombinedTimestepSizeEmbeddings
import comfy.ops
from comfy.ldm.modules.diffusionmodules.model import torch_cat_if_needed
ops = comfy.ops.disable_weight_init
def mark_conv3d_ended(module):
tid = threading.get_ident()
for _, m in module.named_modules():
if isinstance(m, CausalConv3d):
current = m.temporal_cache_state.get(tid, (None, False))
m.temporal_cache_state[tid] = (current[0], True)
def split2(tensor, split_point, dim=2):
return torch.split(tensor, [split_point, tensor.shape[dim] - split_point], dim=dim)
def add_exchange_cache(dest, cache_in, new_input, dim=2):
if dest is not None:
if cache_in is not None:
cache_to_dest = min(dest.shape[dim], cache_in.shape[dim])
lead_in_dest, dest = split2(dest, cache_to_dest, dim=dim)
lead_in_source, cache_in = split2(cache_in, cache_to_dest, dim=dim)
lead_in_dest.add_(lead_in_source)
body, new_input = split2(new_input, dest.shape[dim], dim)
dest.add_(body)
return torch_cat_if_needed([cache_in, new_input], dim=dim)
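A toy trace of the helpers above (values hypothetical): the carried residual frame is added onto the first output frame, the body of the new residual is added in place, and the unconsumed tail comes back as the cache for the next chunk.

import torch

# dim=2 is time: 4 new output frames, 1 carried residual frame, 5 new residual frames
dest = torch.zeros(1, 1, 4, 1, 1)
cache_in = torch.ones(1, 1, 1, 1, 1)
new_input = torch.full((1, 1, 5, 1, 1), 2.0)

leftover = add_exchange_cache(dest, cache_in, new_input, dim=2)
print(dest.flatten().tolist())   # [1.0, 2.0, 2.0, 2.0]: carried frame + new body
print(leftover.shape[2])         # 2: unconsumed residual frames become the next cache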
class Encoder(nn.Module):
r"""
The `Encoder` layer of a variational autoencoder that encodes its input into a latent representation.
@@ -205,7 +229,7 @@ class Encoder(nn.Module):
self.gradient_checkpointing = False
def forward(self, sample: torch.FloatTensor) -> torch.FloatTensor:
def forward_orig(self, sample: torch.FloatTensor) -> torch.FloatTensor:
r"""The forward method of the `Encoder` class."""
sample = patchify(sample, patch_size_hw=self.patch_size, patch_size_t=1)
@@ -254,6 +278,22 @@ class Encoder(nn.Module):
return sample
def forward(self, *args, **kwargs):
# No streaming support for the encoder, so just flag the end so it doesn't use the cache.
mark_conv3d_ended(self)
try:
return self.forward_orig(*args, **kwargs)
finally:
tid = threading.get_ident()
for _, module in self.named_modules():
# ComfyUI doesn't thread this kind of stuff today, but just in case
# we key on the thread to make it thread safe.
if hasattr(module, "temporal_cache_state"):
module.temporal_cache_state.pop(tid, None)
MAX_CHUNK_SIZE = 128 * 1024 ** 2
class Decoder(nn.Module):
r"""
@@ -341,18 +381,6 @@ class Decoder(nn.Module):
timestep_conditioning=timestep_conditioning,
spatial_padding_mode=spatial_padding_mode,
)
elif block_name == "attn_res_x":
block = UNetMidBlock3D(
dims=dims,
in_channels=input_channel,
num_layers=block_params["num_layers"],
resnet_groups=norm_num_groups,
norm_layer=norm_layer,
inject_noise=block_params.get("inject_noise", False),
timestep_conditioning=timestep_conditioning,
attention_head_dim=block_params["attention_head_dim"],
spatial_padding_mode=spatial_padding_mode,
)
elif block_name == "res_x_y":
output_channel = output_channel // block_params.get("multiplier", 2)
block = ResnetBlock3D(
@@ -428,8 +456,9 @@ class Decoder(nn.Module):
)
self.last_scale_shift_table = nn.Parameter(torch.empty(2, output_channel))
# def forward(self, sample: torch.FloatTensor, target_shape) -> torch.FloatTensor:
def forward(
def forward_orig(
self,
sample: torch.FloatTensor,
timestep: Optional[torch.Tensor] = None,
@@ -437,6 +466,7 @@ class Decoder(nn.Module):
r"""The forward method of the `Decoder` class."""
batch_size = sample.shape[0]
mark_conv3d_ended(self.conv_in)
sample = self.conv_in(sample, causal=self.causal)
checkpoint_fn = (
@@ -445,24 +475,12 @@ class Decoder(nn.Module):
else lambda x: x
)
scaled_timestep = None
timestep_shift_scale = None
if self.timestep_conditioning:
assert (
timestep is not None
), "should pass timestep with timestep_conditioning=True"
scaled_timestep = timestep * self.timestep_scale_multiplier.to(dtype=sample.dtype, device=sample.device)
for up_block in self.up_blocks:
if self.timestep_conditioning and isinstance(up_block, UNetMidBlock3D):
sample = checkpoint_fn(up_block)(
sample, causal=self.causal, timestep=scaled_timestep
)
else:
sample = checkpoint_fn(up_block)(sample, causal=self.causal)
sample = self.conv_norm_out(sample)
if self.timestep_conditioning:
embedded_timestep = self.last_time_embedder(
timestep=scaled_timestep.flatten(),
resolution=None,
@@ -483,16 +501,62 @@ class Decoder(nn.Module):
embedded_timestep.shape[-2],
embedded_timestep.shape[-1],
)
shift, scale = ada_values.unbind(dim=1)
sample = sample * (1 + scale) + shift
timestep_shift_scale = ada_values.unbind(dim=1)
sample = self.conv_act(sample)
sample = self.conv_out(sample, causal=self.causal)
output = []
def run_up(idx, sample, ended):
if idx >= len(self.up_blocks):
sample = self.conv_norm_out(sample)
if timestep_shift_scale is not None:
shift, scale = timestep_shift_scale
sample = sample * (1 + scale) + shift
sample = self.conv_act(sample)
if ended:
mark_conv3d_ended(self.conv_out)
sample = self.conv_out(sample, causal=self.causal)
if sample is not None and sample.shape[2] > 0:
output.append(sample)
return
up_block = self.up_blocks[idx]
if ended:
mark_conv3d_ended(up_block)
if self.timestep_conditioning and isinstance(up_block, UNetMidBlock3D):
sample = checkpoint_fn(up_block)(
sample, causal=self.causal, timestep=scaled_timestep
)
else:
sample = checkpoint_fn(up_block)(sample, causal=self.causal)
if sample is None or sample.shape[2] == 0:
return
total_bytes = sample.numel() * sample.element_size()
num_chunks = (total_bytes + MAX_CHUNK_SIZE - 1) // MAX_CHUNK_SIZE
samples = torch.chunk(sample, chunks=num_chunks, dim=2)
for chunk_idx, sample1 in enumerate(samples):
run_up(idx + 1, sample1, ended and chunk_idx == len(samples) - 1)
run_up(0, sample, True)
sample = torch.cat(output, dim=2)
sample = unpatchify(sample, patch_size_hw=self.patch_size, patch_size_t=1)
return sample
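MAX_CHUNK_SIZE bounds each recursive step at roughly 128 MiB of activation, so the chunk count grows with tensor size rather than being fixed. A back-of-envelope check of the ceiling division used above, on a hypothetical decode activation:

import torch

sample = torch.empty(1, 128, 64, 96, 160, dtype=torch.float16)
total_bytes = sample.numel() * sample.element_size()               # 251,658,240
num_chunks = (total_bytes + MAX_CHUNK_SIZE - 1) // MAX_CHUNK_SIZE  # ceil division
print(num_chunks)                                                  # 2
print([c.shape[2] for c in torch.chunk(sample, num_chunks, dim=2)])  # [32, 32]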
def forward(self, *args, **kwargs):
try:
return self.forward_orig(*args, **kwargs)
finally:
for _, module in self.named_modules():
# ComfyUI doesn't thread this kind of stuff today, but just in case
# we key on the thread to make it thread safe.
tid = threading.get_ident()
if hasattr(module, "temporal_cache_state"):
module.temporal_cache_state.pop(tid, None)
class UNetMidBlock3D(nn.Module):
"""
@@ -663,8 +727,22 @@ class DepthToSpaceUpsample(nn.Module):
)
self.residual = residual
self.out_channels_reduction_factor = out_channels_reduction_factor
self.temporal_cache_state = {}
def forward(self, x, causal: bool = True, timestep: Optional[torch.Tensor] = None):
tid = threading.get_ident()
cached, drop_first_conv, drop_first_res = self.temporal_cache_state.get(tid, (None, True, True))
y = self.conv(x, causal=causal)
y = rearrange(
y,
"b (c p1 p2 p3) d h w -> b c (d p1) (h p2) (w p3)",
p1=self.stride[0],
p2=self.stride[1],
p3=self.stride[2],
)
if self.stride[0] == 2 and y.shape[2] > 0 and drop_first_conv:
y = y[:, :, 1:, :, :]
drop_first_conv = False
if self.residual:
# Reshape and duplicate the input to match the output shape
x_in = rearrange(
@@ -676,21 +754,20 @@ class DepthToSpaceUpsample(nn.Module):
)
num_repeat = math.prod(self.stride) // self.out_channels_reduction_factor
x_in = x_in.repeat(1, num_repeat, 1, 1, 1)
if self.stride[0] == 2:
if self.stride[0] == 2 and x_in.shape[2] > 0 and drop_first_res:
x_in = x_in[:, :, 1:, :, :]
x = self.conv(x, causal=causal)
x = rearrange(
x,
"b (c p1 p2 p3) d h w -> b c (d p1) (h p2) (w p3)",
p1=self.stride[0],
p2=self.stride[1],
p3=self.stride[2],
)
if self.stride[0] == 2:
x = x[:, :, 1:, :, :]
if self.residual:
x = x + x_in
return x
drop_first_res = False
if y.shape[2] == 0:
y = None
cached = add_exchange_cache(y, cached, x_in, dim=2)
self.temporal_cache_state[tid] = (cached, drop_first_conv, drop_first_res)
else:
self.temporal_cache_state[tid] = (None, drop_first_conv, False)
return y
class LayerNorm(nn.Module):
def __init__(self, dim, eps, elementwise_affine=True) -> None:
@@ -807,6 +884,8 @@ class ResnetBlock3D(nn.Module):
torch.randn(4, in_channels) / in_channels**0.5
)
self.temporal_cache_state = {}
def _feed_spatial_noise(
self, hidden_states: torch.FloatTensor, per_channel_scale: torch.FloatTensor
) -> torch.FloatTensor:
@@ -880,9 +959,12 @@ class ResnetBlock3D(nn.Module):
input_tensor = self.conv_shortcut(input_tensor)
output_tensor = input_tensor + hidden_states
tid = threading.get_ident()
cached = self.temporal_cache_state.get(tid, None)
cached = add_exchange_cache(hidden_states, cached, input_tensor, dim=2)
self.temporal_cache_state[tid] = cached
return output_tensor
return hidden_states
def patchify(x, patch_size_hw, patch_size_t=1):

View File

@@ -13,10 +13,53 @@ from comfy.ldm.modules.attention import optimized_attention_masked
from comfy.ldm.flux.layers import EmbedND
from comfy.ldm.flux.math import apply_rope
import comfy.patcher_extension
import comfy.utils
def modulate(x, scale):
return x * (1 + scale.unsqueeze(1))
def invert_slices(slices, length):
sorted_slices = sorted(slices)
result = []
current = 0
for start, end in sorted_slices:
if current < start:
result.append((current, start))
current = max(current, end)
if current < length:
result.append((current, length))
return result
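For example, invert_slices returns the complement of the given spans over [0, length), which modulate and apply_gate below use to address the non-reference tokens:

print(invert_slices([(3, 5), (8, 10)], 12))   # [(0, 3), (5, 8), (10, 12)]
print(invert_slices([], 4))                   # [(0, 4)]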
def modulate(x, scale, timestep_zero_index=None):
if timestep_zero_index is None:
return x * (1 + scale.unsqueeze(1))
else:
scale = (1 + scale.unsqueeze(1))
actual_batch = scale.size(0) // 2
slices = timestep_zero_index
invert = invert_slices(timestep_zero_index, x.shape[1])
for s in slices:
x[:, s[0]:s[1]] *= scale[actual_batch:]
for s in invert:
x[:, s[0]:s[1]] *= scale[:actual_batch]
return x
def apply_gate(gate, x, timestep_zero_index=None):
if timestep_zero_index is None:
return gate * x
else:
actual_batch = gate.size(0) // 2
slices = timestep_zero_index
invert = invert_slices(timestep_zero_index, x.shape[1])
for s in slices:
x[:, s[0]:s[1]] *= gate[actual_batch:]
for s in invert:
x[:, s[0]:s[1]] *= gate[:actual_batch]
return x
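Both modulate and apply_gate route a doubled batch of modulation values: reference-token spans (the timestep-zero half) take the second half of the gate, everything else takes the first. A toy trace with hypothetical values:

import torch

x = torch.ones(1, 6, 1)                         # 6 tokens; the last 2 are reference tokens
gate = torch.tensor([2.0, 3.0]).view(2, 1, 1)   # doubled batch: [real timestep, timestep zero]
out = apply_gate(gate, x, timestep_zero_index=[(4, 6)])
print(out.flatten().tolist())                   # [2.0, 2.0, 2.0, 2.0, 3.0, 3.0]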
#############################################################################
# Core NextDiT Model #
@@ -258,6 +301,7 @@ class JointTransformerBlock(nn.Module):
x_mask: torch.Tensor,
freqs_cis: torch.Tensor,
adaln_input: Optional[torch.Tensor]=None,
timestep_zero_index=None,
transformer_options={},
):
"""
@@ -276,18 +320,18 @@ class JointTransformerBlock(nn.Module):
assert adaln_input is not None
scale_msa, gate_msa, scale_mlp, gate_mlp = self.adaLN_modulation(adaln_input).chunk(4, dim=1)
x = x + gate_msa.unsqueeze(1).tanh() * self.attention_norm2(
x = x + apply_gate(gate_msa.unsqueeze(1).tanh(), self.attention_norm2(
clamp_fp16(self.attention(
modulate(self.attention_norm1(x), scale_msa),
modulate(self.attention_norm1(x), scale_msa, timestep_zero_index=timestep_zero_index),
x_mask,
freqs_cis,
transformer_options=transformer_options,
))
))), timestep_zero_index=timestep_zero_index
)
x = x + gate_mlp.unsqueeze(1).tanh() * self.ffn_norm2(
x = x + apply_gate(gate_mlp.unsqueeze(1).tanh(), self.ffn_norm2(
clamp_fp16(self.feed_forward(
modulate(self.ffn_norm1(x), scale_mlp),
))
modulate(self.ffn_norm1(x), scale_mlp, timestep_zero_index=timestep_zero_index),
))), timestep_zero_index=timestep_zero_index
)
else:
assert adaln_input is None
@@ -345,13 +389,37 @@ class FinalLayer(nn.Module):
),
)
def forward(self, x, c):
def forward(self, x, c, timestep_zero_index=None):
scale = self.adaLN_modulation(c)
x = modulate(self.norm_final(x), scale)
x = modulate(self.norm_final(x), scale, timestep_zero_index=timestep_zero_index)
x = self.linear(x)
return x
def pad_zimage(feats, pad_token, pad_tokens_multiple):
pad_extra = (-feats.shape[1]) % pad_tokens_multiple
return torch.cat((feats, pad_token.to(device=feats.device, dtype=feats.dtype, copy=True).unsqueeze(0).repeat(feats.shape[0], pad_extra, 1)), dim=1), pad_extra
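The pad length is the usual (-n) % multiple trick, so an already-aligned sequence gets zero extra pad tokens:

for n in (30, 32, 33):
    print(n, (-n) % 32)   # 30 -> 2, 32 -> 0, 33 -> 31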
def pos_ids_x(start_t, H_tokens, W_tokens, batch_size, device, transformer_options={}):
rope_options = transformer_options.get("rope_options", None)
h_scale = 1.0
w_scale = 1.0
h_start = 0
w_start = 0
if rope_options is not None:
h_scale = rope_options.get("scale_y", 1.0)
w_scale = rope_options.get("scale_x", 1.0)
h_start = rope_options.get("shift_y", 0.0)
w_start = rope_options.get("shift_x", 0.0)
x_pos_ids = torch.zeros((batch_size, H_tokens * W_tokens, 3), dtype=torch.float32, device=device)
x_pos_ids[:, :, 0] = start_t
x_pos_ids[:, :, 1] = (torch.arange(H_tokens, dtype=torch.float32, device=device) * h_scale + h_start).view(-1, 1).repeat(1, W_tokens).flatten()
x_pos_ids[:, :, 2] = (torch.arange(W_tokens, dtype=torch.float32, device=device) * w_scale + w_start).view(1, -1).repeat(H_tokens, 1).flatten()
return x_pos_ids
class NextDiT(nn.Module):
"""
Diffusion model with a Transformer backbone.
@@ -378,10 +446,12 @@ class NextDiT(nn.Module):
time_scale=1.0,
pad_tokens_multiple=None,
clip_text_dim=None,
siglip_feat_dim=None,
image_model=None,
device=None,
dtype=None,
operations=None,
**kwargs,
) -> None:
super().__init__()
self.dtype = dtype
@@ -491,6 +561,41 @@ class NextDiT(nn.Module):
for layer_id in range(n_layers)
]
)
if siglip_feat_dim is not None:
self.siglip_embedder = nn.Sequential(
operation_settings.get("operations").RMSNorm(siglip_feat_dim, eps=norm_eps, elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")),
operation_settings.get("operations").Linear(
siglip_feat_dim,
dim,
bias=True,
device=operation_settings.get("device"),
dtype=operation_settings.get("dtype"),
),
)
self.siglip_refiner = nn.ModuleList(
[
JointTransformerBlock(
layer_id,
dim,
n_heads,
n_kv_heads,
multiple_of,
ffn_dim_multiplier,
norm_eps,
qk_norm,
modulation=False,
operation_settings=operation_settings,
)
for layer_id in range(n_refiner_layers)
]
)
self.siglip_pad_token = nn.Parameter(torch.empty((1, dim), device=device, dtype=dtype))
else:
self.siglip_embedder = None
self.siglip_refiner = None
self.siglip_pad_token = None
# This norm final is in the lumina 2.0 code but isn't actually used for anything.
# self.norm_final = operation_settings.get("operations").RMSNorm(dim, eps=norm_eps, elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.final_layer = FinalLayer(dim, patch_size, self.out_channels, z_image_modulation=z_image_modulation, operation_settings=operation_settings)
@@ -531,70 +636,168 @@ class NextDiT(nn.Module):
imgs = torch.stack(imgs, dim=0)
return imgs
def patchify_and_embed(
self, x: List[torch.Tensor] | torch.Tensor, cap_feats: torch.Tensor, cap_mask: torch.Tensor, t: torch.Tensor, num_tokens, transformer_options={}
) -> Tuple[torch.Tensor, torch.Tensor, List[Tuple[int, int]], List[int], torch.Tensor]:
bsz = len(x)
pH = pW = self.patch_size
device = x[0].device
orig_x = x
if self.pad_tokens_multiple is not None:
pad_extra = (-cap_feats.shape[1]) % self.pad_tokens_multiple
cap_feats = torch.cat((cap_feats, self.cap_pad_token.to(device=cap_feats.device, dtype=cap_feats.dtype, copy=True).unsqueeze(0).repeat(cap_feats.shape[0], pad_extra, 1)), dim=1)
def embed_cap(self, cap_feats=None, offset=0, bsz=1, device=None, dtype=None):
if cap_feats is not None:
cap_feats = self.cap_embedder(cap_feats)
cap_feats_len = cap_feats.shape[1]
if self.pad_tokens_multiple is not None:
cap_feats, _ = pad_zimage(cap_feats, self.cap_pad_token, self.pad_tokens_multiple)
else:
cap_feats_len = 0
cap_feats = self.cap_pad_token.to(device=device, dtype=dtype, copy=True).unsqueeze(0).repeat(bsz, self.pad_tokens_multiple, 1)
cap_pos_ids = torch.zeros(bsz, cap_feats.shape[1], 3, dtype=torch.float32, device=device)
cap_pos_ids[:, :, 0] = torch.arange(cap_feats.shape[1], dtype=torch.float32, device=device) + 1.0
cap_pos_ids[:, :, 0] = torch.arange(cap_feats.shape[1], dtype=torch.float32, device=device) + 1.0 + offset
embeds = (cap_feats,)
freqs_cis = (self.rope_embedder(cap_pos_ids).movedim(1, 2),)
return embeds, freqs_cis, cap_feats_len
def embed_all(self, x, cap_feats=None, siglip_feats=None, offset=0, omni=False, transformer_options={}):
bsz = 1
pH = pW = self.patch_size
device = x.device
embeds, freqs_cis, cap_feats_len = self.embed_cap(cap_feats, offset=offset, bsz=bsz, device=device, dtype=x.dtype)
if (not omni) or self.siglip_embedder is None:
cap_feats_len = embeds[0].shape[1] + offset
embeds += (None,)
freqs_cis += (None,)
else:
cap_feats_len += offset
if siglip_feats is not None:
b, h, w, c = siglip_feats.shape
siglip_feats = siglip_feats.permute(0, 3, 1, 2).reshape(b, h * w, c)
siglip_feats = self.siglip_embedder(siglip_feats)
siglip_pos_ids = torch.zeros((bsz, siglip_feats.shape[1], 3), dtype=torch.float32, device=device)
siglip_pos_ids[:, :, 0] = cap_feats_len + 2
siglip_pos_ids[:, :, 1] = (torch.linspace(0, h * 8 - 1, steps=h, dtype=torch.float32, device=device).floor()).view(-1, 1).repeat(1, w).flatten()
siglip_pos_ids[:, :, 2] = (torch.linspace(0, w * 8 - 1, steps=w, dtype=torch.float32, device=device).floor()).view(1, -1).repeat(h, 1).flatten()
if self.siglip_pad_token is not None:
siglip_feats, pad_extra = pad_zimage(siglip_feats, self.siglip_pad_token, self.pad_tokens_multiple) # TODO: double check
siglip_pos_ids = torch.nn.functional.pad(siglip_pos_ids, (0, 0, 0, pad_extra))
else:
if self.siglip_pad_token is not None:
siglip_feats = self.siglip_pad_token.to(device=device, dtype=x.dtype, copy=True).unsqueeze(0).repeat(bsz, self.pad_tokens_multiple, 1)
siglip_pos_ids = torch.zeros((bsz, siglip_feats.shape[1], 3), dtype=torch.float32, device=device)
if siglip_feats is None:
embeds += (None,)
freqs_cis += (None,)
else:
embeds += (siglip_feats,)
freqs_cis += (self.rope_embedder(siglip_pos_ids).movedim(1, 2),)
B, C, H, W = x.shape
x = self.x_embedder(x.view(B, C, H // pH, pH, W // pW, pW).permute(0, 2, 4, 3, 5, 1).flatten(3).flatten(1, 2))
rope_options = transformer_options.get("rope_options", None)
h_scale = 1.0
w_scale = 1.0
h_start = 0
w_start = 0
if rope_options is not None:
h_scale = rope_options.get("scale_y", 1.0)
w_scale = rope_options.get("scale_x", 1.0)
h_start = rope_options.get("shift_y", 0.0)
w_start = rope_options.get("shift_x", 0.0)
H_tokens, W_tokens = H // pH, W // pW
x_pos_ids = torch.zeros((bsz, x.shape[1], 3), dtype=torch.float32, device=device)
x_pos_ids[:, :, 0] = cap_feats.shape[1] + 1
x_pos_ids[:, :, 1] = (torch.arange(H_tokens, dtype=torch.float32, device=device) * h_scale + h_start).view(-1, 1).repeat(1, W_tokens).flatten()
x_pos_ids[:, :, 2] = (torch.arange(W_tokens, dtype=torch.float32, device=device) * w_scale + w_start).view(1, -1).repeat(H_tokens, 1).flatten()
x_pos_ids = pos_ids_x(cap_feats_len + 1, H // pH, W // pW, bsz, device, transformer_options=transformer_options)
if self.pad_tokens_multiple is not None:
pad_extra = (-x.shape[1]) % self.pad_tokens_multiple
x = torch.cat((x, self.x_pad_token.to(device=x.device, dtype=x.dtype, copy=True).unsqueeze(0).repeat(x.shape[0], pad_extra, 1)), dim=1)
x, pad_extra = pad_zimage(x, self.x_pad_token, self.pad_tokens_multiple)
x_pos_ids = torch.nn.functional.pad(x_pos_ids, (0, 0, 0, pad_extra))
freqs_cis = self.rope_embedder(torch.cat((cap_pos_ids, x_pos_ids), dim=1)).movedim(1, 2)
embeds += (x,)
freqs_cis += (self.rope_embedder(x_pos_ids).movedim(1, 2),)
return embeds, freqs_cis, cap_feats_len + len(freqs_cis) - 1
def patchify_and_embed(
self, x: torch.Tensor, cap_feats: torch.Tensor, cap_mask: torch.Tensor, t: torch.Tensor, num_tokens, ref_latents=[], ref_contexts=[], siglip_feats=[], transformer_options={}
) -> Tuple[torch.Tensor, torch.Tensor, List[Tuple[int, int]], List[int], torch.Tensor]:
bsz = x.shape[0]
cap_mask = None # TODO?
main_siglip = None
orig_x = x
embeds = ([], [], [])
freqs_cis = ([], [], [])
leftover_cap = []
start_t = 0
omni = len(ref_latents) > 0
if omni:
for i, ref in enumerate(ref_latents):
if i < len(ref_contexts):
ref_con = ref_contexts[i]
else:
ref_con = None
if i < len(siglip_feats):
sig_feat = siglip_feats[i]
else:
sig_feat = None
out = self.embed_all(ref, ref_con, sig_feat, offset=start_t, omni=omni, transformer_options=transformer_options)
for i, e in enumerate(out[0]):
if e is not None:
embeds[i].append(comfy.utils.repeat_to_batch_size(e, bsz))
freqs_cis[i].append(out[1][i])
start_t = out[2]
leftover_cap = ref_contexts[len(ref_latents):]
H, W = x.shape[-2], x.shape[-1]
img_sizes = [(H, W)] * bsz
out = self.embed_all(x, cap_feats, main_siglip, offset=start_t, omni=omni, transformer_options=transformer_options)
img_len = out[0][-1].shape[1]
cap_len = out[0][0].shape[1]
for i, e in enumerate(out[0]):
if e is not None:
e = comfy.utils.repeat_to_batch_size(e, bsz)
embeds[i].append(e)
freqs_cis[i].append(out[1][i])
start_t = out[2]
for cap in leftover_cap:
out = self.embed_cap(cap, offset=start_t, bsz=bsz, device=x.device, dtype=x.dtype)
cap_len += out[0][0].shape[1]
embeds[0].append(comfy.utils.repeat_to_batch_size(out[0][0], bsz))
freqs_cis[0].append(out[1][0])
start_t += out[2]
patches = transformer_options.get("patches", {})
# refine context
cap_feats = torch.cat(embeds[0], dim=1)
cap_freqs_cis = torch.cat(freqs_cis[0], dim=1)
for layer in self.context_refiner:
cap_feats = layer(cap_feats, cap_mask, freqs_cis[:, :cap_pos_ids.shape[1]], transformer_options=transformer_options)
cap_feats = layer(cap_feats, cap_mask, cap_freqs_cis, transformer_options=transformer_options)
feats = (cap_feats,)
fc = (cap_freqs_cis,)
if omni and len(embeds[1]) > 0:
siglip_mask = None
siglip_feats_combined = torch.cat(embeds[1], dim=1)
siglip_feats_freqs_cis = torch.cat(freqs_cis[1], dim=1)
if self.siglip_refiner is not None:
for layer in self.siglip_refiner:
siglip_feats_combined = layer(siglip_feats_combined, siglip_mask, siglip_feats_freqs_cis, transformer_options=transformer_options)
feats += (siglip_feats_combined,)
fc += (siglip_feats_freqs_cis,)
padded_img_mask = None
x = torch.cat(embeds[-1], dim=1)
fc_x = torch.cat(freqs_cis[-1], dim=1)
if omni:
timestep_zero_index = [(x.shape[1] - img_len, x.shape[1])]
else:
timestep_zero_index = None
x_input = x
for i, layer in enumerate(self.noise_refiner):
x = layer(x, padded_img_mask, freqs_cis[:, cap_pos_ids.shape[1]:], t, transformer_options=transformer_options)
x = layer(x, padded_img_mask, fc_x, t, timestep_zero_index=timestep_zero_index, transformer_options=transformer_options)
if "noise_refiner" in patches:
for p in patches["noise_refiner"]:
out = p({"img": x, "img_input": x_input, "txt": cap_feats, "pe": freqs_cis[:, cap_pos_ids.shape[1]:], "vec": t, "x": orig_x, "block_index": i, "transformer_options": transformer_options, "block_type": "noise_refiner"})
out = p({"img": x, "img_input": x_input, "txt": cap_feats, "pe": fc_x, "vec": t, "x": orig_x, "block_index": i, "transformer_options": transformer_options, "block_type": "noise_refiner"})
if "img" in out:
x = out["img"]
padded_full_embed = torch.cat((cap_feats, x), dim=1)
padded_full_embed = torch.cat(feats + (x,), dim=1)
if timestep_zero_index is not None:
ind = padded_full_embed.shape[1] - x.shape[1]
timestep_zero_index = [(ind + x.shape[1] - img_len, ind + x.shape[1])]
timestep_zero_index.append((feats[0].shape[1] - cap_len, feats[0].shape[1]))
mask = None
img_sizes = [(H, W)] * bsz
l_effective_cap_len = [cap_feats.shape[1]] * bsz
return padded_full_embed, mask, img_sizes, l_effective_cap_len, freqs_cis
l_effective_cap_len = [padded_full_embed.shape[1] - img_len] * bsz
return padded_full_embed, mask, img_sizes, l_effective_cap_len, torch.cat(fc + (fc_x,), dim=1), timestep_zero_index
def forward(self, x, timesteps, context, num_tokens, attention_mask=None, **kwargs):
return comfy.patcher_extension.WrapperExecutor.new_class_executor(
@@ -604,7 +807,11 @@ class NextDiT(nn.Module):
).execute(x, timesteps, context, num_tokens, attention_mask, **kwargs)
# def forward(self, x, t, cap_feats, cap_mask):
def _forward(self, x, timesteps, context, num_tokens, attention_mask=None, transformer_options={}, **kwargs):
def _forward(self, x, timesteps, context, num_tokens, attention_mask=None, ref_latents=[], ref_contexts=[], siglip_feats=[], transformer_options={}, **kwargs):
omni = len(ref_latents) > 0
if omni:
timesteps = torch.cat([timesteps * 0, timesteps], dim=0)
t = 1.0 - timesteps
cap_feats = context
cap_mask = attention_mask
@@ -619,8 +826,6 @@ class NextDiT(nn.Module):
t = self.t_embedder(t * self.time_scale, dtype=x.dtype) # (N, D)
adaln_input = t
cap_feats = self.cap_embedder(cap_feats) # (N, L, D) # todo check if able to batchify w.o. redundant compute
if self.clip_text_pooled_proj is not None:
pooled = kwargs.get("clip_text_pooled", None)
if pooled is not None:
@@ -632,7 +837,7 @@ class NextDiT(nn.Module):
patches = transformer_options.get("patches", {})
x_is_tensor = isinstance(x, torch.Tensor)
img, mask, img_size, cap_size, freqs_cis = self.patchify_and_embed(x, cap_feats, cap_mask, adaln_input, num_tokens, transformer_options=transformer_options)
img, mask, img_size, cap_size, freqs_cis, timestep_zero_index = self.patchify_and_embed(x, cap_feats, cap_mask, adaln_input, num_tokens, ref_latents=ref_latents, ref_contexts=ref_contexts, siglip_feats=siglip_feats, transformer_options=transformer_options)
freqs_cis = freqs_cis.to(img.device)
transformer_options["total_blocks"] = len(self.layers)
@@ -640,7 +845,7 @@ class NextDiT(nn.Module):
img_input = img
for i, layer in enumerate(self.layers):
transformer_options["block_index"] = i
img = layer(img, mask, freqs_cis, adaln_input, transformer_options=transformer_options)
img = layer(img, mask, freqs_cis, adaln_input, timestep_zero_index=timestep_zero_index, transformer_options=transformer_options)
if "double_block" in patches:
for p in patches["double_block"]:
out = p({"img": img[:, cap_size[0]:], "img_input": img_input[:, cap_size[0]:], "txt": img[:, :cap_size[0]], "pe": freqs_cis[:, cap_size[0]:], "vec": adaln_input, "x": x, "block_index": i, "transformer_options": transformer_options})
@@ -649,8 +854,7 @@ class NextDiT(nn.Module):
if "txt" in out:
img[:, :cap_size[0]] = out["txt"]
img = self.final_layer(img, adaln_input)
img = self.final_layer(img, adaln_input, timestep_zero_index=timestep_zero_index)
img = self.unpatchify(img, img_size, cap_size, return_tensor=x_is_tensor)[:, :, :h, :w]
return -img

View File

@@ -14,10 +14,13 @@ if model_management.xformers_enabled_vae():
import xformers.ops
def torch_cat_if_needed(xl, dim):
xl = [x for x in xl if x is not None and x.shape[dim] > 0]
if len(xl) > 1:
return torch.cat(xl, dim)
else:
elif len(xl) == 1:
return xl[0]
else:
return None
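This lets callers concatenate optional streams without special-casing; a quick check of the branches, with None entries and zero-length tensors both filtered out:

import torch

a, b = torch.ones(1, 1, 2), torch.ones(1, 1, 0)
print(torch_cat_if_needed([None, a, b], dim=2).shape)   # torch.Size([1, 1, 2])
print(torch_cat_if_needed([None, b], dim=2))            # None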
def get_timestep_embedding(timesteps, embedding_dim):
"""

View File

@@ -170,8 +170,14 @@ class Attention(nn.Module):
joint_query = apply_rope1(joint_query, image_rotary_emb)
joint_key = apply_rope1(joint_key, image_rotary_emb)
if encoder_hidden_states_mask is not None:
attn_mask = torch.zeros((batch_size, 1, seq_txt + seq_img), dtype=hidden_states.dtype, device=hidden_states.device)
attn_mask[:, 0, :seq_txt] = encoder_hidden_states_mask
else:
attn_mask = None
joint_hidden_states = optimized_attention_masked(joint_query, joint_key, joint_value, self.heads,
attention_mask, transformer_options=transformer_options,
attn_mask, transformer_options=transformer_options,
skip_reshape=True)
txt_attn_output = joint_hidden_states[:, :seq_txt, :]
@@ -430,6 +436,9 @@ class QwenImageTransformer2DModel(nn.Module):
encoder_hidden_states = context
encoder_hidden_states_mask = attention_mask
if encoder_hidden_states_mask is not None and not torch.is_floating_point(encoder_hidden_states_mask):
encoder_hidden_states_mask = (encoder_hidden_states_mask - 1).to(x.dtype) * torch.finfo(x.dtype).max
hidden_states, img_ids, orig_shape = self.process_img(x)
num_embeds = hidden_states.shape[1]
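The boolean-to-additive conversion above maps kept positions (1) to 0 and masked positions (0) to -finfo.max, which the attention then adds to the logits. Restating the arithmetic on a toy mask:

import torch

mask = torch.tensor([1, 1, 0])
additive = (mask - 1).to(torch.float16) * torch.finfo(torch.float16).max
print(additive.tolist())   # [0.0, 0.0, -65504.0]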

View File

@@ -62,6 +62,8 @@ class WanSelfAttention(nn.Module):
x(Tensor): Shape [B, L, num_heads, C / num_heads]
freqs(Tensor): Rope freqs, shape [1024, C / num_heads / 2]
"""
patches = transformer_options.get("patches", {})
b, s, n, d = *x.shape[:2], self.num_heads, self.head_dim
def qkv_fn_q(x):
@@ -86,6 +88,10 @@ class WanSelfAttention(nn.Module):
transformer_options=transformer_options,
)
if "attn1_patch" in patches:
for p in patches["attn1_patch"]:
x = p({"x": x, "q": q, "k": k, "transformer_options": transformer_options})
x = self.o(x)
return x
@@ -225,6 +231,8 @@ class WanAttentionBlock(nn.Module):
"""
# assert e.dtype == torch.float32
patches = transformer_options.get("patches", {})
if e.ndim < 4:
e = (comfy.model_management.cast_to(self.modulation, dtype=x.dtype, device=x.device) + e).chunk(6, dim=1)
else:
@@ -242,6 +250,11 @@ class WanAttentionBlock(nn.Module):
# cross-attention & ffn
x = x + self.cross_attn(self.norm3(x), context, context_img_len=context_img_len, transformer_options=transformer_options)
if "attn2_patch" in patches:
for p in patches["attn2_patch"]:
x = p({"x": x, "transformer_options": transformer_options})
y = self.ffn(torch.addcmul(repeat_e(e[3], x), self.norm2(x), 1 + repeat_e(e[4], x)))
x = torch.addcmul(x, y, repeat_e(e[5], x))
return x
@@ -488,7 +501,7 @@ class WanModel(torch.nn.Module):
self.blocks = nn.ModuleList([
wan_attn_block_class(cross_attn_type, dim, ffn_dim, num_heads,
window_size, qk_norm, cross_attn_norm, eps, operation_settings=operation_settings)
for _ in range(num_layers)
for i in range(num_layers)
])
# head
@@ -541,6 +554,7 @@ class WanModel(torch.nn.Module):
# embeddings
x = self.patch_embedding(x.float()).to(x.dtype)
grid_sizes = x.shape[2:]
transformer_options["grid_sizes"] = grid_sizes
x = x.flatten(2).transpose(1, 2)
# time embeddings
@@ -738,6 +752,7 @@ class VaceWanModel(WanModel):
# embeddings
x = self.patch_embedding(x.float()).to(x.dtype)
grid_sizes = x.shape[2:]
transformer_options["grid_sizes"] = grid_sizes
x = x.flatten(2).transpose(1, 2)
# time embeddings

View File

@@ -0,0 +1,500 @@
import torch
from einops import rearrange, repeat
import comfy
from comfy.ldm.modules.attention import optimized_attention
def calculate_x_ref_attn_map(visual_q, ref_k, ref_target_masks, split_num=8):
scale = 1.0 / visual_q.shape[-1] ** 0.5
visual_q = visual_q.transpose(1, 2) * scale
B, H, x_seqlens, K = visual_q.shape
x_ref_attn_maps = []
for class_idx, ref_target_mask in enumerate(ref_target_masks):
ref_target_mask = ref_target_mask.view(1, 1, 1, -1)
x_ref_attnmap = torch.zeros(B, H, x_seqlens, device=visual_q.device, dtype=visual_q.dtype)
chunk_size = min(max(x_seqlens // split_num, 1), x_seqlens)
for i in range(0, x_seqlens, chunk_size):
end_i = min(i + chunk_size, x_seqlens)
attn_chunk = visual_q[:, :, i:end_i] @ ref_k.permute(0, 2, 3, 1) # B, H, chunk, ref_seqlens
# Apply softmax
attn_max = attn_chunk.max(dim=-1, keepdim=True).values
attn_chunk = (attn_chunk - attn_max).exp()
attn_sum = attn_chunk.sum(dim=-1, keepdim=True)
attn_chunk = attn_chunk / (attn_sum + 1e-8)
# Apply mask and sum
masked_attn = attn_chunk * ref_target_mask
x_ref_attnmap[:, :, i:end_i] = masked_attn.sum(-1) / (ref_target_mask.sum() + 1e-8)
del attn_chunk, masked_attn
# Average across heads
x_ref_attnmap = x_ref_attnmap.mean(dim=1) # B, x_seqlens
x_ref_attn_maps.append(x_ref_attnmap)
del visual_q, ref_k
return torch.cat(x_ref_attn_maps, dim=0)
def get_attn_map_with_target(visual_q, ref_k, shape, ref_target_masks=None, split_num=2):
"""Args:
query (torch.tensor): B M H K
key (torch.tensor): B M H K
shape (tuple): (N_t, N_h, N_w)
ref_target_masks: [B, N_h * N_w]
"""
N_t, N_h, N_w = shape
x_seqlens = N_h * N_w
ref_k = ref_k[:, :x_seqlens]
_, seq_lens, heads, _ = visual_q.shape
class_num, _ = ref_target_masks.shape
x_ref_attn_maps = torch.zeros(class_num, seq_lens).to(visual_q)
split_chunk = heads // split_num
for i in range(split_num):
x_ref_attn_maps_perhead = calculate_x_ref_attn_map(
visual_q[:, :, i*split_chunk:(i+1)*split_chunk, :],
ref_k[:, :, i*split_chunk:(i+1)*split_chunk, :],
ref_target_masks
)
x_ref_attn_maps += x_ref_attn_maps_perhead
return x_ref_attn_maps / split_num
def normalize_and_scale(column, source_range, target_range, epsilon=1e-8):
source_min, source_max = source_range
new_min, new_max = target_range
normalized = (column - source_min) / (source_max - source_min + epsilon)
scaled = normalized * (new_max - new_min) + new_min
return scaled
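For instance, mapping raw attention strengths from their observed range into speaker 1's (0, class_interval) rotary band:

import torch

col = torch.tensor([0.1, 0.3, 0.5])
print(normalize_and_scale(col, (0.1, 0.5), (0, 4)).tolist())   # ~[0.0, 2.0, 4.0]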
def rotate_half(x):
x = rearrange(x, "... (d r) -> ... d r", r=2)
x1, x2 = x.unbind(dim=-1)
x = torch.stack((-x2, x1), dim=-1)
return rearrange(x, "... d r -> ... (d r)")
def get_audio_embeds(encoded_audio, audio_start, audio_end):
audio_embs = []
human_num = len(encoded_audio)
audio_frames = encoded_audio[0].shape[0]
indices = (torch.arange(4 + 1) - 2) * 1
for human_idx in range(human_num):
if audio_end > audio_frames: # not enough audio for the current window; pad with the first audio frame, which is most likely silence
pad_len = audio_end - audio_frames
pad_shape = list(encoded_audio[human_idx].shape)
pad_shape[0] = pad_len
pad_tensor = encoded_audio[human_idx][:1].repeat(pad_len, *([1] * (encoded_audio[human_idx].dim() - 1)))
encoded_audio_in = torch.cat([encoded_audio[human_idx], pad_tensor], dim=0)
else:
encoded_audio_in = encoded_audio[human_idx]
center_indices = torch.arange(audio_start, audio_end, 1).unsqueeze(1) + indices.unsqueeze(0)
center_indices = torch.clamp(center_indices, min=0, max=encoded_audio_in.shape[0] - 1)
audio_emb = encoded_audio_in[center_indices].unsqueeze(0)
audio_embs.append(audio_emb)
return torch.cat(audio_embs, dim=0)
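The window construction pairs each frame with its two neighbours on either side, clamping at clip boundaries so edge frames repeat. Restated on a hypothetical 10-frame clip:

import torch

indices = (torch.arange(4 + 1) - 2) * 1                       # [-2, -1, 0, 1, 2]
centers = torch.arange(0, 3).unsqueeze(1) + indices.unsqueeze(0)
print(torch.clamp(centers, min=0, max=9)[0].tolist())         # [0, 0, 0, 1, 2]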
def project_audio_features(audio_proj, encoded_audio, audio_start, audio_end):
audio_embs = get_audio_embeds(encoded_audio, audio_start, audio_end)
first_frame_audio_emb_s = audio_embs[:, :1, ...]
latter_frame_audio_emb = audio_embs[:, 1:, ...]
latter_frame_audio_emb = rearrange(latter_frame_audio_emb, "b (n_t n) w s c -> b n_t n w s c", n=4)
middle_index = audio_proj.seq_len // 2
latter_first_frame_audio_emb = latter_frame_audio_emb[:, :, :1, :middle_index+1, ...]
latter_first_frame_audio_emb = rearrange(latter_first_frame_audio_emb, "b n_t n w s c -> b n_t (n w) s c")
latter_last_frame_audio_emb = latter_frame_audio_emb[:, :, -1:, middle_index:, ...]
latter_last_frame_audio_emb = rearrange(latter_last_frame_audio_emb, "b n_t n w s c -> b n_t (n w) s c")
latter_middle_frame_audio_emb = latter_frame_audio_emb[:, :, 1:-1, middle_index:middle_index+1, ...]
latter_middle_frame_audio_emb = rearrange(latter_middle_frame_audio_emb, "b n_t n w s c -> b n_t (n w) s c")
latter_frame_audio_emb_s = torch.cat([latter_first_frame_audio_emb, latter_middle_frame_audio_emb, latter_last_frame_audio_emb], dim=2)
audio_emb = audio_proj(first_frame_audio_emb_s, latter_frame_audio_emb_s)
audio_emb = torch.cat(audio_emb.split(1), dim=2)
return audio_emb
class RotaryPositionalEmbedding1D(torch.nn.Module):
def __init__(self,
head_dim,
):
super().__init__()
self.head_dim = head_dim
self.base = 10000
def precompute_freqs_cis_1d(self, pos_indices):
freqs = 1.0 / (self.base ** (torch.arange(0, self.head_dim, 2)[: (self.head_dim // 2)].float() / self.head_dim))
freqs = freqs.to(pos_indices.device)
freqs = torch.einsum("..., f -> ... f", pos_indices.float(), freqs)
freqs = repeat(freqs, "... n -> ... (n r)", r=2)
return freqs
def forward(self, x, pos_indices):
freqs_cis = self.precompute_freqs_cis_1d(pos_indices)
x_ = x.float()
freqs_cis = freqs_cis.float().to(x.device)
cos, sin = freqs_cis.cos(), freqs_cis.sin()
cos, sin = rearrange(cos, 'n d -> 1 1 n d'), rearrange(sin, 'n d -> 1 1 n d')
x_ = (x_ * cos) + (rotate_half(x_) * sin)
return x_.type_as(x)
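A minimal usage sketch: positions are arbitrary floats, so the fractional speaker-bucket positions computed below can be fed straight in (head_dim must be even):

import torch

rope = RotaryPositionalEmbedding1D(head_dim=8)
x = torch.randn(1, 2, 5, 8)                        # (batch, heads, tokens, head_dim)
pos = torch.tensor([0.0, 2.0, 12.0, 12.0, 22.0])   # e.g. per-token speaker buckets
print(rope(x, pos).shape)                          # torch.Size([1, 2, 5, 8])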
class SingleStreamAttention(torch.nn.Module):
def __init__(
self,
dim: int,
encoder_hidden_states_dim: int,
num_heads: int,
qkv_bias: bool,
device=None, dtype=None, operations=None
) -> None:
super().__init__()
self.dim = dim
self.encoder_hidden_states_dim = encoder_hidden_states_dim
self.num_heads = num_heads
self.head_dim = dim // num_heads
self.q_linear = operations.Linear(dim, dim, bias=qkv_bias, device=device, dtype=dtype)
self.proj = operations.Linear(dim, dim, device=device, dtype=dtype)
self.kv_linear = operations.Linear(encoder_hidden_states_dim, dim * 2, bias=qkv_bias, device=device, dtype=dtype)
def forward(self, x: torch.Tensor, encoder_hidden_states: torch.Tensor, shape=None) -> torch.Tensor:
N_t, N_h, N_w = shape
expected_tokens = N_t * N_h * N_w
actual_tokens = x.shape[1]
x_extra = None
if actual_tokens != expected_tokens:
x_extra = x[:, -N_h * N_w:, :]
x = x[:, :-N_h * N_w, :]
N_t = N_t - 1
B = x.shape[0]
S = N_h * N_w
x = x.view(B * N_t, S, self.dim)
# get q for hidden_state
q = self.q_linear(x).view(B * N_t, S, self.num_heads, self.head_dim)
# get kv from encoder_hidden_states # shape: (B, N, num_heads, head_dim)
kv = self.kv_linear(encoder_hidden_states)
encoder_k, encoder_v = kv.view(B * N_t, encoder_hidden_states.shape[1], 2, self.num_heads, self.head_dim).unbind(2)
#print("q.shape", q.shape) #torch.Size([21, 1024, 40, 128])
x = optimized_attention(
q.transpose(1, 2),
encoder_k.transpose(1, 2),
encoder_v.transpose(1, 2),
heads=self.num_heads, skip_reshape=True, skip_output_reshape=True).transpose(1, 2)
# linear transform
x = self.proj(x.reshape(B * N_t, S, self.dim))
x = x.view(B, N_t * S, self.dim)
if x_extra is not None:
x = torch.cat([x, torch.zeros_like(x_extra)], dim=1)
return x
class SingleStreamMultiAttention(SingleStreamAttention):
def __init__(
self,
dim: int,
encoder_hidden_states_dim: int,
num_heads: int,
qkv_bias: bool,
class_range: int = 24,
class_interval: int = 4,
device=None, dtype=None, operations=None
) -> None:
super().__init__(
dim=dim,
encoder_hidden_states_dim=encoder_hidden_states_dim,
num_heads=num_heads,
qkv_bias=qkv_bias,
device=device,
dtype=dtype,
operations=operations
)
# Rotary-embedding layout parameters
self.class_interval = class_interval
self.class_range = class_range
self.max_humans = self.class_range // self.class_interval
# Constant bucket used for background tokens
self.rope_bak = int(self.class_range // 2)
self.rope_1d = RotaryPositionalEmbedding1D(self.head_dim)
def forward(
self,
x: torch.Tensor,
encoder_hidden_states: torch.Tensor,
shape=None,
x_ref_attn_map=None
) -> torch.Tensor:
encoder_hidden_states = encoder_hidden_states.squeeze(0).to(x.device)
human_num = x_ref_attn_map.shape[0] if x_ref_attn_map is not None else 1
# Single-speaker fall-through
if human_num <= 1:
return super().forward(x, encoder_hidden_states, shape)
N_t, N_h, N_w = shape
x_extra = None
if x.shape[0] * N_t != encoder_hidden_states.shape[0]:
x_extra = x[:, -N_h * N_w:, :]
x = x[:, :-N_h * N_w, :]
N_t = N_t - 1
x = rearrange(x, "B (N_t S) C -> (B N_t) S C", N_t=N_t)
# Query projection
B, N, C = x.shape
q = self.q_linear(x)
q = q.view(B, N, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
# Use `class_range` logic for 2 speakers
rope_h1 = (0, self.class_interval)
rope_h2 = (self.class_range - self.class_interval, self.class_range)
rope_bak = int(self.class_range // 2)
# Normalize and scale attention maps for each speaker
max_values = x_ref_attn_map.max(1).values[:, None, None]
min_values = x_ref_attn_map.min(1).values[:, None, None]
max_min_values = torch.cat([max_values, min_values], dim=2)
human1_max_value, human1_min_value = max_min_values[0, :, 0].max(), max_min_values[0, :, 1].min()
human2_max_value, human2_min_value = max_min_values[1, :, 0].max(), max_min_values[1, :, 1].min()
human1 = normalize_and_scale(x_ref_attn_map[0], (human1_min_value, human1_max_value), rope_h1)
human2 = normalize_and_scale(x_ref_attn_map[1], (human2_min_value, human2_max_value), rope_h2)
back = torch.full((x_ref_attn_map.size(1),), rope_bak, dtype=human1.dtype, device=human1.device)
# Token-wise speaker dominance
max_indices = x_ref_attn_map.argmax(dim=0)
normalized_map = torch.stack([human1, human2, back], dim=1)
normalized_pos = normalized_map[torch.arange(x_ref_attn_map.size(1)), max_indices]
# Apply rotary to Q
q = rearrange(q, "(B N_t) H S C -> B H (N_t S) C", N_t=N_t)
q = self.rope_1d(q, normalized_pos)
q = rearrange(q, "B H (N_t S) C -> (B N_t) H S C", N_t=N_t)
# Keys / Values
_, N_a, _ = encoder_hidden_states.shape
encoder_kv = self.kv_linear(encoder_hidden_states)
encoder_kv = encoder_kv.view(B, N_a, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
encoder_k, encoder_v = encoder_kv.unbind(0)
# Rotary for keys: assign the centre of each speaker's bucket to its context tokens
per_frame = torch.zeros(N_a, dtype=encoder_k.dtype, device=encoder_k.device)
per_frame[: per_frame.size(0) // 2] = (rope_h1[0] + rope_h1[1]) / 2
per_frame[per_frame.size(0) // 2 :] = (rope_h2[0] + rope_h2[1]) / 2
encoder_pos = torch.cat([per_frame] * N_t, dim=0)
encoder_k = rearrange(encoder_k, "(B N_t) H S C -> B H (N_t S) C", N_t=N_t)
encoder_k = self.rope_1d(encoder_k, encoder_pos)
encoder_k = rearrange(encoder_k, "B H (N_t S) C -> (B N_t) H S C", N_t=N_t)
# Final attention
q = rearrange(q, "B H M K -> B M H K")
encoder_k = rearrange(encoder_k, "B H M K -> B M H K")
encoder_v = rearrange(encoder_v, "B H M K -> B M H K")
x = optimized_attention(
q.transpose(1, 2),
encoder_k.transpose(1, 2),
encoder_v.transpose(1, 2),
heads=self.num_heads, skip_reshape=True, skip_output_reshape=True).transpose(1, 2)
# Linear projection
x = x.reshape(B, N, C)
x = self.proj(x)
# Restore original layout
x = rearrange(x, "(B N_t) S C -> B (N_t S) C", N_t=N_t)
if x_extra is not None:
x = torch.cat([x, torch.zeros_like(x_extra)], dim=1)
return x
class MultiTalkAudioProjModel(torch.nn.Module):
def __init__(
self,
seq_len: int = 5,
seq_len_vf: int = 12,
blocks: int = 12,
channels: int = 768,
intermediate_dim: int = 512,
out_dim: int = 768,
context_tokens: int = 32,
device=None, dtype=None, operations=None
):
super().__init__()
self.seq_len = seq_len
self.blocks = blocks
self.channels = channels
self.input_dim = seq_len * blocks * channels
self.input_dim_vf = seq_len_vf * blocks * channels
self.intermediate_dim = intermediate_dim
self.context_tokens = context_tokens
self.out_dim = out_dim
# define multiple linear layers
self.proj1 = operations.Linear(self.input_dim, intermediate_dim, device=device, dtype=dtype)
self.proj1_vf = operations.Linear(self.input_dim_vf, intermediate_dim, device=device, dtype=dtype)
self.proj2 = operations.Linear(intermediate_dim, intermediate_dim, device=device, dtype=dtype)
self.proj3 = operations.Linear(intermediate_dim, context_tokens * out_dim, device=device, dtype=dtype)
self.norm = operations.LayerNorm(out_dim, device=device, dtype=dtype)
def forward(self, audio_embeds, audio_embeds_vf):
video_length = audio_embeds.shape[1] + audio_embeds_vf.shape[1]
B, _, _, S, C = audio_embeds.shape
# process audio of first frame
audio_embeds = rearrange(audio_embeds, "bz f w b c -> (bz f) w b c")
batch_size, window_size, blocks, channels = audio_embeds.shape
audio_embeds = audio_embeds.view(batch_size, window_size * blocks * channels)
# process audio of latter frame
audio_embeds_vf = rearrange(audio_embeds_vf, "bz f w b c -> (bz f) w b c")
batch_size_vf, window_size_vf, blocks_vf, channels_vf = audio_embeds_vf.shape
audio_embeds_vf = audio_embeds_vf.view(batch_size_vf, window_size_vf * blocks_vf * channels_vf)
# first projection
audio_embeds = torch.relu(self.proj1(audio_embeds))
audio_embeds_vf = torch.relu(self.proj1_vf(audio_embeds_vf))
audio_embeds = rearrange(audio_embeds, "(bz f) c -> bz f c", bz=B)
audio_embeds_vf = rearrange(audio_embeds_vf, "(bz f) c -> bz f c", bz=B)
audio_embeds_c = torch.concat([audio_embeds, audio_embeds_vf], dim=1)
batch_size_c, N_t, C_a = audio_embeds_c.shape
audio_embeds_c = audio_embeds_c.view(batch_size_c*N_t, C_a)
# second projection
audio_embeds_c = torch.relu(self.proj2(audio_embeds_c))
context_tokens = self.proj3(audio_embeds_c).reshape(batch_size_c*N_t, self.context_tokens, self.out_dim)
# normalization and reshape
context_tokens = self.norm(context_tokens)
context_tokens = rearrange(context_tokens, "(bz f) m c -> bz f m c", f=video_length)
return context_tokens
class WanMultiTalkAttentionBlock(torch.nn.Module):
def __init__(self, in_dim=5120, out_dim=768, device=None, dtype=None, operations=None):
super().__init__()
self.audio_cross_attn = SingleStreamMultiAttention(in_dim, out_dim, num_heads=40, qkv_bias=True, device=device, dtype=dtype, operations=operations)
self.norm_x = operations.LayerNorm(in_dim, device=device, dtype=dtype, elementwise_affine=True)
class MultiTalkGetAttnMapPatch:
def __init__(self, ref_target_masks=None):
self.ref_target_masks = ref_target_masks
def __call__(self, kwargs):
transformer_options = kwargs.get("transformer_options", {})
x = kwargs["x"]
if self.ref_target_masks is not None:
x_ref_attn_map = get_attn_map_with_target(kwargs["q"], kwargs["k"], transformer_options["grid_sizes"], ref_target_masks=self.ref_target_masks.to(x.device))
transformer_options["x_ref_attn_map"] = x_ref_attn_map
return x
class MultiTalkCrossAttnPatch:
def __init__(self, model_patch, audio_scale=1.0, ref_target_masks=None):
self.model_patch = model_patch
self.audio_scale = audio_scale
self.ref_target_masks = ref_target_masks
def __call__(self, kwargs):
transformer_options = kwargs.get("transformer_options", {})
block_idx = transformer_options.get("block_index", None)
x = kwargs["x"]
if block_idx is None:
return torch.zeros_like(x)
audio_embeds = transformer_options.get("audio_embeds")
x_ref_attn_map = transformer_options.pop("x_ref_attn_map", None)
norm_x = self.model_patch.model.blocks[block_idx].norm_x(x)
x_audio = self.model_patch.model.blocks[block_idx].audio_cross_attn(
norm_x, audio_embeds.to(x.dtype),
shape=transformer_options["grid_sizes"],
x_ref_attn_map=x_ref_attn_map
)
x = x + x_audio * self.audio_scale
return x
def models(self):
return [self.model_patch]
class MultiTalkApplyModelWrapper:
def __init__(self, init_latents):
self.init_latents = init_latents
def __call__(self, executor, x, *args, **kwargs):
x[:, :, :self.init_latents.shape[2]] = self.init_latents.to(x)
samples = executor(x, *args, **kwargs)
return samples
class InfiniteTalkOuterSampleWrapper:
def __init__(self, motion_frames_latent, model_patch, is_extend=False):
self.motion_frames_latent = motion_frames_latent
self.model_patch = model_patch
self.is_extend = is_extend
def __call__(self, executor, *args, **kwargs):
model_patcher = executor.class_obj.model_patcher
model_options = executor.class_obj.model_options
process_latent_in = model_patcher.model.process_latent_in
# For InfiniteTalk, the first model-input latent(s) always need to be replaced on every step
if self.motion_frames_latent is not None:
wrappers = model_options["transformer_options"]["wrappers"]
w = wrappers.setdefault(comfy.patcher_extension.WrappersMP.APPLY_MODEL, {})
w["MultiTalk_apply_model"] = [MultiTalkApplyModelWrapper(process_latent_in(self.motion_frames_latent))]
# run the sampling process
result = executor(*args, **kwargs)
# insert motion frames before decoding
if self.is_extend:
overlap = self.motion_frames_latent.shape[2]
result = torch.cat([self.motion_frames_latent.to(result), result[:, :, overlap:]], dim=2)
return result
def to(self, device_or_dtype):
if isinstance(device_or_dtype, torch.device):
if self.motion_frames_latent is not None:
self.motion_frames_latent = self.motion_frames_latent.to(device_or_dtype)
return self

View File

@@ -5,7 +5,7 @@ import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from comfy.ldm.modules.diffusionmodules.model import vae_attention
from comfy.ldm.modules.diffusionmodules.model import vae_attention, torch_cat_if_needed
import comfy.ops
ops = comfy.ops.disable_weight_init
@@ -20,22 +20,29 @@ class CausalConv3d(ops.Conv3d):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._padding = (self.padding[2], self.padding[2], self.padding[1],
self.padding[1], 2 * self.padding[0], 0)
self.padding = (0, 0, 0)
self._padding = 2 * self.padding[0]
self.padding = (0, self.padding[1], self.padding[2])
def forward(self, x, cache_x=None, cache_list=None, cache_idx=None):
if cache_list is not None:
cache_x = cache_list[cache_idx]
cache_list[cache_idx] = None
padding = list(self._padding)
if cache_x is not None and self._padding[4] > 0:
cache_x = cache_x.to(x.device)
x = torch.cat([cache_x, x], dim=2)
padding[4] -= cache_x.shape[2]
if cache_x is None and x.shape[2] == 1:
# Fast path: the op emulates the causal zero padding by truncating the
# weight in time, saving the math on a pile of zeros.
return super().forward(x, autopad="causal_zero")
if self._padding > 0:
padding_needed = self._padding
if cache_x is not None:
cache_x = cache_x.to(x.device)
padding_needed = max(0, padding_needed - cache_x.shape[2])
padding_shape = list(x.shape)
padding_shape[2] = padding_needed
padding = torch.zeros(padding_shape, device=x.device, dtype=x.dtype)
x = torch_cat_if_needed([padding, cache_x, x], dim=2)
del cache_x
x = F.pad(x, padding)
return super().forward(x)
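The causal_zero fast path leans on a convolution identity: left-padding the input with k-1 zero frames is the same as dropping the corresponding leading weight taps, since those taps only ever multiply zeros. A sketch of that identity for a single-frame input (shapes hypothetical):

import torch
import torch.nn.functional as F

w = torch.randn(4, 4, 3, 3, 3)                  # (out, in, kT, kH, kW)
x = torch.randn(1, 4, 1, 8, 8)                  # a single temporal frame

full = F.conv3d(F.pad(x, (1, 1, 1, 1, 2, 0)), w)              # explicit causal zero pad
trunc = F.conv3d(F.pad(x, (1, 1, 1, 1)), w[:, :, -1:, :, :])  # truncated-weight fast path
print(torch.allclose(full, trunc, atol=1e-5))                 # True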
@@ -472,10 +479,12 @@ class WanVAE(nn.Module):
def encode(self, x):
conv_idx = [0]
feat_map = [None] * count_conv3d(self.decoder)
## cache
t = x.shape[2]
iter_ = 1 + (t - 1) // 4
feat_map = None
if iter_ > 1:
feat_map = [None] * count_conv3d(self.decoder)
## Split the input x to encode into temporal chunks of 1, 4, 4, 4, ... frames
for i in range(iter_):
conv_idx = [0]
@@ -495,10 +504,11 @@ class WanVAE(nn.Module):
def decode(self, z):
conv_idx = [0]
feat_map = [None] * count_conv3d(self.decoder)
# z: [b,c,t,h,w]
iter_ = z.shape[2]
feat_map = None
if iter_ > 1:
feat_map = [None] * count_conv3d(self.decoder)
x = self.conv2(z)
for i in range(iter_):
conv_idx = [0]

View File

@@ -260,6 +260,7 @@ def model_lora_keys_unet(model, key_map={}):
key_map["transformer.{}".format(k[:-len(".weight")])] = to #simpletrainer and probably regular diffusers flux lora format
key_map["lycoris_{}".format(k[:-len(".weight")].replace(".", "_"))] = to #simpletrainer lycoris
key_map["lora_transformer_{}".format(k[:-len(".weight")].replace(".", "_"))] = to #onetrainer
key_map[k[:-len(".weight")]] = to #DiffSynth lora format
for k in sdk:
hidden_size = model.model_config.unet_config.get("hidden_size", 0)
if k.endswith(".weight") and ".linear1." in k:
@@ -322,6 +323,7 @@ def model_lora_keys_unet(model, key_map={}):
key_map["diffusion_model.{}".format(key_lora)] = to
key_map["transformer.{}".format(key_lora)] = to
key_map["lycoris_{}".format(key_lora.replace(".", "_"))] = to
key_map[key_lora] = to
if isinstance(model, comfy.model_base.Kandinsky5):
for k in sdk:

View File

@@ -49,6 +49,7 @@ import comfy.ldm.ace.model
import comfy.ldm.omnigen.omnigen2
import comfy.ldm.qwen_image.model
import comfy.ldm.kandinsky5.model
import comfy.ldm.anima.model
import comfy.model_management
import comfy.patcher_extension
@@ -1147,9 +1148,31 @@ class CosmosPredict2(BaseModel):
sigma = (sigma / (sigma + 1))
return latent_image / (1.0 - sigma)
class Anima(BaseModel):
def __init__(self, model_config, model_type=ModelType.FLOW, device=None):
super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.anima.model.Anima)
def extra_conds(self, **kwargs):
out = super().extra_conds(**kwargs)
cross_attn = kwargs.get("cross_attn", None)
t5xxl_ids = kwargs.get("t5xxl_ids", None)
t5xxl_weights = kwargs.get("t5xxl_weights", None)
device = kwargs["device"]
if cross_attn is not None:
if t5xxl_ids is not None:
cross_attn = self.diffusion_model.preprocess_text_embeds(cross_attn.to(device=device, dtype=self.get_dtype()), t5xxl_ids.unsqueeze(0).to(device=device))
if t5xxl_weights is not None:
cross_attn *= t5xxl_weights.unsqueeze(0).unsqueeze(-1).to(cross_attn)
if cross_attn.shape[1] < 512:
cross_attn = torch.nn.functional.pad(cross_attn, (0, 0, 0, 512 - cross_attn.shape[1]))
out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)
return out
class Lumina2(BaseModel):
def __init__(self, model_config, model_type=ModelType.FLOW, device=None):
super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.lumina.model.NextDiT)
self.memory_usage_factor_conds = ("ref_latents",)
def extra_conds(self, **kwargs):
out = super().extra_conds(**kwargs)
@@ -1169,6 +1192,35 @@ class Lumina2(BaseModel):
if clip_text_pooled is not None:
out['clip_text_pooled'] = comfy.conds.CONDRegular(clip_text_pooled)
clip_vision_outputs = kwargs.get("clip_vision_outputs", list(map(lambda a: a.get("clip_vision_output"), kwargs.get("unclip_conditioning", [{}])))) # Z Image omni
if clip_vision_outputs is not None and len(clip_vision_outputs) > 0:
sigfeats = []
for clip_vision_output in clip_vision_outputs:
if clip_vision_output is not None:
image_size = clip_vision_output.image_sizes[0]
shape = clip_vision_output.last_hidden_state.shape
sigfeats.append(clip_vision_output.last_hidden_state.reshape(shape[0], image_size[1] // 16, image_size[2] // 16, shape[-1]))
if len(sigfeats) > 0:
out['siglip_feats'] = comfy.conds.CONDList(sigfeats)
ref_latents = kwargs.get("reference_latents", None)
if ref_latents is not None:
latents = []
for lat in ref_latents:
latents.append(self.process_latent_in(lat))
out['ref_latents'] = comfy.conds.CONDList(latents)
ref_contexts = kwargs.get("reference_latents_text_embeds", None)
if ref_contexts is not None:
out['ref_contexts'] = comfy.conds.CONDList(ref_contexts)
return out
def extra_conds_shapes(self, **kwargs):
out = {}
ref_latents = kwargs.get("reference_latents", None)
if ref_latents is not None:
out['ref_latents'] = list([1, 16, sum(map(lambda a: math.prod(a.size()[2:]), ref_latents))])
return out
class WAN21(BaseModel):
@@ -1526,6 +1578,9 @@ class QwenImage(BaseModel):
def extra_conds(self, **kwargs):
out = super().extra_conds(**kwargs)
attention_mask = kwargs.get("attention_mask", None)
if attention_mask is not None:
out['attention_mask'] = comfy.conds.CONDRegular(attention_mask)
cross_attn = kwargs.get("cross_attn", None)
if cross_attn is not None:
out['c_crossattn'] = comfy.conds.CONDRegular(cross_attn)

View File

@@ -237,6 +237,8 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
else:
dit_config["vec_in_dim"] = None
dit_config["num_heads"] = dit_config["hidden_size"] // sum(dit_config["axes_dim"])
dit_config["depth"] = count_blocks(state_dict_keys, '{}double_blocks.'.format(key_prefix) + '{}.')
dit_config["depth_single_blocks"] = count_blocks(state_dict_keys, '{}single_blocks.'.format(key_prefix) + '{}.')
if '{}distilled_guidance_layer.0.norms.0.scale'.format(key_prefix) in state_dict_keys or '{}distilled_guidance_layer.norms.0.scale'.format(key_prefix) in state_dict_keys: #Chroma
@@ -251,7 +253,7 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
dit_config["image_model"] = "chroma_radiance"
dit_config["in_channels"] = 3
dit_config["out_channels"] = 3
dit_config["patch_size"] = 16
dit_config["patch_size"] = state_dict.get('{}img_in_patch.weight'.format(key_prefix)).size(dim=-1)
dit_config["nerf_hidden_size"] = 64
dit_config["nerf_mlp_ratio"] = 4
dit_config["nerf_depth"] = 4
@@ -442,8 +444,15 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
dit_config["ffn_dim_multiplier"] = (8.0 / 3.0)
dit_config["z_image_modulation"] = True
dit_config["time_scale"] = 1000.0
try:
dit_config["allow_fp16"] = torch.std(state_dict['{}layers.{}.ffn_norm1.weight'.format(key_prefix, dit_config["n_layers"] - 2)], unbiased=False).item() < 0.42
except Exception:
pass
if '{}cap_pad_token'.format(key_prefix) in state_dict_keys:
dit_config["pad_tokens_multiple"] = 32
sig_weight = state_dict.get('{}siglip_embedder.0.weight'.format(key_prefix), None)
if sig_weight is not None:
dit_config["siglip_feat_dim"] = sig_weight.shape[0]
return dit_config
@@ -545,6 +554,8 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
if '{}blocks.0.mlp.layer1.weight'.format(key_prefix) in state_dict_keys: # Cosmos predict2
dit_config = {}
dit_config["image_model"] = "cosmos_predict2"
if "{}llm_adapter.blocks.0.cross_attn.q_proj.weight".format(key_prefix) in state_dict_keys:
dit_config["image_model"] = "anima"
dit_config["max_img_h"] = 240
dit_config["max_img_w"] = 240
dit_config["max_frames"] = 128

View File

@@ -368,7 +368,7 @@ try:
if any((a in arch) for a in ["gfx90a", "gfx942", "gfx1100", "gfx1101", "gfx1151"]): # TODO: more arches, TODO: gfx950
ENABLE_PYTORCH_ATTENTION = True
if rocm_version >= (7, 0):
if any((a in arch) for a in ["gfx1201"]):
if any((a in arch) for a in ["gfx1200", "gfx1201"]):
ENABLE_PYTORCH_ATTENTION = True
if torch_version_numeric >= (2, 7) and rocm_version >= (6, 4):
if any((a in arch) for a in ["gfx1200", "gfx1201", "gfx950"]): # TODO: more arches, "gfx942" gives error on pytorch nightly 2.10 1013 rocm7.0

View File

@@ -203,7 +203,9 @@ class disable_weight_init:
def reset_parameters(self):
return None
def _conv_forward(self, input, weight, bias, *args, **kwargs):
def _conv_forward(self, input, weight, bias, autopad=None, *args, **kwargs):
if autopad == "causal_zero":
weight = weight[:, :, -input.shape[2]:, :, :]
if NVIDIA_MEMORY_CONV_BUG_WORKAROUND and weight.dtype in (torch.float16, torch.bfloat16):
out = torch.cudnn_convolution(input, weight, self.padding, self.stride, self.dilation, self.groups, benchmark=False, deterministic=False, allow_tf32=True)
if bias is not None:
@@ -212,15 +214,15 @@ class disable_weight_init:
else:
return super()._conv_forward(input, weight, bias, *args, **kwargs)
def forward_comfy_cast_weights(self, input):
def forward_comfy_cast_weights(self, input, autopad=None):
weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
x = self._conv_forward(input, weight, bias)
x = self._conv_forward(input, weight, bias, autopad=autopad)
uncast_bias_weight(self, weight, bias, offload_stream)
return x
def forward(self, *args, **kwargs):
run_every_op()
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0:
if self.comfy_cast_weights or len(self.weight_function) > 0 or len(self.bias_function) > 0 or "autopad" in kwargs:
return self.forward_comfy_cast_weights(*args, **kwargs)
else:
return super().forward(*args, **kwargs)
@@ -546,7 +548,8 @@ def mixed_precision_ops(quant_config={}, compute_dtype=torch.bfloat16, full_prec
weight_key = f"{prefix}weight"
weight = state_dict.pop(weight_key, None)
if weight is None:
raise ValueError(f"Missing weight for layer {layer_name}")
logging.warning(f"Missing weight for layer {layer_name}")
return
manually_loaded_keys = [weight_key]
@@ -624,21 +627,29 @@ def mixed_precision_ops(quant_config={}, compute_dtype=torch.bfloat16, full_prec
missing_keys.remove(key)
def state_dict(self, *args, destination=None, prefix="", **kwargs):
sd = super().state_dict(*args, destination=destination, prefix=prefix, **kwargs)
if isinstance(self.weight, QuantizedTensor):
layout_cls = self.weight._layout_cls
if destination is not None:
sd = destination
else:
sd = {}
# Check if it's any FP8 variant (E4M3 or E5M2)
if layout_cls in ("TensorCoreFP8E4M3Layout", "TensorCoreFP8E5M2Layout", "TensorCoreFP8Layout"):
sd["{}weight_scale".format(prefix)] = self.weight._params.scale
elif layout_cls == "TensorCoreNVFP4Layout":
sd["{}weight_scale_2".format(prefix)] = self.weight._params.scale
sd["{}weight_scale".format(prefix)] = self.weight._params.block_scale
if self.bias is not None:
sd["{}bias".format(prefix)] = self.bias
if isinstance(self.weight, QuantizedTensor):
sd_out = self.weight.state_dict("{}weight".format(prefix))
for k in sd_out:
sd[k] = sd_out[k]
quant_conf = {"format": self.quant_format}
if self._full_precision_mm_config:
quant_conf["full_precision_matrix_mult"] = True
sd["{}comfy_quant".format(prefix)] = torch.tensor(list(json.dumps(quant_conf).encode('utf-8')), dtype=torch.uint8)
input_scale = getattr(self, 'input_scale', None)
if input_scale is not None:
sd["{}input_scale".format(prefix)] = input_scale
else:
sd["{}weight".format(prefix)] = self.weight
return sd
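
The rewritten state_dict() above serializes the quantization config as a uint8 tensor holding UTF-8 JSON, so it can travel inside a plain state dict or safetensors file without a side channel. A minimal sketch of the round trip (decode_quant_conf is a hypothetical helper, not part of the diff):

# Hedged sketch of the comfy_quant encoding used in state_dict() above.
import json
import torch

def encode_quant_conf(conf: dict) -> torch.Tensor:
    return torch.tensor(list(json.dumps(conf).encode('utf-8')), dtype=torch.uint8)

def decode_quant_conf(t: torch.Tensor) -> dict:
    return json.loads(bytes(t.tolist()).decode('utf-8'))

conf = {"format": "float8_e4m3fn", "full_precision_matrix_mult": True}
assert decode_quant_conf(encode_quant_conf(conf)) == conf
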
def _forward(self, input, weight, bias):
@@ -690,7 +701,7 @@ def mixed_precision_ops(quant_config={}, compute_dtype=torch.bfloat16, full_prec
def set_weight(self, weight, inplace_update=False, seed=None, return_weight=False, **kwargs):
if getattr(self, 'layout_type', None) is not None:
# dtype is now implicit in the layout class
weight = QuantizedTensor.from_float(weight, self.layout_type, scale="recalculate", stochastic_rounding=seed, inplace_ops=True)
weight = QuantizedTensor.from_float(weight, self.layout_type, scale="recalculate", stochastic_rounding=seed, inplace_ops=True).to(self.weight.dtype)
else:
weight = weight.to(self.weight.dtype)
if return_weight:

View File

@@ -7,7 +7,7 @@ try:
QuantizedTensor,
QuantizedLayout,
TensorCoreFP8Layout as _CKFp8Layout,
TensorCoreNVFP4Layout, # Direct import, no wrapper needed
TensorCoreNVFP4Layout as _CKNvfp4Layout,
register_layout_op,
register_layout_class,
get_layout_class,
@@ -34,7 +34,7 @@ except ImportError as e:
class _CKFp8Layout:
pass
class TensorCoreNVFP4Layout:
class _CKNvfp4Layout:
pass
def register_layout_class(name, cls):
@@ -84,6 +84,39 @@ class _TensorCoreFP8LayoutBase(_CKFp8Layout):
return qdata, params
class TensorCoreNVFP4Layout(_CKNvfp4Layout):
@classmethod
def quantize(cls, tensor, scale=None, stochastic_rounding=0, inplace_ops=False):
if tensor.dim() != 2:
raise ValueError(f"NVFP4 requires 2D tensor, got {tensor.dim()}D")
orig_dtype = tensor.dtype
orig_shape = tuple(tensor.shape)
if scale is None or (isinstance(scale, str) and scale == "recalculate"):
scale = torch.amax(tensor.abs()) / (ck.float_utils.F8_E4M3_MAX * ck.float_utils.F4_E2M1_MAX)
if not isinstance(scale, torch.Tensor):
scale = torch.tensor(scale)
scale = scale.to(device=tensor.device, dtype=torch.float32)
padded_shape = cls.get_padded_shape(orig_shape)
needs_padding = padded_shape != orig_shape
if stochastic_rounding > 0:
qdata, block_scale = comfy.float.stochastic_round_quantize_nvfp4_by_block(tensor, scale, pad_16x=needs_padding, seed=stochastic_rounding)
else:
qdata, block_scale = ck.quantize_nvfp4(tensor, scale, pad_16x=needs_padding)
params = cls.Params(
scale=scale,
orig_dtype=orig_dtype,
orig_shape=orig_shape,
block_scale=block_scale,
)
return qdata, params
class TensorCoreFP8E4M3Layout(_TensorCoreFP8LayoutBase):
FP8_DTYPE = torch.float8_e4m3fn
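
The quantize() override above derives a single global FP32 scale as amax / (F8_E4M3_MAX * F4_E2M1_MAX), leaving the per-block e4m3 scales to absorb local dynamic range. A worked sketch, assuming the usual format maxima of 448.0 (e4m3) and 6.0 (e2m1) since the real constants live in ck.float_utils:

# Hedged sketch of the NVFP4 global scale; the maxima below are the
# standard format limits, treated as assumptions here.
import torch

F8_E4M3_MAX = 448.0  # assumption: largest finite float8_e4m3 value
F4_E2M1_MAX = 6.0    # assumption: largest finite float4_e2m1 value

def nvfp4_global_scale(tensor: torch.Tensor) -> torch.Tensor:
    # Global scale so block scales (e4m3) times element values (e2m1)
    # can span the tensor's full range.
    return (torch.amax(tensor.abs()) / (F8_E4M3_MAX * F4_E2M1_MAX)).to(torch.float32)

w = torch.randn(128, 64)
print(nvfp4_global_scale(w))  # scalar fp32 scale shared by all blocks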

View File

@@ -37,12 +37,18 @@ def prepare_noise(latent_image, seed, noise_inds=None):
return noises
def fix_empty_latent_channels(model, latent_image):
def fix_empty_latent_channels(model, latent_image, downscale_ratio_spacial=None):
if latent_image.is_nested:
return latent_image
latent_format = model.get_model_object("latent_format") #Resize the empty latent image so it has the right number of channels
if latent_format.latent_channels != latent_image.shape[1] and torch.count_nonzero(latent_image) == 0:
latent_image = comfy.utils.repeat_to_batch_size(latent_image, latent_format.latent_channels, dim=1)
if torch.count_nonzero(latent_image) == 0:
if latent_format.latent_channels != latent_image.shape[1]:
latent_image = comfy.utils.repeat_to_batch_size(latent_image, latent_format.latent_channels, dim=1)
if downscale_ratio_spacial is not None:
if downscale_ratio_spacial != latent_format.spacial_downscale_ratio:
ratio = downscale_ratio_spacial / latent_format.spacial_downscale_ratio
latent_image = comfy.utils.common_upscale(latent_image, round(latent_image.shape[-1] * ratio), round(latent_image.shape[-2] * ratio), "nearest-exact", crop="disabled")
if latent_format.latent_dimensions == 3 and latent_image.ndim == 4:
latent_image = latent_image.unsqueeze(2)
return latent_image
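
A worked example of the new downscale_ratio_spacial path, with illustrative numbers: an empty latent laid out for a /8 VAE being adapted to a latent format that downscales by /16:

# Hedged sketch of the ratio math in fix_empty_latent_channels above;
# all numbers are illustrative.
downscale_ratio_spacial = 8   # what the caller's empty latent assumed
format_ratio = 16             # latent_format.spacial_downscale_ratio
ratio = downscale_ratio_spacial / format_ratio  # 0.5

h, w = 128, 128               # latent spatial size from the caller
new_h, new_w = round(h * ratio), round(w * ratio)  # 64, 64
# comfy.utils.common_upscale(latent, new_w, new_h, "nearest-exact", crop="disabled")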

View File

@@ -20,6 +20,7 @@ import comfy.ldm.ace.vae.music_dcae_pipeline
import comfy.ldm.hunyuan_video.vae
import comfy.ldm.mmaudio.vae.autoencoder
import comfy.pixel_space_convert
import comfy.weight_adapter
import yaml
import math
import os
@@ -57,6 +58,7 @@ import comfy.text_encoders.ovis
import comfy.text_encoders.kandinsky5
import comfy.text_encoders.jina_clip_2
import comfy.text_encoders.newbie
import comfy.text_encoders.anima
import comfy.model_patcher
import comfy.lora
@@ -100,6 +102,105 @@ def load_lora_for_models(model, clip, lora, strength_model, strength_clip):
return (new_modelpatcher, new_clip)
def load_bypass_lora_for_models(model, clip, lora, strength_model, strength_clip):
"""
Load LoRA in bypass mode without modifying base model weights.
Instead of patching weights, this injects the LoRA computation into the
forward pass: output = base_forward(x) + lora_path(x)
Non-adapter patches (bias diff, weight diff, etc.) are applied as regular patches.
This is useful for training and when model weights are offloaded.
"""
key_map = {}
if model is not None:
key_map = comfy.lora.model_lora_keys_unet(model.model, key_map)
if clip is not None:
key_map = comfy.lora.model_lora_keys_clip(clip.cond_stage_model, key_map)
logging.debug(f"[BypassLoRA] key_map has {len(key_map)} entries")
lora = comfy.lora_convert.convert_lora(lora)
loaded = comfy.lora.load_lora(lora, key_map)
logging.debug(f"[BypassLoRA] loaded has {len(loaded)} entries")
# Separate adapters (for bypass) from other patches (for regular patching)
bypass_patches = {} # WeightAdapterBase instances -> bypass mode
regular_patches = {} # diff, set, bias patches -> regular weight patching
for key, patch_data in loaded.items():
if isinstance(patch_data, comfy.weight_adapter.WeightAdapterBase):
bypass_patches[key] = patch_data
else:
regular_patches[key] = patch_data
logging.debug(f"[BypassLoRA] {len(bypass_patches)} bypass adapters, {len(regular_patches)} regular patches")
k = set()
k1 = set()
if model is not None:
new_modelpatcher = model.clone()
# Apply regular patches (bias diff, weight diff, etc.) via normal patching
if regular_patches:
patched_keys = new_modelpatcher.add_patches(regular_patches, strength_model)
k.update(patched_keys)
# Apply adapter patches via bypass injection
manager = comfy.weight_adapter.BypassInjectionManager()
model_sd_keys = set(new_modelpatcher.model.state_dict().keys())
for key, adapter in bypass_patches.items():
if key in model_sd_keys:
manager.add_adapter(key, adapter, strength=strength_model)
k.add(key)
else:
logging.warning(f"[BypassLoRA] Adapter key not in model state_dict: {key}")
injections = manager.create_injections(new_modelpatcher.model)
if manager.get_hook_count() > 0:
new_modelpatcher.set_injections("bypass_lora", injections)
else:
new_modelpatcher = None
if clip is not None:
new_clip = clip.clone()
# Apply regular patches to clip
if regular_patches:
patched_keys = new_clip.add_patches(regular_patches, strength_clip)
k1.update(patched_keys)
# Apply adapter patches via bypass injection
clip_manager = comfy.weight_adapter.BypassInjectionManager()
clip_sd_keys = set(new_clip.cond_stage_model.state_dict().keys())
for key, adapter in bypass_patches.items():
if key in clip_sd_keys:
clip_manager.add_adapter(key, adapter, strength=strength_clip)
k1.add(key)
clip_injections = clip_manager.create_injections(new_clip.cond_stage_model)
if clip_manager.get_hook_count() > 0:
new_clip.patcher.set_injections("bypass_lora", clip_injections)
else:
new_clip = None
for x in loaded:
if (x not in k) and (x not in k1):
patch_data = loaded[x]
patch_type = type(patch_data).__name__
if isinstance(patch_data, tuple):
patch_type = f"tuple({patch_data[0]})"
logging.warning(f"NOT LOADED: {x} (type={patch_type})")
return (new_modelpatcher, new_clip)
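
A hedged usage sketch of the new loader; the LoRA path is a placeholder and `model`/`clip` are assumed to come from a prior checkpoint load:

# Hedged usage sketch: `model` and `clip` are assumed to exist from a
# prior checkpoint load; the file path is a placeholder.
import comfy.utils
import comfy.sd

lora_sd = comfy.utils.load_torch_file("loras/example.safetensors", safe_load=True)
new_model, new_clip = comfy.sd.load_bypass_lora_for_models(
    model, clip, lora_sd, strength_model=1.0, strength_clip=1.0)
# Base weights stay untouched; the adapters run inside forward via the
# injections registered under the "bypass_lora" key.
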
class CLIP:
def __init__(self, target=None, embedding_directory=None, no_init=False, tokenizer_data={}, parameters=0, state_dict=[], model_options={}):
if no_init:
@@ -635,14 +736,13 @@ class VAE:
self.upscale_index_formula = (4, 16, 16)
self.downscale_ratio = (lambda a: max(0, math.floor((a + 3) / 4)), 16, 16)
self.downscale_index_formula = (4, 16, 16)
if self.latent_channels == 48: # Wan 2.2
if self.latent_channels in [48, 128]: # Wan 2.2 and LTX2
self.first_stage_model = comfy.taesd.taehv.TAEHV(latent_channels=self.latent_channels, latent_format=None) # taehv doesn't need scaling
self.process_input = lambda image: (_ for _ in ()).throw(NotImplementedError("This light tae doesn't support encoding currently"))
self.process_input = self.process_output = lambda image: image
self.process_output = lambda image: image
self.memory_used_decode = lambda shape, dtype: (1800 * (max(1, (shape[-3] ** 0.7 * 0.1)) * shape[-2] * shape[-1] * 16 * 16) * model_management.dtype_size(dtype))
elif self.latent_channels == 32 and sd["decoder.22.bias"].shape[0] == 12: # lighttae_hv15
self.first_stage_model = comfy.taesd.taehv.TAEHV(latent_channels=self.latent_channels, latent_format=comfy.latent_formats.HunyuanVideo15)
self.process_input = lambda image: (_ for _ in ()).throw(NotImplementedError("This light tae doesn't support encoding currently"))
self.memory_used_decode = lambda shape, dtype: (1200 * (max(1, (shape[-3] ** 0.7 * 0.05)) * shape[-2] * shape[-1] * 32 * 32) * model_management.dtype_size(dtype))
else:
if sd["decoder.1.weight"].dtype == torch.float16: # taehv currently only available in float16, so assume it's not lighttaew2_1 as otherwise state dicts are identical
@@ -1014,6 +1114,7 @@ class CLIPType(Enum):
KANDINSKY5 = 22
KANDINSKY5_IMAGE = 23
NEWBIE = 24
FLUX2 = 25
def load_clip(ckpt_paths, embedding_directory=None, clip_type=CLIPType.STABLE_DIFFUSION, model_options={}):
@@ -1046,6 +1147,8 @@ class TEModel(Enum):
QWEN3_2B = 17
GEMMA_3_12B = 18
JINA_CLIP_2 = 19
QWEN3_8B = 20
QWEN3_06B = 21
def detect_te_model(sd):
@@ -1059,9 +1162,9 @@ def detect_te_model(sd):
return TEModel.JINA_CLIP_2
if "encoder.block.23.layer.1.DenseReluDense.wi_1.weight" in sd:
weight = sd["encoder.block.23.layer.1.DenseReluDense.wi_1.weight"]
if weight.shape[-1] == 4096:
if weight.shape[0] == 10240:
return TEModel.T5_XXL
elif weight.shape[-1] == 2048:
elif weight.shape[0] == 5120:
return TEModel.T5_XL
if 'encoder.block.23.layer.1.DenseReluDense.wi.weight' in sd:
return TEModel.T5_XXL_OLD
@@ -1089,6 +1192,10 @@ def detect_te_model(sd):
return TEModel.QWEN3_4B
elif weight.shape[0] == 2048:
return TEModel.QWEN3_2B
elif weight.shape[0] == 4096:
return TEModel.QWEN3_8B
elif weight.shape[0] == 1024:
return TEModel.QWEN3_06B
if weight.shape[0] == 5120:
if "model.layers.39.post_attention_layernorm.weight" in sd:
return TEModel.MISTRAL3_24B
@@ -1214,14 +1321,24 @@ def load_text_encoder_state_dicts(state_dicts=[], embedding_directory=None, clip
clip_target.tokenizer = comfy.text_encoders.flux.Flux2Tokenizer
tokenizer_data["tekken_model"] = clip_data[0].get("tekken_model", None)
elif te_model == TEModel.QWEN3_4B:
clip_target.clip = comfy.text_encoders.z_image.te(**llama_detect(clip_data))
clip_target.tokenizer = comfy.text_encoders.z_image.ZImageTokenizer
if clip_type == CLIPType.FLUX or clip_type == CLIPType.FLUX2:
clip_target.clip = comfy.text_encoders.flux.klein_te(**llama_detect(clip_data), model_type="qwen3_4b")
clip_target.tokenizer = comfy.text_encoders.flux.KleinTokenizer
else:
clip_target.clip = comfy.text_encoders.z_image.te(**llama_detect(clip_data))
clip_target.tokenizer = comfy.text_encoders.z_image.ZImageTokenizer
elif te_model == TEModel.QWEN3_2B:
clip_target.clip = comfy.text_encoders.ovis.te(**llama_detect(clip_data))
clip_target.tokenizer = comfy.text_encoders.ovis.OvisTokenizer
elif te_model == TEModel.QWEN3_8B:
clip_target.clip = comfy.text_encoders.flux.klein_te(**llama_detect(clip_data), model_type="qwen3_8b")
clip_target.tokenizer = comfy.text_encoders.flux.KleinTokenizer8B
elif te_model == TEModel.JINA_CLIP_2:
clip_target.clip = comfy.text_encoders.jina_clip_2.JinaClip2TextModelWrapper
clip_target.tokenizer = comfy.text_encoders.jina_clip_2.JinaClip2TokenizerWrapper
elif te_model == TEModel.QWEN3_06B:
clip_target.clip = comfy.text_encoders.anima.te(**llama_detect(clip_data))
clip_target.tokenizer = comfy.text_encoders.anima.AnimaTokenizer
else:
# clip_l
if clip_type == CLIPType.SD3:

View File

@@ -466,7 +466,7 @@ def load_embed(embedding_name, embedding_directory, embedding_size, embed_key=No
return embed_out
class SDTokenizer:
def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, has_end_token=True, pad_to_max_length=True, min_length=None, pad_token=None, end_token=None, min_padding=None, pad_left=False, disable_weights=False, tokenizer_data={}, tokenizer_args={}):
def __init__(self, tokenizer_path=None, max_length=77, pad_with_end=True, embedding_directory=None, embedding_size=768, embedding_key='clip_l', tokenizer_class=CLIPTokenizer, has_start_token=True, has_end_token=True, pad_to_max_length=True, min_length=None, pad_token=None, end_token=None, start_token=None, min_padding=None, pad_left=False, disable_weights=False, tokenizer_data={}, tokenizer_args={}):
if tokenizer_path is None:
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "sd1_tokenizer")
self.tokenizer = tokenizer_class.from_pretrained(tokenizer_path, **tokenizer_args)
@@ -479,8 +479,15 @@ class SDTokenizer:
empty = self.tokenizer('')["input_ids"]
self.tokenizer_adds_end_token = has_end_token
if has_start_token:
self.tokens_start = 1
self.start_token = empty[0]
if len(empty) > 0:
self.tokens_start = 1
self.start_token = empty[0]
else:
self.tokens_start = 0
self.start_token = start_token
if start_token is None:
logging.warning("WARNING: There's something wrong with your tokenizers.'")
if end_token is not None:
self.end_token = end_token
else:
@@ -488,7 +495,7 @@ class SDTokenizer:
self.end_token = empty[1]
else:
self.tokens_start = 0
self.start_token = None
self.start_token = start_token
if end_token is not None:
self.end_token = end_token
else:

View File

@@ -23,6 +23,7 @@ import comfy.text_encoders.qwen_image
import comfy.text_encoders.hunyuan_image
import comfy.text_encoders.kandinsky5
import comfy.text_encoders.z_image
import comfy.text_encoders.anima
from . import supported_models_base
from . import latent_formats
@@ -763,17 +764,31 @@ class Flux2(Flux):
def __init__(self, unet_config):
super().__init__(unet_config)
self.memory_usage_factor = self.memory_usage_factor * (2.0 * 2.0) * 2.36
self.memory_usage_factor = self.memory_usage_factor * (2.0 * 2.0) * (unet_config['hidden_size'] / 2604)
def get_model(self, state_dict, prefix="", device=None):
out = model_base.Flux2(self, device=device)
return out
def clip_target(self, state_dict={}):
return None # TODO
pref = self.text_encoder_key_prefix[0]
t5_detect = comfy.text_encoders.sd3_clip.t5_xxl_detect(state_dict, "{}t5xxl.transformer.".format(pref))
return supported_models_base.ClipTarget(comfy.text_encoders.flux.FluxTokenizer, comfy.text_encoders.flux.flux_clip(**t5_detect))
detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen3_4b.transformer.".format(pref))
if len(detect) > 0:
detect["model_type"] = "qwen3_4b"
return supported_models_base.ClipTarget(comfy.text_encoders.flux.KleinTokenizer, comfy.text_encoders.flux.klein_te(**detect))
detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen3_8b.transformer.".format(pref))
if len(detect) > 0:
detect["model_type"] = "qwen3_8b"
return supported_models_base.ClipTarget(comfy.text_encoders.flux.KleinTokenizer8B, comfy.text_encoders.flux.klein_te(**detect))
detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}mistral3_24b.transformer.".format(pref))
if len(detect) > 0:
if "{}mistral3_24b.transformer.model.layers.39.post_attention_layernorm.weight".format(pref) not in state_dict:
detect["pruned"] = True
return supported_models_base.ClipTarget(comfy.text_encoders.flux.Flux2Tokenizer, comfy.text_encoders.flux.flux2_te(**detect))
return None
class GenmoMochi(supported_models_base.BASE):
unet_config = {
@@ -845,7 +860,7 @@ class LTXAV(LTXV):
def __init__(self, unet_config):
super().__init__(unet_config)
self.memory_usage_factor = 0.061 # TODO
self.memory_usage_factor = 0.077 # TODO
def get_model(self, state_dict, prefix="", device=None):
out = model_base.LTXAV(self, device=device)
@@ -992,6 +1007,36 @@ class CosmosT2IPredict2(supported_models_base.BASE):
t5_detect = comfy.text_encoders.sd3_clip.t5_xxl_detect(state_dict, "{}t5xxl.transformer.".format(pref))
return supported_models_base.ClipTarget(comfy.text_encoders.cosmos.CosmosT5Tokenizer, comfy.text_encoders.cosmos.te(**t5_detect))
class Anima(supported_models_base.BASE):
unet_config = {
"image_model": "anima",
}
sampling_settings = {
"multiplier": 1.0,
"shift": 3.0,
}
unet_extra_config = {}
latent_format = latent_formats.Wan21
memory_usage_factor = 1.0
supported_inference_dtypes = [torch.bfloat16, torch.float32]
def __init__(self, unet_config):
super().__init__(unet_config)
self.memory_usage_factor = (unet_config.get("model_channels", 2048) / 2048) * 0.95
def get_model(self, state_dict, prefix="", device=None):
out = model_base.Anima(self, device=device)
return out
def clip_target(self, state_dict={}):
pref = self.text_encoder_key_prefix[0]
detect = comfy.text_encoders.hunyuan_video.llama_detect(state_dict, "{}qwen3_06b.transformer.".format(pref))
return supported_models_base.ClipTarget(comfy.text_encoders.anima.AnimaTokenizer, comfy.text_encoders.anima.te(**detect))
class CosmosI2VPredict2(CosmosT2IPredict2):
unet_config = {
"image_model": "cosmos_predict2",
@@ -1042,13 +1087,13 @@ class ZImage(Lumina2):
"shift": 3.0,
}
memory_usage_factor = 2.0
memory_usage_factor = 2.8
supported_inference_dtypes = [torch.bfloat16, torch.float32]
def __init__(self, unet_config):
super().__init__(unet_config)
if comfy.model_management.extended_fp16_support():
if comfy.model_management.extended_fp16_support() and unet_config.get("allow_fp16", False):
self.supported_inference_dtypes = self.supported_inference_dtypes.copy()
self.supported_inference_dtypes.insert(1, torch.float16)
@@ -1551,6 +1596,6 @@ class Kandinsky5Image(Kandinsky5):
return supported_models_base.ClipTarget(comfy.text_encoders.kandinsky5.Kandinsky5TokenizerImage, comfy.text_encoders.kandinsky5.te(**hunyuan_detect))
models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, LTXAV, HunyuanVideo15_SR_Distilled, HunyuanVideo15, HunyuanImage21Refiner, HunyuanImage21, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, ZImage, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, WAN22_Camera, WAN22_S2V, WAN21_HuMo, WAN22_Animate, Hunyuan3Dv2mini, Hunyuan3Dv2, Hunyuan3Dv2_1, HiDream, Chroma, ChromaRadiance, ACEStep, Omnigen2, QwenImage, Flux2, Kandinsky5Image, Kandinsky5]
models = [LotusD, Stable_Zero123, SD15_instructpix2pix, SD15, SD20, SD21UnclipL, SD21UnclipH, SDXL_instructpix2pix, SDXLRefiner, SDXL, SSD1B, KOALA_700M, KOALA_1B, Segmind_Vega, SD_X4Upscaler, Stable_Cascade_C, Stable_Cascade_B, SV3D_u, SV3D_p, SD3, StableAudio, AuraFlow, PixArtAlpha, PixArtSigma, HunyuanDiT, HunyuanDiT1, FluxInpaint, Flux, FluxSchnell, GenmoMochi, LTXV, LTXAV, HunyuanVideo15_SR_Distilled, HunyuanVideo15, HunyuanImage21Refiner, HunyuanImage21, HunyuanVideoSkyreelsI2V, HunyuanVideoI2V, HunyuanVideo, CosmosT2V, CosmosI2V, CosmosT2IPredict2, CosmosI2VPredict2, ZImage, Lumina2, WAN22_T2V, WAN21_T2V, WAN21_I2V, WAN21_FunControl2V, WAN21_Vace, WAN21_Camera, WAN22_Camera, WAN22_S2V, WAN21_HuMo, WAN22_Animate, Hunyuan3Dv2mini, Hunyuan3Dv2, Hunyuan3Dv2_1, HiDream, Chroma, ChromaRadiance, ACEStep, Omnigen2, QwenImage, Flux2, Kandinsky5Image, Kandinsky5, Anima]
models += [SVD_img2vid]

View File

@@ -112,7 +112,8 @@ def apply_model_with_memblocks(model, x, parallel, show_progress_bar):
class TAEHV(nn.Module):
def __init__(self, latent_channels, parallel=False, decoder_time_upscale=(True, True), decoder_space_upscale=(True, True, True), latent_format=None, show_progress_bar=True):
def __init__(self, latent_channels, parallel=False, encoder_time_downscale=(True, True, False), decoder_time_upscale=(False, True, True), decoder_space_upscale=(True, True, True),
latent_format=None, show_progress_bar=False):
super().__init__()
self.image_channels = 3
self.patch_size = 1
@@ -124,6 +125,9 @@ class TAEHV(nn.Module):
self.process_out = latent_format().process_out if latent_format is not None else (lambda x: x)
if self.latent_channels in [48, 32]: # Wan 2.2 and HunyuanVideo1.5
self.patch_size = 2
elif self.latent_channels == 128: # LTX2
self.patch_size, self.latent_channels, encoder_time_downscale, decoder_time_upscale = 4, 128, (True, True, True), (True, True, True)
if self.latent_channels == 32: # HunyuanVideo1.5
act_func = nn.LeakyReLU(0.2, inplace=True)
else: # HunyuanVideo, Wan 2.1
@@ -131,41 +135,52 @@ class TAEHV(nn.Module):
self.encoder = nn.Sequential(
conv(self.image_channels*self.patch_size**2, 64), act_func,
TPool(64, 2), conv(64, 64, stride=2, bias=False), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func),
TPool(64, 2), conv(64, 64, stride=2, bias=False), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func),
TPool(64, 1), conv(64, 64, stride=2, bias=False), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func),
TPool(64, 2 if encoder_time_downscale[0] else 1), conv(64, 64, stride=2, bias=False), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func),
TPool(64, 2 if encoder_time_downscale[1] else 1), conv(64, 64, stride=2, bias=False), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func),
TPool(64, 2 if encoder_time_downscale[2] else 1), conv(64, 64, stride=2, bias=False), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func), MemBlock(64, 64, act_func),
conv(64, self.latent_channels),
)
n_f = [256, 128, 64, 64]
self.frames_to_trim = 2**sum(decoder_time_upscale) - 1
self.decoder = nn.Sequential(
Clamp(), conv(self.latent_channels, n_f[0]), act_func,
MemBlock(n_f[0], n_f[0], act_func), MemBlock(n_f[0], n_f[0], act_func), MemBlock(n_f[0], n_f[0], act_func), nn.Upsample(scale_factor=2 if decoder_space_upscale[0] else 1), TGrow(n_f[0], 1), conv(n_f[0], n_f[1], bias=False),
MemBlock(n_f[1], n_f[1], act_func), MemBlock(n_f[1], n_f[1], act_func), MemBlock(n_f[1], n_f[1], act_func), nn.Upsample(scale_factor=2 if decoder_space_upscale[1] else 1), TGrow(n_f[1], 2 if decoder_time_upscale[0] else 1), conv(n_f[1], n_f[2], bias=False),
MemBlock(n_f[2], n_f[2], act_func), MemBlock(n_f[2], n_f[2], act_func), MemBlock(n_f[2], n_f[2], act_func), nn.Upsample(scale_factor=2 if decoder_space_upscale[2] else 1), TGrow(n_f[2], 2 if decoder_time_upscale[1] else 1), conv(n_f[2], n_f[3], bias=False),
MemBlock(n_f[0], n_f[0], act_func), MemBlock(n_f[0], n_f[0], act_func), MemBlock(n_f[0], n_f[0], act_func), nn.Upsample(scale_factor=2 if decoder_space_upscale[0] else 1), TGrow(n_f[0], 2 if decoder_time_upscale[0] else 1), conv(n_f[0], n_f[1], bias=False),
MemBlock(n_f[1], n_f[1], act_func), MemBlock(n_f[1], n_f[1], act_func), MemBlock(n_f[1], n_f[1], act_func), nn.Upsample(scale_factor=2 if decoder_space_upscale[1] else 1), TGrow(n_f[1], 2 if decoder_time_upscale[1] else 1), conv(n_f[1], n_f[2], bias=False),
MemBlock(n_f[2], n_f[2], act_func), MemBlock(n_f[2], n_f[2], act_func), MemBlock(n_f[2], n_f[2], act_func), nn.Upsample(scale_factor=2 if decoder_space_upscale[2] else 1), TGrow(n_f[2], 2 if decoder_time_upscale[2] else 1), conv(n_f[2], n_f[3], bias=False),
act_func, conv(n_f[3], self.image_channels*self.patch_size**2),
)
@property
def show_progress_bar(self):
return self._show_progress_bar
@show_progress_bar.setter
def show_progress_bar(self, value):
self._show_progress_bar = value
self.t_downscale = 2**sum(t.stride == 2 for t in self.encoder if isinstance(t, TPool))
self.t_upscale = 2**sum(t.stride == 2 for t in self.decoder if isinstance(t, TGrow))
self.frames_to_trim = self.t_upscale - 1
self._show_progress_bar = show_progress_bar
@property
def show_progress_bar(self):
return self._show_progress_bar
@show_progress_bar.setter
def show_progress_bar(self, value):
self._show_progress_bar = value
def encode(self, x, **kwargs):
if self.patch_size > 1:
x = F.pixel_unshuffle(x, self.patch_size)
x = x.movedim(2, 1) # [B, C, T, H, W] -> [B, T, C, H, W]
if x.shape[1] % 4 != 0:
# pad at end to multiple of 4
n_pad = 4 - x.shape[1] % 4
if self.patch_size > 1:
B, T, C, H, W = x.shape
x = x.reshape(B * T, C, H, W)
x = F.pixel_unshuffle(x, self.patch_size)
x = x.reshape(B, T, C * self.patch_size ** 2, H // self.patch_size, W // self.patch_size)
if x.shape[1] % self.t_downscale != 0:
# pad at end to multiple of t_downscale
n_pad = self.t_downscale - x.shape[1] % self.t_downscale
padding = x[:, -1:].repeat_interleave(n_pad, dim=1)
x = torch.cat([x, padding], 1)
x = apply_model_with_memblocks(self.encoder, x, self.parallel, self.show_progress_bar).movedim(2, 1)
return self.process_out(x)
def decode(self, x, **kwargs):
x = x.unsqueeze(0) if x.ndim == 4 else x # [T, C, H, W] -> [1, T, C, H, W]
x = x.movedim(1, 2) if x.shape[1] != self.latent_channels else x # [B, T, C, H, W] or [B, C, T, H, W]
x = self.process_in(x).movedim(2, 1) # [B, C, T, H, W] -> [B, T, C, H, W]
x = apply_model_with_memblocks(self.decoder, x, self.parallel, self.show_progress_bar)
if self.patch_size > 1:

View File

@@ -0,0 +1,61 @@
from transformers import Qwen2Tokenizer, T5TokenizerFast
import comfy.text_encoders.llama
from comfy import sd1_clip
import os
import torch
class Qwen3Tokenizer(sd1_clip.SDTokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}):
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "qwen25_tokenizer")
super().__init__(tokenizer_path, pad_with_end=False, embedding_size=1024, embedding_key='qwen3_06b', tokenizer_class=Qwen2Tokenizer, has_start_token=False, has_end_token=False, pad_to_max_length=False, max_length=99999999, min_length=1, pad_token=151643, tokenizer_data=tokenizer_data)
class T5XXLTokenizer(sd1_clip.SDTokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}):
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "t5_tokenizer")
super().__init__(tokenizer_path, embedding_directory=embedding_directory, pad_with_end=False, embedding_size=4096, embedding_key='t5xxl', tokenizer_class=T5TokenizerFast, has_start_token=False, pad_to_max_length=False, max_length=99999999, min_length=1, tokenizer_data=tokenizer_data)
class AnimaTokenizer:
def __init__(self, embedding_directory=None, tokenizer_data={}):
self.qwen3_06b = Qwen3Tokenizer(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data)
self.t5xxl = T5XXLTokenizer(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data)
def tokenize_with_weights(self, text:str, return_word_ids=False, **kwargs):
out = {}
qwen_ids = self.qwen3_06b.tokenize_with_weights(text, return_word_ids, **kwargs)
out["qwen3_06b"] = [[(token, 1.0) for token, _ in inner_list] for inner_list in qwen_ids] # Set weights to 1.0
out["t5xxl"] = self.t5xxl.tokenize_with_weights(text, return_word_ids, **kwargs)
return out
def untokenize(self, token_weight_pair):
return self.t5xxl.untokenize(token_weight_pair)
def state_dict(self):
return {}
class Qwen3_06BModel(sd1_clip.SDClipModel):
def __init__(self, device="cpu", layer="last", layer_idx=None, dtype=None, attention_mask=True, model_options={}):
super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"pad": 151643}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Qwen3_06B, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
class AnimaTEModel(sd1_clip.SD1ClipModel):
def __init__(self, device="cpu", dtype=None, model_options={}):
super().__init__(device=device, dtype=dtype, name="qwen3_06b", clip_model=Qwen3_06BModel, model_options=model_options)
def encode_token_weights(self, token_weight_pairs):
out = super().encode_token_weights(token_weight_pairs)
out[2]["t5xxl_ids"] = torch.tensor(list(map(lambda a: a[0], token_weight_pairs["t5xxl"][0])), dtype=torch.int)
out[2]["t5xxl_weights"] = torch.tensor(list(map(lambda a: a[1], token_weight_pairs["t5xxl"][0])))
return out
def te(dtype_llama=None, llama_quantization_metadata=None):
class AnimaTEModel_(AnimaTEModel):
def __init__(self, device="cpu", dtype=None, model_options={}):
if dtype_llama is not None:
dtype = dtype_llama
if llama_quantization_metadata is not None:
model_options = model_options.copy()
model_options["quantization_metadata"] = llama_quantization_metadata
super().__init__(device=device, dtype=dtype, model_options=model_options)
return AnimaTEModel_
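
The tokenizer above deliberately resets the qwen weights to 1.0 while keeping the t5xxl weights, which encode_token_weights later surfaces as t5xxl_ids/t5xxl_weights extras. A tiny sketch of that reset with illustrative token pairs:

# Hedged sketch of the weight reset in tokenize_with_weights above;
# token ids and weights are illustrative.
qwen_ids = [[(151643, 0.8), (42, 1.2)]]  # (token, weight) pairs per chunk
out = {}
out["qwen3_06b"] = [[(token, 1.0) for token, _ in inner] for inner in qwen_ids]
print(out["qwen3_06b"])  # [[(151643, 1.0), (42, 1.0)]]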

View File

@@ -36,7 +36,7 @@ def te(dtype_t5=None, t5_quantization_metadata=None):
if t5_quantization_metadata is not None:
model_options = model_options.copy()
model_options["t5xxl_quantization_metadata"] = t5_quantization_metadata
if dtype is None:
if dtype_t5 is not None:
dtype = dtype_t5
super().__init__(device=device, dtype=dtype, model_options=model_options)
return CosmosTEModel_
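
The one-line change above inverts the precedence: the dtype detected from the checkpoint (dtype_t5) now overrides a caller-supplied dtype instead of only filling in when none was given; the same fix appears in the genmo and pixart encoders below. A minimal sketch of the new rule (pick_dtype is a hypothetical helper):

# Hedged sketch of the dtype precedence after this fix.
import torch

def pick_dtype(dtype, dtype_t5):
    if dtype_t5 is not None:  # new behavior: checkpoint dtype wins
        dtype = dtype_t5
    return dtype

assert pick_dtype(torch.float16, torch.bfloat16) == torch.bfloat16  # t5 dtype wins
assert pick_dtype(torch.float16, None) == torch.float16             # fallback kept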

View File

@@ -3,7 +3,7 @@ import comfy.text_encoders.t5
import comfy.text_encoders.sd3_clip
import comfy.text_encoders.llama
import comfy.model_management
from transformers import T5TokenizerFast, LlamaTokenizerFast
from transformers import T5TokenizerFast, LlamaTokenizerFast, Qwen2Tokenizer
import torch
import os
import json
@@ -118,7 +118,7 @@ class MistralTokenizerClass:
class Mistral3Tokenizer(sd1_clip.SDTokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}):
self.tekken_data = tokenizer_data.get("tekken_model", None)
super().__init__("", pad_with_end=False, embedding_size=5120, embedding_key='mistral3_24b', tokenizer_class=MistralTokenizerClass, has_end_token=False, pad_to_max_length=False, pad_token=11, max_length=99999999, min_length=1, pad_left=True, tokenizer_args=load_mistral_tokenizer(self.tekken_data), tokenizer_data=tokenizer_data)
super().__init__("", pad_with_end=False, embedding_size=5120, embedding_key='mistral3_24b', tokenizer_class=MistralTokenizerClass, has_end_token=False, pad_to_max_length=False, pad_token=11, start_token=1, max_length=99999999, min_length=1, pad_left=True, tokenizer_args=load_mistral_tokenizer(self.tekken_data), tokenizer_data=tokenizer_data)
def state_dict(self):
return {"tekken_model": self.tekken_data}
@@ -172,3 +172,60 @@ def flux2_te(dtype_llama=None, llama_quantization_metadata=None, pruned=False):
model_options["num_layers"] = 30
super().__init__(device=device, dtype=dtype, model_options=model_options)
return Flux2TEModel_
class Qwen3Tokenizer(sd1_clip.SDTokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}):
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "qwen25_tokenizer")
super().__init__(tokenizer_path, pad_with_end=False, embedding_size=2560, embedding_key='qwen3_4b', tokenizer_class=Qwen2Tokenizer, has_start_token=False, has_end_token=False, pad_to_max_length=False, max_length=99999999, min_length=512, pad_token=151643, tokenizer_data=tokenizer_data)
class Qwen3Tokenizer8B(sd1_clip.SDTokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}):
tokenizer_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "qwen25_tokenizer")
super().__init__(tokenizer_path, pad_with_end=False, embedding_size=4096, embedding_key='qwen3_8b', tokenizer_class=Qwen2Tokenizer, has_start_token=False, has_end_token=False, pad_to_max_length=False, max_length=99999999, min_length=512, pad_token=151643, tokenizer_data=tokenizer_data)
class KleinTokenizer(sd1_clip.SD1Tokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}, name="qwen3_4b"):
if name == "qwen3_4b":
tokenizer = Qwen3Tokenizer
elif name == "qwen3_8b":
tokenizer = Qwen3Tokenizer8B
super().__init__(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data, name=name, tokenizer=tokenizer)
self.llama_template = "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
def tokenize_with_weights(self, text, return_word_ids=False, llama_template=None, **kwargs):
if llama_template is None:
llama_text = self.llama_template.format(text)
else:
llama_text = llama_template.format(text)
tokens = super().tokenize_with_weights(llama_text, return_word_ids=return_word_ids, disable_weights=True, **kwargs)
return tokens
class KleinTokenizer8B(KleinTokenizer):
def __init__(self, embedding_directory=None, tokenizer_data={}, name="qwen3_8b"):
super().__init__(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data, name=name)
class Qwen3_4BModel(sd1_clip.SDClipModel):
def __init__(self, device="cpu", layer=[9, 18, 27], layer_idx=None, dtype=None, attention_mask=True, model_options={}):
super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"pad": 151643}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Qwen3_4B, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
class Qwen3_8BModel(sd1_clip.SDClipModel):
def __init__(self, device="cpu", layer=[9, 18, 27], layer_idx=None, dtype=None, attention_mask=True, model_options={}):
super().__init__(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"pad": 151643}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Qwen3_8B, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
def klein_te(dtype_llama=None, llama_quantization_metadata=None, model_type="qwen3_4b"):
if model_type == "qwen3_4b":
model = Qwen3_4BModel
elif model_type == "qwen3_8b":
model = Qwen3_8BModel
class Flux2TEModel_(Flux2TEModel):
def __init__(self, device="cpu", dtype=None, model_options={}):
if llama_quantization_metadata is not None:
model_options = model_options.copy()
model_options["quantization_metadata"] = llama_quantization_metadata
if dtype_llama is not None:
dtype = dtype_llama
super().__init__(device=device, dtype=dtype, name=model_type, model_options=model_options, clip_model=model)
return Flux2TEModel_

View File

@@ -32,7 +32,7 @@ def mochi_te(dtype_t5=None, t5_quantization_metadata=None):
if t5_quantization_metadata is not None:
model_options = model_options.copy()
model_options["t5xxl_quantization_metadata"] = t5_quantization_metadata
if dtype is None:
if dtype_t5 is not None:
dtype = dtype_t5
super().__init__(device=device, dtype=dtype, model_options=model_options)
return MochiTEModel_

View File

@@ -10,9 +10,11 @@ import comfy.utils
def llama_detect(state_dict, prefix=""):
out = {}
t5_key = "{}model.norm.weight".format(prefix)
if t5_key in state_dict:
out["dtype_llama"] = state_dict[t5_key].dtype
norm_keys = ["{}model.norm.weight".format(prefix), "{}model.layers.0.input_layernorm.weight".format(prefix)]
for norm_key in norm_keys:
if norm_key in state_dict:
out["dtype_llama"] = state_dict[norm_key].dtype
break
quant = comfy.utils.detect_layer_quantization(state_dict, prefix)
if quant is not None:

View File

@@ -77,6 +77,28 @@ class Qwen25_3BConfig:
rope_scale = None
final_norm: bool = True
@dataclass
class Qwen3_06BConfig:
vocab_size: int = 151936
hidden_size: int = 1024
intermediate_size: int = 3072
num_hidden_layers: int = 28
num_attention_heads: int = 16
num_key_value_heads: int = 8
max_position_embeddings: int = 32768
rms_norm_eps: float = 1e-6
rope_theta: float = 1000000.0
transformer_type: str = "llama"
head_dim = 128
rms_norm_add = False
mlp_activation = "silu"
qkv_bias = False
rope_dims = None
q_norm = "gemma3"
k_norm = "gemma3"
rope_scale = None
final_norm: bool = True
@dataclass
class Qwen3_4BConfig:
vocab_size: int = 151936
@@ -99,6 +121,28 @@ class Qwen3_4BConfig:
rope_scale = None
final_norm: bool = True
@dataclass
class Qwen3_8BConfig:
vocab_size: int = 151936
hidden_size: int = 4096
intermediate_size: int = 12288
num_hidden_layers: int = 36
num_attention_heads: int = 32
num_key_value_heads: int = 8
max_position_embeddings: int = 40960
rms_norm_eps: float = 1e-6
rope_theta: float = 1000000.0
transformer_type: str = "llama"
head_dim = 128
rms_norm_add = False
mlp_activation = "silu"
qkv_bias = False
rope_dims = None
q_norm = "gemma3"
k_norm = "gemma3"
rope_scale = None
final_norm: bool = True
@dataclass
class Ovis25_2BConfig:
vocab_size: int = 151936
@@ -619,6 +663,15 @@ class Qwen25_3B(BaseLlama, torch.nn.Module):
self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
self.dtype = dtype
class Qwen3_06B(BaseLlama, torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()
config = Qwen3_06BConfig(**config_dict)
self.num_layers = config.num_hidden_layers
self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
self.dtype = dtype
class Qwen3_4B(BaseLlama, torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()
@@ -628,6 +681,15 @@ class Qwen3_4B(BaseLlama, torch.nn.Module):
self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
self.dtype = dtype
class Qwen3_8B(BaseLlama, torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()
config = Qwen3_8BConfig(**config_dict)
self.num_layers = config.num_hidden_layers
self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
self.dtype = dtype
class Ovis25_2B(BaseLlama, torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()

View File

@@ -119,7 +119,17 @@ class LTXAVTEModel(torch.nn.Module):
if len(sdo) == 0:
sdo = sd
return self.load_state_dict(sdo, strict=False)
missing_all = []
unexpected_all = []
for prefix, component in [("text_embedding_projection.", self.text_embedding_projection), ("video_embeddings_connector.", self.video_embeddings_connector), ("audio_embeddings_connector.", self.audio_embeddings_connector)]:
component_sd = {k.replace(prefix, ""): v for k, v in sdo.items() if k.startswith(prefix)}
if component_sd:
missing, unexpected = component.load_state_dict(component_sd, strict=False)
missing_all.extend([f"{prefix}{k}" for k in missing])
unexpected_all.extend([f"{prefix}{k}" for k in unexpected])
return (missing_all, unexpected_all)
def memory_estimation_function(self, token_weight_pairs, device=None):
constant = 6.0

View File

@@ -61,6 +61,7 @@ def te(dtype_llama=None, llama_quantization_metadata=None):
if dtype_llama is not None:
dtype = dtype_llama
if llama_quantization_metadata is not None:
model_options = model_options.copy()
model_options["quantization_metadata"] = llama_quantization_metadata
super().__init__(device=device, dtype=dtype, model_options=model_options)
return OvisTEModel_

View File

@@ -36,7 +36,7 @@ def pixart_te(dtype_t5=None, t5_quantization_metadata=None):
if t5_quantization_metadata is not None:
model_options = model_options.copy()
model_options["t5xxl_quantization_metadata"] = t5_quantization_metadata
if dtype is None:
if dtype_t5 is not None:
dtype = dtype_t5
super().__init__(device=device, dtype=dtype, model_options=model_options)
return PixArtTEModel_

View File

@@ -40,6 +40,7 @@ def te(dtype_llama=None, llama_quantization_metadata=None):
if dtype_llama is not None:
dtype = dtype_llama
if llama_quantization_metadata is not None:
model_options = model_options.copy()
model_options["quantization_metadata"] = llama_quantization_metadata
super().__init__(device=device, dtype=dtype, model_options=model_options)
return ZImageTEModel_
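
The added model_options.copy() here (and in the ovis encoder above) fixes a dict-aliasing leak: without the copy, writing quantization_metadata mutated the caller's shared options dict and every encoder built from it. A minimal demonstration with hypothetical builders:

# Hedged sketch of the aliasing bug the copy() fixes; both builders and
# the `defaults` dict are illustrative.
defaults = {}

def build_without_copy(model_options=defaults):
    model_options["quantization_metadata"] = "..."  # mutates the shared dict!
    return model_options

def build_with_copy(model_options=defaults):
    model_options = model_options.copy()            # caller's dict unchanged
    model_options["quantization_metadata"] = "..."
    return model_options

build_with_copy()
assert "quantization_metadata" not in defaults  # fixed behavior
build_without_copy()
assert "quantization_metadata" in defaults      # the old leak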

View File

@@ -30,6 +30,7 @@ from torch.nn.functional import interpolate
from einops import rearrange
from comfy.cli_args import args
import json
import time
MMAP_TORCH_FILES = args.mmap_torch_files
DISABLE_MMAP = args.disable_mmap
@@ -610,6 +611,14 @@ def flux_to_diffusers(mmdit_config, output_prefix=""):
"ff_context.net.0.proj.bias": "txt_mlp.0.bias",
"ff_context.net.2.weight": "txt_mlp.2.weight",
"ff_context.net.2.bias": "txt_mlp.2.bias",
"ff.linear_in.weight": "img_mlp.0.weight", # LyCoris LoKr
"ff.linear_in.bias": "img_mlp.0.bias",
"ff.linear_out.weight": "img_mlp.2.weight",
"ff.linear_out.bias": "img_mlp.2.bias",
"ff_context.linear_in.weight": "txt_mlp.0.weight",
"ff_context.linear_in.bias": "txt_mlp.0.bias",
"ff_context.linear_out.weight": "txt_mlp.2.weight",
"ff_context.linear_out.bias": "txt_mlp.2.bias",
"attn.norm_q.weight": "img_attn.norm.query_norm.scale",
"attn.norm_k.weight": "img_attn.norm.key_norm.scale",
"attn.norm_added_q.weight": "txt_attn.norm.query_norm.scale",
@@ -638,6 +647,8 @@ def flux_to_diffusers(mmdit_config, output_prefix=""):
"proj_out.bias": "linear2.bias",
"attn.norm_q.weight": "norm.query_norm.scale",
"attn.norm_k.weight": "norm.key_norm.scale",
"attn.to_qkv_mlp_proj.weight": "linear1.weight", # Flux 2
"attn.to_out.weight": "linear2.weight", # Flux 2
}
for k in block_map:
@@ -928,7 +939,9 @@ def bislerp(samples, width, height):
return result.to(orig_dtype)
def lanczos(samples, width, height):
images = [Image.fromarray(np.clip(255. * image.movedim(0, -1).cpu().numpy(), 0, 255).astype(np.uint8)) for image in samples]
# The Image.fromarray API below is strict and expects grayscale input to be squeezed
samples = samples.squeeze(1) if samples.shape[1] == 1 else samples.movedim(1, -1)
images = [Image.fromarray(np.clip(255. * image.cpu().numpy(), 0, 255).astype(np.uint8)) for image in samples]
images = [image.resize((width, height), resample=Image.Resampling.LANCZOS) for image in images]
images = [torch.from_numpy(np.array(image).astype(np.float32) / 255.0).movedim(-1, 0) for image in images]
result = torch.stack(images)
@@ -1097,6 +1110,10 @@ def set_progress_bar_global_hook(function):
global PROGRESS_BAR_HOOK
PROGRESS_BAR_HOOK = function
# Throttle settings for progress bar updates to reduce WebSocket flooding
PROGRESS_THROTTLE_MIN_INTERVAL = 0.1 # 100ms minimum between updates
PROGRESS_THROTTLE_MIN_PERCENT = 0.5 # 0.5% minimum progress change
class ProgressBar:
def __init__(self, total, node_id=None):
global PROGRESS_BAR_HOOK
@@ -1104,6 +1121,8 @@ class ProgressBar:
self.current = 0
self.hook = PROGRESS_BAR_HOOK
self.node_id = node_id
self._last_update_time = 0.0
self._last_sent_value = -1
def update_absolute(self, value, total=None, preview=None):
if total is not None:
@@ -1112,7 +1131,29 @@ class ProgressBar:
value = self.total
self.current = value
if self.hook is not None:
self.hook(self.current, self.total, preview, node_id=self.node_id)
current_time = time.perf_counter()
is_first = (self._last_sent_value < 0)
is_final = (value >= self.total)
has_preview = (preview is not None)
# Always send immediately for previews, first update, or final update
if has_preview or is_first or is_final:
self.hook(self.current, self.total, preview, node_id=self.node_id)
self._last_update_time = current_time
self._last_sent_value = value
return
# Apply throttling for regular progress updates
if self.total > 0:
percent_changed = ((value - max(0, self._last_sent_value)) / self.total) * 100
else:
percent_changed = 100
time_elapsed = current_time - self._last_update_time
if time_elapsed >= PROGRESS_THROTTLE_MIN_INTERVAL and percent_changed >= PROGRESS_THROTTLE_MIN_PERCENT:
self.hook(self.current, self.total, preview, node_id=self.node_id)
self._last_update_time = current_time
self._last_sent_value = value
def update(self, value):
self.update_absolute(self.current + value)
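
A hedged sketch of the throttling behavior: a counting hook shows rapid intermediate updates being coalesced while the first and final updates always pass (updates carrying previews would too):

# Hedged sketch: intermediate updates inside the 100ms / 0.5% window are
# dropped; first and final updates are always delivered.
import comfy.utils

sent = []
comfy.utils.set_progress_bar_global_hook(
    lambda cur, total, preview, node_id=None: sent.append(cur))

pbar = comfy.utils.ProgressBar(1000)
for _ in range(1000):
    pbar.update(1)  # tight loop: most of these land inside the throttle window
print(len(sent), sent[0], sent[-1])  # sparse middle, but 1 and 1000 are present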

View File

@@ -5,6 +5,11 @@ from .lokr import LoKrAdapter
from .glora import GLoRAAdapter
from .oft import OFTAdapter
from .boft import BOFTAdapter
from .bypass import (
BypassInjectionManager,
BypassForwardHook,
create_bypass_injections_from_patches,
)
adapters: list[type[WeightAdapterBase]] = [
@@ -31,4 +36,7 @@ __all__ = [
"WeightAdapterTrainBase",
"adapters",
"adapter_maps",
"BypassInjectionManager",
"BypassForwardHook",
"create_bypass_injections_from_patches",
] + [a.__name__ for a in adapters]

View File

@@ -1,4 +1,4 @@
from typing import Optional
from typing import Callable, Optional
import torch
import torch.nn as nn
@@ -7,12 +7,35 @@ import comfy.model_management
class WeightAdapterBase:
"""
Base class for weight adapters (LoRA, LoHa, LoKr, OFT, etc.)
Bypass Mode:
All adapters follow the pattern: bypass(f)(x) = g(f(x) + h(x))
- h(x): Additive component (LoRA path). Returns delta to add to base output.
- g(y): Output transformation. Applied after base + h(x).
For LoRA/LoHa/LoKr: g = identity, h = adapter(x)
For OFT/BOFT: g = transform, h = 0
"""
name: str
loaded_keys: set[str]
weights: list[torch.Tensor]
# Attributes set by bypass system
multiplier: float = 1.0
shape: tuple = None # (out_features, in_features) or (out_ch, in_ch, *kernel)
@classmethod
def load(cls, x: str, lora: dict[str, torch.Tensor], alpha: float, dora_scale: torch.Tensor) -> Optional["WeightAdapterBase"]:
def load(
cls,
x: str,
lora: dict[str, torch.Tensor],
alpha: float,
dora_scale: torch.Tensor,
) -> Optional["WeightAdapterBase"]:
raise NotImplementedError
def to_train(self) -> "WeightAdapterTrainBase":
@@ -39,18 +62,202 @@ class WeightAdapterBase:
):
raise NotImplementedError
# ===== Bypass Mode Methods =====
#
# IMPORTANT: Bypass mode is designed for quantized models where original weights
# may not be accessible in a usable format. Therefore, h() and bypass_forward()
# do NOT take org_weight as a parameter. All necessary information (out_channels,
# in_channels, conv params, etc.) is provided via attributes set by BypassForwardHook.
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
Additive bypass component: h(x, base_out)
Computes the adapter's contribution to be added to base forward output.
For adapters that only transform output (OFT/BOFT), returns zeros.
Note:
This method does NOT access original model weights. Bypass mode is
designed for quantized models where weights may not be in a usable format.
All shape info comes from module attributes set by BypassForwardHook.
Args:
x: Input tensor
base_out: Output from base forward f(x), can be used for shape reference
Returns:
Delta tensor to add to base output. Shape matches base output.
Reference: LyCORIS LoConModule.bypass_forward_diff
"""
# Default: no additive component (for OFT/BOFT)
# Simply return zeros matching base_out shape
return torch.zeros_like(base_out)
def g(self, y: torch.Tensor) -> torch.Tensor:
"""
Output transformation: g(y)
Applied after base forward + h(x). For most adapters this is identity.
OFT/BOFT override this to apply orthogonal transformation.
Args:
y: Combined output (base + h(x))
Returns:
Transformed output
Reference: LyCORIS OFTModule applies orthogonal transform here
"""
# Default: identity (for LoRA/LoHa/LoKr)
return y
def bypass_forward(
self,
org_forward: Callable,
x: torch.Tensor,
*args,
**kwargs,
) -> torch.Tensor:
"""
Full bypass forward: g(f(x) + h(x, f(x)))
Note:
This method does NOT take org_weight/org_bias parameters. Bypass mode
is designed for quantized models where weights may not be accessible.
The original forward function handles weight access internally.
Args:
org_forward: Original module forward function
x: Input tensor
*args, **kwargs: Additional arguments for org_forward
Returns:
Output with adapter applied in bypass mode
Reference: LyCORIS LoConModule.bypass_forward
"""
# Base forward: f(x)
base_out = org_forward(x, *args, **kwargs)
# Additive component: h(x, base_out) - base_out provided for shape reference
h_out = self.h(x, base_out)
# Output transformation: g(base + h)
return self.g(base_out + h_out)
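
A toy adapter implementing the h()/g() contract above for a Linear layer, to make the bypass(f)(x) = g(f(x) + h(x)) pattern concrete; the LoRA factors and shapes are illustrative, and real adapters receive their shape info from BypassForwardHook rather than the constructor:

# Hedged sketch of the bypass contract; ToyLoRA is illustrative only.
import torch

class ToyLoRA:
    multiplier = 1.0
    def __init__(self, down, up):
        self.down, self.up = down, up  # (r, in_features), (out_features, r)
    def h(self, x, base_out):
        # Additive LoRA path: x @ down^T @ up^T, scaled by the multiplier
        return self.multiplier * (x @ self.down.t() @ self.up.t())
    def g(self, y):
        return y  # identity for LoRA-style adapters
    def bypass_forward(self, org_forward, x):
        base_out = org_forward(x)                    # f(x)
        return self.g(base_out + self.h(x, base_out))  # g(f(x) + h(x))

lin = torch.nn.Linear(8, 4)
ada = ToyLoRA(torch.randn(2, 8) * 0.01, torch.randn(4, 2) * 0.01)
y = ada.bypass_forward(lin, torch.randn(3, 8))  # base weights never touched
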
class WeightAdapterTrainBase(nn.Module):
# We follow the scheme of PR #7032
"""
Base class for trainable weight adapters (LoRA, LoHa, LoKr, OFT, etc.)
Bypass Mode:
All adapters follow the pattern: bypass(f)(x) = g(f(x) + h(x))
- h(x): Additive component (LoRA path). Returns delta to add to base output.
- g(y): Output transformation. Applied after base + h(x).
For LoRA/LoHa/LoKr: g = identity, h = adapter(x)
For OFT: g = transform, h = 0
Note:
Unlike WeightAdapterBase, TrainBase classes have simplified weight formats
with fewer branches (e.g., LoKr only has w1/w2, not w1_a/w1_b decomposition).
We follow the scheme of PR #7032
"""
# Attributes set by bypass system (BypassForwardHook)
# These are set before h()/g()/bypass_forward() are called
multiplier: float = 1.0
is_conv: bool = False
conv_dim: int = 0 # 0=linear, 1=conv1d, 2=conv2d, 3=conv3d
kw_dict: dict = {} # Conv kwargs: stride, padding, dilation, groups
kernel_size: tuple = ()
in_channels: int = None
out_channels: int = None
def __init__(self):
super().__init__()
def __call__(self, w):
"""
w: The original weight tensor to be modified.
Weight modification mode: returns modified weight.
Args:
w: The original weight tensor to be modified.
Returns:
Modified weight tensor.
"""
raise NotImplementedError
# ===== Bypass Mode Methods =====
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
Additive bypass component: h(x, base_out)
Computes the adapter's contribution to be added to base forward output.
For adapters that only transform output (OFT), returns zeros.
Args:
x: Input tensor
base_out: Output from base forward f(x), can be used for shape reference
Returns:
Delta tensor to add to base output. Shape matches base output.
Subclasses should override this method.
"""
raise NotImplementedError(
f"{self.__class__.__name__}.h() not implemented. "
"Subclasses must implement h() for bypass mode."
)
def g(self, y: torch.Tensor) -> torch.Tensor:
"""
Output transformation: g(y)
Applied after base forward + h(x). For most adapters this is identity.
OFT overrides this to apply orthogonal transformation.
Args:
y: Combined output (base + h(x))
Returns:
Transformed output
"""
# Default: identity (for LoRA/LoHa/LoKr)
return y
def bypass_forward(
self,
org_forward: Callable,
x: torch.Tensor,
*args,
**kwargs,
) -> torch.Tensor:
"""
Full bypass forward: g(f(x) + h(x, f(x)))
Args:
org_forward: Original module forward function
x: Input tensor
*args, **kwargs: Additional arguments for org_forward
Returns:
Output with adapter applied in bypass mode
"""
# Base forward: f(x)
base_out = org_forward(x, *args, **kwargs)
# Additive component: h(x, base_out) - base_out provided for shape reference
h_out = self.h(x, base_out)
# Output transformation: g(base + h)
return self.g(base_out + h_out)
def passive_memory_usage(self):
raise NotImplementedError("passive_memory_usage is not implemented")
@@ -59,8 +266,12 @@ class WeightAdapterTrainBase(nn.Module):
return self.passive_memory_usage()
def weight_decompose(dora_scale, weight, lora_diff, alpha, strength, intermediate_dtype, function):
dora_scale = comfy.model_management.cast_to_device(dora_scale, weight.device, intermediate_dtype)
def weight_decompose(
dora_scale, weight, lora_diff, alpha, strength, intermediate_dtype, function
):
dora_scale = comfy.model_management.cast_to_device(
dora_scale, weight.device, intermediate_dtype
)
lora_diff *= alpha
weight_calc = weight + function(lora_diff).type(weight.dtype)
@@ -106,10 +317,14 @@ def pad_tensor_to_shape(tensor: torch.Tensor, new_shape: list[int]) -> torch.Ten
the original tensor will be truncated in that dimension.
"""
if any([new_shape[i] < tensor.shape[i] for i in range(len(new_shape))]):
raise ValueError("The new shape must be larger than the original tensor in all dimensions")
raise ValueError(
"The new shape must be larger than the original tensor in all dimensions"
)
if len(new_shape) != len(tensor.shape):
raise ValueError("The new shape must have the same number of dimensions as the original tensor")
raise ValueError(
"The new shape must have the same number of dimensions as the original tensor"
)
# Create a new tensor filled with zeros
padded_tensor = torch.zeros(new_shape, dtype=tensor.dtype, device=tensor.device)

View File

@@ -62,9 +62,13 @@ class BOFTAdapter(WeightAdapterBase):
alpha = v[2]
dora_scale = v[3]
blocks = comfy.model_management.cast_to_device(blocks, weight.device, intermediate_dtype)
blocks = comfy.model_management.cast_to_device(
blocks, weight.device, intermediate_dtype
)
if rescale is not None:
rescale = comfy.model_management.cast_to_device(rescale, weight.device, intermediate_dtype)
rescale = comfy.model_management.cast_to_device(
rescale, weight.device, intermediate_dtype
)
boft_m, block_num, boft_b, *_ = blocks.shape
@@ -74,7 +78,7 @@ class BOFTAdapter(WeightAdapterBase):
# for Q = -Q^T
q = blocks - blocks.transpose(-1, -2)
normed_q = q
if alpha > 0: # alpha in boft/bboft is for constraint
if alpha > 0: # alpha in boft/bboft is for constraint
q_norm = torch.norm(q) + 1e-8
if q_norm > alpha:
normed_q = q * alpha / q_norm
@@ -83,13 +87,13 @@ class BOFTAdapter(WeightAdapterBase):
r = r.to(weight)
inp = org = weight
r_b = boft_b//2
r_b = boft_b // 2
for i in range(boft_m):
bi = r[i]
g = 2
k = 2**i * r_b
if strength != 1:
bi = bi * strength + (1-strength) * I
bi = bi * strength + (1 - strength) * I
inp = (
inp.unflatten(0, (-1, g, k))
.transpose(1, 2)
@@ -98,18 +102,117 @@ class BOFTAdapter(WeightAdapterBase):
)
inp = torch.einsum("b i j, b j ...-> b i ...", bi, inp)
inp = (
inp.flatten(0, 1).unflatten(0, (-1, k, g)).transpose(1, 2).flatten(0, 2)
inp.flatten(0, 1)
.unflatten(0, (-1, k, g))
.transpose(1, 2)
.flatten(0, 2)
)
if rescale is not None:
inp = inp * rescale
lora_diff = inp - org
lora_diff = comfy.model_management.cast_to_device(lora_diff, weight.device, intermediate_dtype)
lora_diff = comfy.model_management.cast_to_device(
lora_diff, weight.device, intermediate_dtype
)
if dora_scale is not None:
weight = weight_decompose(dora_scale, weight, lora_diff, alpha, strength, intermediate_dtype, function)
weight = weight_decompose(
dora_scale,
weight,
lora_diff,
alpha,
strength,
intermediate_dtype,
function,
)
else:
weight += function((strength * lora_diff).type(weight.dtype))
except Exception as e:
logging.error("ERROR {} {} {}".format(self.name, key, e))
return weight
def _get_orthogonal_matrices(self, device, dtype):
"""Compute the orthogonal rotation matrices R from BOFT blocks."""
v = self.weights
blocks = v[0].to(device=device, dtype=dtype)
alpha = v[2]
if alpha is None:
alpha = 0
boft_m, block_num, boft_b, _ = blocks.shape
I = torch.eye(boft_b, device=device, dtype=dtype)
# Q = blocks - blocks^T (skew-symmetric)
q = blocks - blocks.transpose(-1, -2)
normed_q = q
# Apply constraint if alpha > 0
if alpha > 0:
q_norm = torch.norm(q) + 1e-8
if q_norm > alpha:
normed_q = q * alpha / q_norm
# Cayley transform: R = (I + Q)(I - Q)^-1
r = (I + normed_q) @ (I - normed_q).float().inverse()
return r, boft_m, boft_b
def g(self, y: torch.Tensor) -> torch.Tensor:
"""
Output transformation for BOFT: applies butterfly orthogonal transform.
BOFT uses multiple stages of butterfly-structured orthogonal transforms.
Reference: LyCORIS ButterflyOFTModule._bypass_forward
"""
v = self.weights
rescale = v[1]
r, boft_m, boft_b = self._get_orthogonal_matrices(y.device, y.dtype)
r_b = boft_b // 2
# Apply multiplier
multiplier = getattr(self, "multiplier", 1.0)
I = torch.eye(boft_b, device=y.device, dtype=y.dtype)
# Use module info from bypass injection to determine conv vs linear
is_conv = getattr(self, "is_conv", y.dim() > 2)
if is_conv:
# Conv output: (N, C, H, W, ...) -> transpose to (N, H, W, ..., C)
y = y.transpose(1, -1)
# Apply butterfly transform stages
inp = y
for i in range(boft_m):
bi = r[i] # (block_num, boft_b, boft_b)
g = 2
k = 2**i * r_b
# Interpolate with identity based on multiplier
if multiplier != 1:
bi = bi * multiplier + (1 - multiplier) * I
# Reshape for butterfly: unflatten last dim, transpose, flatten, unflatten
inp = (
inp.unflatten(-1, (-1, g, k))
.transpose(-2, -1)
.flatten(-3)
.unflatten(-1, (-1, boft_b))
)
# Apply block-diagonal orthogonal transform
inp = torch.einsum("b i j, ... b j -> ... b i", bi, inp)
# Reshape back
inp = (
inp.flatten(-2).unflatten(-1, (-1, k, g)).transpose(-2, -1).flatten(-3)
)
# Apply rescale if present
if rescale is not None:
rescale = rescale.to(device=y.device, dtype=y.dtype)
inp = inp * rescale.transpose(0, -1)
if is_conv:
# Transpose back: (N, H, W, ..., C) -> (N, C, H, W, ...)
inp = inp.transpose(1, -1)
return inp
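
# A minimal standalone sketch (illustrative, not from this patch) verifying the
# property the Cayley transform above relies on: a skew-symmetric Q yields an
# orthogonal R, so the transform rotates outputs without rescaling them.
import torch

blocks = torch.randn(4, 4, dtype=torch.float64)
q = blocks - blocks.T                         # skew-symmetric: q == -q.T
I = torch.eye(4, dtype=torch.float64)
r = (I + q) @ (I - q).inverse()               # Cayley transform
assert torch.allclose(r @ r.T, I, atol=1e-8)  # r is orthogonal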

View File

@@ -0,0 +1,437 @@
"""
Bypass mode implementation for weight adapters (LoRA, LoKr, LoHa, etc.)
Bypass mode applies adapters during forward pass without modifying base weights:
bypass(f)(x) = g(f(x) + h(x))
Where:
- f(x): Original layer forward
- h(x): Additive component from adapter (LoRA path)
- g(y): Output transformation (identity for most adapters)
This is useful for:
- Training with gradient checkpointing
- Avoiding weight modifications when weights are offloaded
- Supporting multiple adapters with different strengths dynamically
"""
import logging
from typing import Optional, Union
import torch
import torch.nn as nn
from .base import WeightAdapterBase, WeightAdapterTrainBase
from comfy.patcher_extension import PatcherInjection
# Type alias for adapters that support bypass mode
BypassAdapter = Union[WeightAdapterBase, WeightAdapterTrainBase]
def get_module_type_info(module: nn.Module) -> dict:
"""
Determine module type and extract conv parameters from module class.
This is more reliable than checking weight.ndim, especially for quantized layers
where weight shape might be different.
Returns:
dict with keys: is_conv, conv_dim, stride, padding, dilation, groups
"""
info = {
"is_conv": False,
"conv_dim": 0,
"stride": (1,),
"padding": (0,),
"dilation": (1,),
"groups": 1,
"kernel_size": (1,),
"in_channels": None,
"out_channels": None,
}
# Determine conv type
if isinstance(module, nn.Conv1d):
info["is_conv"] = True
info["conv_dim"] = 1
elif isinstance(module, nn.Conv2d):
info["is_conv"] = True
info["conv_dim"] = 2
elif isinstance(module, nn.Conv3d):
info["is_conv"] = True
info["conv_dim"] = 3
elif isinstance(module, nn.Linear):
info["is_conv"] = False
info["conv_dim"] = 0
else:
# Try to infer from class name for custom/quantized layers
class_name = type(module).__name__.lower()
if "conv3d" in class_name:
info["is_conv"] = True
info["conv_dim"] = 3
elif "conv2d" in class_name:
info["is_conv"] = True
info["conv_dim"] = 2
elif "conv1d" in class_name:
info["is_conv"] = True
info["conv_dim"] = 1
elif "conv" in class_name:
info["is_conv"] = True
info["conv_dim"] = 2
# Extract conv parameters if it's a conv layer
if info["is_conv"]:
# Try to get stride, padding, dilation, groups, kernel_size from module
info["stride"] = getattr(module, "stride", (1,) * info["conv_dim"])
info["padding"] = getattr(module, "padding", (0,) * info["conv_dim"])
info["dilation"] = getattr(module, "dilation", (1,) * info["conv_dim"])
info["groups"] = getattr(module, "groups", 1)
info["kernel_size"] = getattr(module, "kernel_size", (1,) * info["conv_dim"])
info["in_channels"] = getattr(module, "in_channels", None)
info["out_channels"] = getattr(module, "out_channels", None)
# Ensure they're tuples
if isinstance(info["stride"], int):
info["stride"] = (info["stride"],) * info["conv_dim"]
if isinstance(info["padding"], int):
info["padding"] = (info["padding"],) * info["conv_dim"]
if isinstance(info["dilation"], int):
info["dilation"] = (info["dilation"],) * info["conv_dim"]
if isinstance(info["kernel_size"], int):
info["kernel_size"] = (info["kernel_size"],) * info["conv_dim"]
return info
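
# A quick usage sketch (illustrative, not from this patch, assuming the helper
# above is in scope) showing what it reports for a standard conv layer:
import torch.nn as nn

info = get_module_type_info(nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1))
assert info["is_conv"] and info["conv_dim"] == 2
assert info["stride"] == (2, 2) and info["kernel_size"] == (3, 3)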
class BypassForwardHook:
"""
Hook that wraps a layer's forward to apply adapter in bypass mode.
Stores the original forward and replaces it with bypass version.
Supports both:
- WeightAdapterBase: Inference adapters (uses self.weights tuple)
- WeightAdapterTrainBase: Training adapters (nn.Module with parameters)
"""
def __init__(
self,
module: nn.Module,
adapter: BypassAdapter,
multiplier: float = 1.0,
):
self.module = module
self.adapter = adapter
self.multiplier = multiplier
self.original_forward = None
# Determine layer type and conv params from module class (works for quantized layers)
module_info = get_module_type_info(module)
# Set multiplier and layer type info on adapter for use in h()
adapter.multiplier = multiplier
adapter.is_conv = module_info["is_conv"]
adapter.conv_dim = module_info["conv_dim"]
adapter.kernel_size = module_info["kernel_size"]
adapter.in_channels = module_info["in_channels"]
adapter.out_channels = module_info["out_channels"]
# Store kw_dict for conv operations (like LyCORIS extra_args)
if module_info["is_conv"]:
adapter.kw_dict = {
"stride": module_info["stride"],
"padding": module_info["padding"],
"dilation": module_info["dilation"],
"groups": module_info["groups"],
}
else:
adapter.kw_dict = {}
def _bypass_forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
"""Bypass forward: uses adapter's bypass_forward or default g(f(x) + h(x))
Note:
Bypass mode does NOT access original model weights (org_weight).
This is intentional - bypass mode is designed for quantized models
where weights may not be in a usable format. All necessary shape
information is provided via adapter attributes set during inject().
"""
# Check if adapter has custom bypass_forward (e.g., GLoRA)
adapter_bypass = getattr(self.adapter, "bypass_forward", None)
if adapter_bypass is not None:
# Check if it's overridden (not the base class default)
# Need to check both base classes since adapter could be either type
adapter_type = type(self.adapter)
is_default_bypass = (
adapter_type.bypass_forward is WeightAdapterBase.bypass_forward
or adapter_type.bypass_forward is WeightAdapterTrainBase.bypass_forward
)
if not is_default_bypass:
return adapter_bypass(self.original_forward, x, *args, **kwargs)
# Default bypass: g(f(x) + h(x, f(x)))
base_out = self.original_forward(x, *args, **kwargs)
h_out = self.adapter.h(x, base_out)
return self.adapter.g(base_out + h_out)
def inject(self):
"""Replace module forward with bypass version."""
if self.original_forward is not None:
logging.debug(
f"[BypassHook] Already injected for {type(self.module).__name__}"
)
return # Already injected
# Move adapter weights to module's device to avoid CPU-GPU transfer on every forward
device = None
dtype = None
if hasattr(self.module, "weight") and self.module.weight is not None:
device = self.module.weight.device
dtype = self.module.weight.dtype
elif hasattr(self.module, "W_q"): # Quantized layers might use different attr
device = self.module.W_q.device
dtype = self.module.W_q.dtype
if device is not None:
self._move_adapter_weights_to_device(device, dtype)
self.original_forward = self.module.forward
self.module.forward = self._bypass_forward
logging.debug(
f"[BypassHook] Injected bypass forward for {type(self.module).__name__} (adapter={type(self.adapter).__name__})"
)
def _move_adapter_weights_to_device(self, device, dtype=None):
"""Move adapter weights to specified device to avoid per-forward transfers.
Handles both:
- WeightAdapterBase: has self.weights tuple of tensors
- WeightAdapterTrainBase: nn.Module with parameters, uses .to() method
"""
adapter = self.adapter
# Check if adapter is an nn.Module (WeightAdapterTrainBase)
if isinstance(adapter, nn.Module):
# In training mode we don't touch dtype as trainer will handle it
adapter.to(device=device)
logging.debug(
f"[BypassHook] Moved training adapter (nn.Module) to {device}"
)
return
# WeightAdapterBase: handle self.weights tuple
if not hasattr(adapter, "weights") or adapter.weights is None:
return
weights = adapter.weights
if isinstance(weights, (list, tuple)):
new_weights = []
for w in weights:
if isinstance(w, torch.Tensor):
if dtype is not None:
new_weights.append(w.to(device=device, dtype=dtype))
else:
new_weights.append(w.to(device=device))
else:
new_weights.append(w)
adapter.weights = (
tuple(new_weights) if isinstance(weights, tuple) else new_weights
)
elif isinstance(weights, torch.Tensor):
if dtype is not None:
adapter.weights = weights.to(device=device, dtype=dtype)
else:
adapter.weights = weights.to(device=device)
logging.debug(f"[BypassHook] Moved adapter weights to {device}")
def eject(self):
"""Restore original module forward."""
if self.original_forward is None:
logging.debug(f"[BypassHook] Not injected for {type(self.module).__name__}")
return # Not injected
self.module.forward = self.original_forward
self.original_forward = None
logging.debug(
f"[BypassHook] Ejected bypass forward for {type(self.module).__name__}"
)
class BypassInjectionManager:
"""
Manages bypass mode injection for a collection of adapters.
Creates PatcherInjection objects that can be used with ModelPatcher.
Supports both inference adapters (WeightAdapterBase) and training adapters
(WeightAdapterTrainBase).
Usage:
manager = BypassInjectionManager()
manager.add_adapter("model.layers.0.self_attn.q_proj", lora_adapter, strength=0.8)
manager.add_adapter("model.layers.0.self_attn.k_proj", lora_adapter, strength=0.8)
injections = manager.create_injections(model)
model_patcher.set_injections("bypass_lora", injections)
"""
def __init__(self):
self.adapters: dict[str, tuple[BypassAdapter, float]] = {}
self.hooks: list[BypassForwardHook] = []
def add_adapter(
self,
key: str,
adapter: BypassAdapter,
strength: float = 1.0,
):
"""
Add an adapter for a specific weight key.
Args:
key: Weight key (e.g., "model.layers.0.self_attn.q_proj.weight")
adapter: The weight adapter (LoRAAdapter, LoKrAdapter, etc.)
strength: Multiplier for adapter effect
"""
# Remove .weight suffix if present for module lookup
module_key = key
if module_key.endswith(".weight"):
module_key = module_key[:-7]
logging.debug(
f"[BypassManager] Stripped .weight suffix: {key} -> {module_key}"
)
self.adapters[module_key] = (adapter, strength)
logging.debug(
f"[BypassManager] Added adapter: {module_key} (type={type(adapter).__name__}, strength={strength})"
)
def clear_adapters(self):
"""Remove all adapters."""
self.adapters.clear()
def _get_module_by_key(self, model: nn.Module, key: str) -> Optional[nn.Module]:
"""Get a submodule by dot-separated key."""
parts = key.split(".")
module = model
try:
for i, part in enumerate(parts):
if part.isdigit():
module = module[int(part)]
else:
module = getattr(module, part)
logging.debug(
f"[BypassManager] Found module for key {key}: {type(module).__name__}"
)
return module
except (AttributeError, IndexError, KeyError) as e:
logging.error(f"[BypassManager] Failed to find module for key {key}: {e}")
logging.error(
f"[BypassManager] Failed at part index {i}, part={part}, current module type={type(module).__name__}"
)
return None
def create_injections(self, model: nn.Module) -> list[PatcherInjection]:
"""
Create PatcherInjection objects for all registered adapters.
Args:
model: The model to inject into (e.g., model_patcher.model)
Returns:
List of PatcherInjection objects to use with model_patcher.set_injections()
"""
self.hooks.clear()
logging.debug(
f"[BypassManager] create_injections called with {len(self.adapters)} adapters"
)
logging.debug(f"[BypassManager] Model type: {type(model).__name__}")
for key, (adapter, strength) in self.adapters.items():
logging.debug(f"[BypassManager] Looking for module: {key}")
module = self._get_module_by_key(model, key)
if module is None:
logging.warning(f"[BypassManager] Module not found for key {key}")
continue
if not hasattr(module, "weight"):
logging.warning(
f"[BypassManager] Module {key} has no weight attribute (type={type(module).__name__})"
)
continue
logging.debug(
f"[BypassManager] Creating hook for {key} (module type={type(module).__name__}, weight shape={module.weight.shape})"
)
hook = BypassForwardHook(module, adapter, multiplier=strength)
self.hooks.append(hook)
logging.debug(f"[BypassManager] Created {len(self.hooks)} hooks")
# Create single injection that manages all hooks
def inject_all(model_patcher):
logging.debug(
f"[BypassManager] inject_all called, injecting {len(self.hooks)} hooks"
)
for hook in self.hooks:
hook.inject()
logging.debug(
f"[BypassManager] Injected hook for {type(hook.module).__name__}"
)
def eject_all(model_patcher):
logging.debug(
f"[BypassManager] eject_all called, ejecting {len(self.hooks)} hooks"
)
for hook in self.hooks:
hook.eject()
return [PatcherInjection(inject=inject_all, eject=eject_all)]
def get_hook_count(self) -> int:
"""Return number of hooks that will be/are injected."""
return len(self.hooks)
def create_bypass_injections_from_patches(
model: nn.Module,
patches: dict,
strength: float = 1.0,
) -> list[PatcherInjection]:
"""
Convenience function to create bypass injections from a patches dict.
This is useful when you have patches in the format used by model_patcher.add_patches()
and want to apply them in bypass mode instead.
Args:
model: The model to inject into
patches: Dict mapping weight keys to adapter data
strength: Global strength multiplier
Returns:
List of PatcherInjection objects
"""
manager = BypassInjectionManager()
for key, patch_list in patches.items():
if not patch_list:
continue
# patches format: list of (strength_patch, patch_data, strength_model, offset, function)
for patch in patch_list:
patch_strength, patch_data, strength_model, offset, function = patch
# patch_data should be a WeightAdapterBase/WeightAdapterTrainBase or tuple
if isinstance(patch_data, (WeightAdapterBase, WeightAdapterTrainBase)):
adapter = patch_data
else:
# Skip non-adapter patches
continue
combined_strength = strength * patch_strength
manager.add_adapter(key, adapter, strength=combined_strength)
return manager.create_injections(model)
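
# A minimal sketch (illustrative, not from this patch) of the forward-swap
# pattern BypassForwardHook uses: patch the instance's forward, then restore it.
import torch
import torch.nn as nn

lin = nn.Linear(4, 4)
original_forward = lin.forward

def bypass_forward(x):
    return original_forward(x) * 2.0  # stands in for g(f(x) + h(x))

lin.forward = bypass_forward          # inject(): instance attr shadows the class method
y = lin(torch.randn(1, 4))            # routed through bypass_forward
lin.forward = original_forward        # eject(): original behavior restored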

View File

@@ -1,7 +1,8 @@
import logging
from typing import Optional
from typing import Callable, Optional
import torch
import torch.nn.functional as F
import comfy.model_management
from .base import WeightAdapterBase, weight_decompose
@@ -29,7 +30,14 @@ class GLoRAAdapter(WeightAdapterBase):
b1_name = "{}.b1.weight".format(x)
b2_name = "{}.b2.weight".format(x)
if a1_name in lora:
weights = (lora[a1_name], lora[a2_name], lora[b1_name], lora[b2_name], alpha, dora_scale)
weights = (
lora[a1_name],
lora[a2_name],
lora[b1_name],
lora[b2_name],
alpha,
dora_scale,
)
loaded_keys.add(a1_name)
loaded_keys.add(a2_name)
loaded_keys.add(b1_name)
@@ -58,16 +66,28 @@ class GLoRAAdapter(WeightAdapterBase):
old_glora = True
if v[3].shape[0] == v[2].shape[1] == v[0].shape[1] == v[1].shape[0]:
if old_glora and v[1].shape[0] == weight.shape[0] and weight.shape[0] == weight.shape[1]:
if (
old_glora
and v[1].shape[0] == weight.shape[0]
and weight.shape[0] == weight.shape[1]
):
pass
else:
old_glora = False
rank = v[1].shape[0]
a1 = comfy.model_management.cast_to_device(v[0].flatten(start_dim=1), weight.device, intermediate_dtype)
a2 = comfy.model_management.cast_to_device(v[1].flatten(start_dim=1), weight.device, intermediate_dtype)
b1 = comfy.model_management.cast_to_device(v[2].flatten(start_dim=1), weight.device, intermediate_dtype)
b2 = comfy.model_management.cast_to_device(v[3].flatten(start_dim=1), weight.device, intermediate_dtype)
a1 = comfy.model_management.cast_to_device(
v[0].flatten(start_dim=1), weight.device, intermediate_dtype
)
a2 = comfy.model_management.cast_to_device(
v[1].flatten(start_dim=1), weight.device, intermediate_dtype
)
b1 = comfy.model_management.cast_to_device(
v[2].flatten(start_dim=1), weight.device, intermediate_dtype
)
b2 = comfy.model_management.cast_to_device(
v[3].flatten(start_dim=1), weight.device, intermediate_dtype
)
if v[4] is not None:
alpha = v[4] / rank
@@ -76,18 +96,195 @@ class GLoRAAdapter(WeightAdapterBase):
try:
if old_glora:
lora_diff = (torch.mm(b2, b1) + torch.mm(torch.mm(weight.flatten(start_dim=1).to(dtype=intermediate_dtype), a2), a1)).reshape(weight.shape) #old lycoris glora
lora_diff = (
torch.mm(b2, b1)
+ torch.mm(
torch.mm(
weight.flatten(start_dim=1).to(dtype=intermediate_dtype), a2
),
a1,
)
).reshape(
weight.shape
) # old lycoris glora
else:
if weight.dim() > 2:
lora_diff = torch.einsum("o i ..., i j -> o j ...", torch.einsum("o i ..., i j -> o j ...", weight.to(dtype=intermediate_dtype), a1), a2).reshape(weight.shape)
lora_diff = torch.einsum(
"o i ..., i j -> o j ...",
torch.einsum(
"o i ..., i j -> o j ...",
weight.to(dtype=intermediate_dtype),
a1,
),
a2,
).reshape(weight.shape)
else:
lora_diff = torch.mm(torch.mm(weight.to(dtype=intermediate_dtype), a1), a2).reshape(weight.shape)
lora_diff = torch.mm(
torch.mm(weight.to(dtype=intermediate_dtype), a1), a2
).reshape(weight.shape)
lora_diff += torch.mm(b1, b2).reshape(weight.shape)
if dora_scale is not None:
weight = weight_decompose(dora_scale, weight, lora_diff, alpha, strength, intermediate_dtype, function)
weight = weight_decompose(
dora_scale,
weight,
lora_diff,
alpha,
strength,
intermediate_dtype,
function,
)
else:
weight += function(((strength * alpha) * lora_diff).type(weight.dtype))
except Exception as e:
logging.error("ERROR {} {} {}".format(self.name, key, e))
return weight
def _compute_paths(self, x: torch.Tensor):
"""
Compute A path and B path outputs for GLoRA bypass.
GLoRA: f(x) = Wx + WAx + Bx
- A path: a1(a2(x)) - modifies input to base forward
- B path: b1(b2(x)) - additive component
Note:
Does not access original model weights - bypass mode is designed
for quantized models where weights may not be accessible.
Returns: (a_out, b_out)
"""
v = self.weights
# v = (a1, a2, b1, b2, alpha, dora_scale)
a1 = v[0]
a2 = v[1]
b1 = v[2]
b2 = v[3]
alpha = v[4]
dtype = x.dtype
# Cast dtype (weights should already be on correct device from inject())
a1 = a1.to(dtype=dtype)
a2 = a2.to(dtype=dtype)
b1 = b1.to(dtype=dtype)
b2 = b2.to(dtype=dtype)
# Determine rank and scale
# Check for old vs new glora format
old_glora = False
if b2.shape[1] == b1.shape[0] == a1.shape[0] == a2.shape[1]:
rank = a1.shape[0]
old_glora = True
if b2.shape[0] == b1.shape[1] == a1.shape[1] == a2.shape[0]:
# The square-weight check from calculate_weight can't be reproduced here
# (bypass mode has no access to the base weight), so only the input dim is compared.
if old_glora and a2.shape[0] == x.shape[-1]:
pass
else:
old_glora = False
rank = a2.shape[0]
if alpha is not None:
scale = alpha / rank
else:
scale = 1.0
# Apply multiplier
multiplier = getattr(self, "multiplier", 1.0)
scale = scale * multiplier
# Use module info from bypass injection, not input tensor shape
is_conv = getattr(self, "is_conv", False)
conv_dim = getattr(self, "conv_dim", 0)
kw_dict = getattr(self, "kw_dict", {})
if is_conv:
# Conv case - conv_dim is 1/2/3 for conv1d/2d/3d
conv_fn = (F.conv1d, F.conv2d, F.conv3d)[conv_dim - 1]
# Get module's stride/padding for spatial dimension handling
module_stride = kw_dict.get("stride", (1,) * conv_dim)
module_padding = kw_dict.get("padding", (0,) * conv_dim)
kernel_size = getattr(self, "kernel_size", (1,) * conv_dim)
in_channels = getattr(self, "in_channels", None)
# Ensure weights are in conv shape
# a1, a2, b1 are always 1x1 kernels
if a1.ndim == 2:
a1 = a1.view(*a1.shape, *([1] * conv_dim))
if a2.ndim == 2:
a2 = a2.view(*a2.shape, *([1] * conv_dim))
if b1.ndim == 2:
b1 = b1.view(*b1.shape, *([1] * conv_dim))
# b2 has actual kernel_size (like LoRA down)
if b2.ndim == 2:
if in_channels is not None:
b2 = b2.view(b2.shape[0], in_channels, *kernel_size)
else:
b2 = b2.view(*b2.shape, *([1] * conv_dim))
# A path: a2(x) -> a1(...) - 1x1 convs, no stride/padding needed, a_out is added to x
a2_out = conv_fn(x, a2)
a_out = conv_fn(a2_out, a1) * scale
# B path: b2(x) with kernel/stride/padding -> b1(...) 1x1
b2_out = conv_fn(x, b2, stride=module_stride, padding=module_padding)
b_out = conv_fn(b2_out, b1) * scale
else:
# Linear case
if old_glora:
# Old format: a1 @ a2 @ x, b2 @ b1
a_out = F.linear(F.linear(x, a2), a1) * scale
b_out = F.linear(F.linear(x, b1), b2) * scale
else:
# New format: x @ a1 @ a2, b1 @ b2
a_out = F.linear(F.linear(x, a1), a2) * scale
b_out = F.linear(F.linear(x, b2), b1) * scale
return a_out, b_out
def bypass_forward(
self,
org_forward: Callable,
x: torch.Tensor,
*args,
**kwargs,
) -> torch.Tensor:
"""
GLoRA bypass forward: f(x + a(x)) + b(x)
Unlike standard adapters, GLoRA modifies the input to the base forward
AND adds the B path output.
Note:
Does not access original model weights - bypass mode is designed
for quantized models where weights may not be accessible.
Reference: LyCORIS GLoRAModule._bypass_forward
"""
a_out, b_out = self._compute_paths(x)
# Call base forward with modified input
base_out = org_forward(x + a_out, *args, **kwargs)
# Add B path
return base_out + b_out
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
For GLoRA, h() returns the B path output.
Note:
GLoRA's full bypass requires overriding bypass_forward() since
it also modifies the input to org_forward. This h() is provided for
compatibility but bypass_forward() should be used for correct behavior.
Does not access original model weights - bypass mode is designed
for quantized models where weights may not be accessible.
Args:
x: Input tensor
base_out: Output from base forward (unused, for API consistency)
"""
_, b_out = self._compute_paths(x)
return b_out
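
# A structural sketch (illustrative, not from this patch) of the GLoRA bypass
# contract: unlike the default g(f(x) + h(x)), the A path perturbs the *input*
# of the base layer. `a` and `b` stand in for the scaled a1/a2 and b1/b2 chains.
import torch
import torch.nn as nn

f = nn.Linear(8, 8)
a = nn.Linear(8, 8, bias=False)  # plays the role of scale * a1(a2(x))
b = nn.Linear(8, 8, bias=False)  # plays the role of scale * b1(b2(x))

x = torch.randn(2, 8)
out = f(x + a(x)) + b(x)         # GLoRA: f(x + A x) + B x
assert out.shape == (2, 8)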

View File

@@ -1,11 +1,22 @@
import logging
from functools import cache
from typing import Optional
import torch
import torch.nn.functional as F
import comfy.model_management
from .base import WeightAdapterBase, WeightAdapterTrainBase, weight_decompose
@cache
def _warn_loha_bypass_inefficient():
"""One-time warning about LoHa bypass inefficiency."""
logging.warning(
"LoHa bypass mode is inefficient: full weight diff is computed each forward pass. "
"Consider using LoRA or LoKr for training with bypass mode."
)
class HadaWeight(torch.autograd.Function):
@staticmethod
def forward(ctx, w1u, w1d, w2u, w2d, scale=torch.tensor(1)):
@@ -105,9 +116,19 @@ class LohaDiff(WeightAdapterTrainBase):
scale = self.alpha / self.rank
if self.use_tucker:
diff_weight = HadaWeightTucker.apply(self.hada_t1, self.hada_w1_a, self.hada_w1_b, self.hada_t2, self.hada_w2_a, self.hada_w2_b, scale)
diff_weight = HadaWeightTucker.apply(
self.hada_t1,
self.hada_w1_a,
self.hada_w1_b,
self.hada_t2,
self.hada_w2_a,
self.hada_w2_b,
scale,
)
else:
diff_weight = HadaWeight.apply(self.hada_w1_a, self.hada_w1_b, self.hada_w2_a, self.hada_w2_b, scale)
diff_weight = HadaWeight.apply(
self.hada_w1_a, self.hada_w1_b, self.hada_w2_a, self.hada_w2_b, scale
)
# Add the scaled difference to the original weight
weight = w.to(diff_weight) + diff_weight.reshape(w.shape)
@@ -138,9 +159,7 @@ class LoHaAdapter(WeightAdapterBase):
mat4 = torch.empty(rank, in_dim, device=weight.device, dtype=torch.float32)
torch.nn.init.normal_(mat3, 0.1)
torch.nn.init.normal_(mat4, 0.01)
return LohaDiff(
(mat1, mat2, alpha, mat3, mat4, None, None, None)
)
return LohaDiff((mat1, mat2, alpha, mat3, mat4, None, None, None))
def to_train(self):
return LohaDiff(self.weights)
@@ -172,7 +191,16 @@ class LoHaAdapter(WeightAdapterBase):
loaded_keys.add(hada_t1_name)
loaded_keys.add(hada_t2_name)
weights = (lora[hada_w1_a_name], lora[hada_w1_b_name], alpha, lora[hada_w2_a_name], lora[hada_w2_b_name], hada_t1, hada_t2, dora_scale)
weights = (
lora[hada_w1_a_name],
lora[hada_w1_b_name],
alpha,
lora[hada_w2_a_name],
lora[hada_w2_b_name],
hada_t1,
hada_t2,
dora_scale,
)
loaded_keys.add(hada_w1_a_name)
loaded_keys.add(hada_w1_b_name)
loaded_keys.add(hada_w2_a_name)
@@ -203,30 +231,148 @@ class LoHaAdapter(WeightAdapterBase):
w2a = v[3]
w2b = v[4]
dora_scale = v[7]
if v[5] is not None: #cp decomposition
if v[5] is not None: # cp decomposition
t1 = v[5]
t2 = v[6]
m1 = torch.einsum('i j k l, j r, i p -> p r k l',
comfy.model_management.cast_to_device(t1, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w1b, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w1a, weight.device, intermediate_dtype))
m1 = torch.einsum(
"i j k l, j r, i p -> p r k l",
comfy.model_management.cast_to_device(
t1, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w1b, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w1a, weight.device, intermediate_dtype
),
)
m2 = torch.einsum('i j k l, j r, i p -> p r k l',
comfy.model_management.cast_to_device(t2, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w2b, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w2a, weight.device, intermediate_dtype))
m2 = torch.einsum(
"i j k l, j r, i p -> p r k l",
comfy.model_management.cast_to_device(
t2, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w2b, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w2a, weight.device, intermediate_dtype
),
)
else:
m1 = torch.mm(comfy.model_management.cast_to_device(w1a, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w1b, weight.device, intermediate_dtype))
m2 = torch.mm(comfy.model_management.cast_to_device(w2a, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w2b, weight.device, intermediate_dtype))
m1 = torch.mm(
comfy.model_management.cast_to_device(
w1a, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w1b, weight.device, intermediate_dtype
),
)
m2 = torch.mm(
comfy.model_management.cast_to_device(
w2a, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w2b, weight.device, intermediate_dtype
),
)
try:
lora_diff = (m1 * m2).reshape(weight.shape)
if dora_scale is not None:
weight = weight_decompose(dora_scale, weight, lora_diff, alpha, strength, intermediate_dtype, function)
weight = weight_decompose(
dora_scale,
weight,
lora_diff,
alpha,
strength,
intermediate_dtype,
function,
)
else:
weight += function(((strength * alpha) * lora_diff).type(weight.dtype))
except Exception as e:
logging.error("ERROR {} {} {}".format(self.name, key, e))
return weight
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
Additive bypass component for LoHa: h(x) = diff_weight @ x
WARNING: Inefficient - computes full Hadamard product each forward.
Note:
Does not access original model weights - bypass mode is designed
for quantized models where weights may not be accessible.
Args:
x: Input tensor
base_out: Output from base forward (unused, for API consistency)
Reference: LyCORIS functional/loha.py bypass_forward_diff
"""
_warn_loha_bypass_inefficient()
# FUNC_LIST: [None, None, F.linear, F.conv1d, F.conv2d, F.conv3d]
FUNC_LIST = [None, None, F.linear, F.conv1d, F.conv2d, F.conv3d]
v = self.weights
# v[0]=w1a, v[1]=w1b, v[2]=alpha, v[3]=w2a, v[4]=w2b, v[5]=t1, v[6]=t2, v[7]=dora
w1a = v[0]
w1b = v[1]
alpha = v[2]
w2a = v[3]
w2b = v[4]
t1 = v[5]
t2 = v[6]
# Compute scale
rank = w1b.shape[0]
scale = (alpha / rank if alpha is not None else 1.0) * getattr(
self, "multiplier", 1.0
)
# Cast dtype
w1a = w1a.to(dtype=x.dtype)
w1b = w1b.to(dtype=x.dtype)
w2a = w2a.to(dtype=x.dtype)
w2b = w2b.to(dtype=x.dtype)
# Use module info from bypass injection, not weight dimension
is_conv = getattr(self, "is_conv", False)
conv_dim = getattr(self, "conv_dim", 0)
kw_dict = getattr(self, "kw_dict", {})
# Compute diff weight using Hadamard product
if t1 is not None and t2 is not None:
t1 = t1.to(dtype=x.dtype)
t2 = t2.to(dtype=x.dtype)
m1 = torch.einsum("i j k l, j r, i p -> p r k l", t1, w1b, w1a)
m2 = torch.einsum("i j k l, j r, i p -> p r k l", t2, w2b, w2a)
diff_weight = (m1 * m2) * scale
else:
m1 = w1a @ w1b
m2 = w2a @ w2b
diff_weight = (m1 * m2) * scale
if is_conv:
op = FUNC_LIST[conv_dim + 2]
kernel_size = getattr(self, "kernel_size", (1,) * conv_dim)
in_channels = getattr(self, "in_channels", None)
# Reshape 2D diff_weight to conv format using kernel_size
# diff_weight: [out_channels, in_channels * prod(kernel_size)] -> [out_channels, in_channels, *kernel_size]
if diff_weight.dim() == 2:
if in_channels is not None:
diff_weight = diff_weight.view(
diff_weight.shape[0], in_channels, *kernel_size
)
else:
diff_weight = diff_weight.view(
*diff_weight.shape, *([1] * conv_dim)
)
else:
op = F.linear
kw_dict = {}
return op(x, diff_weight, **kw_dict)
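
# A minimal sketch (illustrative, not from this patch) of why the warning above
# fires: LoHa's bypass materializes the full Hadamard-product diff weight on
# every call before applying it like a dense layer.
import torch
import torch.nn.functional as F

out_dim, in_dim, rank, alpha = 8, 8, 4, 4.0
w1a, w1b = torch.randn(out_dim, rank), torch.randn(rank, in_dim)
w2a, w2b = torch.randn(out_dim, rank), torch.randn(rank, in_dim)

diff_weight = (w1a @ w1b) * (w2a @ w2b) * (alpha / rank)  # full (out, in) matrix
h_out = F.linear(torch.randn(2, in_dim), diff_weight)     # rebuilt each forward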

View File

@@ -2,6 +2,7 @@ import logging
from typing import Optional
import torch
import torch.nn.functional as F
import comfy.model_management
from .base import (
WeightAdapterBase,
@@ -14,7 +15,17 @@ from .base import (
class LokrDiff(WeightAdapterTrainBase):
def __init__(self, weights):
super().__init__()
(lokr_w1, lokr_w2, alpha, lokr_w1_a, lokr_w1_b, lokr_w2_a, lokr_w2_b, lokr_t2, dora_scale) = weights
(
lokr_w1,
lokr_w2,
alpha,
lokr_w1_a,
lokr_w1_b,
lokr_w2_a,
lokr_w2_b,
lokr_t2,
dora_scale,
) = weights
self.use_tucker = False
if lokr_w1_a is not None:
_, rank_a = lokr_w1_a.shape[0], lokr_w1_a.shape[1]
@@ -57,10 +68,10 @@ class LokrDiff(WeightAdapterTrainBase):
if self.w2_rebuild:
if self.use_tucker:
w2 = torch.einsum(
'i j k l, j r, i p -> p r k l',
"i j k l, j r, i p -> p r k l",
self.lokr_t2,
self.lokr_w2_b,
self.lokr_w2_a
self.lokr_w2_a,
)
else:
w2 = self.lokr_w2_a @ self.lokr_w2_b
@@ -69,9 +80,89 @@ class LokrDiff(WeightAdapterTrainBase):
return self.lokr_w2
def __call__(self, w):
diff = torch.kron(self.w1, self.w2)
w1 = self.w1
w2 = self.w2
# Unsqueeze w1 to match w2 dims for proper kron product (like LyCORIS make_kron)
for _ in range(w2.dim() - w1.dim()):
w1 = w1.unsqueeze(-1)
diff = torch.kron(w1, w2)
return w + diff.reshape(w.shape).to(w)
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
Additive bypass component for LoKr training: efficient Kronecker product.
Uses w1/w2 properties which handle both direct and decomposed cases.
For create_train (direct w1/w2), no alpha scaling in properties.
For to_train (decomposed), alpha/rank scaling is in properties.
Args:
x: Input tensor
base_out: Output from base forward (unused, for API consistency)
"""
# Get w1, w2 from properties (handles rebuild vs direct)
w1 = self.w1
w2 = self.w2
# Multiplier from bypass injection
multiplier = getattr(self, "multiplier", 1.0)
# Get module info from bypass injection
is_conv = getattr(self, "is_conv", False)
conv_dim = getattr(self, "conv_dim", 0)
kw_dict = getattr(self, "kw_dict", {})
# Efficient Kronecker application without materializing full weight
# kron(w1, w2) @ x can be computed as nested operations
# w1: [out_l, in_m], w2: [out_k, in_n, *k_size]
# Full weight would be [out_l*out_k, in_m*in_n, *k_size]
uq = w1.size(1) # in_m - inner grouping dimension
if is_conv:
conv_fn = (F.conv1d, F.conv2d, F.conv3d)[conv_dim - 1]
B, C_in, *spatial = x.shape
# Reshape input for grouped application: [B * uq, C_in // uq, *spatial]
h_in_group = x.reshape(B * uq, -1, *spatial)
# Ensure w2 has conv dims
if w2.dim() == 2:
w2 = w2.view(*w2.shape, *([1] * conv_dim))
# Apply w2 path with stride/padding
hb = conv_fn(h_in_group, w2, **kw_dict)
# Reshape for cross-group operation
hb = hb.view(B, -1, *hb.shape[1:])
h_cross = hb.transpose(1, -1)
# Apply w1 (always 2D, applied as linear on channel dim)
hc = F.linear(h_cross, w1)
hc = hc.transpose(1, -1)
# Reshape to output
out = hc.reshape(B, -1, *hc.shape[3:])
else:
# Linear case
# Reshape input: [..., in_m * in_n] -> [..., uq (in_m), in_n]
h_in_group = x.reshape(*x.shape[:-1], uq, -1)
# Apply w2: [..., uq, in_n] @ [out_k, in_n].T -> [..., uq, out_k]
hb = F.linear(h_in_group, w2)
# Transpose for w1: [..., uq, out_k] -> [..., out_k, uq]
h_cross = hb.transpose(-1, -2)
# Apply w1: [..., out_k, uq] @ [out_l, uq].T -> [..., out_k, out_l]
hc = F.linear(h_cross, w1)
# Transpose back and flatten: [..., out_k, out_l] -> [..., out_l * out_k]
hc = hc.transpose(-1, -2)
out = hc.reshape(*hc.shape[:-2], -1)
return out * multiplier
def passive_memory_usage(self):
return sum(param.numel() * param.element_size() for param in self.parameters())
@@ -86,16 +177,22 @@ class LoKrAdapter(WeightAdapterBase):
@classmethod
def create_train(cls, weight, rank=1, alpha=1.0):
out_dim = weight.shape[0]
in_dim = weight.shape[1:].numel()
out1, out2 = factorization(out_dim, rank)
in1, in2 = factorization(in_dim, rank)
mat1 = torch.empty(out1, in1, device=weight.device, dtype=torch.float32)
mat2 = torch.empty(out2, in2, device=weight.device, dtype=torch.float32)
in_dim = weight.shape[1] # Just in_channels, not flattened with kernel
k_size = weight.shape[2:] if weight.dim() > 2 else ()
out_l, out_k = factorization(out_dim, rank)
in_m, in_n = factorization(in_dim, rank)
# w1: [out_l, in_m]
mat1 = torch.empty(out_l, in_m, device=weight.device, dtype=torch.float32)
# w2: [out_k, in_n, *k_size] for conv, [out_k, in_n] for linear
mat2 = torch.empty(
out_k, in_n, *k_size, device=weight.device, dtype=torch.float32
)
torch.nn.init.kaiming_uniform_(mat2, a=5**0.5)
torch.nn.init.constant_(mat1, 0.0)
return LokrDiff(
(mat1, mat2, alpha, None, None, None, None, None, None)
)
return LokrDiff((mat1, mat2, alpha, None, None, None, None, None, None))
def to_train(self):
return LokrDiff(self.weights)
@@ -154,8 +251,23 @@ class LoKrAdapter(WeightAdapterBase):
lokr_t2 = lora[lokr_t2_name]
loaded_keys.add(lokr_t2_name)
if (lokr_w1 is not None) or (lokr_w2 is not None) or (lokr_w1_a is not None) or (lokr_w2_a is not None):
weights = (lokr_w1, lokr_w2, alpha, lokr_w1_a, lokr_w1_b, lokr_w2_a, lokr_w2_b, lokr_t2, dora_scale)
if (
(lokr_w1 is not None)
or (lokr_w2 is not None)
or (lokr_w1_a is not None)
or (lokr_w2_a is not None)
):
weights = (
lokr_w1,
lokr_w2,
alpha,
lokr_w1_a,
lokr_w1_b,
lokr_w2_a,
lokr_w2_b,
lokr_t2,
dora_scale,
)
return cls(loaded_keys, weights)
else:
return None
@@ -184,23 +296,47 @@ class LoKrAdapter(WeightAdapterBase):
if w1 is None:
dim = w1_b.shape[0]
w1 = torch.mm(comfy.model_management.cast_to_device(w1_a, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w1_b, weight.device, intermediate_dtype))
w1 = torch.mm(
comfy.model_management.cast_to_device(
w1_a, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w1_b, weight.device, intermediate_dtype
),
)
else:
w1 = comfy.model_management.cast_to_device(w1, weight.device, intermediate_dtype)
w1 = comfy.model_management.cast_to_device(
w1, weight.device, intermediate_dtype
)
if w2 is None:
dim = w2_b.shape[0]
if t2 is None:
w2 = torch.mm(comfy.model_management.cast_to_device(w2_a, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w2_b, weight.device, intermediate_dtype))
w2 = torch.mm(
comfy.model_management.cast_to_device(
w2_a, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w2_b, weight.device, intermediate_dtype
),
)
else:
w2 = torch.einsum('i j k l, j r, i p -> p r k l',
comfy.model_management.cast_to_device(t2, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w2_b, weight.device, intermediate_dtype),
comfy.model_management.cast_to_device(w2_a, weight.device, intermediate_dtype))
w2 = torch.einsum(
"i j k l, j r, i p -> p r k l",
comfy.model_management.cast_to_device(
t2, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w2_b, weight.device, intermediate_dtype
),
comfy.model_management.cast_to_device(
w2_a, weight.device, intermediate_dtype
),
)
else:
w2 = comfy.model_management.cast_to_device(w2, weight.device, intermediate_dtype)
w2 = comfy.model_management.cast_to_device(
w2, weight.device, intermediate_dtype
)
if len(w2.shape) == 4:
w1 = w1.unsqueeze(2).unsqueeze(2)
@@ -212,9 +348,134 @@ class LoKrAdapter(WeightAdapterBase):
try:
lora_diff = torch.kron(w1, w2).reshape(weight.shape)
if dora_scale is not None:
weight = weight_decompose(dora_scale, weight, lora_diff, alpha, strength, intermediate_dtype, function)
weight = weight_decompose(
dora_scale,
weight,
lora_diff,
alpha,
strength,
intermediate_dtype,
function,
)
else:
weight += function(((strength * alpha) * lora_diff).type(weight.dtype))
except Exception as e:
logging.error("ERROR {} {} {}".format(self.name, key, e))
return weight
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
Additive bypass component for LoKr: efficient Kronecker product application.
Note:
Does not access original model weights - bypass mode is designed
for quantized models where weights may not be accessible.
Args:
x: Input tensor
base_out: Output from base forward (unused, for API consistency)
Reference: LyCORIS functional/lokr.py bypass_forward_diff
"""
# FUNC_LIST: [None, None, F.linear, F.conv1d, F.conv2d, F.conv3d]
FUNC_LIST = [None, None, F.linear, F.conv1d, F.conv2d, F.conv3d]
v = self.weights
# v[0]=w1, v[1]=w2, v[2]=alpha, v[3]=w1_a, v[4]=w1_b, v[5]=w2_a, v[6]=w2_b, v[7]=t2, v[8]=dora
w1 = v[0]
w2 = v[1]
alpha = v[2]
w1_a = v[3]
w1_b = v[4]
w2_a = v[5]
w2_b = v[6]
t2 = v[7]
use_w1 = w1 is not None
use_w2 = w2 is not None
tucker = t2 is not None
# Use module info from bypass injection, not weight dimension
is_conv = getattr(self, "is_conv", False)
conv_dim = getattr(self, "conv_dim", 0)
kw_dict = getattr(self, "kw_dict", {}) if is_conv else {}
if is_conv:
op = FUNC_LIST[conv_dim + 2]
else:
op = F.linear
# Determine rank and scale
# rank comes from whichever factor is decomposed; when both w1 and w2 are
# stored as full matrices, the scale below resolves to alpha / alpha == 1.0.
rank = w1_b.size(0) if not use_w1 else w2_b.size(0) if not use_w2 else alpha
scale = (alpha / rank if alpha is not None else 1.0) * getattr(
self, "multiplier", 1.0
)
# Build c (w1)
if use_w1:
c = w1.to(dtype=x.dtype)
else:
c = w1_a.to(dtype=x.dtype) @ w1_b.to(dtype=x.dtype)
uq = c.size(1)
# Build w2 components
if use_w2:
ba = w2.to(dtype=x.dtype)
else:
a = w2_b.to(dtype=x.dtype)
b = w2_a.to(dtype=x.dtype)
if is_conv:
if tucker:
# Tucker: a, b get 1s appended (kernel is in t2)
if a.dim() == 2:
a = a.view(*a.shape, *([1] * conv_dim))
if b.dim() == 2:
b = b.view(*b.shape, *([1] * conv_dim))
else:
# Non-tucker conv: b may need 1s appended
if b.dim() == 2:
b = b.view(*b.shape, *([1] * conv_dim))
# Reshape input by uq groups
if is_conv:
B, _, *rest = x.shape
h_in_group = x.reshape(B * uq, -1, *rest)
else:
h_in_group = x.reshape(*x.shape[:-1], uq, -1)
# Apply w2 path
if use_w2:
hb = op(h_in_group, ba, **kw_dict)
else:
if is_conv:
if tucker:
t = t2.to(dtype=x.dtype)
if t.dim() == 2:
t = t.view(*t.shape, *([1] * conv_dim))
ha = op(h_in_group, a)
ht = op(ha, t, **kw_dict)
hb = op(ht, b)
else:
ha = op(h_in_group, a, **kw_dict)
hb = op(ha, b)
else:
ha = op(h_in_group, a)
hb = op(ha, b)
# Reshape and apply c (w1)
if is_conv:
hb = hb.view(B, -1, *hb.shape[1:])
h_cross_group = hb.transpose(1, -1)
else:
h_cross_group = hb.transpose(-1, -2)
hc = F.linear(h_cross_group, c)
if is_conv:
hc = hc.transpose(1, -1)
out = hc.reshape(B, -1, *hc.shape[3:])
else:
hc = hc.transpose(-1, -2)
out = hc.reshape(*hc.shape[:-2], -1)
return out * scale
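
# A standalone sketch (illustrative, not from this patch) verifying the
# reshape/transpose trick above: applying kron(w1, w2) group-wise matches the
# materialized Kronecker weight, without ever building it.
import torch
import torch.nn.functional as F

w1 = torch.randn(3, 2)                        # (out_l, in_m)
w2 = torch.randn(4, 5)                        # (out_k, in_n)
x = torch.randn(7, 2 * 5)                     # last dim = in_m * in_n
uq = w1.shape[1]

h = F.linear(x.reshape(7, uq, -1), w2)        # w2 within each group
h = F.linear(h.transpose(-1, -2), w1)         # w1 across groups
out = h.transpose(-1, -2).reshape(7, -1)

ref = F.linear(x, torch.kron(w1, w2))         # materialized reference
assert torch.allclose(out, ref, atol=1e-5)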

View File

@@ -2,6 +2,7 @@ import logging
from typing import Optional
import torch
import torch.nn.functional as F
import comfy.model_management
from .base import (
WeightAdapterBase,
@@ -20,11 +21,7 @@ class LoraDiff(WeightAdapterTrainBase):
rank, in_dim = mat2.shape[0], mat2.shape[1]
if mid is not None:
convdim = mid.ndim - 2
layer = (
torch.nn.Conv1d,
torch.nn.Conv2d,
torch.nn.Conv3d
)[convdim]
layer = (torch.nn.Conv1d, torch.nn.Conv2d, torch.nn.Conv3d)[convdim]
else:
layer = torch.nn.Linear
self.lora_up = layer(rank, out_dim, bias=False)
@@ -51,6 +48,78 @@ class LoraDiff(WeightAdapterTrainBase):
weight = w + scale * diff.reshape(w.shape)
return weight.to(org_dtype)
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
Additive bypass component for LoRA training: h(x) = up(down(x)) * scale
Simple implementation using the nn.Module weights directly.
No dora/reshape branches (create_train doesn't create them); the mid
(tucker) layer is applied when present, e.g. for adapters built via to_train.
Args:
x: Input tensor
base_out: Output from base forward (unused, for API consistency)
"""
# Compute scale = alpha / rank * multiplier
scale = (self.alpha / self.rank) * getattr(self, "multiplier", 1.0)
# Get module info from bypass injection
is_conv = getattr(self, "is_conv", False)
conv_dim = getattr(self, "conv_dim", 0)
kw_dict = getattr(self, "kw_dict", {})
# Get weights (keep in original dtype for numerical stability)
down_weight = self.lora_down.weight
up_weight = self.lora_up.weight
if is_conv:
# Conv path: use functional conv
# conv_dim: 1=conv1d, 2=conv2d, 3=conv3d
conv_fn = (F.conv1d, F.conv2d, F.conv3d)[conv_dim - 1]
# Reshape 2D weights to conv format if needed
# down: [rank, in_features] -> [rank, in_channels, *kernel_size]
# up: [out_features, rank] -> [out_features, rank, 1, 1, ...]
if down_weight.dim() == 2:
kernel_size = getattr(self, "kernel_size", (1,) * conv_dim)
in_channels = getattr(self, "in_channels", None)
if in_channels is not None:
down_weight = down_weight.view(
down_weight.shape[0], in_channels, *kernel_size
)
else:
# Fallback: assume 1x1 kernel
down_weight = down_weight.view(
*down_weight.shape, *([1] * conv_dim)
)
if up_weight.dim() == 2:
# up always uses 1x1 kernel
up_weight = up_weight.view(*up_weight.shape, *([1] * conv_dim))
# down conv uses stride/padding from module, up is 1x1
hidden = conv_fn(x, down_weight, **kw_dict)
# mid layer if exists (tucker decomposition)
if self.lora_mid is not None:
mid_weight = self.lora_mid.weight
if mid_weight.dim() == 2:
mid_weight = mid_weight.view(*mid_weight.shape, *([1] * conv_dim))
hidden = conv_fn(hidden, mid_weight)
# up conv is always 1x1 (no stride/padding)
out = conv_fn(hidden, up_weight)
else:
# Linear path: simple matmul chain
hidden = F.linear(x, down_weight)
# mid layer if exists
if self.lora_mid is not None:
mid_weight = self.lora_mid.weight
hidden = F.linear(hidden, mid_weight)
out = F.linear(hidden, up_weight)
return out * scale
def passive_memory_usage(self):
return sum(param.numel() * param.element_size() for param in self.parameters())
@@ -70,9 +139,7 @@ class LoRAAdapter(WeightAdapterBase):
mat2 = torch.empty(rank, in_dim, device=weight.device, dtype=torch.float32)
torch.nn.init.kaiming_uniform_(mat1, a=5**0.5)
torch.nn.init.constant_(mat2, 0.0)
return LoraDiff(
(mat1, mat2, alpha, None, None, None)
)
return LoraDiff((mat1, mat2, alpha, None, None, None))
def to_train(self):
return LoraDiff(self.weights)
@@ -210,3 +277,85 @@ class LoRAAdapter(WeightAdapterBase):
except Exception as e:
logging.error("ERROR {} {} {}".format(self.name, key, e))
return weight
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
Additive bypass component for LoRA: h(x) = up(down(x)) * scale
Note:
Does not access original model weights - bypass mode is designed
for quantized models where weights may not be accessible.
Args:
x: Input tensor
base_out: Output from base forward (unused, for API consistency)
Reference: LyCORIS functional/locon.py bypass_forward_diff
"""
# FUNC_LIST: [None, None, F.linear, F.conv1d, F.conv2d, F.conv3d]
FUNC_LIST = [None, None, F.linear, F.conv1d, F.conv2d, F.conv3d]
v = self.weights
# v[0]=up, v[1]=down, v[2]=alpha, v[3]=mid, v[4]=dora_scale, v[5]=reshape
up = v[0]
down = v[1]
alpha = v[2]
mid = v[3]
# Compute scale = alpha / rank
rank = down.shape[0]
if alpha is not None:
scale = alpha / rank
else:
scale = 1.0
scale = scale * getattr(self, "multiplier", 1.0)
# Cast dtype
up = up.to(dtype=x.dtype)
down = down.to(dtype=x.dtype)
# Use module info from bypass injection, not weight dimension
is_conv = getattr(self, "is_conv", False)
conv_dim = getattr(self, "conv_dim", 0)
kw_dict = getattr(self, "kw_dict", {})
if is_conv:
op = FUNC_LIST[
conv_dim + 2
] # conv_dim 1->conv1d(3), 2->conv2d(4), 3->conv3d(5)
kernel_size = getattr(self, "kernel_size", (1,) * conv_dim)
in_channels = getattr(self, "in_channels", None)
# Reshape 2D weights to conv format using kernel_size
# down: [rank, in_channels * prod(kernel_size)] -> [rank, in_channels, *kernel_size]
# up: [out_channels, rank] -> [out_channels, rank, 1, 1, ...] (1x1 kernel)
if down.dim() == 2:
# down.shape[1] = in_channels * prod(kernel_size)
if in_channels is not None:
down = down.view(down.shape[0], in_channels, *kernel_size)
else:
# Fallback: assume 1x1 kernel if in_channels unknown
down = down.view(*down.shape, *([1] * conv_dim))
if up.dim() == 2:
# up always uses 1x1 kernel
up = up.view(*up.shape, *([1] * conv_dim))
if mid is not None:
mid = mid.to(dtype=x.dtype)
if mid.dim() == 2:
mid = mid.view(*mid.shape, *([1] * conv_dim))
else:
op = F.linear
kw_dict = {} # linear doesn't take stride/padding
# Simple chain: down -> mid (if tucker) -> up
if mid is not None:
if not is_conv:
mid = mid.to(dtype=x.dtype)
hidden = op(x, down)
hidden = op(hidden, mid, **kw_dict)
out = op(hidden, up)
else:
hidden = op(x, down, **kw_dict)
out = op(hidden, up)
return out * scale
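
# A minimal sketch (illustrative, not from this patch) checking the linear case
# of h() above: the bypass path up(down(x)) * scale matches patching the base
# weight with scale * (up @ down).
import torch
import torch.nn as nn
import torch.nn.functional as F

base = nn.Linear(16, 16, bias=False)
rank, alpha = 4, 4.0
down, up = torch.randn(rank, 16), torch.randn(16, rank)
scale = alpha / rank

x = torch.randn(2, 16)
bypass = base(x) + F.linear(F.linear(x, down), up) * scale
patched = F.linear(x, base.weight + scale * (up @ down))
assert torch.allclose(bypass, patched, atol=1e-5)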

View File

@@ -3,13 +3,18 @@ from typing import Optional
import torch
import comfy.model_management
from .base import WeightAdapterBase, WeightAdapterTrainBase, weight_decompose, factorization
from .base import (
WeightAdapterBase,
WeightAdapterTrainBase,
weight_decompose,
factorization,
)
class OFTDiff(WeightAdapterTrainBase):
def __init__(self, weights):
super().__init__()
# Unpack weights tuple from LoHaAdapter
# Unpack weights tuple from OFTAdapter
blocks, rescale, alpha, _ = weights
# Create trainable parameters
@@ -52,6 +57,78 @@ class OFTDiff(WeightAdapterTrainBase):
weight = self.rescale * weight
return weight.to(org_dtype)
def _get_orthogonal_matrix(self, device, dtype):
"""Compute the orthogonal rotation matrix R from OFT blocks."""
blocks = self.oft_blocks.to(device=device, dtype=dtype)
I = torch.eye(self.block_size, device=device, dtype=dtype)
# Q = blocks - blocks^T (skew-symmetric)
q = blocks - blocks.transpose(1, 2)
normed_q = q
# Apply constraint if set
if self.constraint:
q_norm = torch.norm(q) + 1e-8
if q_norm > self.constraint:
normed_q = q * self.constraint / q_norm
# Cayley transform: R = (I + Q)(I - Q)^-1
r = (I + normed_q) @ (I - normed_q).float().inverse()
return r.to(dtype)
def h(self, x: torch.Tensor, base_out: torch.Tensor) -> torch.Tensor:
"""
OFT has no additive component - returns zeros matching base_out shape.
OFT only transforms the output via g(), it doesn't add to it.
"""
return torch.zeros_like(base_out)
def g(self, y: torch.Tensor) -> torch.Tensor:
"""
Output transformation for OFT: applies orthogonal rotation.
OFT transforms output channels using block-diagonal orthogonal matrices.
"""
r = self._get_orthogonal_matrix(y.device, y.dtype)
# Apply multiplier to interpolate between identity and full transform
multiplier = getattr(self, "multiplier", 1.0)
I = torch.eye(self.block_size, device=y.device, dtype=y.dtype)
r = r * multiplier + (1 - multiplier) * I
# Use module info from bypass injection
is_conv = getattr(self, "is_conv", y.dim() > 2)
if is_conv:
# Conv output: (N, C, H, W, ...) -> transpose to (N, H, W, ..., C)
y = y.transpose(1, -1)
# y now has channels in last dim
*batch_shape, out_features = y.shape
# Reshape to apply block-diagonal transform
# (*, out_features) -> (*, block_num, block_size)
y_blocked = y.reshape(*batch_shape, self.block_num, self.block_size)
# Apply orthogonal transform: R @ y for each block
# r: (block_num, block_size, block_size), y_blocked: (*, block_num, block_size)
out_blocked = torch.einsum("k n m, ... k n -> ... k m", r, y_blocked)
# Reshape back: (*, block_num, block_size) -> (*, out_features)
out = out_blocked.reshape(*batch_shape, out_features)
# Apply rescale if present
if self.rescaled:
rescale = self.rescale.to(device=y.device, dtype=y.dtype)
out = out * rescale.view(-1)
if is_conv:
# Transpose back: (N, H, W, ..., C) -> (N, C, H, W, ...)
out = out.transpose(1, -1)
return out
def passive_memory_usage(self):
"""Calculates memory usage of the trainable parameters."""
return sum(param.numel() * param.element_size() for param in self.parameters())
@@ -68,10 +145,10 @@ class OFTAdapter(WeightAdapterBase):
def create_train(cls, weight, rank=1, alpha=1.0):
out_dim = weight.shape[0]
block_size, block_num = factorization(out_dim, rank)
block = torch.zeros(block_num, block_size, block_size, device=weight.device, dtype=torch.float32)
return OFTDiff(
(block, None, alpha, None)
block = torch.zeros(
block_num, block_size, block_size, device=weight.device, dtype=torch.float32
)
return OFTDiff((block, None, alpha, None))
def to_train(self):
return OFTDiff(self.weights)
@@ -127,9 +204,13 @@ class OFTAdapter(WeightAdapterBase):
alpha = 0
dora_scale = v[3]
blocks = comfy.model_management.cast_to_device(blocks, weight.device, intermediate_dtype)
blocks = comfy.model_management.cast_to_device(
blocks, weight.device, intermediate_dtype
)
if rescale is not None:
rescale = comfy.model_management.cast_to_device(rescale, weight.device, intermediate_dtype)
rescale = comfy.model_management.cast_to_device(
rescale, weight.device, intermediate_dtype
)
block_num, block_size, *_ = blocks.shape
@@ -139,23 +220,108 @@ class OFTAdapter(WeightAdapterBase):
# for Q = -Q^T
q = blocks - blocks.transpose(1, 2)
normed_q = q
if alpha > 0: # alpha in oft/boft is for constraint
if alpha > 0: # alpha in oft/boft is for constraint
q_norm = torch.norm(q) + 1e-8
if q_norm > alpha:
normed_q = q * alpha / q_norm
# use float() to prevent unsupported type in .inverse()
r = (I + normed_q) @ (I - normed_q).float().inverse()
r = r.to(weight)
# Create I in weight's dtype for the einsum
I_w = torch.eye(block_size, device=weight.device, dtype=weight.dtype)
_, *shape = weight.shape
lora_diff = torch.einsum(
"k n m, k n ... -> k m ...",
(r * strength) - strength * I,
(r * strength) - strength * I_w,
weight.view(block_num, block_size, *shape),
).view(-1, *shape)
if dora_scale is not None:
weight = weight_decompose(dora_scale, weight, lora_diff, alpha, strength, intermediate_dtype, function)
weight = weight_decompose(
dora_scale,
weight,
lora_diff,
alpha,
strength,
intermediate_dtype,
function,
)
else:
weight += function((strength * lora_diff).type(weight.dtype))
except Exception as e:
logging.error("ERROR {} {} {}".format(self.name, key, e))
return weight
def _get_orthogonal_matrix(self, device, dtype):
"""Compute the orthogonal rotation matrix R from OFT blocks."""
v = self.weights
blocks = v[0].to(device=device, dtype=dtype)
alpha = v[2]
if alpha is None:
alpha = 0
block_num, block_size, _ = blocks.shape
I = torch.eye(block_size, device=device, dtype=dtype)
# Q = blocks - blocks^T (skew-symmetric)
q = blocks - blocks.transpose(1, 2)
normed_q = q
# Apply constraint if alpha > 0
if alpha > 0:
q_norm = torch.norm(q) + 1e-8
if q_norm > alpha:
normed_q = q * alpha / q_norm
# Cayley transform: R = (I + Q)(I - Q)^-1
r = (I + normed_q) @ (I - normed_q).float().inverse()
return r, block_num, block_size
def g(self, y: torch.Tensor) -> torch.Tensor:
"""
Output transformation for OFT: applies orthogonal rotation to output.
OFT transforms the output channels using block-diagonal orthogonal matrices.
Reference: LyCORIS DiagOFTModule._bypass_forward
"""
v = self.weights
rescale = v[1]
r, block_num, block_size = self._get_orthogonal_matrix(y.device, y.dtype)
# Apply multiplier to interpolate between identity and full transform
multiplier = getattr(self, "multiplier", 1.0)
I = torch.eye(block_size, device=y.device, dtype=y.dtype)
r = r * multiplier + (1 - multiplier) * I
# Use module info from bypass injection to determine conv vs linear
is_conv = getattr(self, "is_conv", y.dim() > 2)
if is_conv:
# Conv output: (N, C, H, W, ...) -> transpose to (N, H, W, ..., C)
y = y.transpose(1, -1)
# y now has channels in last dim
*batch_shape, out_features = y.shape
# Reshape to apply block-diagonal transform
# (*, out_features) -> (*, block_num, block_size)
y_blocked = y.view(*batch_shape, block_num, block_size)
# Apply orthogonal transform: R @ y for each block
# r: (block_num, block_size, block_size), y_blocked: (*, block_num, block_size)
out_blocked = torch.einsum("k n m, ... k n -> ... k m", r, y_blocked)
# Reshape back: (*, block_num, block_size) -> (*, out_features)
out = out_blocked.view(*batch_shape, out_features)
# Apply rescale if present
if rescale is not None:
rescale = rescale.to(device=y.device, dtype=y.dtype)
out = out * rescale.view(-1)
if is_conv:
# Transpose back: (N, H, W, ..., C) -> (N, C, H, W, ...)
out = out.transpose(1, -1)
return out
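
# A standalone sketch (illustrative, not from this patch) verifying the einsum
# used above: it applies a block-diagonal matrix to the channel dim without
# materializing the full (out, out) rotation.
import torch

block_num, block_size = 3, 4
r = torch.randn(block_num, block_size, block_size)
y = torch.randn(5, block_num * block_size)

y_blocked = y.view(5, block_num, block_size)
out = torch.einsum("k n m, ... k n -> ... k m", r, y_blocked).reshape(5, -1)

ref = y @ torch.block_diag(*r)  # materialized block-diagonal reference
assert torch.allclose(out, ref, atol=1e-5)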

View File

@@ -374,7 +374,7 @@ class VideoFromComponents(VideoInput):
if audio_stream and self.__components.audio:
waveform = self.__components.audio['waveform']
waveform = waveform[:, :, :math.ceil((audio_sample_rate / frame_rate) * self.__components.images.shape[0])]
frame = av.AudioFrame.from_ndarray(waveform.movedim(2, 1).reshape(1, -1).float().numpy(), format='flt', layout='mono' if waveform.shape[1] == 1 else 'stereo')
frame = av.AudioFrame.from_ndarray(waveform.movedim(2, 1).reshape(1, -1).float().cpu().numpy(), format='flt', layout='mono' if waveform.shape[1] == 1 else 'stereo')
frame.sample_rate = audio_sample_rate
frame.pts = 0
output.mux(audio_stream.encode(frame))
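
# Context for the fix above (illustrative sketch, not from this patch):
# Tensor.numpy() raises a TypeError on CUDA tensors, so the waveform must be
# moved to host memory first; .cpu() is a no-op for tensors already on CPU.
import torch

t = torch.randn(4)
if torch.cuda.is_available():
    t = t.cuda()
arr = t.float().cpu().numpy()  # safe for both CPU and CUDA tensors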

View File

@@ -153,7 +153,7 @@ class Input(_IO_V3):
'''
Base class for a V3 Input.
'''
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None, extra_dict=None, raw_link: bool=None):
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__()
self.id = id
self.display_name = display_name
@@ -162,6 +162,7 @@ class Input(_IO_V3):
self.lazy = lazy
self.extra_dict = extra_dict if extra_dict is not None else {}
self.rawLink = raw_link
self.advanced = advanced
def as_dict(self):
return prune_dict({
@@ -170,6 +171,7 @@ class Input(_IO_V3):
"tooltip": self.tooltip,
"lazy": self.lazy,
"rawLink": self.rawLink,
"advanced": self.advanced,
}) | prune_dict(self.extra_dict)
def get_io_type(self):
@@ -184,8 +186,8 @@ class WidgetInput(Input):
'''
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None,
default: Any=None,
socketless: bool=None, widget_type: str=None, force_input: bool=None, extra_dict=None, raw_link: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, extra_dict, raw_link)
socketless: bool=None, widget_type: str=None, force_input: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, extra_dict, raw_link, advanced)
self.default = default
self.socketless = socketless
self.widget_type = widget_type
@@ -242,8 +244,8 @@ class Boolean(ComfyTypeIO):
'''Boolean input.'''
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None,
default: bool=None, label_on: str=None, label_off: str=None,
socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link)
socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link, advanced)
self.label_on = label_on
self.label_off = label_off
self.default: bool
@@ -262,8 +264,8 @@ class Int(ComfyTypeIO):
'''Integer input.'''
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None,
default: int=None, min: int=None, max: int=None, step: int=None, control_after_generate: bool=None,
display_mode: NumberDisplay=None, socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link)
display_mode: NumberDisplay=None, socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link, advanced)
self.min = min
self.max = max
self.step = step
@@ -288,8 +290,8 @@ class Float(ComfyTypeIO):
'''Float input.'''
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None,
default: float=None, min: float=None, max: float=None, step: float=None, round: float=None,
display_mode: NumberDisplay=None, socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link)
display_mode: NumberDisplay=None, socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link, advanced)
self.min = min
self.max = max
self.step = step
@@ -314,8 +316,8 @@ class String(ComfyTypeIO):
'''String input.'''
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None,
multiline=False, placeholder: str=None, default: str=None, dynamic_prompts: bool=None,
socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link)
socketless: bool=None, force_input: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, force_input, extra_dict, raw_link, advanced)
self.multiline = multiline
self.placeholder = placeholder
self.dynamic_prompts = dynamic_prompts
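All of the widget inputs above now accept the new `advanced` flag, which is serialized into the input's dict alongside `rawLink`. A minimal sketch of declaring one, assuming the usual `comfy_api.latest` `io` namespace; the parameter names are illustrative, and the frontend treatment (an advanced toggle) is an assumption, not stated in this diff:

```python
from comfy_api.latest import io

# Illustrative inputs: advanced=True is expected to let the frontend tuck a
# widget behind an "advanced" section instead of showing it by default.
inputs = [
    io.Int.Input("steps", default=20, min=1, max=100),
    io.Float.Input("eta", default=0.0, min=0.0, max=1.0, advanced=True),
    io.Boolean.Input("debug_trace", default=False, advanced=True),
]
```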
@@ -350,12 +352,13 @@ class Combo(ComfyTypeIO):
socketless: bool=None,
extra_dict=None,
raw_link: bool=None,
advanced: bool=None,
):
if isinstance(options, type) and issubclass(options, Enum):
options = [v.value for v in options]
if isinstance(default, Enum):
default = default.value
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, None, extra_dict, raw_link)
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, None, extra_dict, raw_link, advanced)
self.multiselect = False
self.options = options
self.control_after_generate = control_after_generate
@@ -387,8 +390,8 @@ class MultiCombo(ComfyTypeI):
class Input(Combo.Input):
def __init__(self, id: str, options: list[str], display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None,
default: list[str]=None, placeholder: str=None, chip: bool=None, control_after_generate: bool=None,
socketless: bool=None, extra_dict=None, raw_link: bool=None):
super().__init__(id, options, display_name, optional, tooltip, lazy, default, control_after_generate, socketless=socketless, extra_dict=extra_dict, raw_link=raw_link)
socketless: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__(id, options, display_name, optional, tooltip, lazy, default, control_after_generate, socketless=socketless, extra_dict=extra_dict, raw_link=raw_link, advanced=advanced)
self.multiselect = True
self.placeholder = placeholder
self.chip = chip
@@ -421,9 +424,9 @@ class Webcam(ComfyTypeIO):
Type = str
def __init__(
self, id: str, display_name: str=None, optional=False,
tooltip: str=None, lazy: bool=None, default: str=None, socketless: bool=None, extra_dict=None, raw_link: bool=None
tooltip: str=None, lazy: bool=None, default: str=None, socketless: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None
):
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, None, extra_dict, raw_link)
super().__init__(id, display_name, optional, tooltip, lazy, default, socketless, None, None, extra_dict, raw_link, advanced)
@comfytype(io_type="MASK")
@@ -751,7 +754,7 @@ class AnyType(ComfyTypeIO):
Type = Any
@comfytype(io_type="MODEL_PATCH")
class MODEL_PATCH(ComfyTypeIO):
class ModelPatch(ComfyTypeIO):
Type = Any
@comfytype(io_type="AUDIO_ENCODER")
@@ -776,7 +779,7 @@ class MultiType:
'''
Input that permits more than one input type; if `id` is an instance of `ComfyType.Input`, then that input will be used to create a widget (if applicable) with overridden values.
'''
def __init__(self, id: str | Input, types: list[type[_ComfyType] | _ComfyType], display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None, extra_dict=None, raw_link: bool=None):
def __init__(self, id: str | Input, types: list[type[_ComfyType] | _ComfyType], display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
# if id is an Input, then use that Input with overridden values
self.input_override = None
if isinstance(id, Input):
@@ -789,7 +792,7 @@ class MultiType:
# if is a widget input, make sure widget_type is set appropriately
if isinstance(self.input_override, WidgetInput):
self.input_override.widget_type = self.input_override.get_io_type()
super().__init__(id, display_name, optional, tooltip, lazy, extra_dict, raw_link)
super().__init__(id, display_name, optional, tooltip, lazy, extra_dict, raw_link, advanced)
self._io_types = types
@property
@@ -843,8 +846,8 @@ class MatchType(ComfyTypeIO):
class Input(Input):
def __init__(self, id: str, template: MatchType.Template,
display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None, extra_dict=None, raw_link: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, extra_dict, raw_link)
display_name: str=None, optional=False, tooltip: str=None, lazy: bool=None, extra_dict=None, raw_link: bool=None, advanced: bool=None):
super().__init__(id, display_name, optional, tooltip, lazy, extra_dict, raw_link, advanced)
self.template = template
def as_dict(self):
@@ -997,20 +1000,38 @@ class Autogrow(ComfyTypeI):
names = [f"{prefix}{i}" for i in range(max)]
# need to create a new input based on the contents of input
template_input = None
for _, dict_input in input.items():
# for now, get just the first value from dict_input
template_required = True
for _input_type, dict_input in input.items():
# for now, get just the first value from dict_input; if not required, min can be ignored
if len(dict_input) == 0:
continue
template_input = list(dict_input.values())[0]
template_required = _input_type == "required"
break
if template_input is None:
raise Exception("template_input could not be determined from required or optional; this should never happen.")
new_dict = {}
new_dict_added_to = False
# first, add possible inputs into out_dict
for i, name in enumerate(names):
expected_id = finalize_prefix(curr_prefix, name)
# required
if i < min and template_required:
out_dict["required"][expected_id] = template_input
type_dict = new_dict.setdefault("required", {})
# optional
else:
out_dict["optional"][expected_id] = template_input
type_dict = new_dict.setdefault("optional", {})
if expected_id in live_inputs:
# required
if i < min:
type_dict = new_dict.setdefault("required", {})
# optional
else:
type_dict = new_dict.setdefault("optional", {})
# NOTE: prefix gets added in parse_class_inputs
type_dict[name] = template_input
new_dict_added_to = True
# account for the edge case that all inputs are optional and no values are received
if not new_dict_added_to:
finalized_prefix = finalize_prefix(curr_prefix)
out_dict["dynamic_paths"][finalized_prefix] = finalized_prefix
out_dict["dynamic_paths_default_value"][finalized_prefix] = DynamicPathsDefaultValue.EMPTY_DICT
parse_class_inputs(out_dict, live_inputs, new_dict, curr_prefix)
@comfytype(io_type="COMFY_DYNAMICCOMBO_V3")
@@ -1119,8 +1140,8 @@ class ImageCompare(ComfyTypeI):
class Input(WidgetInput):
def __init__(self, id: str, display_name: str=None, optional=False, tooltip: str=None,
socketless: bool=True):
super().__init__(id, display_name, optional, tooltip, None, None, socketless)
socketless: bool=True, advanced: bool=None):
super().__init__(id, display_name, optional, tooltip, None, None, socketless, None, None, None, None, advanced)
def as_dict(self):
return super().as_dict()
@@ -1148,6 +1169,8 @@ class V3Data(TypedDict):
'Dictionary where the keys are the hidden input ids and the values are the values of the hidden inputs.'
dynamic_paths: dict[str, Any]
'Dictionary where the keys are the input ids and the values dictate how to turn the inputs into a nested dictionary.'
dynamic_paths_default_value: dict[str, Any]
'Dictionary where the keys are the input ids and the values are DynamicPathsDefaultValue strings used when the corresponding input value is None.'
create_dynamic_tuple: bool
'When True, the value of the dynamic input will be in the format (value, path_key).'
@@ -1224,7 +1247,10 @@ class NodeInfoV1:
output_node: bool=None
deprecated: bool=None
experimental: bool=None
dev_only: bool=None
api_node: bool=None
price_badge: dict | None = None
search_aliases: list[str]=None
@dataclass
class NodeInfoV3:
@@ -1234,11 +1260,78 @@ class NodeInfoV3:
name: str=None
display_name: str=None
description: str=None
python_module: Any = None
category: str=None
output_node: bool=None
deprecated: bool=None
experimental: bool=None
dev_only: bool=None
api_node: bool=None
price_badge: dict | None = None
@dataclass
class PriceBadgeDepends:
widgets: list[str] = field(default_factory=list)
inputs: list[str] = field(default_factory=list)
input_groups: list[str] = field(default_factory=list)
def validate(self) -> None:
if not isinstance(self.widgets, list) or any(not isinstance(x, str) for x in self.widgets):
raise ValueError("PriceBadgeDepends.widgets must be a list[str].")
if not isinstance(self.inputs, list) or any(not isinstance(x, str) for x in self.inputs):
raise ValueError("PriceBadgeDepends.inputs must be a list[str].")
if not isinstance(self.input_groups, list) or any(not isinstance(x, str) for x in self.input_groups):
raise ValueError("PriceBadgeDepends.input_groups must be a list[str].")
def as_dict(self, schema_inputs: list["Input"]) -> dict[str, Any]:
# Build lookup: widget_id -> io_type
input_types: dict[str, str] = {}
for inp in schema_inputs:
all_inputs = inp.get_all()
input_types[inp.id] = inp.get_io_type() # First input is always the parent itself
for nested_inp in all_inputs[1:]:
# For DynamicCombo/DynamicSlot, nested inputs are prefixed with parent ID
# to match frontend naming convention (e.g., "should_texture.enable_pbr")
prefixed_id = f"{inp.id}.{nested_inp.id}"
input_types[prefixed_id] = nested_inp.get_io_type()
# Enrich widgets with type information, raising error for unknown widgets
widgets_data: list[dict[str, str]] = []
for w in self.widgets:
if w not in input_types:
raise ValueError(
f"PriceBadge depends_on.widgets references unknown widget '{w}'. "
f"Available widgets: {list(input_types.keys())}"
)
widgets_data.append({"name": w, "type": input_types[w]})
return {
"widgets": widgets_data,
"inputs": self.inputs,
"input_groups": self.input_groups,
}
@dataclass
class PriceBadge:
expr: str
depends_on: PriceBadgeDepends = field(default_factory=PriceBadgeDepends)
engine: str = field(default="jsonata")
def validate(self) -> None:
if self.engine != "jsonata":
raise ValueError(f"Unsupported PriceBadge.engine '{self.engine}'. Only 'jsonata' is supported.")
if not isinstance(self.expr, str) or not self.expr.strip():
raise ValueError("PriceBadge.expr must be a non-empty string.")
self.depends_on.validate()
def as_dict(self, schema_inputs: list["Input"]) -> dict[str, Any]:
return {
"engine": self.engine,
"depends_on": self.depends_on.as_dict(schema_inputs),
"expr": self.expr,
}
@dataclass
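A hedged sketch of declaring one of these badges. The JSONata expression, the widget name, and the expression's evaluation context are illustrative assumptions; only the class shapes come from this diff (both classes are exported in `__all__` below):

```python
from comfy_api.latest import io

# Illustrative: a client-evaluated badge whose price depends on one widget.
badge = io.PriceBadge(
    expr='widgets.resolution = "1080p" ? 0.20 : 0.10',  # JSONata; context shape assumed
    depends_on=io.PriceBadgeDepends(widgets=["resolution"]),
)
badge.validate()  # rejects non-jsonata engines and empty expressions
# as_dict(schema_inputs) later enriches "resolution" with its io type and
# raises if it is not an input id on the owning schema.
```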
@@ -1256,6 +1349,8 @@ class Schema:
hidden: list[Hidden] = field(default_factory=list)
description: str=""
"""Node description, shown as a tooltip when hovering over the node."""
search_aliases: list[str] = field(default_factory=list)
"""Alternative names for search. Useful for synonyms, abbreviations, or old names after renaming."""
is_input_list: bool = False
"""A flag indicating if this node implements the additional code necessary to deal with OUTPUT_IS_LIST nodes.
@@ -1282,12 +1377,18 @@ class Schema:
"""Flags a node as deprecated, indicating to users that they should find alternatives to this node."""
is_experimental: bool=False
"""Flags a node as experimental, informing users that it may change or not work as expected."""
is_dev_only: bool=False
"""Flags a node as dev-only, hiding it from search/menus unless dev mode is enabled."""
is_api_node: bool=False
"""Flags a node as an API node. See: https://docs.comfy.org/tutorials/api-nodes/overview."""
price_badge: PriceBadge | None = None
"""Optional client-evaluated pricing badge declaration for this node."""
not_idempotent: bool=False
"""Flags a node as not idempotent; when True, the node will run and not reuse the cached outputs when identical inputs are provided on a different node in the graph."""
enable_expand: bool=False
"""Flags a node as expandable, allowing NodeOutput to include 'expand' property."""
accept_all_inputs: bool=False
"""When True, all inputs from the prompt will be passed to the node as kwargs, even if not defined in the schema."""
def validate(self):
'''Validate the schema:
@@ -1314,6 +1415,8 @@ class Schema:
input.validate()
for output in self.outputs:
output.validate()
if self.price_badge is not None:
self.price_badge.validate()
def finalize(self):
"""Add hidden based on selected schema options, and give outputs without ids default ids."""
@@ -1386,8 +1489,11 @@ class Schema:
output_node=self.is_output_node,
deprecated=self.is_deprecated,
experimental=self.is_experimental,
dev_only=self.is_dev_only,
api_node=self.is_api_node,
python_module=getattr(cls, "RELATIVE_PYTHON_MODULE", "nodes")
python_module=getattr(cls, "RELATIVE_PYTHON_MODULE", "nodes"),
price_badge=self.price_badge.as_dict(self.inputs) if self.price_badge is not None else None,
search_aliases=self.search_aliases if self.search_aliases else None,
)
return info
@@ -1418,8 +1524,10 @@ class Schema:
output_node=self.is_output_node,
deprecated=self.is_deprecated,
experimental=self.is_experimental,
dev_only=self.is_dev_only,
api_node=self.is_api_node,
python_module=getattr(cls, "RELATIVE_PYTHON_MODULE", "nodes")
python_module=getattr(cls, "RELATIVE_PYTHON_MODULE", "nodes"),
price_badge=self.price_badge.as_dict(self.inputs) if self.price_badge is not None else None,
)
return info
@@ -1428,6 +1536,7 @@ def get_finalized_class_inputs(d: dict[str, Any], live_inputs: dict[str, Any], i
"required": {},
"optional": {},
"dynamic_paths": {},
"dynamic_paths_default_value": {},
}
d = d.copy()
# ignore hidden for parsing
@@ -1437,8 +1546,12 @@ def get_finalized_class_inputs(d: dict[str, Any], live_inputs: dict[str, Any], i
out_dict["hidden"] = hidden
v3_data = {}
dynamic_paths = out_dict.pop("dynamic_paths", None)
if dynamic_paths is not None:
if dynamic_paths is not None and len(dynamic_paths) > 0:
v3_data["dynamic_paths"] = dynamic_paths
# this dict is used by autogrow for the case where all inputs are optional and no values are passed
dynamic_paths_default_value = out_dict.pop("dynamic_paths_default_value", None)
if dynamic_paths_default_value is not None and len(dynamic_paths_default_value) > 0:
v3_data["dynamic_paths_default_value"] = dynamic_paths_default_value
return out_dict, hidden, v3_data
def parse_class_inputs(out_dict: dict[str, Any], live_inputs: dict[str, Any], curr_dict: dict[str, Any], curr_prefix: list[str] | None=None) -> None:
@@ -1475,11 +1588,16 @@ def add_to_dict_v1(i: Input, d: dict):
def add_to_dict_v3(io: Input | Output, d: dict):
d[io.id] = (io.get_io_type(), io.as_dict())
class DynamicPathsDefaultValue:
EMPTY_DICT = "empty_dict"
def build_nested_inputs(values: dict[str, Any], v3_data: V3Data):
paths = v3_data.get("dynamic_paths", None)
default_value_dict = v3_data.get("dynamic_paths_default_value", {})
if paths is None:
return values
values = values.copy()
result = {}
create_tuple = v3_data.get("create_dynamic_tuple", False)
@@ -1493,6 +1611,11 @@ def build_nested_inputs(values: dict[str, Any], v3_data: V3Data):
if is_last:
value = values.pop(key, None)
if value is None:
# see if a default value was provided for this key
default_option = default_value_dict.get(key, None)
if default_option == DynamicPathsDefaultValue.EMPTY_DICT:
value = {}
if create_tuple:
value = (value, key)
current[p] = value
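To make the `EMPTY_DICT` branch concrete, a small illustrative trace; the path key is made up, since the real keys come from `finalize_prefix`:

```python
# v3_data as Autogrow would emit it for an all-optional group with no values:
v3_data = {
    "dynamic_paths": {"items": "items"},                     # shape assumed
    "dynamic_paths_default_value": {"items": "empty_dict"},  # DynamicPathsDefaultValue.EMPTY_DICT
}
# build_nested_inputs({}, v3_data) then resolves the missing value to {}
# rather than None, so node code can iterate the group without a None check.
```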
@@ -1674,6 +1797,14 @@ class _ComfyNodeBaseInternal(_ComfyNodeInternal):
cls.GET_SCHEMA()
return cls._DEPRECATED
_DEV_ONLY = None
@final
@classproperty
def DEV_ONLY(cls): # noqa
if cls._DEV_ONLY is None:
cls.GET_SCHEMA()
return cls._DEV_ONLY
_API_NODE = None
@final
@classproperty
@@ -1738,6 +1869,14 @@ class _ComfyNodeBaseInternal(_ComfyNodeInternal):
cls.GET_SCHEMA()
return cls._NOT_IDEMPOTENT
_ACCEPT_ALL_INPUTS = None
@final
@classproperty
def ACCEPT_ALL_INPUTS(cls): # noqa
if cls._ACCEPT_ALL_INPUTS is None:
cls.GET_SCHEMA()
return cls._ACCEPT_ALL_INPUTS
@final
@classmethod
def INPUT_TYPES(cls) -> dict[str, dict]:
@@ -1768,6 +1907,8 @@ class _ComfyNodeBaseInternal(_ComfyNodeInternal):
cls._EXPERIMENTAL = schema.is_experimental
if cls._DEPRECATED is None:
cls._DEPRECATED = schema.is_deprecated
if cls._DEV_ONLY is None:
cls._DEV_ONLY = schema.is_dev_only
if cls._API_NODE is None:
cls._API_NODE = schema.is_api_node
if cls._OUTPUT_NODE is None:
@@ -1776,6 +1917,8 @@ class _ComfyNodeBaseInternal(_ComfyNodeInternal):
cls._INPUT_IS_LIST = schema.is_input_list
if cls._NOT_IDEMPOTENT is None:
cls._NOT_IDEMPOTENT = schema.not_idempotent
if cls._ACCEPT_ALL_INPUTS is None:
cls._ACCEPT_ALL_INPUTS = schema.accept_all_inputs
if cls._RETURN_TYPES is None:
output = []
@@ -1923,6 +2066,7 @@ __all__ = [
"ControlNet",
"Vae",
"Model",
"ModelPatch",
"ClipVision",
"ClipVisionOutput",
"AudioEncoder",
@@ -1971,4 +2115,6 @@ __all__ = [
"add_to_dict_v3",
"V3Data",
"ImageCompare",
"PriceBadgeDepends",
"PriceBadge",
]

View File

@@ -1,65 +0,0 @@
# ComfyUI API Nodes
## Introduction
Below is a collection of nodes that work by calling external APIs. More information is available in our [docs](https://docs.comfy.org/tutorials/api-nodes/overview).
## Development
While developing, you should be testing against the Staging environment. To test against staging:
**Install ComfyUI_frontend**
Follow the instructions [here](https://github.com/Comfy-Org/ComfyUI_frontend) to start the frontend server. By default, it will connect to Staging authentication.
> **Hint:** If you use the `--front-end-version` argument for ComfyUI, it will use production authentication.
```bash
python main.py --comfy-api-base https://stagingapi.comfy.org
```
```
To authenticate to staging, log in and then ask one of the Comfy Org team to whitelist you for staging access.
API stubs are generated through automatic codegen tools from OpenAPI definitions. Since the Comfy Org OpenAPI definition contains many things from the Comfy Registry as well, we use redocly/cli to filter out only the paths relevant for API nodes.
### Redocly Instructions
**Tip**
When developing locally, use the `redocly-dev.yaml` file to generate pydantic models. This lets you use stubs for APIs that are not marked `Released` yet.
Before your API node PR merges, make sure to add the `Released` tag to the `openapi.yaml` file and test in staging.
```bash
# Download the OpenAPI file from staging server.
curl -o openapi.yaml https://stagingapi.comfy.org/openapi
# Filter out unneeded API definitions.
npm install -g @redocly/cli
redocly bundle openapi.yaml --output filtered-openapi.yaml --config comfy_api_nodes/redocly-dev.yaml --remove-unused-components
# Generate the pydantic datamodels for validation.
datamodel-codegen --use-subclass-enum --field-constraints --strict-types bytes --input filtered-openapi.yaml --output comfy_api_nodes/apis/__init__.py --output-model-type pydantic_v2.BaseModel
```
## Merging to Master
Before merging to comfyanonymous/ComfyUI master, follow these steps:
1. Add the "Released" tag to the ComfyUI OpenAPI yaml file for each endpoint you are using in the nodes.
1. Make sure the ComfyUI API is deployed to prod with your changes.
1. Run the code generation again with `redocly.yaml` and the production OpenAPI yaml file.
```bash
# Download the OpenAPI file from prod server.
curl -o openapi.yaml https://api.comfy.org/openapi
# Filter out unneeded API definitions.
npm install -g @redocly/cli
redocly bundle openapi.yaml --output filtered-openapi.yaml --config comfy_api_nodes/redocly.yaml --remove-unused-components
# Generate the pydantic datamodels for validation.
datamodel-codegen --use-subclass-enum --field-constraints --strict-types bytes --input filtered-openapi.yaml --output comfy_api_nodes/apis/__init__.py --output-model-type pydantic_v2.BaseModel
```

View File

@@ -0,0 +1,61 @@
from typing import TypedDict
from pydantic import BaseModel, Field
class InputModerationSettings(TypedDict):
prompt_content_moderation: bool
visual_input_moderation: bool
visual_output_moderation: bool
class BriaEditImageRequest(BaseModel):
instruction: str | None = Field(...)
structured_instruction: str | None = Field(
...,
description="Use this instead of instruction for precise, programmatic control.",
)
images: list[str] = Field(
...,
description="Required. Publicly available URL or Base64-encoded. Must contain exactly one item.",
)
mask: str | None = Field(
None,
description="Mask image (black and white). Black areas will be preserved, white areas will be edited. "
"If omitted, the edit applies to the entire image. "
"The input image and the the input mask must be of the same size.",
)
negative_prompt: str | None = Field(None)
guidance_scale: float = Field(...)
model_version: str = Field(...)
steps_num: int = Field(...)
seed: int = Field(...)
ip_signal: bool = Field(
False,
description="If true, returns a warning for potential IP content in the instruction.",
)
prompt_content_moderation: bool = Field(
False, description="If true, returns 422 on instruction moderation failure."
)
visual_input_content_moderation: bool = Field(
False, description="If true, returns 422 on images or mask moderation failure."
)
visual_output_content_moderation: bool = Field(
False, description="If true, returns 422 on visual output moderation failure."
)
class BriaStatusResponse(BaseModel):
request_id: str = Field(...)
status_url: str = Field(...)
warning: str | None = Field(None)
class BriaResult(BaseModel):
structured_prompt: str = Field(...)
image_url: str = Field(...)
class BriaResponse(BaseModel):
status: str = Field(...)
result: BriaResult | None = Field(None)
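A usage sketch against these models; every value below is a placeholder rather than a documented default:

```python
req = BriaEditImageRequest(
    instruction="remove the background",
    structured_instruction=None,
    images=["https://example.com/input.png"],  # exactly one item, per the field docs
    guidance_scale=5.0,
    model_version="2.0",  # placeholder version string
    steps_num=30,
    seed=42,
)
payload = req.model_dump(exclude_none=True)
# The creation endpoint answers with a BriaStatusResponse; polling its
# status_url should eventually yield a BriaResponse carrying result.image_url.
```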

View File

@@ -13,17 +13,6 @@ class Text2ImageTaskCreationRequest(BaseModel):
watermark: bool | None = Field(False)
class Image2ImageTaskCreationRequest(BaseModel):
model: str = Field(...)
prompt: str = Field(...)
response_format: str | None = Field("url")
image: str = Field(..., description="Base64 encoded string or image URL")
size: str | None = Field("adaptive")
seed: int | None = Field(..., ge=0, le=2147483647)
guidance_scale: float | None = Field(..., ge=1.0, le=10.0)
watermark: bool | None = Field(False)
class Seedream4Options(BaseModel):
max_images: int = Field(15)
@@ -65,11 +54,13 @@ class TaskImageContent(BaseModel):
class Text2VideoTaskCreationRequest(BaseModel):
model: str = Field(...)
content: list[TaskTextContent] = Field(..., min_length=1)
generate_audio: bool | None = Field(...)
class Image2VideoTaskCreationRequest(BaseModel):
model: str = Field(...)
content: list[TaskTextContent | TaskImageContent] = Field(..., min_length=2)
generate_audio: bool | None = Field(...)
class TaskCreationResponse(BaseModel):
@@ -141,4 +132,9 @@ VIDEO_TASKS_EXECUTION_TIME = {
"720p": 65,
"1080p": 100,
},
"seedance-1-5-pro-251215": {
"480p": 80,
"720p": 100,
"1080p": 150,
},
}
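The timing table is keyed by model id, then resolution. A small helper for reading it; the fallback value is an arbitrary choice, not from this file:

```python
def estimated_seconds(model: str, resolution: str, default: int = 100) -> int:
    """Expected task execution time in seconds, with a fallback for unknown pairs."""
    return VIDEO_TASKS_EXECUTION_TIME.get(model, {}).get(resolution, default)

estimated_seconds("seedance-1-5-pro-251215", "1080p")  # -> 150, per the entry above
```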

View File

@@ -0,0 +1,66 @@
from typing import TypedDict
from pydantic import BaseModel, Field, model_validator
class InputGenerateType(TypedDict):
generate_type: str
polygon_type: str
pbr: bool
class Hunyuan3DViewImage(BaseModel):
ViewType: str = Field(..., description="Valid values: back, left, right.")
ViewImageUrl: str = Field(...)
class To3DProTaskRequest(BaseModel):
Model: str = Field(...)
Prompt: str | None = Field(None)
ImageUrl: str | None = Field(None)
MultiViewImages: list[Hunyuan3DViewImage] | None = Field(None)
EnablePBR: bool | None = Field(...)
FaceCount: int | None = Field(...)
GenerateType: str | None = Field(...)
PolygonType: str | None = Field(...)
class RequestError(BaseModel):
Code: str = Field("")
Message: str = Field("")
class To3DProTaskCreateResponse(BaseModel):
JobId: str | None = Field(None)
Error: RequestError | None = Field(None)
@model_validator(mode="before")
@classmethod
def unwrap_data(cls, values: dict) -> dict:
if "Response" in values and isinstance(values["Response"], dict):
return values["Response"]
return values
class ResultFile3D(BaseModel):
Type: str = Field(...)
Url: str = Field(...)
PreviewImageUrl: str = Field("")
class To3DProTaskResultResponse(BaseModel):
ErrorCode: str = Field("")
ErrorMessage: str = Field("")
ResultFile3Ds: list[ResultFile3D] = Field([])
Status: str = Field(...)
@model_validator(mode="before")
@classmethod
def unwrap_data(cls, values: dict) -> dict:
if "Response" in values and isinstance(values["Response"], dict):
return values["Response"]
return values
class To3DProTaskQueryRequest(BaseModel):
JobId: str = Field(...)
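The `mode="before"` validators mean both the raw `Response`-enveloped payload and an already-flattened one validate to the same model:

```python
wrapped = {"Response": {"JobId": "job-123"}}
flat = {"JobId": "job-123"}

a = To3DProTaskCreateResponse.model_validate(wrapped)
b = To3DProTaskCreateResponse.model_validate(flat)
assert a.JobId == b.JobId == "job-123"
```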

View File

@@ -0,0 +1,292 @@
from enum import Enum
from typing import Optional, List, Dict, Any, Union
from datetime import datetime
from pydantic import BaseModel, Field, RootModel, StrictBytes
class IdeogramColorPalette1(BaseModel):
name: str = Field(..., description='Name of the preset color palette')
class Member(BaseModel):
color: Optional[str] = Field(
None, description='Hexadecimal color code', pattern='^#[0-9A-Fa-f]{6}$'
)
weight: Optional[float] = Field(
None, description='Optional weight for the color (0-1)', ge=0.0, le=1.0
)
class IdeogramColorPalette2(BaseModel):
members: List[Member] = Field(
..., description='Array of color definitions with optional weights'
)
class IdeogramColorPalette(
RootModel[Union[IdeogramColorPalette1, IdeogramColorPalette2]]
):
root: Union[IdeogramColorPalette1, IdeogramColorPalette2] = Field(
...,
description='A color palette specification that can either use a preset name or explicit color definitions with weights',
)
class ImageRequest(BaseModel):
aspect_ratio: Optional[str] = Field(
None,
description="Optional. The aspect ratio (e.g., 'ASPECT_16_9', 'ASPECT_1_1'). Cannot be used with resolution. Defaults to 'ASPECT_1_1' if unspecified.",
)
color_palette: Optional[Dict[str, Any]] = Field(
None, description='Optional. Color palette object. Only for V_2, V_2_TURBO.'
)
magic_prompt_option: Optional[str] = Field(
None, description="Optional. MagicPrompt usage ('AUTO', 'ON', 'OFF')."
)
model: str = Field(..., description="The model used (e.g., 'V_2', 'V_2A_TURBO')")
negative_prompt: Optional[str] = Field(
None,
description='Optional. Description of what to exclude. Only for V_1, V_1_TURBO, V_2, V_2_TURBO.',
)
num_images: Optional[int] = Field(
1,
description='Optional. Number of images to generate (1-8). Defaults to 1.',
ge=1,
le=8,
)
prompt: str = Field(
..., description='Required. The prompt to use to generate the image.'
)
resolution: Optional[str] = Field(
None,
description="Optional. Resolution (e.g., 'RESOLUTION_1024_1024'). Only for model V_2. Cannot be used with aspect_ratio.",
)
seed: Optional[int] = Field(
None,
description='Optional. A number between 0 and 2147483647.',
ge=0,
le=2147483647,
)
style_type: Optional[str] = Field(
None,
description="Optional. Style type ('AUTO', 'GENERAL', 'REALISTIC', 'DESIGN', 'RENDER_3D', 'ANIME'). Only for models V_2 and above.",
)
class IdeogramGenerateRequest(BaseModel):
image_request: ImageRequest = Field(
..., description='The image generation request parameters.'
)
class Datum(BaseModel):
is_image_safe: Optional[bool] = Field(
None, description='Indicates whether the image is considered safe.'
)
prompt: Optional[str] = Field(
None, description='The prompt used to generate this image.'
)
resolution: Optional[str] = Field(
None, description="The resolution of the generated image (e.g., '1024x1024')."
)
seed: Optional[int] = Field(
None, description='The seed value used for this generation.'
)
style_type: Optional[str] = Field(
None,
description="The style type used for generation (e.g., 'REALISTIC', 'ANIME').",
)
url: Optional[str] = Field(None, description='URL to the generated image.')
class IdeogramGenerateResponse(BaseModel):
created: Optional[datetime] = Field(
None, description='Timestamp when the generation was created.'
)
data: Optional[List[Datum]] = Field(
None, description='Array of generated image information.'
)
class StyleCode(RootModel[str]):
root: str = Field(..., pattern='^[0-9A-Fa-f]{8}$')
class Datum1(BaseModel):
is_image_safe: Optional[bool] = None
prompt: Optional[str] = None
resolution: Optional[str] = None
seed: Optional[int] = None
style_type: Optional[str] = None
url: Optional[str] = None
class IdeogramV3IdeogramResponse(BaseModel):
created: Optional[datetime] = None
data: Optional[List[Datum1]] = None
class RenderingSpeed1(str, Enum):
TURBO = 'TURBO'
DEFAULT = 'DEFAULT'
QUALITY = 'QUALITY'
class IdeogramV3ReframeRequest(BaseModel):
color_palette: Optional[Dict[str, Any]] = None
image: Optional[StrictBytes] = None
num_images: Optional[int] = Field(None, ge=1, le=8)
rendering_speed: Optional[RenderingSpeed1] = None
resolution: str
seed: Optional[int] = Field(None, ge=0, le=2147483647)
style_codes: Optional[List[str]] = None
style_reference_images: Optional[List[StrictBytes]] = None
class MagicPrompt(str, Enum):
AUTO = 'AUTO'
ON = 'ON'
OFF = 'OFF'
class StyleType(str, Enum):
AUTO = 'AUTO'
GENERAL = 'GENERAL'
REALISTIC = 'REALISTIC'
DESIGN = 'DESIGN'
class IdeogramV3RemixRequest(BaseModel):
aspect_ratio: Optional[str] = None
color_palette: Optional[Dict[str, Any]] = None
image: Optional[StrictBytes] = None
image_weight: Optional[int] = Field(50, ge=1, le=100)
magic_prompt: Optional[MagicPrompt] = None
negative_prompt: Optional[str] = None
num_images: Optional[int] = Field(None, ge=1, le=8)
prompt: str
rendering_speed: Optional[RenderingSpeed1] = None
resolution: Optional[str] = None
seed: Optional[int] = Field(None, ge=0, le=2147483647)
style_codes: Optional[List[str]] = None
style_reference_images: Optional[List[StrictBytes]] = None
style_type: Optional[StyleType] = None
class IdeogramV3ReplaceBackgroundRequest(BaseModel):
color_palette: Optional[Dict[str, Any]] = None
image: Optional[StrictBytes] = None
magic_prompt: Optional[MagicPrompt] = None
num_images: Optional[int] = Field(None, ge=1, le=8)
prompt: str
rendering_speed: Optional[RenderingSpeed1] = None
seed: Optional[int] = Field(None, ge=0, le=2147483647)
style_codes: Optional[List[str]] = None
style_reference_images: Optional[List[StrictBytes]] = None
class ColorPalette(BaseModel):
name: str = Field(..., description='Name of the color palette', examples=['PASTEL'])
class MagicPrompt2(str, Enum):
ON = 'ON'
OFF = 'OFF'
class StyleType1(str, Enum):
AUTO = 'AUTO'
GENERAL = 'GENERAL'
REALISTIC = 'REALISTIC'
DESIGN = 'DESIGN'
FICTION = 'FICTION'
class RenderingSpeed(str, Enum):
DEFAULT = 'DEFAULT'
TURBO = 'TURBO'
QUALITY = 'QUALITY'
class IdeogramV3EditRequest(BaseModel):
color_palette: Optional[IdeogramColorPalette] = None
image: Optional[StrictBytes] = Field(
None,
description='The image being edited (max size 10MB); only JPEG, WebP and PNG formats are supported at this time.',
)
magic_prompt: Optional[str] = Field(
None,
description='Determine if MagicPrompt should be used in generating the request or not.',
)
mask: Optional[StrictBytes] = Field(
None,
description='A black and white image of the same size as the image being edited (max size 10MB). Black regions in the mask should match up with the regions of the image that you would like to edit; only JPEG, WebP and PNG formats are supported at this time.',
)
num_images: Optional[int] = Field(
None, description='The number of images to generate.'
)
prompt: str = Field(
..., description='The prompt used to describe the edited result.'
)
rendering_speed: RenderingSpeed
seed: Optional[int] = Field(
None, description='Random seed. Set for reproducible generation.'
)
style_codes: Optional[List[StyleCode]] = Field(
None,
description='A list of 8 character hexadecimal codes representing the style of the image. Cannot be used in conjunction with style_reference_images or style_type.',
)
style_reference_images: Optional[List[StrictBytes]] = Field(
None,
description='A set of images to use as style references (maximum total size 10MB across all style references). The images should be in JPEG, PNG or WebP format.',
)
character_reference_images: Optional[List[str]] = Field(
None,
description='Generations with character reference are subject to the character reference pricing. A set of images to use as character references (maximum total size 10MB across all character references); currently only 1 character reference image is supported. The images should be in JPEG, PNG or WebP format.'
)
character_reference_images_mask: Optional[List[str]] = Field(
None,
description='Optional masks for character reference images. When provided, must match the number of character_reference_images. Each mask should be a grayscale image of the same dimensions as the corresponding character reference image. The images should be in JPEG, PNG or WebP format.'
)
class IdeogramV3Request(BaseModel):
aspect_ratio: Optional[str] = Field(
None, description='Aspect ratio in format WxH', examples=['1x3']
)
color_palette: Optional[ColorPalette] = None
magic_prompt: Optional[MagicPrompt2] = Field(
None, description='Whether to enable magic prompt enhancement'
)
negative_prompt: Optional[str] = Field(
None, description='Text prompt specifying what to avoid in the generation'
)
num_images: Optional[int] = Field(
None, description='Number of images to generate', ge=1
)
prompt: str = Field(..., description='The text prompt for image generation')
rendering_speed: RenderingSpeed
resolution: Optional[str] = Field(
None, description='Image resolution in format WxH', examples=['1280x800']
)
seed: Optional[int] = Field(
None, description='Seed value for reproducible generation'
)
style_codes: Optional[List[StyleCode]] = Field(
None, description='Array of style codes in hexadecimal format'
)
style_reference_images: Optional[List[str]] = Field(
None, description='Array of reference image URLs or identifiers'
)
style_type: Optional[StyleType1] = Field(
None, description='The type of style to apply'
)
character_reference_images: Optional[List[str]] = Field(
None,
description='Generations with character reference are subject to the character reference pricing. A set of images to use as character references (maximum total size 10MB across all character references); currently only 1 character reference image is supported. The images should be in JPEG, PNG or WebP format.'
)
character_reference_images_mask: Optional[List[str]] = Field(
None,
description='Optional masks for character reference images. When provided, must match the number of character_reference_images. Each mask should be a grayscale image of the same dimensions as the corresponding character reference image. The images should be in JPEG, PNG or WebP format.'
)
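The `RootModel` union at the top of this file accepts either palette shape; both of the following validate (values illustrative):

```python
preset = IdeogramColorPalette.model_validate({"name": "PASTEL"})
explicit = IdeogramColorPalette.model_validate(
    {"members": [{"color": "#FF0000", "weight": 0.7}, {"color": "#0000FF"}]}
)
# .root is then an IdeogramColorPalette1 or IdeogramColorPalette2 instance.
```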

View File

@@ -0,0 +1,122 @@
from typing import TypedDict
from pydantic import AliasChoices, BaseModel, Field, model_validator
class InputPortraitMode(TypedDict):
portrait_mode: str
portrait_style: str
portrait_beautifier: str
class InputAdvancedSettings(TypedDict):
advanced_settings: str
whites: int
blacks: int
brightness: int
contrast: int
saturation: int
engine: str
transfer_light_a: str
transfer_light_b: str
fixed_generation: bool
class InputSkinEnhancerMode(TypedDict):
mode: str
skin_detail: int
optimized_for: str
class ImageUpscalerCreativeRequest(BaseModel):
image: str = Field(...)
scale_factor: str = Field(...)
optimized_for: str = Field(...)
prompt: str | None = Field(None)
creativity: int = Field(...)
hdr: int = Field(...)
resemblance: int = Field(...)
fractality: int = Field(...)
engine: str = Field(...)
class ImageUpscalerPrecisionV2Request(BaseModel):
image: str = Field(...)
sharpen: int = Field(...)
smart_grain: int = Field(...)
ultra_detail: int = Field(...)
flavor: str = Field(...)
scale_factor: int = Field(...)
class ImageRelightAdvancedSettingsRequest(BaseModel):
whites: int = Field(...)
blacks: int = Field(...)
brightness: int = Field(...)
contrast: int = Field(...)
saturation: int = Field(...)
engine: str = Field(...)
transfer_light_a: str = Field(...)
transfer_light_b: str = Field(...)
fixed_generation: bool = Field(...)
class ImageRelightRequest(BaseModel):
image: str = Field(...)
prompt: str | None = Field(None)
transfer_light_from_reference_image: str | None = Field(None)
light_transfer_strength: int = Field(...)
interpolate_from_original: bool = Field(...)
change_background: bool = Field(...)
style: str = Field(...)
preserve_details: bool = Field(...)
advanced_settings: ImageRelightAdvancedSettingsRequest | None = Field(...)
class ImageStyleTransferRequest(BaseModel):
image: str = Field(...)
reference_image: str = Field(...)
prompt: str | None = Field(None)
style_strength: int = Field(...)
structure_strength: int = Field(...)
is_portrait: bool = Field(...)
portrait_style: str | None = Field(...)
portrait_beautifier: str | None = Field(...)
flavor: str = Field(...)
engine: str = Field(...)
fixed_generation: bool = Field(...)
class ImageSkinEnhancerCreativeRequest(BaseModel):
image: str = Field(...)
sharpen: int = Field(...)
smart_grain: int = Field(...)
class ImageSkinEnhancerFaithfulRequest(BaseModel):
image: str = Field(...)
sharpen: int = Field(...)
smart_grain: int = Field(...)
skin_detail: int = Field(...)
class ImageSkinEnhancerFlexibleRequest(BaseModel):
image: str = Field(...)
sharpen: int = Field(...)
smart_grain: int = Field(...)
optimized_for: str = Field(...)
class TaskResponse(BaseModel):
"""Unified response model that handles both wrapped and unwrapped API responses."""
task_id: str = Field(...)
status: str = Field(validation_alias=AliasChoices("status", "task_status"))
generated: list[str] | None = Field(None)
@model_validator(mode="before")
@classmethod
def unwrap_data(cls, values: dict) -> dict:
if "data" in values and isinstance(values["data"], dict):
return values["data"]
return values
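Concretely, the `AliasChoices` plus the before-validator let wrapped and unwrapped responses, with either status key, normalize into one model; the status strings here are illustrative:

```python
TaskResponse.model_validate({"task_id": "t1", "status": "processing"})
TaskResponse.model_validate({"task_id": "t1", "task_status": "processing"})
TaskResponse.model_validate(
    {"data": {"task_id": "t1", "task_status": "completed",
              "generated": ["https://example.com/out.png"]}}
)
```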

View File

@@ -0,0 +1,160 @@
from typing import TypedDict
from pydantic import BaseModel, Field
from comfy_api.latest import Input
class InputShouldRemesh(TypedDict):
should_remesh: str
topology: str
target_polycount: int
class InputShouldTexture(TypedDict):
should_texture: str
enable_pbr: bool
texture_prompt: str
texture_image: Input.Image | None
class MeshyTaskResponse(BaseModel):
result: str = Field(...)
class MeshyTextToModelRequest(BaseModel):
mode: str = Field("preview")
prompt: str = Field(..., max_length=600)
art_style: str = Field(..., description="'realistic' or 'sculpture'")
ai_model: str = Field(...)
topology: str | None = Field(..., description="'quad' or 'triangle'")
target_polycount: int | None = Field(..., ge=100, le=300000)
should_remesh: bool = Field(
True,
description="False returns the original mesh, ignoring topology and polycount.",
)
symmetry_mode: str = Field(..., description="'auto', 'off' or 'on'")
pose_mode: str = Field(...)
seed: int = Field(...)
moderation: bool = Field(False)
class MeshyRefineTask(BaseModel):
mode: str = Field("refine")
preview_task_id: str = Field(...)
enable_pbr: bool | None = Field(...)
texture_prompt: str | None = Field(...)
texture_image_url: str | None = Field(...)
ai_model: str = Field(...)
moderation: bool = Field(False)
class MeshyImageToModelRequest(BaseModel):
image_url: str = Field(...)
ai_model: str = Field(...)
topology: str | None = Field(..., description="'quad' or 'triangle'")
target_polycount: int | None = Field(..., ge=100, le=300000)
symmetry_mode: str = Field(..., description="'auto', 'off' or 'on'")
should_remesh: bool = Field(
True,
description="False returns the original mesh, ignoring topology and polycount.",
)
should_texture: bool = Field(...)
enable_pbr: bool | None = Field(...)
pose_mode: str = Field(...)
texture_prompt: str | None = Field(None, max_length=600)
texture_image_url: str | None = Field(None)
seed: int = Field(...)
moderation: bool = Field(False)
class MeshyMultiImageToModelRequest(BaseModel):
image_urls: list[str] = Field(...)
ai_model: str = Field(...)
topology: str | None = Field(..., description="'quad' or 'triangle'")
target_polycount: int | None = Field(..., ge=100, le=300000)
symmetry_mode: str = Field(..., description="'auto', 'off' or 'on'")
should_remesh: bool = Field(
True,
description="False returns the original mesh, ignoring topology and polycount.",
)
should_texture: bool = Field(...)
enable_pbr: bool | None = Field(...)
pose_mode: str = Field(...)
texture_prompt: str | None = Field(None, max_length=600)
texture_image_url: str | None = Field(None)
seed: int = Field(...)
moderation: bool = Field(False)
class MeshyRiggingRequest(BaseModel):
input_task_id: str = Field(...)
height_meters: float = Field(...)
texture_image_url: str | None = Field(...)
class MeshyAnimationRequest(BaseModel):
rig_task_id: str = Field(...)
action_id: int = Field(...)
class MeshyTextureRequest(BaseModel):
input_task_id: str = Field(...)
ai_model: str = Field(...)
enable_original_uv: bool = Field(...)
enable_pbr: bool = Field(...)
text_style_prompt: str | None = Field(...)
image_style_url: str | None = Field(...)
class MeshyModelsUrls(BaseModel):
glb: str = Field("")
class MeshyRiggedModelsUrls(BaseModel):
rigged_character_glb_url: str = Field("")
class MeshyAnimatedModelsUrls(BaseModel):
animation_glb_url: str = Field("")
class MeshyResultTextureUrls(BaseModel):
base_color: str = Field(...)
metallic: str | None = Field(None)
normal: str | None = Field(None)
roughness: str | None = Field(None)
class MeshyTaskError(BaseModel):
message: str | None = Field(None)
class MeshyModelResult(BaseModel):
id: str = Field(...)
type: str = Field(...)
model_urls: MeshyModelsUrls = Field(MeshyModelsUrls())
thumbnail_url: str = Field(...)
video_url: str | None = Field(None)
status: str = Field(...)
progress: int = Field(0)
texture_urls: list[MeshyResultTextureUrls] | None = Field([])
task_error: MeshyTaskError | None = Field(None)
class MeshyRiggedResult(BaseModel):
id: str = Field(...)
type: str = Field(...)
status: str = Field(...)
progress: int = Field(0)
result: MeshyRiggedModelsUrls = Field(MeshyRiggedModelsUrls())
task_error: MeshyTaskError | None = Field(None)
class MeshyAnimationResult(BaseModel):
id: str = Field(...)
type: str = Field(...)
status: str = Field(...)
progress: int = Field(0)
result: MeshyAnimatedModelsUrls = Field(MeshyAnimatedModelsUrls())
task_error: MeshyTaskError | None = Field(None)
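Together these models imply a two-stage text-to-model flow: a `preview` task first, then a `refine` task referencing its id. A sketch with placeholder values; the `ai_model` and `pose_mode` strings are assumptions, not taken from this file:

```python
preview = MeshyTextToModelRequest(
    prompt="a weathered bronze statue",
    art_style="realistic",
    ai_model="meshy-5",   # placeholder model id
    topology="triangle",
    target_polycount=30000,
    symmetry_mode="auto",
    pose_mode="",         # assumption: empty string for the API default
    seed=42,
)
# Submit, read MeshyTaskResponse.result as the task id, poll the model result,
# then texture it via a refine task:
refine = MeshyRefineTask(
    preview_task_id="<preview task id>",  # placeholder
    enable_pbr=True,
    texture_prompt=None,
    texture_image_url=None,
    ai_model="meshy-5",
)
```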

View File

@@ -0,0 +1,152 @@
from enum import Enum
from typing import Optional, Dict, Any
from pydantic import BaseModel, Field, StrictBytes
class MoonvalleyPromptResponse(BaseModel):
error: Optional[Dict[str, Any]] = None
frame_conditioning: Optional[Dict[str, Any]] = None
id: Optional[str] = None
inference_params: Optional[Dict[str, Any]] = None
meta: Optional[Dict[str, Any]] = None
model_params: Optional[Dict[str, Any]] = None
output_url: Optional[str] = None
prompt_text: Optional[str] = None
status: Optional[str] = None
class MoonvalleyTextToVideoInferenceParams(BaseModel):
add_quality_guidance: Optional[bool] = Field(
True, description='Whether to add quality guidance'
)
caching_coefficient: Optional[float] = Field(
0.3, description='Caching coefficient for optimization'
)
caching_cooldown: Optional[int] = Field(
3, description='Number of caching cooldown steps'
)
caching_warmup: Optional[int] = Field(
3, description='Number of caching warmup steps'
)
clip_value: Optional[float] = Field(
3, description='CLIP value for generation control'
)
conditioning_frame_index: Optional[int] = Field(
0, description='Index of the conditioning frame'
)
cooldown_steps: Optional[int] = Field(
75, description='Number of cooldown steps (calculated based on num_frames)'
)
fps: Optional[int] = Field(
24, description='Frames per second of the generated video'
)
guidance_scale: Optional[float] = Field(
10, description='Guidance scale for generation control'
)
height: Optional[int] = Field(
1080, description='Height of the generated video in pixels'
)
negative_prompt: Optional[str] = Field(None, description='Negative prompt text')
num_frames: Optional[int] = Field(64, description='Number of frames to generate')
seed: Optional[int] = Field(
None, description='Random seed for generation (default: random)'
)
shift_value: Optional[float] = Field(
3, description='Shift value for generation control'
)
steps: Optional[int] = Field(80, description='Number of denoising steps')
use_guidance_schedule: Optional[bool] = Field(
True, description='Whether to use guidance scheduling'
)
use_negative_prompts: Optional[bool] = Field(
False, description='Whether to use negative prompts'
)
use_timestep_transform: Optional[bool] = Field(
True, description='Whether to use timestep transformation'
)
warmup_steps: Optional[int] = Field(
0, description='Number of warmup steps (calculated based on num_frames)'
)
width: Optional[int] = Field(
1920, description='Width of the generated video in pixels'
)
class MoonvalleyTextToVideoRequest(BaseModel):
image_url: Optional[str] = None
inference_params: Optional[MoonvalleyTextToVideoInferenceParams] = None
prompt_text: Optional[str] = None
webhook_url: Optional[str] = None
class MoonvalleyUploadFileRequest(BaseModel):
file: Optional[StrictBytes] = None
class MoonvalleyUploadFileResponse(BaseModel):
access_url: Optional[str] = None
class MoonvalleyVideoToVideoInferenceParams(BaseModel):
add_quality_guidance: Optional[bool] = Field(
True, description='Whether to add quality guidance'
)
caching_coefficient: Optional[float] = Field(
0.3, description='Caching coefficient for optimization'
)
caching_cooldown: Optional[int] = Field(
3, description='Number of caching cooldown steps'
)
caching_warmup: Optional[int] = Field(
3, description='Number of caching warmup steps'
)
clip_value: Optional[float] = Field(
3, description='CLIP value for generation control'
)
conditioning_frame_index: Optional[int] = Field(
0, description='Index of the conditioning frame'
)
cooldown_steps: Optional[int] = Field(
36, description='Number of cooldown steps (calculated based on num_frames)'
)
guidance_scale: Optional[float] = Field(
15, description='Guidance scale for generation control'
)
negative_prompt: Optional[str] = Field(None, description='Negative prompt text')
seed: Optional[int] = Field(
None, description='Random seed for generation (default: random)'
)
shift_value: Optional[float] = Field(
3, description='Shift value for generation control'
)
steps: Optional[int] = Field(80, description='Number of denoising steps')
use_guidance_schedule: Optional[bool] = Field(
True, description='Whether to use guidance scheduling'
)
use_negative_prompts: Optional[bool] = Field(
False, description='Whether to use negative prompts'
)
use_timestep_transform: Optional[bool] = Field(
True, description='Whether to use timestep transformation'
)
warmup_steps: Optional[int] = Field(
24, description='Number of warmup steps (calculated based on num_frames)'
)
class ControlType(str, Enum):
motion_control = 'motion_control'
pose_control = 'pose_control'
class MoonvalleyVideoToVideoRequest(BaseModel):
control_type: ControlType = Field(
..., description='Supported types for video control'
)
inference_params: Optional[MoonvalleyVideoToVideoInferenceParams] = None
prompt_text: str = Field(..., description='Describes the video to generate')
video_url: str = Field(..., description='Url to control video')
webhook_url: Optional[str] = Field(
None, description='Optional webhook URL for notifications'
)
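Since nearly every inference knob above carries a default, a request can override only a few of them and still serialize compactly; the values are illustrative:

```python
params = MoonvalleyTextToVideoInferenceParams(num_frames=128, width=1280, height=720)
req = MoonvalleyTextToVideoRequest(
    prompt_text="a slow pan across a foggy harbor at dawn",
    inference_params=params,
)
body = req.model_dump(exclude_none=True)  # drops the fields still set to None
```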

View File

@@ -0,0 +1,170 @@
from pydantic import BaseModel, Field
class Datum2(BaseModel):
b64_json: str | None = Field(None, description="Base64 encoded image data")
revised_prompt: str | None = Field(None, description="Revised prompt")
url: str | None = Field(None, description="URL of the image")
class InputTokensDetails(BaseModel):
image_tokens: int | None = Field(None)
text_tokens: int | None = Field(None)
class Usage(BaseModel):
input_tokens: int | None = Field(None)
input_tokens_details: InputTokensDetails | None = Field(None)
output_tokens: int | None = Field(None)
total_tokens: int | None = Field(None)
class OpenAIImageGenerationResponse(BaseModel):
data: list[Datum2] | None = Field(None)
usage: Usage | None = Field(None)
class OpenAIImageEditRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str = Field(...)
moderation: str | None = Field(None)
n: int | None = Field(None, description="The number of images to generate")
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="Size of the image (e.g., 1024x1024, 1536x1024, auto)")
size: str | None = Field(None, description="Size of the output image")
class OpenAIImageGenerationRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str | None = Field(None)
moderation: str | None = Field(None)
n: int | None = Field(
None,
description="The number of images to generate.",
)
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="The quality of the generated image")
size: str | None = Field(None, description="Size of the image (e.g., 1024x1024, 1536x1024, auto)")
style: str | None = Field(None, description="Style of the image (only for dall-e-3)")
class ModelResponseProperties(BaseModel):
instructions: str | None = Field(None)
max_output_tokens: int | None = Field(None)
model: str | None = Field(None)
temperature: float | None = Field(1, description="Controls randomness in the response", ge=0.0, le=2.0)
top_p: float | None = Field(
1,
description="Controls diversity of the response via nucleus sampling",
ge=0.0,
le=1.0,
)
truncation: str | None = Field("disabled", description="Allowed values: 'auto' or 'disabled'")
class ResponseProperties(BaseModel):
instructions: str | None = Field(None)
max_output_tokens: int | None = Field(None)
model: str | None = Field(None)
previous_response_id: str | None = Field(None)
truncation: str | None = Field("disabled", description="Allowed values: 'auto' or 'disabled'")
class ResponseError(BaseModel):
code: str = Field(...)
message: str = Field(...)
class OutputTokensDetails(BaseModel):
reasoning_tokens: int = Field(..., description="The number of reasoning tokens.")
class CachedTokensDetails(BaseModel):
cached_tokens: int = Field(
...,
description="The number of tokens that were retrieved from the cache.",
)
class ResponseUsage(BaseModel):
input_tokens: int = Field(..., description="The number of input tokens.")
input_tokens_details: CachedTokensDetails = Field(...)
output_tokens: int = Field(..., description="The number of output tokens.")
output_tokens_details: OutputTokensDetails = Field(...)
total_tokens: int = Field(..., description="The total number of tokens used.")
class InputTextContent(BaseModel):
text: str = Field(..., description="The text input to the model.")
type: str = Field("input_text")
class OutputContent(BaseModel):
type: str = Field(..., description="The type of output content")
text: str | None = Field(None, description="The text content")
data: str | None = Field(None, description="Base64-encoded audio data")
transcript: str | None = Field(None, description="Transcript of the audio")
class OutputMessage(BaseModel):
type: str = Field(..., description="The type of output item")
content: list[OutputContent] | None = Field(None, description="The content of the message")
role: str | None = Field(None, description="The role of the message")
class OpenAIResponse(ModelResponseProperties, ResponseProperties):
created_at: float | None = Field(
None,
description="Unix timestamp (in seconds) of when this Response was created.",
)
error: ResponseError | None = Field(None)
id: str | None = Field(None, description="Unique identifier for this Response.")
object: str | None = Field(None, description="The object type of this resource - always set to `response`.")
output: list[OutputMessage] | None = Field(None)
parallel_tool_calls: bool | None = Field(True)
status: str | None = Field(
None,
description="One of `completed`, `failed`, `in_progress`, or `incomplete`.",
)
usage: ResponseUsage | None = Field(None)
class InputImageContent(BaseModel):
detail: str = Field(..., description="One of `high`, `low`, or `auto`. Defaults to `auto`.")
file_id: str | None = Field(None)
image_url: str | None = Field(None)
type: str = Field(..., description="The type of the input item. Always `input_image`.")
class InputFileContent(BaseModel):
file_data: str | None = Field(None)
file_id: str | None = Field(None)
filename: str | None = Field(None, description="The name of the file to be sent to the model.")
type: str = Field(..., description="The type of the input item. Always `input_file`.")
class InputMessage(BaseModel):
content: list[InputTextContent | InputImageContent | InputFileContent] = Field(
...,
description="A list of one or many input items to the model, containing different content types.",
)
role: str | None = Field(None)
type: str | None = Field(None)
class OpenAICreateResponse(ModelResponseProperties, ResponseProperties):
include: str | None = Field(None)
input: list[InputMessage] = Field(...)
parallel_tool_calls: bool | None = Field(
True, description="Whether to allow the model to run tool calls in parallel."
)
store: bool | None = Field(
True,
description="Whether to store the generated model response for later retrieval via API.",
)
stream: bool | None = Field(False)
usage: ResponseUsage | None = Field(None)
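A sketch of a multimodal request built from the content union above; the model name is a placeholder:

```python
msg = InputMessage(
    role="user",
    content=[
        InputTextContent(text="Describe this image."),
        InputImageContent(type="input_image", detail="auto",
                          image_url="https://example.com/cat.png"),
    ],
)
req = OpenAICreateResponse(input=[msg], model="gpt-example")  # placeholder model id
body = req.model_dump(exclude_none=True)
```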

View File

@@ -1,52 +0,0 @@
from pydantic import BaseModel, Field
class Datum2(BaseModel):
b64_json: str | None = Field(None, description="Base64 encoded image data")
revised_prompt: str | None = Field(None, description="Revised prompt")
url: str | None = Field(None, description="URL of the image")
class InputTokensDetails(BaseModel):
image_tokens: int | None = None
text_tokens: int | None = None
class Usage(BaseModel):
input_tokens: int | None = None
input_tokens_details: InputTokensDetails | None = None
output_tokens: int | None = None
total_tokens: int | None = None
class OpenAIImageGenerationResponse(BaseModel):
data: list[Datum2] | None = None
usage: Usage | None = None
class OpenAIImageEditRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str = Field(...)
moderation: str | None = Field(None)
n: int | None = Field(None, description="The number of images to generate")
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="Size of the image (e.g., 1024x1024, 1536x1024, auto)")
size: str | None = Field(None, description="Size of the output image")
class OpenAIImageGenerationRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str | None = Field(None)
moderation: str | None = Field(None)
n: int | None = Field(
None,
description="The number of images to generate.",
)
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="The quality of the generated image")
size: str | None = Field(None, description="Size of the image (e.g., 1024x1024, 1536x1024, auto)")
style: str | None = Field(None, description="Style of the image (only for dall-e-3)")

View File

@@ -0,0 +1,127 @@
from enum import Enum
from typing import Optional, List, Union
from datetime import datetime
from pydantic import BaseModel, Field, RootModel
class RunwayAspectRatioEnum(str, Enum):
field_1280_720 = '1280:720'
field_720_1280 = '720:1280'
field_1104_832 = '1104:832'
field_832_1104 = '832:1104'
field_960_960 = '960:960'
field_1584_672 = '1584:672'
field_1280_768 = '1280:768'
field_768_1280 = '768:1280'
class Position(str, Enum):
first = 'first'
last = 'last'
class RunwayPromptImageDetailedObject(BaseModel):
position: Position = Field(
...,
description="The position of the image in the output video. 'last' is currently supported for gen3a_turbo only.",
)
uri: str = Field(
..., description='A HTTPS URL or data URI containing an encoded image.'
)
class RunwayPromptImageObject(
RootModel[Union[str, List[RunwayPromptImageDetailedObject]]]
):
root: Union[str, List[RunwayPromptImageDetailedObject]] = Field(
...,
description='Image(s) to use for the video generation. Can be a single URI or an array of image objects with positions.',
)
class RunwayModelEnum(str, Enum):
gen4_turbo = 'gen4_turbo'
gen3a_turbo = 'gen3a_turbo'
class RunwayDurationEnum(int, Enum):
integer_5 = 5
integer_10 = 10
class RunwayImageToVideoRequest(BaseModel):
duration: RunwayDurationEnum
model: RunwayModelEnum
promptImage: RunwayPromptImageObject
promptText: Optional[str] = Field(
None, description='Text prompt for the generation', max_length=1000
)
ratio: RunwayAspectRatioEnum
seed: int = Field(
..., description='Random seed for generation', ge=0, le=4294967295
)
class RunwayImageToVideoResponse(BaseModel):
id: Optional[str] = Field(None, description='Task ID')
class RunwayTaskStatusEnum(str, Enum):
SUCCEEDED = 'SUCCEEDED'
RUNNING = 'RUNNING'
FAILED = 'FAILED'
PENDING = 'PENDING'
CANCELLED = 'CANCELLED'
THROTTLED = 'THROTTLED'
class RunwayTaskStatusResponse(BaseModel):
createdAt: datetime = Field(..., description='Task creation timestamp')
id: str = Field(..., description='Task ID')
output: Optional[List[str]] = Field(None, description='Array of output video URLs')
progress: Optional[float] = Field(
None,
description='Float value between 0 and 1 representing the progress of the task. Only available if status is RUNNING.',
ge=0.0,
le=1.0,
)
status: RunwayTaskStatusEnum
class Model4(str, Enum):
gen4_image = 'gen4_image'
class ReferenceImage(BaseModel):
uri: Optional[str] = Field(
None, description='An HTTPS URL or data URI containing an encoded image'
)
class RunwayTextToImageAspectRatioEnum(str, Enum):
field_1920_1080 = '1920:1080'
field_1080_1920 = '1080:1920'
field_1024_1024 = '1024:1024'
field_1360_768 = '1360:768'
field_1080_1080 = '1080:1080'
field_1168_880 = '1168:880'
field_1440_1080 = '1440:1080'
field_1080_1440 = '1080:1440'
field_1808_768 = '1808:768'
field_2112_912 = '2112:912'
class RunwayTextToImageRequest(BaseModel):
model: Model4 = Field(..., description='Model to use for generation')
promptText: str = Field(
..., description='Text prompt for the image generation', max_length=1000
)
ratio: RunwayTextToImageAspectRatioEnum
referenceImages: Optional[List[ReferenceImage]] = Field(
None, description='Array of reference images to guide the generation'
)
class RunwayTextToImageResponse(BaseModel):
id: Optional[str] = Field(None, description='Task ID')
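Because RunwayPromptImageObject is a RootModel over a union, it accepts either a bare URI string or a list of positioned images; a hedged sketch (URLs illustrative):
# Single first-frame URI...
prompt_image = RunwayPromptImageObject(root="https://example.com/frame.png")
# ...or explicitly positioned frames:
prompt_image = RunwayPromptImageObject(
    root=[RunwayPromptImageDetailedObject(position=Position.first, uri="https://example.com/frame.png")]
)
req = RunwayImageToVideoRequest(
    duration=RunwayDurationEnum.integer_5,
    model=RunwayModelEnum.gen4_turbo,
    promptImage=prompt_image,
    ratio=RunwayAspectRatioEnum.field_1280_720,
    seed=42,
)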

View File

@@ -41,7 +41,7 @@ class Resolution(BaseModel):
height: int = Field(...)
class CreateCreateVideoRequestSource(BaseModel):
class CreateVideoRequestSource(BaseModel):
container: str = Field(...)
size: int = Field(..., description="Size of the video file in bytes")
duration: int = Field(..., description="Duration of the video file in seconds")
@@ -89,7 +89,7 @@ class Overrides(BaseModel):
class CreateVideoRequest(BaseModel):
source: CreateCreateVideoRequestSource = Field(...)
source: CreateVideoRequestSource = Field(...)
filters: list[Union[VideoFrameInterpolationFilter, VideoEnhancementFilter]] = Field(...)
output: OutputInformationVideo = Field(...)
overrides: Overrides = Field(Overrides(isPaidDiffusion=True))

View File

@@ -0,0 +1,41 @@
from pydantic import BaseModel, Field
class SubjectReference(BaseModel):
id: str = Field(...)
images: list[str] = Field(...)
class TaskCreationRequest(BaseModel):
model: str = Field(...)
prompt: str = Field(..., max_length=2000)
duration: int = Field(...)
seed: int = Field(..., ge=0, le=2147483647)
aspect_ratio: str | None = Field(None)
resolution: str | None = Field(None)
movement_amplitude: str | None = Field(None)
images: list[str] | None = Field(None, description="Base64 encoded string or image URL")
subjects: list[SubjectReference] | None = Field(None)
bgm: bool | None = Field(None)
audio: bool | None = Field(None)
class TaskCreationResponse(BaseModel):
task_id: str = Field(...)
state: str = Field(...)
created_at: str = Field(...)
code: int | None = Field(None, description="Error code")
class TaskResult(BaseModel):
id: str = Field(..., description="Creation id")
url: str = Field(..., description="The URL of the generated results, valid for one hour")
cover_url: str = Field(..., description="The cover URL of the generated results, valid for one hour")
class TaskStatusResponse(BaseModel):
state: str = Field(...)
err_code: str | None = Field(None)
progress: float | None = Field(None)
credits: int | None = Field(None)
creations: list[TaskResult] = Field(..., description="Generated results")
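A hedged construction sketch for the task models above; the model id is a placeholder, and pydantic enforces the seed range and the 2000-character prompt limit:
req = TaskCreationRequest(
    model="example-model",  # placeholder id, not from this API's docs
    prompt="a sailing ship at dawn",
    duration=5,
    seed=123,
)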

View File

@@ -0,0 +1,35 @@
from pydantic import BaseModel, Field
class SeedVR2ImageRequest(BaseModel):
image: str = Field(...)
target_resolution: str = Field(...)
output_format: str = Field("png")
enable_sync_mode: bool = Field(False)
class FlashVSRRequest(BaseModel):
target_resolution: str = Field(...)
video: str = Field(...)
duration: float = Field(...)
class TaskCreatedDataResponse(BaseModel):
id: str = Field(...)
class TaskCreatedResponse(BaseModel):
code: int = Field(...)
message: str = Field(...)
data: TaskCreatedDataResponse | None = Field(None)
class TaskResultDataResponse(BaseModel):
status: str = Field(...)
outputs: list[str] = Field([])
class TaskResultResponse(BaseModel):
code: int = Field(...)
message: str = Field(...)
data: TaskResultDataResponse | None = Field(None)

View File

@@ -1,10 +0,0 @@
import av
ver = av.__version__.split(".")
if int(ver[0]) < 14:
raise Exception("INSTALL NEW VERSION OF PYAV TO USE API NODES.")
if int(ver[0]) == 14 and int(ver[1]) < 2:
raise Exception("INSTALL NEW VERSION OF PYAV TO USE API NODES.")
NODE_CLASS_MAPPINGS = {}

View File

@@ -1,116 +0,0 @@
from enum import Enum
from pydantic.fields import FieldInfo
from pydantic import BaseModel
from pydantic_core import PydanticUndefined
from comfy.comfy_types.node_typing import IO, InputTypeOptions
NodeInput = tuple[IO, InputTypeOptions]
def _create_base_config(field_info: FieldInfo) -> InputTypeOptions:
config = {}
if hasattr(field_info, "default") and field_info.default is not PydanticUndefined:
config["default"] = field_info.default
if hasattr(field_info, "description") and field_info.description is not None:
config["tooltip"] = field_info.description
return config
def _get_number_constraints_config(field_info: FieldInfo) -> dict:
config = {}
if hasattr(field_info, "metadata"):
metadata = field_info.metadata
for constraint in metadata:
if hasattr(constraint, "ge"):
config["min"] = constraint.ge
if hasattr(constraint, "le"):
config["max"] = constraint.le
if hasattr(constraint, "multiple_of"):
config["step"] = constraint.multiple_of
return config
def _model_field_to_image_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.IMAGE, {
**_create_base_config(field_info),
**kwargs,
}
def _model_field_to_string_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.STRING, {
**_create_base_config(field_info),
**kwargs,
}
def _model_field_to_float_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.FLOAT, {
**_create_base_config(field_info),
**_get_number_constraints_config(field_info),
**kwargs,
}
def _model_field_to_int_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.INT, {
**_create_base_config(field_info),
**_get_number_constraints_config(field_info),
**kwargs,
}
def _model_field_to_combo_input(
field_info: FieldInfo, enum_type: type[Enum] = None, **kwargs
) -> NodeInput:
combo_config = {}
if enum_type is not None:
combo_config["options"] = [option.value for option in enum_type]
combo_config = {
**combo_config,
**_create_base_config(field_info),
**kwargs,
}
return IO.COMBO, combo_config
def model_field_to_node_input(
input_type: IO, base_model: type[BaseModel], field_name: str, **kwargs
) -> NodeInput:
"""
Maps a field from a Pydantic model to a Comfy node input.
Args:
input_type: The type of the input.
base_model: The Pydantic model to map the field from.
field_name: The name of the field to map.
**kwargs: Additional key/values to include in the input options.
Note:
For combo inputs, pass an `Enum` to the `enum_type` keyword argument to populate the options automatically.
Example:
>>> model_field_to_node_input(IO.STRING, MyModel, "my_field", multiline=True)
>>> model_field_to_node_input(IO.COMBO, MyModel, "my_field", enum_type=MyEnum)
>>> model_field_to_node_input(IO.FLOAT, MyModel, "my_field", slider=True)
"""
field_info: FieldInfo = base_model.model_fields[field_name]
result: NodeInput
if input_type == IO.IMAGE:
result = _model_field_to_image_input(field_info, **kwargs)
elif input_type == IO.STRING:
result = _model_field_to_string_input(field_info, **kwargs)
elif input_type == IO.FLOAT:
result = _model_field_to_float_input(field_info, **kwargs)
elif input_type == IO.INT:
result = _model_field_to_int_input(field_info, **kwargs)
elif input_type == IO.COMBO:
result = _model_field_to_combo_input(field_info, **kwargs)
else:
message = f"Invalid input type: {input_type}"
raise ValueError(message)
return result

View File

@@ -3,7 +3,7 @@ from pydantic import BaseModel
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.bfl_api import (
from comfy_api_nodes.apis.bfl import (
BFLFluxExpandImageRequest,
BFLFluxFillImageRequest,
BFLFluxKontextProGenerateRequest,
@@ -97,6 +97,9 @@ class FluxProUltraImageNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.06}""",
),
)
@classmethod
@@ -352,6 +355,9 @@ class FluxProExpandNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.05}""",
),
)
@classmethod
@@ -458,6 +464,9 @@ class FluxProFillNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.05}""",
),
)
@classmethod
@@ -511,6 +520,21 @@ class Flux2ProImageNode(IO.ComfyNode):
NODE_ID = "Flux2ProImageNode"
DISPLAY_NAME = "Flux.2 [pro] Image"
API_ENDPOINT = "/proxy/bfl/flux-2-pro/generate"
PRICE_BADGE_EXPR = """
(
$MP := 1024 * 1024;
$outMP := $max([1, $floor(((widgets.width * widgets.height) + $MP - 1) / $MP)]);
$outputCost := 0.03 + 0.015 * ($outMP - 1);
inputs.images.connected
? {
"type":"range_usd",
"min_usd": $outputCost + 0.015,
"max_usd": $outputCost + 0.12,
"format": { "approximate": true }
}
: {"type":"usd","usd": $outputCost}
)
"""
@classmethod
def define_schema(cls) -> IO.Schema:
@@ -563,6 +587,10 @@ class Flux2ProImageNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["width", "height"], inputs=["images"]),
expr=cls.PRICE_BADGE_EXPR,
),
)
@classmethod
@@ -623,6 +651,22 @@ class Flux2MaxImageNode(Flux2ProImageNode):
NODE_ID = "Flux2MaxImageNode"
DISPLAY_NAME = "Flux.2 [max] Image"
API_ENDPOINT = "/proxy/bfl/flux-2-max/generate"
PRICE_BADGE_EXPR = """
(
$MP := 1024 * 1024;
$outMP := $max([1, $floor(((widgets.width * widgets.height) + $MP - 1) / $MP)]);
$outputCost := 0.07 + 0.03 * ($outMP - 1);
inputs.images.connected
? {
"type":"range_usd",
"min_usd": $outputCost + 0.03,
"max_usd": $outputCost + 0.24,
"format": { "approximate": true }
}
: {"type":"usd","usd": $outputCost}
)
"""
class BFLExtension(ComfyExtension):

View File

@@ -0,0 +1,198 @@
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.bria import (
BriaEditImageRequest,
BriaResponse,
BriaStatusResponse,
InputModerationSettings,
)
from comfy_api_nodes.util import (
ApiEndpoint,
convert_mask_to_image,
download_url_to_image_tensor,
get_number_of_images,
poll_op,
sync_op,
upload_images_to_comfyapi,
)
class BriaImageEditNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="BriaImageEditNode",
display_name="Bria FIBO Image Edit",
category="api node/image/Bria",
description="Edit images using Bria latest model",
inputs=[
IO.Combo.Input("model", options=["FIBO"]),
IO.Image.Input("image"),
IO.String.Input(
"prompt",
multiline=True,
default="",
tooltip="Instruction to edit image",
),
IO.String.Input("negative_prompt", multiline=True, default=""),
IO.String.Input(
"structured_prompt",
multiline=True,
default="",
tooltip="A string containing the structured edit prompt in JSON format. "
"Use this instead of usual prompt for precise, programmatic control.",
),
IO.Int.Input(
"seed",
default=1,
min=1,
max=2147483647,
step=1,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
),
IO.Float.Input(
"guidance_scale",
default=3,
min=3,
max=5,
step=0.01,
display_mode=IO.NumberDisplay.number,
tooltip="Higher value makes the image follow the prompt more closely.",
),
IO.Int.Input(
"steps",
default=50,
min=20,
max=50,
step=1,
display_mode=IO.NumberDisplay.number,
),
IO.DynamicCombo.Input(
"moderation",
options=[
IO.DynamicCombo.Option(
"true",
[
IO.Boolean.Input(
"prompt_content_moderation", default=False
),
IO.Boolean.Input(
"visual_input_moderation", default=False
),
IO.Boolean.Input(
"visual_output_moderation", default=True
),
],
),
IO.DynamicCombo.Option("false", []),
],
tooltip="Moderation settings",
),
IO.Mask.Input(
"mask",
tooltip="If omitted, the edit applies to the entire image.",
optional=True,
),
],
outputs=[
IO.Image.Output(),
IO.String.Output(display_name="structured_prompt"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.04}""",
),
)
@classmethod
async def execute(
cls,
model: str,
image: Input.Image,
prompt: str,
negative_prompt: str,
structured_prompt: str,
seed: int,
guidance_scale: float,
steps: int,
moderation: InputModerationSettings,
mask: Input.Image | None = None,
) -> IO.NodeOutput:
if not prompt and not structured_prompt:
raise ValueError(
"One of prompt or structured_prompt is required to be non-empty."
)
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
mask_url = None
if mask is not None:
mask_url = (
await upload_images_to_comfyapi(
cls,
convert_mask_to_image(mask),
max_images=1,
mime_type="image/png",
wait_label="Uploading mask",
)
)[0]
response = await sync_op(
cls,
ApiEndpoint(path="proxy/bria/v2/image/edit", method="POST"),
data=BriaEditImageRequest(
instruction=prompt if prompt else None,
structured_instruction=structured_prompt if structured_prompt else None,
images=await upload_images_to_comfyapi(
cls,
image,
max_images=1,
mime_type="image/png",
wait_label="Uploading image",
),
mask=mask_url,
negative_prompt=negative_prompt if negative_prompt else None,
guidance_scale=guidance_scale,
seed=seed,
model_version=model,
steps_num=steps,
prompt_content_moderation=moderation.get(
"prompt_content_moderation", False
),
visual_input_content_moderation=moderation.get(
"visual_input_moderation", False
),
visual_output_content_moderation=moderation.get(
"visual_output_moderation", False
),
),
response_model=BriaStatusResponse,
)
response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/bria/v2/status/{response.request_id}"),
status_extractor=lambda r: r.status,
response_model=BriaResponse,
)
return IO.NodeOutput(
await download_url_to_image_tensor(response.result.image_url),
response.result.structured_prompt,
)
class BriaExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:
return [
BriaImageEditNode,
]
async def comfy_entrypoint() -> BriaExtension:
return BriaExtension()

View File

@@ -5,11 +5,10 @@ import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.bytedance_api import (
from comfy_api_nodes.apis.bytedance import (
RECOMMENDED_PRESETS,
RECOMMENDED_PRESETS_SEEDREAM_4,
VIDEO_TASKS_EXECUTION_TIME,
Image2ImageTaskCreationRequest,
Image2VideoTaskCreationRequest,
ImageTaskCreationResponse,
Seedream4Options,
@@ -126,6 +125,9 @@ class ByteDanceImageNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.03}""",
),
)
@classmethod
@@ -171,99 +173,6 @@ class ByteDanceImageNode(IO.ComfyNode):
return IO.NodeOutput(await download_url_to_image_tensor(get_image_url_from_response(response)))
class ByteDanceImageEditNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="ByteDanceImageEditNode",
display_name="ByteDance Image Edit",
category="api node/image/ByteDance",
description="Edit images using ByteDance models via api based on prompt",
inputs=[
IO.Combo.Input("model", options=["seededit-3-0-i2i-250628"]),
IO.Image.Input(
"image",
tooltip="The base image to edit",
),
IO.String.Input(
"prompt",
multiline=True,
default="",
tooltip="Instruction to edit image",
),
IO.Int.Input(
"seed",
default=0,
min=0,
max=2147483647,
step=1,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
tooltip="Seed to use for generation",
optional=True,
),
IO.Float.Input(
"guidance_scale",
default=5.5,
min=1.0,
max=10.0,
step=0.01,
display_mode=IO.NumberDisplay.number,
tooltip="Higher value makes the image follow the prompt more closely",
optional=True,
),
IO.Boolean.Input(
"watermark",
default=False,
tooltip='Whether to add an "AI generated" watermark to the image',
optional=True,
),
],
outputs=[
IO.Image.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_deprecated=True,
)
@classmethod
async def execute(
cls,
model: str,
image: Input.Image,
prompt: str,
seed: int,
guidance_scale: float,
watermark: bool,
) -> IO.NodeOutput:
validate_string(prompt, strip_whitespace=True, min_length=1)
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
validate_image_aspect_ratio(image, (1, 3), (3, 1))
source_url = (await upload_images_to_comfyapi(cls, image, max_images=1, mime_type="image/png"))[0]
payload = Image2ImageTaskCreationRequest(
model=model,
prompt=prompt,
image=source_url,
seed=seed,
guidance_scale=guidance_scale,
watermark=watermark,
)
response = await sync_op(
cls,
ApiEndpoint(path=BYTEPLUS_IMAGE_ENDPOINT, method="POST"),
data=payload,
response_model=ImageTaskCreationResponse,
)
return IO.NodeOutput(await download_url_to_image_tensor(get_image_url_from_response(response)))
class ByteDanceSeedreamNode(IO.ComfyNode):
@classmethod
@@ -367,6 +276,19 @@ class ByteDanceSeedreamNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model"]),
expr="""
(
$price := $contains(widgets.model, "seedream-4-5-251128") ? 0.04 : 0.03;
{
"type":"usd",
"usd": $price,
"format": { "suffix":" x images/Run", "approximate": true }
}
)
""",
),
)
@classmethod
@@ -461,7 +383,12 @@ class ByteDanceTextToVideoNode(IO.ComfyNode):
inputs=[
IO.Combo.Input(
"model",
options=["seedance-1-0-pro-250528", "seedance-1-0-lite-t2v-250428", "seedance-1-0-pro-fast-251015"],
options=[
"seedance-1-5-pro-251215",
"seedance-1-0-pro-250528",
"seedance-1-0-lite-t2v-250428",
"seedance-1-0-pro-fast-251015",
],
default="seedance-1-0-pro-fast-251015",
),
IO.String.Input(
@@ -512,6 +439,12 @@ class ByteDanceTextToVideoNode(IO.ComfyNode):
tooltip='Whether to add an "AI generated" watermark to the video.',
optional=True,
),
IO.Boolean.Input(
"generate_audio",
default=False,
tooltip="This parameter is ignored for any model except seedance-1-5-pro.",
optional=True,
),
],
outputs=[
IO.Video.Output(),
@@ -522,6 +455,7 @@ class ByteDanceTextToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -535,7 +469,10 @@ class ByteDanceTextToVideoNode(IO.ComfyNode):
seed: int,
camera_fixed: bool,
watermark: bool,
generate_audio: bool = False,
) -> IO.NodeOutput:
if model == "seedance-1-5-pro-251215" and duration < 4:
raise ValueError("Minimum supported duration for Seedance 1.5 Pro is 4 seconds.")
validate_string(prompt, strip_whitespace=True, min_length=1)
raise_if_text_params(prompt, ["resolution", "ratio", "duration", "seed", "camerafixed", "watermark"])
@@ -550,7 +487,11 @@ class ByteDanceTextToVideoNode(IO.ComfyNode):
)
return await process_video_task(
cls,
payload=Text2VideoTaskCreationRequest(model=model, content=[TaskTextContent(text=prompt)]),
payload=Text2VideoTaskCreationRequest(
model=model,
content=[TaskTextContent(text=prompt)],
generate_audio=generate_audio if model == "seedance-1-5-pro-251215" else None,
),
estimated_duration=max(1, math.ceil(VIDEO_TASKS_EXECUTION_TIME[model][resolution] * (duration / 10.0))),
)
@@ -567,7 +508,12 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
inputs=[
IO.Combo.Input(
"model",
options=["seedance-1-0-pro-250528", "seedance-1-0-lite-t2v-250428", "seedance-1-0-pro-fast-251015"],
options=[
"seedance-1-5-pro-251215",
"seedance-1-0-pro-250528",
"seedance-1-0-lite-i2v-250428",
"seedance-1-0-pro-fast-251015",
],
default="seedance-1-0-pro-fast-251015",
),
IO.String.Input(
@@ -622,6 +568,12 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
tooltip='Whether to add an "AI generated" watermark to the video.',
optional=True,
),
IO.Boolean.Input(
"generate_audio",
default=False,
tooltip="This parameter is ignored for any model except seedance-1-5-pro.",
optional=True,
),
],
outputs=[
IO.Video.Output(),
@@ -632,6 +584,7 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -646,7 +599,10 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
seed: int,
camera_fixed: bool,
watermark: bool,
generate_audio: bool = False,
) -> IO.NodeOutput:
if model == "seedance-1-5-pro-251215" and duration < 4:
raise ValueError("Minimum supported duration for Seedance 1.5 Pro is 4 seconds.")
validate_string(prompt, strip_whitespace=True, min_length=1)
raise_if_text_params(prompt, ["resolution", "ratio", "duration", "seed", "camerafixed", "watermark"])
validate_image_dimensions(image, min_width=300, min_height=300, max_width=6000, max_height=6000)
@@ -668,6 +624,7 @@ class ByteDanceImageToVideoNode(IO.ComfyNode):
payload=Image2VideoTaskCreationRequest(
model=model,
content=[TaskTextContent(text=prompt), TaskImageContent(image_url=TaskImageContentUrl(url=image_url))],
generate_audio=generate_audio if model == "seedance-1-5-pro-251215" else None,
),
estimated_duration=max(1, math.ceil(VIDEO_TASKS_EXECUTION_TIME[model][resolution] * (duration / 10.0))),
)
@@ -685,7 +642,7 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
inputs=[
IO.Combo.Input(
"model",
options=["seedance-1-0-pro-250528", "seedance-1-0-lite-i2v-250428"],
options=["seedance-1-5-pro-251215", "seedance-1-0-pro-250528", "seedance-1-0-lite-i2v-250428"],
default="seedance-1-0-lite-i2v-250428",
),
IO.String.Input(
@@ -744,6 +701,12 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
tooltip='Whether to add an "AI generated" watermark to the video.',
optional=True,
),
IO.Boolean.Input(
"generate_audio",
default=False,
tooltip="This parameter is ignored for any model except seedance-1-5-pro.",
optional=True,
),
],
outputs=[
IO.Video.Output(),
@@ -754,6 +717,7 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -769,7 +733,10 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
seed: int,
camera_fixed: bool,
watermark: bool,
generate_audio: bool = False,
) -> IO.NodeOutput:
if model == "seedance-1-5-pro-251215" and duration < 4:
raise ValueError("Minimum supported duration for Seedance 1.5 Pro is 4 seconds.")
validate_string(prompt, strip_whitespace=True, min_length=1)
raise_if_text_params(prompt, ["resolution", "ratio", "duration", "seed", "camerafixed", "watermark"])
for i in (first_frame, last_frame):
@@ -802,6 +769,7 @@ class ByteDanceFirstLastFrameNode(IO.ComfyNode):
TaskImageContent(image_url=TaskImageContentUrl(url=str(download_urls[0])), role="first_frame"),
TaskImageContent(image_url=TaskImageContentUrl(url=str(download_urls[1])), role="last_frame"),
],
generate_audio=generate_audio if model == "seedance-1-5-pro-251215" else None,
),
estimated_duration=max(1, math.ceil(VIDEO_TASKS_EXECUTION_TIME[model][resolution] * (duration / 10.0))),
)
@@ -877,6 +845,41 @@ class ByteDanceImageReferenceNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model", "duration", "resolution"]),
expr="""
(
$priceByModel := {
"seedance-1-0-pro": {
"480p":[0.23,0.24],
"720p":[0.51,0.56]
},
"seedance-1-0-lite": {
"480p":[0.17,0.18],
"720p":[0.37,0.41]
}
};
$model := widgets.model;
$modelKey :=
$contains($model, "seedance-1-0-pro") ? "seedance-1-0-pro" :
"seedance-1-0-lite";
$resolution := widgets.resolution;
$resKey :=
$contains($resolution, "720") ? "720p" :
"480p";
$modelPrices := $lookup($priceByModel, $modelKey);
$baseRange := $lookup($modelPrices, $resKey);
$min10s := $baseRange[0];
$max10s := $baseRange[1];
$scale := widgets.duration / 10;
$minCost := $min10s * $scale;
$maxCost := $max10s * $scale;
($minCost = $maxCost)
? {"type":"usd","usd": $minCost}
: {"type":"range_usd","min_usd": $minCost, "max_usd": $maxCost}
)
""",
),
)
@classmethod
@@ -946,12 +949,64 @@ def raise_if_text_params(prompt: str, text_params: list[str]) -> None:
)
PRICE_BADGE_VIDEO = IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model", "duration", "resolution", "generate_audio"]),
expr="""
(
$priceByModel := {
"seedance-1-5-pro": {
"480p":[0.12,0.12],
"720p":[0.26,0.26],
"1080p":[0.58,0.59]
},
"seedance-1-0-pro": {
"480p":[0.23,0.24],
"720p":[0.51,0.56],
"1080p":[1.18,1.22]
},
"seedance-1-0-pro-fast": {
"480p":[0.09,0.1],
"720p":[0.21,0.23],
"1080p":[0.47,0.49]
},
"seedance-1-0-lite": {
"480p":[0.17,0.18],
"720p":[0.37,0.41],
"1080p":[0.85,0.88]
}
};
$model := widgets.model;
$modelKey :=
$contains($model, "seedance-1-5-pro") ? "seedance-1-5-pro" :
$contains($model, "seedance-1-0-pro-fast") ? "seedance-1-0-pro-fast" :
$contains($model, "seedance-1-0-pro") ? "seedance-1-0-pro" :
"seedance-1-0-lite";
$resolution := widgets.resolution;
$resKey :=
$contains($resolution, "1080") ? "1080p" :
$contains($resolution, "720") ? "720p" :
"480p";
$modelPrices := $lookup($priceByModel, $modelKey);
$baseRange := $lookup($modelPrices, $resKey);
$min10s := $baseRange[0];
$max10s := $baseRange[1];
$scale := widgets.duration / 10;
$audioMultiplier := ($modelKey = "seedance-1-5-pro" and widgets.generate_audio) ? 2 : 1;
$minCost := $min10s * $scale * $audioMultiplier;
$maxCost := $max10s * $scale * $audioMultiplier;
($minCost = $maxCost)
? {"type":"usd","usd": $minCost, "format": { "approximate": true }}
: {"type":"range_usd","min_usd": $minCost, "max_usd": $maxCost, "format": { "approximate": true }}
)
""",
)
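A worked reading of the table above: seedance-1-5-pro at 720p is $0.26 per 10 s, so a 10-second clip with generate_audio enabled doubles to about $0.52, while a 5-second 480p seedance-1-0-lite clip scales to roughly $0.085-0.09.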
class ByteDanceExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:
return [
ByteDanceImageNode,
ByteDanceImageEditNode,
ByteDanceSeedreamNode,
ByteDanceTextToVideoNode,
ByteDanceImageToVideoNode,

View File

@@ -14,7 +14,7 @@ from typing_extensions import override
import folder_paths
from comfy_api.latest import IO, ComfyExtension, Input, Types
from comfy_api_nodes.apis.gemini_api import (
from comfy_api_nodes.apis.gemini import (
GeminiContent,
GeminiFileData,
GeminiGenerateContentRequest,
@@ -130,7 +130,7 @@ def get_parts_by_type(response: GeminiGenerateContentResponse, part_type: Litera
Returns:
List of response parts matching the requested type.
"""
if response.candidates is None:
if not response.candidates:
if response.promptFeedback and response.promptFeedback.blockReason:
feedback = response.promptFeedback
raise ValueError(
@@ -141,14 +141,24 @@ def get_parts_by_type(response: GeminiGenerateContentResponse, part_type: Litera
"try changing it to `IMAGE+TEXT` to view the model's reasoning and understand why image generation failed."
)
parts = []
for part in response.candidates[0].content.parts:
if part_type == "text" and part.text:
parts.append(part)
elif part.inlineData and part.inlineData.mimeType == part_type:
parts.append(part)
elif part.fileData and part.fileData.mimeType == part_type:
parts.append(part)
# Skip parts that don't match the requested type
blocked_reasons = []
for candidate in response.candidates:
if candidate.finishReason and candidate.finishReason.upper() == "IMAGE_PROHIBITED_CONTENT":
blocked_reasons.append(candidate.finishReason)
continue
if candidate.content is None or candidate.content.parts is None:
continue
for part in candidate.content.parts:
if part_type == "text" and part.text:
parts.append(part)
elif part.inlineData and part.inlineData.mimeType == part_type:
parts.append(part)
elif part.fileData and part.fileData.mimeType == part_type:
parts.append(part)
if not parts and blocked_reasons:
raise ValueError(f"Gemini API blocked the request. Reasons: {blocked_reasons}")
return parts
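Call sites of the revised helper are unchanged; a hedged usage sketch:
# Text parts; raises ValueError when nothing matched and a candidate was blocked.
text_parts = get_parts_by_type(response, "text")
# Inline or file image parts are matched by MIME type instead.
image_parts = get_parts_by_type(response, "image/png")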
@@ -309,6 +319,30 @@ class GeminiNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model"]),
expr="""
(
$m := widgets.model;
$contains($m, "gemini-2.5-flash") ? {
"type": "list_usd",
"usd": [0.0003, 0.0025],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens"}
}
: $contains($m, "gemini-2.5-pro") ? {
"type": "list_usd",
"usd": [0.00125, 0.01],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gemini-3-pro-preview") ? {
"type": "list_usd",
"usd": [0.002, 0.012],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: {"type":"text", "text":"Token-based"}
)
""",
),
)
@classmethod
@@ -570,6 +604,9 @@ class GeminiImage(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.039,"format":{"suffix":"/Image (1K)","approximate":true}}""",
),
)
@classmethod
@@ -700,6 +737,19 @@ class GeminiImage2(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["resolution"]),
expr="""
(
$r := widgets.resolution;
($contains($r,"1k") or $contains($r,"2k"))
? {"type":"usd","usd":0.134,"format":{"suffix":"/Image","approximate":true}}
: $contains($r,"4k")
? {"type":"usd","usd":0.24,"format":{"suffix":"/Image","approximate":true}}
: {"type":"text","text":"Token-based"}
)
""",
),
)
@classmethod

View File

@@ -0,0 +1,297 @@
import os
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.hunyuan3d import (
Hunyuan3DViewImage,
InputGenerateType,
ResultFile3D,
To3DProTaskCreateResponse,
To3DProTaskQueryRequest,
To3DProTaskRequest,
To3DProTaskResultResponse,
)
from comfy_api_nodes.util import (
ApiEndpoint,
download_url_to_bytesio,
downscale_image_tensor_by_max_side,
poll_op,
sync_op,
upload_image_to_comfyapi,
validate_image_dimensions,
validate_string,
)
from folder_paths import get_output_directory
def get_glb_obj_from_response(response_objs: list[ResultFile3D]) -> ResultFile3D:
for i in response_objs:
if i.Type.lower() == "glb":
return i
raise ValueError("No GLB file found in response. Please report this to the developers.")
class TencentTextToModelNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="TencentTextToModelNode",
display_name="Hunyuan3D: Text to Model (Pro)",
category="api node/3d/Tencent",
inputs=[
IO.Combo.Input(
"model",
options=["3.0", "3.1"],
tooltip="The LowPoly option is unavailable for the `3.1` model.",
),
IO.String.Input("prompt", multiline=True, default="", tooltip="Supports up to 1024 characters."),
IO.Int.Input("face_count", default=500000, min=40000, max=1500000),
IO.DynamicCombo.Input(
"generate_type",
options=[
IO.DynamicCombo.Option("Normal", [IO.Boolean.Input("pbr", default=False)]),
IO.DynamicCombo.Option(
"LowPoly",
[
IO.Combo.Input("polygon_type", options=["triangle", "quadrilateral"]),
IO.Boolean.Input("pbr", default=False),
],
),
IO.DynamicCombo.Option("Geometry", []),
],
),
IO.Int.Input(
"seed",
default=0,
min=0,
max=2147483647,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
tooltip="Seed controls whether the node should re-run; "
"results are non-deterministic regardless of seed.",
),
],
outputs=[
IO.String.Output(display_name="model_file"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["generate_type", "generate_type.pbr", "face_count"]),
expr="""
(
$base := widgets.generate_type = "normal" ? 25 : widgets.generate_type = "lowpoly" ? 30 : 15;
$pbr := $lookup(widgets, "generate_type.pbr") ? 10 : 0;
$face := widgets.face_count != 500000 ? 10 : 0;
{"type":"usd","usd": ($base + $pbr + $face) * 0.02}
)
""",
),
)
@classmethod
async def execute(
cls,
model: str,
prompt: str,
face_count: int,
generate_type: InputGenerateType,
seed: int,
) -> IO.NodeOutput:
_ = seed
validate_string(prompt, field_name="prompt", min_length=1, max_length=1024)
if model == "3.1" and generate_type["generate_type"].lower() == "lowpoly":
raise ValueError("The LowPoly option is currently unavailable for the 3.1 model.")
response = await sync_op(
cls,
ApiEndpoint(path="/proxy/tencent/hunyuan/3d-pro", method="POST"),
response_model=To3DProTaskCreateResponse,
data=To3DProTaskRequest(
Model=model,
Prompt=prompt,
FaceCount=face_count,
GenerateType=generate_type["generate_type"],
EnablePBR=generate_type.get("pbr", None),
PolygonType=generate_type.get("polygon_type", None),
),
)
if response.Error:
raise ValueError(f"Task creation failed with code {response.Error.Code}: {response.Error.Message}")
result = await poll_op(
cls,
ApiEndpoint(path="/proxy/tencent/hunyuan/3d-pro/query", method="POST"),
data=To3DProTaskQueryRequest(JobId=response.JobId),
response_model=To3DProTaskResultResponse,
status_extractor=lambda r: r.Status,
)
model_file = f"hunyuan_model_{response.JobId}.glb"
await download_url_to_bytesio(
get_glb_obj_from_response(result.ResultFile3Ds).Url,
os.path.join(get_output_directory(), model_file),
)
return IO.NodeOutput(model_file)
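As a worked example of the credit arithmetic in the badge above: a Normal generation with PBR enabled and a non-default face_count prices at (25 + 10 + 10) × $0.02 = $0.90, while a plain Geometry run is 15 × $0.02 = $0.30.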
class TencentImageToModelNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="TencentImageToModelNode",
display_name="Hunyuan3D: Image(s) to Model (Pro)",
category="api node/3d/Tencent",
inputs=[
IO.Combo.Input(
"model",
options=["3.0", "3.1"],
tooltip="The LowPoly option is unavailable for the `3.1` model.",
),
IO.Image.Input("image"),
IO.Image.Input("image_left", optional=True),
IO.Image.Input("image_right", optional=True),
IO.Image.Input("image_back", optional=True),
IO.Int.Input("face_count", default=500000, min=40000, max=1500000),
IO.DynamicCombo.Input(
"generate_type",
options=[
IO.DynamicCombo.Option("Normal", [IO.Boolean.Input("pbr", default=False)]),
IO.DynamicCombo.Option(
"LowPoly",
[
IO.Combo.Input("polygon_type", options=["triangle", "quadrilateral"]),
IO.Boolean.Input("pbr", default=False),
],
),
IO.DynamicCombo.Option("Geometry", []),
],
),
IO.Int.Input(
"seed",
default=0,
min=0,
max=2147483647,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
tooltip="Seed controls whether the node should re-run; "
"results are non-deterministic regardless of seed.",
),
],
outputs=[
IO.String.Output(display_name="model_file"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(
widgets=["generate_type", "generate_type.pbr", "face_count"],
inputs=["image_left", "image_right", "image_back"],
),
expr="""
(
$base := widgets.generate_type = "normal" ? 25 : widgets.generate_type = "lowpoly" ? 30 : 15;
$multiview := (
inputs.image_left.connected or inputs.image_right.connected or inputs.image_back.connected
) ? 10 : 0;
$pbr := $lookup(widgets, "generate_type.pbr") ? 10 : 0;
$face := widgets.face_count != 500000 ? 10 : 0;
{"type":"usd","usd": ($base + $multiview + $pbr + $face) * 0.02}
)
""",
),
)
@classmethod
async def execute(
cls,
model: str,
image: Input.Image,
face_count: int,
generate_type: InputGenerateType,
seed: int,
image_left: Input.Image | None = None,
image_right: Input.Image | None = None,
image_back: Input.Image | None = None,
) -> IO.NodeOutput:
_ = seed
if model == "3.1" and generate_type["generate_type"].lower() == "lowpoly":
raise ValueError("The LowPoly option is currently unavailable for the 3.1 model.")
validate_image_dimensions(image, min_width=128, min_height=128)
multiview_images = []
for k, v in {
"left": image_left,
"right": image_right,
"back": image_back,
}.items():
if v is None:
continue
validate_image_dimensions(v, min_width=128, min_height=128)
multiview_images.append(
Hunyuan3DViewImage(
ViewType=k,
ViewImageUrl=await upload_image_to_comfyapi(
cls,
downscale_image_tensor_by_max_side(v, max_side=4900),
mime_type="image/webp",
total_pixels=24_010_000,
),
)
)
response = await sync_op(
cls,
ApiEndpoint(path="/proxy/tencent/hunyuan/3d-pro", method="POST"),
response_model=To3DProTaskCreateResponse,
data=To3DProTaskRequest(
Model=model,
FaceCount=face_count,
GenerateType=generate_type["generate_type"],
ImageUrl=await upload_image_to_comfyapi(
cls,
downscale_image_tensor_by_max_side(image, max_side=4900),
mime_type="image/webp",
total_pixels=24_010_000,
),
MultiViewImages=multiview_images if multiview_images else None,
EnablePBR=generate_type.get("pbr", None),
PolygonType=generate_type.get("polygon_type", None),
),
)
if response.Error:
raise ValueError(f"Task creation failed with code {response.Error.Code}: {response.Error.Message}")
result = await poll_op(
cls,
ApiEndpoint(path="/proxy/tencent/hunyuan/3d-pro/query", method="POST"),
data=To3DProTaskQueryRequest(JobId=response.JobId),
response_model=To3DProTaskResultResponse,
status_extractor=lambda r: r.Status,
)
model_file = f"hunyuan_model_{response.JobId}.glb"
await download_url_to_bytesio(
get_glb_obj_from_response(result.ResultFile3Ds).Url,
os.path.join(get_output_directory(), model_file),
)
return IO.NodeOutput(model_file)
class TencentHunyuan3DExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:
return [
TencentTextToModelNode,
TencentImageToModelNode,
]
async def comfy_entrypoint() -> TencentHunyuan3DExtension:
return TencentHunyuan3DExtension()

View File

@@ -4,7 +4,7 @@ from comfy_api.latest import IO, ComfyExtension
from PIL import Image
import numpy as np
import torch
from comfy_api_nodes.apis import (
from comfy_api_nodes.apis.ideogram import (
IdeogramGenerateRequest,
IdeogramGenerateResponse,
ImageRequest,
@@ -236,7 +236,6 @@ class IdeogramV1(IO.ComfyNode):
display_name="Ideogram V1",
category="api node/image/Ideogram",
description="Generates images using the Ideogram V1 model.",
is_api_node=True,
inputs=[
IO.String.Input(
"prompt",
@@ -298,6 +297,17 @@ class IdeogramV1(IO.ComfyNode):
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["num_images", "turbo"]),
expr="""
(
$n := widgets.num_images;
$base := (widgets.turbo = true) ? 0.0286 : 0.0858;
{"type":"usd","usd": $round($base * $n, 2)}
)
""",
),
)
@classmethod
@@ -351,7 +361,6 @@ class IdeogramV2(IO.ComfyNode):
display_name="Ideogram V2",
category="api node/image/Ideogram",
description="Generates images using the Ideogram V2 model.",
is_api_node=True,
inputs=[
IO.String.Input(
"prompt",
@@ -436,6 +445,17 @@ class IdeogramV2(IO.ComfyNode):
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["num_images", "turbo"]),
expr="""
(
$n := widgets.num_images;
$base := (widgets.turbo = true) ? 0.0715 : 0.1144;
{"type":"usd","usd": $round($base * $n, 2)}
)
""",
),
)
@classmethod
@@ -506,7 +526,6 @@ class IdeogramV3(IO.ComfyNode):
category="api node/image/Ideogram",
description="Generates images using the Ideogram V3 model. "
"Supports both regular image generation from text prompts and image editing with mask.",
is_api_node=True,
inputs=[
IO.String.Input(
"prompt",
@@ -591,6 +610,23 @@ class IdeogramV3(IO.ComfyNode):
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["rendering_speed", "num_images"], inputs=["character_image"]),
expr="""
(
$n := widgets.num_images;
$speed := widgets.rendering_speed;
$hasChar := inputs.character_image.connected;
$base :=
$contains($speed,"quality") ? ($hasChar ? 0.286 : 0.1287) :
$contains($speed,"default") ? ($hasChar ? 0.2145 : 0.0858) :
$contains($speed,"turbo") ? ($hasChar ? 0.143 : 0.0429) :
0.0858;
{"type":"usd","usd": $round($base * $n, 2)}
)
""",
),
)
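For example, V3 at quality rendering speed with a character image connected is $0.286 per image, so num_images=2 rounds to $0.57; turbo without a character image drops to $0.0429 per image.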
@classmethod

View File

@@ -49,7 +49,7 @@ from comfy_api_nodes.apis import (
KlingCharacterEffectModelName,
KlingSingleImageEffectModelName,
)
from comfy_api_nodes.apis.kling_api import (
from comfy_api_nodes.apis.kling import (
ImageToVideoWithAudioRequest,
MotionControlRequest,
OmniImageParamImage,
@@ -249,7 +249,6 @@ async def finish_omni_video_task(cls: type[IO.ComfyNode], response: TaskStatusRe
ApiEndpoint(path=f"/proxy/kling/v1/videos/omni-video/{response.data.task_id}"),
response_model=TaskStatusResponse,
status_extractor=lambda r: (r.data.task_status if r.data else None),
max_poll_attempts=160,
)
return IO.NodeOutput(await download_url_to_video_output(final_response.data.task_result.videos[0].url))
@@ -567,7 +566,7 @@ async def execute_lipsync(
# Upload the audio file to Comfy API and get download URL
if audio:
audio_url = await upload_audio_to_comfyapi(
cls, audio, container_format="mp3", codec_name="libmp3lame", mime_type="audio/mpeg", filename="output.mp3"
cls, audio, container_format="mp3", codec_name="libmp3lame", mime_type="audio/mpeg"
)
logging.info("Uploaded audio to Comfy API. URL: %s", audio_url)
else:
@@ -764,6 +763,33 @@ class KlingTextToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["mode"]),
expr="""
(
$m := widgets.mode;
$contains($m,"v2-5-turbo")
? ($contains($m,"10") ? {"type":"usd","usd":0.7} : {"type":"usd","usd":0.35})
: $contains($m,"v2-1-master")
? ($contains($m,"10s") ? {"type":"usd","usd":2.8} : {"type":"usd","usd":1.4})
: $contains($m,"v2-master")
? ($contains($m,"10s") ? {"type":"usd","usd":2.8} : {"type":"usd","usd":1.4})
: $contains($m,"v1-6")
? (
$contains($m,"pro")
? ($contains($m,"10s") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($m,"10s") ? {"type":"usd","usd":0.56} : {"type":"usd","usd":0.28})
)
: $contains($m,"v1")
? (
$contains($m,"pro")
? ($contains($m,"10s") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($m,"10s") ? {"type":"usd","usd":0.28} : {"type":"usd","usd":0.14})
)
: {"type":"usd","usd":0.14}
)
""",
),
)
@classmethod
@@ -818,6 +844,16 @@ class OmniProTextToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["duration", "resolution"]),
expr="""
(
$mode := (widgets.resolution = "720p") ? "std" : "pro";
$rates := {"std": 0.084, "pro": 0.112};
{"type":"usd","usd": $lookup($rates, $mode) * widgets.duration}
)
""",
),
)
@classmethod
@@ -886,6 +922,16 @@ class OmniProFirstLastFrameNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["duration", "resolution"]),
expr="""
(
$mode := (widgets.resolution = "720p") ? "std" : "pro";
$rates := {"std": 0.084, "pro": 0.112};
{"type":"usd","usd": $lookup($rates, $mode) * widgets.duration}
)
""",
),
)
@classmethod
@@ -981,6 +1027,16 @@ class OmniProImageToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["duration", "resolution"]),
expr="""
(
$mode := (widgets.resolution = "720p") ? "std" : "pro";
$rates := {"std": 0.084, "pro": 0.112};
{"type":"usd","usd": $lookup($rates, $mode) * widgets.duration}
)
""",
),
)
@classmethod
@@ -1056,6 +1112,16 @@ class OmniProVideoToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["duration", "resolution"]),
expr="""
(
$mode := (widgets.resolution = "720p") ? "std" : "pro";
$rates := {"std": 0.126, "pro": 0.168};
{"type":"usd","usd": $lookup($rates, $mode) * widgets.duration}
)
""",
),
)
@classmethod
@@ -1142,6 +1208,16 @@ class OmniProEditVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["resolution"]),
expr="""
(
$mode := (widgets.resolution = "720p") ? "std" : "pro";
$rates := {"std": 0.126, "pro": 0.168};
{"type":"usd","usd": $lookup($rates, $mode), "format":{"suffix":"/second"}}
)
""",
),
)
@classmethod
@@ -1228,6 +1304,9 @@ class OmniProImageNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.028}""",
),
)
@classmethod
@@ -1313,6 +1392,9 @@ class KlingCameraControlT2VNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.14}""",
),
)
@classmethod
@@ -1375,6 +1457,33 @@ class KlingImage2VideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["mode", "model_name", "duration"]),
expr="""
(
$mode := widgets.mode;
$model := widgets.model_name;
$dur := widgets.duration;
$contains($model,"v2-5-turbo")
? ($contains($dur,"10") ? {"type":"usd","usd":0.7} : {"type":"usd","usd":0.35})
: ($contains($model,"v2-1-master") or $contains($model,"v2-master"))
? ($contains($dur,"10") ? {"type":"usd","usd":2.8} : {"type":"usd","usd":1.4})
: ($contains($model,"v2-1") or $contains($model,"v1-6") or $contains($model,"v1-5"))
? (
$contains($mode,"pro")
? ($contains($dur,"10") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($dur,"10") ? {"type":"usd","usd":0.56} : {"type":"usd","usd":0.28})
)
: $contains($model,"v1")
? (
$contains($mode,"pro")
? ($contains($dur,"10") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($dur,"10") ? {"type":"usd","usd":0.28} : {"type":"usd","usd":0.14})
)
: {"type":"usd","usd":0.14}
)
""",
),
)
@classmethod
@@ -1448,6 +1557,9 @@ class KlingCameraControlI2VNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.49}""",
),
)
@classmethod
@@ -1518,6 +1630,33 @@ class KlingStartEndFrameNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["mode"]),
expr="""
(
$m := widgets.mode;
$contains($m,"v2-5-turbo")
? ($contains($m,"10") ? {"type":"usd","usd":0.7} : {"type":"usd","usd":0.35})
: $contains($m,"v2-1")
? ($contains($m,"10s") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: $contains($m,"v2-master")
? ($contains($m,"10s") ? {"type":"usd","usd":2.8} : {"type":"usd","usd":1.4})
: $contains($m,"v1-6")
? (
$contains($m,"pro")
? ($contains($m,"10s") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($m,"10s") ? {"type":"usd","usd":0.56} : {"type":"usd","usd":0.28})
)
: $contains($m,"v1")
? (
$contains($m,"pro")
? ($contains($m,"10s") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($m,"10s") ? {"type":"usd","usd":0.28} : {"type":"usd","usd":0.14})
)
: {"type":"usd","usd":0.14}
)
""",
),
)
@classmethod
@@ -1583,6 +1722,9 @@ class KlingVideoExtendNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.28}""",
),
)
@classmethod
@@ -1664,6 +1806,29 @@ class KlingDualCharacterVideoEffectNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["mode", "model_name", "duration"]),
expr="""
(
$mode := widgets.mode;
$model := widgets.model_name;
$dur := widgets.duration;
($contains($model,"v1-6") or $contains($model,"v1-5"))
? (
$contains($mode,"pro")
? ($contains($dur,"10") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($dur,"10") ? {"type":"usd","usd":0.56} : {"type":"usd","usd":0.28})
)
: $contains($model,"v1")
? (
$contains($mode,"pro")
? ($contains($dur,"10") ? {"type":"usd","usd":0.98} : {"type":"usd","usd":0.49})
: ($contains($dur,"10") ? {"type":"usd","usd":0.28} : {"type":"usd","usd":0.14})
)
: {"type":"usd","usd":0.14}
)
""",
),
)
@classmethod
@@ -1728,6 +1893,16 @@ class KlingSingleImageVideoEffectNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["effect_scene"]),
expr="""
(
($contains(widgets.effect_scene,"dizzydizzy") or $contains(widgets.effect_scene,"bloombloom"))
? {"type":"usd","usd":0.49}
: {"type":"usd","usd":0.28}
)
""",
),
)
@classmethod
@@ -1782,6 +1957,9 @@ class KlingLipSyncAudioToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.1,"format":{"approximate":true}}""",
),
)
@classmethod
@@ -1842,6 +2020,9 @@ class KlingLipSyncTextToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.1,"format":{"approximate":true}}""",
),
)
@classmethod
@@ -1892,6 +2073,9 @@ class KlingVirtualTryOnNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.7}""",
),
)
@classmethod
@@ -1991,6 +2175,19 @@ class KlingImageGenerationNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model_name", "n"], inputs=["image"]),
expr="""
(
$m := widgets.model_name;
$base :=
$contains($m,"kling-v1-5")
? (inputs.image.connected ? 0.028 : 0.014)
: ($contains($m,"kling-v1") ? 0.0035 : 0.014);
{"type":"usd","usd": $base * widgets.n}
)
""",
),
)
@classmethod
@@ -2074,6 +2271,10 @@ class TextToVideoWithAudio(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["duration", "generate_audio"]),
expr="""{"type":"usd","usd": 0.07 * widgets.duration * (widgets.generate_audio ? 2 : 1)}""",
),
)
@classmethod
@@ -2138,6 +2339,10 @@ class ImageToVideoWithAudio(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["duration", "generate_audio"]),
expr="""{"type":"usd","usd": 0.07 * widgets.duration * (widgets.generate_audio ? 2 : 1)}""",
),
)
@classmethod
@@ -2218,6 +2423,15 @@ class MotionControl(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["mode"]),
expr="""
(
$prices := {"std": 0.07, "pro": 0.112};
{"type":"usd","usd": $lookup($prices, widgets.mode), "format":{"suffix":"/second"}}
)
""",
),
)
@classmethod

View File

@@ -28,6 +28,22 @@ class ExecuteTaskRequest(BaseModel):
image_uri: str | None = Field(None)
PRICE_BADGE = IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model", "duration", "resolution"]),
expr="""
(
$prices := {
"ltx-2 (pro)": {"1920x1080":0.06,"2560x1440":0.12,"3840x2160":0.24},
"ltx-2 (fast)": {"1920x1080":0.04,"2560x1440":0.08,"3840x2160":0.16}
};
$modelPrices := $lookup($prices, $lowercase(widgets.model));
$pps := $lookup($modelPrices, widgets.resolution);
{"type":"usd","usd": $pps * widgets.duration}
)
""",
)
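Reading the badge above: LTX-2 (pro) at 1920x1080 is $0.06 per second, so an 8-second clip prices at $0.48; the fast tier is two-thirds of the pro rate at every resolution.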
class TextToVideoNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
@@ -69,6 +85,7 @@ class TextToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE,
)
@classmethod
@@ -145,6 +162,7 @@ class ImageToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE,
)
@classmethod

View File

@@ -4,7 +4,7 @@ import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.luma_api import (
from comfy_api_nodes.apis.luma import (
LumaAspectRatio,
LumaCharacterRef,
LumaConceptChain,
@@ -189,6 +189,19 @@ class LumaImageGenerationNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model"]),
expr="""
(
$m := widgets.model;
$contains($m,"photon-flash-1")
? {"type":"usd","usd":0.0027}
: $contains($m,"photon-1")
? {"type":"usd","usd":0.0104}
: {"type":"usd","usd":0.0246}
)
""",
),
)
@classmethod
@@ -303,6 +316,19 @@ class LumaImageModifyNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model"]),
expr="""
(
$m := widgets.model;
$contains($m,"photon-flash-1")
? {"type":"usd","usd":0.0027}
: $contains($m,"photon-1")
? {"type":"usd","usd":0.0104}
: {"type":"usd","usd":0.0246}
)
""",
),
)
@classmethod
@@ -395,6 +421,7 @@ class LumaTextToVideoGenerationNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -505,6 +532,8 @@ class LumaImageToVideoGenerationNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -568,6 +597,53 @@ class LumaImageToVideoGenerationNode(IO.ComfyNode):
return LumaKeyframes(frame0=frame0, frame1=frame1)
PRICE_BADGE_VIDEO = IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model", "resolution", "duration"]),
expr="""
(
$p := {
"ray-flash-2": {
"5s": {"4k":3.13,"1080p":0.79,"720p":0.34,"540p":0.2},
"9s": {"4k":5.65,"1080p":1.42,"720p":0.61,"540p":0.36}
},
"ray-2": {
"5s": {"4k":9.11,"1080p":2.27,"720p":1.02,"540p":0.57},
"9s": {"4k":16.4,"1080p":4.1,"720p":1.83,"540p":1.03}
}
};
$m := widgets.model;
$d := widgets.duration;
$r := widgets.resolution;
$modelKey :=
$contains($m,"ray-flash-2") ? "ray-flash-2" :
$contains($m,"ray-2") ? "ray-2" :
$contains($m,"ray-1-6") ? "ray-1-6" :
"other";
$durKey := $contains($d,"5s") ? "5s" : $contains($d,"9s") ? "9s" : "";
$resKey :=
$contains($r,"4k") ? "4k" :
$contains($r,"1080p") ? "1080p" :
$contains($r,"720p") ? "720p" :
$contains($r,"540p") ? "540p" : "";
$modelPrices := $lookup($p, $modelKey);
$durPrices := $lookup($modelPrices, $durKey);
$v := $lookup($durPrices, $resKey);
$price :=
($modelKey = "ray-1-6") ? 0.5 :
($modelKey = "other") ? 0.79 :
($exists($v) ? $v : 0.79);
{"type":"usd","usd": $price}
)
""",
)
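Worked examples from the table above: ray-flash-2 at 720p for 5s resolves to $0.34 and ray-2 at 4k for 9s to $16.40; ray-1-6 is a flat $0.50, and any unmatched combination falls back to $0.79.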
class LumaExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:

View File

@@ -0,0 +1,889 @@
import math
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.magnific import (
ImageRelightAdvancedSettingsRequest,
ImageRelightRequest,
ImageSkinEnhancerCreativeRequest,
ImageSkinEnhancerFaithfulRequest,
ImageSkinEnhancerFlexibleRequest,
ImageStyleTransferRequest,
ImageUpscalerCreativeRequest,
ImageUpscalerPrecisionV2Request,
InputAdvancedSettings,
InputPortraitMode,
InputSkinEnhancerMode,
TaskResponse,
)
from comfy_api_nodes.util import (
ApiEndpoint,
download_url_to_image_tensor,
downscale_image_tensor,
get_image_dimensions,
get_number_of_images,
poll_op,
sync_op,
upload_images_to_comfyapi,
validate_image_aspect_ratio,
validate_image_dimensions,
)
class MagnificImageUpscalerCreativeNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MagnificImageUpscalerCreativeNode",
display_name="Magnific Image Upscale (Creative)",
category="api node/image/Magnific",
description="Promptguided enhancement, stylization, and 2x/4x/8x/16x upscaling. "
"Maximum output: 25.3 megapixels.",
inputs=[
IO.Image.Input("image"),
IO.String.Input("prompt", multiline=True, default=""),
IO.Combo.Input("scale_factor", options=["2x", "4x", "8x", "16x"]),
IO.Combo.Input(
"optimized_for",
options=[
"standard",
"soft_portraits",
"hard_portraits",
"art_n_illustration",
"videogame_assets",
"nature_n_landscapes",
"films_n_photography",
"3d_renders",
"science_fiction_n_horror",
],
),
IO.Int.Input("creativity", min=-10, max=10, default=0, display_mode=IO.NumberDisplay.slider),
IO.Int.Input(
"hdr",
min=-10,
max=10,
default=0,
tooltip="The level of definition and detail.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"resemblance",
min=-10,
max=10,
default=0,
tooltip="The level of resemblance to the original image.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"fractality",
min=-10,
max=10,
default=0,
tooltip="The strength of the prompt and intricacy per square pixel.",
display_mode=IO.NumberDisplay.slider,
),
IO.Combo.Input(
"engine",
options=["automatic", "magnific_illusio", "magnific_sharpy", "magnific_sparkle"],
),
IO.Boolean.Input(
"auto_downscale",
default=False,
tooltip="Automatically downscale input image if output would exceed maximum pixel limit.",
),
],
outputs=[
IO.Image.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["scale_factor"]),
expr="""
(
$max := widgets.scale_factor = "2x" ? 1.326 : 1.657;
{"type": "range_usd", "min_usd": 0.11, "max_usd": $max}
)
""",
),
)
@classmethod
async def execute(
cls,
image: Input.Image,
prompt: str,
scale_factor: str,
optimized_for: str,
creativity: int,
hdr: int,
resemblance: int,
fractality: int,
engine: str,
auto_downscale: bool,
) -> IO.NodeOutput:
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
validate_image_aspect_ratio(image, (1, 3), (3, 1), strict=False)
validate_image_dimensions(image, min_height=160, min_width=160)
max_output_pixels = 25_300_000
height, width = get_image_dimensions(image)
requested_scale = int(scale_factor.rstrip("x"))
output_pixels = height * width * requested_scale * requested_scale
if output_pixels > max_output_pixels:
if auto_downscale:
# Find optimal scale factor that doesn't require >2x downscale.
# Server upscales in 2x steps, so aggressive downscaling degrades quality.
input_pixels = width * height
scale = 2
max_input_pixels = max_output_pixels // 4
for candidate in [16, 8, 4, 2]:
if candidate > requested_scale:
continue
scale_output_pixels = input_pixels * candidate * candidate
if scale_output_pixels <= max_output_pixels:
scale = candidate
max_input_pixels = None
break
downscale_ratio = math.sqrt(scale_output_pixels / max_output_pixels)
if downscale_ratio <= 2.0:
scale = candidate
max_input_pixels = max_output_pixels // (candidate * candidate)
break
if max_input_pixels is not None:
image = downscale_image_tensor(image, total_pixels=max_input_pixels)
scale_factor = f"{scale}x"
else:
raise ValueError(
f"Output size ({width * requested_scale}x{height * requested_scale} = {output_pixels:,} pixels) "
f"exceeds maximum allowed size of {max_output_pixels:,} pixels. "
f"Use a smaller input image or lower scale factor."
)
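# Illustrative walkthrough of the auto_downscale branch above (the numbers
# are hypothetical, not from the API): a 4000x3000 input (12,000,000 px) at
# "4x" would yield 192,000,000 px > 25,300,000. Candidate 4 needs a
# downscale ratio of sqrt(192.0 / 25.3) ~ 2.75 > 2.0, so it is rejected;
# candidate 2 yields 48,000,000 px with ratio sqrt(48.0 / 25.3) ~ 1.38 <= 2.0,
# so the image is downscaled to 25_300_000 // 4 = 6,325,000 px and the
# request is sent as "2x".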
initial_res = await sync_op(
cls,
ApiEndpoint(path="/proxy/freepik/v1/ai/image-upscaler", method="POST"),
response_model=TaskResponse,
data=ImageUpscalerCreativeRequest(
image=(await upload_images_to_comfyapi(cls, image, max_images=1, total_pixels=None))[0],
scale_factor=scale_factor,
optimized_for=optimized_for,
creativity=creativity,
hdr=hdr,
resemblance=resemblance,
fractality=fractality,
engine=engine,
prompt=prompt if prompt else None,
),
)
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/freepik/v1/ai/image-upscaler/{initial_res.task_id}"),
response_model=TaskResponse,
status_extractor=lambda x: x.status,
poll_interval=10.0,
max_poll_attempts=480,
)
return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))
class MagnificImageUpscalerPreciseV2Node(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MagnificImageUpscalerPreciseV2Node",
display_name="Magnific Image Upscale (Precise V2)",
category="api node/image/Magnific",
description="High-fidelity upscaling with fine control over sharpness, grain, and detail. "
"Maximum output: 10060×10060 pixels.",
inputs=[
IO.Image.Input("image"),
IO.Combo.Input("scale_factor", options=["2x", "4x", "8x", "16x"]),
IO.Combo.Input(
"flavor",
options=["sublime", "photo", "photo_denoiser"],
tooltip="Processing style: "
"sublime for general use, photo for photographs, photo_denoiser for noisy photos.",
),
IO.Int.Input(
"sharpen",
min=0,
max=100,
default=7,
tooltip="Image sharpness intensity. Higher values increase edge definition and clarity.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"smart_grain",
min=0,
max=100,
default=7,
tooltip="Intelligent grain/texture enhancement to prevent the image from "
"looking too smooth or artificial.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"ultra_detail",
min=0,
max=100,
default=30,
tooltip="Controls fine detail, textures, and micro-details added during upscaling.",
display_mode=IO.NumberDisplay.slider,
),
IO.Boolean.Input(
"auto_downscale",
default=False,
tooltip="Automatically downscale input image if output would exceed maximum resolution.",
),
],
outputs=[
IO.Image.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["scale_factor"]),
expr="""
(
$max := widgets.scale_factor = "2x" ? 1.326 : 1.657;
{"type": "range_usd", "min_usd": 0.11, "max_usd": $max}
)
""",
),
)
@classmethod
async def execute(
cls,
image: Input.Image,
scale_factor: str,
flavor: str,
sharpen: int,
smart_grain: int,
ultra_detail: int,
auto_downscale: bool,
) -> IO.NodeOutput:
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
validate_image_aspect_ratio(image, (1, 3), (3, 1), strict=False)
validate_image_dimensions(image, min_height=160, min_width=160)
max_output_dimension = 10060
height, width = get_image_dimensions(image)
requested_scale = int(scale_factor.rstrip("x"))
output_width = width * requested_scale
output_height = height * requested_scale
if output_width > max_output_dimension or output_height > max_output_dimension:
if auto_downscale:
# Find optimal scale factor that doesn't require >2x downscale.
# Server upscales in 2x steps, so aggressive downscaling degrades quality.
max_dim = max(width, height)
scale = 2
max_input_dim = max_output_dimension // 2
scale_ratio = max_input_dim / max_dim
max_input_pixels = int(width * height * scale_ratio * scale_ratio)
for candidate in [16, 8, 4, 2]:
if candidate > requested_scale:
continue
output_dim = max_dim * candidate
if output_dim <= max_output_dimension:
scale = candidate
max_input_pixels = None
break
downscale_ratio = output_dim / max_output_dimension
if downscale_ratio <= 2.0:
scale = candidate
max_input_dim = max_output_dimension // candidate
scale_ratio = max_input_dim / max_dim
max_input_pixels = int(width * height * scale_ratio * scale_ratio)
break
if max_input_pixels is not None:
image = downscale_image_tensor(image, total_pixels=max_input_pixels)
requested_scale = scale
else:
raise ValueError(
f"Output dimensions ({output_width}x{output_height}) exceed maximum allowed "
f"resolution of {max_output_dimension}x{max_output_dimension} pixels. "
f"Use a smaller input image or lower scale factor."
)
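# Illustrative walkthrough of the dimension-based auto_downscale above
# (hypothetical numbers): an 8000x6000 input at "4x" would give
# 32000x24000, exceeding the 10060 px cap. Candidate 4 needs a downscale
# ratio of 32000 / 10060 ~ 3.18 > 2.0, so it is rejected; candidate 2 gives
# 16000 / 10060 ~ 1.59 <= 2.0, so the long edge is limited to
# 10060 // 2 = 5030 px (scale_ratio ~ 0.629, ~18,980,000 input px) and the
# request is sent with scale factor 2.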
initial_res = await sync_op(
cls,
ApiEndpoint(path="/proxy/freepik/v1/ai/image-upscaler-precision-v2", method="POST"),
response_model=TaskResponse,
data=ImageUpscalerPrecisionV2Request(
image=(await upload_images_to_comfyapi(cls, image, max_images=1, total_pixels=None))[0],
scale_factor=requested_scale,
flavor=flavor,
sharpen=sharpen,
smart_grain=smart_grain,
ultra_detail=ultra_detail,
),
)
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/freepik/v1/ai/image-upscaler-precision-v2/{initial_res.task_id}"),
response_model=TaskResponse,
status_extractor=lambda x: x.status,
poll_interval=10.0,
max_poll_attempts=480,
)
return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))
class MagnificImageStyleTransferNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MagnificImageStyleTransferNode",
display_name="Magnific Image Style Transfer",
category="api node/image/Magnific",
description="Transfer the style from a reference image to your input image.",
inputs=[
IO.Image.Input("image", tooltip="The image to apply style transfer to."),
IO.Image.Input("reference_image", tooltip="The reference image to extract style from."),
IO.String.Input("prompt", multiline=True, default=""),
IO.Int.Input(
"style_strength",
min=0,
max=100,
default=100,
tooltip="Percentage of style strength.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"structure_strength",
min=0,
max=100,
default=50,
tooltip="Maintains the structure of the original image.",
display_mode=IO.NumberDisplay.slider,
),
IO.Combo.Input(
"flavor",
options=["faithful", "gen_z", "psychedelia", "detaily", "clear", "donotstyle", "donotstyle_sharp"],
tooltip="Style transfer flavor.",
),
IO.Combo.Input(
"engine",
options=[
"balanced",
"definio",
"illusio",
"3d_cartoon",
"colorful_anime",
"caricature",
"real",
"super_real",
"softy",
],
tooltip="Processing engine selection.",
),
IO.DynamicCombo.Input(
"portrait_mode",
options=[
IO.DynamicCombo.Option("disabled", []),
IO.DynamicCombo.Option(
"enabled",
[
IO.Combo.Input(
"portrait_style",
options=["standard", "pop", "super_pop"],
tooltip="Visual style applied to portrait images.",
),
IO.Combo.Input(
"portrait_beautifier",
options=["none", "beautify_face", "beautify_face_max"],
tooltip="Facial beautification intensity on portraits.",
),
],
),
],
tooltip="Enable portrait mode for facial enhancements.",
),
IO.Boolean.Input(
"fixed_generation",
default=True,
tooltip="When disabled, expect each generation to introduce a degree of randomness, "
"leading to more diverse outcomes.",
),
],
outputs=[
IO.Image.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.11}""",
),
)
@classmethod
async def execute(
cls,
image: Input.Image,
reference_image: Input.Image,
prompt: str,
style_strength: int,
structure_strength: int,
flavor: str,
engine: str,
portrait_mode: InputPortraitMode,
fixed_generation: bool,
) -> IO.NodeOutput:
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
if get_number_of_images(reference_image) != 1:
raise ValueError("Exactly one reference image is required.")
validate_image_aspect_ratio(image, (1, 3), (3, 1), strict=False)
validate_image_aspect_ratio(reference_image, (1, 3), (3, 1), strict=False)
validate_image_dimensions(image, min_height=160, min_width=160)
validate_image_dimensions(reference_image, min_height=160, min_width=160)
is_portrait = portrait_mode["portrait_mode"] == "enabled"
portrait_style = portrait_mode.get("portrait_style", "standard")
portrait_beautifier = portrait_mode.get("portrait_beautifier", "none")
uploaded_urls = await upload_images_to_comfyapi(cls, [image, reference_image], max_images=2)
initial_res = await sync_op(
cls,
ApiEndpoint(path="/proxy/freepik/v1/ai/image-style-transfer", method="POST"),
response_model=TaskResponse,
data=ImageStyleTransferRequest(
image=uploaded_urls[0],
reference_image=uploaded_urls[1],
prompt=prompt if prompt else None,
style_strength=style_strength,
structure_strength=structure_strength,
is_portrait=is_portrait,
portrait_style=portrait_style if is_portrait else None,
portrait_beautifier=portrait_beautifier if is_portrait and portrait_beautifier != "none" else None,
flavor=flavor,
engine=engine,
fixed_generation=fixed_generation,
),
)
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/freepik/v1/ai/image-style-transfer/{initial_res.task_id}"),
response_model=TaskResponse,
status_extractor=lambda x: x.status,
poll_interval=10.0,
max_poll_attempts=480,
)
return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))
class MagnificImageRelightNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MagnificImageRelightNode",
display_name="Magnific Image Relight",
category="api node/image/Magnific",
description="Relight an image with lighting adjustments and optional reference-based light transfer.",
inputs=[
IO.Image.Input("image", tooltip="The image to relight."),
IO.String.Input(
"prompt",
multiline=True,
default="",
tooltip="Descriptive guidance for lighting. Supports emphasis notation (1-1.4).",
),
IO.Int.Input(
"light_transfer_strength",
min=0,
max=100,
default=100,
tooltip="Intensity of light transfer application.",
display_mode=IO.NumberDisplay.slider,
),
IO.Combo.Input(
"style",
options=[
"standard",
"darker_but_realistic",
"clean",
"smooth",
"brighter",
"contrasted_n_hdr",
"just_composition",
],
tooltip="Stylistic output preference.",
),
IO.Boolean.Input(
"interpolate_from_original",
default=False,
tooltip="Restricts generation freedom to match original more closely.",
),
IO.Boolean.Input(
"change_background",
default=True,
tooltip="Modifies background based on prompt/reference.",
),
IO.Boolean.Input(
"preserve_details",
default=True,
tooltip="Maintains texture and fine details from original.",
),
IO.DynamicCombo.Input(
"advanced_settings",
options=[
IO.DynamicCombo.Option("disabled", []),
IO.DynamicCombo.Option(
"enabled",
[
IO.Int.Input(
"whites",
min=0,
max=100,
default=50,
tooltip="Adjusts the brightest tones in the image.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"blacks",
min=0,
max=100,
default=50,
tooltip="Adjusts the darkest tones in the image.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"brightness",
min=0,
max=100,
default=50,
tooltip="Overall brightness adjustment.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"contrast",
min=0,
max=100,
default=50,
tooltip="Contrast adjustment.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"saturation",
min=0,
max=100,
default=50,
tooltip="Color saturation adjustment.",
display_mode=IO.NumberDisplay.slider,
),
IO.Combo.Input(
"engine",
options=[
"automatic",
"balanced",
"cool",
"real",
"illusio",
"fairy",
"colorful_anime",
"hard_transform",
"softy",
],
tooltip="Processing engine selection.",
),
IO.Combo.Input(
"transfer_light_a",
options=["automatic", "low", "medium", "normal", "high", "high_on_faces"],
tooltip="The intensity of light transfer.",
),
IO.Combo.Input(
"transfer_light_b",
options=[
"automatic",
"composition",
"straight",
"smooth_in",
"smooth_out",
"smooth_both",
"reverse_both",
"soft_in",
"soft_out",
"soft_mid",
# "strong_mid", # Commented out because requests fail when this is set.
"style_shift",
"strong_shift",
],
tooltip="Also modifies light transfer intensity. "
"Can be combined with the previous control for varied effects.",
),
IO.Boolean.Input(
"fixed_generation",
default=True,
tooltip="Ensures consistent output with the same settings.",
),
],
),
],
tooltip="Fine-tuning options for advanced lighting control.",
),
IO.Image.Input(
"reference_image",
optional=True,
tooltip="Optional reference image to transfer lighting from.",
),
],
outputs=[
IO.Image.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.11}""",
),
)
@classmethod
async def execute(
cls,
image: Input.Image,
prompt: str,
light_transfer_strength: int,
style: str,
interpolate_from_original: bool,
change_background: bool,
preserve_details: bool,
advanced_settings: InputAdvancedSettings,
reference_image: Input.Image | None = None,
) -> IO.NodeOutput:
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
if reference_image is not None and get_number_of_images(reference_image) != 1:
raise ValueError("Exactly one reference image is required.")
validate_image_aspect_ratio(image, (1, 3), (3, 1), strict=False)
validate_image_dimensions(image, min_height=160, min_width=160)
if reference_image is not None:
validate_image_aspect_ratio(reference_image, (1, 3), (3, 1), strict=False)
validate_image_dimensions(reference_image, min_height=160, min_width=160)
image_url = (await upload_images_to_comfyapi(cls, image, max_images=1))[0]
reference_url = None
if reference_image is not None:
reference_url = (await upload_images_to_comfyapi(cls, reference_image, max_images=1))[0]
adv_settings = None
if advanced_settings["advanced_settings"] == "enabled":
adv_settings = ImageRelightAdvancedSettingsRequest(
whites=advanced_settings["whites"],
blacks=advanced_settings["blacks"],
brightness=advanced_settings["brightness"],
contrast=advanced_settings["contrast"],
saturation=advanced_settings["saturation"],
engine=advanced_settings["engine"],
transfer_light_a=advanced_settings["transfer_light_a"],
transfer_light_b=advanced_settings["transfer_light_b"],
fixed_generation=advanced_settings["fixed_generation"],
)
initial_res = await sync_op(
cls,
ApiEndpoint(path="/proxy/freepik/v1/ai/image-relight", method="POST"),
response_model=TaskResponse,
data=ImageRelightRequest(
image=image_url,
prompt=prompt if prompt else None,
transfer_light_from_reference_image=reference_url,
light_transfer_strength=light_transfer_strength,
interpolate_from_original=interpolate_from_original,
change_background=change_background,
style=style,
preserve_details=preserve_details,
advanced_settings=adv_settings,
),
)
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/freepik/v1/ai/image-relight/{initial_res.task_id}"),
response_model=TaskResponse,
status_extractor=lambda x: x.status,
poll_interval=10.0,
max_poll_attempts=480,
)
return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))
class MagnificImageSkinEnhancerNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MagnificImageSkinEnhancerNode",
display_name="Magnific Image Skin Enhancer",
category="api node/image/Magnific",
description="Skin enhancement for portraits with multiple processing modes.",
inputs=[
IO.Image.Input("image", tooltip="The portrait image to enhance."),
IO.Int.Input(
"sharpen",
min=0,
max=100,
default=0,
tooltip="Sharpening intensity level.",
display_mode=IO.NumberDisplay.slider,
),
IO.Int.Input(
"smart_grain",
min=0,
max=100,
default=2,
tooltip="Smart grain intensity level.",
display_mode=IO.NumberDisplay.slider,
),
IO.DynamicCombo.Input(
"mode",
options=[
IO.DynamicCombo.Option("creative", []),
IO.DynamicCombo.Option(
"faithful",
[
IO.Int.Input(
"skin_detail",
min=0,
max=100,
default=80,
tooltip="Skin detail enhancement level.",
display_mode=IO.NumberDisplay.slider,
),
],
),
IO.DynamicCombo.Option(
"flexible",
[
IO.Combo.Input(
"optimized_for",
options=[
"enhance_skin",
"improve_lighting",
"enhance_everything",
"transform_to_real",
"no_make_up",
],
tooltip="Enhancement optimization target.",
),
],
),
],
tooltip="Processing mode: creative for artistic enhancement, "
"faithful for preserving original appearance, "
"flexible for targeted optimization.",
),
],
outputs=[
IO.Image.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["mode"]),
expr="""
(
$rates := {"creative": 0.29, "faithful": 0.37, "flexible": 0.45};
{"type":"usd","usd": $lookup($rates, widgets.mode)}
)
""",
),
)
@classmethod
async def execute(
cls,
image: Input.Image,
sharpen: int,
smart_grain: int,
mode: InputSkinEnhancerMode,
) -> IO.NodeOutput:
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
validate_image_aspect_ratio(image, (1, 3), (3, 1), strict=False)
validate_image_dimensions(image, min_height=160, min_width=160)
image_url = (await upload_images_to_comfyapi(cls, image, max_images=1, total_pixels=4096 * 4096))[0]
selected_mode = mode["mode"]
if selected_mode == "creative":
endpoint = "creative"
data = ImageSkinEnhancerCreativeRequest(
image=image_url,
sharpen=sharpen,
smart_grain=smart_grain,
)
elif selected_mode == "faithful":
endpoint = "faithful"
data = ImageSkinEnhancerFaithfulRequest(
image=image_url,
sharpen=sharpen,
smart_grain=smart_grain,
skin_detail=mode["skin_detail"],
)
else: # flexible
endpoint = "flexible"
data = ImageSkinEnhancerFlexibleRequest(
image=image_url,
sharpen=sharpen,
smart_grain=smart_grain,
optimized_for=mode["optimized_for"],
)
initial_res = await sync_op(
cls,
ApiEndpoint(path=f"/proxy/freepik/v1/ai/skin-enhancer/{endpoint}", method="POST"),
response_model=TaskResponse,
data=data,
)
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/freepik/v1/ai/skin-enhancer/{initial_res.task_id}"),
response_model=TaskResponse,
status_extractor=lambda x: x.status,
poll_interval=10.0,
max_poll_attempts=480,
)
return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))
class MagnificExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:
return [
# MagnificImageUpscalerCreativeNode,
# MagnificImageUpscalerPreciseV2Node,
MagnificImageStyleTransferNode,
MagnificImageRelightNode,
MagnificImageSkinEnhancerNode,
]
async def comfy_entrypoint() -> MagnificExtension:
return MagnificExtension()

View File

@@ -0,0 +1,790 @@
import os
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.meshy import (
InputShouldRemesh,
InputShouldTexture,
MeshyAnimationRequest,
MeshyAnimationResult,
MeshyImageToModelRequest,
MeshyModelResult,
MeshyMultiImageToModelRequest,
MeshyRefineTask,
MeshyRiggedResult,
MeshyRiggingRequest,
MeshyTaskResponse,
MeshyTextToModelRequest,
MeshyTextureRequest,
)
from comfy_api_nodes.util import (
ApiEndpoint,
download_url_to_bytesio,
poll_op,
sync_op,
upload_images_to_comfyapi,
validate_string,
)
from folder_paths import get_output_directory
class MeshyTextToModelNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MeshyTextToModelNode",
display_name="Meshy: Text to Model",
category="api node/3d/Meshy",
inputs=[
IO.Combo.Input("model", options=["latest"]),
IO.String.Input("prompt", multiline=True, default=""),
IO.Combo.Input("style", options=["realistic", "sculpture"]),
IO.DynamicCombo.Input(
"should_remesh",
options=[
IO.DynamicCombo.Option(
"true",
[
IO.Combo.Input("topology", options=["triangle", "quad"]),
IO.Int.Input(
"target_polycount",
default=300000,
min=100,
max=300000,
display_mode=IO.NumberDisplay.number,
),
],
),
IO.DynamicCombo.Option("false", []),
],
tooltip="When set to false, returns an unprocessed triangular mesh.",
),
IO.Combo.Input("symmetry_mode", options=["auto", "on", "off"]),
IO.Combo.Input(
"pose_mode",
options=["", "A-pose", "T-pose"],
tooltip="Specify the pose mode for the generated model.",
),
IO.Int.Input(
"seed",
default=0,
min=0,
max=2147483647,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
tooltip="Seed controls whether the node should re-run; "
"results are non-deterministic regardless of seed.",
),
],
outputs=[
IO.String.Output(display_name="model_file"),
IO.Custom("MESHY_TASK_ID").Output(display_name="meshy_task_id"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.8}""",
),
)
@classmethod
async def execute(
cls,
model: str,
prompt: str,
style: str,
should_remesh: InputShouldRemesh,
symmetry_mode: str,
pose_mode: str,
seed: int,
) -> IO.NodeOutput:
validate_string(prompt, field_name="prompt", min_length=1, max_length=600)
response = await sync_op(
cls,
ApiEndpoint(path="/proxy/meshy/openapi/v2/text-to-3d", method="POST"),
response_model=MeshyTaskResponse,
data=MeshyTextToModelRequest(
prompt=prompt,
art_style=style,
ai_model=model,
topology=should_remesh.get("topology", None),
target_polycount=should_remesh.get("target_polycount", None),
should_remesh=should_remesh["should_remesh"] == "true",
symmetry_mode=symmetry_mode,
pose_mode=pose_mode.lower(),
seed=seed,
),
)
result = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/meshy/openapi/v2/text-to-3d/{response.result}"),
response_model=MeshyModelResult,
status_extractor=lambda r: r.status,
progress_extractor=lambda r: r.progress,
)
model_file = f"meshy_model_{response.result}.glb"
await download_url_to_bytesio(result.model_urls.glb, os.path.join(get_output_directory(), model_file))
return IO.NodeOutput(model_file, response.result)
class MeshyRefineNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MeshyRefineNode",
display_name="Meshy: Refine Draft Model",
category="api node/3d/Meshy",
description="Refine a previously created draft model.",
inputs=[
IO.Combo.Input("model", options=["latest"]),
IO.Custom("MESHY_TASK_ID").Input("meshy_task_id"),
IO.Boolean.Input(
"enable_pbr",
default=False,
tooltip="Generate PBR Maps (metallic, roughness, normal) in addition to the base color. "
"Note: this should be set to false when using Sculpture style, "
"as Sculpture style generates its own set of PBR maps.",
),
IO.String.Input(
"texture_prompt",
default="",
multiline=True,
tooltip="Provide a text prompt to guide the texturing process. "
"Maximum 600 characters. Cannot be used at the same time as 'texture_image'.",
),
IO.Image.Input(
"texture_image",
tooltip="Only one of 'texture_image' or 'texture_prompt' may be used at the same time.",
optional=True,
),
],
outputs=[
IO.String.Output(display_name="model_file"),
IO.Custom("MESHY_TASK_ID").Output(display_name="meshy_task_id"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.4}""",
),
)
@classmethod
async def execute(
cls,
model: str,
meshy_task_id: str,
enable_pbr: bool,
texture_prompt: str,
texture_image: Input.Image | None = None,
) -> IO.NodeOutput:
if texture_prompt and texture_image is not None:
raise ValueError("texture_prompt and texture_image cannot be used at the same time")
texture_image_url = None
if texture_prompt:
validate_string(texture_prompt, field_name="texture_prompt", max_length=600)
if texture_image is not None:
texture_image_url = (await upload_images_to_comfyapi(cls, texture_image, wait_label="Uploading texture"))[0]
response = await sync_op(
cls,
endpoint=ApiEndpoint(path="/proxy/meshy/openapi/v2/text-to-3d", method="POST"),
response_model=MeshyTaskResponse,
data=MeshyRefineTask(
preview_task_id=meshy_task_id,
enable_pbr=enable_pbr,
texture_prompt=texture_prompt if texture_prompt else None,
texture_image_url=texture_image_url,
ai_model=model,
),
)
result = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/meshy/openapi/v2/text-to-3d/{response.result}"),
response_model=MeshyModelResult,
status_extractor=lambda r: r.status,
progress_extractor=lambda r: r.progress,
)
model_file = f"meshy_model_{response.result}.glb"
await download_url_to_bytesio(result.model_urls.glb, os.path.join(get_output_directory(), model_file))
return IO.NodeOutput(model_file, response.result)
class MeshyImageToModelNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MeshyImageToModelNode",
display_name="Meshy: Image to Model",
category="api node/3d/Meshy",
inputs=[
IO.Combo.Input("model", options=["latest"]),
IO.Image.Input("image"),
IO.DynamicCombo.Input(
"should_remesh",
options=[
IO.DynamicCombo.Option(
"true",
[
IO.Combo.Input("topology", options=["triangle", "quad"]),
IO.Int.Input(
"target_polycount",
default=300000,
min=100,
max=300000,
display_mode=IO.NumberDisplay.number,
),
],
),
IO.DynamicCombo.Option("false", []),
],
tooltip="When set to false, returns an unprocessed triangular mesh.",
),
IO.Combo.Input("symmetry_mode", options=["auto", "on", "off"]),
IO.DynamicCombo.Input(
"should_texture",
options=[
IO.DynamicCombo.Option(
"true",
[
IO.Boolean.Input(
"enable_pbr",
default=False,
tooltip="Generate PBR Maps (metallic, roughness, normal) "
"in addition to the base color.",
),
IO.String.Input(
"texture_prompt",
default="",
multiline=True,
tooltip="Provide a text prompt to guide the texturing process. "
"Maximum 600 characters. Cannot be used at the same time as 'texture_image'.",
),
IO.Image.Input(
"texture_image",
tooltip="Only one of 'texture_image' or 'texture_prompt' "
"may be used at the same time.",
optional=True,
),
],
),
IO.DynamicCombo.Option("false", []),
],
tooltip="Determines whether textures are generated. "
"Setting it to false skips the texture phase and returns a mesh without textures.",
),
IO.Combo.Input(
"pose_mode",
options=["", "A-pose", "T-pose"],
tooltip="Specify the pose mode for the generated model.",
),
IO.Int.Input(
"seed",
default=0,
min=0,
max=2147483647,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
tooltip="Seed controls whether the node should re-run; "
"results are non-deterministic regardless of seed.",
),
],
outputs=[
IO.String.Output(display_name="model_file"),
IO.Custom("MESHY_TASK_ID").Output(display_name="meshy_task_id"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["should_texture"]),
expr="""
(
$prices := {"true": 1.2, "false": 0.8};
{"type":"usd","usd": $lookup($prices, widgets.should_texture)}
)
""",
),
)
@classmethod
async def execute(
cls,
model: str,
image: Input.Image,
should_remesh: InputShouldRemesh,
symmetry_mode: str,
should_texture: InputShouldTexture,
pose_mode: str,
seed: int,
) -> IO.NodeOutput:
texture = should_texture["should_texture"] == "true"
texture_image_url = texture_prompt = None
if texture:
if should_texture["texture_prompt"] and should_texture["texture_image"] is not None:
raise ValueError("texture_prompt and texture_image cannot be used at the same time")
if should_texture["texture_prompt"]:
validate_string(should_texture["texture_prompt"], field_name="texture_prompt", max_length=600)
texture_prompt = should_texture["texture_prompt"]
if should_texture["texture_image"] is not None:
texture_image_url = (
await upload_images_to_comfyapi(
cls, should_texture["texture_image"], wait_label="Uploading texture"
)
)[0]
response = await sync_op(
cls,
ApiEndpoint(path="/proxy/meshy/openapi/v1/image-to-3d", method="POST"),
response_model=MeshyTaskResponse,
data=MeshyImageToModelRequest(
image_url=(await upload_images_to_comfyapi(cls, image, wait_label="Uploading base image"))[0],
ai_model=model,
topology=should_remesh.get("topology", None),
target_polycount=should_remesh.get("target_polycount", None),
symmetry_mode=symmetry_mode,
should_remesh=should_remesh["should_remesh"] == "true",
should_texture=texture,
enable_pbr=should_texture.get("enable_pbr", None),
pose_mode=pose_mode.lower(),
texture_prompt=texture_prompt,
texture_image_url=texture_image_url,
seed=seed,
),
)
result = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/meshy/openapi/v1/image-to-3d/{response.result}"),
response_model=MeshyModelResult,
status_extractor=lambda r: r.status,
progress_extractor=lambda r: r.progress,
)
model_file = f"meshy_model_{response.result}.glb"
await download_url_to_bytesio(result.model_urls.glb, os.path.join(get_output_directory(), model_file))
return IO.NodeOutput(model_file, response.result)
class MeshyMultiImageToModelNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MeshyMultiImageToModelNode",
display_name="Meshy: Multi-Image to Model",
category="api node/3d/Meshy",
inputs=[
IO.Combo.Input("model", options=["latest"]),
IO.Autogrow.Input(
"images",
template=IO.Autogrow.TemplatePrefix(IO.Image.Input("image"), prefix="image", min=2, max=4),
),
IO.DynamicCombo.Input(
"should_remesh",
options=[
IO.DynamicCombo.Option(
"true",
[
IO.Combo.Input("topology", options=["triangle", "quad"]),
IO.Int.Input(
"target_polycount",
default=300000,
min=100,
max=300000,
display_mode=IO.NumberDisplay.number,
),
],
),
IO.DynamicCombo.Option("false", []),
],
tooltip="When set to false, returns an unprocessed triangular mesh.",
),
IO.Combo.Input("symmetry_mode", options=["auto", "on", "off"]),
IO.DynamicCombo.Input(
"should_texture",
options=[
IO.DynamicCombo.Option(
"true",
[
IO.Boolean.Input(
"enable_pbr",
default=False,
tooltip="Generate PBR Maps (metallic, roughness, normal) "
"in addition to the base color.",
),
IO.String.Input(
"texture_prompt",
default="",
multiline=True,
tooltip="Provide a text prompt to guide the texturing process. "
"Maximum 600 characters. Cannot be used at the same time as 'texture_image'.",
),
IO.Image.Input(
"texture_image",
tooltip="Only one of 'texture_image' or 'texture_prompt' "
"may be used at the same time.",
optional=True,
),
],
),
IO.DynamicCombo.Option("false", []),
],
tooltip="Determines whether textures are generated. "
"Setting it to false skips the texture phase and returns a mesh without textures.",
),
IO.Combo.Input(
"pose_mode",
options=["", "A-pose", "T-pose"],
tooltip="Specify the pose mode for the generated model.",
),
IO.Int.Input(
"seed",
default=0,
min=0,
max=2147483647,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
tooltip="Seed controls whether the node should re-run; "
"results are non-deterministic regardless of seed.",
),
],
outputs=[
IO.String.Output(display_name="model_file"),
IO.Custom("MESHY_TASK_ID").Output(display_name="meshy_task_id"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["should_texture"]),
expr="""
(
$prices := {"true": 0.6, "false": 0.2};
{"type":"usd","usd": $lookup($prices, widgets.should_texture)}
)
""",
),
)
@classmethod
async def execute(
cls,
model: str,
images: IO.Autogrow.Type,
should_remesh: InputShouldRemesh,
symmetry_mode: str,
should_texture: InputShouldTexture,
pose_mode: str,
seed: int,
) -> IO.NodeOutput:
texture = should_texture["should_texture"] == "true"
texture_image_url = texture_prompt = None
if texture:
if should_texture["texture_prompt"] and should_texture["texture_image"] is not None:
raise ValueError("texture_prompt and texture_image cannot be used at the same time")
if should_texture["texture_prompt"]:
validate_string(should_texture["texture_prompt"], field_name="texture_prompt", max_length=600)
texture_prompt = should_texture["texture_prompt"]
if should_texture["texture_image"] is not None:
texture_image_url = (
await upload_images_to_comfyapi(
cls, should_texture["texture_image"], wait_label="Uploading texture"
)
)[0]
response = await sync_op(
cls,
ApiEndpoint(path="/proxy/meshy/openapi/v1/multi-image-to-3d", method="POST"),
response_model=MeshyTaskResponse,
data=MeshyMultiImageToModelRequest(
image_urls=await upload_images_to_comfyapi(
cls, list(images.values()), wait_label="Uploading base images"
),
ai_model=model,
topology=should_remesh.get("topology", None),
target_polycount=should_remesh.get("target_polycount", None),
symmetry_mode=symmetry_mode,
should_remesh=should_remesh["should_remesh"] == "true",
should_texture=texture,
enable_pbr=should_texture.get("enable_pbr", None),
pose_mode=pose_mode.lower(),
texture_prompt=texture_prompt,
texture_image_url=texture_image_url,
seed=seed,
),
)
result = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/meshy/openapi/v1/multi-image-to-3d/{response.result}"),
response_model=MeshyModelResult,
status_extractor=lambda r: r.status,
progress_extractor=lambda r: r.progress,
)
model_file = f"meshy_model_{response.result}.glb"
await download_url_to_bytesio(result.model_urls.glb, os.path.join(get_output_directory(), model_file))
return IO.NodeOutput(model_file, response.result)
class MeshyRigModelNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MeshyRigModelNode",
display_name="Meshy: Rig Model",
category="api node/3d/Meshy",
description="Provides a rigged character in standard formats. "
"Auto-rigging is currently not suitable for untextured meshes, non-humanoid assets, "
"or humanoid assets with unclear limb and body structure.",
inputs=[
IO.Custom("MESHY_TASK_ID").Input("meshy_task_id"),
IO.Float.Input(
"height_meters",
min=0.1,
max=15.0,
default=1.7,
tooltip="The approximate height of the character model in meters. "
"This aids in scaling and rigging accuracy.",
),
IO.Image.Input(
"texture_image",
tooltip="The model's UV-unwrapped base color texture image.",
optional=True,
),
],
outputs=[
IO.String.Output(display_name="model_file"),
IO.Custom("MESHY_RIGGED_TASK_ID").Output(display_name="rig_task_id"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.2}""",
),
)
@classmethod
async def execute(
cls,
meshy_task_id: str,
height_meters: float,
texture_image: Input.Image | None = None,
) -> IO.NodeOutput:
texture_image_url = None
if texture_image is not None:
texture_image_url = (await upload_images_to_comfyapi(cls, texture_image, wait_label="Uploading texture"))[0]
response = await sync_op(
cls,
endpoint=ApiEndpoint(path="/proxy/meshy/openapi/v1/rigging", method="POST"),
response_model=MeshyTaskResponse,
data=MeshyRiggingRequest(
input_task_id=meshy_task_id,
height_meters=height_meters,
texture_image_url=texture_image_url,
),
)
result = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/meshy/openapi/v1/rigging/{response.result}"),
response_model=MeshyRiggedResult,
status_extractor=lambda r: r.status,
progress_extractor=lambda r: r.progress,
)
model_file = f"meshy_model_{response.result}.glb"
await download_url_to_bytesio(
result.result.rigged_character_glb_url, os.path.join(get_output_directory(), model_file)
)
return IO.NodeOutput(model_file, response.result)
class MeshyAnimateModelNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MeshyAnimateModelNode",
display_name="Meshy: Animate Model",
category="api node/3d/Meshy",
description="Apply a specific animation action to a previously rigged character.",
inputs=[
IO.Custom("MESHY_RIGGED_TASK_ID").Input("rig_task_id"),
IO.Int.Input(
"action_id",
default=0,
min=0,
max=696,
tooltip="Visit https://docs.meshy.ai/en/api/animation-library for a list of available values.",
),
],
outputs=[
IO.String.Output(display_name="model_file"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.12}""",
),
)
@classmethod
async def execute(
cls,
rig_task_id: str,
action_id: int,
) -> IO.NodeOutput:
response = await sync_op(
cls,
endpoint=ApiEndpoint(path="/proxy/meshy/openapi/v1/animations", method="POST"),
response_model=MeshyTaskResponse,
data=MeshyAnimationRequest(
rig_task_id=rig_task_id,
action_id=action_id,
),
)
result = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/meshy/openapi/v1/animations/{response.result}"),
response_model=MeshyAnimationResult,
status_extractor=lambda r: r.status,
progress_extractor=lambda r: r.progress,
)
model_file = f"meshy_model_{response.result}.glb"
await download_url_to_bytesio(result.result.animation_glb_url, os.path.join(get_output_directory(), model_file))
return IO.NodeOutput(model_file, response.result)
class MeshyTextureNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="MeshyTextureNode",
display_name="Meshy: Texture Model",
category="api node/3d/Meshy",
inputs=[
IO.Combo.Input("model", options=["latest"]),
IO.Custom("MESHY_TASK_ID").Input("meshy_task_id"),
IO.Boolean.Input(
"enable_original_uv",
default=True,
tooltip="Use the original UV of the model instead of generating new UVs. "
"When enabled, Meshy preserves existing textures from the uploaded model. "
"If the model has no original UV, the quality of the output might not be as good.",
),
IO.Boolean.Input("pbr", default=False),
IO.String.Input(
"text_style_prompt",
default="",
multiline=True,
tooltip="Describe your desired texture style of the object using text. Maximum 600 characters."
"Maximum 600 characters. Cannot be used at the same time as 'image_style'.",
),
IO.Image.Input(
"image_style",
optional=True,
tooltip="A 2d image to guide the texturing process. "
"Can not be used at the same time with 'text_style_prompt'.",
),
],
outputs=[
IO.String.Output(display_name="model_file"),
IO.Custom("MODEL_TASK_ID").Output(display_name="meshy_task_id"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
is_output_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.4}""",
),
)
@classmethod
async def execute(
cls,
model: str,
meshy_task_id: str,
enable_original_uv: bool,
pbr: bool,
text_style_prompt: str,
image_style: Input.Image | None = None,
) -> IO.NodeOutput:
if text_style_prompt and image_style is not None:
raise ValueError("text_style_prompt and image_style cannot be used at the same time")
if not text_style_prompt and image_style is None:
raise ValueError("Either text_style_prompt or image_style is required")
image_style_url = None
if image_style is not None:
image_style_url = (await upload_images_to_comfyapi(cls, image_style, wait_label="Uploading style"))[0]
response = await sync_op(
cls,
endpoint=ApiEndpoint(path="/proxy/meshy/openapi/v1/retexture", method="POST"),
response_model=MeshyTaskResponse,
data=MeshyTextureRequest(
input_task_id=meshy_task_id,
ai_model=model,
enable_original_uv=enable_original_uv,
enable_pbr=pbr,
text_style_prompt=text_style_prompt if text_style_prompt else None,
image_style_url=image_style_url,
),
)
result = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/meshy/openapi/v1/retexture/{response.result}"),
response_model=MeshyModelResult,
status_extractor=lambda r: r.status,
progress_extractor=lambda r: r.progress,
)
model_file = f"meshy_model_{response.result}.glb"
await download_url_to_bytesio(result.model_urls.glb, os.path.join(get_output_directory(), model_file))
return IO.NodeOutput(model_file, response.result)
class MeshyExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:
return [
MeshyTextToModelNode,
MeshyRefineNode,
MeshyImageToModelNode,
MeshyMultiImageToModelNode,
MeshyRigModelNode,
MeshyAnimateModelNode,
MeshyTextureNode,
]
async def comfy_entrypoint() -> MeshyExtension:
return MeshyExtension()

View File

@@ -4,7 +4,7 @@ import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.minimax_api import (
from comfy_api_nodes.apis.minimax import (
MinimaxFileRetrieveResponse,
MiniMaxModel,
MinimaxTaskResultResponse,
@@ -134,6 +134,9 @@ class MinimaxTextToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.43}""",
),
)
@classmethod
@@ -197,6 +200,9 @@ class MinimaxImageToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.43}""",
),
)
@classmethod
@@ -340,6 +346,20 @@ class MinimaxHailuoVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["resolution", "duration"]),
expr="""
(
$prices := {
"768p": {"6": 0.28, "10": 0.56},
"1080p": {"6": 0.49}
};
$resPrices := $lookup($prices, $lowercase(widgets.resolution));
$price := $lookup($resPrices, $string(widgets.duration));
{"type":"usd","usd": $price ? $price : 0.43}
)
""",
),
)
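# Hypothetical evaluation of the badge expression above: a resolution value
# such as "768P" is lowercased to "768p" and a duration of 6 is stringified
# to "6", so the nested $lookup chain yields 0.28. An unpriced combination
# such as "1080p"/"10" leaves $price undefined, and the expression falls
# back to the flat 0.43.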
@classmethod

View File

@@ -3,7 +3,7 @@ import logging
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis import (
from comfy_api_nodes.apis.moonvalley import (
MoonvalleyPromptResponse,
MoonvalleyTextToVideoInferenceParams,
MoonvalleyTextToVideoRequest,
@@ -233,6 +233,10 @@ class MoonvalleyImg2VideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(),
expr="""{"type":"usd","usd": 1.5}""",
),
)
@classmethod
@@ -351,6 +355,10 @@ class MoonvalleyVideo2VideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(),
expr="""{"type":"usd","usd": 2.25}""",
),
)
@classmethod
@@ -471,6 +479,10 @@ class MoonvalleyTxt2VideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(),
expr="""{"type":"usd","usd": 1.5}""",
),
)
@classmethod

View File

@@ -10,24 +10,18 @@ from typing_extensions import override
import folder_paths
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis import (
CreateModelResponseProperties,
Detail,
InputContent,
from comfy_api_nodes.apis.openai import (
InputFileContent,
InputImageContent,
InputMessage,
InputMessageContentList,
InputTextContent,
Item,
ModelResponseProperties,
OpenAICreateResponse,
OpenAIResponse,
OutputContent,
)
from comfy_api_nodes.apis.openai_api import (
OpenAIImageEditRequest,
OpenAIImageGenerationRequest,
OpenAIImageGenerationResponse,
OpenAIResponse,
OutputContent,
)
from comfy_api_nodes.util import (
ApiEndpoint,
@@ -160,6 +154,23 @@ class OpenAIDalle2(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["size", "n"]),
expr="""
(
$size := widgets.size;
$nRaw := widgets.n;
$n := ($nRaw != null and $nRaw != 0) ? $nRaw : 1;
$base :=
$contains($size, "256x256") ? 0.016 :
$contains($size, "512x512") ? 0.018 :
0.02;
{"type":"usd","usd": $round($base * $n, 3)}
)
""",
),
)
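# Hypothetical evaluation of the badge above: with size "512x512" and n = 3
# the base price is 0.018, so the badge shows $round(0.018 * 3, 3) = 0.054
# USD; if n arrives as null or 0 it is clamped to 1 before multiplying.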
@classmethod
@@ -249,7 +260,7 @@ class OpenAIDalle3(IO.ComfyNode):
"seed",
default=0,
min=0,
max=2 ** 31 - 1,
max=2**31 - 1,
step=1,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
@@ -287,6 +298,25 @@ class OpenAIDalle3(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["size", "quality"]),
expr="""
(
$size := widgets.size;
$q := widgets.quality;
$hd := $contains($q, "hd");
$price :=
$contains($size, "1024x1024")
? ($hd ? 0.08 : 0.04)
: (($contains($size, "1792x1024") or $contains($size, "1024x1792"))
? ($hd ? 0.12 : 0.08)
: 0.04);
{"type":"usd","usd": $price}
)
""",
),
)
@classmethod
@@ -334,9 +364,9 @@ class OpenAIGPTImage1(IO.ComfyNode):
def define_schema(cls):
return IO.Schema(
node_id="OpenAIGPTImage1",
display_name="OpenAI GPT Image 1",
display_name="OpenAI GPT Image 1.5",
category="api node/image/OpenAI",
description="Generates images synchronously via OpenAI's GPT Image 1 endpoint.",
description="Generates images synchronously via OpenAI's GPT Image endpoint.",
inputs=[
IO.String.Input(
"prompt",
@@ -348,7 +378,7 @@ class OpenAIGPTImage1(IO.ComfyNode):
"seed",
default=0,
min=0,
max=2 ** 31 - 1,
max=2**31 - 1,
step=1,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
@@ -399,6 +429,7 @@ class OpenAIGPTImage1(IO.ComfyNode):
IO.Combo.Input(
"model",
options=["gpt-image-1", "gpt-image-1.5"],
default="gpt-image-1.5",
optional=True,
),
],
@@ -411,6 +442,28 @@ class OpenAIGPTImage1(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["quality", "n"]),
expr="""
(
$ranges := {
"low": [0.011, 0.02],
"medium": [0.046, 0.07],
"high": [0.167, 0.3]
};
$range := $lookup($ranges, widgets.quality);
$n := widgets.n;
($n = 1)
? {"type":"range_usd","min_usd": $range[0], "max_usd": $range[1]}
: {
"type":"range_usd",
"min_usd": $range[0],
"max_usd": $range[1],
"format": { "suffix": " x " & $string($n) & "/Run" }
}
)
""",
),
)
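# Hypothetical evaluation of the badge above: quality "medium" with n = 1
# yields a plain range of 0.046-0.07 USD; with n = 3 the same range carries
# the suffix " x 3/Run". (How the frontend renders "range_usd" badges is
# assumed here, not defined in this file.)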
@classmethod
@@ -442,8 +495,8 @@ class OpenAIGPTImage1(IO.ComfyNode):
files = []
batch_size = image.shape[0]
for i in range(batch_size):
single_image = image[i: i + 1]
scaled_image = downscale_image_tensor(single_image, total_pixels=2048*2048).squeeze()
single_image = image[i : i + 1]
scaled_image = downscale_image_tensor(single_image, total_pixels=2048 * 2048).squeeze()
image_np = (scaled_image.numpy() * 255).astype(np.uint8)
img = Image.fromarray(image_np)
@@ -465,7 +518,7 @@ class OpenAIGPTImage1(IO.ComfyNode):
rgba_mask = torch.zeros(height, width, 4, device="cpu")
rgba_mask[:, :, 3] = 1 - mask.squeeze().cpu()
scaled_mask = downscale_image_tensor(rgba_mask.unsqueeze(0), total_pixels=2048*2048).squeeze()
scaled_mask = downscale_image_tensor(rgba_mask.unsqueeze(0), total_pixels=2048 * 2048).squeeze()
mask_np = (scaled_mask.numpy() * 255).astype(np.uint8)
mask_img = Image.fromarray(mask_np)
@@ -566,32 +619,95 @@ class OpenAIChatNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model"]),
expr="""
(
$m := widgets.model;
$contains($m, "o4-mini") ? {
"type": "list_usd",
"usd": [0.0011, 0.0044],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "o1-pro") ? {
"type": "list_usd",
"usd": [0.15, 0.6],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "o1") ? {
"type": "list_usd",
"usd": [0.015, 0.06],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "o3-mini") ? {
"type": "list_usd",
"usd": [0.0011, 0.0044],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "o3") ? {
"type": "list_usd",
"usd": [0.01, 0.04],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gpt-4o") ? {
"type": "list_usd",
"usd": [0.0025, 0.01],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gpt-4.1-nano") ? {
"type": "list_usd",
"usd": [0.0001, 0.0004],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gpt-4.1-mini") ? {
"type": "list_usd",
"usd": [0.0004, 0.0016],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gpt-4.1") ? {
"type": "list_usd",
"usd": [0.002, 0.008],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gpt-5-nano") ? {
"type": "list_usd",
"usd": [0.00005, 0.0004],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gpt-5-mini") ? {
"type": "list_usd",
"usd": [0.00025, 0.002],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: $contains($m, "gpt-5") ? {
"type": "list_usd",
"usd": [0.00125, 0.01],
"format": { "approximate": true, "separator": "-", "suffix": " per 1K tokens" }
}
: {"type": "text", "text": "Token-based"}
)
""",
),
)
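# Note on the $contains chain above: it is ordered most-specific-first, so
# "gpt-4.1-nano" and "gpt-4.1-mini" must be tested before the generic
# "gpt-4.1" (otherwise every 4.1 variant would match the generic entry).
# As a hypothetical evaluation, a model string containing "gpt-4o" yields
# [0.0025, 0.01], shown as an approximate $0.0025-$0.01 per 1K tokens;
# unknown models fall through to the plain "Token-based" text badge.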
@classmethod
def get_message_content_from_response(
cls, response: OpenAIResponse
) -> list[OutputContent]:
def get_message_content_from_response(cls, response: OpenAIResponse) -> list[OutputContent]:
"""Extract message content from the API response."""
for output in response.output:
if output.root.type == "message":
return output.root.content
if output.type == "message":
return output.content
raise TypeError("No output message found in response")
@classmethod
def get_text_from_message_content(
cls, message_content: list[OutputContent]
) -> str:
def get_text_from_message_content(cls, message_content: list[OutputContent]) -> str:
"""Extract text content from message content."""
for content_item in message_content:
if content_item.root.type == "output_text":
return str(content_item.root.text)
if content_item.type == "output_text":
return str(content_item.text)
return "No text output found in response"
@classmethod
def tensor_to_input_image_content(
cls, image: torch.Tensor, detail_level: Detail = "auto"
) -> InputImageContent:
def tensor_to_input_image_content(cls, image: torch.Tensor, detail_level: str = "auto") -> InputImageContent:
"""Convert a tensor to an input image content object."""
return InputImageContent(
detail=detail_level,
@@ -605,9 +721,9 @@ class OpenAIChatNode(IO.ComfyNode):
prompt: str,
image: torch.Tensor | None = None,
files: list[InputFileContent] | None = None,
) -> InputMessageContentList:
) -> list[InputTextContent | InputImageContent | InputFileContent]:
"""Create a list of input message contents from prompt and optional image."""
content_list: list[InputContent | InputTextContent | InputImageContent | InputFileContent] = [
content_list: list[InputTextContent | InputImageContent | InputFileContent] = [
InputTextContent(text=prompt, type="input_text"),
]
if image is not None:
@@ -619,13 +735,9 @@ class OpenAIChatNode(IO.ComfyNode):
type="input_image",
)
)
if files is not None:
content_list.extend(files)
return InputMessageContentList(
root=content_list,
)
return content_list
@classmethod
async def execute(
@@ -635,7 +747,7 @@ class OpenAIChatNode(IO.ComfyNode):
model: SupportedOpenAIModel = SupportedOpenAIModel.gpt_5.value,
images: torch.Tensor | None = None,
files: list[InputFileContent] | None = None,
advanced_options: CreateModelResponseProperties | None = None,
advanced_options: ModelResponseProperties | None = None,
) -> IO.NodeOutput:
validate_string(prompt, strip_whitespace=False)
@@ -646,36 +758,28 @@ class OpenAIChatNode(IO.ComfyNode):
response_model=OpenAIResponse,
data=OpenAICreateResponse(
input=[
Item(
root=InputMessage(
content=cls.create_input_message_contents(
prompt, images, files
),
role="user",
)
InputMessage(
content=cls.create_input_message_contents(prompt, images, files),
role="user",
),
],
store=True,
stream=False,
model=model,
previous_response_id=None,
**(
advanced_options.model_dump(exclude_none=True)
if advanced_options
else {}
),
**(advanced_options.model_dump(exclude_none=True) if advanced_options else {}),
),
)
response_id = create_response.id
# Get result output
result_response = await poll_op(
cls,
ApiEndpoint(path=f"{RESPONSES_ENDPOINT}/{response_id}"),
response_model=OpenAIResponse,
status_extractor=lambda response: response.status,
completed_statuses=["incomplete", "completed"]
)
cls,
ApiEndpoint(path=f"{RESPONSES_ENDPOINT}/{response_id}"),
response_model=OpenAIResponse,
status_extractor=lambda response: response.status,
completed_statuses=["incomplete", "completed"],
)
return IO.NodeOutput(cls.get_text_from_message_content(cls.get_message_content_from_response(result_response)))
@@ -796,7 +900,7 @@ class OpenAIChatConfig(IO.ComfyNode):
remove depending on model choice.
"""
return IO.NodeOutput(
CreateModelResponseProperties(
ModelResponseProperties(
instructions=instructions,
truncation=truncation,
max_output_tokens=max_output_tokens,

View File

@@ -1,7 +1,7 @@
import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.pixverse_api import (
from comfy_api_nodes.apis.pixverse import (
PixverseTextVideoRequest,
PixverseImageVideoRequest,
PixverseTransitionVideoRequest,
@@ -128,6 +128,7 @@ class PixverseTextToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -242,6 +243,7 @@ class PixverseImageToVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -355,6 +357,7 @@ class PixverseTransitionVideoNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=PRICE_BADGE_VIDEO,
)
@classmethod
@@ -416,6 +419,33 @@ class PixverseTransitionVideoNode(IO.ComfyNode):
return IO.NodeOutput(await download_url_to_video_output(response_poll.Resp.url))
PRICE_BADGE_VIDEO = IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["duration_seconds", "quality", "motion_mode"]),
expr="""
(
$prices := {
"5": {
"1080p": {"normal": 1.2, "fast": 1.2},
"720p": {"normal": 0.6, "fast": 1.2},
"540p": {"normal": 0.45, "fast": 0.9},
"360p": {"normal": 0.45, "fast": 0.9}
},
"8": {
"1080p": {"normal": 1.2, "fast": 1.2},
"720p": {"normal": 1.2, "fast": 1.2},
"540p": {"normal": 0.9, "fast": 1.2},
"360p": {"normal": 0.9, "fast": 1.2}
}
};
$durPrices := $lookup($prices, $string(widgets.duration_seconds));
$qualityPrices := $lookup($durPrices, widgets.quality);
$price := $lookup($qualityPrices, widgets.motion_mode);
{"type":"usd","usd": $price ? $price : 0.9}
)
""",
)
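# Hypothetical evaluation of the shared badge above: duration_seconds = 5,
# quality = "720p", motion_mode = "fast" resolves through the three nested
# $lookup calls to 1.2 USD; any combination missing from the table (e.g. a
# future quality tier) leaves $price undefined and the badge falls back to
# the flat 0.9.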
class PixVerseExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:

View File

@@ -8,7 +8,7 @@ from typing_extensions import override
from comfy.utils import ProgressBar
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.recraft_api import (
from comfy_api_nodes.apis.recraft import (
RecraftColor,
RecraftColorChain,
RecraftControls,
@@ -378,6 +378,10 @@ class RecraftTextToImageNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["n"]),
expr="""{"type":"usd","usd": $round(0.04 * widgets.n, 2)}""",
),
)
@classmethod
@@ -490,6 +494,10 @@ class RecraftImageToImageNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["n"]),
expr="""{"type":"usd","usd": $round(0.04 * widgets.n, 2)}""",
),
)
@classmethod
@@ -591,6 +599,10 @@ class RecraftImageInpaintingNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["n"]),
expr="""{"type":"usd","usd": $round(0.04 * widgets.n, 2)}""",
),
)
@classmethod
@@ -692,6 +704,10 @@ class RecraftTextToVectorNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["n"]),
expr="""{"type":"usd","usd": $round(0.08 * widgets.n, 2)}""",
),
)
@classmethod
@@ -759,6 +775,10 @@ class RecraftVectorizeImageNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(),
expr="""{"type":"usd","usd": 0.01}""",
),
)
@classmethod
@@ -817,6 +837,9 @@ class RecraftReplaceBackgroundNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.04}""",
),
)
@classmethod
@@ -883,6 +906,9 @@ class RecraftRemoveBackgroundNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.01}""",
),
)
@classmethod
@@ -929,6 +955,9 @@ class RecraftCrispUpscaleNode(IO.ComfyNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.004}""",
),
)
@classmethod
@@ -972,6 +1001,9 @@ class RecraftCreativeUpscaleNode(RecraftCrispUpscaleNode):
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.25}""",
),
)

Some files were not shown because too many files have changed in this diff.