Compare commits

28 Commits

Author SHA1 Message Date
bymyself
1202709996 feat: add SEARCH_ALIASES for model and misc nodes
Add search aliases to model-related and miscellaneous nodes:
- Model nodes: nodes_model_merging.py, nodes_model_advanced.py, nodes_lora_extract.py
- Sampler nodes: nodes_custom_sampler.py, nodes_align_your_steps.py
- Control nodes: nodes_controlnet.py, nodes_attention_multiply.py, nodes_hooks.py
- Training nodes: nodes_train.py, nodes_dataset.py
- Utility nodes: nodes_logic.py, nodes_canny.py, nodes_differential_diffusion.py
- Architecture-specific: nodes_sd3.py, nodes_pixart.py, nodes_lumina2.py, nodes_kandinsky5.py, nodes_hidream.py, nodes_fresca.py, nodes_hunyuan3d.py
- Media nodes: nodes_load_3d.py, nodes_webcam.py, nodes_preview_any.py, nodes_wanmove.py

Uses search_aliases parameter in io.Schema() for v3 nodes, SEARCH_ALIASES class attribute for legacy nodes.
2026-01-21 19:26:51 -08:00
bymyself
dcde86463c Propagate search_aliases through V3 Schema.get_v1_info to NodeInfoV1 2026-01-21 15:26:49 -08:00
bymyself
f02abedcd9 feat: Add search_aliases field to node schema
Adds `search_aliases` field to improve node discoverability. Users can define alternative search terms for nodes (e.g., "text concat" → StringConcatenate).

Changes:
- Add `search_aliases: list[str]` to V3 Schema
- Add `SEARCH_ALIASES` support for V1 nodes
- Include field in `/object_info` response
- Add aliases to high-priority core nodes

V1 usage:
```python
class MyNode:
    SEARCH_ALIASES = ["alt name", "synonym"]
```

V3 usage:
```python
io.Schema(
    node_id="MyNode",
    search_aliases=["alt name", "synonym"],
    ...
)
```

## Related PRs
- Frontend: Comfy-Org/ComfyUI_frontend#XXXX (draft - merge after this)
- Docs: Comfy-Org/docs#XXXX (draft - merge after stable)
2026-01-21 14:25:45 -08:00
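A minimal sketch of how a search box might consume the aliases described in the commit above. Only the `SEARCH_ALIASES` attribute name comes from the commit message. The `matches` helper and the stand-in `StringConcatenate` class, including its alias list, are hypothetical and are not ComfyUI or frontend code.

```python
class StringConcatenate:
    """Stand-in for a V1-style node; the alias list is illustrative only."""
    SEARCH_ALIASES = ["text concat", "join strings"]


def matches(node_cls, query):
    """Hypothetical helper: True if the query hits the class name or an alias."""
    q = query.strip().lower()
    if not q:
        return False
    # Nodes without SEARCH_ALIASES fall back to name-only matching.
    names = [node_cls.__name__] + list(getattr(node_cls, "SEARCH_ALIASES", []))
    return any(q in name.lower() for name in names)
```

With this sketch, a query such as "text concat" finds the node even though the phrase does not appear in its class name.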
Alexander Piskun
451af70154 fix(api-nodes-Vidu): allow passing up to 7 subjects in Vidu Reference node (#12002) 2026-01-21 04:03:45 -08:00
Markury
0fc15700be Add LyCoris LoKr MLP layer support for Flux2 (#11997) 2026-01-20 23:18:33 -05:00
comfyanonymous
e755268e7b Config for Qwen 3 0.6B model. (#11998) 2026-01-20 23:08:31 -05:00
Mylo
c4a14df9a3 Dynamically detect chroma radiance patch size (#11991) 2026-01-20 18:46:11 -05:00
Ivan Zorin
965d0ed509 fix: remove normalization of audio in LTX Mel spectrogram creation (#11990)
For LTX Audio VAE, remove normalization of audio during MEL spectrogram creation.
This aligns inference with training and prevents loud audio from being attenuated.
2026-01-20 18:44:28 -05:00
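To illustrate the attenuation the commit above refers to, here is a pure-Python restatement of the removed peak clamp for a single channel. It mirrors the arithmetic of the deleted `normalize_amplitude` in the diff (mean removal, then scaling so the peak does not exceed `max_amplitude`), but it operates on plain lists instead of tensors, so treat it as a sketch rather than the original code.

```python
def normalize_amplitude(samples, max_amplitude=0.5, eps=1e-5):
    """List-based sketch of the removed normalization step (one channel)."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]          # remove DC offset
    peak = max(abs(s) for s in centered) + eps
    scale = min(peak, max_amplitude) / peak         # 1.0 when already quiet
    return [s * scale for s in centered]


loud = normalize_amplitude([0.0, 0.9, 0.0, -0.9])   # peak pulled down to ~0.5
quiet = normalize_amplitude([0.0, 0.2, 0.0, -0.2])  # left unchanged
```

Loud input is scaled down to a peak of roughly 0.5 while quiet input passes through untouched, which is the level change the commit removes from inference.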
Alexander Piskun
ddc541ffda feat(api-nodes): add WaveSpeed nodes (#11945) 2026-01-20 13:05:40 -08:00
comfyanonymous
8ccc0c94fa Make omni stuff work on regular z image for easier testing. (#11985) 2026-01-20 00:32:00 -05:00
Comfy Org PR Bot
4edb87aa50 Bump comfyui-frontend-package to 1.37.11 (#11976) 2026-01-19 23:57:50 -05:00
ComfyUI Wiki
0fc3b6e3a6 chore: update workflow templates to v0.8.15 (#11984) 2026-01-19 23:17:56 -05:00
comfyanonymous
2108167f9f Support zimage omni base model. (#11979) 2026-01-19 23:17:38 -05:00
comfyanonymous
9d273d3ab1 ComfyUI v0.10.0 2026-01-19 22:40:18 -05:00
comfyanonymous
70c91b8248 Fix #11963 (#11982) 2026-01-19 22:32:40 -05:00
rkfg
0da5a0fe58 Convert mono audio to fake stereo for LTXV VAE encoding (#11965) 2026-01-19 22:12:02 -05:00
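The "fake stereo" conversion can be sketched without tensors. The diff further down does this with `waveform.expand(...)`. The version below is a hypothetical list-based equivalent for data shaped `[batch][channels][samples]`, and the function name `to_fake_stereo` is an assumption made for illustration.

```python
def to_fake_stereo(waveform, expected_channels=2):
    """Duplicate a mono channel so each item has expected_channels channels."""
    out = []
    for item in waveform:
        if len(item) == expected_channels:
            out.append(item)                         # already the right layout
        elif len(item) == 1:
            # Copy the single channel once per expected channel.
            out.append([list(item[0]) for _ in range(expected_channels)])
        else:
            raise ValueError(
                f"Input audio must have {expected_channels} channels, got {len(item)}"
            )
    return out
```

Both output channels carry identical samples, so the result is stereo in layout only.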
comfyanonymous
e0eacb0688 Simpler way to implement the #11980 loras. (#11981) 2026-01-19 22:00:36 -05:00
Jedrzej Kosinski
7458e20465 Make Autogrow validation work properly (#11977)
* In-progress autogrow validation fixes - properly looks at required/optional inputs, now working on the edge case that all inputs are optional and nothing is plugged in (should just be an empty dictionary passed into node)

* Allow autogrow to work with all inputs being optional

* Revert accidentally pushed changes to nodes_logic.py
2026-01-19 16:58:30 -08:00
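The edge case mentioned in the commit above (all inputs optional, nothing connected, node receives an empty dictionary) might look like the following. `collect_autogrow` and its parameters are hypothetical names chosen for this sketch. They are not the actual ComfyUI validation code, and the handling of the minimum is an assumption based on the diff comment "if not required, min can be ignored".

```python
def collect_autogrow(values, names, required=True, minimum=1):
    """Gather connected autogrow slots into a dict (illustrative only)."""
    connected = {name: values[name] for name in names if name in values}
    # Assumption: the minimum only applies when the template input is required.
    if required and len(connected) < minimum:
        raise ValueError(
            f"expected at least {minimum} connected input(s), got {len(connected)}"
        )
    return connected
```

Under these assumptions an optional autogrow with no connections yields `{}` instead of a validation error.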
Jedrzej Kosinski
b931b37e30 feat(api-nodes): add Bria Edit node (#11978)
Co-authored-by: Alexander Piskun <bigcat88@icloud.com>
2026-01-19 16:47:14 -08:00
ComfyUI Wiki
866a4619db chore: update workflow templates to v0.8.14 (#11974) 2026-01-19 14:21:35 -08:00
comfyanonymous
1a72bf2046 Readme update. (#11957) 2026-01-18 19:53:43 -08:00
Alexander Piskun
034fac7054 chore(api-nodes): auto-discover all nodes_*.py files to avoid merge conflicts when adding new API nodes (#11943) 2026-01-17 22:40:39 -08:00
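Auto-discovery of this kind is commonly done with a glob over the package directory. The sketch below assumes a flat directory of `nodes_*.py` files. The function name `discover_node_modules` is hypothetical, and the actual change in #11943 may be implemented differently.

```python
from pathlib import Path


def discover_node_modules(directory):
    """Return sorted module names for every nodes_*.py file in directory."""
    return sorted(p.stem for p in Path(directory).glob("nodes_*.py"))
```

Because the list is derived from the filesystem, adding a new API node file no longer requires editing a shared, hand-maintained list, which is the source of the merge conflicts the commit title mentions.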
Christian Byrne
a498556d0d feat: add advanced parameter to Input classes for advanced widgets support (#11939)
Add 'advanced' boolean parameter to Input and WidgetInput base classes
and propagate to all typed Input subclasses (Boolean, Int, Float, String,
Combo, MultiCombo, Webcam, MultiType, MatchType, ImageCompare).

When set to True, the frontend will hide these inputs by default in a
collapsible 'Advanced Inputs' section in the right side panel, reducing
visual clutter for power-user options.

This enables nodes to expose advanced configuration options (like encoding
parameters, quality settings, etc.) without overwhelming typical users.

Frontend support: ComfyUI_frontend PR #7812
2026-01-17 19:06:03 -08:00
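A rough sketch of how a boolean `advanced` flag could drive the grouping described in the commit above. `InputSketch` and `split_inputs` are hypothetical stand-ins. They are not the real `Input` classes or the frontend logic, and only the idea of an `advanced` parameter defaulting to visible inputs is taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class InputSketch:
    """Hypothetical stand-in for an Input definition."""
    id: str
    advanced: bool = False   # assumed default: shown in the main panel


def split_inputs(inputs):
    """Separate ids into (always visible, collapsed 'Advanced Inputs')."""
    basic = [i.id for i in inputs if not i.advanced]
    advanced = [i.id for i in inputs if i.advanced]
    return basic, advanced


basic, advanced = split_inputs([
    InputSketch("prompt"),
    InputSketch("crf", advanced=True),   # e.g. an encoding parameter
])
```

Here `prompt` stays in the main panel and `crf` would land in the collapsible section.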
Alexander Piskun
f7ca41ff62 chore(api-nodes): remove check for pyav>=14.2 in code (it was added to requirements.txt long ago) (#11934) 2026-01-17 18:57:57 -08:00
Alexander Piskun
ac26065e61 chore(api-nodes): remove non-used; extract model to separate files (#11927)
* chore(api-nodes): remove non-used; extract model to separate files

* chore(api-nodes): remove non-needed prefix in filenames
2026-01-17 18:52:45 -08:00
comfyanonymous
190c4416cc Bump comfy-kitchen dependency to version 0.2.7 (#11941) 2026-01-17 21:20:35 -05:00
Theephop
0fd10ffa09 fix: use .cpu() for waveform conversion in AudioFrame creation (#11787) 2026-01-17 20:18:24 -05:00
Alex Butler
00c775950a Update readme rdna3 nightly url (#11937) 2026-01-17 20:18:04 -05:00
92 changed files with 1860 additions and 776 deletions

View File

@@ -108,7 +108,7 @@ See what ComfyUI can do with the [example workflows](https://comfyanonymous.gith
- [LCM models and Loras](https://comfyanonymous.github.io/ComfyUI_examples/lcm/)
- Latent previews with [TAESD](#how-to-show-high-quality-previews)
- Works fully offline: core will never download anything unless you want to.
- Optional API nodes to use paid models from external providers through the online [Comfy API](https://docs.comfy.org/tutorials/api-nodes/overview).
- Optional API nodes to use paid models from external providers through the online [Comfy API](https://docs.comfy.org/tutorials/api-nodes/overview) disable with: `--disable-api-nodes`
- [Config file](extra_model_paths.yaml.example) to set the search paths for models.
Workflow examples can be found on the [Examples page](https://comfyanonymous.github.io/ComfyUI_examples/)
@@ -212,7 +212,7 @@ Python 3.14 works but you may encounter issues with the torch compile node. The
Python 3.13 is very well supported. If you have trouble with some custom node dependencies on 3.13 you can try 3.12
torch 2.4 and above is supported but some features might only work on newer versions. We generally recommend using the latest major version of pytorch with the latest cuda version unless it is less than 2 weeks old.
torch 2.4 and above is supported but some features and optimizations might only work on newer versions. We generally recommend using the latest major version of pytorch with the latest cuda version unless it is less than 2 weeks old.
### Instructions:
@@ -229,7 +229,7 @@ AMD users can install rocm and pytorch with pip if you don't have it already ins
```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4```
This is the command to install the nightly with ROCm 7.0 which might have some performance improvements:
This is the command to install the nightly with ROCm 7.1 which might have some performance improvements:
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.1```
@@ -240,7 +240,7 @@ These have less hardware support than the builds above but they work on windows.
RDNA 3 (RX 7000 series):
```pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-dgpu/```
```pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/```
RDNA 3.5 (Strix halo/Ryzen AI Max+ 365):

View File

@@ -103,20 +103,10 @@ class AudioPreprocessor:
return waveform
return torchaudio.functional.resample(waveform, source_rate, self.target_sample_rate)
@staticmethod
def normalize_amplitude(
waveform: torch.Tensor, max_amplitude: float = 0.5, eps: float = 1e-5
) -> torch.Tensor:
waveform = waveform - waveform.mean(dim=2, keepdim=True)
peak = torch.max(torch.abs(waveform)) + eps
scale = peak.clamp(max=max_amplitude) / peak
return waveform * scale
def waveform_to_mel(
self, waveform: torch.Tensor, waveform_sample_rate: int, device
) -> torch.Tensor:
waveform = self.resample(waveform, waveform_sample_rate)
waveform = self.normalize_amplitude(waveform)
mel_transform = torchaudio.transforms.MelSpectrogram(
sample_rate=self.target_sample_rate,
@@ -189,9 +179,12 @@ class AudioVAE(torch.nn.Module):
waveform = self.device_manager.move_to_load_device(waveform)
expected_channels = self.autoencoder.encoder.in_channels
if waveform.shape[1] != expected_channels:
raise ValueError(
f"Input audio must have {expected_channels} channels, got {waveform.shape[1]}"
)
if waveform.shape[1] == 1:
waveform = waveform.expand(-1, expected_channels, *waveform.shape[2:])
else:
raise ValueError(
f"Input audio must have {expected_channels} channels, got {waveform.shape[1]}"
)
mel_spec = self.preprocessor.waveform_to_mel(
waveform, waveform_sample_rate, device=self.device_manager.load_device

View File

@@ -13,10 +13,53 @@ from comfy.ldm.modules.attention import optimized_attention_masked
from comfy.ldm.flux.layers import EmbedND
from comfy.ldm.flux.math import apply_rope
import comfy.patcher_extension
import comfy.utils
def modulate(x, scale):
return x * (1 + scale.unsqueeze(1))
def invert_slices(slices, length):
sorted_slices = sorted(slices)
result = []
current = 0
for start, end in sorted_slices:
if current < start:
result.append((current, start))
current = max(current, end)
if current < length:
result.append((current, length))
return result
def modulate(x, scale, timestep_zero_index=None):
if timestep_zero_index is None:
return x * (1 + scale.unsqueeze(1))
else:
scale = (1 + scale.unsqueeze(1))
actual_batch = scale.size(0) // 2
slices = timestep_zero_index
invert = invert_slices(timestep_zero_index, x.shape[1])
for s in slices:
x[:, s[0]:s[1]] *= scale[actual_batch:]
for s in invert:
x[:, s[0]:s[1]] *= scale[:actual_batch]
return x
def apply_gate(gate, x, timestep_zero_index=None):
if timestep_zero_index is None:
return gate * x
else:
actual_batch = gate.size(0) // 2
slices = timestep_zero_index
invert = invert_slices(timestep_zero_index, x.shape[1])
for s in slices:
x[:, s[0]:s[1]] *= gate[actual_batch:]
for s in invert:
x[:, s[0]:s[1]] *= gate[:actual_batch]
return x
#############################################################################
# Core NextDiT Model #
@@ -258,6 +301,7 @@ class JointTransformerBlock(nn.Module):
x_mask: torch.Tensor,
freqs_cis: torch.Tensor,
adaln_input: Optional[torch.Tensor]=None,
timestep_zero_index=None,
transformer_options={},
):
"""
@@ -276,18 +320,18 @@ class JointTransformerBlock(nn.Module):
assert adaln_input is not None
scale_msa, gate_msa, scale_mlp, gate_mlp = self.adaLN_modulation(adaln_input).chunk(4, dim=1)
x = x + gate_msa.unsqueeze(1).tanh() * self.attention_norm2(
x = x + apply_gate(gate_msa.unsqueeze(1).tanh(), self.attention_norm2(
clamp_fp16(self.attention(
modulate(self.attention_norm1(x), scale_msa),
modulate(self.attention_norm1(x), scale_msa, timestep_zero_index=timestep_zero_index),
x_mask,
freqs_cis,
transformer_options=transformer_options,
))
))), timestep_zero_index=timestep_zero_index
)
x = x + gate_mlp.unsqueeze(1).tanh() * self.ffn_norm2(
x = x + apply_gate(gate_mlp.unsqueeze(1).tanh(), self.ffn_norm2(
clamp_fp16(self.feed_forward(
modulate(self.ffn_norm1(x), scale_mlp),
))
modulate(self.ffn_norm1(x), scale_mlp, timestep_zero_index=timestep_zero_index),
))), timestep_zero_index=timestep_zero_index
)
else:
assert adaln_input is None
@@ -345,13 +389,37 @@ class FinalLayer(nn.Module):
),
)
def forward(self, x, c):
def forward(self, x, c, timestep_zero_index=None):
scale = self.adaLN_modulation(c)
x = modulate(self.norm_final(x), scale)
x = modulate(self.norm_final(x), scale, timestep_zero_index=timestep_zero_index)
x = self.linear(x)
return x
def pad_zimage(feats, pad_token, pad_tokens_multiple):
pad_extra = (-feats.shape[1]) % pad_tokens_multiple
return torch.cat((feats, pad_token.to(device=feats.device, dtype=feats.dtype, copy=True).unsqueeze(0).repeat(feats.shape[0], pad_extra, 1)), dim=1), pad_extra
def pos_ids_x(start_t, H_tokens, W_tokens, batch_size, device, transformer_options={}):
rope_options = transformer_options.get("rope_options", None)
h_scale = 1.0
w_scale = 1.0
h_start = 0
w_start = 0
if rope_options is not None:
h_scale = rope_options.get("scale_y", 1.0)
w_scale = rope_options.get("scale_x", 1.0)
h_start = rope_options.get("shift_y", 0.0)
w_start = rope_options.get("shift_x", 0.0)
x_pos_ids = torch.zeros((batch_size, H_tokens * W_tokens, 3), dtype=torch.float32, device=device)
x_pos_ids[:, :, 0] = start_t
x_pos_ids[:, :, 1] = (torch.arange(H_tokens, dtype=torch.float32, device=device) * h_scale + h_start).view(-1, 1).repeat(1, W_tokens).flatten()
x_pos_ids[:, :, 2] = (torch.arange(W_tokens, dtype=torch.float32, device=device) * w_scale + w_start).view(1, -1).repeat(H_tokens, 1).flatten()
return x_pos_ids
class NextDiT(nn.Module):
"""
Diffusion model with a Transformer backbone.
@@ -378,6 +446,7 @@ class NextDiT(nn.Module):
time_scale=1.0,
pad_tokens_multiple=None,
clip_text_dim=None,
siglip_feat_dim=None,
image_model=None,
device=None,
dtype=None,
@@ -491,6 +560,41 @@ class NextDiT(nn.Module):
for layer_id in range(n_layers)
]
)
if siglip_feat_dim is not None:
self.siglip_embedder = nn.Sequential(
operation_settings.get("operations").RMSNorm(siglip_feat_dim, eps=norm_eps, elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype")),
operation_settings.get("operations").Linear(
siglip_feat_dim,
dim,
bias=True,
device=operation_settings.get("device"),
dtype=operation_settings.get("dtype"),
),
)
self.siglip_refiner = nn.ModuleList(
[
JointTransformerBlock(
layer_id,
dim,
n_heads,
n_kv_heads,
multiple_of,
ffn_dim_multiplier,
norm_eps,
qk_norm,
modulation=False,
operation_settings=operation_settings,
)
for layer_id in range(n_refiner_layers)
]
)
self.siglip_pad_token = nn.Parameter(torch.empty((1, dim), device=device, dtype=dtype))
else:
self.siglip_embedder = None
self.siglip_refiner = None
self.siglip_pad_token = None
# This norm final is in the lumina 2.0 code but isn't actually used for anything.
# self.norm_final = operation_settings.get("operations").RMSNorm(dim, eps=norm_eps, elementwise_affine=True, device=operation_settings.get("device"), dtype=operation_settings.get("dtype"))
self.final_layer = FinalLayer(dim, patch_size, self.out_channels, z_image_modulation=z_image_modulation, operation_settings=operation_settings)
@@ -531,70 +635,168 @@ class NextDiT(nn.Module):
imgs = torch.stack(imgs, dim=0)
return imgs
def patchify_and_embed(
self, x: List[torch.Tensor] | torch.Tensor, cap_feats: torch.Tensor, cap_mask: torch.Tensor, t: torch.Tensor, num_tokens, transformer_options={}
) -> Tuple[torch.Tensor, torch.Tensor, List[Tuple[int, int]], List[int], torch.Tensor]:
bsz = len(x)
pH = pW = self.patch_size
device = x[0].device
orig_x = x
if self.pad_tokens_multiple is not None:
pad_extra = (-cap_feats.shape[1]) % self.pad_tokens_multiple
cap_feats = torch.cat((cap_feats, self.cap_pad_token.to(device=cap_feats.device, dtype=cap_feats.dtype, copy=True).unsqueeze(0).repeat(cap_feats.shape[0], pad_extra, 1)), dim=1)
def embed_cap(self, cap_feats=None, offset=0, bsz=1, device=None, dtype=None):
if cap_feats is not None:
cap_feats = self.cap_embedder(cap_feats)
cap_feats_len = cap_feats.shape[1]
if self.pad_tokens_multiple is not None:
cap_feats, _ = pad_zimage(cap_feats, self.cap_pad_token, self.pad_tokens_multiple)
else:
cap_feats_len = 0
cap_feats = self.cap_pad_token.to(device=device, dtype=dtype, copy=True).unsqueeze(0).repeat(bsz, self.pad_tokens_multiple, 1)
cap_pos_ids = torch.zeros(bsz, cap_feats.shape[1], 3, dtype=torch.float32, device=device)
cap_pos_ids[:, :, 0] = torch.arange(cap_feats.shape[1], dtype=torch.float32, device=device) + 1.0
cap_pos_ids[:, :, 0] = torch.arange(cap_feats.shape[1], dtype=torch.float32, device=device) + 1.0 + offset
embeds = (cap_feats,)
freqs_cis = (self.rope_embedder(cap_pos_ids).movedim(1, 2),)
return embeds, freqs_cis, cap_feats_len
def embed_all(self, x, cap_feats=None, siglip_feats=None, offset=0, omni=False, transformer_options={}):
bsz = 1
pH = pW = self.patch_size
device = x.device
embeds, freqs_cis, cap_feats_len = self.embed_cap(cap_feats, offset=offset, bsz=bsz, device=device, dtype=x.dtype)
if (not omni) or self.siglip_embedder is None:
cap_feats_len = embeds[0].shape[1] + offset
embeds += (None,)
freqs_cis += (None,)
else:
cap_feats_len += offset
if siglip_feats is not None:
b, h, w, c = siglip_feats.shape
siglip_feats = siglip_feats.permute(0, 3, 1, 2).reshape(b, h * w, c)
siglip_feats = self.siglip_embedder(siglip_feats)
siglip_pos_ids = torch.zeros((bsz, siglip_feats.shape[1], 3), dtype=torch.float32, device=device)
siglip_pos_ids[:, :, 0] = cap_feats_len + 2
siglip_pos_ids[:, :, 1] = (torch.linspace(0, h * 8 - 1, steps=h, dtype=torch.float32, device=device).floor()).view(-1, 1).repeat(1, w).flatten()
siglip_pos_ids[:, :, 2] = (torch.linspace(0, w * 8 - 1, steps=w, dtype=torch.float32, device=device).floor()).view(1, -1).repeat(h, 1).flatten()
if self.siglip_pad_token is not None:
siglip_feats, pad_extra = pad_zimage(siglip_feats, self.siglip_pad_token, self.pad_tokens_multiple) # TODO: double check
siglip_pos_ids = torch.nn.functional.pad(siglip_pos_ids, (0, 0, 0, pad_extra))
else:
if self.siglip_pad_token is not None:
siglip_feats = self.siglip_pad_token.to(device=device, dtype=x.dtype, copy=True).unsqueeze(0).repeat(bsz, self.pad_tokens_multiple, 1)
siglip_pos_ids = torch.zeros((bsz, siglip_feats.shape[1], 3), dtype=torch.float32, device=device)
if siglip_feats is None:
embeds += (None,)
freqs_cis += (None,)
else:
embeds += (siglip_feats,)
freqs_cis += (self.rope_embedder(siglip_pos_ids).movedim(1, 2),)
B, C, H, W = x.shape
x = self.x_embedder(x.view(B, C, H // pH, pH, W // pW, pW).permute(0, 2, 4, 3, 5, 1).flatten(3).flatten(1, 2))
rope_options = transformer_options.get("rope_options", None)
h_scale = 1.0
w_scale = 1.0
h_start = 0
w_start = 0
if rope_options is not None:
h_scale = rope_options.get("scale_y", 1.0)
w_scale = rope_options.get("scale_x", 1.0)
h_start = rope_options.get("shift_y", 0.0)
w_start = rope_options.get("shift_x", 0.0)
H_tokens, W_tokens = H // pH, W // pW
x_pos_ids = torch.zeros((bsz, x.shape[1], 3), dtype=torch.float32, device=device)
x_pos_ids[:, :, 0] = cap_feats.shape[1] + 1
x_pos_ids[:, :, 1] = (torch.arange(H_tokens, dtype=torch.float32, device=device) * h_scale + h_start).view(-1, 1).repeat(1, W_tokens).flatten()
x_pos_ids[:, :, 2] = (torch.arange(W_tokens, dtype=torch.float32, device=device) * w_scale + w_start).view(1, -1).repeat(H_tokens, 1).flatten()
x_pos_ids = pos_ids_x(cap_feats_len + 1, H // pH, W // pW, bsz, device, transformer_options=transformer_options)
if self.pad_tokens_multiple is not None:
pad_extra = (-x.shape[1]) % self.pad_tokens_multiple
x = torch.cat((x, self.x_pad_token.to(device=x.device, dtype=x.dtype, copy=True).unsqueeze(0).repeat(x.shape[0], pad_extra, 1)), dim=1)
x, pad_extra = pad_zimage(x, self.x_pad_token, self.pad_tokens_multiple)
x_pos_ids = torch.nn.functional.pad(x_pos_ids, (0, 0, 0, pad_extra))
freqs_cis = self.rope_embedder(torch.cat((cap_pos_ids, x_pos_ids), dim=1)).movedim(1, 2)
embeds += (x,)
freqs_cis += (self.rope_embedder(x_pos_ids).movedim(1, 2),)
return embeds, freqs_cis, cap_feats_len + len(freqs_cis) - 1
def patchify_and_embed(
self, x: torch.Tensor, cap_feats: torch.Tensor, cap_mask: torch.Tensor, t: torch.Tensor, num_tokens, ref_latents=[], ref_contexts=[], siglip_feats=[], transformer_options={}
) -> Tuple[torch.Tensor, torch.Tensor, List[Tuple[int, int]], List[int], torch.Tensor]:
bsz = x.shape[0]
cap_mask = None # TODO?
main_siglip = None
orig_x = x
embeds = ([], [], [])
freqs_cis = ([], [], [])
leftover_cap = []
start_t = 0
omni = len(ref_latents) > 0
if omni:
for i, ref in enumerate(ref_latents):
if i < len(ref_contexts):
ref_con = ref_contexts[i]
else:
ref_con = None
if i < len(siglip_feats):
sig_feat = siglip_feats[i]
else:
sig_feat = None
out = self.embed_all(ref, ref_con, sig_feat, offset=start_t, omni=omni, transformer_options=transformer_options)
for i, e in enumerate(out[0]):
if e is not None:
embeds[i].append(comfy.utils.repeat_to_batch_size(e, bsz))
freqs_cis[i].append(out[1][i])
start_t = out[2]
leftover_cap = ref_contexts[len(ref_latents):]
H, W = x.shape[-2], x.shape[-1]
img_sizes = [(H, W)] * bsz
out = self.embed_all(x, cap_feats, main_siglip, offset=start_t, omni=omni, transformer_options=transformer_options)
img_len = out[0][-1].shape[1]
cap_len = out[0][0].shape[1]
for i, e in enumerate(out[0]):
if e is not None:
e = comfy.utils.repeat_to_batch_size(e, bsz)
embeds[i].append(e)
freqs_cis[i].append(out[1][i])
start_t = out[2]
for cap in leftover_cap:
out = self.embed_cap(cap, offset=start_t, bsz=bsz, device=x.device, dtype=x.dtype)
cap_len += out[0][0].shape[1]
embeds[0].append(comfy.utils.repeat_to_batch_size(out[0][0], bsz))
freqs_cis[0].append(out[1][0])
start_t += out[2]
patches = transformer_options.get("patches", {})
# refine context
cap_feats = torch.cat(embeds[0], dim=1)
cap_freqs_cis = torch.cat(freqs_cis[0], dim=1)
for layer in self.context_refiner:
cap_feats = layer(cap_feats, cap_mask, freqs_cis[:, :cap_pos_ids.shape[1]], transformer_options=transformer_options)
cap_feats = layer(cap_feats, cap_mask, cap_freqs_cis, transformer_options=transformer_options)
feats = (cap_feats,)
fc = (cap_freqs_cis,)
if omni and len(embeds[1]) > 0:
siglip_mask = None
siglip_feats_combined = torch.cat(embeds[1], dim=1)
siglip_feats_freqs_cis = torch.cat(freqs_cis[1], dim=1)
if self.siglip_refiner is not None:
for layer in self.siglip_refiner:
siglip_feats_combined = layer(siglip_feats_combined, siglip_mask, siglip_feats_freqs_cis, transformer_options=transformer_options)
feats += (siglip_feats_combined,)
fc += (siglip_feats_freqs_cis,)
padded_img_mask = None
x = torch.cat(embeds[-1], dim=1)
fc_x = torch.cat(freqs_cis[-1], dim=1)
if omni:
timestep_zero_index = [(x.shape[1] - img_len, x.shape[1])]
else:
timestep_zero_index = None
x_input = x
for i, layer in enumerate(self.noise_refiner):
x = layer(x, padded_img_mask, freqs_cis[:, cap_pos_ids.shape[1]:], t, transformer_options=transformer_options)
x = layer(x, padded_img_mask, fc_x, t, timestep_zero_index=timestep_zero_index, transformer_options=transformer_options)
if "noise_refiner" in patches:
for p in patches["noise_refiner"]:
out = p({"img": x, "img_input": x_input, "txt": cap_feats, "pe": freqs_cis[:, cap_pos_ids.shape[1]:], "vec": t, "x": orig_x, "block_index": i, "transformer_options": transformer_options, "block_type": "noise_refiner"})
out = p({"img": x, "img_input": x_input, "txt": cap_feats, "pe": fc_x, "vec": t, "x": orig_x, "block_index": i, "transformer_options": transformer_options, "block_type": "noise_refiner"})
if "img" in out:
x = out["img"]
padded_full_embed = torch.cat((cap_feats, x), dim=1)
padded_full_embed = torch.cat(feats + (x,), dim=1)
if timestep_zero_index is not None:
ind = padded_full_embed.shape[1] - x.shape[1]
timestep_zero_index = [(ind + x.shape[1] - img_len, ind + x.shape[1])]
timestep_zero_index.append((feats[0].shape[1] - cap_len, feats[0].shape[1]))
mask = None
img_sizes = [(H, W)] * bsz
l_effective_cap_len = [cap_feats.shape[1]] * bsz
return padded_full_embed, mask, img_sizes, l_effective_cap_len, freqs_cis
l_effective_cap_len = [padded_full_embed.shape[1] - img_len] * bsz
return padded_full_embed, mask, img_sizes, l_effective_cap_len, torch.cat(fc + (fc_x,), dim=1), timestep_zero_index
def forward(self, x, timesteps, context, num_tokens, attention_mask=None, **kwargs):
return comfy.patcher_extension.WrapperExecutor.new_class_executor(
@@ -604,7 +806,11 @@ class NextDiT(nn.Module):
).execute(x, timesteps, context, num_tokens, attention_mask, **kwargs)
# def forward(self, x, t, cap_feats, cap_mask):
def _forward(self, x, timesteps, context, num_tokens, attention_mask=None, transformer_options={}, **kwargs):
def _forward(self, x, timesteps, context, num_tokens, attention_mask=None, ref_latents=[], ref_contexts=[], siglip_feats=[], transformer_options={}, **kwargs):
omni = len(ref_latents) > 0
if omni:
timesteps = torch.cat([timesteps * 0, timesteps], dim=0)
t = 1.0 - timesteps
cap_feats = context
cap_mask = attention_mask
@@ -619,8 +825,6 @@ class NextDiT(nn.Module):
t = self.t_embedder(t * self.time_scale, dtype=x.dtype) # (N, D)
adaln_input = t
cap_feats = self.cap_embedder(cap_feats) # (N, L, D) # todo check if able to batchify w.o. redundant compute
if self.clip_text_pooled_proj is not None:
pooled = kwargs.get("clip_text_pooled", None)
if pooled is not None:
@@ -632,7 +836,7 @@ class NextDiT(nn.Module):
patches = transformer_options.get("patches", {})
x_is_tensor = isinstance(x, torch.Tensor)
img, mask, img_size, cap_size, freqs_cis = self.patchify_and_embed(x, cap_feats, cap_mask, adaln_input, num_tokens, transformer_options=transformer_options)
img, mask, img_size, cap_size, freqs_cis, timestep_zero_index = self.patchify_and_embed(x, cap_feats, cap_mask, adaln_input, num_tokens, ref_latents=ref_latents, ref_contexts=ref_contexts, siglip_feats=siglip_feats, transformer_options=transformer_options)
freqs_cis = freqs_cis.to(img.device)
transformer_options["total_blocks"] = len(self.layers)
@@ -640,7 +844,7 @@ class NextDiT(nn.Module):
img_input = img
for i, layer in enumerate(self.layers):
transformer_options["block_index"] = i
img = layer(img, mask, freqs_cis, adaln_input, transformer_options=transformer_options)
img = layer(img, mask, freqs_cis, adaln_input, timestep_zero_index=timestep_zero_index, transformer_options=transformer_options)
if "double_block" in patches:
for p in patches["double_block"]:
out = p({"img": img[:, cap_size[0]:], "img_input": img_input[:, cap_size[0]:], "txt": img[:, :cap_size[0]], "pe": freqs_cis[:, cap_size[0]:], "vec": adaln_input, "x": x, "block_index": i, "transformer_options": transformer_options})
@@ -649,8 +853,7 @@ class NextDiT(nn.Module):
if "txt" in out:
img[:, :cap_size[0]] = out["txt"]
img = self.final_layer(img, adaln_input)
img = self.final_layer(img, adaln_input, timestep_zero_index=timestep_zero_index)
img = self.unpatchify(img, img_size, cap_size, return_tensor=x_is_tensor)[:, :, :h, :w]
return -img

View File

@@ -1150,6 +1150,7 @@ class CosmosPredict2(BaseModel):
class Lumina2(BaseModel):
def __init__(self, model_config, model_type=ModelType.FLOW, device=None):
super().__init__(model_config, model_type, device=device, unet_model=comfy.ldm.lumina.model.NextDiT)
self.memory_usage_factor_conds = ("ref_latents",)
def extra_conds(self, **kwargs):
out = super().extra_conds(**kwargs)
@@ -1169,6 +1170,35 @@ class Lumina2(BaseModel):
if clip_text_pooled is not None:
out['clip_text_pooled'] = comfy.conds.CONDRegular(clip_text_pooled)
clip_vision_outputs = kwargs.get("clip_vision_outputs", list(map(lambda a: a.get("clip_vision_output"), kwargs.get("unclip_conditioning", [{}])))) # Z Image omni
if clip_vision_outputs is not None and len(clip_vision_outputs) > 0:
sigfeats = []
for clip_vision_output in clip_vision_outputs:
if clip_vision_output is not None:
image_size = clip_vision_output.image_sizes[0]
shape = clip_vision_output.last_hidden_state.shape
sigfeats.append(clip_vision_output.last_hidden_state.reshape(shape[0], image_size[1] // 16, image_size[2] // 16, shape[-1]))
if len(sigfeats) > 0:
out['siglip_feats'] = comfy.conds.CONDList(sigfeats)
ref_latents = kwargs.get("reference_latents", None)
if ref_latents is not None:
latents = []
for lat in ref_latents:
latents.append(self.process_latent_in(lat))
out['ref_latents'] = comfy.conds.CONDList(latents)
ref_contexts = kwargs.get("reference_latents_text_embeds", None)
if ref_contexts is not None:
out['ref_contexts'] = comfy.conds.CONDList(ref_contexts)
return out
def extra_conds_shapes(self, **kwargs):
out = {}
ref_latents = kwargs.get("reference_latents", None)
if ref_latents is not None:
out['ref_latents'] = list([1, 16, sum(map(lambda a: math.prod(a.size()[2:]), ref_latents))])
return out
class WAN21(BaseModel):

View File

@@ -253,7 +253,7 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
dit_config["image_model"] = "chroma_radiance"
dit_config["in_channels"] = 3
dit_config["out_channels"] = 3
dit_config["patch_size"] = 16
dit_config["patch_size"] = state_dict.get('{}img_in_patch.weight'.format(key_prefix)).size(dim=-1)
dit_config["nerf_hidden_size"] = 64
dit_config["nerf_mlp_ratio"] = 4
dit_config["nerf_depth"] = 4
@@ -446,6 +446,9 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
dit_config["time_scale"] = 1000.0
if '{}cap_pad_token'.format(key_prefix) in state_dict_keys:
dit_config["pad_tokens_multiple"] = 32
sig_weight = state_dict.get('{}siglip_embedder.0.weight'.format(key_prefix), None)
if sig_weight is not None:
dit_config["siglip_feat_dim"] = sig_weight.shape[0]
return dit_config

View File

@@ -77,6 +77,28 @@ class Qwen25_3BConfig:
rope_scale = None
final_norm: bool = True
@dataclass
class Qwen3_06BConfig:
vocab_size: int = 151936
hidden_size: int = 1024
intermediate_size: int = 3072
num_hidden_layers: int = 28
num_attention_heads: int = 16
num_key_value_heads: int = 8
max_position_embeddings: int = 32768
rms_norm_eps: float = 1e-6
rope_theta: float = 1000000.0
transformer_type: str = "llama"
head_dim = 128
rms_norm_add = False
mlp_activation = "silu"
qkv_bias = False
rope_dims = None
q_norm = "gemma3"
k_norm = "gemma3"
rope_scale = None
final_norm: bool = True
@dataclass
class Qwen3_4BConfig:
vocab_size: int = 151936
@@ -641,6 +663,15 @@ class Qwen25_3B(BaseLlama, torch.nn.Module):
self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
self.dtype = dtype
class Qwen3_06B(BaseLlama, torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()
config = Qwen3_06BConfig(**config_dict)
self.num_layers = config.num_hidden_layers
self.model = Llama2_(config, device=device, dtype=dtype, ops=operations)
self.dtype = dtype
class Qwen3_4B(BaseLlama, torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()

View File

@@ -61,6 +61,7 @@ def te(dtype_llama=None, llama_quantization_metadata=None):
if dtype_llama is not None:
dtype = dtype_llama
if llama_quantization_metadata is not None:
model_options = model_options.copy()
model_options["quantization_metadata"] = llama_quantization_metadata
super().__init__(device=device, dtype=dtype, model_options=model_options)
return OvisTEModel_

View File

@@ -40,6 +40,7 @@ def te(dtype_llama=None, llama_quantization_metadata=None):
if dtype_llama is not None:
dtype = dtype_llama
if llama_quantization_metadata is not None:
model_options = model_options.copy()
model_options["quantization_metadata"] = llama_quantization_metadata
super().__init__(device=device, dtype=dtype, model_options=model_options)
return ZImageTEModel_

View File

@@ -611,6 +611,14 @@ def flux_to_diffusers(mmdit_config, output_prefix=""):
"ff_context.net.0.proj.bias": "txt_mlp.0.bias",
"ff_context.net.2.weight": "txt_mlp.2.weight",
"ff_context.net.2.bias": "txt_mlp.2.bias",
"ff.linear_in.weight": "img_mlp.0.weight", # LyCoris LoKr
"ff.linear_in.bias": "img_mlp.0.bias",
"ff.linear_out.weight": "img_mlp.2.weight",
"ff.linear_out.bias": "img_mlp.2.bias",
"ff_context.linear_in.weight": "txt_mlp.0.weight",
"ff_context.linear_in.bias": "txt_mlp.0.bias",
"ff_context.linear_out.weight": "txt_mlp.2.weight",
"ff_context.linear_out.bias": "txt_mlp.2.bias",
"attn.norm_q.weight": "img_attn.norm.query_norm.scale",
"attn.norm_k.weight": "img_attn.norm.key_norm.scale",
"attn.norm_added_q.weight": "txt_attn.norm.query_norm.scale",
@@ -639,6 +647,8 @@ def flux_to_diffusers(mmdit_config, output_prefix=""):
"proj_out.bias": "linear2.bias",
"attn.norm_q.weight": "norm.query_norm.scale",
"attn.norm_k.weight": "norm.key_norm.scale",
"attn.to_qkv_mlp_proj.weight": "linear1.weight", # Flux 2
"attn.to_out.weight": "linear2.weight", # Flux 2
}
for k in block_map:
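The added entries make the LoKr-style names (`linear_in`, `linear_out`) resolve to the same targets as the existing `net.0.proj` / `net.2` names. A rough sketch of that aliasing follows; `remap_key` and the two block prefixes are assumptions made for illustration, not the actual body of `flux_to_diffusers`:

```python
# Illustrative subset of the block map shown in the hunk above.
BLOCK_MAP = {
    "ff_context.net.0.proj.bias": "txt_mlp.0.bias",      # existing name
    "ff_context.net.2.weight": "txt_mlp.2.weight",       # existing name
    "ff_context.linear_in.bias": "txt_mlp.0.bias",       # added LoKr-style alias
    "ff_context.linear_out.weight": "txt_mlp.2.weight",  # added LoKr-style alias
}


def remap_key(key, src_prefix, dst_prefix):
    """Hypothetical helper: translate one prefixed key, or return None if unknown."""
    if not key.startswith(src_prefix):
        return None
    target = BLOCK_MAP.get(key[len(src_prefix):])
    return None if target is None else dst_prefix + target


# The prefixes below are assumed for the example.
SRC = "transformer_blocks.0."
DST = "double_blocks.0."
old_style = remap_key(SRC + "ff_context.net.0.proj.bias", SRC, DST)
new_style = remap_key(SRC + "ff_context.linear_in.bias", SRC, DST)
```

Both spellings end up at the same destination key, so a LoRA saved with either naming scheme can be matched to the same weight.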


@@ -374,7 +374,7 @@ class VideoFromComponents(VideoInput):
if audio_stream and self.__components.audio:
waveform = self.__components.audio['waveform']
waveform = waveform[:, :, :math.ceil((audio_sample_rate / frame_rate) * self.__components.images.shape[0])]
frame = av.AudioFrame.from_ndarray(waveform.movedim(2, 1).reshape(1, -1).float().numpy(), format='flt', layout='mono' if waveform.shape[1] == 1 else 'stereo')
frame = av.AudioFrame.from_ndarray(waveform.movedim(2, 1).reshape(1, -1).float().cpu().numpy(), format='flt', layout='mono' if waveform.shape[1] == 1 else 'stereo')
frame.sample_rate = audio_sample_rate
frame.pts = 0
output.mux(audio_stream.encode(frame))
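`Tensor.numpy()` only works on tensors held in CPU memory, which is why the changed line calls `.cpu()` first. The surrounding lines also trim the audio to the video length and pack the channels into one interleaved row. A plain-Python sketch of those two steps, with hypothetical helper names and lists standing in for tensors:

```python
import math


def samples_for_video(audio_sample_rate, frame_rate, num_frames):
    """Number of audio samples that covers num_frames of video (rounded up)."""
    return math.ceil((audio_sample_rate / frame_rate) * num_frames)


def interleave(channels):
    """Turn per-channel sample lists into one packed row: L0, R0, L1, R1, ..."""
    return [sample for frame in zip(*channels) for sample in frame]


limit = samples_for_video(44100, 30, 10)
left = [1, 2, 3, 4]
right = [10, 20, 30, 40]
packed = interleave([left[:3], right[:3]])
```

The interleaved order corresponds to what `movedim(2, 1).reshape(1, -1)` produces for a `(batch, channels, samples)` waveform with a batch of one.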


@@ -1000,20 +1000,38 @@ class Autogrow(ComfyTypeI):
names = [f"{prefix}{i}" for i in range(max)]
# need to create a new input based on the contents of input
template_input = None
for _, dict_input in input.items():
# for now, get just the first value from dict_input
template_required = True
for _input_type, dict_input in input.items():
# for now, get just the first value from dict_input; if not required, min can be ignored
if len(dict_input) == 0:
continue
template_input = list(dict_input.values())[0]
template_required = _input_type == "required"
break
if template_input is None:
raise Exception("template_input could not be determined from required or optional; this should never happen.")
new_dict = {}
new_dict_added_to = False
# first, add possible inputs into out_dict
for i, name in enumerate(names):
expected_id = finalize_prefix(curr_prefix, name)
# required
if i < min and template_required:
out_dict["required"][expected_id] = template_input
type_dict = new_dict.setdefault("required", {})
# optional
else:
out_dict["optional"][expected_id] = template_input
type_dict = new_dict.setdefault("optional", {})
if expected_id in live_inputs:
# required
if i < min:
type_dict = new_dict.setdefault("required", {})
# optional
else:
type_dict = new_dict.setdefault("optional", {})
# NOTE: prefix gets added in parse_class_inputs
type_dict[name] = template_input
new_dict_added_to = True
# account for the edge case that all inputs are optional and no values are received
if not new_dict_added_to:
finalized_prefix = finalize_prefix(curr_prefix)
out_dict["dynamic_paths"][finalized_prefix] = finalized_prefix
out_dict["dynamic_paths_default_value"][finalized_prefix] = DynamicPathsDefaultValue.EMPTY_DICT
parse_class_inputs(out_dict, live_inputs, new_dict, curr_prefix)
@comfytype(io_type="COMFY_DYNAMICCOMBO_V3")
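The change to `Autogrow` means a slot is only treated as required when its index is below `min` and the template came from the `required` group. A simplified sketch of that classification, using a hypothetical `classify_slots` function that returns names instead of filling `out_dict`:

```python
def classify_slots(prefix, min_count, max_count, template_required):
    """Split generated slot names into required and optional lists (sketch only)."""
    required, optional = [], []
    for i in range(max_count):
        name = f"{prefix}{i}"
        # Mirrors `if i < min and template_required` from the hunk above.
        if i < min_count and template_required:
            required.append(name)
        else:
            optional.append(name)
    return required, optional


from_required = classify_slots("image", 2, 4, True)
from_optional = classify_slots("image", 2, 4, False)
```

When the template is optional, `min` is effectively ignored and every slot lands in the optional list, which is the case the `dynamic_paths_default_value` handling is meant to cover.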
@@ -1151,6 +1169,8 @@ class V3Data(TypedDict):
'Dictionary where the keys are the hidden input ids and the values are the values of the hidden inputs.'
dynamic_paths: dict[str, Any]
'Dictionary where the keys are the input ids and the values dictate how to turn the inputs into a nested dictionary.'
dynamic_paths_default_value: dict[str, Any]
'Dictionary where the keys are the input ids and the values are a string from DynamicPathsDefaultValue for the inputs if value is None.'
create_dynamic_tuple: bool
'When True, the value of the dynamic input will be in the format (value, path_key).'
@@ -1229,6 +1249,7 @@ class NodeInfoV1:
experimental: bool=None
api_node: bool=None
price_badge: dict | None = None
search_aliases: list[str]=None
@dataclass
class NodeInfoV3:
@@ -1326,6 +1347,8 @@ class Schema:
hidden: list[Hidden] = field(default_factory=list)
description: str=""
"""Node description, shown as a tooltip when hovering over the node."""
search_aliases: list[str] = field(default_factory=list)
"""Alternative names for search. Useful for synonyms, abbreviations, or old names after renaming."""
is_input_list: bool = False
"""A flag indicating if this node implements the additional code necessary to deal with OUTPUT_IS_LIST nodes.
@@ -1463,6 +1486,7 @@ class Schema:
api_node=self.is_api_node,
python_module=getattr(cls, "RELATIVE_PYTHON_MODULE", "nodes"),
price_badge=self.price_badge.as_dict(self.inputs) if self.price_badge is not None else None,
search_aliases=self.search_aliases if self.search_aliases else None,
)
return info
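The `Schema` field defaults to an empty list, while the V1 info receives `None` when no aliases are set. A small sketch of that normalization, using a stand-in dataclass `SchemaSketch` and a hypothetical `v1_aliases` method rather than the real `Schema.get_v1_info`:

```python
from dataclasses import dataclass, field


@dataclass
class SchemaSketch:
    """Stand-in for the real Schema; only the fields needed for the example."""
    node_id: str
    search_aliases: list[str] = field(default_factory=list)

    def v1_aliases(self):
        # Same expression as in the hunk: an empty list becomes None.
        return self.search_aliases if self.search_aliases else None


plain = SchemaSketch(node_id="MyNode")
aliased = SchemaSketch(node_id="MyNode", search_aliases=["alt name", "synonym"])
```

Here `plain.v1_aliases()` yields `None` and `aliased.v1_aliases()` yields the list unchanged, so nodes without aliases report `None` rather than an empty list.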
@@ -1504,6 +1528,7 @@ def get_finalized_class_inputs(d: dict[str, Any], live_inputs: dict[str, Any], i
"required": {},
"optional": {},
"dynamic_paths": {},
"dynamic_paths_default_value": {},
}
d = d.copy()
# ignore hidden for parsing
@@ -1513,8 +1538,12 @@ def get_finalized_class_inputs(d: dict[str, Any], live_inputs: dict[str, Any], i
out_dict["hidden"] = hidden
v3_data = {}
dynamic_paths = out_dict.pop("dynamic_paths", None)
if dynamic_paths is not None:
if dynamic_paths is not None and len(dynamic_paths) > 0:
v3_data["dynamic_paths"] = dynamic_paths
# this list is used for autogrow, in the case all inputs are optional and no values are passed
dynamic_paths_default_value = out_dict.pop("dynamic_paths_default_value", None)
if dynamic_paths_default_value is not None and len(dynamic_paths_default_value) > 0:
v3_data["dynamic_paths_default_value"] = dynamic_paths_default_value
return out_dict, hidden, v3_data
def parse_class_inputs(out_dict: dict[str, Any], live_inputs: dict[str, Any], curr_dict: dict[str, Any], curr_prefix: list[str] | None=None) -> None:
@@ -1551,11 +1580,16 @@ def add_to_dict_v1(i: Input, d: dict):
def add_to_dict_v3(io: Input | Output, d: dict):
d[io.id] = (io.get_io_type(), io.as_dict())
class DynamicPathsDefaultValue:
EMPTY_DICT = "empty_dict"
def build_nested_inputs(values: dict[str, Any], v3_data: V3Data):
paths = v3_data.get("dynamic_paths", None)
default_value_dict = v3_data.get("dynamic_paths_default_value", {})
if paths is None:
return values
values = values.copy()
result = {}
create_tuple = v3_data.get("create_dynamic_tuple", False)
@@ -1569,6 +1603,11 @@ def build_nested_inputs(values: dict[str, Any], v3_data: V3Data):
if is_last:
value = values.pop(key, None)
if value is None:
# see if a default value was provided for this key
default_option = default_value_dict.get(key, None)
if default_option == DynamicPathsDefaultValue.EMPTY_DICT:
value = {}
if create_tuple:
value = (value, key)
current[p] = value
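The new `dynamic_paths_default_value` entry lets a dynamic input that received no value become `{}` instead of `None`. The sketch below is a simplified stand-in for `build_nested_inputs`; the dot-separated path format, the final merge of leftover values, and the name `build_nested` are assumptions, and the tuple handling is left out:

```python
EMPTY_DICT = "empty_dict"  # mirrors DynamicPathsDefaultValue.EMPTY_DICT


def build_nested(values, paths, defaults):
    """Nest flat values by path, applying an empty-dict default where requested."""
    values = dict(values)
    result = {}
    for key, path in paths.items():
        parts = path.split(".")  # assumed path format
        current = result
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        value = values.pop(key, None)
        if value is None and defaults.get(key) == EMPTY_DICT:
            value = {}
        current[parts[-1]] = value
    result.update(values)  # keep inputs that have no dynamic path
    return result


with_default = build_nested({"seed": 1}, {"images": "images"}, {"images": EMPTY_DICT})
without_default = build_nested({"seed": 1}, {"images": "images"}, {})
nested = build_nested({"a.x": 5}, {"a.x": "a.x"}, {})
```

With the default registered, a node whose autogrow inputs are all optional and unconnected can still iterate over an empty dict.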


@@ -1,65 +0,0 @@
# ComfyUI API Nodes
## Introduction
Below is a collection of nodes that work by calling external APIs. More information is available in our [docs](https://docs.comfy.org/tutorials/api-nodes/overview).
## Development
While developing, you should be testing against the Staging environment. To test against staging:
**Install ComfyUI_frontend**
Follow the instructions [here](https://github.com/Comfy-Org/ComfyUI_frontend) to start the frontend server. By default, it will connect to Staging authentication.
> **Hint:** If you use the `--front-end-version` argument for ComfyUI, it will use production authentication.
```bash
python main.py --comfy-api-base https://stagingapi.comfy.org
```
To authenticate to staging, log in and then ask a member of the Comfy Org team to whitelist you for access to staging.
API stubs are generated through automatic codegen tools from OpenAPI definitions. Since the Comfy Org OpenAPI definition contains many things from the Comfy Registry as well, we use redocly/cli to filter out only the paths relevant for API nodes.
### Redocly Instructions
**Tip**
When developing locally, use the `redocly-dev.yaml` file to generate pydantic models. This lets you use stubs for APIs that are not marked `Released` yet.
Before your API node PR merges, make sure to add the `Released` tag to the `openapi.yaml` file and test in staging.
```bash
# Download the OpenAPI file from staging server.
curl -o openapi.yaml https://stagingapi.comfy.org/openapi
# Filter out unneeded API definitions.
npm install -g @redocly/cli
redocly bundle openapi.yaml --output filtered-openapi.yaml --config comfy_api_nodes/redocly-dev.yaml --remove-unused-components
# Generate the pydantic datamodels for validation.
datamodel-codegen --use-subclass-enum --field-constraints --strict-types bytes --input filtered-openapi.yaml --output comfy_api_nodes/apis/__init__.py --output-model-type pydantic_v2.BaseModel
```
# Merging to Master
Before merging to comfyanonymous/ComfyUI master, follow these steps:
1. Add the "Released" tag to the ComfyUI OpenAPI yaml file for each endpoint you are using in the nodes.
1. Make sure the ComfyUI API is deployed to prod with your changes.
1. Run the code generation again with `redocly.yaml` and the production OpenAPI yaml file.
```bash
# Download the OpenAPI file from prod server.
curl -o openapi.yaml https://api.comfy.org/openapi
# Filter out unneeded API definitions.
npm install -g @redocly/cli
redocly bundle openapi.yaml --output filtered-openapi.yaml --config comfy_api_nodes/redocly.yaml --remove-unused-components
# Generate the pydantic datamodels for validation.
datamodel-codegen --use-subclass-enum --field-constraints --strict-types bytes --input filtered-openapi.yaml --output comfy_api_nodes/apis/__init__.py --output-model-type pydantic_v2.BaseModel
```


@@ -0,0 +1,61 @@
from typing import TypedDict
from pydantic import BaseModel, Field
class InputModerationSettings(TypedDict):
prompt_content_moderation: bool
visual_input_moderation: bool
visual_output_moderation: bool
class BriaEditImageRequest(BaseModel):
instruction: str | None = Field(...)
structured_instruction: str | None = Field(
...,
description="Use this instead of instruction for precise, programmatic control.",
)
images: list[str] = Field(
...,
description="Required. Publicly available URL or Base64-encoded. Must contain exactly one item.",
)
mask: str | None = Field(
None,
description="Mask image (black and white). Black areas will be preserved, white areas will be edited. "
"If omitted, the edit applies to the entire image. "
"The input image and the input mask must be of the same size.",
)
negative_prompt: str | None = Field(None)
guidance_scale: float = Field(...)
model_version: str = Field(...)
steps_num: int = Field(...)
seed: int = Field(...)
ip_signal: bool = Field(
False,
description="If true, returns a warning for potential IP content in the instruction.",
)
prompt_content_moderation: bool = Field(
False, description="If true, returns 422 on instruction moderation failure."
)
visual_input_content_moderation: bool = Field(
False, description="If true, returns 422 on images or mask moderation failure."
)
visual_output_content_moderation: bool = Field(
False, description="If true, returns 422 on visual output moderation failure."
)
class BriaStatusResponse(BaseModel):
request_id: str = Field(...)
status_url: str = Field(...)
warning: str | None = Field(None)
class BriaResult(BaseModel):
structured_prompt: str = Field(...)
image_url: str = Field(...)
class BriaResponse(BaseModel):
status: str = Field(...)
result: BriaResult | None = Field(None)


@@ -0,0 +1,292 @@
from enum import Enum
from typing import Optional, List, Dict, Any, Union
from datetime import datetime
from pydantic import BaseModel, Field, RootModel, StrictBytes
class IdeogramColorPalette1(BaseModel):
name: str = Field(..., description='Name of the preset color palette')
class Member(BaseModel):
color: Optional[str] = Field(
None, description='Hexadecimal color code', pattern='^#[0-9A-Fa-f]{6}$'
)
weight: Optional[float] = Field(
None, description='Optional weight for the color (0-1)', ge=0.0, le=1.0
)
class IdeogramColorPalette2(BaseModel):
members: List[Member] = Field(
..., description='Array of color definitions with optional weights'
)
class IdeogramColorPalette(
RootModel[Union[IdeogramColorPalette1, IdeogramColorPalette2]]
):
root: Union[IdeogramColorPalette1, IdeogramColorPalette2] = Field(
...,
description='A color palette specification that can either use a preset name or explicit color definitions with weights',
)
class ImageRequest(BaseModel):
aspect_ratio: Optional[str] = Field(
None,
description="Optional. The aspect ratio (e.g., 'ASPECT_16_9', 'ASPECT_1_1'). Cannot be used with resolution. Defaults to 'ASPECT_1_1' if unspecified.",
)
color_palette: Optional[Dict[str, Any]] = Field(
None, description='Optional. Color palette object. Only for V_2, V_2_TURBO.'
)
magic_prompt_option: Optional[str] = Field(
None, description="Optional. MagicPrompt usage ('AUTO', 'ON', 'OFF')."
)
model: str = Field(..., description="The model used (e.g., 'V_2', 'V_2A_TURBO')")
negative_prompt: Optional[str] = Field(
None,
description='Optional. Description of what to exclude. Only for V_1, V_1_TURBO, V_2, V_2_TURBO.',
)
num_images: Optional[int] = Field(
1,
description='Optional. Number of images to generate (1-8). Defaults to 1.',
ge=1,
le=8,
)
prompt: str = Field(
..., description='Required. The prompt to use to generate the image.'
)
resolution: Optional[str] = Field(
None,
description="Optional. Resolution (e.g., 'RESOLUTION_1024_1024'). Only for model V_2. Cannot be used with aspect_ratio.",
)
seed: Optional[int] = Field(
None,
description='Optional. A number between 0 and 2147483647.',
ge=0,
le=2147483647,
)
style_type: Optional[str] = Field(
None,
description="Optional. Style type ('AUTO', 'GENERAL', 'REALISTIC', 'DESIGN', 'RENDER_3D', 'ANIME'). Only for models V_2 and above.",
)
class IdeogramGenerateRequest(BaseModel):
image_request: ImageRequest = Field(
..., description='The image generation request parameters.'
)
class Datum(BaseModel):
is_image_safe: Optional[bool] = Field(
None, description='Indicates whether the image is considered safe.'
)
prompt: Optional[str] = Field(
None, description='The prompt used to generate this image.'
)
resolution: Optional[str] = Field(
None, description="The resolution of the generated image (e.g., '1024x1024')."
)
seed: Optional[int] = Field(
None, description='The seed value used for this generation.'
)
style_type: Optional[str] = Field(
None,
description="The style type used for generation (e.g., 'REALISTIC', 'ANIME').",
)
url: Optional[str] = Field(None, description='URL to the generated image.')
class IdeogramGenerateResponse(BaseModel):
created: Optional[datetime] = Field(
None, description='Timestamp when the generation was created.'
)
data: Optional[List[Datum]] = Field(
None, description='Array of generated image information.'
)
class StyleCode(RootModel[str]):
root: str = Field(..., pattern='^[0-9A-Fa-f]{8}$')
class Datum1(BaseModel):
is_image_safe: Optional[bool] = None
prompt: Optional[str] = None
resolution: Optional[str] = None
seed: Optional[int] = None
style_type: Optional[str] = None
url: Optional[str] = None
class IdeogramV3IdeogramResponse(BaseModel):
created: Optional[datetime] = None
data: Optional[List[Datum1]] = None
class RenderingSpeed1(str, Enum):
TURBO = 'TURBO'
DEFAULT = 'DEFAULT'
QUALITY = 'QUALITY'
class IdeogramV3ReframeRequest(BaseModel):
color_palette: Optional[Dict[str, Any]] = None
image: Optional[StrictBytes] = None
num_images: Optional[int] = Field(None, ge=1, le=8)
rendering_speed: Optional[RenderingSpeed1] = None
resolution: str
seed: Optional[int] = Field(None, ge=0, le=2147483647)
style_codes: Optional[List[str]] = None
style_reference_images: Optional[List[StrictBytes]] = None
class MagicPrompt(str, Enum):
AUTO = 'AUTO'
ON = 'ON'
OFF = 'OFF'
class StyleType(str, Enum):
AUTO = 'AUTO'
GENERAL = 'GENERAL'
REALISTIC = 'REALISTIC'
DESIGN = 'DESIGN'
class IdeogramV3RemixRequest(BaseModel):
aspect_ratio: Optional[str] = None
color_palette: Optional[Dict[str, Any]] = None
image: Optional[StrictBytes] = None
image_weight: Optional[int] = Field(50, ge=1, le=100)
magic_prompt: Optional[MagicPrompt] = None
negative_prompt: Optional[str] = None
num_images: Optional[int] = Field(None, ge=1, le=8)
prompt: str
rendering_speed: Optional[RenderingSpeed1] = None
resolution: Optional[str] = None
seed: Optional[int] = Field(None, ge=0, le=2147483647)
style_codes: Optional[List[str]] = None
style_reference_images: Optional[List[StrictBytes]] = None
style_type: Optional[StyleType] = None
class IdeogramV3ReplaceBackgroundRequest(BaseModel):
color_palette: Optional[Dict[str, Any]] = None
image: Optional[StrictBytes] = None
magic_prompt: Optional[MagicPrompt] = None
num_images: Optional[int] = Field(None, ge=1, le=8)
prompt: str
rendering_speed: Optional[RenderingSpeed1] = None
seed: Optional[int] = Field(None, ge=0, le=2147483647)
style_codes: Optional[List[str]] = None
style_reference_images: Optional[List[StrictBytes]] = None
class ColorPalette(BaseModel):
name: str = Field(..., description='Name of the color palette', examples=['PASTEL'])
class MagicPrompt2(str, Enum):
ON = 'ON'
OFF = 'OFF'
class StyleType1(str, Enum):
AUTO = 'AUTO'
GENERAL = 'GENERAL'
REALISTIC = 'REALISTIC'
DESIGN = 'DESIGN'
FICTION = 'FICTION'
class RenderingSpeed(str, Enum):
DEFAULT = 'DEFAULT'
TURBO = 'TURBO'
QUALITY = 'QUALITY'
class IdeogramV3EditRequest(BaseModel):
color_palette: Optional[IdeogramColorPalette] = None
image: Optional[StrictBytes] = Field(
None,
description='The image being edited (max size 10MB); only JPEG, WebP and PNG formats are supported at this time.',
)
magic_prompt: Optional[str] = Field(
None,
description='Determine if MagicPrompt should be used in generating the request or not.',
)
mask: Optional[StrictBytes] = Field(
None,
description='A black and white image of the same size as the image being edited (max size 10MB). Black regions in the mask should match up with the regions of the image that you would like to edit; only JPEG, WebP and PNG formats are supported at this time.',
)
num_images: Optional[int] = Field(
None, description='The number of images to generate.'
)
prompt: str = Field(
..., description='The prompt used to describe the edited result.'
)
rendering_speed: RenderingSpeed
seed: Optional[int] = Field(
None, description='Random seed. Set for reproducible generation.'
)
style_codes: Optional[List[StyleCode]] = Field(
None,
description='A list of 8 character hexadecimal codes representing the style of the image. Cannot be used in conjunction with style_reference_images or style_type.',
)
style_reference_images: Optional[List[StrictBytes]] = Field(
None,
description='A set of images to use as style references (maximum total size 10MB across all style references). The images should be in JPEG, PNG or WebP format.',
)
character_reference_images: Optional[List[str]] = Field(
None,
description='Generations with character reference are subject to the character reference pricing. A set of images to use as character references (maximum total size 10MB across all character references), currently only supports 1 character reference image. The images should be in JPEG, PNG or WebP format.'
)
character_reference_images_mask: Optional[List[str]] = Field(
None,
description='Optional masks for character reference images. When provided, must match the number of character_reference_images. Each mask should be a grayscale image of the same dimensions as the corresponding character reference image. The images should be in JPEG, PNG or WebP format.'
)
class IdeogramV3Request(BaseModel):
aspect_ratio: Optional[str] = Field(
None, description='Aspect ratio in format WxH', examples=['1x3']
)
color_palette: Optional[ColorPalette] = None
magic_prompt: Optional[MagicPrompt2] = Field(
None, description='Whether to enable magic prompt enhancement'
)
negative_prompt: Optional[str] = Field(
None, description='Text prompt specifying what to avoid in the generation'
)
num_images: Optional[int] = Field(
None, description='Number of images to generate', ge=1
)
prompt: str = Field(..., description='The text prompt for image generation')
rendering_speed: RenderingSpeed
resolution: Optional[str] = Field(
None, description='Image resolution in format WxH', examples=['1280x800']
)
seed: Optional[int] = Field(
None, description='Seed value for reproducible generation'
)
style_codes: Optional[List[StyleCode]] = Field(
None, description='Array of style codes in hexadecimal format'
)
style_reference_images: Optional[List[str]] = Field(
None, description='Array of reference image URLs or identifiers'
)
style_type: Optional[StyleType1] = Field(
None, description='The type of style to apply'
)
character_reference_images: Optional[List[str]] = Field(
None,
description='Generations with character reference are subject to the character reference pricing. A set of images to use as character references (maximum total size 10MB across all character references), currently only supports 1 character reference image. The images should be in JPEG, PNG or WebP format.'
)
character_reference_images_mask: Optional[List[str]] = Field(
None,
description='Optional masks for character reference images. When provided, must match the number of character_reference_images. Each mask should be a grayscale image of the same dimensions as the corresponding character reference image. The images should be in JPEG, PNG or WebP format.'
)


@@ -0,0 +1,152 @@
from enum import Enum
from typing import Optional, Dict, Any
from pydantic import BaseModel, Field, StrictBytes
class MoonvalleyPromptResponse(BaseModel):
error: Optional[Dict[str, Any]] = None
frame_conditioning: Optional[Dict[str, Any]] = None
id: Optional[str] = None
inference_params: Optional[Dict[str, Any]] = None
meta: Optional[Dict[str, Any]] = None
model_params: Optional[Dict[str, Any]] = None
output_url: Optional[str] = None
prompt_text: Optional[str] = None
status: Optional[str] = None
class MoonvalleyTextToVideoInferenceParams(BaseModel):
add_quality_guidance: Optional[bool] = Field(
True, description='Whether to add quality guidance'
)
caching_coefficient: Optional[float] = Field(
0.3, description='Caching coefficient for optimization'
)
caching_cooldown: Optional[int] = Field(
3, description='Number of caching cooldown steps'
)
caching_warmup: Optional[int] = Field(
3, description='Number of caching warmup steps'
)
clip_value: Optional[float] = Field(
3, description='CLIP value for generation control'
)
conditioning_frame_index: Optional[int] = Field(
0, description='Index of the conditioning frame'
)
cooldown_steps: Optional[int] = Field(
75, description='Number of cooldown steps (calculated based on num_frames)'
)
fps: Optional[int] = Field(
24, description='Frames per second of the generated video'
)
guidance_scale: Optional[float] = Field(
10, description='Guidance scale for generation control'
)
height: Optional[int] = Field(
1080, description='Height of the generated video in pixels'
)
negative_prompt: Optional[str] = Field(None, description='Negative prompt text')
num_frames: Optional[int] = Field(64, description='Number of frames to generate')
seed: Optional[int] = Field(
None, description='Random seed for generation (default: random)'
)
shift_value: Optional[float] = Field(
3, description='Shift value for generation control'
)
steps: Optional[int] = Field(80, description='Number of denoising steps')
use_guidance_schedule: Optional[bool] = Field(
True, description='Whether to use guidance scheduling'
)
use_negative_prompts: Optional[bool] = Field(
False, description='Whether to use negative prompts'
)
use_timestep_transform: Optional[bool] = Field(
True, description='Whether to use timestep transformation'
)
warmup_steps: Optional[int] = Field(
0, description='Number of warmup steps (calculated based on num_frames)'
)
width: Optional[int] = Field(
1920, description='Width of the generated video in pixels'
)
class MoonvalleyTextToVideoRequest(BaseModel):
image_url: Optional[str] = None
inference_params: Optional[MoonvalleyTextToVideoInferenceParams] = None
prompt_text: Optional[str] = None
webhook_url: Optional[str] = None
class MoonvalleyUploadFileRequest(BaseModel):
file: Optional[StrictBytes] = None
class MoonvalleyUploadFileResponse(BaseModel):
access_url: Optional[str] = None
class MoonvalleyVideoToVideoInferenceParams(BaseModel):
add_quality_guidance: Optional[bool] = Field(
True, description='Whether to add quality guidance'
)
caching_coefficient: Optional[float] = Field(
0.3, description='Caching coefficient for optimization'
)
caching_cooldown: Optional[int] = Field(
3, description='Number of caching cooldown steps'
)
caching_warmup: Optional[int] = Field(
3, description='Number of caching warmup steps'
)
clip_value: Optional[float] = Field(
3, description='CLIP value for generation control'
)
conditioning_frame_index: Optional[int] = Field(
0, description='Index of the conditioning frame'
)
cooldown_steps: Optional[int] = Field(
36, description='Number of cooldown steps (calculated based on num_frames)'
)
guidance_scale: Optional[float] = Field(
15, description='Guidance scale for generation control'
)
negative_prompt: Optional[str] = Field(None, description='Negative prompt text')
seed: Optional[int] = Field(
None, description='Random seed for generation (default: random)'
)
shift_value: Optional[float] = Field(
3, description='Shift value for generation control'
)
steps: Optional[int] = Field(80, description='Number of denoising steps')
use_guidance_schedule: Optional[bool] = Field(
True, description='Whether to use guidance scheduling'
)
use_negative_prompts: Optional[bool] = Field(
False, description='Whether to use negative prompts'
)
use_timestep_transform: Optional[bool] = Field(
True, description='Whether to use timestep transformation'
)
warmup_steps: Optional[int] = Field(
24, description='Number of warmup steps (calculated based on num_frames)'
)
class ControlType(str, Enum):
motion_control = 'motion_control'
pose_control = 'pose_control'
class MoonvalleyVideoToVideoRequest(BaseModel):
control_type: ControlType = Field(
..., description='Supported types for video control'
)
inference_params: Optional[MoonvalleyVideoToVideoInferenceParams] = None
prompt_text: str = Field(..., description='Describes the video to generate')
video_url: str = Field(..., description='Url to control video')
webhook_url: Optional[str] = Field(
None, description='Optional webhook URL for notifications'
)


@@ -0,0 +1,170 @@
from pydantic import BaseModel, Field
class Datum2(BaseModel):
b64_json: str | None = Field(None, description="Base64 encoded image data")
revised_prompt: str | None = Field(None, description="Revised prompt")
url: str | None = Field(None, description="URL of the image")
class InputTokensDetails(BaseModel):
image_tokens: int | None = Field(None)
text_tokens: int | None = Field(None)
class Usage(BaseModel):
input_tokens: int | None = Field(None)
input_tokens_details: InputTokensDetails | None = Field(None)
output_tokens: int | None = Field(None)
total_tokens: int | None = Field(None)
class OpenAIImageGenerationResponse(BaseModel):
data: list[Datum2] | None = Field(None)
usage: Usage | None = Field(None)
class OpenAIImageEditRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str = Field(...)
moderation: str | None = Field(None)
n: int | None = Field(None, description="The number of images to generate")
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="The quality of the generated image")
size: str | None = Field(None, description="Size of the output image")
class OpenAIImageGenerationRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str | None = Field(None)
moderation: str | None = Field(None)
n: int | None = Field(
None,
description="The number of images to generate.",
)
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="The quality of the generated image")
size: str | None = Field(None, description="Size of the image (e.g., 1024x1024, 1536x1024, auto)")
style: str | None = Field(None, description="Style of the image (only for dall-e-3)")
class ModelResponseProperties(BaseModel):
instructions: str | None = Field(None)
max_output_tokens: int | None = Field(None)
model: str | None = Field(None)
temperature: float | None = Field(1, description="Controls randomness in the response", ge=0.0, le=2.0)
top_p: float | None = Field(
1,
description="Controls diversity of the response via nucleus sampling",
ge=0.0,
le=1.0,
)
truncation: str | None = Field("disabled", description="Allowed values: 'auto' or 'disabled'")
class ResponseProperties(BaseModel):
instructions: str | None = Field(None)
max_output_tokens: int | None = Field(None)
model: str | None = Field(None)
previous_response_id: str | None = Field(None)
truncation: str | None = Field("disabled", description="Allowed values: 'auto' or 'disabled'")
class ResponseError(BaseModel):
code: str = Field(...)
message: str = Field(...)
class OutputTokensDetails(BaseModel):
reasoning_tokens: int = Field(..., description="The number of reasoning tokens.")
class CachedTokensDetails(BaseModel):
cached_tokens: int = Field(
...,
description="The number of tokens that were retrieved from the cache.",
)
class ResponseUsage(BaseModel):
input_tokens: int = Field(..., description="The number of input tokens.")
input_tokens_details: CachedTokensDetails = Field(...)
output_tokens: int = Field(..., description="The number of output tokens.")
output_tokens_details: OutputTokensDetails = Field(...)
total_tokens: int = Field(..., description="The total number of tokens used.")
class InputTextContent(BaseModel):
text: str = Field(..., description="The text input to the model.")
type: str = Field("input_text")
class OutputContent(BaseModel):
type: str = Field(..., description="The type of output content")
text: str | None = Field(None, description="The text content")
data: str | None = Field(None, description="Base64-encoded audio data")
transcript: str | None = Field(None, description="Transcript of the audio")
class OutputMessage(BaseModel):
type: str = Field(..., description="The type of output item")
content: list[OutputContent] | None = Field(None, description="The content of the message")
role: str | None = Field(None, description="The role of the message")
class OpenAIResponse(ModelResponseProperties, ResponseProperties):
created_at: float | None = Field(
None,
description="Unix timestamp (in seconds) of when this Response was created.",
)
error: ResponseError | None = Field(None)
id: str | None = Field(None, description="Unique identifier for this Response.")
object: str | None = Field(None, description="The object type of this resource - always set to `response`.")
output: list[OutputMessage] | None = Field(None)
parallel_tool_calls: bool | None = Field(True)
status: str | None = Field(
None,
description="One of `completed`, `failed`, `in_progress`, or `incomplete`.",
)
usage: ResponseUsage | None = Field(None)
class InputImageContent(BaseModel):
detail: str = Field(..., description="One of `high`, `low`, or `auto`. Defaults to `auto`.")
file_id: str | None = Field(None)
image_url: str | None = Field(None)
type: str = Field(..., description="The type of the input item. Always `input_image`.")
class InputFileContent(BaseModel):
file_data: str | None = Field(None)
file_id: str | None = Field(None)
filename: str | None = Field(None, description="The name of the file to be sent to the model.")
type: str = Field(..., description="The type of the input item. Always `input_file`.")
class InputMessage(BaseModel):
content: list[InputTextContent | InputImageContent | InputFileContent] = Field(
...,
description="A list of one or many input items to the model, containing different content types.",
)
role: str | None = Field(None)
type: str | None = Field(None)
class OpenAICreateResponse(ModelResponseProperties, ResponseProperties):
include: str | None = Field(None)
input: list[InputMessage] = Field(...)
parallel_tool_calls: bool | None = Field(
True, description="Whether to allow the model to run tool calls in parallel."
)
store: bool | None = Field(
True,
description="Whether to store the generated model response for later retrieval via API.",
)
stream: bool | None = Field(False)
usage: ResponseUsage | None = Field(None)
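
The flattened `OutputMessage` and `OutputContent` models above let a caller read `type`, `content`, and `text` directly. A minimal sketch of pulling the first text reply out of an `output` list follows. The `Content` and `Message` dataclasses and the `first_output_text` helper are hypothetical stand-ins introduced for illustration (not the pydantic models themselves), and the `or []` guard against a missing `content` is an assumption added here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the OutputContent / OutputMessage models.
@dataclass
class Content:
    type: str
    text: Optional[str] = None


@dataclass
class Message:
    type: str
    content: Optional[list] = None


def first_output_text(output: list) -> str:
    """Return the first `output_text` entry of the first `message` item."""
    for item in output:
        if item.type == "message":
            # `content` is optional on the model, so guard against None.
            for part in item.content or []:
                if part.type == "output_text":
                    return str(part.text)
            return "No text output found in response"
    raise TypeError("No output message found in response")


output = [
    Message(type="other"),
    Message(type="message", content=[Content(type="output_text", text="hello")]),
]
print(first_output_text(output))
```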

@@ -1,52 +0,0 @@
from pydantic import BaseModel, Field
class Datum2(BaseModel):
b64_json: str | None = Field(None, description="Base64 encoded image data")
revised_prompt: str | None = Field(None, description="Revised prompt")
url: str | None = Field(None, description="URL of the image")
class InputTokensDetails(BaseModel):
image_tokens: int | None = None
text_tokens: int | None = None
class Usage(BaseModel):
input_tokens: int | None = None
input_tokens_details: InputTokensDetails | None = None
output_tokens: int | None = None
total_tokens: int | None = None
class OpenAIImageGenerationResponse(BaseModel):
data: list[Datum2] | None = None
usage: Usage | None = None
class OpenAIImageEditRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str = Field(...)
moderation: str | None = Field(None)
n: int | None = Field(None, description="The number of images to generate")
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="The quality of the edited image")
size: str | None = Field(None, description="Size of the output image")
class OpenAIImageGenerationRequest(BaseModel):
background: str | None = Field(None, description="Background transparency")
model: str | None = Field(None)
moderation: str | None = Field(None)
n: int | None = Field(
None,
description="The number of images to generate.",
)
output_compression: int | None = Field(None, description="Compression level for JPEG or WebP (0-100)")
output_format: str | None = Field(None)
prompt: str = Field(...)
quality: str | None = Field(None, description="The quality of the generated image")
size: str | None = Field(None, description="Size of the image (e.g., 1024x1024, 1536x1024, auto)")
style: str | None = Field(None, description="Style of the image (only for dall-e-3)")

@@ -0,0 +1,127 @@
from enum import Enum
from typing import Optional, List, Union
from datetime import datetime
from pydantic import BaseModel, Field, RootModel
class RunwayAspectRatioEnum(str, Enum):
field_1280_720 = '1280:720'
field_720_1280 = '720:1280'
field_1104_832 = '1104:832'
field_832_1104 = '832:1104'
field_960_960 = '960:960'
field_1584_672 = '1584:672'
field_1280_768 = '1280:768'
field_768_1280 = '768:1280'
class Position(str, Enum):
first = 'first'
last = 'last'
class RunwayPromptImageDetailedObject(BaseModel):
position: Position = Field(
...,
description="The position of the image in the output video. 'last' is currently supported for gen3a_turbo only.",
)
uri: str = Field(
..., description='An HTTPS URL or data URI containing an encoded image.'
)
class RunwayPromptImageObject(
RootModel[Union[str, List[RunwayPromptImageDetailedObject]]]
):
root: Union[str, List[RunwayPromptImageDetailedObject]] = Field(
...,
description='Image(s) to use for the video generation. Can be a single URI or an array of image objects with positions.',
)
class RunwayModelEnum(str, Enum):
gen4_turbo = 'gen4_turbo'
gen3a_turbo = 'gen3a_turbo'
class RunwayDurationEnum(int, Enum):
integer_5 = 5
integer_10 = 10
class RunwayImageToVideoRequest(BaseModel):
duration: RunwayDurationEnum
model: RunwayModelEnum
promptImage: RunwayPromptImageObject
promptText: Optional[str] = Field(
None, description='Text prompt for the generation', max_length=1000
)
ratio: RunwayAspectRatioEnum
seed: int = Field(
..., description='Random seed for generation', ge=0, le=4294967295
)
class RunwayImageToVideoResponse(BaseModel):
id: Optional[str] = Field(None, description='Task ID')
class RunwayTaskStatusEnum(str, Enum):
SUCCEEDED = 'SUCCEEDED'
RUNNING = 'RUNNING'
FAILED = 'FAILED'
PENDING = 'PENDING'
CANCELLED = 'CANCELLED'
THROTTLED = 'THROTTLED'
class RunwayTaskStatusResponse(BaseModel):
createdAt: datetime = Field(..., description='Task creation timestamp')
id: str = Field(..., description='Task ID')
output: Optional[List[str]] = Field(None, description='Array of output video URLs')
progress: Optional[float] = Field(
None,
description='Float value between 0 and 1 representing the progress of the task. Only available if status is RUNNING.',
ge=0.0,
le=1.0,
)
status: RunwayTaskStatusEnum
class Model4(str, Enum):
gen4_image = 'gen4_image'
class ReferenceImage(BaseModel):
uri: Optional[str] = Field(
None, description='An HTTPS URL or data URI containing an encoded image'
)
class RunwayTextToImageAspectRatioEnum(str, Enum):
field_1920_1080 = '1920:1080'
field_1080_1920 = '1080:1920'
field_1024_1024 = '1024:1024'
field_1360_768 = '1360:768'
field_1080_1080 = '1080:1080'
field_1168_880 = '1168:880'
field_1440_1080 = '1440:1080'
field_1080_1440 = '1080:1440'
field_1808_768 = '1808:768'
field_2112_912 = '2112:912'
class RunwayTextToImageRequest(BaseModel):
model: Model4 = Field(..., description='Model to use for generation')
promptText: str = Field(
..., description='Text prompt for the image generation', max_length=1000
)
ratio: RunwayTextToImageAspectRatioEnum
referenceImages: Optional[List[ReferenceImage]] = Field(
None, description='Array of reference images to guide the generation'
)
class RunwayTextToImageResponse(BaseModel):
id: Optional[str] = Field(None, description='Task ID')
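
The ratio enums above mix `str` into `Enum`, so a member can be looked up by its wire value and compared with plain strings. A small sketch of that behavior follows, assuming each value encodes width and height as `width:height`. The `AspectRatio` enum is a shortened, hypothetical copy and `ratio_to_size` is a helper invented here for illustration.

```python
from enum import Enum


# Shortened, hypothetical copy of the aspect-ratio enum.
class AspectRatio(str, Enum):
    field_1280_720 = "1280:720"
    field_720_1280 = "720:1280"


def ratio_to_size(ratio: AspectRatio) -> tuple:
    """Split a 'width:height' enum value into two integers."""
    width, height = ratio.value.split(":")
    return int(width), int(height)


# Lookup by value returns the matching member; unknown values raise ValueError.
ratio = AspectRatio("1280:720")
print(ratio is AspectRatio.field_1280_720, ratio_to_size(ratio))
```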

@@ -41,7 +41,7 @@ class Resolution(BaseModel):
height: int = Field(...)
class CreateCreateVideoRequestSource(BaseModel):
class CreateVideoRequestSource(BaseModel):
container: str = Field(...)
size: int = Field(..., description="Size of the video file in bytes")
duration: int = Field(..., description="Duration of the video file in seconds")
@@ -89,7 +89,7 @@ class Overrides(BaseModel):
class CreateVideoRequest(BaseModel):
source: CreateCreateVideoRequestSource = Field(...)
source: CreateVideoRequestSource = Field(...)
filters: list[Union[VideoFrameInterpolationFilter, VideoEnhancementFilter]] = Field(...)
output: OutputInformationVideo = Field(...)
overrides: Overrides = Field(Overrides(isPaidDiffusion=True))

@@ -0,0 +1,35 @@
from pydantic import BaseModel, Field
class SeedVR2ImageRequest(BaseModel):
image: str = Field(...)
target_resolution: str = Field(...)
output_format: str = Field("png")
enable_sync_mode: bool = Field(False)
class FlashVSRRequest(BaseModel):
target_resolution: str = Field(...)
video: str = Field(...)
duration: float = Field(...)
class TaskCreatedDataResponse(BaseModel):
id: str = Field(...)
class TaskCreatedResponse(BaseModel):
code: int = Field(...)
message: str = Field(...)
data: TaskCreatedDataResponse | None = Field(None)
class TaskResultDataResponse(BaseModel):
status: str = Field(...)
outputs: list[str] = Field([])
class TaskResultResponse(BaseModel):
code: int = Field(...)
message: str = Field(...)
data: TaskResultDataResponse | None = Field(None)
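
Because `data` is optional on both response models above, any code reading `data.status` has to handle a missing payload first. One way to do that is sketched below with dataclass stand-ins. `TaskResultData`, `TaskResult`, and `extract_status` are hypothetical names for illustration, and mapping a missing payload to `"failed"` is one possible policy rather than a requirement of the API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical dataclass stand-ins for the response models.
@dataclass
class TaskResultData:
    status: str
    outputs: List[str] = field(default_factory=list)


@dataclass
class TaskResult:
    code: int
    message: str
    data: Optional[TaskResultData] = None


def extract_status(result: TaskResult) -> str:
    """Report a missing payload as a failure instead of raising AttributeError."""
    return "failed" if result.data is None else result.data.status


print(extract_status(TaskResult(code=200, message="ok", data=TaskResultData("completed"))))
print(extract_status(TaskResult(code=500, message="error")))
```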

@@ -1,10 +0,0 @@
import av
ver = av.__version__.split(".")
if int(ver[0]) < 14:
raise Exception("INSTALL NEW VERSION OF PYAV TO USE API NODES.")
if int(ver[0]) == 14 and int(ver[1]) < 2:
raise Exception("INSTALL NEW VERSION OF PYAV TO USE API NODES.")
NODE_CLASS_MAPPINGS = {}
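
The removed version gate above checks the major and minor components in two separate branches. A minimal sketch of the same check as a single tuple comparison follows. `meets_minimum` and its `minimum` parameter are hypothetical names introduced here, and the sketch assumes the first two components of the version string are plain integers.

```python
def meets_minimum(version: str, minimum: tuple = (14, 2)) -> bool:
    """Return True when the major.minor prefix of `version` is at least `minimum`."""
    parts = version.split(".")
    major = int(parts[0])
    # Treat a missing minor component ("14") as 0.
    minor = int(parts[1]) if len(parts) > 1 else 0
    # Tuples compare element by element, so (14, 1) < (14, 2) < (15, 0).
    return (major, minor) >= minimum


for candidate in ("13.1.0", "14.1.0", "14.2.0", "15.0.0"):
    print(candidate, meets_minimum(candidate))
```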

@@ -1,116 +0,0 @@
from enum import Enum
from pydantic.fields import FieldInfo
from pydantic import BaseModel
from pydantic_core import PydanticUndefined
from comfy.comfy_types.node_typing import IO, InputTypeOptions
NodeInput = tuple[IO, InputTypeOptions]
def _create_base_config(field_info: FieldInfo) -> InputTypeOptions:
config = {}
if hasattr(field_info, "default") and field_info.default is not PydanticUndefined:
config["default"] = field_info.default
if hasattr(field_info, "description") and field_info.description is not None:
config["tooltip"] = field_info.description
return config
def _get_number_constraints_config(field_info: FieldInfo) -> dict:
config = {}
if hasattr(field_info, "metadata"):
metadata = field_info.metadata
for constraint in metadata:
if hasattr(constraint, "ge"):
config["min"] = constraint.ge
if hasattr(constraint, "le"):
config["max"] = constraint.le
if hasattr(constraint, "multiple_of"):
config["step"] = constraint.multiple_of
return config
def _model_field_to_image_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.IMAGE, {
**_create_base_config(field_info),
**kwargs,
}
def _model_field_to_string_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.STRING, {
**_create_base_config(field_info),
**kwargs,
}
def _model_field_to_float_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.FLOAT, {
**_create_base_config(field_info),
**_get_number_constraints_config(field_info),
**kwargs,
}
def _model_field_to_int_input(field_info: FieldInfo, **kwargs) -> NodeInput:
return IO.INT, {
**_create_base_config(field_info),
**_get_number_constraints_config(field_info),
**kwargs,
}
def _model_field_to_combo_input(
field_info: FieldInfo, enum_type: type[Enum] = None, **kwargs
) -> NodeInput:
combo_config = {}
if enum_type is not None:
combo_config["options"] = [option.value for option in enum_type]
combo_config = {
**combo_config,
**_create_base_config(field_info),
**kwargs,
}
return IO.COMBO, combo_config
def model_field_to_node_input(
input_type: IO, base_model: type[BaseModel], field_name: str, **kwargs
) -> NodeInput:
"""
Maps a field from a Pydantic model to a Comfy node input.
Args:
input_type: The type of the input.
base_model: The Pydantic model to map the field from.
field_name: The name of the field to map.
**kwargs: Additional key/values to include in the input options.
Note:
For combo inputs, pass an `Enum` to the `enum_type` keyword argument to populate the options automatically.
Example:
>>> model_field_to_node_input(IO.STRING, MyModel, "my_field", multiline=True)
>>> model_field_to_node_input(IO.COMBO, MyModel, "my_field", enum_type=MyEnum)
>>> model_field_to_node_input(IO.FLOAT, MyModel, "my_field", slider=True)
"""
field_info: FieldInfo = base_model.model_fields[field_name]
result: NodeInput
if input_type == IO.IMAGE:
result = _model_field_to_image_input(field_info, **kwargs)
elif input_type == IO.STRING:
result = _model_field_to_string_input(field_info, **kwargs)
elif input_type == IO.FLOAT:
result = _model_field_to_float_input(field_info, **kwargs)
elif input_type == IO.INT:
result = _model_field_to_int_input(field_info, **kwargs)
elif input_type == IO.COMBO:
result = _model_field_to_combo_input(field_info, **kwargs)
else:
message = f"Invalid input type: {input_type}"
raise ValueError(message)
return result
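
The removed `_get_number_constraints_config` helper above translates `ge`, `le`, and `multiple_of` constraints into `min`, `max`, and `step` widget options. The sketch below reproduces that mapping in isolation. The `Ge`, `Le`, and `MultipleOf` dataclasses are hypothetical stand-ins for whatever constraint objects sit in the field metadata, so the example runs without pydantic.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the constraint objects found in field metadata.
@dataclass
class Ge:
    ge: float


@dataclass
class Le:
    le: float


@dataclass
class MultipleOf:
    multiple_of: float


def number_constraints(metadata: list) -> dict:
    """Map numeric constraints to widget options, as the removed helper does."""
    config = {}
    for constraint in metadata:
        if hasattr(constraint, "ge"):
            config["min"] = constraint.ge
        if hasattr(constraint, "le"):
            config["max"] = constraint.le
        if hasattr(constraint, "multiple_of"):
            config["step"] = constraint.multiple_of
    return config


print(number_constraints([Ge(0), Le(4294967295)]))
```

Constraints that are absent simply leave their key out, so the caller's `**kwargs` can still supply or override `min`, `max`, and `step`.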

@@ -3,7 +3,7 @@ from pydantic import BaseModel
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.bfl_api import (
from comfy_api_nodes.apis.bfl import (
BFLFluxExpandImageRequest,
BFLFluxFillImageRequest,
BFLFluxKontextProGenerateRequest,

@@ -0,0 +1,198 @@
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.bria import (
BriaEditImageRequest,
BriaResponse,
BriaStatusResponse,
InputModerationSettings,
)
from comfy_api_nodes.util import (
ApiEndpoint,
convert_mask_to_image,
download_url_to_image_tensor,
get_number_of_images,
poll_op,
sync_op,
upload_images_to_comfyapi,
)
class BriaImageEditNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="BriaImageEditNode",
display_name="Bria Image Edit",
category="api node/image/Bria",
description="Edit images using Bria's latest model",
inputs=[
IO.Combo.Input("model", options=["FIBO"]),
IO.Image.Input("image"),
IO.String.Input(
"prompt",
multiline=True,
default="",
tooltip="Instruction to edit image",
),
IO.String.Input("negative_prompt", multiline=True, default=""),
IO.String.Input(
"structured_prompt",
multiline=True,
default="",
tooltip="A string containing the structured edit prompt in JSON format. "
"Use this instead of usual prompt for precise, programmatic control.",
),
IO.Int.Input(
"seed",
default=1,
min=1,
max=2147483647,
step=1,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
),
IO.Float.Input(
"guidance_scale",
default=3,
min=3,
max=5,
step=0.01,
display_mode=IO.NumberDisplay.number,
tooltip="Higher value makes the image follow the prompt more closely.",
),
IO.Int.Input(
"steps",
default=50,
min=20,
max=50,
step=1,
display_mode=IO.NumberDisplay.number,
),
IO.DynamicCombo.Input(
"moderation",
options=[
IO.DynamicCombo.Option(
"true",
[
IO.Boolean.Input(
"prompt_content_moderation", default=False
),
IO.Boolean.Input(
"visual_input_moderation", default=False
),
IO.Boolean.Input(
"visual_output_moderation", default=True
),
],
),
IO.DynamicCombo.Option("false", []),
],
tooltip="Moderation settings",
),
IO.Mask.Input(
"mask",
tooltip="If omitted, the edit applies to the entire image.",
optional=True,
),
],
outputs=[
IO.Image.Output(),
IO.String.Output(display_name="structured_prompt"),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
expr="""{"type":"usd","usd":0.04}""",
),
)
@classmethod
async def execute(
cls,
model: str,
image: Input.Image,
prompt: str,
negative_prompt: str,
structured_prompt: str,
seed: int,
guidance_scale: float,
steps: int,
moderation: InputModerationSettings,
mask: Input.Image | None = None,
) -> IO.NodeOutput:
if not prompt and not structured_prompt:
raise ValueError(
"One of prompt or structured_prompt is required to be non-empty."
)
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
mask_url = None
if mask is not None:
mask_url = (
await upload_images_to_comfyapi(
cls,
convert_mask_to_image(mask),
max_images=1,
mime_type="image/png",
wait_label="Uploading mask",
)
)[0]
response = await sync_op(
cls,
ApiEndpoint(path="/proxy/bria/v2/image/edit", method="POST"),
data=BriaEditImageRequest(
instruction=prompt if prompt else None,
structured_instruction=structured_prompt if structured_prompt else None,
images=await upload_images_to_comfyapi(
cls,
image,
max_images=1,
mime_type="image/png",
wait_label="Uploading image",
),
mask=mask_url,
negative_prompt=negative_prompt if negative_prompt else None,
guidance_scale=guidance_scale,
seed=seed,
model_version=model,
steps_num=steps,
prompt_content_moderation=moderation.get(
"prompt_content_moderation", False
),
visual_input_content_moderation=moderation.get(
"visual_input_moderation", False
),
visual_output_content_moderation=moderation.get(
"visual_output_moderation", False
),
),
response_model=BriaStatusResponse,
)
response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/bria/v2/status/{response.request_id}"),
status_extractor=lambda r: r.status,
response_model=BriaResponse,
)
return IO.NodeOutput(
await download_url_to_image_tensor(response.result.image_url),
response.result.structured_prompt,
)
class BriaExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:
return [
BriaImageEditNode,
]
async def comfy_entrypoint() -> BriaExtension:
return BriaExtension()
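
The `execute` method above validates that at least one prompt is non-empty, sends empty strings as `None`, and reads the moderation flags with a `False` fallback. The sketch below isolates that normalization using plain dicts. `build_edit_payload` is a hypothetical helper, and treating `moderation` as an ordinary dict that may be empty is an assumption about how the dynamic combo value arrives.

```python
def build_edit_payload(prompt: str, structured_prompt: str, negative_prompt: str, moderation: dict) -> dict:
    """Normalize the node inputs into request fields (hypothetical helper)."""
    if not prompt and not structured_prompt:
        raise ValueError("One of prompt or structured_prompt is required to be non-empty.")
    return {
        # Empty strings become None so the field can be omitted from the request.
        "instruction": prompt or None,
        "structured_instruction": structured_prompt or None,
        "negative_prompt": negative_prompt or None,
        # Missing keys fall back to False, mirroring the .get() calls in execute().
        "prompt_content_moderation": moderation.get("prompt_content_moderation", False),
        "visual_input_content_moderation": moderation.get("visual_input_moderation", False),
        "visual_output_content_moderation": moderation.get("visual_output_moderation", False),
    }


payload = build_edit_payload("make the sky blue", "", "", {})
print(payload)
```

Note that the widget default for `visual_output_moderation` is `True`, while the fallback used when the key is absent is `False`.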

@@ -5,7 +5,7 @@ import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.bytedance_api import (
from comfy_api_nodes.apis.bytedance import (
RECOMMENDED_PRESETS,
RECOMMENDED_PRESETS_SEEDREAM_4,
VIDEO_TASKS_EXECUTION_TIME,

@@ -14,7 +14,7 @@ from typing_extensions import override
import folder_paths
from comfy_api.latest import IO, ComfyExtension, Input, Types
from comfy_api_nodes.apis.gemini_api import (
from comfy_api_nodes.apis.gemini import (
GeminiContent,
GeminiFileData,
GeminiGenerateContentRequest,

@@ -4,7 +4,7 @@ from comfy_api.latest import IO, ComfyExtension
from PIL import Image
import numpy as np
import torch
from comfy_api_nodes.apis import (
from comfy_api_nodes.apis.ideogram import (
IdeogramGenerateRequest,
IdeogramGenerateResponse,
ImageRequest,

@@ -49,7 +49,7 @@ from comfy_api_nodes.apis import (
KlingCharacterEffectModelName,
KlingSingleImageEffectModelName,
)
from comfy_api_nodes.apis.kling_api import (
from comfy_api_nodes.apis.kling import (
ImageToVideoWithAudioRequest,
MotionControlRequest,
OmniImageParamImage,

@@ -4,7 +4,7 @@ import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.luma_api import (
from comfy_api_nodes.apis.luma import (
LumaAspectRatio,
LumaCharacterRef,
LumaConceptChain,

@@ -4,7 +4,7 @@ import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.minimax_api import (
from comfy_api_nodes.apis.minimax import (
MinimaxFileRetrieveResponse,
MiniMaxModel,
MinimaxTaskResultResponse,

@@ -3,7 +3,7 @@ import logging
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis import (
from comfy_api_nodes.apis.moonvalley import (
MoonvalleyPromptResponse,
MoonvalleyTextToVideoInferenceParams,
MoonvalleyTextToVideoRequest,

@@ -10,24 +10,18 @@ from typing_extensions import override
import folder_paths
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis import (
CreateModelResponseProperties,
Detail,
InputContent,
from comfy_api_nodes.apis.openai import (
InputFileContent,
InputImageContent,
InputMessage,
InputMessageContentList,
InputTextContent,
Item,
ModelResponseProperties,
OpenAICreateResponse,
OpenAIResponse,
OutputContent,
)
from comfy_api_nodes.apis.openai_api import (
OpenAIImageEditRequest,
OpenAIImageGenerationRequest,
OpenAIImageGenerationResponse,
OpenAIResponse,
OutputContent,
)
from comfy_api_nodes.util import (
ApiEndpoint,
@@ -266,7 +260,7 @@ class OpenAIDalle3(IO.ComfyNode):
"seed",
default=0,
min=0,
max=2 ** 31 - 1,
max=2**31 - 1,
step=1,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
@@ -384,7 +378,7 @@ class OpenAIGPTImage1(IO.ComfyNode):
"seed",
default=0,
min=0,
max=2 ** 31 - 1,
max=2**31 - 1,
step=1,
display_mode=IO.NumberDisplay.number,
control_after_generate=True,
@@ -500,8 +494,8 @@ class OpenAIGPTImage1(IO.ComfyNode):
files = []
batch_size = image.shape[0]
for i in range(batch_size):
single_image = image[i: i + 1]
scaled_image = downscale_image_tensor(single_image, total_pixels=2048*2048).squeeze()
single_image = image[i : i + 1]
scaled_image = downscale_image_tensor(single_image, total_pixels=2048 * 2048).squeeze()
image_np = (scaled_image.numpy() * 255).astype(np.uint8)
img = Image.fromarray(image_np)
@@ -523,7 +517,7 @@ class OpenAIGPTImage1(IO.ComfyNode):
rgba_mask = torch.zeros(height, width, 4, device="cpu")
rgba_mask[:, :, 3] = 1 - mask.squeeze().cpu()
scaled_mask = downscale_image_tensor(rgba_mask.unsqueeze(0), total_pixels=2048*2048).squeeze()
scaled_mask = downscale_image_tensor(rgba_mask.unsqueeze(0), total_pixels=2048 * 2048).squeeze()
mask_np = (scaled_mask.numpy() * 255).astype(np.uint8)
mask_img = Image.fromarray(mask_np)
@@ -696,29 +690,23 @@ class OpenAIChatNode(IO.ComfyNode):
)
@classmethod
def get_message_content_from_response(
cls, response: OpenAIResponse
) -> list[OutputContent]:
def get_message_content_from_response(cls, response: OpenAIResponse) -> list[OutputContent]:
"""Extract message content from the API response."""
for output in response.output:
if output.root.type == "message":
return output.root.content
if output.type == "message":
return output.content
raise TypeError("No output message found in response")
@classmethod
def get_text_from_message_content(
cls, message_content: list[OutputContent]
) -> str:
def get_text_from_message_content(cls, message_content: list[OutputContent]) -> str:
"""Extract text content from message content."""
for content_item in message_content:
if content_item.root.type == "output_text":
return str(content_item.root.text)
if content_item.type == "output_text":
return str(content_item.text)
return "No text output found in response"
@classmethod
def tensor_to_input_image_content(
cls, image: torch.Tensor, detail_level: Detail = "auto"
) -> InputImageContent:
def tensor_to_input_image_content(cls, image: torch.Tensor, detail_level: str = "auto") -> InputImageContent:
"""Convert a tensor to an input image content object."""
return InputImageContent(
detail=detail_level,
@@ -732,9 +720,9 @@ class OpenAIChatNode(IO.ComfyNode):
prompt: str,
image: torch.Tensor | None = None,
files: list[InputFileContent] | None = None,
) -> InputMessageContentList:
) -> list[InputTextContent | InputImageContent | InputFileContent]:
"""Create a list of input message contents from prompt and optional image."""
content_list: list[InputContent | InputTextContent | InputImageContent | InputFileContent] = [
content_list: list[InputTextContent | InputImageContent | InputFileContent] = [
InputTextContent(text=prompt, type="input_text"),
]
if image is not None:
@@ -746,13 +734,9 @@ class OpenAIChatNode(IO.ComfyNode):
type="input_image",
)
)
if files is not None:
content_list.extend(files)
return InputMessageContentList(
root=content_list,
)
return content_list
@classmethod
async def execute(
@@ -762,7 +746,7 @@ class OpenAIChatNode(IO.ComfyNode):
model: SupportedOpenAIModel = SupportedOpenAIModel.gpt_5.value,
images: torch.Tensor | None = None,
files: list[InputFileContent] | None = None,
advanced_options: CreateModelResponseProperties | None = None,
advanced_options: ModelResponseProperties | None = None,
) -> IO.NodeOutput:
validate_string(prompt, strip_whitespace=False)
@@ -773,36 +757,28 @@ class OpenAIChatNode(IO.ComfyNode):
response_model=OpenAIResponse,
data=OpenAICreateResponse(
input=[
Item(
root=InputMessage(
content=cls.create_input_message_contents(
prompt, images, files
),
role="user",
)
InputMessage(
content=cls.create_input_message_contents(prompt, images, files),
role="user",
),
],
store=True,
stream=False,
model=model,
previous_response_id=None,
**(
advanced_options.model_dump(exclude_none=True)
if advanced_options
else {}
),
**(advanced_options.model_dump(exclude_none=True) if advanced_options else {}),
),
)
response_id = create_response.id
# Get result output
result_response = await poll_op(
cls,
ApiEndpoint(path=f"{RESPONSES_ENDPOINT}/{response_id}"),
response_model=OpenAIResponse,
status_extractor=lambda response: response.status,
completed_statuses=["incomplete", "completed"]
)
cls,
ApiEndpoint(path=f"{RESPONSES_ENDPOINT}/{response_id}"),
response_model=OpenAIResponse,
status_extractor=lambda response: response.status,
completed_statuses=["incomplete", "completed"],
)
return IO.NodeOutput(cls.get_text_from_message_content(cls.get_message_content_from_response(result_response)))
@@ -923,7 +899,7 @@ class OpenAIChatConfig(IO.ComfyNode):
remove depending on model choice.
"""
return IO.NodeOutput(
CreateModelResponseProperties(
ModelResponseProperties(
instructions=instructions,
truncation=truncation,
max_output_tokens=max_output_tokens,
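
The chat node above creates a response and then polls until the status is `incomplete` or `completed`. The sketch below shows a generic polling loop of that shape. It is not the implementation of `poll_op`, whose internals are not shown in this diff. `poll_until_done`, its parameters, and the fake fetcher are hypothetical names introduced for illustration.

```python
import asyncio


async def poll_until_done(fetch, status_extractor, completed_statuses,
                          failed_statuses=("failed",), interval=0.01, max_attempts=10):
    """Call `fetch` repeatedly until the extracted status is terminal."""
    for _ in range(max_attempts):
        response = await fetch()
        status = status_extractor(response)
        if status in completed_statuses:
            return response
        if status in failed_statuses:
            raise RuntimeError(f"Task failed with status={status}")
        await asyncio.sleep(interval)
    raise TimeoutError("Polling gave up before the task finished")


async def demo():
    # Fake endpoint that reports two in-progress polls and then a terminal status.
    statuses = iter(["in_progress", "in_progress", "incomplete"])

    async def fake_fetch():
        return {"status": next(statuses)}

    return await poll_until_done(
        fake_fetch,
        status_extractor=lambda r: r["status"],
        completed_statuses=["incomplete", "completed"],
    )


result = asyncio.run(demo())
print(result)
```

Treating `incomplete` as a completed status means the caller still receives a truncated response instead of waiting until the attempt limit is reached.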

@@ -1,7 +1,7 @@
import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.pixverse_api import (
from comfy_api_nodes.apis.pixverse import (
PixverseTextVideoRequest,
PixverseImageVideoRequest,
PixverseTransitionVideoRequest,

@@ -8,7 +8,7 @@ from typing_extensions import override
from comfy.utils import ProgressBar
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.recraft_api import (
from comfy_api_nodes.apis.recraft import (
RecraftColor,
RecraftColorChain,
RecraftControls,

@@ -14,7 +14,7 @@ from typing import Optional
from io import BytesIO
from typing_extensions import override
from PIL import Image
from comfy_api_nodes.apis.rodin_api import (
from comfy_api_nodes.apis.rodin import (
Rodin3DGenerateRequest,
Rodin3DGenerateResponse,
Rodin3DCheckStatusRequest,

@@ -16,7 +16,7 @@ from enum import Enum
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input, InputImpl
from comfy_api_nodes.apis import (
from comfy_api_nodes.apis.runway import (
RunwayImageToVideoRequest,
RunwayImageToVideoResponse,
RunwayTaskStatusResponse as TaskStatusResponse,

@@ -3,7 +3,7 @@ from typing import Optional
from typing_extensions import override
from comfy_api.latest import ComfyExtension, Input, IO
from comfy_api_nodes.apis.stability_api import (
from comfy_api_nodes.apis.stability import (
StabilityUpscaleConservativeRequest,
StabilityUpscaleCreativeRequest,
StabilityAsyncResponse,

@@ -5,7 +5,24 @@ import aiohttp
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis import topaz_api
from comfy_api_nodes.apis.topaz import (
CreateVideoRequest,
CreateVideoRequestSource,
CreateVideoResponse,
ImageAsyncTaskResponse,
ImageDownloadResponse,
ImageEnhanceRequest,
ImageStatusResponse,
OutputInformationVideo,
Resolution,
VideoAcceptResponse,
VideoCompleteUploadRequest,
VideoCompleteUploadRequestPart,
VideoCompleteUploadResponse,
VideoEnhancementFilter,
VideoFrameInterpolationFilter,
VideoStatusResponse,
)
from comfy_api_nodes.util import (
ApiEndpoint,
download_url_to_image_tensor,
@@ -153,13 +170,13 @@ class TopazImageEnhance(IO.ComfyNode):
if get_number_of_images(image) != 1:
raise ValueError("Only one input image is supported.")
download_url = await upload_images_to_comfyapi(
cls, image, max_images=1, mime_type="image/png", total_pixels=4096*4096
cls, image, max_images=1, mime_type="image/png", total_pixels=4096 * 4096
)
initial_response = await sync_op(
cls,
ApiEndpoint(path="/proxy/topaz/image/v1/enhance-gen/async", method="POST"),
response_model=topaz_api.ImageAsyncTaskResponse,
data=topaz_api.ImageEnhanceRequest(
response_model=ImageAsyncTaskResponse,
data=ImageEnhanceRequest(
model=model,
prompt=prompt,
subject_detection=subject_detection,
@@ -181,7 +198,7 @@ class TopazImageEnhance(IO.ComfyNode):
await poll_op(
cls,
poll_endpoint=ApiEndpoint(path=f"/proxy/topaz/image/v1/status/{initial_response.process_id}"),
response_model=topaz_api.ImageStatusResponse,
response_model=ImageStatusResponse,
status_extractor=lambda x: x.status,
progress_extractor=lambda x: getattr(x, "progress", 0),
price_extractor=lambda x: x.credits * 0.08,
@@ -193,7 +210,7 @@ class TopazImageEnhance(IO.ComfyNode):
results = await sync_op(
cls,
ApiEndpoint(path=f"/proxy/topaz/image/v1/download/{initial_response.process_id}"),
response_model=topaz_api.ImageDownloadResponse,
response_model=ImageDownloadResponse,
monitor_progress=False,
)
return IO.NodeOutput(await download_url_to_image_tensor(results.download_url))
@@ -331,7 +348,7 @@ class TopazVideoEnhance(IO.ComfyNode):
if target_height % 2 != 0:
target_height += 1
filters.append(
topaz_api.VideoEnhancementFilter(
VideoEnhancementFilter(
model=UPSCALER_MODELS_MAP[upscaler_model],
creativity=(upscaler_creativity if UPSCALER_MODELS_MAP[upscaler_model] == "slc-1" else None),
isOptimizedMode=(True if UPSCALER_MODELS_MAP[upscaler_model] == "slc-1" else None),
@@ -340,7 +357,7 @@ class TopazVideoEnhance(IO.ComfyNode):
if interpolation_enabled:
target_frame_rate = interpolation_frame_rate
filters.append(
topaz_api.VideoFrameInterpolationFilter(
VideoFrameInterpolationFilter(
model=interpolation_model,
slowmo=interpolation_slowmo,
fps=interpolation_frame_rate,
@@ -351,19 +368,19 @@ class TopazVideoEnhance(IO.ComfyNode):
initial_res = await sync_op(
cls,
ApiEndpoint(path="/proxy/topaz/video/", method="POST"),
response_model=topaz_api.CreateVideoResponse,
data=topaz_api.CreateVideoRequest(
source=topaz_api.CreateCreateVideoRequestSource(
response_model=CreateVideoResponse,
data=CreateVideoRequest(
source=CreateVideoRequestSource(
container="mp4",
size=get_fs_object_size(src_video_stream),
duration=int(duration_sec),
frameCount=video.get_frame_count(),
frameRate=src_frame_rate,
resolution=topaz_api.Resolution(width=src_width, height=src_height),
resolution=Resolution(width=src_width, height=src_height),
),
filters=filters,
output=topaz_api.OutputInformationVideo(
resolution=topaz_api.Resolution(width=target_width, height=target_height),
output=OutputInformationVideo(
resolution=Resolution(width=target_width, height=target_height),
frameRate=target_frame_rate,
audioCodec="AAC",
audioTransfer="Copy",
@@ -379,7 +396,7 @@ class TopazVideoEnhance(IO.ComfyNode):
path=f"/proxy/topaz/video/{initial_res.requestId}/accept",
method="PATCH",
),
response_model=topaz_api.VideoAcceptResponse,
response_model=VideoAcceptResponse,
wait_label="Preparing upload",
final_label_on_success="Upload started",
)
@@ -402,10 +419,10 @@ class TopazVideoEnhance(IO.ComfyNode):
path=f"/proxy/topaz/video/{initial_res.requestId}/complete-upload",
method="PATCH",
),
response_model=topaz_api.VideoCompleteUploadResponse,
data=topaz_api.VideoCompleteUploadRequest(
response_model=VideoCompleteUploadResponse,
data=VideoCompleteUploadRequest(
uploadResults=[
topaz_api.VideoCompleteUploadRequestPart(
VideoCompleteUploadRequestPart(
partNum=1,
eTag=upload_etag,
),
@@ -417,7 +434,7 @@ class TopazVideoEnhance(IO.ComfyNode):
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/topaz/video/{initial_res.requestId}/status"),
response_model=topaz_api.VideoStatusResponse,
response_model=VideoStatusResponse,
status_extractor=lambda x: x.status,
progress_extractor=lambda x: getattr(x, "progress", 0),
price_extractor=lambda x: (x.estimates.cost[0] * 0.08 if x.estimates and x.estimates.cost[0] else None),

@@ -5,7 +5,7 @@ import torch
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension
from comfy_api_nodes.apis.tripo_api import (
from comfy_api_nodes.apis.tripo import (
TripoAnimateRetargetRequest,
TripoAnimateRigRequest,
TripoConvertModelRequest,

@@ -4,7 +4,7 @@ from io import BytesIO
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input, InputImpl
from comfy_api_nodes.apis.veo_api import (
from comfy_api_nodes.apis.veo import (
VeoGenVidPollRequest,
VeoGenVidPollResponse,
VeoGenVidRequest,

@@ -703,7 +703,7 @@ class Vidu2ReferenceVideoNode(IO.ComfyNode):
"subjects",
template=IO.Autogrow.TemplateNames(
IO.Image.Input("reference_images"),
names=["subject1", "subject2", "subject3"],
names=["subject1", "subject2", "subject3", "subject4", "subject5", "subject6", "subject7"],
min=1,
),
tooltip="For each subject, provide up to 3 reference images (7 images total across all subjects). "
@@ -738,7 +738,7 @@ class Vidu2ReferenceVideoNode(IO.ComfyNode):
control_after_generate=True,
),
IO.Combo.Input("aspect_ratio", options=["16:9", "9:16", "4:3", "3:4", "1:1"]),
IO.Combo.Input("resolution", options=["720p"]),
IO.Combo.Input("resolution", options=["720p", "1080p"]),
IO.Combo.Input(
"movement_amplitude",
options=["auto", "small", "medium", "large"],

@@ -0,0 +1,178 @@
from typing_extensions import override
from comfy_api.latest import IO, ComfyExtension, Input
from comfy_api_nodes.apis.wavespeed import (
FlashVSRRequest,
TaskCreatedResponse,
TaskResultResponse,
SeedVR2ImageRequest,
)
from comfy_api_nodes.util import (
ApiEndpoint,
download_url_to_video_output,
poll_op,
sync_op,
upload_video_to_comfyapi,
validate_container_format_is_mp4,
validate_video_duration,
upload_images_to_comfyapi,
get_number_of_images,
download_url_to_image_tensor,
)
class WavespeedFlashVSRNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="WavespeedFlashVSRNode",
display_name="FlashVSR Video Upscale",
category="api node/video/WaveSpeed",
description="Fast, high-quality video upscaler that "
"boosts resolution and restores clarity for low-resolution or blurry footage.",
inputs=[
IO.Video.Input("video"),
IO.Combo.Input("target_resolution", options=["720p", "1080p", "2K", "4K"]),
],
outputs=[
IO.Video.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["target_resolution"]),
expr="""
(
$price_for_1sec := {"720p": 0.012, "1080p": 0.018, "2k": 0.024, "4k": 0.032};
{
"type":"usd",
"usd": $lookup($price_for_1sec, widgets.target_resolution),
"format":{"suffix": "/second", "approximate": true}
}
)
""",
),
)
@classmethod
async def execute(
cls,
video: Input.Video,
target_resolution: str,
) -> IO.NodeOutput:
validate_container_format_is_mp4(video)
validate_video_duration(video, min_duration=5, max_duration=60 * 10)
initial_res = await sync_op(
cls,
ApiEndpoint(path="/proxy/wavespeed/api/v3/wavespeed-ai/flashvsr", method="POST"),
response_model=TaskCreatedResponse,
data=FlashVSRRequest(
target_resolution=target_resolution.lower(),
video=await upload_video_to_comfyapi(cls, video),
duration=video.get_duration(),
),
)
if initial_res.code != 200:
raise ValueError(f"Task creation fails with code={initial_res.code} and message={initial_res.message}")
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/wavespeed/api/v3/predictions/{initial_res.data.id}/result"),
response_model=TaskResultResponse,
status_extractor=lambda x: "failed" if x.data is None else x.data.status,
poll_interval=10.0,
max_poll_attempts=480,
)
if final_response.code != 200:
raise ValueError(
f"Task processing failed with code={final_response.code} and message={final_response.message}"
)
return IO.NodeOutput(await download_url_to_video_output(final_response.data.outputs[0]))
class WavespeedImageUpscaleNode(IO.ComfyNode):
@classmethod
def define_schema(cls):
return IO.Schema(
node_id="WavespeedImageUpscaleNode",
display_name="WaveSpeed Image Upscale",
category="api node/image/WaveSpeed",
description="Boost image resolution and quality, upscaling photos to 4K or 8K for sharp, detailed results.",
inputs=[
IO.Combo.Input("model", options=["SeedVR2", "Ultimate"]),
IO.Image.Input("image"),
IO.Combo.Input("target_resolution", options=["2K", "4K", "8K"]),
],
outputs=[
IO.Image.Output(),
],
hidden=[
IO.Hidden.auth_token_comfy_org,
IO.Hidden.api_key_comfy_org,
IO.Hidden.unique_id,
],
is_api_node=True,
price_badge=IO.PriceBadge(
depends_on=IO.PriceBadgeDepends(widgets=["model"]),
expr="""
(
$prices := {"seedvr2": 0.01, "ultimate": 0.06};
{"type":"usd", "usd": $lookup($prices, widgets.model)}
)
""",
),
)
@classmethod
async def execute(
cls,
model: str,
image: Input.Image,
target_resolution: str,
) -> IO.NodeOutput:
if get_number_of_images(image) != 1:
raise ValueError("Exactly one input image is required.")
if model == "SeedVR2":
model_path = "seedvr2/image"
else:
model_path = "ultimate-image-upscaler"
initial_res = await sync_op(
cls,
ApiEndpoint(path=f"/proxy/wavespeed/api/v3/wavespeed-ai/{model_path}", method="POST"),
response_model=TaskCreatedResponse,
data=SeedVR2ImageRequest(
target_resolution=target_resolution.lower(),
image=(await upload_images_to_comfyapi(cls, image, max_images=1))[0],
),
)
if initial_res.code != 200:
raise ValueError(f"Task creation fails with code={initial_res.code} and message={initial_res.message}")
final_response = await poll_op(
cls,
ApiEndpoint(path=f"/proxy/wavespeed/api/v3/predictions/{initial_res.data.id}/result"),
response_model=TaskResultResponse,
status_extractor=lambda x: "failed" if x.data is None else x.data.status,
poll_interval=10.0,
max_poll_attempts=480,
)
if final_response.code != 200:
raise ValueError(
f"Task processing failed with code={final_response.code} and message={final_response.message}"
)
return IO.NodeOutput(await download_url_to_image_tensor(final_response.data.outputs[0]))
class WavespeedExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[IO.ComfyNode]]:
return [
WavespeedFlashVSRNode,
WavespeedImageUpscaleNode,
]
async def comfy_entrypoint() -> WavespeedExtension:
return WavespeedExtension()
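
Both WaveSpeed nodes above follow the same two-step pattern: create a task, then poll its result endpoint every 10 seconds for up to 480 attempts (roughly 80 minutes), treating a response without `data` as failed. The sketch below illustrates that loop in isolation. It is not the `poll_op` helper from `comfy_api_nodes.util`; `TaskData`, `TaskResult`, `poll_until_done` and the `"completed"` status string are hypothetical names chosen for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class TaskData:
    # Hypothetical stand-in for the `data` payload of a task result.
    status: str
    outputs: List[str]


@dataclass
class TaskResult:
    # Hypothetical stand-in for a polled response.
    code: int
    data: Optional[TaskData]


def extract_status(result: TaskResult) -> str:
    # Same rule as the lambda in the diff: no data means the task failed.
    return "failed" if result.data is None else result.data.status


async def poll_until_done(
    fetch: Callable[[], Awaitable[TaskResult]],
    poll_interval: float = 10.0,
    max_poll_attempts: int = 480,
) -> TaskResult:
    """Call `fetch` until the task completes, fails, or attempts run out."""
    for _ in range(max_poll_attempts):
        result = await fetch()
        status = extract_status(result)
        if status == "completed":  # assumed terminal status name
            return result
        if status == "failed":
            raise RuntimeError(f"task failed with code={result.code}")
        await asyncio.sleep(poll_interval)
    raise TimeoutError(f"task not finished after {max_poll_attempts} attempts")
```

With the defaults used in the diff, the upper bound on waiting is `10.0 * 480` seconds; passing `poll_interval=0.0` makes the loop convenient to exercise in tests.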

View File

@@ -1,10 +0,0 @@
# This file is used to filter the Comfy Org OpenAPI spec for schemas related to API Nodes.
# This is used for development purposes to generate stubs for unreleased API endpoints.
apis:
filter:
root: openapi.yaml
decorators:
filter-in:
property: tags
value: ['API Nodes']
matchStrategy: all

View File

@@ -1,10 +0,0 @@
# This file is used to filter the Comfy Org OpenAPI spec for schemas related to API Nodes.
apis:
filter:
root: openapi.yaml
decorators:
filter-in:
property: tags
value: ['API Nodes', 'Released']
matchStrategy: all

View File

@@ -11,6 +11,7 @@ from .conversions import (
audio_input_to_mp3,
audio_to_base64_string,
bytesio_to_image_tensor,
convert_mask_to_image,
downscale_image_tensor,
image_tensor_pair_to_batch,
pil_to_bytesio,
@@ -72,6 +73,7 @@ __all__ = [
"audio_input_to_mp3",
"audio_to_base64_string",
"bytesio_to_image_tensor",
"convert_mask_to_image",
"downscale_image_tensor",
"image_tensor_pair_to_batch",
"pil_to_bytesio",

View File

@@ -451,6 +451,12 @@ def resize_mask_to_image(
return mask
def convert_mask_to_image(mask: Input.Image) -> torch.Tensor:
"""Make mask have the expected amount of dims (4) and channels (3) to be recognized as an image."""
mask = mask.unsqueeze(-1)
return torch.cat([mask] * 3, dim=-1)
def text_filepath_to_base64_string(filepath: str) -> str:
"""Converts a text file to a base64 string."""
with open(filepath, "rb") as f:

View File

@@ -28,6 +28,7 @@ class AlignYourStepsScheduler(io.ComfyNode):
def define_schema(cls) -> io.Schema:
return io.Schema(
node_id="AlignYourStepsScheduler",
search_aliases=["AYS scheduler"],
category="sampling/custom_sampling/schedulers",
inputs=[
io.Combo.Input("model_type", options=["SD1", "SDXL", "SVD"]),

View File

@@ -71,6 +71,7 @@ class CLIPAttentionMultiply(io.ComfyNode):
def define_schema(cls) -> io.Schema:
return io.Schema(
node_id="CLIPAttentionMultiply",
search_aliases=["clip attention scale", "text encoder attention"],
category="_for_testing/attention_experiments",
inputs=[
io.Clip.Input("clip"),

View File

@@ -10,6 +10,7 @@ class Canny(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="Canny",
search_aliases=["edge detection", "outline", "contour detection", "line art"],
category="image/preprocessors",
inputs=[
io.Image.Input("image"),

View File

@@ -38,6 +38,7 @@ class ControlNetInpaintingAliMamaApply(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="ControlNetInpaintingAliMamaApply",
search_aliases=["masked controlnet"],
category="conditioning/controlnet",
inputs=[
io.Conditioning.Input("positive"),

View File

@@ -297,6 +297,7 @@ class ExtendIntermediateSigmas(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="ExtendIntermediateSigmas",
search_aliases=["interpolate sigmas"],
category="sampling/custom_sampling/sigmas",
inputs=[
io.Sigmas.Input("sigmas"),
@@ -856,6 +857,7 @@ class DualCFGGuider(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="DualCFGGuider",
search_aliases=["dual prompt guidance"],
category="sampling/custom_sampling/guiders",
inputs=[
io.Model.Input("model"),
@@ -883,6 +885,7 @@ class DisableNoise(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="DisableNoise",
search_aliases=["zero noise"],
category="sampling/custom_sampling/noise",
inputs=[],
outputs=[io.Noise.Output()]
@@ -1019,6 +1022,7 @@ class ManualSigmas(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="ManualSigmas",
search_aliases=["custom noise schedule", "define sigmas"],
category="_for_testing/custom_sampling",
is_experimental=True,
inputs=[

View File

@@ -1223,11 +1223,11 @@ class ResolutionBucket(io.ComfyNode):
class MakeTrainingDataset(io.ComfyNode):
"""Encode images with VAE and texts with CLIP to create a training dataset."""
@classmethod
def define_schema(cls):
return io.Schema(
node_id="MakeTrainingDataset",
search_aliases=["encode dataset"],
display_name="Make Training Dataset",
category="dataset",
is_experimental=True,
@@ -1309,11 +1309,11 @@ class MakeTrainingDataset(io.ComfyNode):
class SaveTrainingDataset(io.ComfyNode):
"""Save encoded training dataset (latents + conditioning) to disk."""
@classmethod
def define_schema(cls):
return io.Schema(
node_id="SaveTrainingDataset",
search_aliases=["export training data"],
display_name="Save Training Dataset",
category="dataset",
is_experimental=True,
@@ -1410,11 +1410,11 @@ class SaveTrainingDataset(io.ComfyNode):
class LoadTrainingDataset(io.ComfyNode):
"""Load encoded training dataset from disk."""
@classmethod
def define_schema(cls):
return io.Schema(
node_id="LoadTrainingDataset",
search_aliases=["import dataset", "training data"],
display_name="Load Training Dataset",
category="dataset",
is_experimental=True,

View File

@@ -11,6 +11,7 @@ class DifferentialDiffusion(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="DifferentialDiffusion",
search_aliases=["inpaint gradient", "variable denoise strength"],
display_name="Differential Diffusion",
category="_for_testing",
inputs=[

View File

@@ -58,6 +58,7 @@ class FreSca(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="FreSca",
search_aliases=["frequency guidance"],
display_name="FreSca",
category="_for_testing",
description="Applies frequency-dependent scaling to the guidance",

View File

@@ -38,6 +38,7 @@ class CLIPTextEncodeHiDream(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="CLIPTextEncodeHiDream",
search_aliases=["hidream prompt"],
category="advanced/conditioning",
inputs=[
io.Clip.Input("clip"),

View File

@@ -259,6 +259,7 @@ class SetClipHooks:
return (clip,)
class ConditioningTimestepsRange:
SEARCH_ALIASES = ["prompt scheduling", "timestep segments", "conditioning phases"]
NodeId = 'ConditioningTimestepsRange'
NodeName = 'Timesteps Range'
@classmethod
@@ -468,6 +469,7 @@ class SetHookKeyframes:
return (hooks,)
class CreateHookKeyframe:
SEARCH_ALIASES = ["hook scheduling", "strength animation", "timed hook"]
NodeId = 'CreateHookKeyframe'
NodeName = 'Create Hook Keyframe'
@classmethod
@@ -497,6 +499,7 @@ class CreateHookKeyframe:
return (prev_hook_kf,)
class CreateHookKeyframesInterpolated:
SEARCH_ALIASES = ["ease hook strength", "smooth hook transition", "interpolate keyframes"]
NodeId = 'CreateHookKeyframesInterpolated'
NodeName = 'Create Hook Keyframes Interp.'
@classmethod
@@ -544,6 +547,7 @@ class CreateHookKeyframesInterpolated:
return (prev_hook_kf,)
class CreateHookKeyframesFromFloats:
SEARCH_ALIASES = ["batch keyframes", "strength list to keyframes"]
NodeId = 'CreateHookKeyframesFromFloats'
NodeName = 'Create Hook Keyframes From Floats'
@classmethod
@@ -618,6 +622,7 @@ class SetModelHooksOnCond:
# Combine Hooks
#------------------------------------------
class CombineHooks:
SEARCH_ALIASES = ["merge hooks"]
NodeId = 'CombineHooks2'
NodeName = 'Combine Hooks [2]'
@classmethod

View File

@@ -618,6 +618,7 @@ class SaveGLB(IO.ComfyNode):
def define_schema(cls):
return IO.Schema(
node_id="SaveGLB",
search_aliases=["export 3d model", "save mesh"],
category="3d",
is_output_node=True,
inputs=[

View File

@@ -104,6 +104,7 @@ class CLIPTextEncodeKandinsky5(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="CLIPTextEncodeKandinsky5",
search_aliases=["kandinsky prompt"],
category="advanced/conditioning/kandinsky5",
inputs=[
io.Clip.Input("clip"),

View File

@@ -75,6 +75,7 @@ class Preview3D(IO.ComfyNode):
def define_schema(cls):
return IO.Schema(
node_id="Preview3D",
search_aliases=["view mesh", "3d viewer"],
display_name="Preview 3D & Animation",
category="3d",
is_experimental=True,

View File

@@ -224,6 +224,7 @@ class ConvertStringToComboNode(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="ConvertStringToComboNode",
search_aliases=["string to dropdown", "text to combo"],
display_name="Convert String to Combo",
category="logic",
inputs=[io.String.Input("string")],
@@ -239,6 +240,7 @@ class InvertBooleanNode(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="InvertBooleanNode",
search_aliases=["not", "toggle", "negate", "flip boolean"],
display_name="Invert Boolean",
category="logic",
inputs=[io.Boolean.Input("boolean")],

View File

@@ -78,6 +78,7 @@ class LoraSave(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="LoraSave",
search_aliases=["export lora"],
display_name="Extract and Save Lora",
category="_for_testing",
inputs=[

View File

@@ -79,6 +79,7 @@ class CLIPTextEncodeLumina2(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="CLIPTextEncodeLumina2",
search_aliases=["lumina prompt"],
display_name="CLIP Text Encode for Lumina2",
category="conditioning",
description="Encodes a system prompt and a user prompt using a CLIP model into an embedding "

View File

@@ -299,6 +299,7 @@ class RescaleCFG:
return (m, )
class ModelComputeDtype:
SEARCH_ALIASES = ["model precision", "change dtype"]
@classmethod
def INPUT_TYPES(s):
return {"required": { "model": ("MODEL",),

View File

@@ -91,6 +91,7 @@ class CLIPMergeSimple:
class CLIPSubtract:
SEARCH_ALIASES = ["clip difference", "text encoder subtract"]
@classmethod
def INPUT_TYPES(s):
return {"required": { "clip1": ("CLIP",),
@@ -113,6 +114,7 @@ class CLIPSubtract:
class CLIPAdd:
SEARCH_ALIASES = ["combine clip"]
@classmethod
def INPUT_TYPES(s):
return {"required": { "clip1": ("CLIP",),
@@ -225,6 +227,7 @@ def save_checkpoint(model, clip=None, vae=None, clip_vision=None, filename_prefi
comfy.sd.save_checkpoint(output_checkpoint, model, clip, vae, clip_vision, metadata=metadata, extra_keys=extra_keys)
class CheckpointSave:
SEARCH_ALIASES = ["save model", "export checkpoint", "merge save"]
def __init__(self):
self.output_dir = folder_paths.get_output_directory()
@@ -337,6 +340,7 @@ class VAESave:
return {}
class ModelSave:
SEARCH_ALIASES = ["export model", "checkpoint save"]
def __init__(self):
self.output_dir = folder_paths.get_output_directory()

View File

@@ -7,6 +7,7 @@ class CLIPTextEncodePixArtAlpha(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="CLIPTextEncodePixArtAlpha",
search_aliases=["pixart prompt"],
category="advanced/conditioning",
description="Encodes text and sets the resolution conditioning for PixArt Alpha. Does not apply to PixArt Sigma.",
inputs=[

View File

@@ -550,6 +550,7 @@ class BatchImagesNode(io.ComfyNode):
node_id="BatchImagesNode",
display_name="Batch Images",
category="image",
search_aliases=["batch", "image batch", "batch images", "combine images", "merge images", "stack images"],
inputs=[
io.Autogrow.Input("images", template=autogrow_template)
],

View File

@@ -16,6 +16,7 @@ class PreviewAny():
OUTPUT_NODE = True
CATEGORY = "utils"
SEARCH_ALIASES = ["show output", "inspect", "debug", "print value", "show text"]
def main(self, source=None):
value = 'None'

View File

@@ -65,6 +65,7 @@ class CLIPTextEncodeSD3(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="CLIPTextEncodeSD3",
search_aliases=["sd3 prompt"],
category="advanced/conditioning",
inputs=[
io.Clip.Input("clip"),

View File

@@ -11,6 +11,7 @@ class StringConcatenate(io.ComfyNode):
node_id="StringConcatenate",
display_name="Concatenate",
category="utils/string",
search_aliases=["text concat", "join text", "merge text", "combine strings", "concat", "concatenate", "append text", "combine text", "string"],
inputs=[
io.String.Input("string_a", multiline=True),
io.String.Input("string_b", multiline=True),

View File

@@ -1101,6 +1101,7 @@ class SaveLoRA(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="SaveLoRA",
search_aliases=["export lora"],
display_name="Save LoRA Weights",
category="loaders",
is_experimental=True,
@@ -1144,6 +1145,7 @@ class LossGraphNode(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="LossGraphNode",
search_aliases=["training chart", "training visualization", "plot loss"],
display_name="Plot Loss Graph",
category="training",
is_experimental=True,

View File

@@ -53,6 +53,7 @@ class ImageUpscaleWithModel(io.ComfyNode):
node_id="ImageUpscaleWithModel",
display_name="Upscale Image (using Model)",
category="image/upscaling",
search_aliases=["upscale", "upscaler", "upsc", "enlarge image", "super resolution", "hires", "superres", "increase resolution"],
inputs=[
io.UpscaleModel.Input("upscale_model"),
io.Image.Input("image"),

View File

@@ -324,6 +324,7 @@ class GenerateTracks(io.ComfyNode):
def define_schema(cls):
return io.Schema(
node_id="GenerateTracks",
search_aliases=["motion paths", "camera movement", "trajectory"],
category="conditioning/video_models",
inputs=[
io.Int.Input("width", default=832, min=16, max=4096, step=16),

View File

@@ -5,6 +5,7 @@ MAX_RESOLUTION = nodes.MAX_RESOLUTION
class WebcamCapture(nodes.LoadImage):
SEARCH_ALIASES = ["camera input", "live capture", "camera feed", "snapshot"]
@classmethod
def INPUT_TYPES(s):
return {

View File

@@ -0,0 +1,88 @@
import node_helpers
from typing_extensions import override
from comfy_api.latest import ComfyExtension, io
import math
import comfy.utils
class TextEncodeZImageOmni(io.ComfyNode):
@classmethod
def define_schema(cls):
return io.Schema(
node_id="TextEncodeZImageOmni",
category="advanced/conditioning",
is_experimental=True,
inputs=[
io.Clip.Input("clip"),
io.ClipVision.Input("image_encoder", optional=True),
io.String.Input("prompt", multiline=True, dynamic_prompts=True),
io.Boolean.Input("auto_resize_images", default=True),
io.Vae.Input("vae", optional=True),
io.Image.Input("image1", optional=True),
io.Image.Input("image2", optional=True),
io.Image.Input("image3", optional=True),
],
outputs=[
io.Conditioning.Output(),
],
)
@classmethod
def execute(cls, clip, prompt, image_encoder=None, auto_resize_images=True, vae=None, image1=None, image2=None, image3=None) -> io.NodeOutput:
ref_latents = []
images = list(filter(lambda a: a is not None, [image1, image2, image3]))
prompt_list = []
template = None
if len(images) > 0:
prompt_list = ["<|im_start|>user\n<|vision_start|>"]
prompt_list += ["<|vision_end|><|vision_start|>"] * (len(images) - 1)
prompt_list += ["<|vision_end|><|im_end|>"]
template = "<|vision_end|>{}<|im_end|>\n<|im_start|>assistant\n<|vision_start|>"
encoded_images = []
for i, image in enumerate(images):
if image_encoder is not None:
encoded_images.append(image_encoder.encode_image(image))
if vae is not None:
if auto_resize_images:
samples = image.movedim(-1, 1)
total = int(1024 * 1024)
scale_by = math.sqrt(total / (samples.shape[3] * samples.shape[2]))
width = round(samples.shape[3] * scale_by / 8.0) * 8
height = round(samples.shape[2] * scale_by / 8.0) * 8
image = comfy.utils.common_upscale(samples, width, height, "area", "disabled").movedim(1, -1)
ref_latents.append(vae.encode(image))
tokens = clip.tokenize(prompt, llama_template=template)
conditioning = clip.encode_from_tokens_scheduled(tokens)
extra_text_embeds = []
for p in prompt_list:
tokens = clip.tokenize(p, llama_template="{}")
text_embeds = clip.encode_from_tokens_scheduled(tokens)
extra_text_embeds.append(text_embeds[0][0])
if len(ref_latents) > 0:
conditioning = node_helpers.conditioning_set_values(conditioning, {"reference_latents": ref_latents}, append=True)
if len(encoded_images) > 0:
conditioning = node_helpers.conditioning_set_values(conditioning, {"clip_vision_outputs": encoded_images}, append=True)
if len(extra_text_embeds) > 0:
conditioning = node_helpers.conditioning_set_values(conditioning, {"reference_latents_text_embeds": extra_text_embeds}, append=True)
return io.NodeOutput(conditioning)
class ZImageExtension(ComfyExtension):
@override
async def get_node_list(self) -> list[type[io.ComfyNode]]:
return [
TextEncodeZImageOmni,
]
async def comfy_entrypoint() -> ZImageExtension:
return ZImageExtension()
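
When `auto_resize_images` is enabled, `TextEncodeZImageOmni` rescales each reference image to roughly 1024 x 1024 pixels of area while keeping the aspect ratio, and snaps both sides to multiples of 8 before VAE encoding. The arithmetic can be checked on its own with the sketch below; `target_size` is a hypothetical helper name, and only the formula is taken from the diff.

```python
import math


def target_size(width: int, height: int, total: int = 1024 * 1024) -> tuple:
    """Return (width, height) scaled to about `total` pixels, rounded to multiples of 8."""
    # Same scale factor as the node: sqrt(target area / current area).
    scale_by = math.sqrt(total / (width * height))
    new_width = round(width * scale_by / 8.0) * 8
    new_height = round(height * scale_by / 8.0) * 8
    return new_width, new_height


if __name__ == "__main__":
    print(target_size(1024, 1024))  # already at the target area
    print(target_size(2048, 1024))  # 2:1 image scaled down
```

Because each side is rounded independently, the resulting area is only approximately `total`, and the aspect ratio can shift slightly for extreme shapes.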

View File

@@ -1,3 +1,3 @@
# This file is automatically generated by the build process when version is
# updated in pyproject.toml.
__version__ = "0.9.2"
__version__ = "0.10.0"

View File

@@ -5,6 +5,7 @@ import torch
import os
import sys
import json
import glob
import hashlib
import inspect
import traceback
@@ -69,6 +70,7 @@ class CLIPTextEncode(ComfyNodeABC):
CATEGORY = "conditioning"
DESCRIPTION = "Encodes a text prompt using a CLIP model into an embedding that can be used to guide the diffusion model towards generating specific images."
SEARCH_ALIASES = ["text", "prompt", "text prompt", "positive prompt", "negative prompt", "encode text", "text encoder", "encode prompt"]
def encode(self, clip, text):
if clip is None:
@@ -85,6 +87,7 @@ class ConditioningCombine:
FUNCTION = "combine"
CATEGORY = "conditioning"
SEARCH_ALIASES = ["combine", "merge conditioning", "combine prompts", "merge prompts", "mix prompts", "add prompt"]
def combine(self, conditioning_1, conditioning_2):
return (conditioning_1 + conditioning_2, )
@@ -293,6 +296,7 @@ class VAEDecode:
CATEGORY = "latent"
DESCRIPTION = "Decodes latent images back into pixel space images."
SEARCH_ALIASES = ["decode", "decode latent", "latent to image", "render latent"]
def decode(self, vae, samples):
latent = samples["samples"]
@@ -345,6 +349,7 @@ class VAEEncode:
FUNCTION = "encode"
CATEGORY = "latent"
SEARCH_ALIASES = ["encode", "encode image", "image to latent"]
def encode(self, vae, pixels):
t = vae.encode(pixels)
@@ -580,6 +585,7 @@ class CheckpointLoaderSimple:
CATEGORY = "loaders"
DESCRIPTION = "Loads a diffusion model checkpoint, diffusion models are used to denoise latents."
SEARCH_ALIASES = ["load model", "checkpoint", "model loader", "load checkpoint", "ckpt", "model"]
def load_checkpoint(self, ckpt_name):
ckpt_path = folder_paths.get_full_path_or_raise("checkpoints", ckpt_name)
@@ -666,6 +672,7 @@ class LoraLoader:
CATEGORY = "loaders"
DESCRIPTION = "LoRAs are used to modify diffusion and CLIP models, altering the way in which latents are denoised such as applying styles. Multiple LoRA nodes can be linked together."
SEARCH_ALIASES = ["lora", "load lora", "apply lora", "lora loader", "lora model"]
def load_lora(self, model, clip, lora_name, strength_model, strength_clip):
if strength_model == 0 and strength_clip == 0:
@@ -813,6 +820,7 @@ class ControlNetLoader:
FUNCTION = "load_controlnet"
CATEGORY = "loaders"
SEARCH_ALIASES = ["controlnet", "control net", "cn", "load controlnet", "controlnet loader"]
def load_controlnet(self, control_net_name):
controlnet_path = folder_paths.get_full_path_or_raise("controlnet", control_net_name)
@@ -889,6 +897,7 @@ class ControlNetApplyAdvanced:
FUNCTION = "apply_controlnet"
CATEGORY = "conditioning/controlnet"
SEARCH_ALIASES = ["controlnet", "apply controlnet", "use controlnet", "control net"]
def apply_controlnet(self, positive, negative, control_net, image, strength, start_percent, end_percent, vae=None, extra_concat=[]):
if strength == 0:
@@ -1199,6 +1208,7 @@ class EmptyLatentImage:
CATEGORY = "latent"
DESCRIPTION = "Create a new batch of empty latent images to be denoised via sampling."
SEARCH_ALIASES = ["empty", "empty latent", "new latent", "create latent", "blank latent", "blank"]
def generate(self, width, height, batch_size=1):
latent = torch.zeros([batch_size, 4, height // 8, width // 8], device=self.device)
@@ -1539,6 +1549,7 @@ class KSampler:
CATEGORY = "sampling"
DESCRIPTION = "Uses the provided model, positive and negative conditioning to denoise the latent image."
SEARCH_ALIASES = ["sampler", "sample", "generate", "denoise", "diffuse", "txt2img", "img2img"]
def sample(self, model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=1.0):
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
@@ -1603,6 +1614,7 @@ class SaveImage:
CATEGORY = "image"
DESCRIPTION = "Saves the input images to your ComfyUI output directory."
SEARCH_ALIASES = ["save", "save image", "export image", "output image", "write image", "download"]
def save_images(self, images, filename_prefix="ComfyUI", prompt=None, extra_pnginfo=None):
filename_prefix += self.prefix_append
@@ -1639,6 +1651,8 @@ class PreviewImage(SaveImage):
self.prefix_append = "_temp_" + ''.join(random.choice("abcdefghijklmnopqrstupvxyz") for x in range(5))
self.compress_level = 1
SEARCH_ALIASES = ["preview", "preview image", "show image", "view image", "display image", "image viewer"]
@classmethod
def INPUT_TYPES(s):
return {"required":
@@ -1657,6 +1671,7 @@ class LoadImage:
}
CATEGORY = "image"
SEARCH_ALIASES = ["load image", "open image", "import image", "image input", "upload image", "read image", "image loader"]
RETURN_TYPES = ("IMAGE", "MASK")
FUNCTION = "load_image"
@@ -1809,6 +1824,7 @@ class ImageScale:
FUNCTION = "upscale"
CATEGORY = "image/upscaling"
SEARCH_ALIASES = ["resize", "resize image", "scale image", "image resize", "zoom", "zoom in", "change size"]
def upscale(self, image, upscale_method, width, height, crop):
if width == 0 and height == 0:
@@ -2372,6 +2388,7 @@ async def init_builtin_extra_nodes():
"nodes_kandinsky5.py",
"nodes_wanmove.py",
"nodes_image_compare.py",
"nodes_zimage.py",
]
import_failed = []
@@ -2384,38 +2401,12 @@ async def init_builtin_extra_nodes():
async def init_builtin_api_nodes():
api_nodes_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "comfy_api_nodes")
api_nodes_files = [
"nodes_ideogram.py",
"nodes_openai.py",
"nodes_minimax.py",
"nodes_veo2.py",
"nodes_kling.py",
"nodes_bfl.py",
"nodes_bytedance.py",
"nodes_ltxv.py",
"nodes_luma.py",
"nodes_recraft.py",
"nodes_pixverse.py",
"nodes_stability.py",
"nodes_runway.py",
"nodes_sora.py",
"nodes_topaz.py",
"nodes_tripo.py",
"nodes_meshy.py",
"nodes_moonvalley.py",
"nodes_rodin.py",
"nodes_gemini.py",
"nodes_vidu.py",
"nodes_wan.py",
]
if not await load_custom_node(os.path.join(api_nodes_dir, "canary.py"), module_parent="comfy_api_nodes"):
return api_nodes_files
api_nodes_files = sorted(glob.glob(os.path.join(api_nodes_dir, "nodes_*.py")))
import_failed = []
for node_file in api_nodes_files:
if not await load_custom_node(os.path.join(api_nodes_dir, node_file), module_parent="comfy_api_nodes"):
import_failed.append(node_file)
if not await load_custom_node(node_file, module_parent="comfy_api_nodes"):
import_failed.append(os.path.basename(node_file))
return import_failed
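
The hunk above replaces the hard-coded list of API node modules with a glob over `nodes_*.py`, so new files are picked up without editing `nodes.py`. A minimal sketch of that discovery step, assuming a hypothetical `load` callback in place of `load_custom_node`:

```python
import glob
import os
from typing import Callable, List


def discover_node_files(directory: str) -> List[str]:
    """Return full paths of nodes_*.py files in `directory`, in sorted order."""
    return sorted(glob.glob(os.path.join(directory, "nodes_*.py")))


def failed_imports(paths: List[str], load: Callable[[str], bool]) -> List[str]:
    """Return the basenames of files for which `load` reported failure."""
    # `load` is a hypothetical stand-in for the real loader; it returns True on success.
    return [os.path.basename(path) for path in paths if not load(path)]
```

Since `glob.glob` already returns paths that include the directory, the loader receives each path unchanged and only the failure report is reduced to a basename, which matches the change from `os.path.join(api_nodes_dir, node_file)` to `node_file` in the diff. Sorting keeps the load order deterministic across platforms.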

View File

@@ -1,6 +1,6 @@
[project]
name = "ComfyUI"
version = "0.9.2"
version = "0.10.0"
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.10"

View File

@@ -1,5 +1,5 @@
comfyui-frontend-package==1.36.14
comfyui-workflow-templates==0.8.11
comfyui-frontend-package==1.37.11
comfyui-workflow-templates==0.8.15
comfyui-embedded-docs==0.4.0
torch
torchsde
@@ -21,7 +21,7 @@ psutil
alembic
SQLAlchemy
av>=14.2.0
comfy-kitchen>=0.2.6
comfy-kitchen>=0.2.7
#non essential dependencies:
kornia>=0.7.1

View File

@@ -682,6 +682,8 @@ class PromptServer():
if hasattr(obj_class, 'API_NODE'):
info['api_node'] = obj_class.API_NODE
info['search_aliases'] = getattr(obj_class, 'SEARCH_ALIASES', [])
return info
@routes.get("/object_info")
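
For legacy (V1) nodes, the server reads `SEARCH_ALIASES` with `getattr` and falls back to an empty list, so nodes that never declare the attribute still produce a well-formed `search_aliases` field. The sketch below shows that lookup in isolation; the node classes and the `node_info` function are hypothetical, and only the `getattr` call mirrors the diff.

```python
class NodeWithAliases:
    # Hypothetical V1-style node declaring alternative search terms.
    SEARCH_ALIASES = ["alt name", "synonym"]


class NodeWithoutAliases:
    # Hypothetical V1-style node with no aliases declared.
    pass


class ChildNode(NodeWithAliases):
    # Declares nothing itself, so the attribute lookup reaches the parent class.
    pass


def node_info(obj_class) -> dict:
    """Build a reduced info dict containing only the fields relevant here."""
    info = {"name": obj_class.__name__}
    info["search_aliases"] = getattr(obj_class, "SEARCH_ALIASES", [])
    return info
```

Because `getattr` follows normal class attribute lookup, a subclass inherits its parent's aliases unless it sets its own. That is presumably why `PreviewImage(SaveImage)` and `WebcamCapture(nodes.LoadImage)` each define a separate `SEARCH_ALIASES` in this change set.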

View File

@@ -1,297 +0,0 @@
from typing import Optional
from enum import Enum
from pydantic import BaseModel, Field
from comfy.comfy_types.node_typing import IO
from comfy_api_nodes.mapper_utils import model_field_to_node_input
def test_model_field_to_float_input():
"""Tests mapping a float field with constraints."""
class ModelWithFloatField(BaseModel):
cfg_scale: Optional[float] = Field(
default=0.5,
description="Flexibility in video generation",
ge=0.0,
le=1.0,
multiple_of=0.001,
)
expected_output = (
IO.FLOAT,
{
"default": 0.5,
"tooltip": "Flexibility in video generation",
"min": 0.0,
"max": 1.0,
"step": 0.001,
},
)
actual_output = model_field_to_node_input(
IO.FLOAT, ModelWithFloatField, "cfg_scale"
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_float_input_no_constraints():
"""Tests mapping a float field with no constraints."""
class ModelWithFloatField(BaseModel):
cfg_scale: Optional[float] = Field(default=0.5)
expected_output = (
IO.FLOAT,
{
"default": 0.5,
},
)
actual_output = model_field_to_node_input(
IO.FLOAT, ModelWithFloatField, "cfg_scale"
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_int_input():
"""Tests mapping an int field with constraints."""
class ModelWithIntField(BaseModel):
num_frames: Optional[int] = Field(
default=10,
description="Number of frames to generate",
ge=1,
le=100,
multiple_of=1,
)
expected_output = (
IO.INT,
{
"default": 10,
"tooltip": "Number of frames to generate",
"min": 1,
"max": 100,
"step": 1,
},
)
actual_output = model_field_to_node_input(IO.INT, ModelWithIntField, "num_frames")
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_string_input():
"""Tests mapping a string field."""
class ModelWithStringField(BaseModel):
prompt: Optional[str] = Field(
default="A beautiful sunset over a calm ocean",
description="A prompt for the video generation",
)
expected_output = (
IO.STRING,
{
"default": "A beautiful sunset over a calm ocean",
"tooltip": "A prompt for the video generation",
},
)
actual_output = model_field_to_node_input(IO.STRING, ModelWithStringField, "prompt")
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_string_input_multiline():
"""Tests mapping a string field."""
class ModelWithStringField(BaseModel):
prompt: Optional[str] = Field(
default="A beautiful sunset over a calm ocean",
description="A prompt for the video generation",
)
expected_output = (
IO.STRING,
{
"default": "A beautiful sunset over a calm ocean",
"tooltip": "A prompt for the video generation",
"multiline": True,
},
)
actual_output = model_field_to_node_input(
IO.STRING, ModelWithStringField, "prompt", multiline=True
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_combo_input():
"""Tests mapping a combo field."""
class MockEnum(str, Enum):
option_1 = "option 1"
option_2 = "option 2"
option_3 = "option 3"
class ModelWithComboField(BaseModel):
model_name: Optional[MockEnum] = Field("option 1", description="Model Name")
expected_output = (
IO.COMBO,
{
"options": ["option 1", "option 2", "option 3"],
"default": "option 1",
"tooltip": "Model Name",
},
)
actual_output = model_field_to_node_input(
IO.COMBO, ModelWithComboField, "model_name", enum_type=MockEnum
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_combo_input_no_options():
"""Tests mapping a combo field with no options."""
class ModelWithComboField(BaseModel):
model_name: Optional[str] = Field(description="Model Name")
expected_output = (
IO.COMBO,
{
"tooltip": "Model Name",
},
)
actual_output = model_field_to_node_input(
IO.COMBO, ModelWithComboField, "model_name"
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_image_input():
"""Tests mapping an image field."""
class ModelWithImageField(BaseModel):
image: Optional[str] = Field(
default=None,
description="An image for the video generation",
)
expected_output = (
IO.IMAGE,
{
"default": None,
"tooltip": "An image for the video generation",
},
)
actual_output = model_field_to_node_input(IO.IMAGE, ModelWithImageField, "image")
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_node_input_no_description():
"""Tests mapping a field with no description."""
class ModelWithNoDescriptionField(BaseModel):
field: Optional[str] = Field(default="default value")
expected_output = (
IO.STRING,
{
"default": "default value",
},
)
actual_output = model_field_to_node_input(
IO.STRING, ModelWithNoDescriptionField, "field"
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_node_input_no_default():
"""Tests mapping a field with no default."""
class ModelWithNoDefaultField(BaseModel):
field: Optional[str] = Field(description="A field with no default")
expected_output = (
IO.STRING,
{
"tooltip": "A field with no default",
},
)
actual_output = model_field_to_node_input(
IO.STRING, ModelWithNoDefaultField, "field"
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_node_input_no_metadata():
"""Tests mapping a field with no metadata or properties defined on the schema."""
class ModelWithNoMetadataField(BaseModel):
field: Optional[str] = Field()
expected_output = (
IO.STRING,
{},
)
actual_output = model_field_to_node_input(
IO.STRING, ModelWithNoMetadataField, "field"
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]
def test_model_field_to_node_input_default_is_none():
"""
Tests mapping a field with a default of `None`.
I.e., the default field should be included as the schema explicitly sets it to `None`.
"""
class ModelWithNoneDefaultField(BaseModel):
field: Optional[str] = Field(
default=None, description="A field with a default of None"
)
expected_output = (
IO.STRING,
{
"default": None,
"tooltip": "A field with a default of None",
},
)
actual_output = model_field_to_node_input(
IO.STRING, ModelWithNoneDefaultField, "field"
)
assert actual_output[0] == expected_output[0]
assert actual_output[1] == expected_output[1]