Compare commits

154 Commits

Author SHA1 Message Date
snomiao
c4a243060b feat: Agent SDK auto-detects Claude Code session — no API key needed locally
ANTHROPIC_API_KEY is optional: the Agent SDK uses the Claude Code OAuth
session when running locally (detected via CLAUDE_CODE_SSE_PORT).
In CI, ANTHROPIC_API_KEY is read from secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 15:07:57 +00:00
snomiao
3690e98c79 refactor: require ANTHROPIC_API_KEY, remove Gemini-only fallback
The Gemini-only agentic loop had a ~47% success rate, too low to be
useful as a fallback. ANTHROPIC_API_KEY is now required for issue
reproduction, and the run fails with a clear error if it is missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 15:01:51 +00:00
snomiao
1c40893cfa fix: inject cursor overlay via addScriptTag after login, not addInitScript
addInitScript runs before page load — Vue's app mount destroys the
cursor div when it takes over the DOM. Using addScriptTag after login
ensures the cursor persists in the stable DOM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 13:30:37 +00:00
snomiao
72a28a1e76 fix: cursor overlay on locator clicks (clickByText, menu items)
Locator.click/hover bypasses our page.mouse monkey-patch. Now
clickByText, hoverMenuItem, clickSubmenuItem get the element
bounding box and update cursor overlay manually.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 11:58:01 +00:00
snomiao
49c95248e5 fix: verdict JSON grep pattern — capture value without closing quote
The grep \{"verdict":\s*"[^"]+ captures up to but not including the
closing quote. The second grep for "[A-Z_]+"$ then fails because
there's no closing quote. Fixed: match "verdict":\s*"[A-Z_]+ then
extract [A-Z_]+$ (no quotes needed).
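For illustration, a hypothetical TypeScript helper, extractVerdict, that mirrors the two-step grep (the actual fix lives in the shell deploy script). Because step one stops before the closing quote, step two must not require one:

```typescript
// Hypothetical TypeScript mirror of the deploy script's two-step grep.
function extractVerdict(report: string): string | null {
  // Step 1 (like grep -oE '"verdict":\s*"[A-Z_]+'): the match ends before the closing quote.
  const first = report.match(/"verdict":\s*"[A-Z_]+/);
  if (!first) return null;
  // Step 2 (like grep -oE '[A-Z_]+$'): take the trailing capitals. The old
  // pattern '"[A-Z_]+"$' can never match here because step 1 dropped the quote.
  const second = first[0].match(/[A-Z_]+$/);
  return second ? second[0] : null;
}

const sample = '## Verdict\n{"verdict": "NOT_REPRODUCIBLE", "risk": "low"}';
console.log(extractVerdict(sample)); // NOT_REPRODUCIBLE
```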

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 10:51:35 +00:00
snomiao
6bb9d18ca6 fix: grep pipefail crash + add QA troubleshooting doc
- Add || true to all grep pipelines in deploy script (grep returns 1
  on no match, pipefail kills script)
- Add docs/qa/TROUBLESHOOTING.md covering all failures encountered:
  __name errors, zod/v4 imports, model IDs, badge mismatches, cursor,
  loadDefaultWorkflow, pressKey timing, agent behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 10:17:36 +00:00
snomiao
ca49e9cb1b feat: structured JSON verdict from AI reviewer, light-first theme
- Video review prompt now requests a ## Verdict JSON block:
  {"verdict": "REPRODUCED|NOT_REPRODUCIBLE|INCONCLUSIVE", "risk": "low|medium|high"}
- Deploy script reads JSON verdict first, falls back to grep
- Eliminates regex-matching false positives whenever the JSON block is present
- Theme: light mode is default, dark via prefers-color-scheme:dark
- Cards use solid backgrounds, grain overlay only in dark mode
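A minimal sketch of the JSON-first read with a text fallback, assuming the verdict object is the first flat {...} after a "## Verdict" heading; readVerdict and its fallback parameter are hypothetical names, not the deploy script's code:

```typescript
// Verdict values come from the prompt format above; readVerdict is hypothetical.
const VERDICTS = ["REPRODUCED", "NOT_REPRODUCIBLE", "INCONCLUSIVE"] as const;
type Verdict = (typeof VERDICTS)[number];

function readVerdict(report: string, fallback: (text: string) => Verdict): Verdict {
  // Only look at text from the "## Verdict" heading onward.
  const start = report.indexOf("## Verdict");
  if (start >= 0) {
    // First flat {...} object in that section that mentions "verdict".
    const json = report.slice(start).match(/\{[^{}]*"verdict"[^{}]*\}/);
    if (json) {
      try {
        const parsed = JSON.parse(json[0]) as { verdict?: unknown };
        if ((VERDICTS as readonly unknown[]).includes(parsed.verdict)) {
          return parsed.verdict as Verdict;
        }
      } catch {
        // Malformed JSON: fall through to the text heuristics.
      }
    }
  }
  return fallback(report);
}
```

Unknown verdict strings are treated like a missing block, so the grep-style heuristics still get a chance.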

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 09:11:09 +00:00
snomiao
db538c9d76 feat: report site follows system light/dark theme
Add prefers-color-scheme:light media query with light palette.
Replace hardcoded dark oklch values with CSS variables.
Light mode: white surfaces, dark text, subtle borders, no grain overlay.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 08:53:08 +00:00
snomiao
c04b31a0f1 fix: loadDefaultWorkflow uses API instead of menu, pressKey uses instant press
- loadDefaultWorkflow now calls app.resetToDefaultWorkflow() via JS API
  instead of navigating File → Load Default menu (menu item name varies)
- pressKey reverted to instant press(): the 400ms hold via down/up
  prevented Escape from propagating to the parent dialog (the #10397 BEFORE
  video showed the wrong behavior because the hold intercepted the event)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 08:45:14 +00:00
snomiao
5938fdef8d fix: monkey-patch page.mouse for universal cursor overlay
Instead of manually calling moveCursorOverlay in each action,
patch page.mouse.move/click/dblclick/down/up globally. Now EVERY
mouse operation shows the cursor — text clicks, menu hovers, etc.
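A sketch of the patching approach, written against a hand-written MouseLike interface (an assumption that mirrors the coordinate-taking methods of Playwright's page.mouse) so it runs without a browser. In the recorder, onMove would forward the coordinates into the page, for example to the __moveCursor function mentioned in b4c49588dc. down() and up() take no coordinates, so they are left out here:

```typescript
// Hand-written subset of Playwright's Mouse; only coordinate-taking methods are patched.
interface MouseLike {
  move(x: number, y: number, options?: object): Promise<void>;
  click(x: number, y: number, options?: object): Promise<void>;
  dblclick(x: number, y: number, options?: object): Promise<void>;
}

// Wrap each method so the cursor overlay is updated on every mouse operation.
function patchMouse(mouse: MouseLike, onMove: (x: number, y: number) => Promise<void>): void {
  const move = mouse.move.bind(mouse);
  const click = mouse.click.bind(mouse);
  const dblclick = mouse.dblclick.bind(mouse);
  mouse.move = async (x, y, options) => {
    await move(x, y, options);
    await onMove(x, y);
  };
  mouse.click = async (x, y, options) => {
    // Move the overlay first so the video shows where the click lands.
    await onMove(x, y);
    await click(x, y, options);
  };
  mouse.dblclick = async (x, y, options) => {
    await onMove(x, y);
    await dblclick(x, y, options);
  };
}
```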

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:47:56 +00:00
snomiao
27d55f093b fix: use gemini-3-flash-preview in hybrid agent (not 2.5 preview)
Gemini 2.5 preview models return 404. Always use gemini-3+ models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:45:29 +00:00
snomiao
c1ddb8669e fix: add 'could not be confirmed' to negative verdict patterns
"could not be confirmed" contains "confirmed" which matched the
positive reproduc|confirm check. Now caught by the negative check first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:17:02 +00:00
snomiao
57dbf0132d fix: correct Claude model ID — claude-sonnet-4-6 (not dated suffix)
The Agent SDK returned "model not found" for claude-sonnet-4-6-20250514.
Correct ID is claude-sonnet-4-6.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 14:36:22 +00:00
snomiao
b4c49588dc fix: cursor overlay now controlled via __moveCursor, not DOM events
Playwright input sent over CDP doesn't reliably trigger DOM mousemove
events in headless Chrome. Now executeAction calls __moveCursor(x,y) directly after
every mouse.move/click/drag. Cursor is an SVG arrow (white + outline).
Click state shown via scale animation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 13:25:12 +00:00
snomiao
00dc10e9e6 feat: badge label includes QA date — #10397 QA0327
Shows when the QA was run so stale results are obvious at a glance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 08:43:19 +00:00
snomiao
48414fa1b5 fix: make key presses visible in video — hold + subtitle
pressKey now uses keyboard.down/up with 400ms hold instead of
instant press(). Shows subtitle "⌨ Escape" and the keyboard HUD
catches the held state for video frame capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 08:33:22 +00:00
snomiao
e744f101b0 fix: use zod instead of zod/v4 — project zod doesn't export /v4 subpath
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:51:37 +00:00
snomiao
9ad8267067 fix: add claude-agent-sdk to workspace catalog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:26:54 +00:00
snomiao
cf54ddb6d3 feat: control/test comparison strategy + QA backlog doc
- Agent system prompt now instructs Claude to demonstrate BOTH working
  (control) and broken (test) states when the bug is triggered by a setting
- Added docs/qa/backlog.md with future improvements: Type B/C comparisons,
  TTS, pre-seeding, cost optimization, environment-dependent issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:24:10 +00:00
snomiao
fce31cf0bf feat: all badges use vertical box style
Drop horizontal badges. Universal box badge shows:
  ┌──────────────────┐
  │    #7414 QA      │
  │ ✓ 1 reproduced   │
  │ ⚙ Fix: APPROVED  │  ← only for PRs
  └──────────────────┘

Issues show repro/not-repro/inconclusive rows.
PRs add a fix quality row (APPROVED/MINOR/MAJOR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:20:05 +00:00
snomiao
050091abc6 feat: show QA pipeline commit hash + timing on report site
- Shows "QA @ abc1234" linking to the pipeline code commit
- Shows start time → deploy time in header
- Helps trace which version of QA scripts generated each report

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:16:41 +00:00
snomiao
548e37b9a5 feat: hybrid QA agent — Claude Sonnet 4.6 brain + Gemini vision
Architecture:
- Claude Sonnet 4.6 plans and reasons (via Claude Agent SDK)
- Gemini 2.5 Flash watches the video buffer and describes what it sees
- 4 tools: observe(), inspect(), perform(), done()

observe(seconds, focus): builds video clip from screenshot buffer,
  sends to Gemini with Claude's focused question.
inspect(selector): searches a11y tree for specific element state.
perform(action, params): executes Playwright action.
done(verdict, summary): signals completion.

Falls back to the Gemini-only loop if ANTHROPIC_API_KEY is not set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 06:27:30 +00:00
snomiao
458b2e918c feat: pass OPENAI_API_KEY to recording step for TTS narration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 04:53:13 +00:00
snomiao
63dbe002d1 feat: subtitle overlay + OpenAI TTS narration on reproduce videos
- Agent reasoning shown as subtitle bar at bottom of video during recording
- After recording, generates TTS audio via OpenAI API (tts-1, nova voice)
- Merges audio clips at correct timestamps into the video with ffmpeg
- Requires OPENAI_API_KEY env var; gracefully skips if not set
- Launch Chrome with --no-sandbox and --disable-dev-shm-usage for headless compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 04:42:03 +00:00
snomiao
f3d9a8c2e4 feat: show test requirements from QA guide on report site
- Download QA guide artifact in report job
- Extract prerequisites, test focus, and steps from guide JSON
- Display below the purpose description: focus → prerequisites → steps
- Separated by a subtle divider with smaller font

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 03:28:03 +00:00
snomiao
831718bd50 feat: purpose description on report + multi-pass video link fix
- Report site shows "PR #N aims to..." or "Issue #N reports..." block
  above the video cards, extracted from pr-context.txt
- Multi-pass video links fall back to pass1 when qa-{os}.mp4 is 404
- More negative verdict patterns: "does not demonstrate", "never tested"
- Risk uses first word of Overall Risk (avoids "high confidence" match)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 03:23:50 +00:00
snomiao
7d3ddbf619 fix: verdict detection — more negative patterns, risk uses first word
- Add "does not demonstrate", "steps were not performed", "never tested"
  to NOT_REPRO patterns (fixes #9101 false positive)
- Risk detection uses first word of Overall Risk section instead of
  grepping entire text (fixes "high confidence" matching HIGH)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 02:23:18 +00:00
snomiao
a42240cb65 fix: use addScriptTag for keyboard HUD to avoid tsx __name issue
tsx compiles arrow functions with __name helpers that don't exist in the
browser context. Using addScriptTag with a plain JS string avoids this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 02:03:34 +00:00
snomiao
c957c0833b fix: remove TS type annotation from page.evaluate (browser context)
Set<string>() in page.evaluate causes __name ReferenceError in browser.
Use untyped Set() since browser JS doesn't support TS generics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:55:46 +00:00
snomiao
e01d4aaffc debug: add verdict count logging to deploy script 2026-03-27 01:54:14 +00:00
snomiao
3f226467cd fix: check negative verdicts before positive in per-report classification
"fails to reproduce" contains "reproduce" — must check negatives first
within each report. Across reports, REPRODUCED still wins (multi-pass).
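A sketch of the per-report ordering and the cross-pass rule. The negative phrases are collected from several commits in this log, plus "not reproduc", which is an assumption; classifyReport and combinePasses are hypothetical names, and how NOT_REPRODUCIBLE and INCONCLUSIVE combine is not stated here, so that rule is assumed too:

```typescript
type Verdict = "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE";

// Negative phrases from these commits; "not reproduc" is an assumed extra entry.
const NEGATIVE =
  /fails to reproduce|could not be confirmed|does not demonstrate|steps were not performed|never tested|not reproduc/i;
// Positive stem (see 24ac6f1566): matches reproduced, reproducible, confirms, confirmed.
const POSITIVE = /reproduc|confirm/i;

function classifyReport(summary: string): Verdict {
  // Negatives first: "fails to reproduce" also contains the positive stem "reproduc".
  if (NEGATIVE.test(summary)) return "NOT_REPRODUCIBLE";
  if (POSITIVE.test(summary)) return "REPRODUCED";
  return "INCONCLUSIVE";
}

// Across passes, one confirmed reproduction wins; the remaining order is assumed.
function combinePasses(verdicts: Verdict[]): Verdict {
  if (verdicts.includes("REPRODUCED")) return "REPRODUCED";
  if (verdicts.length === 0 || verdicts.includes("INCONCLUSIVE")) return "INCONCLUSIVE";
  return "NOT_REPRODUCIBLE";
}
```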

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:41:06 +00:00
snomiao
d4d8772ae7 feat: keyboard HUD overlay shows pressed keys in video
Injects a persistent overlay in bottom-right corner that displays
currently held keys (e.g. "⌨ Space", "⌨ CTRL+C"). Makes keyboard
interactions visible in the recording for both human and AI reviewers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:38:44 +00:00
snomiao
b578f8d7c4 feat: vertical box badge for multi-pass with breakdown
Multi-pass issues show a stacked box badge:
  ┌───────────────────┐
  │  #7806 QA         │
  │ ✓ 1 reproduced    │
  │ ⚠ 1 inconclusive  │
  └───────────────────┘

Single-pass issues keep the standard horizontal badge.
Badge colors: blue=reproduced, gray=not-repro, yellow=inconclusive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:49:19 +00:00
snomiao
d5360ce45c feat: show pass counts in badge for multi-pass reports (X/Y REPRODUCED)
When multiple report files exist, badge shows "2/3 REPRODUCED" instead
of just "REPRODUCED". Single-pass issues still show plain verdict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:45:16 +00:00
snomiao
0e6d9fd926 fix: REPRODUCED wins over INCONCLUSIVE in multi-pass badge
When multiple passes exist and one confirms while another is
inconclusive, the badge should show REPRODUCED. Previously
INCONCLUSIVE was checked first, hiding successful reproductions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:36:59 +00:00
snomiao
de810f88a4 fix: cloneNode uses Ctrl+C/V instead of right-click Clone menu
The "Clone" context menu item doesn't exist in Nodes 2.0 mode.
Using Ctrl+C/Ctrl+V works in both legacy and Nodes 2.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:22:31 +00:00
snomiao
722b01a253 fix: preflight performs actual repro steps, not just setup
- #10307: preflight clones KSampler node, hint says drag to overlap
- #7414: preflight clicks numeric widget, hint says drag to change value
- #7806: preflight takes baseline screenshot, hint gives exact coords
  for holdKeyAndDrag with spacebar
- Hints now reference "Preflight already did X, NOW do Y" pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:07:42 +00:00
snomiao
dfd19a3cf9 fix: tell agent what preflight already did to prevent repeated actions
Agent was wasting turns re-doing loadDefaultWorkflow and setSetting
that preflight already executed. Now the system prompt includes
"Already Done" section listing preflight actions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:40:48 +00:00
snomiao
3531e37ae7 fix: preflight actions + badge false-positive pattern
- Auto-execute prerequisite actions (enable Nodes 2.0, load default
  workflow) BEFORE the agentic loop starts. The agent model ignores prompt
  hints, but preflight guarantees nodes are on the canvas.
- Add "fails to reproduce" to NOT REPRODUCIBLE badge patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:25:41 +00:00
snomiao
024b231c05 feat: qa-issue label trigger + labels in issue context
- Add issues:[labeled] trigger and qa-issue label support
- Resolve github.event.issue.number for issue-triggered runs
- Include issue labels in context (feeds keyword matcher for hints)
- Remove qa-issue label after run completes (same as qa-changes/qa-full)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:49:00 +00:00
snomiao
0c22369c60 fix: keyword-driven action hints for agent issue reproduction
Scan issue context for keywords (clone, copy-paste, spacebar, resize,
sidebar, scroll, middle-click, node shape, Nodes 2.0, etc.) and inject
specific MUST-follow action steps into the agentic system prompt.

Addresses 9 INCONCLUSIVE issues where agent had actions available
but didn't know when to use them.
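A minimal sketch of the keyword scan; the table entries are hypothetical examples that reuse action names from this log (copyPaste, holdKeyAndDrag, resizeNode, middleClick), not the recorder's real hint text:

```typescript
// Hypothetical keyword table; the real keywords and hint wording live in the recorder script.
const HINTS: ReadonlyArray<{ pattern: RegExp; hint: string }> = [
  { pattern: /\bclone\b|copy-paste/i, hint: "Select a node, then run copyPaste (Ctrl+C, Ctrl+V)." },
  { pattern: /spacebar/i, hint: "Use holdKeyAndDrag with Space held to pan the canvas." },
  { pattern: /resize/i, hint: "Use resizeNode on an existing node." },
  { pattern: /middle-click/i, hint: "Use middleClick on the canvas." },
];

// Returns the MUST-follow lines to append to the agent's system prompt.
function actionHints(issueContext: string): string[] {
  return HINTS.filter(({ pattern }) => pattern.test(issueContext)).map(({ hint }) => `MUST: ${hint}`);
}
```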

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:43:31 +00:00
snomiao
511fdf1b24 fix: restyle QA annotations to avoid misleading AI reviewer
- Annotations now use cyan dashed border + monospace "QA:" prefix
  instead of red solid labels that look like UI error messages
- Video review prompts explicitly tell reviewer to ignore QA annotations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:34:51 +00:00
snomiao
d756c362e3 fix: badge mismatch, multi-pass report overwrite, agent node creation
P1: Filter out QA bot's own comments from pr-context (INCONCLUSIVE loop)
P2: Grep only ## Summary section for verdict (false REPRODUCED fix)
P3: Strip markdown bold before matching Overall Risk section
P4: Deploy full placeholder page with spinner during CI
P5: Pass #NUM QA label to PREPARING/ANALYZING badges
P6: Add copyPaste, holdKeyAndDrag, resizeNode, middleClick actions
P7: preload=auto + custom seekbar (already deployed)
P8: Deploy FAILED badge on report job failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:02:17 +00:00
snomiao
ba512fd263 fix: video seeking — preload=auto, custom seekbar, _headers
- Change preload=metadata to preload=auto for full video download
- Add _headers file with Accept-Ranges for Cloudflare Pages
- Add custom seekbar (range input + buffer indicator) that works
  even without server HTTP range request support
- Seekbar shows buffered progress and allows dragging to any point

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:20:56 +00:00
snomiao
e1fb782832 fix: strengthen after-mode prompt to test PR-specific behavior
Previous prompt said "test the specific behavior" which was too vague,
leading to generic UI walkthroughs instead of targeted tests.

New prompt: explicitly instructs to read the diff, trigger the exact
scenario the PR fixes, and avoid generic menu screenshots.

Also added reload action to before/after prompt for state persistence tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:58:02 +09:00
snomiao
9c347642ba fix: badge mismatch, multi-pass report overwrite, agent node creation
- Fix quality badge now reads "## Overall Risk" section only
- Prevents false MAJOR ISSUES from severity labels or negated phrases
- "Low" risk → APPROVED, "High" → MAJOR ISSUES, "Medium" → MINOR ISSUES
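A sketch of the section-scoped read, assuming the risk word is the first word of the first non-empty line under "## Overall Risk" (as in 7d3ddbf619) after markdown bold is stripped (as in d756c362e3); riskToBadge is a hypothetical name:

```typescript
type FixBadge = "APPROVED" | "MINOR ISSUES" | "MAJOR ISSUES";

// Hypothetical helper: reads only the "## Overall Risk" section.
function riskToBadge(report: string): FixBadge | null {
  const lines = report.split("\n");
  const start = lines.findIndex((line) => line.trim().startsWith("## Overall Risk"));
  if (start < 0) return null;
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // reached the next section without a risk word
    // Strip markdown bold, then use only the first word so "high confidence" later is ignored.
    const text = line.replace(/\*\*/g, "").trim();
    if (!text) continue;
    const word = (text.split(/\s+/)[0] ?? "").toLowerCase().replace(/[^a-z]/g, "");
    if (word === "low") return "APPROVED";
    if (word === "medium") return "MINOR ISSUES";
    if (word === "high") return "MAJOR ISSUES";
    return null;
  }
  return null;
}
```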

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 03:47:54 +00:00
snomiao
5f8f40b559 fix: install pnpm before building PR frontend in sno-qa-* triggers
setup-frontend must run first to install node/pnpm, then rebuild
with PR code. Also re-install sno-skills deps after switching back
so QA scripts' dependencies are available.

Also gitignore .claude/scheduled_tasks.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:17:26 +09:00
snomiao
264e71a9de fix: restore sno-skills scripts after building PR frontend
When triggered via sno-qa-* push, the workflow checks out the PR code
to build its frontend, but this replaces qa-record.ts which only
exists on sno-skills. Fix: build PR frontend, then checkout back to
sno-skills so QA scripts remain available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:56:12 +09:00
snomiao
49fa1b3caa fix: use array subshell instead of mapfile for macOS compat
mapfile is not available in the bash 3.2 that ships with macOS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:22:05 +09:00
snomiao
3142c8ead6 fix: resolve remaining shellcheck warnings in qa-deploy-pages.sh
- SC2231: quote glob expansions in for loop
- SC2002: use sed directly instead of cat | sed
- SC2086: quote variable in echo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:52:12 +09:00
snomiao
4310c5238a feat: clickable badges with #NUM label and copy button
- Badge generators accept optional label param (#NUM QA)
- Badge in PR/issue comments links to report site
- Report site shows badge with copy-to-clipboard button
- Copy button produces markdown: [![QA Badge](url/badge.svg)](url/)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 15:34:23 +00:00
snomiao
a7d7d39712 fix: resolve CI failures — test, shellcheck, format
- Update DefaultThumbnail test to match size-full class change
- Fix shellcheck warnings in qa-batch.sh (SC2001, SC2207)
- Fix shellcheck warnings in qa-deploy-pages.sh (SC2034, SC2235, SC2231, SC2002)
- Add qa-report-template.html to oxfmt ignore (minified, not formattable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:33:29 +09:00
snomiao
24ac6f1566 fix: badge pattern too narrow/broad, multi-pass video discovery
- "confirmed" didn't match "confirms"/"reproducible" — use "reproduc|confirm" stem
- "partial" matched unrelated text — require "partially reproduced" specifically
- collectVideoCandidates now finds qa-session-*.mp4 for multi-pass reviews

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 12:18:53 +00:00
snomiao
f9a5baba1a fix: badge mismatch, multi-pass report overwrite, agent node creation
- Check INCONCLUSIVE before reproduced/confirmed in badge detection
- Exclude markdown headings from reproduced grep match
- Add --pass-label to qa-video-review.ts for unique multi-pass filenames
- Pass pass label from workflow YAML when reviewing numbered sessions
- Collect all pass-specific reports in deploy script HTML
- Add addNode/cloneNode convenience actions to qa-record agent
- Improve strategy hints for visual/rendering bug reproduction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 10:48:38 +00:00
snomiao
628f64631b chore: tidy PR for merge — resolve TODOs, fix misplaced import
- Remove push trigger (was for dev testing only)
- Restore concurrency group (was commented out for dev)
- Move misplaced import in qa-analyze-pr.ts to top of file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 19:40:55 +09:00
snomiao
bc38b2ce13 fix: extract deploy step into script to fix expression length limit
The Cloudflare Pages deploy step exceeded GitHub Actions' 21000 char
expression limit due to inline HTML/CSS/JS. Extract to
scripts/qa-deploy-pages.sh + scripts/qa-report-template.html.
2026-03-25 09:24:17 +00:00
snomiao
ca8e4d2a29 fix: enforce menu navigation pattern + add CI job link to report
- Strengthen prompt: MUST use openMenu → hoverMenuItem → clickMenuItem
  in that order. Previous runs skipped openMenu causing silent failures.
- Add CI Job link to the QA report site header for quick navigation
  to the GitHub Actions run that generated the report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 17:46:46 +09:00
snomiao
55ce174c5b fix: split dual badge generator into separate step to fix expression length
GitHub Actions has a 21000 char limit per expression. The combined
badge setup step exceeded this after adding the dual badge generator.
Split into its own step.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 08:37:23 +00:00
snomiao
49176271f8 chore: retrigger QA pipeline 2026-03-25 06:23:00 +00:00
snomiao
50518449fc fix: remove Bug: prefix from PR badge, just show REPRODUCED directly
Badge now reads: QA Bot | REPRODUCED | Fix: APPROVED
Not all issues are bugs — could be feature requests too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:21:02 +09:00
snomiao
65aa03b20d fix: use blue for REPRODUCED badge (success, not failure)
Reproducing a bug is a successful outcome for the QA bot.
Blue (#2196f3) = bot succeeded. Red = bot found problems with the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:19:28 +09:00
snomiao
ce59b6a431 feat: combine PR bug+fix into single dual-segment badge
PRs now show one badge with three segments:
  QA Bot | Bug: REPRODUCED | Fix: APPROVED

Instead of two separate badges. Uses gen-badge-dual.sh which
renders label + bug status + fix status in one SVG.

Issues still use single two-segment badge:
  QA Bot | FINISHED: REPRODUCED

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:14:35 +09:00
snomiao
203e8c8a60 feat: add visible cursor overlay and annotation action to QA recorder
- Inject fake cursor (red dot with click animation) via addInitScript
  since headless Chrome doesn't render the system cursor in video
- Add hover-before-click delay to clickByText and canvas clicks
  so viewers can see where the cursor moves before clicking
- Add 'annotate' action: shows a floating label at (x,y) for N ms
  so AI can draw viewer attention to important UI state changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:50:25 +09:00
snomiao
b485d22760 fix: feed QA guide + issue context to agentic reproduce loop
Root cause: runAgenticLoop never read the QA guide — agent saw
"No issue context provided" for issues. Now reads qaGuideFile,
parses structured fields, and injects into system prompt.

Also: fetch issue body via gh issue view in workflow, increase
budget to 120s/30 turns, add focus reminders, smarter stuck
detection (50px grid normalization + action-type frequency),
reject invalid click targets, add loadDefaultWorkflow and
openSettings convenience actions, strategy hints in prompt.
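One way the stuck check could work, as a sketch: nearby clicks collapse into one 50px grid cell, and a window filled with a single action type also counts as stuck. The window size, repeat threshold, and isStuck itself are assumptions; the commit only names the two techniques:

```typescript
interface AgentAction { type: string; x?: number; y?: number }

const GRID = 50; // px, from the commit message

// Hypothetical thresholds: the commit names the techniques but not these numbers.
function isStuck(history: AgentAction[], windowSize = 6, maxRepeats = 3): boolean {
  const recent = history.slice(-windowSize);
  // Action-type frequency: a full window of one action type means no progress.
  if (recent.length >= windowSize && recent.every((a) => a.type === recent[0].type)) return true;
  // Grid normalization: coordinates in the same 50px cell count as the same target.
  const counts = new Map<string, number>();
  for (const a of recent) {
    const cell =
      a.x !== undefined && a.y !== undefined ? `${Math.round(a.x / GRID)},${Math.round(a.y / GRID)}` : "none";
    const key = `${a.type}@${cell}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.values()).some((n) => n >= maxRepeats);
}
```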

Fix pre-existing typecheck error in eslint.config.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 05:47:04 +00:00
snomiao
2d1088f79e feat: dual badges for PRs — bug reproduction + fix quality
PRs now get two separate badges:
- Bug: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (before branch)
- Fix: APPROVED / MAJOR ISSUES / MINOR ISSUES (after branch)

Issues keep a single badge: FINISHED: REPRODUCED / etc.

Both badge-bug.svg and badge-fix.svg served from the deploy site.
PR comment shows all three: ![badge] ![bug] ![fix]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:39:15 +09:00
snomiao
1fe0f97aa5 fix: badge FINISHED state includes result sub-state
FINISHED is not standalone — always shows result:
- FINISHED: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (issues)
- FINISHED: APPROVED / MAJOR ISSUES / MINOR ISSUES (PRs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:25:53 +09:00
snomiao
178ecc6746 feat: add SVG status badge to QA report site
Badge shows QA pipeline status, deployed at each stage:
- PREPARING (blue) — setting up artifacts
- ANALYZING (orange) — running video review
- Final status with color:
  - Issues: REPRODUCED (red) / NOT REPRODUCIBLE (gray) / PARTIAL (yellow)
  - PRs: APPROVED (green) / MAJOR ISSUES (red) / MINOR ISSUES (yellow)

Badge served as /badge.svg from the same Cloudflare Pages site.
Included in PR comment as ![QA Badge](url/badge.svg).

Also restore @ts-expect-error for import-x plugin type incompatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:04:57 +09:00
GitHub Action
20f878f929 [automated] Apply ESLint and Oxfmt fixes 2026-03-25 04:31:51 +00:00
snomiao
712c386a69 fix: regenerate pnpm-lock.yaml after rebase
The rebase introduced a duplicated mapping key in the lockfile,
causing ERR_PNPM_BROKEN_LOCKFILE in CI.
2026-03-25 04:28:26 +00:00
snomiao
8f6fa738a5 fix: use correct flash model name for agentic loop 2026-03-25 03:19:28 +00:00
snomiao
387862d8b9 feat: agentic screenshot feedback loop + multi-pass recording
Replace single-shot step generation in reproduce mode with an agentic
loop where Gemini sees the screen after each action and decides what
to do next. For multi-bug issues, decompose into sub-issues and run
separate recording passes.

- Extract executeAction() from executeSteps() for reuse
- Add reload and done action types
- Add captureScreenshotForGemini() (JPEG q50, ~50KB)
- Add runAgenticLoop() with sliding window history (3 screenshots)
- Add decomposeIssue() for multi-pass recording (1-3 sub-issues)
- Update workflow to handle numbered session videos (qa-session-1, etc.)
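A sketch of one plausible reading of the sliding window: every turn keeps its action text, but only the three most recent turns keep their screenshot. Turn and slidingHistory are hypothetical names:

```typescript
// Hypothetical turn record sent back to the model on each loop iteration.
interface Turn { action: string; screenshotJpeg?: Uint8Array }

// Keep every action's text but drop screenshots older than the last `keep` turns.
function slidingHistory(turns: Turn[], keep = 3): Turn[] {
  const cutoff = turns.length - keep;
  return turns.map((turn, i) => (i < cutoff ? { action: turn.action } : turn));
}
```

This bounds the image payload per request (roughly three ~50KB JPEGs) regardless of how many turns the loop has run.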
2026-03-25 03:18:01 +00:00
snomiao
ff98eb13e4 feat: add frame-by-frame video controls to QA report player
- Add custom video controls below each video with frame stepping
- Frame back/forward buttons (1 frame at 30fps, 10 frames skip)
- Speed selector: 0.1x, 0.25x, 0.5x (default), 1x, 1.5x, 2x
- Keyboard shortcuts: arrow keys for frame step, space for play/pause
- SMPTE-style timecode display (m:ss.ms)
- Default 0.5x speed since AI operates UI faster than humans
- Videos no longer autoplay (pause on load for inspection)
- Zero external dependencies (pure HTML5 video API)
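The stepping and timecode arithmetic, sketched as pure functions so it can be checked outside a browser (stepFrames and formatTimecode are hypothetical names). In the player, the result would be assigned to the video element's currentTime, and the 0.5x default to its playbackRate:

```typescript
const FPS = 30; // frame rate assumed by the player controls

// New playback position after stepping `frames` frames (negative = back), clamped to the video.
function stepFrames(currentTime: number, frames: number, duration: number): number {
  return Math.min(Math.max(currentTime + frames / FPS, 0), duration);
}

// m:ss.mmm display, e.g. 75.5 s -> "1:15.500".
function formatTimecode(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const m = Math.floor(totalMs / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${m}:${String(s).padStart(2, "0")}.${String(ms).padStart(3, "0")}`;
}
```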

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
f4f80d179f fix: cap reproduce video at 5min, skip env setup in Phase 4
- Reproduce video must be max 5 minutes (short, focused demo)
- Phase 4 reuses the environment from Phase 3 (no re-setup)
- Use video-start/video-stop commands (not --save-video flag)
- Start recording right before steps, stop immediately after

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
f758e16b72 feat: add reproduce-issue skill with two-video architecture
New Claude agent-driven issue reproduction skill that:
- Phase 1-2: Research issue and set up environment (custom nodes, workflows, settings)
- Phase 3: Record research video while exploring interactively via playwright-cli
- Phase 4: Record clean reproduce video with only the minimal repro steps
- Phase 5: Generate structured reproduction report

Key difference from the old approach: Claude agent explores and adapts
instead of blindly executing a Gemini-generated static plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
45e309c5f8 fix: add default workflow node positions to QA prompts
Gemini was right-clicking empty canvas instead of nodes because it
didn't know where the default workflow nodes are positioned. Now the
prompt includes approximate coordinates for all 7 default nodes and
clarifies the difference between node context menu vs canvas menu.

Also fixes TS2352 in page.evaluate by using double-cast through unknown.
2026-03-25 03:18:01 +00:00
snomiao
79df405733 feat: upgrade QA pipeline to Gemini 3.x models
- qa-record.ts, qa-analyze-pr.ts: gemini-2.5-flash/pro → gemini-3.1-pro-preview
- qa-video-review.ts, qa-generate-test.ts: gemini-2.5-flash → gemini-3-flash-preview
- pr-qa.yaml: update hardcoded model reference
- Add docs/qa/models.md with model comparison and rationale
2026-03-25 03:18:01 +00:00
snomiao
27c64e1092 feat: add loadWorkflow and setSetting actions to QA recorder
- Add loadWorkflow action to load workflow JSON from URL
- Add setSetting action to configure ComfyUI settings
- Improve reproduce mode prompt to emphasize establishing prerequisites
  (save workflow first, make it dirty, add needed nodes, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
a1307ef35c feat: add clickable issue/PR URLs to QA reports
- Add --target-url CLI option to qa-video-review.ts
- Include target URL in generated markdown reports
- Add clickable issue/PR link in deployed HTML report header
- Workflow passes the target URL automatically

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:17:56 +00:00
snomiao
11f02a2645 fix: resolve pre-existing typecheck errors
- Remove unused @ts-expect-error directives in eslint.config.ts
- Simplify LazyImage prop types from ClassValue to string
- Fix DialogInstance to avoid infinitely deep type instantiation
- Use cn() in DefaultThumbnail for class merging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:17:30 +00:00
snomiao
b3bcc3ff4c fix: improve fillDialog and clickSubmenuItem for litegraph UI components
- fillDialog now tries: PrimeVue dialog → node search box → focused input → keyboard fallback
- clickSubmenuItem now tries: PrimeVue tiered menu → litegraph context menu → role menuitem
- Fixes double-click-to-add-node flow and right-click context menu clicks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:30 +00:00
snomiao
49e904918e fix: add step-level error resilience and click timeout in QA recording
- Wrap each step in try/catch so failed steps don't abort the recording
- Add 5s timeout to clickByText to prevent 30s hangs on disabled elements
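A minimal sketch of per-step error isolation; Step and runSteps are hypothetical names. With Playwright, the 5s cap would be passed inside execute, for example as locator.click({ timeout: 5000 }):

```typescript
// Hypothetical step shape; the recorder's real step type has more fields.
interface Step { action: string; [key: string]: unknown }

// Run every step; a failing step is logged and skipped instead of aborting the recording.
async function runSteps(steps: Step[], execute: (step: Step) => Promise<void>): Promise<string[]> {
  const errors: string[] = [];
  for (let i = 0; i < steps.length; i++) {
    try {
      await execute(steps[i]);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`step ${i + 1} (${steps[i].action}): ${message}`);
    }
  }
  return errors;
}
```

The same wrapper also covers the invalid-key case from 1d880bf493: an unknown pressKey value becomes one logged error rather than a crashed session.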

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
78e5b1e1b3 feat: expand QA action set and improve issue reproduction depth
- Add new canvas actions: rightClick, doubleClick, clickCanvas,
  rightClickCanvas, dragCanvas, scrollCanvas for node graph interactions
- Increase reproduce mode step limit from 3-6 to 8-15 steps
- Add ComfyUI UI context to prompts (canvas layout, node interactions)
- Add anti-hallucination instructions to video review for issue mode
- Improve issue analysis prompt with detailed action descriptions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
81e3dc72f8 fix: use pulls API instead of gh pr view for PR/issue detection
gh pr view can't distinguish PRs from issues — it succeeds for both.
Use the REST API endpoint repos/{owner}/{repo}/pulls/{number} which
returns 404 for issues.
2026-03-25 03:17:29 +00:00
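A hedged TypeScript sketch of the detection this commit describes. The workflow itself presumably calls the endpoint through `gh api`. Here `isPullRequest` and the `FetchLike` type are hypothetical, and the function takes a fetch implementation so it can be tested offline. A 404 is also returned for numbers that do not exist or that the token cannot see, so a caller may want to confirm against the issues endpoint before treating the number as an issue.

```typescript
// Minimal subset of the fetch signature, so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ status: number }>

// The pulls endpoint answers 200 for a pull request and 404 for a plain issue.
async function isPullRequest(
  owner: string,
  repo: string,
  num: number,
  fetchFn: FetchLike,
  token?: string
): Promise<boolean> {
  const headers: Record<string, string> = { Accept: 'application/vnd.github+json' }
  if (token) headers.Authorization = `Bearer ${token}`
  const res = await fetchFn(`https://api.github.com/repos/${owner}/${repo}/pulls/${num}`, { headers })
  if (res.status === 200) return true
  if (res.status === 404) return false
  throw new Error(`unexpected status ${res.status} from pulls API`)
}
```

In Node 18+ the global `fetch` satisfies `FetchLike`, so production code can call `isPullRequest(owner, repo, num, fetch, process.env.GITHUB_TOKEN)`.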
snomiao
936cf83337 fix: address Copilot review feedback on QA scripts
- Enforce requestTimeoutMs via Gemini SDK requestOptions
- Add 100MB video size check before base64 encoding
- Sanitize screenshot filenames to prevent path traversal
- Sort video files by mtime for reliable rename
- Validate --mode arg against allowed values
- Add Content-Length pre-check in downloadMedia
- Add GitHub domain allowlist for media downloads (SSRF mitigation)
- Add contents:write permission and git config for report job
- Update Node.js requirement in SKILL.md from 18+ to 22+
2026-03-25 03:17:29 +00:00
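Two of the hardening items above, screenshot filename sanitizing and the media-domain allowlist, can be sketched as pure functions. Both are illustrations under assumptions: the allowed character set, the 100-character cap, and the host list are hypothetical and may not match the real scripts. Comparing the parsed hostname exactly, rather than with `endsWith` or a substring check, is what stops lookalikes such as `github.com.evil.example`.

```typescript
// Reduce a model-supplied name to a safe basename: anything outside
// [A-Za-z0-9._-] becomes '_', and leading dots are stripped, so a name like
// "../../etc/passwd" can no longer contain a path separator.
function sanitizeFilename(name: string): string {
  const cleaned = name.replace(/[^A-Za-z0-9._-]+/g, '_').replace(/^\.+/, '')
  return cleaned.slice(0, 100) || 'screenshot'
}

// Hypothetical allowlist; the hosts the real downloadMedia accepts may differ.
const ALLOWED_MEDIA_HOSTS = new Set([
  'github.com',
  'user-images.githubusercontent.com',
  'private-user-images.githubusercontent.com',
])

// Only HTTPS URLs whose hostname is exactly on the allowlist may be fetched.
function isAllowedMediaUrl(raw: string): boolean {
  let url: URL
  try {
    url = new URL(raw)
  } catch {
    return false
  }
  return url.protocol === 'https:' && ALLOWED_MEDIA_HOSTS.has(url.hostname)
}
```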
snomiao
1d880bf493 fix: gracefully skip invalid pressKey values in QA recording
Instead of crashing the entire recording session when Gemini generates
an invalid key name (e.g. "mouseWheelDown"), catch the error and
continue with remaining steps.
2026-03-25 03:17:29 +00:00
snomiao
da3e6cb4cf feat: add behavior changes summary table to QA video review
Add a "Behavior Changes" table (Behavior, Before, After, Verdict)
alongside the existing timeline comparison. This gives reviewers a
quick high-level view of all behavioral differences before diving
into the frame-by-frame timeline.
2026-03-25 03:17:29 +00:00
snomiao
a39e3054cf fix: format before/after comparison as table in QA video review
Instruct Gemini to output the Before vs After section as a markdown
table with Time, Type, Severity, Before, After columns for easier
comparison. Update HTML template table styles with fixed layout and
column widths optimized for the 5-column comparison format.
2026-03-25 03:17:29 +00:00
snomiao
5a6178e924 fix: handle array response from Gemini in analyze-pr
Gemini Pro with responseMimeType: 'application/json' returns a JSON
array [before, after] instead of {before, after}. Handle both shapes.
2026-03-25 03:17:29 +00:00
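A minimal sketch of accepting both response shapes. The guide contents are left as `unknown` because the qa-guide JSON schema is not shown here, and `normalizeGuides` is a hypothetical name. Including the start of the raw text in the error follows the idea of logging the raw Gemini response when parsing fails.

```typescript
// Guide bodies are opaque here; the real qa-guide schema is not shown.
type Guides = { before: unknown; after: unknown }

// Accept either `[before, after]` or `{ before, after }` from the model.
function normalizeGuides(text: string): Guides {
  const data: unknown = JSON.parse(text)
  if (Array.isArray(data) && data.length === 2) {
    return { before: data[0], after: data[1] }
  }
  if (data !== null && typeof data === 'object' && 'before' in data && 'after' in data) {
    const obj = data as { before: unknown; after: unknown }
    return { before: obj.before, after: obj.after }
  }
  throw new Error(`unrecognized response shape: ${text.slice(0, 200)}`)
}
```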
snomiao
a11e8a67f8 fix: make analyze-pr non-blocking and log Gemini response
- Log raw Gemini response for debugging when parsing fails
- Handle possible wrapper keys in response
- Make qa-before/qa-after run even if analyze-pr fails (only gate
  on resolve-matrix success)
2026-03-25 03:17:29 +00:00
snomiao
6a836b7c25 fix: extract PR number from sno-qa-<number> branch name
When running on push events for sno-qa-* branches without an open PR,
extract the PR number from the branch name so analyze-pr can fetch
the full PR thread for analysis.
2026-03-25 03:17:29 +00:00
snomiao
f91f94f71a feat: add analyze-pr job to QA pipeline
Add Gemini Pro-powered PR analysis that generates targeted QA guides
from the full PR thread (description, comments, screenshots, diff).
The analyze-pr job runs on lightweight ubuntu before recordings start,
producing qa-guide-before.json and qa-guide-after.json that are
downloaded by recording jobs to produce more focused test steps.

Graceful fallback: if analysis fails, recordings proceed without guides.
2026-03-25 03:17:29 +00:00
snomiao
7502528733 fix: download before/after artifacts into separate directories
download-artifact@v7 merges all files flat regardless of
merge-multiple setting. Use separate path dirs (before/after)
and copy all files into the report directory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
0656091959 fix: set merge-multiple false for download-artifact v7
download-artifact@v7 defaults merge-multiple to true, which puts all
files flat in qa-artifacts/ instead of per-artifact subdirectories.
The merge step expects qa-artifacts/qa-before-{os}-{run}/ subdirs,
so the report directory never gets created and video review finds
no files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
6515170d08 refactor: split QA into parallel before/after jobs
Instead of running before/after sequentially in a single job with
fragile git stash/checkout gymnastics, split into two independent
parallel jobs on separate runners:

  resolve-matrix → qa-before (main) ─┐
                 → qa-after  (PR)   ─┴→ report

- qa-before: uses git worktree for clean main branch build
- qa-after: normal PR build via setup-frontend
- report: downloads both artifact sets, merges, runs Gemini review

Benefits:
- Clean workspace isolation (no git checkout origin/main -- .)
- ~2x faster (parallel execution)
- Each job gets its own ComfyUI server (no shared state)
- Eliminates entire class of workspace contamination bugs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
25cbe56a34 fix: select existing user from dropdown instead of re-creating
The Pre-seed step creates qa-ci via API, so the "New user" form
shows "already exists" error. Fix by selecting the existing user
from the dropdown first, falling back to a unique username.
2026-03-25 03:17:29 +00:00
snomiao
120a531ef9 fix: use login page directly instead of localStorage bypass
The localStorage userId bypass doesn't work because the server
validates user IDs and rejects the simple 'qa-ci' string. Instead,
detect the login page by its input fields and create a user via the
"New user" text input, which is how real users would log in.
2026-03-25 03:17:29 +00:00
snomiao
389f6ca6b8 fix: switch from Firefox to Chromium for WebGL canvas support
Firefox headless doesn't support WebGL, causing "getCanvas: canvas is
null" errors. Switch to Chromium which has full headless WebGL support.
Also fix login flow to wait for async router guard to settle and
create user via text input as fallback.
2026-03-25 03:17:29 +00:00
snomiao
6298fc3a58 fix: add debug screenshot and fallback for menu button click
Add coordinate fallback when .comfy-menu-button-wrapper selector isn't
found, and capture a debug screenshot after login to diagnose what the
page looks like when the editor UI fails to render.
2026-03-25 03:17:29 +00:00
snomiao
83702c2e87 fix: use proper CSS selectors for menu interactions in QA recording
The openComfyMenu was clicking at hardcoded coordinates (20, 67) which
missed the menu button. Now uses .comfy-menu-button-wrapper selector
matching the browser tests. Also fixes menu item hover/click selectors
to use .p-menubar-item-label and .p-tieredmenu-item classes, and adds
a wait for the editor UI to fully load before executing test steps.
2026-03-25 03:17:29 +00:00
snomiao
c5d207fa9a fix: bypass login in QA recordings with localStorage pre-seeding
The QA recordings were stuck on the user selection screen because CI
has no existing users. Fix by pre-seeding localStorage with userId,
userName, and TutorialCompleted before navigation, plus creating a
qa-ci user via API as a fallback.
2026-03-25 03:17:29 +00:00
snomiao
ab6dff02c9 feat: autoplay and loop videos on QA dashboard 2026-03-25 03:17:29 +00:00
snomiao
7396d39a6a fix: handle flat artifact layout when no report.md exists
The normalize step couldn't create the qa-report-* subdir because it
only looked for *-report.md files. Add fallback to detect webm files.
2026-03-25 03:17:29 +00:00
snomiao
99e6681237 fix: exclude known video names when renaming Playwright recordings
The AFTER step was renaming qa-before-session.webm instead of the new
recording. Filter out already-named files before picking the latest.
2026-03-25 03:17:29 +00:00
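A sketch of the selection logic, assuming the script can list each candidate with its name, modification time, and size (for example via `fs.statSync`, whose result has `mtimeMs` and `size`). `pickNewRecording` and `KNOWN_NAMES` are hypothetical, and `qa-after-session.webm` is an assumed second name. The `size > 0` filter borrows the empty-file check from the separate qa-session.webm fix in this log.

```typescript
type VideoFile = { name: string; mtimeMs: number; size: number }

// Files the pipeline has already renamed; treat this list as an assumption.
const KNOWN_NAMES = new Set(['qa-before-session.webm', 'qa-after-session.webm'])

// Newest non-empty .webm that has not already been claimed by an earlier step.
function pickNewRecording(files: VideoFile[]): VideoFile | undefined {
  return files
    .filter(f => f.name.endsWith('.webm') && !KNOWN_NAMES.has(f.name) && f.size > 0)
    .sort((a, b) => b.mtimeMs - a.mtimeMs)[0]
}
```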
snomiao
ae29224874 fix: dismiss dropdown overlay before clicking Next in QA login flow
The user dropdown shows "No available options" in CI, and the overlay
blocks the Next button. Dismiss with Escape before attempting click.
2026-03-25 03:17:29 +00:00
snomiao
47e5c39ac9 fix: install main branch deps before building, then reinstall PR deps
Main branch imports @vueuse/router which isn't in PR's node_modules.
Need both: main deps for building main, PR deps for running QA scripts.
2026-03-25 03:17:29 +00:00
snomiao
28f530d53a fix: reinstall PR deps after main branch build to restore @google/generative-ai
The main branch build step was running pnpm install with main's lockfile,
which removed @google/generative-ai from node_modules. Move the reinstall
to after restoring PR files so the QA recording script can find its deps.
2026-03-25 03:17:28 +00:00
snomiao
282754743d fix: temporarily disable concurrency group to unstick QA runs 2026-03-25 03:17:28 +00:00
snomiao
0a91ec09e8 fix: restore PR files with git checkout HEAD instead of git checkout -
git checkout - uses @{-1} which requires a previous branch switch.
Since we use 'git checkout origin/main -- .' (file checkout, not branch
switch), there is no @{-1} ref. Use HEAD to restore from current branch.

Also restore proper concurrency group.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
25fd1b2700 fix: use unique concurrency group to unstick QA runs 2026-03-25 03:17:28 +00:00
snomiao
b9d5ff0f8d ci: trigger QA run 2026-03-25 03:17:28 +00:00
snomiao
746d465912 feat: integrate comfy-qa skill test plan into QA recording pipeline
Pass the comprehensive test plan from .claude/skills/comfy-qa/SKILL.md
to Gemini when generating test steps. This gives Gemini knowledge of all
12 QA categories (canvas, menus, sidebar, settings, etc.) so it picks
the most relevant tests for each PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
e314a18b90 fix: use vite build directly to skip nx typecheck dependency
nx build runs typecheck as a prerequisite (via @nx/vite/plugin config).
Use vite build directly for the main branch comparison build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
78fb9ef27f fix: skip typecheck when building main branch for QA comparison
Main branch may have transient TS errors when built with the PR
branch's lockfile. Since we only need the dist for visual comparison,
run nx build directly instead of pnpm build (which includes typecheck).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
41999c2e0f refactor: replace Codex with direct Playwright recording in QA pipeline
Replace the unreliable codex exec approach with a Playwright script
(qa-record.ts) that uses Gemini to generate targeted test steps from
the PR diff, then executes them deterministically via Playwright's API.

Key changes:
- New scripts/qa-record.ts: Gemini generates JSON test actions, Playwright
  executes them with reliable helper functions (menu nav, dialog fill, etc.)
- Remove codex CLI and playwright-cli dependencies
- Remove 150+ lines of prompt templates from pr-qa.yaml
- Firefox headless with video recording (same approach proven locally)
- Fallback steps if Gemini fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
5c243d16be feat: auto-generate regression tests from QA reports
- Tighten BEFORE prompt to 15s snapshot (show old state only)
- Add qa-generate-test.ts: Gemini-powered Playwright test generator
- New workflow step: generate .spec.ts and push to {branch}-add-qa-test
- Tests assert UIUX behavior (tab names, dirty state, visibility)
2026-03-25 03:17:28 +00:00
snomiao
774dcd823e feat: before/after video comparison for QA pipeline
- Build both main (dist-before/) and PR (dist/) frontends in focused mode
- Run QA twice: BEFORE on main branch frontend, AFTER on PR branch
- Send both videos to Gemini in one request for comparative analysis
- Side-by-side dashboard layout with Before (main) / After (PR) panels
- Comparative prompt evaluates whether before confirms old behavior
  and after proves the fix works
- Falls back to single-video mode when no before video available
2026-03-25 03:17:28 +00:00
snomiao
b500f826fc fix: make QA videos seekable with faststart and frequent keyframes
moov atom was at end of file (8.6MB offset) — browser had to download
the entire video before seeking. Keyframes were only every 10 seconds.

Add -movflags +faststart (moov before mdat) and -g 60 (keyframe every
2.4s at 25fps) to ffmpeg conversion.
2026-03-25 03:17:28 +00:00
snomiao
1cd9e171c6 fix: issue cards instead of dense table, rename to comfy-qa.pages.dev
- Replace 6-column confirmed issues table with vertical card blocks
  using colored severity/timestamp/confidence badges
- Rename Cloudflare Pages project from comfyui-qa-videos to comfy-qa
2026-03-25 03:17:28 +00:00
snomiao
5c0bef9b72 fix: seekable video, hide empty cards, PR-aware video review
- Remove autoplay/loop so video timeline is seekable
- Skip card generation for platforms without recordings
- Add --pr-context flag to qa-video-review.ts so Gemini evaluates
  against PR purpose instead of just describing what happened
- Workflow now builds pr-context.txt from PR title/body/diff
2026-03-25 03:17:28 +00:00
snomiao
94e1388495 feat: redesign QA dashboard with modern frontend design
OKLCH color tokens, liquid glass card surfaces, Inter + JetBrains Mono
typography, grain texture overlay, staggered fade-up animations, pill
action buttons with SVG icons, and improved report table styling.
2026-03-25 03:17:28 +00:00
snomiao
0268e8f977 fix: make settings pre-seed non-fatal and try both API endpoints
The /api/settings endpoint returned 4xx in CI. Try both /api/settings
and /settings endpoints, and don't fail the job if neither works.
2026-03-25 03:17:28 +00:00
snomiao
4f1df7c7ce fix: pre-seed Comfy.TutorialCompleted to skip template gallery in QA
The Codex agent was spending 35s browsing the "Getting Started" template
gallery instead of testing the PR's changes. Pre-seeding this setting
via the ComfyUI API ensures the agent lands directly in the graph editor.
2026-03-25 03:17:28 +00:00
snomiao
3b2fdc786a fix: tighten focused QA prompt to only test PR-specific behavior
The Codex agent was spending time on login flow, template browsing,
and general smoke testing instead of testing the PR's actual changes.

Changes:
- Add 30-second time budget for video recording
- Move video-start AFTER login and editor verification
- Explicitly prohibit template browsing and sidebar exploration
- Reduce test steps to 3-6 targeted actions
- Restructure prompt with clear Instructions/Rules sections
2026-03-25 03:17:28 +00:00
snomiao
4d1ad4dcf0 fix: render markdown in QA reports with marked.js
Replace crude sed-based markdown conversion with client-side
rendering via marked.js CDN. Adds proper table, list, and
code styling for the report section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
7ba1aaed53 fix: run report job on workflow_dispatch events
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
09a3c10d50 refactor: replace GPT frame extraction with Gemini native video analysis
Replace the OpenAI GPT-based frame extraction approach (ffmpeg + screenshots)
with Gemini 2.5 Flash's native video understanding. This eliminates false
positives from frame-based analysis (e.g. "black screen = critical bug" during
page transitions) and produces dramatically better QA reviews.

Changes:
- Remove ffmpeg frame extraction, ffprobe duration detection, and all related
  logic (~365 lines removed)
- Add @google/generative-ai SDK for native video/mp4 upload to Gemini
- Update CLI: remove --max-frames, --min-interval-seconds, --keep-frames flags
- Update env: OPENAI_API_KEY → GEMINI_API_KEY
- Update workflow: swap API key secret and model in pr-qa.yaml
- Update report: replace "Frames analyzed" with "Video size"
- Add note in prompt that brief black frames during transitions are normal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
2f30fbe060 fix: use fill+click for quick login before video recording
playwright-cli doesn't support 'evaluate' command. Instead, instruct
Codex to quickly fill the username input and click Next on user-select
page BEFORE starting video recording, so the video only shows actual
QA testing.
2026-03-25 03:17:27 +00:00
snomiao
85adbe4878 fix: use evaluate to set localStorage before video recording
storageState config doesn't work with playwright-cli. Instead, use
evaluate to set Comfy.userId/userName after opening the page, then
navigate back. This skips user-select before video-start so the
recording only shows actual QA testing.
2026-03-25 03:17:27 +00:00
snomiao
0c707b5deb fix: pre-seed localStorage to skip user-select in QA runs
Write a Playwright storageState JSON with Comfy.userId/userName pre-set
so the app loads directly to the graph editor. Saves ~40s per QA run
that was wasted on navigating the user-select page.
2026-03-25 03:17:27 +00:00
snomiao
4f98518c22 fix: prefer explicit qa-session.webm over corrupt auto-recorded videos
The convert step was using find which picked up a 0-byte file from
playwright's videos/ directory instead of the valid qa-session.webm.
Now prefers qa-session.webm explicitly and skips empty files.
2026-03-25 03:17:27 +00:00
snomiao
ca394b9ff7 fix: improve focused QA prompt to test PR-specific behavior, not random walk 2026-03-25 03:17:27 +00:00
snomiao
51878db99d fix: re-add push trigger for sno-skills and sno-qa-* branches 2026-03-25 03:17:27 +00:00
snomiao
d0534003e7 fix: also install ffprobe for GPT video review frame extraction 2026-03-25 03:17:27 +00:00
snomiao
3d0cd72465 fix: use sudo for ffmpeg static binary extraction to /usr/local/bin 2026-03-25 03:17:27 +00:00
snomiao
82849df891 fix: use static ffmpeg binary instead of apt-get (avoids dpkg lock hang) 2026-03-25 03:17:27 +00:00
snomiao
1c62c0edc3 fix: add ffmpeg install back (not pre-installed on GH runners) 2026-03-25 03:17:27 +00:00
snomiao
6ea2ce755d fix: normalize flat artifact download into expected subdirectory 2026-03-25 03:17:27 +00:00
snomiao
8fc4480ee2 fix: pre-install chromium and clarify prompt for codex
Codex was using pnpm dlx instead of the global playwright-cli.
Pre-install chromium in setup step and make prompt explicit about
using the global command directly without pnpm/npx.
2026-03-25 03:17:27 +00:00
snomiao
3515a478fd fix: add debug output to video convert step 2026-03-25 03:17:27 +00:00
snomiao
11432992d3 fix: use danger-full-access sandbox for codex on GH Actions 2026-03-25 03:17:27 +00:00
snomiao
e619b0143a fix: use correct codex model name gpt-5.4-mini 2026-03-25 03:17:27 +00:00
snomiao
7d4a008f29 feat: switch QA from Claude Code to OpenAI Codex CLI
Replace claude --print with codex exec for cheaper QA runs.
Uses codex-mini-latest model ($1.50/$6 vs Sonnet $3/$15).
Uses existing OPENAI_API_KEY secret (no new secrets needed).
2026-03-25 03:17:27 +00:00
snomiao
a3e65140a9 fix: default to linux-only QA, full 3-OS only via qa-full label
Reduces per-run cost from ~$10-16 to ~$2.50 by defaulting to
Linux-only. Use qa-full label or workflow_dispatch for 3-OS runs.
2026-03-25 03:17:27 +00:00
snomiao
f55ab36dd7 fix: use explicit video-start/stop, remove ffmpeg install, use gpt-4.1-mini
- Replace saveVideo config (didn't produce video) with explicit
  playwright-cli video-start/video-stop commands in QA prompt
- Remove apt-get install ffmpeg step (pre-installed on GH runners)
- Switch video review model from gpt-4o to gpt-4.1-mini
2026-03-25 03:17:27 +00:00
snomiao
698a894b42 fix: use auto video recording and show GPT reports on QA site
- Enable saveVideo in playwright-cli config for real video recording
- Replace screenshot stitching with webm→mp4 conversion
- Move video review step before deploy so reports are included
- Add GPT video review reports inline on the Cloudflare Pages site
- Each video card now has expandable "GPT Video Review" section
2026-03-25 03:17:27 +00:00
snomiao
fbd7f404ef fix: configure playwright-cli outputDir and improve artifact collection
- Set .playwright/cli.config.json with outputDir pointing to screenshots/
- This way bare 'playwright-cli screenshot' auto-saves to the right place
- Create screenshot directory before Claude runs (don't rely on Claude)
- Collect step now searches working directory for stray PNGs
- Simplified prompt: no --filename needed, just 'playwright-cli screenshot'
2026-03-25 03:17:27 +00:00
snomiao
d633ce19a7 fix: stitch screenshots from correct directory and simplify prompt
Screenshots were saved to artifact root but stitch looked in frames/.
Now: prompt tells Claude to save to screenshots/ dir with numbered names,
collect step consolidates PNGs there, stitch step globs from screenshots/.
Removed video-start/video-stop (Claude doesn't use them).
2026-03-25 03:17:27 +00:00
snomiao
9a7b5f88a0 fix: use playwright-cli video recording and collect default output
- Add playwright-cli config with outputDir and saveVideo
- Use video-start/video-stop instead of relying on screenshot frames
- Add fallback artifact collection from .playwright-cli/ default dir
- Simplify prompts to focus on video recording workflow
2026-03-25 03:17:27 +00:00
snomiao
be261a8a86 fix: resolve QA_ARTIFACTS path in prompt so Claude gets the literal path
The escaped \$QA_ARTIFACTS in the heredoc produced literal text
'$QA_ARTIFACTS' in the prompt. Claude's Bash tool didn't reliably
expand this env var, so no screenshots or reports were saved.
Remove the escapes so the heredoc expands the variable to the actual
path (e.g. /home/runner/work/_temp/qa-artifacts).
2026-03-25 03:17:27 +00:00
snomiao
b2c31e785f fix: escape backticks in QA prompt heredoc to prevent command substitution
Backtick-wrapped playwright-cli examples in the unquoted heredoc were
being interpreted as bash command substitution, producing empty prompts.
Replace backtick syntax with plain "Run:" prefixed commands.
2026-03-25 03:17:27 +00:00
snomiao
475f9ae5f0 fix: reorganize QA CI — remove screen recording, merge video-review into report
- Remove all Xvfb/ffmpeg screen recording infrastructure from qa job
  (captured blank display since playwright-cli runs headless)
- Add screenshot instructions to QA prompts: Claude saves sequential
  frames to $QA_ARTIFACTS/frames/ after every interaction
- Stitch screenshots into video via ffmpeg in report job (2fps)
- Merge video-review job into report job (4 jobs → 3 jobs)
- Unified PR comment with video links + video review in <details> collapse
- Clean up stale QA_VIDEO_REVIEW_COMMENT markers from prior runs
2026-03-25 03:17:27 +00:00
GitHub Action
c329022e15 [automated] Apply ESLint and Oxfmt fixes 2026-03-25 03:17:27 +00:00
snomiao
63d0df9ff0 fix: harden setup-comfyui-server against shell injection
Move extra_server_params input to env var to prevent shell injection
from untrusted input. Replace wait-for-it pip dependency with a
cross-platform curl polling loop.
2026-03-25 03:17:27 +00:00
snomiao
c1593054fb feat: add comfy-qa skill and automated QA CI pipeline
Add Claude Code skills and a label-triggered QA workflow:

- .claude/skills/comfy-qa/SKILL.md: 12-category QA test plan using
  playwright-cli for browser automation
- .github/workflows/pr-qa.yaml: CI workflow triggered by qa-changes
  (focused, Linux) or qa-full (3-OS matrix) labels. Records screen via
  ffmpeg, runs Claude CLI with playwright-cli, deploys video gallery to
  Cloudflare Pages, posts PR comment with GIF thumbnails, and runs
  OpenAI vision-based video review
- scripts/qa-video-review.ts: frame extraction + GPT-4o analysis
- scripts/qa-video-review.test.ts: unit tests for video review
- knip.config.ts: resolve knip errors for ingest-types package
2026-03-25 03:17:26 +00:00
880 changed files with 16172 additions and 73592 deletions

@@ -1,118 +0,0 @@
---
name: adr-compliance
description: Checks code changes against Architecture Decision Records, with emphasis on ECS (ADR 0008) and command-pattern (ADR 0003) compliance
severity-default: medium
tools: [Read, Grep, Glob]
---
Check that code changes are consistent with the project's Architecture Decision Records in `docs/adr/`.
## Priority 1: ECS and Command-Pattern Compliance (ADR 0008 + ADR 0003)
These are the primary architectural guardrails. Every entity/litegraph change must be checked against them.
### Command Pattern (ADR 0003)
All entity state mutations MUST be expressible as **serializable, idempotent, deterministic commands**. This is required for CRDT sync, undo/redo, cross-environment portability, and gateway backends.
Flag:
- **Direct spatial mutation** — `node.pos = ...`, `node.size = ...`, `group.pos = ...` outside of a store or command. All spatial data flows through `layoutStore` commands.
- **Imperative fire-and-forget mutation** — Any new API that mutates entity state as a side effect rather than producing a serializable command object. Systems should produce command batches, not execute mutations directly.
- **Void-returning mutation APIs** — New entity mutation functions that return `void` instead of a result type (`{ status: 'applied' | 'rejected' | 'no-op' }`). Commands need error/rejection semantics.
- **Auto-incrementing IDs in new entity code** — New entity creation using auto-increment counters without acknowledging the CRDT collision problem. Concurrent environments need globally unique, stable identifiers.
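
The rules above can be made concrete with a small TypeScript sketch. Everything in it is hypothetical: `MoveNodeCommand`, `CommandResult`, and `LayoutState` stand in for whatever ADR 0003 and `layoutStore` actually define. It only demonstrates the three properties being asked for. The command is plain JSON-serializable data, applying it twice is idempotent because it carries an absolute target rather than a delta, and every application returns an explicit status instead of `void`.

```typescript
// Hypothetical shapes for illustration; ADR 0003 and layoutStore define the real contract.
type MoveNodeCommand = {
  type: 'moveNode'
  nodeId: string // stable, globally unique id (not an auto-increment counter)
  pos: [number, number] // absolute target, so re-applying is idempotent
}

type CommandResult =
  | { status: 'applied' }
  | { status: 'rejected'; reason: string }
  | { status: 'no-op' }

// Stand-in for a layout store: the only place positions are written.
class LayoutState {
  private positions = new Map<string, [number, number]>()

  addNode(nodeId: string, pos: [number, number]): void {
    this.positions.set(nodeId, [pos[0], pos[1]])
  }

  getPos(nodeId: string): readonly [number, number] | undefined {
    return this.positions.get(nodeId)
  }

  apply(cmd: MoveNodeCommand): CommandResult {
    const current = this.positions.get(cmd.nodeId)
    if (!current) return { status: 'rejected', reason: `unknown node ${cmd.nodeId}` }
    if (current[0] === cmd.pos[0] && current[1] === cmd.pos[1]) return { status: 'no-op' }
    this.positions.set(cmd.nodeId, [cmd.pos[0], cmd.pos[1]])
    return { status: 'applied' }
  }
}
```

Because the command survives a `JSON.stringify`/`JSON.parse` round trip unchanged, the same object could in principle be logged for undo/redo or sent to another peer.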
### ECS Architecture (ADR 0008)
The graph domain model is migrating to ECS. New code must not make the migration harder.
Flag:
- **God-object growth** — New methods/properties added to `LGraphNode` (~4k lines), `LGraphCanvas` (~9k lines), `LGraph` (~3k lines), or `Subgraph`. Extract to systems, stores, or composables instead.
- **Mixed data and behavior** — New component-like data structures that contain methods or back-references to parent entities. ECS components are plain data objects.
- **New circular entity dependencies** — New circular imports between `LGraph` ↔ `Subgraph`, `LGraphNode` ↔ `LGraphCanvas`, or similar entity classes.
- **Direct `graph._version++`** — Mutating the private version counter directly instead of through a public API. Extensions already depend on this side-channel; it must become a proper API.
### Centralized Registries and ECS-Style Access
All entity data access should move toward centralized query patterns, not instance property access.
Flag:
- **New instance method/property patterns** — Adding `node.someProperty` or `node.someMethod()` for data that should be a component in the World, queried via `world.getComponent(entityId, ComponentType)`.
- **OOP inheritance for entity modeling** — Extending entity classes with new subclasses instead of composing behavior through components and systems.
- **Scattered state** — New entity state stored in multiple locations (class properties, stores, local variables) instead of being consolidated in the World or in a single store.
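
Here is a toy version of the direction these flags point toward: components as plain data records, stored centrally and read through `world.getComponent(entityId, ComponentType)`. Only that `getComponent` call shape comes from the text above. `World`, `defineComponent`, `setComponent`, and `query` are hypothetical and far simpler than anything ADR 0008 would specify.

```typescript
// Components are plain data: no methods, no back-references to entities.
type Position = { x: number; y: number }
type Title = { text: string }

// A component type is a unique key; `__data` only carries the type parameter.
type ComponentType<T> = { readonly key: symbol; readonly __data?: T }

function defineComponent<T>(name: string): ComponentType<T> {
  return { key: Symbol(name) }
}

const PositionC = defineComponent<Position>('Position')
const TitleC = defineComponent<Title>('Title')

type EntityId = string

// Central registry: one map per component type, keyed by entity id.
class World {
  private stores = new Map<symbol, Map<EntityId, unknown>>()

  setComponent<T>(id: EntityId, type: ComponentType<T>, data: T): void {
    let store = this.stores.get(type.key)
    if (!store) {
      store = new Map()
      this.stores.set(type.key, store)
    }
    store.set(id, data)
  }

  getComponent<T>(id: EntityId, type: ComponentType<T>): T | undefined {
    return this.stores.get(type.key)?.get(id) as T | undefined
  }

  // Ids of all entities that have every listed component.
  query(...types: ComponentType<unknown>[]): EntityId[] {
    const [first, ...rest] = types
    const base = first && this.stores.get(first.key)
    if (!base) return []
    return Array.from(base.keys()).filter(id =>
      rest.every(t => this.stores.get(t.key)?.has(id) ?? false)
    )
  }
}
```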
### Extension Ecosystem Impact
Entity API changes affect 40+ custom node repos. Changes to these patterns require an extension migration path.
Flag when changed without migration guidance:
- `onConnectionsChange`, `onRemoved`, `onAdded`, `onConfigure` callbacks
- `onConnectInput` / `onConnectOutput` validation hooks
- `onWidgetChanged` handlers
- `node.widgets.find(w => w.name === ...)` patterns
- `node.serialize` overrides
- `graph._version++` direct mutation
- `getNodeById` usage patterns
## Priority 2: General ADR Compliance
For all other ADRs, iterate through each file in `docs/adr/` and extract the core lesson. Ensure changed code does not contradict accepted ADRs. Flag contradictions with proposed ADRs as directional guidance.
### How to Apply
1. Read `docs/adr/README.md` to get the full ADR index
2. For each ADR, read the Decision and Consequences sections
3. Check the diff against each ADR's constraints
4. Only flag ACTUAL violations in changed code, not pre-existing patterns
### Skip List
These ADRs can be skipped for most reviews (they cover completed or narrow-scope decisions):
- **ADR 0004** (Rejected — Fork PrimeVue) — only relevant if someone proposes forking PrimeVue again
## How to Check
1. Identify changed files in the entity/litegraph layer: `src/lib/litegraph/`, `src/ecs/`, `src/platform/`, entity-related stores
2. For Priority 1 patterns, use targeted searches:
```
# Direct position mutation
Grep: pattern="\.pos\s*=" path="src/lib/litegraph"
Grep: pattern="\.size\s*=" path="src/lib/litegraph"
# God object growth (new methods)
Grep: pattern="(class LGraphNode|class LGraphCanvas|class LGraph\b)" path="src/lib/litegraph"
# Version mutation
Grep: pattern="_version\+\+" path="src/lib/litegraph"
# Extension callback changes
Grep: pattern="on(ConnectionsChange|Removed|Added|Configure|ConnectInput|ConnectOutput|WidgetChanged)" path="src/lib/litegraph"
```
3. For Priority 2, read `docs/adr/` files and check for contradictions
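
The Priority 1 searches in step 2 can also be run as a script over a unified diff. Scanning only added lines enforces the "only changed code" rule mechanically. This is a hypothetical helper, not part of the skill, which relies on the agent's Grep tool. The `(?!=)` lookahead is an addition so that comparisons such as `node.pos === other` are not reported as mutations.

```typescript
type Rule = { id: string; pattern: RegExp }
type Finding = { rule: string; file: string; line: string }

// Mirrors the Grep patterns above, plus a lookahead that skips `==`/`===`.
const RULES: Rule[] = [
  { id: 'direct-spatial-mutation', pattern: /\.(pos|size)\s*=(?!=)/ },
  { id: 'version-mutation', pattern: /_version\+\+/ },
  {
    id: 'extension-callback',
    pattern: /on(ConnectionsChange|Removed|Added|Configure|ConnectInput|ConnectOutput|WidgetChanged)/,
  },
]

// Check only lines added by the diff, in files under `pathPrefix`.
function scanDiff(diff: string, pathPrefix = 'src/lib/litegraph/'): Finding[] {
  const findings: Finding[] = []
  let file = ''
  for (const line of diff.split('\n')) {
    if (line.startsWith('+++ ')) {
      file = line.slice(4).replace(/^b\//, '') // "+++ b/path" -> "path"
      continue
    }
    if (!line.startsWith('+') || !file.startsWith(pathPrefix)) continue
    const added = line.slice(1)
    for (const rule of RULES) {
      if (rule.pattern.test(added)) findings.push({ rule: rule.id, file, line: added.trim() })
    }
  }
  return findings
}
```

Each finding is still only a lead. The severity table below and the ADR context decide whether it is a real violation.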
## Severity Guidelines
| Issue | Severity |
| -------------------------------------------------------- | -------- |
| Imperative mutation API without command-pattern wrapper | high |
| New god-object method on LGraphNode/LGraphCanvas/LGraph | high |
| Breaking extension callback without migration path | high |
| New circular entity dependency | high |
| Direct spatial mutation bypassing command pattern | medium |
| Mixed data/behavior in component-like structures | medium |
| New OOP inheritance pattern for entities | medium |
| Contradicts accepted ADR direction | medium |
| Contradicts proposed ADR direction without justification | low |
## Rules
- Only flag ACTUAL violations in changed code, not pre-existing patterns
- If a change explicitly acknowledges an ADR tradeoff in comments or PR description, lower severity
- Proposed ADRs carry less weight than accepted ones — flag as directional guidance
- Reference the specific ADR number in every finding

@@ -1,74 +0,0 @@
---
name: playwright-e2e
description: Reviews Playwright E2E test code for ComfyUI-specific patterns, flakiness risks, and fixture misuse
severity-default: medium
tools: [Read, Grep]
---
You are reviewing Playwright E2E test code in `browser_tests/`. Focus on issues a **reviewer** would catch that an author might miss — flakiness risks, fixture misuse, test isolation problems, and convention violations.
Reference docs (read if you need full context):
- `browser_tests/README.md` — setup, patterns, screenshot workflow
- `browser_tests/AGENTS.md` — directory structure, fixture overview
- `docs/guidance/playwright.md` — type assertion rules, test tags, forbidden patterns
- `.claude/skills/writing-playwright-tests/SKILL.md` — anti-patterns, retry patterns, Vue Nodes vs LiteGraph decision guide
## Checks
### Flakiness Risks (Major)
1. **`waitForTimeout` usage** — Always wrong. Must use retrying assertions (`toBeVisible`, `toHaveText`), `expect.poll()`, or `expect().toPass()`. See retry patterns in `.claude/skills/writing-playwright-tests/SKILL.md`.
2. **Missing `nextFrame()` after canvas ops** — Any `drag`, `click` on canvas, `resizeNode`, `pan`, `zoom`, or programmatic graph mutation via `page.evaluate` that changes visual state needs `await comfyPage.nextFrame()` before assertions. `loadWorkflow()` does NOT need it. Prefer encapsulating `nextFrame()` calls inside Page Object methods so tests don't manage frame timing directly.
3. **Keyboard actions without prior focus**: `page.keyboard.press()` without a preceding `comfyPage.canvas.click()` or element `.focus()` sends keys to whatever currently has focus (often the page body), so the intended handler silently never fires.
4. **Coordinate-based interactions where node refs exist** — Raw `{ x, y }` clicks on canvas are fragile. If the test targets a node, use `comfyPage.nodeOps.getNodeRefById()` / `getNodeRefsByTitle()` / `getNodeRefsByType()` instead.
5. **Shared mutable state between tests** — Variables declared outside `test()` blocks, `let` state mutated across tests, or tests depending on execution order. Each test must be independently runnable.
6. **Missing cleanup of server-persisted state** — Settings changed via `comfyPage.settings.setSetting()` persist across tests. Must be reset in `afterEach` or at test start. Same for uploaded files or saved workflows. Prefer moving cleanup into [fixture options](https://playwright.dev/docs/test-fixtures#fixtures-options) so individual tests don't manage reset logic.
7. **Double-click without `{ delay }` option**: `dblclick()` without `{ delay: 5 }` or similar can be too fast for the canvas event handler.
### Fixture & API Misuse (Medium)
8. **Reimplementing existing fixture helpers** — Before flagging, grep `browser_tests/fixtures/` for the functionality. Common missed helpers:
- `comfyPage.command.executeCommand()` for menu/command actions
- `comfyPage.workflow.loadWorkflow()` for loading test workflows
- `comfyPage.canvasOps.resetView()` for view reset
- `comfyPage.settings.setSetting()` for settings
- Component page objects in `browser_tests/fixtures/components/`
9. **Building workflows programmatically when a JSON asset would work** — Complex `page.evaluate` chains to construct a graph should use a premade JSON workflow in `browser_tests/assets/` loaded via `comfyPage.workflow.loadWorkflow()`.
10. **Selectors not using `TestIds`** — Hard-coded `data-testid` strings should reference `browser_tests/fixtures/selectors.ts` when a matching entry exists. Check `selectors.ts` before flagging.
### Convention Violations (Minor)
11. **Missing test tags** — Every `test.describe` should have `tag` with at least one of: `@smoke`, `@slow`, `@screenshot`, `@canvas`, `@node`, `@widget`, `@mobile`, `@2x`. See `.claude/skills/writing-playwright-tests/SKILL.md` for when to use each.
12. **`as any` type assertions** — Forbidden in E2E tests. Use specific type assertions or test-local type helpers. See `docs/guidance/playwright.md` for acceptable patterns.
13. **Screenshot tests without masking dynamic content** — Timestamps, version numbers, or other non-deterministic content in screenshots will cause flakes. Use `mask` option.
14. **`test.describe` without `afterEach` cleanup when canvas state changes** — Tests that manipulate canvas view (drag, zoom, pan) should include `afterEach` with `comfyPage.canvasOps.resetView()`. Prefer moving canvas reset into the fixture so individual tests don't manage cleanup.
15. **Debug helpers left in committed code**: `debugAddMarker`, `debugAttachScreenshot`, `debugShowCanvasOverlay`, `debugGetCanvasDataURL` are for local debugging only.
### Test Design (Nitpick)
16. **Screenshot-only assertions where functional assertions are possible** — Prefer `expect(await node.isPinned()).toBe(true)` over screenshot comparison when testing non-visual behavior.
17. **Overly large test workflows** — Test should load the minimal workflow needed. If a test only needs one node, don't load the full default graph.
18. **Vue Nodes / LiteGraph mismatch** — If testing Vue-rendered node UI (DOM widgets, CSS states), should use `comfyPage.vueNodes.*`. If testing canvas interactions/connections, should use `comfyPage.nodeOps.*`. Mixing both in one test is a smell.
## Rules
- Only review `.spec.ts` files and supporting code in `browser_tests/`
- Do NOT flag patterns in fixture/helper code (`browser_tests/fixtures/`) — those are shared infrastructure with different rules
- "Major" for flakiness risks (items 1-7), "medium" for fixture misuse (8-10), "minor" for convention violations (11-15), "nitpick" for test design (16-18)
- When flagging missing fixture usage (item 8), confirm the helper exists by checking the fixture code — don't assume
- Existing tests that predate these conventions may be updated to follow them, but fixing them is not required

@@ -1,94 +0,0 @@
# ADR Compliance Audit
Audit the current changes (or a specified PR) for compliance with Architecture Decision Records.
## Step 1: Gather the Diff
- If a PR number is provided, run: `gh pr diff $PR_NUMBER`
- Otherwise, run: `git diff origin/main...HEAD` (or `git diff --cached` for staged changes)
## Step 2: Priority 1 — ECS and Command-Pattern Compliance
Read these documents for context:
```
docs/adr/0003-crdt-based-layout-system.md
docs/adr/0008-entity-component-system.md
docs/architecture/ecs-target-architecture.md
docs/architecture/ecs-migration-plan.md
docs/architecture/appendix-critical-analysis.md
```
### Check A: Command Pattern (ADR 0003)
Every entity state mutation must be a **serializable, idempotent, deterministic command** — replayable, undoable, transmittable over CRDT.
Flag:
1. **Direct spatial mutation**: `node.pos = ...`, `node.size = ...`, `group.pos = ...` outside a store/command
2. **Imperative fire-and-forget APIs** — Functions that mutate entity state as side effects rather than producing serializable command objects. Systems should produce command batches, not execute mutations directly.
3. **Void-returning mutation APIs** — Entity mutations returning `void` instead of `{ status: 'applied' | 'rejected' | 'no-op' }`
4. **Auto-increment IDs** — New entity creation via counters without addressing CRDT collision. Concurrent environments need globally unique identifiers.
5. **Missing transaction semantics** — Multi-entity operations without atomic grouping (e.g., node removal = 10+ deletes with no rollback on failure)
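To make Check A concrete, here is a minimal sketch of what a command-pattern mutation can look like. It assumes a hypothetical `MoveNodeCommand` and a plain `Map`-based layout state; none of these names come from the codebase or from ADR 0003 itself. The command is plain data (serializable), targets an absolute position (so replaying it is idempotent), returns a status instead of `void`, uses `randomUUID()` rather than a counter for IDs, and `applyBatch` restores a snapshot on the first rejection to approximate transaction semantics.

```typescript
import { randomUUID } from 'node:crypto'

// Hypothetical types for illustration only; not the project's real API.
type NodeId = string
interface Point {
  x: number
  y: number
}

// Plain data: can be logged, replayed, undone, or sent over a CRDT.
interface MoveNodeCommand {
  type: 'moveNode'
  nodeId: NodeId
  pos: Point // absolute target, so applying the same command twice is safe
}

type CommandResult = { status: 'applied' | 'rejected' | 'no-op' }

// Layout keyed by globally unique IDs instead of an auto-increment counter.
type LayoutState = Map<NodeId, Point>

function applyCommand(state: LayoutState, cmd: MoveNodeCommand): CommandResult {
  const current = state.get(cmd.nodeId)
  if (!current) return { status: 'rejected' } // unknown node
  if (current.x === cmd.pos.x && current.y === cmd.pos.y) return { status: 'no-op' }
  state.set(cmd.nodeId, { ...cmd.pos }) // replace the value, never mutate it in place
  return { status: 'applied' }
}

// All-or-nothing: restore the snapshot if any command in the batch is rejected.
function applyBatch(state: LayoutState, cmds: MoveNodeCommand[]): CommandResult {
  const snapshot = new Map(state)
  let changed = false
  for (const cmd of cmds) {
    const { status } = applyCommand(state, cmd)
    if (status === 'rejected') {
      state.clear()
      for (const [id, pos] of snapshot) state.set(id, pos)
      return { status: 'rejected' }
    }
    if (status === 'applied') changed = true
  }
  return { status: changed ? 'applied' : 'no-op' }
}

const id = randomUUID()
const layout: LayoutState = new Map([[id, { x: 0, y: 0 }]])
const move: MoveNodeCommand = { type: 'moveNode', nodeId: id, pos: { x: 10, y: 20 } }
console.log(applyCommand(layout, move).status) // "applied"
console.log(applyCommand(layout, move).status) // "no-op": replay is harmless
```

A diff that writes `node.pos = ...` directly gives up every one of these properties at once, which is why item 1 is flagged.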
### Check B: ECS Architecture (ADR 0008)
Flag:
1. **God-object growth** — New methods/properties on `LGraphNode`, `LGraphCanvas`, `LGraph`, `Subgraph`
2. **Mixed data/behavior** — Component-like structures with methods or back-references
3. **OOP instance patterns** — New `node.someProperty` or `node.someMethod()` for data that should be a World component
4. **OOP inheritance** — New entity subclasses instead of component composition
5. **Circular entity deps**: New circular imports between `LGraph` and `Subgraph`, or between `LGraphNode` and `LGraphCanvas`
6. **Direct `_version++`** — Mutating private version counter instead of through public API
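For contrast with the OOP patterns flagged above, here is a minimal sketch of the data-oriented shape ADR 0008 points toward. `World`, `createWorld`, and `nudgeSystem` are hypothetical names used only for illustration. Components are plain data stored in maps keyed by entity ID (no methods, no back-references), and the system reads them and returns a command batch instead of mutating entities.

```typescript
// Hypothetical sketch; names are illustrative, not the project's World API.
type EntityId = string

interface Position {
  x: number
  y: number
}

// Components are plain data held by the World, keyed by entity ID.
interface World {
  positions: Map<EntityId, Position>
  pinned: Set<EntityId> // a tag component: presence means "pinned"
}

interface MoveNodeCommand {
  type: 'moveNode'
  nodeId: EntityId
  pos: Position
}

function createWorld(): World {
  return { positions: new Map(), pinned: new Set() }
}

// A system reads components and returns a command batch; it never mutates the World.
function nudgeSystem(world: World, dx: number, dy: number): MoveNodeCommand[] {
  const batch: MoveNodeCommand[] = []
  for (const [id, pos] of world.positions) {
    if (world.pinned.has(id)) continue // pinned entities stay put
    batch.push({ type: 'moveNode', nodeId: id, pos: { x: pos.x + dx, y: pos.y + dy } })
  }
  return batch
}
```

A change that instead adds a `node.pinned` flag or a `node.nudge()` method on `LGraphNode` would fall under items 1 and 3.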
### Check C: Extension Ecosystem Impact
If any of these patterns are changed, flag and require migration guidance:
- `onConnectionsChange`, `onRemoved`, `onAdded`, `onConfigure` callbacks
- `onConnectInput` / `onConnectOutput` validation hooks
- `onWidgetChanged` handlers
- `node.widgets.find(w => w.name === ...)` access patterns
- `node.serialize` overrides
- `graph._version++` direct mutation
Reference: 40+ custom node repos depend on these (rgthree-comfy, ComfyUI-Impact-Pack, cg-use-everywhere, etc.)
## Step 3: Priority 2 — General ADR Compliance
1. Read `docs/adr/README.md` for the full ADR index
2. For each ADR (except skip list), read the Decision section
3. Check the diff for contradictions
4. Only flag ACTUAL violations in changed code
**Skip list**: ADR 0004 (Rejected — Fork PrimeVue)
## Step 4: Generate Report
```
## ADR Compliance Audit Report
### Summary
- Files audited: N
- Priority 1 findings: N (command-pattern: N, ECS: N, ecosystem: N)
- Priority 2 findings: N
### Priority 1: Command Pattern & ECS
(List each with ADR reference, file, line, description)
### Priority 1: Extension Ecosystem Impact
(List each changed callback/API with affected custom node repos)
### Priority 2: General ADR Compliance
(List each with ADR reference, file, line, description)
### Compliant Patterns
(Note changes that positively align with ADR direction)
```
## Severity
- **Must fix**: Contradicts accepted ADR, or introduces imperative mutation API without command-pattern wrapper, or breaks extension callback without migration path
- **Should discuss**: Contradicts proposed ADR direction — either align or propose ADR amendment
- **Note**: Surfaces open architectural question not yet addressed by ADRs

@@ -1,84 +0,0 @@
---
name: adding-deprecation-warnings
description: 'Adds deprecation warnings for renamed or removed properties/APIs. Searches custom node ecosystem for usage, applies defineDeprecatedProperty helper, adds JSDoc. Triggers on: deprecate, deprecation warning, rename property, backward compatibility.'
---
# Adding Deprecation Warnings
Adds backward-compatible deprecation warnings for renamed or removed
properties using the `defineDeprecatedProperty` helper in
`src/lib/litegraph/src/utils/feedback.ts`.
## When to Use
- A property or API has been renamed and custom nodes still use the old name
- A property is being removed but needs a grace period
- Backward compatibility must be preserved while nudging adoption
## Steps
### 1. Search the Custom Node Ecosystem
Before implementing, assess impact by searching for usage of the
deprecated property across ComfyUI custom nodes:
```text
Use the comfy_codesearch tool to search for the old property name.
Search for both `widget.oldProp` and just `oldProp` to catch all patterns.
```
Document the usage patterns found (property access, truthiness checks,
caching to local vars, style mutation, etc.) — these all must continue
working.
### 2. Apply the Deprecation
Use `defineDeprecatedProperty` from `src/lib/litegraph/src/utils/feedback.ts`:
```typescript
import { defineDeprecatedProperty } from '@/lib/litegraph/src/utils/feedback'
/** @deprecated Use {@link obj.newProp} instead. */
defineDeprecatedProperty(
obj,
'oldProp',
'newProp',
'obj.oldProp is deprecated. Use obj.newProp instead.'
)
```
### 3. Checklist
- [ ] Ecosystem search completed — all usage patterns are compatible
- [ ] `defineDeprecatedProperty` call added after the new property is assigned
- [ ] JSDoc `@deprecated` tag added above the call for IDE support
- [ ] Warning message names both old and new property clearly
- [ ] `pnpm typecheck` passes
- [ ] `pnpm lint` passes
### 4. PR Comment
Add a PR comment summarizing the ecosystem search results: which repos
use the deprecated property, what access patterns were found, and
confirmation that all patterns are compatible with the `Object.defineProperty` getter/setter.
## How `defineDeprecatedProperty` Works
- Creates an `Object.defineProperty` getter/setter on the target object
- Getter returns `this[currentKey]`, setter assigns `this[currentKey]`
- Both log via `warnDeprecated`, which deduplicates (once per unique
message per session via a `Set`)
- `enumerable: false` keeps the alias out of `Object.keys()` / `for...in`
/ `JSON.stringify`
- `configurable: true` allows further redefinition if needed
## Edge Cases
- **Truthiness checks** (`if (widget.oldProp)`) — works, getter fires
- **Caching to local var** (`const el = widget.oldProp`) — works, warns
once then the cached ref is used directly
- **Style/property mutation** (`widget.oldProp.style.color = 'red'`) —
works, getter returns the real object
- **Serialization** (`JSON.stringify`) — `enumerable: false` excludes it
- **Heavy access in loops** — `warnDeprecated` deduplicates, only warns
once per session regardless of call count
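A minimal sketch of the mechanics and edge cases described above, written from this description rather than copied from `feedback.ts` (the real helper may differ in detail). `defineDeprecatedPropertySketch`, `warnDeprecatedSketch`, the `warnings` array, and the `widget.inputEl` alias are all hypothetical; the array exists only so the deduplication is observable.

```typescript
// Hypothetical re-creation for illustration; the real helper in
// src/lib/litegraph/src/utils/feedback.ts may differ in its details.
const warnedMessages = new Set<string>()
const warnings: string[] = [] // collected only to make deduplication observable

function warnDeprecatedSketch(message: string): void {
  if (warnedMessages.has(message)) return // once per unique message per session
  warnedMessages.add(message)
  warnings.push(message)
  console.warn(message)
}

function defineDeprecatedPropertySketch<T extends object, K extends keyof T>(
  target: T,
  deprecatedKey: string,
  currentKey: K,
  message: string
): void {
  Object.defineProperty(target, deprecatedKey, {
    get() {
      warnDeprecatedSketch(message)
      return this[currentKey] // `this` is the object the alias was read from
    },
    set(value: T[K]) {
      warnDeprecatedSketch(message)
      this[currentKey] = value
    },
    enumerable: false, // hidden from Object.keys(), for...in, JSON.stringify
    configurable: true // allows later redefinition
  })
}

// Hypothetical widget whose `inputEl` was renamed to `element`.
const widget = { element: { style: { color: 'black' } } }
defineDeprecatedPropertySketch(
  widget,
  'inputEl',
  'element',
  'widget.inputEl is deprecated. Use widget.element instead.'
)

const legacy = Reflect.get(widget, 'inputEl') // warns once, returns widget.element
legacy.style.color = 'red' // mutates the real object through the alias
for (let i = 0; i < 1000; i++) Reflect.get(widget, 'inputEl') // no further warnings
```

Because the alias is non-enumerable, `JSON.stringify(widget)` here yields `{"element":{"style":{"color":"red"}}}` with no `inputEl` key.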

@@ -18,20 +18,12 @@ Cherry-pick backport management for Comfy-Org/ComfyUI_frontend stable release br
## System Context
| Item | Value |
| -------------- | --------------------------------------------------------------------------- |
| Repo | `~/ComfyUI_frontend` (Comfy-Org/ComfyUI_frontend) |
| Merge strategy | Auto-merge via workflow (`--auto --squash`); `--admin` only after CI passes |
| Automation | `pr-backport.yaml` GitHub Action (label-driven, auto-merge enabled) |
| Tracking dir | `~/temp/backport-session/` |
## CI Safety Rules
**NEVER merge a backport PR without all CI checks passing.** This applies to both automation-created and manual cherry-pick PRs.
- **Automation PRs:** The `pr-backport.yaml` workflow now enables `gh pr merge --auto --squash`, so clean PRs auto-merge once CI passes. Monitor with polling (`gh pr list --base TARGET_BRANCH --state open`). Do not intervene unless CI fails.
- **Manual cherry-pick PRs:** After `gh pr create`, wait for CI before merging. Poll with `gh pr checks $PR --watch` or use a sleep+check loop. Only merge after all checks pass.
- **CI failures:** DO NOT use `--admin` to bypass failing CI. Analyze the failure, present it to the user with possible causes (test backported without implementation, missing dependency, flaky test), and let the user decide the next step.
| Item | Value |
| -------------- | ------------------------------------------------- |
| Repo | `~/ComfyUI_frontend` (Comfy-Org/ComfyUI_frontend) |
| Merge strategy | Squash merge (`gh pr merge --squash --admin`) |
| Automation | `pr-backport.yaml` GitHub Action (label-driven) |
| Tracking dir | `~/temp/backport-session/` |
## Branch Scope Rules
@@ -116,15 +108,11 @@ git fetch origin TARGET_BRANCH
# Quick smoke check: does the branch build?
git worktree add /tmp/verify-TARGET origin/TARGET_BRANCH
cd /tmp/verify-TARGET
source ~/.nvm/nvm.sh && nvm use 24 && pnpm install && pnpm typecheck && pnpm test:unit
source ~/.nvm/nvm.sh && nvm use 24 && pnpm install && pnpm typecheck
git worktree remove /tmp/verify-TARGET --force
```
If typecheck or tests fail, stop and investigate before continuing. A broken branch after wave N means all subsequent waves will compound the problem.
### Never Admin-Merge Without CI
In a previous bulk session, all 69 backport PRs were merged with `gh pr merge --squash --admin`, bypassing required CI checks. This shipped 3 test failures to a release branch. **Lesson: `--admin` skips all branch protection, including required status checks.** Only use `--admin` after confirming CI has passed (e.g., `gh pr checks $PR` shows all green), or rely on auto-merge (`--auto --squash`) which waits for CI by design.
If typecheck fails, stop and investigate before continuing. A broken branch after wave N means all subsequent waves will compound the problem.
## Continuous Backporting Recommendation

@@ -19,44 +19,23 @@ done
# Wait 3 minutes for automation
sleep 180
# Check which got auto-PRs (auto-merge is enabled, so clean ones will self-merge after CI)
# Check which got auto-PRs
gh pr list --base TARGET_BRANCH --state open --limit 50 --json number,title
```
> **Note:** The `pr-backport.yaml` workflow now enables `gh pr merge --auto --squash` on automation-created PRs. Clean PRs will auto-merge once CI passes — no manual merge needed for those.
## Step 2: Wait for CI & Merge Clean Auto-PRs
Most automation PRs will auto-merge once CI passes (via `--auto --squash` in the workflow). Monitor and handle failures:
## Step 2: Review & Merge Clean Auto-PRs
```bash
# Wait for CI to complete (~45 minutes for full suite)
sleep 2700
# Auto-merged PRs drop out of this list; anything still open has failing, pending, or unreported CI
STILL_OPEN_PRS=$(gh pr list --base TARGET_BRANCH --state open --limit 50 --json number --jq '.[].number')
RECENTLY_MERGED=$(gh pr list --base TARGET_BRANCH --state merged --limit 50 --json number,title,mergedAt)
# For PRs still open, check CI status
for pr in $STILL_OPEN_PRS; do
CI_FAILED=$(gh pr checks $pr --json name,state --jq '[.[] | select(.state == "FAILURE")] | length')
CI_PENDING=$(gh pr checks $pr --json name,state --jq '[.[] | select(.state == "PENDING" or .state == "QUEUED")] | length')
if [ "$CI_FAILED" != "0" ]; then
# CI failed — collect details for triage
echo "PR #$pr — CI FAILED:"
gh pr checks $pr --json name,state,link --jq '.[] | select(.state == "FAILURE") | "\(.name): \(.state)"'
elif [ "$CI_PENDING" != "0" ]; then
echo "PR #$pr — CI still running ($CI_PENDING checks pending)"
else
# All checks passed but didn't auto-merge (race condition or label issue)
gh pr merge $pr --squash --admin
sleep 3
fi
for pr in $AUTO_PRS; do
# Check size
gh pr view $pr --json title,additions,deletions,changedFiles \
--jq '"Files: \(.changedFiles), +\(.additions)/-\(.deletions)"'
# Admin merge
gh pr merge $pr --squash --admin
sleep 3
done
```
**⚠️ If CI fails: DO NOT admin-merge to bypass.** See "CI Failure Triage" below.
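The per-PR branching in the Step 2 loop reduces to a small decision function. This sketch assumes each check carries the `name` and `state` fields requested via `--json name,state`, and it uses the same `FAILURE`, `PENDING`, and `QUEUED` states as the loop; `CheckRun`, `MergeDecision`, and `decide` are hypothetical names. One deliberate design choice: an empty check list yields `wait` rather than `merge`, because no reported checks is not evidence that CI passed.

```typescript
// Hypothetical shape of one entry from `gh pr checks <pr> --json name,state`.
interface CheckRun {
  name: string
  state: string
}

type MergeDecision =
  | { action: 'triage'; failed: string[] }
  | { action: 'wait'; pending: number }
  | { action: 'merge' }

// Same order as the loop: any failure wins, then pending work, then merge.
function decide(checks: CheckRun[]): MergeDecision {
  // No reported checks is not evidence that CI passed, so keep waiting.
  if (checks.length === 0) return { action: 'wait', pending: 0 }
  const failed = checks.filter((c) => c.state === 'FAILURE').map((c) => c.name)
  if (failed.length > 0) return { action: 'triage', failed }
  const pending = checks.filter((c) => c.state === 'PENDING' || c.state === 'QUEUED').length
  if (pending > 0) return { action: 'wait', pending }
  return { action: 'merge' }
}
```

A `triage` result maps to the "CI Failure Triage" template; only `merge` should ever lead to `gh pr merge`.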
## Step 3: Manual Worktree for Conflicts
```bash
@@ -84,13 +63,6 @@ for PR in ${CONFLICT_PRS[@]}; do
NEW_PR=$(gh pr create --base TARGET_BRANCH --head backport-$PR-to-TARGET \
--title "[backport TARGET] TITLE (#$PR)" \
--body "Backport of #$PR..." | grep -oP '\d+$')
# Wait for CI before merging — NEVER admin-merge without CI passing
echo "Waiting for CI on PR #$NEW_PR..."
gh pr checks $NEW_PR --watch --fail-fast || {
echo "⚠️ CI failed on PR #$NEW_PR — skipping merge, needs triage"
continue
}
gh pr merge $NEW_PR --squash --admin
sleep 3
done
@@ -110,7 +82,7 @@ After completing all PRs in a wave for a target branch:
git fetch origin TARGET_BRANCH
git worktree add /tmp/verify-TARGET origin/TARGET_BRANCH
cd /tmp/verify-TARGET
source ~/.nvm/nvm.sh && nvm use 24 && pnpm install && pnpm typecheck && pnpm test:unit
source ~/.nvm/nvm.sh && nvm use 24 && pnpm install && pnpm typecheck
git worktree remove /tmp/verify-TARGET --force
```
@@ -160,8 +132,7 @@ git rebase origin/TARGET_BRANCH
# Resolve new conflicts
git push --force origin backport-$PR-to-TARGET
sleep 20 # Wait for GitHub to recompute merge state
# Wait for CI after rebase before merging
gh pr checks $PR --watch --fail-fast && gh pr merge $PR --squash --admin
gh pr merge $PR --squash --admin
```
## Lessons Learned
@@ -175,31 +146,5 @@ gh pr checks $PR --watch --fail-fast && gh pr merge $PR --squash --admin
7. **appModeStore.ts, painter files, GLSLShader files** don't exist on core/1.40 — `git rm` these
8. **Always validate JSON** after resolving locale file conflicts
9. **Dep refresh PRs** — skip on stable branches. Risk of transitive dep regressions outweighs audit cleanup. Cherry-pick individual CVE fixes instead.
10. **Verify after each wave** — run `pnpm typecheck && pnpm test:unit` on the target branch after merging a batch. Catching breakage early prevents compounding errors.
10. **Verify after each wave** — run `pnpm typecheck` on the target branch after merging a batch. Catching breakage early prevents compounding errors.
11. **Cloud-only PRs don't belong on core/\* branches** — app mode, cloud auth, and cloud-specific UI changes are irrelevant to local users. Always check PR scope against branch scope before backporting.
12. **Never admin-merge without CI**: `--admin` bypasses all branch protections, including required status checks. A bulk session of 69 admin-merges shipped 3 test failures. Always wait for CI to pass first, or use `--auto --squash`, which waits by design.
## CI Failure Triage
When CI fails on a backport PR, present failures to the user using this template:
```markdown
### PR #XXXX — CI Failed
- **Failing check:** test / lint / typecheck
- **Error:** (summary of the failure message)
- **Likely cause:** test backported without implementation / missing dependency / flaky test / snapshot mismatch
- **Recommendation:** backport PR #YYYY first / skip this PR / rerun CI after fixing prerequisites
```
Common failure categories:
| Category | Example | Resolution |
| --------------------------- | ---------------------------------------- | ----------------------------------------- |
| Test without implementation | Test references function not on branch | Backport the implementation PR first |
| Missing dependency | Import from module not on branch | Backport the dependency PR first, or skip |
| Snapshot mismatch | Screenshot test differs | Usually safe — update snapshots on branch |
| Flaky test | Passes on retry | Re-run CI, merge if green on retry |
| Type error | Interface changed on main but not branch | May need manual adaptation |
**Never assume a failure is safe to skip.** Present all failures to the user with analysis.

@@ -5,9 +5,9 @@
Maintain `execution-log.md` with per-branch tables:
```markdown
| PR# | Title | CI Status | Status | Backport PR | Notes |
| ----- | ----- | ------------------------------ | --------------------------------- | ----------- | ------- |
| #XXXX | Title | ✅ Pass / ❌ Fail / ⏳ Pending | ✅ Merged / ⏭️ Skip / ⏸️ Deferred | #YYYY | Details |
| PR# | Title | Status | Backport PR | Notes |
| ----- | ----- | --------------------------------- | ----------- | ------- |
| #XXXX | Title | ✅ Merged / ⏭️ Skip / ⏸️ Deferred | #YYYY | Details |
```
## Wave Verification Log
@@ -19,7 +19,6 @@ Track verification results per wave:
- PRs merged: #A, #B, #C
- Typecheck: ✅ Pass / ❌ Fail
- Unit tests: ✅ Pass / ❌ Fail
- Issues found: (if any)
- Human review needed: (list any non-trivial conflict resolutions)
```
@@ -42,11 +41,6 @@ Track verification results per wave:
| PR# | Branch | Conflict Type | Resolution Summary |
## CI Failure Report
| PR# | Branch | Failing Check | Error Summary | Cause | Resolution |
| --- | ------ | ------------- | ------------- | ----- | ---------- |
## Automation Performance
| Metric | Value |

@@ -0,0 +1,278 @@
---
name: reproduce-issue
description: 'Reproduce a GitHub issue by researching prerequisites, setting up the environment (custom nodes, workflows, settings), and interactively exploring ComfyUI via playwright-cli until the bug is confirmed. Then records a clean demo video.'
---
# Issue Reproduction Skill
Reproduce a reported GitHub issue against a running ComfyUI instance. This skill uses an interactive, agent-driven approach — not a static script. You will research, explore, retry, and adapt until the bug is reproduced, then record a clean demo.
## Architecture
Two videos are produced:
1. **Research video** — the full exploration session: installing deps, trying things, failing, retrying, figuring out the bug. Valuable for debugging context.
2. **Reproduce video** — a clean, minimal recording of just the reproduction steps. This is the demo you'd attach to the issue.
```
Phase 1: Research → Read issue, understand prerequisites
Phase 2: Environment → Install custom nodes, load workflows, configure settings
Phase 3: Explore → [VIDEO 1: research] Interactively try to reproduce (retries OK)
Phase 4: Record → [VIDEO 2: reproduce] Clean recording of just the minimal repro steps
Phase 5: Report → Generate a structured reproduction report
```
## Prerequisites
- ComfyUI server running (ask user for URL, default: `http://127.0.0.1:8188`)
- `playwright-cli` installed: `npm install -g @playwright/cli@latest`
- `gh` CLI (authenticated, for reading issues)
- ComfyUI backend with Python environment (for installing custom nodes)
## Phase 1: Research the Issue
1. Fetch the issue details:
```bash
gh issue view <number> --repo Comfy-Org/ComfyUI_frontend --json title,body,comments
```
2. Extract from the issue body:
- **Reproduction steps** (the exact sequence)
- **Prerequisites**: specific workflows, custom nodes, settings, models
- **Environment**: OS, browser, ComfyUI version
- **Media**: screenshots or videos showing the bug
3. Search the codebase for related code:
- Find the feature/component mentioned in the issue
- Understand how it works currently
- Identify what state the UI needs to be in
## Phase 2: Environment Setup
Set up everything the issue requires BEFORE attempting reproduction.
### Custom Nodes
If the issue mentions custom nodes:
```bash
# Find the custom node repo
# Clone into ComfyUI's custom_nodes directory
cd <comfyui_path>/custom_nodes
git clone <custom_node_repo_url>
# Install dependencies if needed
cd <custom_node_name>
pip install -r requirements.txt 2>/dev/null || true
# Restart ComfyUI server to load the new nodes
```
### Workflows
If the issue references a specific workflow:
```bash
# Download workflow JSON if a URL is provided
curl -L "<workflow_url>" -o /tmp/test-workflow.json
# Load it via the API (assumes your ComfyUI build exposes this endpoint; otherwise load it with playwright-cli as below)
curl -X POST http://127.0.0.1:8188/api/workflow \
-H "Content-Type: application/json" \
-d @/tmp/test-workflow.json
```
Or load via playwright-cli:
```bash
playwright-cli goto "http://127.0.0.1:8188"
# Drag-and-drop or use File > Open to load the workflow
```
### Settings
If the issue requires specific settings:
```bash
# Use playwright-cli to open settings and change them
playwright-cli press "Control+,"
playwright-cli snapshot
# Find and modify the relevant setting
```
## Phase 3: Interactive Exploration — Research Video
Start recording the **research video** (Video 1). This captures the full exploration — mistakes, retries, dead ends — all valuable context.
```bash
# Open browser and start video recording
playwright-cli open "http://127.0.0.1:8188"
playwright-cli video-start
# Take a snapshot to see current state
playwright-cli snapshot
# Interact based on what you see
playwright-cli click <ref>
playwright-cli fill <ref> "text"
playwright-cli press "Control+s"
# Check results
playwright-cli snapshot
playwright-cli screenshot --filename=/tmp/qa/research-step-1.png
```
### Key Principles
- **Observe before acting**: Always `snapshot` before interacting
- **Retry and adapt**: If a step fails, try a different approach
- **Document what works**: Keep notes on which steps trigger the bug
- **Don't give up**: Try multiple approaches if the first doesn't work
- **Establish prerequisites**: Many bugs require specific UI state:
- Save a workflow first (File > Save)
- Make changes to dirty the workflow
- Open multiple tabs
- Add specific node types
- Change settings
- Resize the window
### Common ComfyUI Interactions via playwright-cli
| Action | Command |
| ------------------- | -------------------------------------------------------------- |
| Open hamburger menu | `playwright-cli click` on the C logo button |
| Navigate menu | `playwright-cli hover <ref>` then `playwright-cli click <ref>` |
| Add node | Double-click canvas → type node name → select from results |
| Connect nodes | Drag from output slot to input slot |
| Save workflow | `playwright-cli press "Control+s"` |
| Save As | Menu > File > Save As |
| Select node | Click on the node |
| Delete node | Select → `playwright-cli press "Delete"` |
| Right-click menu | `playwright-cli click <ref> --button right` |
| Keyboard shortcut | `playwright-cli press "Control+z"` |
## Phase 4: Record Clean Demo — Reproduce Video (max 5 minutes)
Once the bug is confirmed, **stop the research video** and **close the research browser**:
```bash
playwright-cli video-stop
playwright-cli close
```
Now start a **fresh browser session** for the clean reproduce video (Video 2).
**IMPORTANT constraints:**
- **Max 5 minutes** — the reproduce video must be short and focused
- **No environment setup**: server, user, and custom nodes are already set up from Phases 2 and 3. Just log in and go.
- **No exploration** — you already know the exact steps. Execute them quickly and precisely.
- **Start video recording immediately**, execute steps, stop. Don't leave the recording running while thinking.
1. **Open browser and start recording**:
```bash
playwright-cli open "http://127.0.0.1:8188"
playwright-cli video-start
```
2. **Execute only the minimal reproduction steps** — no exploration, no mistakes. Just the clean sequence that demonstrates the bug. You already know exactly what works from Phase 3.
3. **Take key screenshots** at critical moments:
```bash
playwright-cli screenshot --filename=/tmp/qa/before-bug.png
# ... trigger the bug ...
playwright-cli screenshot --filename=/tmp/qa/bug-visible.png
```
4. **Stop recording and close** immediately after the bug is demonstrated:
```bash
playwright-cli video-stop
playwright-cli close
```
## Phase 5: Generate Report
Create a reproduction report at `tmp/qa/reproduce-report.md`:
```markdown
# Issue Reproduction Report
- **Issue**: <issue_url>
- **Title**: <issue_title>
- **Date**: <today>
- **Status**: Reproduced / Not Reproduced / Partially Reproduced
## Environment
- ComfyUI Server: <url>
- OS: <os>
- Custom Nodes Installed: <list or "none">
- Settings Changed: <list or "none">
## Prerequisites
List everything that had to be set up before the bug could be triggered:
1. ...
2. ...
## Reproduction Steps
Minimal steps to reproduce (the clean sequence):
1. ...
2. ...
3. ...
## Expected Behavior
<from the issue>
## Actual Behavior
<what actually happened>
## Evidence
- Research video: `research-video/video.webm` (full exploration session)
- Reproduce video: `reproduce-video/video.webm` (clean minimal repro)
- Screenshots: `before-bug.png`, `bug-visible.png`
## Root Cause Analysis (if identified)
<code pointers, hypothesis about what's going wrong>
## Notes
<any additional observations, workarounds discovered, related issues>
```
## Handling Failures
If the bug **cannot be reproduced**:
1. Document what you tried and why it didn't work
2. Check if the issue was already fixed (search git log for related commits)
3. Check if it's environment-specific (OS, browser, specific version)
4. Set report status to "Not Reproduced" with detailed notes
5. The report is still valuable — it saves others from repeating the same investigation
## CI Integration
In CI, this skill runs as a Claude Code agent with:
- `ANTHROPIC_API_KEY` for Claude
- `GEMINI_API_KEY` for initial issue analysis (optional)
- ComfyUI server pre-started in the container
- `playwright-cli` pre-installed
The CI workflow:
1. Gemini generates a reproduce guide (markdown) from the issue
2. Claude agent receives the guide and runs this skill
3. Claude explores interactively, installs dependencies, retries
4. Claude records a clean demo once reproduced
5. Video and report are uploaded as artifacts

@@ -0,0 +1,361 @@
---
name: comfy-qa
description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
---
# ComfyUI Frontend QA Skill
Perform comprehensive quality assurance of the ComfyUI frontend application by navigating all routes, clicking interactive elements, and testing features. Generate a structured report and submit it as a draft PR.
## Prerequisites
- Node.js 22+
- `pnpm` package manager
- `gh` CLI (authenticated)
- `playwright-cli` (browser automation): `npm install -g @playwright/cli@latest`
## Step 1: Environment Detection & Setup
Detect the runtime environment and ensure the app is accessible.
### CI Environment
If `CI=true` is set:
1. The ComfyUI backend is pre-configured in the CI container (`ghcr.io/comfy-org/comfyui-ci-container`)
2. Frontend dist is already built and served by the backend
3. Server runs at `http://127.0.0.1:8188`
4. Skip user prompts — run fully automated
### Local Environment
If `CI` is not set:
1. **Ask the user**: "Is a ComfyUI server already running? If so, what URL? (default: http://127.0.0.1:8188)"
- If yes: use the provided URL
- If no: offer to start one:
```bash
# Option A: Use existing ComfyUI installation
# Ask for the path to ComfyUI, then:
cd <comfyui_path>
python main.py --cpu --multi-user --front-end-root <frontend_dist_path> &
# Option B: Build frontend and use preview server (no backend features)
pnpm build && pnpm preview &
```
2. Wait for server readiness by polling the URL (retry with 2s intervals, 60s timeout)
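A sketch of that readiness poll in TypeScript, assuming Node 18+ for the global `fetch`. `waitForServer`, `httpProbe`, and `Probe` are hypothetical names, and the probe is injectable so the retry logic can be exercised without a live server. Any `res.ok` response counts as ready, and network errors are treated as "not up yet" rather than as failures.

```typescript
// Hypothetical helper names; requires Node 18+ for the global fetch().
type Probe = () => Promise<boolean>

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Treats any 2xx response as "ready" and network errors as "not up yet".
function httpProbe(url: string): Probe {
  return async () => {
    try {
      const res = await fetch(url)
      return res.ok
    } catch {
      return false
    }
  }
}

// Polls until the probe succeeds; resolves with the attempt count, rejects on timeout.
async function waitForServer(
  probe: Probe,
  intervalMs = 2_000,
  timeoutMs = 60_000
): Promise<number> {
  const deadline = Date.now() + timeoutMs
  let attempts = 0
  for (;;) {
    attempts++
    if (await probe()) return attempts
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Server not ready after ${attempts} attempts (${timeoutMs} ms timeout)`)
    }
    await sleep(intervalMs)
  }
}
```

For example, inside an async script: `const attempts = await waitForServer(httpProbe('http://127.0.0.1:8188'))`.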
### Browser Automation Setup
Use **playwright-cli** for browser interaction via Bash commands:
```bash
playwright-cli open http://127.0.0.1:8188 # open browser and navigate
playwright-cli snapshot # capture snapshot with element refs
playwright-cli click e1 # click by element ref from snapshot
playwright-cli press Tab # keyboard shortcuts
playwright-cli screenshot --filename=f.png # save screenshot
```
playwright-cli is headless by default (CI-friendly). Each command outputs the current page snapshot with element references (`e1`, `e2`, …) that you use for subsequent `click`, `fill`, `hover` commands. Always run `snapshot` before interacting to get fresh refs.
For local dev servers behind proxies, adjust the URL accordingly (e.g., `https://[port].stukivx.xyz` pattern if configured).
## Step 2: QA Test Plan
Navigate to the application URL and systematically test each area below. For each test, record:
- **Status**: pass / fail / skip (with reason)
- **Notes**: any issues, unexpected behavior, or visual glitches
- **Screenshots**: take screenshots of failures or notable states
### 2.1 Application Load & Routes
| Test | Steps |
| ----------------- | ------------------------------------------------------------ |
| Root route loads | Navigate to `/` — GraphView should render with canvas |
| User select route | Navigate to `/user-select` — user selection UI should appear |
| Default redirect | In multi-user mode, `/` redirects to `/user-select` first |
| 404 handling | Navigate to `/nonexistent` — should handle gracefully |
### 2.2 Canvas & Graph View
| Test | Steps |
| ------------------------- | -------------------------------------------------------------- |
| Canvas renders | The LiteGraph canvas is visible and interactive |
| Pan canvas | Click and drag on empty canvas area |
| Zoom in/out | Use scroll wheel or Alt+=/Alt+- |
| Fit view | Press `.` key — canvas fits to content |
| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
| Add node via search | Open search box, find and add a node |
| Delete node | Select a node, press Delete key |
| Connect nodes | Drag from output slot to input slot |
| Disconnect nodes | Right-click a link and remove, or drag from connected slot |
| Multi-select | Shift+click or drag-select multiple nodes |
| Copy/Paste | Select nodes, Ctrl+C then Ctrl+V |
| Undo/Redo | Make changes, Ctrl+Z to undo, Ctrl+Y to redo |
| Node context menu | Right-click a node — menu appears with all expected options |
| Canvas context menu | Right-click empty canvas — menu appears |
### 2.3 Node Operations
| Test | Steps |
| ------------------- | ---------------------------------------------------------- |
| Bypass node | Select node, Ctrl+B — node shows bypass state |
| Mute node | Select node, Ctrl+M — node shows muted state |
| Collapse node | Select node, Alt+C — node collapses |
| Pin node | Select node, press P — node becomes pinned |
| Rename node | Double-click node title — edit mode activates |
| Node color | Right-click > Color — color picker works |
| Group nodes | Select multiple nodes, Ctrl+G — group created |
| Ungroup | Right-click group > Ungroup |
| Widget interactions | Toggle checkboxes, adjust sliders, type in text fields |
| Combo widget | Click dropdown widgets — options appear and are selectable |
### 2.4 Sidebar Tabs
| Test | Steps |
| ---------------------- | ------------------------------------------------------ |
| Workflows tab | Press W — workflows sidebar opens with saved workflows |
| Node Library tab | Press N — node library opens with categories |
| Model Library tab | Press M — model library opens |
| Assets tab | Press A — assets browser opens |
| Tab toggle | Press same key again — sidebar closes |
| Search in sidebar | Type in search box — results filter |
| Drag node from library | Drag a node from library onto canvas |
### 2.5 Topbar & Workflow Tabs
| Test | Steps |
| -------------------- | ------------------------------------------------------ |
| Workflow tab display | Current workflow name shown in tab bar |
| New workflow | Ctrl+N — new blank workflow created |
| Rename workflow | Double-click workflow tab |
| Tab context menu | Right-click workflow tab — menu with Close/Rename/etc. |
| Multiple tabs | Open multiple workflows, switch between them |
| Queue button | Click Queue/Run button — prompt queues |
| Batch count | Click batch count editor, change value |
| Menu hamburger | Click hamburger menu — options appear |
### 2.6 Settings Dialog
| Test | Steps |
| ---------------- | ---------------------------------------------------- |
| Open settings | Press Ctrl+, or click settings button |
| Settings tabs | Navigate through all setting categories |
| Change a setting | Toggle a boolean setting — it persists after closing |
| Search settings | Type in settings search box — results filter |
| Keybindings tab | Navigate to keybindings panel |
| About tab | Navigate to about panel — version info shown |
| Close settings | Press Escape or click close button |
### 2.7 Bottom Panel
| Test | Steps |
| ------------------- | -------------------------------------- |
| Toggle panel | Press Ctrl+` — bottom panel opens |
| Logs tab | Logs/terminal tab shows server output |
| Shortcuts tab | Shortcuts reference is displayed |
| Keybindings display | Press Ctrl+Shift+K — keybindings panel |
### 2.8 Execution & Queue
| Test | Steps |
| -------------- | ----------------------------------------------------- |
| Queue prompt | Load default workflow, click Queue — execution starts |
| Queue progress | Progress indicator shows during execution |
| Interrupt | Press Ctrl+Alt+Enter during execution — interrupts |
| Job history | Open job history sidebar — past executions listed |
| Clear history | Clear execution history via menu |
### 2.9 Workflow File Operations
| Test | Steps |
| --------------- | ------------------------------------------------- |
| Save workflow | Ctrl+S — workflow saves (check for prompt if new) |
| Open workflow | Ctrl+O — file picker or workflow browser opens |
| Export JSON | Menu > Export — workflow JSON downloads |
| Import workflow | Drag a .json workflow file onto canvas |
| Load default | Menu > Load Default — default workflow loads |
| Clear workflow | Menu > Clear — canvas clears (after confirmation) |
### 2.10 Advanced Features
| Test | Steps |
| --------------- | ------------------------------------------------- |
| Minimap | Alt+M — minimap toggle |
| Focus mode | Toggle focus mode |
| Canvas lock | Press H to lock, V to unlock |
| Link visibility | Ctrl+Shift+L — toggle links |
| Subgraph | Select nodes > Ctrl+Shift+E — convert to subgraph |
### 2.11 Error Handling
| Test | Steps |
| --------------------- | -------------------------------------------- |
| Missing nodes dialog | Load workflow with non-existent node types |
| Missing models dialog | Trigger missing model warning |
| Network error | Disconnect backend, verify graceful handling |
| Invalid workflow | Try loading malformed JSON |
### 2.12 Responsive & Accessibility
| Test | Steps |
| ------------------- | ------------------------------------- |
| Window resize | Resize browser window — layout adapts |
| Keyboard navigation | Tab through interactive elements |
| Sidebar resize | Drag sidebar edge to resize |
## Step 3: Generate Report
After completing all tests, generate a markdown report file.
### Report Location
```
docs/qa/YYYY-MM-DD-NNN-report.md
```
Where:
- `YYYY-MM-DD` is today's date
- `NNN` is a zero-padded increment index (001, 002, etc.)
To determine the increment, count today's existing reports and add one:
```bash
ls docs/qa/ | grep "$(date +%Y-%m-%d)" | wc -l
```
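Counting files yields the wrong index if an earlier report was deleted. A sketch that uses the highest existing `NNN` instead is shown below. `next_report_path` is a hypothetical helper, and it assumes the `YYYY-MM-DD-NNN-report.md` naming described above:

```bash
#!/usr/bin/env bash
# Print the next free report path, e.g. docs/qa/2026-03-28-003-report.md.
# Usage: next_report_path [dir] [YYYY-MM-DD]
next_report_path() {
  local dir=${1:-docs/qa}
  local day=${2:-$(date +%Y-%m-%d)}
  local max=0 f idx
  for f in "$dir/$day"-[0-9][0-9][0-9]-report.md; do
    [ -e "$f" ] || continue              # glob matched nothing
    idx=${f#"$dir/$day"-}                # drop "<dir>/<day>-"
    idx=${idx%-report.md}                # drop "-report.md", leaving NNN
    if (( 10#$idx > max )); then         # 10# avoids octal parsing of "008"
      max=$((10#$idx))
    fi
  done
  printf '%s/%s-%03d-report.md\n' "$dir" "$day" $((max + 1))
}
```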
### Report Template
```markdown
# QA Report: ComfyUI Frontend
**Date**: YYYY-MM-DD
**Environment**: CI / Local (OS, Browser)
**Frontend Version**: (git sha or version)
**Agent**: Claude / Codex / Other
**Server URL**: http://...
## Summary
| Category | Pass | Fail | Skip | Total |
| --------------- | ---- | ---- | ---- | ----- |
| Routes & Load | | | | |
| Canvas | | | | |
| Node Operations | | | | |
| Sidebar | | | | |
| Topbar | | | | |
| Settings | | | | |
| Bottom Panel | | | | |
| Execution | | | | |
| File Operations | | | | |
| Advanced | | | | |
| Error Handling | | | | |
| Responsive | | | | |
| **Total** | | | | |
## Results
### Routes & Load
- [x] Root route loads — pass
- [ ] ...
### Canvas & Graph View
- [x] Canvas renders — pass
- [ ] ...
(repeat for each category)
## Issues Found
### Issue 1: [Title]
- **Severity**: critical / major / minor / cosmetic
- **Steps to reproduce**: ...
- **Expected**: ...
- **Actual**: ...
- **Screenshot**: (if available)
## Notes
Any additional observations, performance notes, or suggestions.
```
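To fill in the Summary table, results can be tallied from a simple log. The sketch below assumes a tab-separated file with one `category<TAB>test<TAB>status` line per test, where status is `pass`, `fail`, or `skip`, optionally followed by a notes column. `summarize_results` is a hypothetical helper. It prints one Markdown row per category in first-seen order, followed by the total row:

```bash
#!/usr/bin/env bash
# Tally pass/fail/skip per category and print Summary-table rows.
# Usage: summarize_results [results.tsv]   (reads stdin if no file is given)
summarize_results() {
  awk -F'\t' '
    NF >= 3 {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
      count[$1, $3]++; total[$1]++; grand[$3]++; all++
    }
    END {
      for (i = 1; i <= n; i++) {
        c = order[i]
        printf "| %s | %d | %d | %d | %d |\n", c, count[c, "pass"], count[c, "fail"], count[c, "skip"], total[c]
      }
      printf "| **Total** | %d | %d | %d | %d |\n", grand["pass"], grand["fail"], grand["skip"], all
    }' "$@"
}
```

The rows are not column-aligned. A Markdown formatter such as Prettier can re-pad them afterwards.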
## Step 4: Commit and Push Report
### In CI (when `CI=true`)
Save the report directly to `$QA_ARTIFACTS` (the CI workflow uploads this as
an artifact and posts results as a PR comment). Do **not** commit, push, or
create a new PR.
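A sketch of choosing the destination directory from the environment is shown below. `report_dir` is a hypothetical helper, and it assumes only the `CI` and `QA_ARTIFACTS` variables named above. In CI it fails loudly if `QA_ARTIFACTS` is missing, rather than silently writing somewhere the upload step will not look:

```bash
#!/usr/bin/env bash
# Print where the report should be written: $QA_ARTIFACTS in CI, docs/qa locally.
report_dir() {
  if [ "${CI:-}" = "true" ]; then
    # ${var:?msg} aborts (the subshell, when called via $(...)) if QA_ARTIFACTS is unset or empty.
    printf '%s\n' "${QA_ARTIFACTS:?QA_ARTIFACTS must be set when CI=true}"
  else
    printf '%s\n' "docs/qa"
  fi
}
```

Example: `dest=$(report_dir) || exit 1; mkdir -p "$dest"`.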
### Local / interactive use
When running locally, create a draft PR after committing:
```bash
# Ensure on a feature branch
BRANCH_NAME="qa/$(date +%Y-%m-%d)-$(git rev-parse --short HEAD)"
git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
git add docs/qa/
git commit -m "docs: add QA report $(date +%Y-%m-%d)

Automated QA report covering all frontend routes and features."
git push -u origin "$BRANCH_NAME"
# Create draft PR assigned to comfy-pr-bot
gh pr create \
--draft \
--title "QA Report: $(date +%Y-%m-%d)" \
--body "## QA Report
Automated frontend QA run covering all routes and interactive features.
See \`docs/qa/\` for the full report.
/cc @comfy-pr-bot" \
--assignee comfy-pr-bot
```
## Execution Notes
### Cross-Platform Considerations
- **Windows**: Use `pwsh` or `cmd` equivalents for shell commands. `gh` CLI works on all platforms.
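- **macOS**: The app uses Cmd instead of Ctrl for keyboard shortcuts, and Playwright does not remap `Control` to `Meta` automatically. Playwright's key syntax accepts a `ControlOrMeta` modifier that resolves to Meta on macOS and Control elsewhere, so prefer it to a hard-coded `Control` when a shortcut should follow the platform.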
- **Linux**: Primary CI platform. Screenshot baselines are Linux-only.
### Agent Compatibility
This skill uses **playwright-cli** (`@playwright/cli`) — a token-efficient CLI designed for coding agents. Install it once with `npm install -g @playwright/cli@latest`, then use `Bash` to run commands.
The key operations and their playwright-cli equivalents:
| Action | Command |
| ---------------- | ---------------------------------------- |
| Navigate to URL | `playwright-cli goto <url>` |
| Get element refs | `playwright-cli snapshot` |
| Click element | `playwright-cli click <ref>` |
| Type text | `playwright-cli fill <ref> <text>` |
| Press shortcut | `playwright-cli press <key>` |
| Take screenshot | `playwright-cli screenshot --filename=f` |
| Hover element | `playwright-cli hover <ref>` |
| Select dropdown | `playwright-cli select <ref> <value>` |
Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation or major interactions to refresh refs before acting.
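To keep failure screenshots easy to match with report entries, a small wrapper can derive the filename from the test name. This is a sketch: `qa_screenshot` and `QA_SCREENSHOT_DIR` are hypothetical, and the only playwright-cli usage it assumes is the `screenshot --filename=` form shown in the table above.

```bash
#!/usr/bin/env bash
# Save a screenshot named after the test, e.g. "Canvas: Zoom in/out" -> canvas-zoom-in-out.png
qa_screenshot() {
  local slug
  # Lowercase, collapse every run of non-alphanumerics to "-", trim edge dashes.
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')
  playwright-cli screenshot --filename="${QA_SCREENSHOT_DIR:-.}/$slug.png"
}
```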
### Tips for Reliable QA
1. **Wait for page stability** before interacting — check that elements are visible and enabled
2. **Take a snapshot after each major navigation** to verify state
3. **Don't use fixed timeouts** — poll for expected conditions
4. **Record the full page snapshot** at the start for baseline comparison
5. **If a test fails**, document it and continue — don't abort the entire QA run
6. **Group related tests** — complete one category before moving to the next


@@ -1,99 +0,0 @@
---
name: contain-audit
description: 'Detect DOM elements where CSS contain:layout+style would improve rendering performance. Runs a Playwright-based audit on a large workflow, scores candidates by subtree size and sizing constraints, measures performance impact, and generates a ranked report.'
---
# CSS Containment Audit
Automatically finds DOM elements where adding `contain: layout style` would reduce browser recalculation overhead.
## What It Does
1. Loads a large workflow (245 nodes) in a real browser
2. Walks the DOM tree and scores every element as a containment candidate
3. For each high-scoring candidate, applies `contain: layout style` via JavaScript
4. Measures rendering performance (style recalcs, layouts, task duration) before and after
5. Takes before/after screenshots to detect visual breakage
6. Generates a ranked report with actionable recommendations
## When to Use
- After adding new Vue components to the node rendering pipeline
- When investigating rendering performance on large workflows
- Before and after refactoring node DOM structure
- As part of periodic performance audits
## How to Run
```bash
# Start the dev server first
pnpm dev &
# Run the audit (uses the @audit tag, not included in normal CI runs)
pnpm exec playwright test browser_tests/tests/containAudit.spec.ts --project=audit
# View the HTML report
pnpm exec playwright show-report
```
## How to Read Results
The audit outputs a table to the console:
```text
CSS Containment Audit Results
=======================================================
Rank | Selector | Subtree | Score | DRecalcs | DLayouts | Visual
1 | [data-testid="node-inner-wrap"] | 18 | 72 | -34% | -12% | OK
2 | .node-body | 12 | 48 | -8% | -3% | OK
3 | .node-header | 4 | 16 | +1% | 0% | OK
```
- **Subtree**: Number of descendant elements (higher = more to skip)
- **Score**: Composite heuristic score (subtree size x sizing constraint bonus)
- **DRecalcs / DLayouts**: Change in style recalcs / layout counts vs baseline (negative = improvement)
- **Visual**: OK if no pixel change, DIFF if screenshot differs (may include subpixel noise — verify manually)
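When the table is long, it can help to shortlist rows that reduced style recalcs without a visual diff. The awk sketch below assumes the pipe-separated layout of the sample output above, which may change. `shortlist_candidates` is a hypothetical helper, and DIFF rows still deserve a manual look, because they may only be subpixel noise:

```bash
#!/usr/bin/env bash
# Read the audit table on stdin; print rank, selector and DRecalcs for rows
# whose DRecalcs is negative and whose Visual column is OK.
shortlist_candidates() {
  awk -F'|' '
    NF >= 7 {
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)  # trim each cell
      if ($1 !~ /^[0-9]+$/) next                                # skip header rows
      recalc = $5; sub(/%$/, "", recalc)                         # "-34%" -> "-34"
      if (recalc + 0 < 0 && $7 == "OK") print $1 "\t" $2 "\t" $5
    }'
}
```

Example: `pnpm exec playwright test browser_tests/tests/containAudit.spec.ts --project=audit | shortlist_candidates`.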
## Candidate Scoring
An element is a good containment candidate when:
1. **Large subtree**: many descendants that the browser can skip recalculating
2. **Externally constrained size**: width/height determined by CSS variables, flex, or explicit values (not by content)
3. **No existing containment**: `contain` is not already applied
4. **Not a leaf**: has at least a few child elements
Elements that should NOT get containment:
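- Elements whose children overflow visually beyond bounds (e.g., absolute-positioned overlays with negative inset); layout containment makes the element the containing block for absolutely and fixed positioned descendants, so overlays anchored to an outer ancestor can move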
- Elements whose height is determined by content and affects sibling layout
- Very small subtrees (overhead of containment context outweighs benefit)
## Limitations
- Cannot fully guarantee `contain` safety; visual review of screenshots is required
- Performance measurements have natural variance; run multiple times for confidence
- Only tests idle and pan scenarios; widget interactions may differ
- The audit modifies styles at runtime via JS, which doesn't account for Tailwind purging or build-time optimizations
## Example PR
[#9946 — fix: add CSS contain:layout contain:style to node inner wrapper](https://github.com/Comfy-Org/ComfyUI_frontend/pull/9946)
This PR added `contain-layout contain-style` to the node inner wrapper div in `LGraphNode.vue`. The audit tool would have flagged this element as a high-scoring candidate because:
- **Large subtree** (18+ descendants: header, slots, widgets, content, badges)
- **Externally constrained size** (`w-(--node-width)`, `flex-1` — dimensions set by CSS variables and flex parent)
- **Natural isolation boundary** between frequently-changing content (widgets) and infrequently-changing overlays (selection outlines, borders)
The actual change was a single line: adding `'contain-layout contain-style'` to the inner wrapper's class list at `src/renderer/extensions/vueNodes/components/LGraphNode.vue:79`.
## Reference
| Resource | Path |
| ----------------- | ------------------------------------------------------- |
| Audit test | `browser_tests/tests/containAudit.spec.ts` |
| PerformanceHelper | `browser_tests/fixtures/helpers/PerformanceHelper.ts` |
| Perf tests | `browser_tests/tests/performance.spec.ts` |
| Large workflow | `browser_tests/assets/large-graph-workflow.json` |
| Example PR | https://github.com/Comfy-Org/ComfyUI_frontend/pull/9946 |


@@ -28,21 +28,3 @@ reviews:
3. The PR description includes a concrete, non-placeholder explanation of why an end-to-end regression test was not added.
Fail otherwise. When failing, mention which bug-fix signal you found and ask the author to either add or update a Playwright regression test under `browser_tests/` or add a concrete explanation in the PR description of why an end-to-end regression test is not practical.
- name: ADR compliance for entity/litegraph changes
mode: warning
instructions: |
Use only PR metadata already available in the review context: the changed-file list relative to the PR base, the PR description, and the diff content. Do not rely on shell commands.
This check applies ONLY when the PR modifies files under `src/lib/litegraph/`, `src/ecs/`, or files related to graph entities (nodes, links, widgets, slots, reroutes, groups, subgraphs).
If none of those paths appear in the changed files, pass immediately.
When applicable, check for:
1. **Command pattern (ADR 0003)**: Entity state mutations must be serializable, idempotent, deterministic commands — not imperative fire-and-forget side effects. Flag direct spatial mutation (`node.pos =`, `node.size =`, `group.pos =`) outside of a store or command, and any new void-returning mutation API that should produce a command object.
2. **God-object growth (ADR 0008)**: New methods/properties added to `LGraphNode`, `LGraphCanvas`, `LGraph`, or `Subgraph` that add responsibilities rather than extracting/migrating existing ones.
3. **ECS data/behavior separation (ADR 0008)**: Component-like data structures that contain methods or back-references to parent entities. ECS components must be plain data. New OOP instance patterns (`node.someProperty`, `node.someMethod()`) for data that should be a World component.
4. **Extension ecosystem (ADR 0008)**: Changes to extension-facing callbacks (`onConnectionsChange`, `onRemoved`, `onAdded`, `onConfigure`, `onConnectInput/Output`, `onWidgetChanged`), `node.widgets` access, `node.serialize` overrides, or `graph._version++` without migration guidance. These affect 40+ custom node repos.
Pass if none of these patterns are found in the diff.
When warning, reference the specific ADR by number and link to `docs/adr/` for context. Frame findings as directional guidance since ADR 0003 and 0008 are in Proposed status.


@@ -44,12 +44,17 @@ runs:
python -m pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
pip install wait-for-it
- name: Start ComfyUI server
if: ${{ inputs.launch_server == 'true' }}
shell: bash
working-directory: ComfyUI
env:
EXTRA_SERVER_PARAMS: ${{ inputs.extra_server_params }}
run: |
python main.py --cpu --multi-user --front-end-root ../dist ${{ inputs.extra_server_params }} &
wait-for-it --service 127.0.0.1:8188 -t 600
python main.py --cpu --multi-user --front-end-root ../dist $EXTRA_SERVER_PARAMS &
for i in $(seq 1 300); do
curl -sf http://127.0.0.1:8188/api/system_stats >/dev/null 2>&1 && echo "Server ready" && exit 0
sleep 2
done
echo "::error::ComfyUI server did not start within 600s" && exit 1


@@ -13,6 +13,8 @@ runs:
# Install pnpm, Node.js, build frontend
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
with:
version: 10
- name: Setup Node.js
uses: actions/setup-node@v6


@@ -17,6 +17,8 @@ jobs:
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
with:
version: 10
- name: Setup Node.js
uses: actions/setup-node@v6


@@ -22,6 +22,8 @@ jobs:
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
with:
version: 10
- name: Setup Node.js
uses: actions/setup-node@v6


@@ -21,6 +21,8 @@ jobs:
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
with:
version: 10
- name: Setup Node.js
uses: actions/setup-node@v6


@@ -20,6 +20,8 @@ jobs:
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
with:
version: 10
- name: Use Node.js
uses: actions/setup-node@6044e13b5dc448c55e2357c09f80417699197238 # v6.2.0


@@ -21,6 +21,8 @@ jobs:
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
with:
version: 10
- name: Use Node.js
uses: actions/setup-node@6044e13b5dc448c55e2357c09f80417699197238 # v6.2.0
@@ -74,6 +76,8 @@ jobs:
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
with:
version: 10
- name: Use Node.js
uses: actions/setup-node@6044e13b5dc448c55e2357c09f80417699197238 # v6.2.0
@@ -95,7 +99,7 @@ jobs:
if npx license-checker-rseidelsohn@4 \
--production \
--summary \
--excludePackages '@comfyorg/comfyui-frontend;@comfyorg/design-system;@comfyorg/ingest-types;@comfyorg/registry-types;@comfyorg/shared-frontend-utils;@comfyorg/tailwind-utils;@comfyorg/comfyui-electron-types' \
--excludePackages '@comfyorg/comfyui-frontend;@comfyorg/design-system;@comfyorg/registry-types;@comfyorg/shared-frontend-utils;@comfyorg/tailwind-utils;@comfyorg/comfyui-electron-types' \
--clarificationsFile .github/license-clarifications.json \
--onlyAllow 'MIT;MIT*;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC;0BSD;BlueOak-1.0.0;Python-2.0;CC0-1.0;Unlicense;(MIT OR Apache-2.0);(MIT OR GPL-3.0);(Apache-2.0 OR MIT);(MPL-2.0 OR Apache-2.0);CC-BY-4.0;CC-BY-3.0;GPL-3.0-only'; then
echo ''


@@ -33,27 +33,13 @@ jobs:
path: dist/
retention-days: 1
# Build cloud distribution for @cloud tagged tests
# NX_SKIP_NX_CACHE=true is required because `nx build` was already run
# for the OSS distribution above. Without skipping cache, Nx returns
# the cached OSS build since env vars aren't part of the cache key.
- name: Build cloud frontend
run: NX_SKIP_NX_CACHE=true pnpm build:cloud
- name: Upload cloud frontend
uses: actions/upload-artifact@v6
with:
name: frontend-dist-cloud
path: dist/
retention-days: 1
# Sharded chromium tests
playwright-tests-chromium-sharded:
needs: setup
runs-on: ubuntu-latest
timeout-minutes: 60
container:
image: ghcr.io/comfy-org/comfyui-ci-container:0.0.16
image: ghcr.io/comfy-org/comfyui-ci-container:0.0.13
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
@@ -101,7 +87,7 @@ jobs:
needs: setup
runs-on: ubuntu-latest
container:
image: ghcr.io/comfy-org/comfyui-ci-container:0.0.16
image: ghcr.io/comfy-org/comfyui-ci-container:0.0.13
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
@@ -111,14 +97,14 @@ jobs:
strategy:
fail-fast: false
matrix:
browser: [chromium-2x, chromium-0.5x, mobile-chrome, cloud]
browser: [chromium-2x, chromium-0.5x, mobile-chrome]
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Download built frontend
uses: actions/download-artifact@v7
with:
name: ${{ matrix.browser == 'cloud' && 'frontend-dist-cloud' || 'frontend-dist' }}
name: frontend-dist
path: dist/
- name: Start ComfyUI server


@@ -1,4 +1,4 @@
# Description: Unit and component testing with Vitest + coverage reporting
# Description: Unit and component testing with Vitest
name: 'CI: Tests Unit'
on:
@@ -23,12 +23,5 @@ jobs:
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Run Vitest tests with coverage
run: pnpm test:coverage
- name: Upload coverage to Codecov
if: always()
uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
with:
files: coverage/lcov.info
fail_ci_if_error: false
- name: Run Vitest tests
run: pnpm test:unit


@@ -1,182 +0,0 @@
name: Hub CI
on:
push:
branches: [main]
paths:
- 'apps/hub/**'
- '.github/workflows/hub-ci.yaml'
pull_request:
branches: [main]
paths:
- 'apps/hub/**'
- '.github/workflows/hub-ci.yaml'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions:
contents: read
pull-requests: write
jobs:
lint:
name: Lint & Check
runs-on: ubuntu-latest
defaults:
run:
working-directory: apps/hub
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Astro Check
run: pnpm run check
- name: Unit Tests
run: pnpm test
- name: Validate Templates
run: pnpm run validate:templates
continue-on-error: true
build:
name: Build Hub
runs-on: ubuntu-latest
defaults:
run:
working-directory: apps/hub
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Build site
run: pnpm run build
env:
HUB_SKIP_SYNC: 'true'
SKIP_AI_GENERATION: 'true'
- name: Upload build artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: hub-build
path: apps/hub/dist
retention-days: 1
seo-audit:
name: SEO Audit
needs: build
runs-on: ubuntu-latest
defaults:
run:
working-directory: apps/hub
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Download build artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: hub-build
path: apps/hub/dist
- name: Validate sitemap
id: sitemap
continue-on-error: true
run: |
echo "## Sitemap Validation" >> $GITHUB_STEP_SUMMARY
if pnpm run validate:sitemap 2>&1 | tee sitemap-output.txt; then
echo "✅ Sitemap validation passed" >> $GITHUB_STEP_SUMMARY
echo "status=passed" >> $GITHUB_OUTPUT
else
echo "❌ Sitemap validation failed" >> $GITHUB_STEP_SUMMARY
echo "status=failed" >> $GITHUB_OUTPUT
fi
- name: Run SEO audit
id: seo
continue-on-error: true
run: |
echo "## SEO Audit" >> $GITHUB_STEP_SUMMARY
if pnpm run audit:seo 2>&1 | tee seo-output.txt; then
echo "✅ SEO audit passed" >> $GITHUB_STEP_SUMMARY
echo "status=passed" >> $GITHUB_OUTPUT
else
echo "⚠️ SEO audit found issues" >> $GITHUB_STEP_SUMMARY
echo "status=issues" >> $GITHUB_OUTPUT
fi
- name: Check internal links
id: links
continue-on-error: true
run: |
echo "## Link Check" >> $GITHUB_STEP_SUMMARY
DIST_DIR="dist"
if [ ! -d "$DIST_DIR" ]; then
echo "⚠️ No build output found at $DIST_DIR" >> $GITHUB_STEP_SUMMARY
echo "status=skipped" >> $GITHUB_OUTPUT
exit 0
fi
BROKEN_FILE="broken-links.txt"
: > "$BROKEN_FILE"
BROKEN_COUNT=0
TOTAL_COUNT=0
for htmlfile in $(find "$DIST_DIR" -name '*.html' \
-not -path "$DIST_DIR/ar/*" -not -path "$DIST_DIR/es/*" -not -path "$DIST_DIR/fr/*" \
-not -path "$DIST_DIR/ja/*" -not -path "$DIST_DIR/ko/*" -not -path "$DIST_DIR/pt-BR/*" \
-not -path "$DIST_DIR/ru/*" -not -path "$DIST_DIR/tr/*" -not -path "$DIST_DIR/zh/*" \
-not -path "$DIST_DIR/zh-TW/*" | head -500); do
hrefs=$(grep -oP 'href="(/[^"]*)"' "$htmlfile" | sed 's/href="//;s/"$//' || true)
for href in $hrefs; do
TOTAL_COUNT=$((TOTAL_COUNT + 1))
clean="${href%%#*}"
clean="${clean%%\?*}"
if [ -z "$clean" ] || [ "$clean" = "/" ]; then continue; fi
found=false
if [[ "$clean" =~ \.[a-zA-Z0-9]+$ ]]; then
[ -f "${DIST_DIR}${clean}" ] && found=true
else
base="${clean%/}"
[ -f "${DIST_DIR}${base}/index.html" ] && found=true
[ "$found" = false ] && [ -f "${DIST_DIR}${base}.html" ] && found=true
[ "$found" = false ] && [ -f "${DIST_DIR}${clean}" ] && found=true
[ "$found" = false ] && [ -d "${DIST_DIR}${base}" ] && found=true
fi
if [ "$found" = false ]; then
BROKEN_COUNT=$((BROKEN_COUNT + 1))
echo "- \`${href}\` (in ${htmlfile#${DIST_DIR}/})" >> "$BROKEN_FILE"
fi
done
done
if [ "$BROKEN_COUNT" -eq 0 ]; then
echo "✅ All internal links valid ($TOTAL_COUNT checked)" >> $GITHUB_STEP_SUMMARY
echo "status=passed" >> $GITHUB_OUTPUT
else
echo "❌ Found $BROKEN_COUNT broken internal links out of $TOTAL_COUNT" >> $GITHUB_STEP_SUMMARY
head -n 50 "$BROKEN_FILE" >> $GITHUB_STEP_SUMMARY
echo "status=failed" >> $GITHUB_OUTPUT
fi
- name: Upload SEO reports
if: always()
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: hub-seo-reports
path: |
apps/hub/seo-output.txt
apps/hub/seo-summary.json
apps/hub/broken-links.txt
if-no-files-found: ignore


@@ -1,68 +0,0 @@
name: Hub Cron Rebuild
on:
schedule:
# Every 15 minutes — rebuilds the site to pick up new UGC workflows
# for search index, sitemap, filter pages, and pre-rendered detail pages.
- cron: '*/15 * * * *'
workflow_dispatch:
concurrency:
group: hub-deploy-prod
cancel-in-progress: false
permissions:
contents: read
jobs:
rebuild:
runs-on: ubuntu-latest
env:
SKIP_AI_GENERATION: 'true'
PUBLIC_POSTHOG_KEY: ${{ secrets.HUB_POSTHOG_KEY }}
PUBLIC_GA_MEASUREMENT_ID: ${{ secrets.HUB_GA_MEASUREMENT_ID }}
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Checkout templates data
uses: actions/checkout@v6
with:
repository: Comfy-Org/workflow_templates
path: _workflow_templates
sparse-checkout: templates
token: ${{ secrets.GH_TOKEN }}
- name: Restore content cache
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: apps/hub/.content-cache
key: hub-content-cache-cron-prod-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
restore-keys: |
hub-content-cache-cron-prod-
- name: Sync templates
run: pnpm run sync
working-directory: apps/hub
env:
HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
- name: Build Astro site
run: pnpm run build
working-directory: apps/hub
env:
PUBLIC_HUB_API_URL: ${{ secrets.HUB_API_URL_PRODUCTION }}
PUBLIC_COMFY_CLOUD_URL: ${{ secrets.COMFY_CLOUD_URL_PRODUCTION }}
PUBLIC_APPROVED_ONLY: 'true'
- name: Deploy to Vercel
uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
with:
vercel-token: ${{ secrets.VERCEL_TOKEN }}
vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
working-directory: apps/hub
vercel-args: '--prebuilt --prod'


@@ -1,80 +0,0 @@
name: Deploy Hub
on:
workflow_dispatch:
inputs:
skip_ai:
description: 'Skip AI content generation'
type: boolean
default: false
force_regenerate:
description: 'Force regenerate all content (ignore cache)'
type: boolean
default: false
template_filter:
description: 'Regenerate specific template only (e.g. "flux_schnell")'
type: string
default: ''
concurrency:
group: hub-deploy-prod
cancel-in-progress: false
permissions:
contents: read
jobs:
build-deploy:
runs-on: ubuntu-latest
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
PUBLIC_POSTHOG_KEY: ${{ secrets.HUB_POSTHOG_KEY }}
PUBLIC_GA_MEASUREMENT_ID: ${{ secrets.HUB_GA_MEASUREMENT_ID }}
SKIP_AI_GENERATION: ${{ inputs.skip_ai && 'true' || '' }}
FORCE_AI_REGENERATE: ${{ inputs.force_regenerate && 'true' || '' }}
AI_TEMPLATE_FILTER: ${{ inputs.template_filter }}
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Checkout templates data
uses: actions/checkout@v6
with:
repository: Comfy-Org/workflow_templates
path: _workflow_templates
sparse-checkout: templates
token: ${{ secrets.GH_TOKEN }}
- name: Restore content cache
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: apps/hub/.content-cache
key: hub-content-cache-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
restore-keys: |
hub-content-cache-
- name: Sync templates
run: pnpm run sync
working-directory: apps/hub
env:
HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
- name: Build Astro site
run: pnpm run build
working-directory: apps/hub
env:
PUBLIC_HUB_API_URL: ${{ secrets.HUB_API_URL_PRODUCTION }}
PUBLIC_COMFY_CLOUD_URL: ${{ secrets.COMFY_CLOUD_URL_PRODUCTION }}
PUBLIC_APPROVED_ONLY: 'true'
- name: Deploy to Vercel
uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
with:
vercel-token: ${{ secrets.VERCEL_TOKEN }}
vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
working-directory: apps/hub
vercel-args: '--prebuilt --prod'


@@ -1,134 +0,0 @@
name: Hub Preview Cron
on:
  schedule:
    - cron: '*/15 * * * *'
  workflow_dispatch:
permissions:
  contents: read
  pull-requests: write
jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.targets.outputs.matrix }}
    steps:
      - uses: actions/checkout@v6
      - name: Build rebuild targets
        id: targets
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          targets='[]'
          # Main with production API (all workflows, no approved filter)
          targets=$(echo "$targets" | jq -c '. + [{"ref": "main", "is_main": true, "pr": 0, "api_env": "production"}]')
          # Main with test API
          targets=$(echo "$targets" | jq -c '. + [{"ref": "main", "is_main": true, "pr": 0, "api_env": "test"}]')
          # Find open PRs with the "preview-cron" label
          prs=$(gh pr list --label "preview-cron" --state open --json number,headRefName)
          for row in $(echo "$prs" | jq -c '.[]'); do
            ref=$(echo "$row" | jq -r '.headRefName')
            num=$(echo "$row" | jq -r '.number')
            targets=$(echo "$targets" | jq -c \
              --arg ref "$ref" --argjson num "$num" \
              '. + [{"ref": $ref, "is_main": false, "pr": $num, "api_env": "test"}]')
          done
          echo "matrix={\"include\":$targets}" >> "$GITHUB_OUTPUT"
          echo "### Rebuild targets" >> "$GITHUB_STEP_SUMMARY"
          echo "$targets" | jq '.' >> "$GITHUB_STEP_SUMMARY"
  rebuild:
    needs: discover
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.discover.outputs.matrix) }}
    concurrency:
      group: hub-preview-cron-${{ matrix.ref }}-${{ matrix.api_env }}
      cancel-in-progress: true
    env:
      SKIP_AI_GENERATION: 'true'
    steps:
      - name: Checkout
        uses: actions/checkout@v6
        with:
          ref: ${{ matrix.ref }}
      - name: Setup frontend
        uses: ./.github/actions/setup-frontend
      - name: Checkout templates data
        uses: actions/checkout@v6
        with:
          repository: Comfy-Org/workflow_templates
          path: _workflow_templates
          sparse-checkout: templates
          token: ${{ secrets.GH_TOKEN }}
      - name: Restore content cache
        uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
        with:
          path: apps/hub/.content-cache
          key: hub-content-cache-cron-${{ matrix.ref }}-${{ matrix.api_env }}-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
          restore-keys: |
            hub-content-cache-cron-${{ matrix.ref }}-${{ matrix.api_env }}-
      - name: Sync templates
        run: pnpm run sync:en-only
        working-directory: apps/hub
        env:
          HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
      - name: Build Astro site
        run: pnpm run build
        working-directory: apps/hub
        env:
          PUBLIC_HUB_API_URL: ${{ matrix.api_env == 'test' && secrets.HUB_API_URL_PREVIEW || secrets.HUB_API_URL_PRODUCTION }}
          PUBLIC_COMFY_CLOUD_URL: ${{ matrix.api_env == 'test' && secrets.COMFY_CLOUD_URL_PREVIEW || secrets.COMFY_CLOUD_URL_PRODUCTION }}
      - name: Deploy to Vercel
        id: deploy
        uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
          working-directory: apps/hub
          vercel-args: '--prebuilt'
      - name: Alias main preview (prod API)
        if: matrix.is_main && matrix.api_env == 'production' && secrets.HUB_PREVIEW_ALIAS
        env:
          PREVIEW_URL: ${{ steps.deploy.outputs.preview-url }}
          ALIAS: ${{ secrets.HUB_PREVIEW_ALIAS }}
          VERCEL_TOKEN_VAL: ${{ secrets.VERCEL_TOKEN }}
          VERCEL_SCOPE: ${{ secrets.VERCEL_ORG_ID }}
        run: |
          npx vercel alias "$PREVIEW_URL" "$ALIAS" --token="$VERCEL_TOKEN_VAL" --scope="$VERCEL_SCOPE"
      - name: Alias main preview (test API)
        if: matrix.is_main && matrix.api_env == 'test' && secrets.HUB_PREVIEW_TEST_ALIAS
        env:
          PREVIEW_URL: ${{ steps.deploy.outputs.preview-url }}
          ALIAS: ${{ secrets.HUB_PREVIEW_TEST_ALIAS }}
          VERCEL_TOKEN_VAL: ${{ secrets.VERCEL_TOKEN }}
          VERCEL_SCOPE: ${{ secrets.VERCEL_ORG_ID }}
        run: |
          npx vercel alias "$PREVIEW_URL" "$ALIAS" --token="$VERCEL_TOKEN_VAL" --scope="$VERCEL_SCOPE"
      - name: Comment preview URL on PR
        if: matrix.pr > 0
        uses: marocchino/sticky-pull-request-comment@773744901bac0e8cbb5a0dc842800d45e9b2b405 # v2.9.4
        with:
          number: ${{ matrix.pr }}
          header: hub-preview-cron
          message: |
            🔄 **Hub preview cron rebuilt:** ${{ steps.deploy.outputs.preview-url }}
            _Last rebuild: ${{ github.event.head_commit.timestamp || 'manual trigger' }}_
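The discover job assembles its matrix with `jq`, and the build step selects API URLs with the `cond && a || b` expression idiom. A minimal TypeScript model of both follows, assuming the `gh pr list --json number,headRefName` output shape used above; `buildTargets`, `matrixOutput`, and `pickUrl` are hypothetical helpers, not code from the repository.

```typescript
// Hypothetical model of the "Hub Preview Cron" discover step: two fixed
// targets for main (production and test API) plus one test-API target per
// open PR carrying the "preview-cron" label.
interface OpenPr {
  number: number
  headRefName: string
}

interface Target {
  ref: string
  is_main: boolean
  pr: number
  api_env: 'production' | 'test'
}

function buildTargets(prs: OpenPr[]): Target[] {
  const targets: Target[] = [
    { ref: 'main', is_main: true, pr: 0, api_env: 'production' },
    { ref: 'main', is_main: true, pr: 0, api_env: 'test' }
  ]
  for (const pr of prs) {
    targets.push({ ref: pr.headRefName, is_main: false, pr: pr.number, api_env: 'test' })
  }
  return targets
}

// Same compact JSON the step writes after "matrix=" in $GITHUB_OUTPUT
// (key order matches the jq objects, and both emit no extra whitespace).
function matrixOutput(prs: OpenPr[]): string {
  return JSON.stringify({ include: buildTargets(prs) })
}

// Mirrors `matrix.api_env == 'test' && PREVIEW || PRODUCTION`: the operators
// return operand values, and an empty string is falsy, so an empty preview
// value falls through to the production value.
function pickUrl(
  apiEnv: Target['api_env'],
  previewUrl: string,
  productionUrl: string
): string {
  return (apiEnv === 'test' && previewUrl) || productionUrl
}
```

The last case is the one worth noticing: an unset secret evaluates to an empty string in GitHub Actions expressions, so under this reading a `test` target whose preview secret is missing would quietly build against the production URL instead of failing.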

@@ -1,74 +0,0 @@
name: Hub Preview
on:
  pull_request:
    paths:
      - 'apps/hub/**'
  workflow_dispatch:
concurrency:
  group: hub-preview-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
permissions:
  contents: read
  pull-requests: write
jobs:
  preview:
    runs-on: ubuntu-latest
    env:
      SKIP_AI_GENERATION: 'true'
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Setup frontend
        uses: ./.github/actions/setup-frontend
      - name: Checkout templates data
        uses: actions/checkout@v6
        with:
          repository: Comfy-Org/workflow_templates
          path: _workflow_templates
          sparse-checkout: templates
          token: ${{ secrets.GH_TOKEN }}
      - name: Restore content cache
        uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
        with:
          path: apps/hub/.content-cache
          key: hub-content-cache-preview-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
          restore-keys: |
            hub-content-cache-preview-
      - name: Sync templates
        run: pnpm run sync:en-only
        working-directory: apps/hub
        env:
          HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
      - name: Build Astro site
        run: pnpm run build
        working-directory: apps/hub
        env:
          PUBLIC_HUB_API_URL: ${{ secrets.HUB_API_URL_PREVIEW }}
          PUBLIC_COMFY_CLOUD_URL: ${{ secrets.COMFY_CLOUD_URL_PREVIEW }}
      - name: Deploy preview to Vercel
        id: deploy
        uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
          working-directory: apps/hub
          vercel-args: '--prebuilt'
      - name: Comment preview URL
        if: github.event_name == 'pull_request'
        uses: marocchino/sticky-pull-request-comment@773744901bac0e8cbb5a0dc842800d45e9b2b405 # v2.9.4
        with:
          header: hub-vercel-preview
          message: |
            🚀 **Hub preview deployed:** ${{ steps.deploy.outputs.preview-url }}
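Both hub workflows restore `apps/hub/.content-cache` using a hashed primary `key` plus a prefix in `restore-keys`. The sketch below is a simplified model of that lookup as the actions/cache documentation describes it: an exact match on the primary key first, otherwise the most recently created cache whose key starts with a restore-key prefix. It deliberately ignores branch scoping and cache versioning, and `resolveCache` is a hypothetical name.

```typescript
// Simplified, hypothetical model of actions/cache key resolution.
interface CacheEntry {
  key: string
  createdAt: number // creation time in epoch milliseconds
}

function resolveCache(
  entries: CacheEntry[],
  key: string,
  restoreKeys: string[]
): CacheEntry | undefined {
  // 1. An exact match on the primary key wins outright.
  const exact = entries.find((e) => e.key === key)
  if (exact) return exact
  // 2. Otherwise try each restore key in order as a prefix and return the
  //    newest matching entry for the first prefix that matches anything.
  for (const prefix of restoreKeys) {
    const candidates = entries.filter((e) => e.key.startsWith(prefix))
    if (candidates.length > 0) {
      return candidates.reduce((a, b) => (b.createdAt > a.createdAt ? b : a))
    }
  }
  return undefined
}
```

In these workflows, editing any template or any file under `apps/hub/src` changes the `hashFiles(...)` suffix, so the exact key misses and the newest entry under the shared prefix is restored as a warm start.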

@@ -30,6 +30,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- name: Setup Node.js
  uses: actions/setup-node@v6

.github/workflows/pr-qa.yaml (1100 changed lines)

File diff suppressed because it is too large.

@@ -180,7 +180,7 @@ jobs:
if git ls-remote --exit-code origin perf-data >/dev/null 2>&1; then
  git fetch origin perf-data --depth=1
  mkdir -p temp/perf-history
  for file in $(git ls-tree --name-only origin/perf-data baselines/ 2>/dev/null | sort -r | head -15); do
  for file in $(git ls-tree --name-only origin/perf-data baselines/ 2>/dev/null | sort -r | head -10); do
    git show "origin/perf-data:${file}" > "temp/perf-history/$(basename "$file")" 2>/dev/null || true
  done
  echo "Loaded $(ls temp/perf-history/*.json 2>/dev/null | wc -l) historical baselines"
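The loop copies only the first N entries of a reverse-sorted listing of `baselines/` (N is the `head` count this hunk changes). A small TypeScript restatement of that selection, assuming the baseline file names sort chronologically as plain strings (for example, a date prefix); `latestBaselines` and `baseName` are hypothetical helpers.

```typescript
// Hypothetical helper mirroring `git ls-tree --name-only ... | sort -r | head -N`.
// JavaScript's default sort compares UTF-16 code units, which matches `sort`
// in the C locale for ASCII names but may differ under other locale collations.
function latestBaselines(paths: string[], n: number): string[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...paths].sort().reverse().slice(0, n)
}

// Equivalent of `basename "$file"` for the destination file name.
function baseName(p: string): string {
  return p.slice(p.lastIndexOf('/') + 1)
}
```

If the names did not sort chronologically, `sort -r | head` would keep an arbitrary subset rather than the newest baselines, so the naming convention is what makes this selection meaningful.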

@@ -77,7 +77,7 @@ jobs:
needs: setup
runs-on: ubuntu-latest
container:
  image: ghcr.io/comfy-org/comfyui-ci-container:0.0.16
  image: ghcr.io/comfy-org/comfyui-ci-container:0.0.13
  credentials:
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}

@@ -85,6 +85,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- name: Setup Node.js
  uses: actions/setup-node@v6

@@ -76,6 +76,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- name: Setup Node.js
  uses: actions/setup-node@v6
@@ -201,6 +203,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- uses: actions/setup-node@v6
  with:

@@ -20,10 +20,10 @@ jobs:
steps:
  - name: Checkout code
    uses: actions/checkout@v6
  - name: Install pnpm
    uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
    with:
      version: 10
  - uses: actions/setup-node@v6
    with:
      node-version-file: '.nvmrc'

@@ -76,6 +76,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- name: Setup Node.js
  uses: actions/setup-node@v6

@@ -16,10 +16,10 @@ jobs:
steps:
  - name: Checkout code
    uses: actions/checkout@v6
  - name: Install pnpm
    uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
    with:
      version: 10
  - uses: actions/setup-node@v6
    with:
      node-version-file: '.nvmrc'

@@ -144,6 +144,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- name: Setup Node.js
  uses: actions/setup-node@v6

@@ -52,6 +52,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- name: Setup Node.js
  uses: actions/setup-node@v6

@@ -30,6 +30,8 @@ jobs:
- name: Install pnpm
  uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
  with:
    version: 10
- name: Setup Node.js
  uses: actions/setup-node@v6

.gitignore (5 changed lines)

@@ -99,4 +99,7 @@ vitest.config.*.timestamp*
# Weekly docs check output
/output.txt
.amp
.amp
.playwright-cli/
.playwright/
.claude/scheduled_tasks.lock

@@ -9,6 +9,7 @@
"packages/registry-types/src/comfyRegistryTypes.ts",
"public/materialdesignicons.min.css",
"src/types/generatedManagerTypes.ts",
"**/__fixtures__/**/*.json"
"**/__fixtures__/**/*.json",
"scripts/qa-report-template.html"
]
}

@@ -12,8 +12,6 @@
"playwright-report/*",
"src/extensions/core/*",
"src/scripts/*",
"apps/hub/scripts/**/*",
"apps/hub/src/scripts/*",
"src/types/generatedManagerTypes.ts",
"src/types/vue-shim.d.ts",
"test-results/*",

@@ -208,7 +208,7 @@ See @docs/testing/\*.md for detailed patterns.
3. Keep your module mocks contained
   Do not use global mutable state within the test file
   Use `vi.hoisted()` if necessary to allow for per-test Arrange phase manipulation of deeper mock state
4. For Component testing, prefer [@testing-library/vue](https://testing-library.com/docs/vue-testing-library/intro/) with `@testing-library/user-event` for user-centric, behavioral tests. [Vue Test Utils](https://test-utils.vuejs.org/) is also accepted, especially for tests that need direct access to the component wrapper (e.g., `findComponent`, `emitted()`). Follow the advice [about making components easy to test](https://test-utils.vuejs.org/guide/essentials/easy-to-test.html)
4. For Component testing, use [Vue Test Utils](https://test-utils.vuejs.org/) and especially follow the advice [about making components easy to test](https://test-utils.vuejs.org/guide/essentials/easy-to-test.html)
5. Aim for behavioral coverage of critical and new features
### Playwright / Browser / E2E Tests
@@ -216,7 +216,6 @@ See @docs/testing/\*.md for detailed patterns.
1. Follow the Best Practices described [in the Playwright documentation](https://playwright.dev/docs/best-practices)
2. Do not use waitForTimeout, use Locator actions and [retrying assertions](https://playwright.dev/docs/test-assertions#auto-retrying-assertions)
3. Tags like `@mobile`, `@2x` are respected by config and should be used for relevant tests
4. Type all API mock responses in `route.fulfill()` using generated types or schemas from `packages/ingest-types`, `packages/registry-types`, `src/workbench/extensions/manager/types/generatedManagerTypes.ts`, or `src/schemas/` — see `docs/guidance/playwright.md` for the full source-of-truth table
## External Resources
@@ -232,18 +231,6 @@ See @docs/testing/\*.md for detailed patterns.
- Nx: <https://nx.dev/docs/reference/nx-commands>
- [Practical Test Pyramid](https://martinfowler.com/articles/practical-test-pyramid.html)
## Architecture Decision Records
All architectural decisions are documented in `docs/adr/`. Code changes must be consistent with accepted ADRs. Proposed ADRs indicate design direction and should be treated as guidance. See `.agents/checks/adr-compliance.md` for automated validation rules.
### Entity Architecture Constraints (ADR 0003 + ADR 0008)
1. **Command pattern for all mutations**: Every entity state change must be a serializable, idempotent, deterministic command — replayable, undoable, and transmittable over CRDT. No imperative fire-and-forget mutation APIs. Systems produce command batches, not direct side effects.
2. **Centralized registries and ECS-style access**: Entity data lives in the World (centralized registry), queried via `world.getComponent(entityId, ComponentType)`. Do not add new instance properties/methods to entity classes. Do not use OOP inheritance for entity modeling.
3. **No god-object growth**: Do not add methods to `LGraphNode`, `LGraphCanvas`, `LGraph`, or `Subgraph`. Extract to systems, stores, or composables.
4. **Plain data components**: ECS components are plain data objects — no methods, no back-references to parent entities. Behavior belongs in systems (pure functions).
5. **Extension ecosystem impact**: Changes to entity callbacks (`onConnectionsChange`, `onRemoved`, `onAdded`, `onConnectInput/Output`, `onConfigure`, `onWidgetChanged`), `node.widgets` access, `node.serialize`, or `graph._version++` affect 40+ custom node repos and require migration guidance.
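A TypeScript sketch of how these constraints fit together: component data held in a central `World` and read via `getComponent(entityId, ComponentType)`, plus a system that returns a batch of plain-data commands instead of mutating state. Every name here (`World`, `Position`, `SetPositionCommand`, `applyCommand`, `snapToGrid`) is a hypothetical illustration, not the API defined in the ADRs or the codebase.

```typescript
// Hypothetical illustration of the ECS + command constraints.
type EntityId = number

// A component type is a runtime key plus a compile-time payload shape.
interface ComponentType<T> {
  readonly name: string
  readonly _shape?: T // phantom field, used only for type inference
}

const Position: ComponentType<{ x: number; y: number }> = { name: 'Position' }

// Centralized registry: component data lives here, not on entity instances.
class World {
  private readonly stores = new Map<string, Map<EntityId, unknown>>()

  getComponent<T>(id: EntityId, type: ComponentType<T>): T | undefined {
    return this.stores.get(type.name)?.get(id) as T | undefined
  }

  setComponent<T>(id: EntityId, type: ComponentType<T>, value: T): void {
    let store = this.stores.get(type.name)
    if (!store) {
      store = new Map<EntityId, unknown>()
      this.stores.set(type.name, store)
    }
    store.set(id, value)
  }
}

// Serializable command: plain data only, so it can be logged, replayed, or sent.
interface SetPositionCommand {
  type: 'SetPosition'
  entityId: EntityId
  x: number
  y: number
}

// Idempotent: applying the same command twice leaves the same state as once.
function applyCommand(world: World, cmd: SetPositionCommand): void {
  world.setComponent(cmd.entityId, Position, { x: cmd.x, y: cmd.y })
}

// A "system" as a pure function: it reads state and returns a command batch
// rather than writing to the world itself.
function snapToGrid(world: World, ids: EntityId[], grid: number): SetPositionCommand[] {
  const batch: SetPositionCommand[] = []
  for (const id of ids) {
    const pos = world.getComponent(id, Position)
    if (!pos) continue
    batch.push({
      type: 'SetPosition',
      entityId: id,
      x: Math.round(pos.x / grid) * grid,
      y: Math.round(pos.y / grid) * grid
    })
  }
  return batch
}
```

Because a command is plain JSON, it survives a serialize/parse round trip unchanged; undo support would additionally need the previous value, which this sketch omits.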
## Project Philosophy
- Follow good software engineering principles

@@ -41,49 +41,12 @@
/src/components/templates/ @Myestery @christian-byrne @comfyui-wiki
# Mask Editor
/src/extensions/core/maskeditor.ts @trsommer @brucew4yn3rp @jtydhr88
/src/extensions/core/maskEditorLayerFilenames.ts @trsommer @brucew4yn3rp @jtydhr88
/src/components/maskeditor/ @trsommer @brucew4yn3rp @jtydhr88
/src/composables/maskeditor/ @trsommer @brucew4yn3rp @jtydhr88
/src/stores/maskEditorStore.ts @trsommer @brucew4yn3rp @jtydhr88
/src/stores/maskEditorDataStore.ts @trsommer @brucew4yn3rp @jtydhr88
# Image Crop
/src/extensions/core/imageCrop.ts @jtydhr88
/src/components/imagecrop/ @jtydhr88
/src/composables/useImageCrop.ts @jtydhr88
/src/lib/litegraph/src/widgets/ImageCropWidget.ts @jtydhr88
# Image Compare
/src/extensions/core/imageCompare.ts @jtydhr88
/src/renderer/extensions/vueNodes/widgets/components/WidgetImageCompare.vue @jtydhr88
/src/renderer/extensions/vueNodes/widgets/components/WidgetImageCompare.test.ts @jtydhr88
/src/renderer/extensions/vueNodes/widgets/components/WidgetImageCompare.stories.ts @jtydhr88
/src/renderer/extensions/vueNodes/widgets/composables/useImageCompareWidget.ts @jtydhr88
/src/lib/litegraph/src/widgets/ImageCompareWidget.ts @jtydhr88
# Painter
/src/extensions/core/painter.ts @jtydhr88
/src/components/painter/ @jtydhr88
/src/composables/painter/ @jtydhr88
/src/renderer/extensions/vueNodes/widgets/composables/usePainterWidget.ts @jtydhr88
/src/lib/litegraph/src/widgets/PainterWidget.ts @jtydhr88
# GLSL
/src/renderer/glsl/ @jtydhr88 @pythongosssss @christian-byrne
/src/extensions/core/maskeditor.ts @trsommer @brucew4yn3rp
/src/extensions/core/maskEditorLayerFilenames.ts @trsommer @brucew4yn3rp
# 3D
/src/extensions/core/load3d.ts @jtydhr88
/src/extensions/core/load3dLazy.ts @jtydhr88
/src/extensions/core/load3d/ @jtydhr88
/src/components/load3d/ @jtydhr88
/src/composables/useLoad3d.ts @jtydhr88
/src/composables/useLoad3d.test.ts @jtydhr88
/src/composables/useLoad3dDrag.ts @jtydhr88
/src/composables/useLoad3dDrag.test.ts @jtydhr88
/src/composables/useLoad3dViewer.ts @jtydhr88
/src/composables/useLoad3dViewer.test.ts @jtydhr88
/src/services/load3dService.ts @jtydhr88
# Manager
/src/workbench/extensions/manager/ @viva-jinyi @christian-byrne @ltdrdata
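GitHub resolves CODEOWNERS by taking the last pattern in the file that matches a path, so line order matters when entries overlap. A minimal sketch of that precedence rule, assuming only the anchored file and directory patterns seen in this hunk; `parseCodeowners` and `ownersFor` are hypothetical, and real CODEOWNERS matching also supports gitignore-style globs and unanchored names that this sketch does not implement.

```typescript
interface Rule {
  pattern: string
  owners: string[]
}

// Parse "pattern @owner1 @owner2" lines, skipping blanks and "#" comment lines.
function parseCodeowners(text: string): Rule[] {
  const rules: Rule[] = []
  for (const raw of text.split('\n')) {
    const line = raw.trim()
    if (line === '' || line.startsWith('#')) continue
    const parts = line.split(/\s+/)
    rules.push({ pattern: parts[0] ?? '', owners: parts.slice(1) })
  }
  return rules
}

// Anchored patterns only: "/dir/" matches everything under dir,
// "/path/file.ts" matches that exact file.
function matches(pattern: string, filePath: string): boolean {
  const normalized = '/' + filePath.replace(/^\/+/, '')
  return pattern.endsWith('/')
    ? normalized.startsWith(pattern)
    : normalized === pattern
}

// The last matching rule wins, mirroring GitHub's documented precedence.
function ownersFor(rules: Rule[], filePath: string): string[] {
  let owners: string[] = []
  for (const rule of rules) {
    if (matches(rule.pattern, filePath)) owners = rule.owners
  }
  return owners
}
```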

@@ -1,6 +1,6 @@
import tailwindcss from '@tailwindcss/vite'
import vue from '@vitejs/plugin-vue'
import { config as dotenvConfig } from 'dotenv'
import dotenv from 'dotenv'
import path from 'node:path'
import { fileURLToPath } from 'node:url'
import { FileSystemIconLoader } from 'unplugin-icons/loaders'
@@ -11,7 +11,7 @@ import { defineConfig } from 'vite'
import { createHtmlPlugin } from 'vite-plugin-html'
import vueDevTools from 'vite-plugin-vue-devtools'
dotenvConfig()
dotenv.config()
const projectRoot = fileURLToPath(new URL('.', import.meta.url))

apps/hub/.gitignore (9 changed lines)

@@ -1,9 +0,0 @@
dist/
.astro/
.content-cache/
src/content/templates/
public/workflows/thumbnails/
public/workflows/avatars/
public/previews/
public/search-index.json
knowledge/tutorials/

@@ -1,254 +0,0 @@
import { defineConfig } from 'astro/config'
import sitemap from '@astrojs/sitemap'
import vercel from '@astrojs/vercel'
import tailwindcss from '@tailwindcss/vite'
import fs from 'node:fs'
import path from 'node:path'
import os from 'node:os'
import vue from '@astrojs/vue'
// Build template date lookup at config time
const templatesDir = path.join(process.cwd(), 'src/content/templates')
const templateDates = new Map()
if (fs.existsSync(templatesDir)) {
  const files = fs.readdirSync(templatesDir).filter((f) => f.endsWith('.json'))
  for (const file of files) {
    try {
      const content = JSON.parse(
        fs.readFileSync(path.join(templatesDir, file), 'utf-8')
      )
      if (content.name && content.date) {
        templateDates.set(content.name, content.date)
      }
    } catch {
      // Skip invalid JSON files
    }
  }
}
// Build timestamp used as lastmod fallback for pages without a specific date
const buildDate = new Date().toISOString()
// Supported locales (matches src/i18n/config.ts)
const locales = [
  'en',
  'zh',
  'zh-TW',
  'ja',
  'ko',
  'es',
  'fr',
  'ru',
  'tr',
  'ar',
  'pt-BR'
]
const nonDefaultLocales = locales.filter((l) => l !== 'en')
// Custom sitemap pages for ISR routes not discovered at build time
const siteOrigin = (
  process.env.PUBLIC_SITE_ORIGIN || 'https://www.comfy.org'
).replace(/\/$/, '')
// Creator profile pages — extract unique usernames from synced templates
const creatorUsernames = new Set()
if (fs.existsSync(templatesDir)) {
  const files = fs.readdirSync(templatesDir).filter((f) => f.endsWith('.json'))
  for (const file of files) {
    try {
      const content = JSON.parse(
        fs.readFileSync(path.join(templatesDir, file), 'utf-8')
      )
      if (content.username) creatorUsernames.add(content.username)
    } catch {
      // Skip invalid JSON
    }
  }
}
const creatorPages = [...creatorUsernames].map(
  (u) => `${siteOrigin}/workflows/${u}/`
)
const localeCustomPages = nonDefaultLocales.map(
  (locale) => `${siteOrigin}/${locale}/workflows/`
)
const customPages = [...creatorPages, ...localeCustomPages]
// https://astro.build/config
export default defineConfig({
  site: (process.env.PUBLIC_SITE_ORIGIN || 'https://www.comfy.org').replace(
    /\/$/,
    ''
  ),
  prefetch: {
    prefetchAll: false,
    defaultStrategy: 'hover'
  },
  i18n: {
    defaultLocale: 'en',
    locales: locales,
    routing: {
      prefixDefaultLocale: false // English at root, others prefixed (/zh/, /ja/, etc.)
    }
  },
  integrations: [
    sitemap({
      // Use custom filename to avoid collision with Framer's /sitemap.xml
      filenameBase: 'sitemap-workflows',
      // Include Framer's marketing sitemap in the index
      customSitemaps: ['https://www.comfy.org/sitemap.xml'],
      // Include on-demand locale pages that aren't discovered at build time
      customPages: customPages,
      serialize(item) {
        const url = new URL(item.url)
        const pathname = url.pathname
        // Template detail pages: /workflows/{slug}/ or /{locale}/workflows/{slug}/
        const templateMatch = pathname.match(
          /^(?:\/([a-z]{2}(?:-[A-Z]{2})?))?\/workflows\/([^/]+)\/?$/
        )
        if (templateMatch) {
          const slug = templateMatch[2]
          const date = templateDates.get(slug)
          item.lastmod = date ? new Date(date).toISOString() : buildDate
          // @ts-expect-error - sitemap types are stricter than actual API
          item.changefreq = 'monthly'
          item.priority = 0.8
          return item
        }
        // Homepage
        if (pathname === '/' || pathname === '') {
          item.lastmod = buildDate
          // @ts-expect-error - sitemap types are stricter than actual API
          item.changefreq = 'daily'
          item.priority = 1.0
          return item
        }
        // Workflows index (including localized versions)
        if (pathname.match(/^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/?$/)) {
          item.lastmod = buildDate
          // @ts-expect-error - sitemap types are stricter than actual API
          item.changefreq = 'daily'
          item.priority = 0.9
          return item
        }
        // Category pages: /workflows/category/{type}/ or /{locale}/workflows/category/{type}/
        if (
          pathname.match(
            /^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/category\//
          )
        ) {
          // @ts-expect-error - sitemap types are stricter than actual API
          item.changefreq = 'weekly'
          item.priority = 0.7
          return item
        }
        // Model pages: /workflows/model/{model}/ or /{locale}/workflows/model/{model}/
        if (
          pathname.match(/^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/model\//)
        ) {
          // @ts-expect-error - sitemap types are stricter than actual API
          item.changefreq = 'weekly'
          item.priority = 0.6
          return item
        }
        // Tag pages: /workflows/tag/{tag}/ or /{locale}/workflows/tag/{tag}/
        if (
          pathname.match(/^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/tag\//)
        ) {
          // @ts-expect-error - sitemap types are stricter than actual API
          item.changefreq = 'weekly'
          item.priority = 0.6
          return item
        }
        // Default for other pages
        // @ts-expect-error - sitemap types are stricter than actual API
        item.changefreq = 'weekly'
        item.priority = 0.5
        return item
      },
      // Exclude OG image routes and legacy redirect pages from sitemap.
      // Legacy redirects are /workflows/{slug}/ without a 12-char hex share_id suffix.
      // Canonical detail pages are /workflows/{slug}-{shareId}/ (shareId = 12 hex chars).
      filter: (page) => {
        if (
          page.includes('/workflows/og/') ||
          page.includes('/workflows/og.png')
        )
          return false
        // Check if this is a workflow detail path (not category/tag/model/creators)
        const match = page.match(/\/workflows\/([^/]+)\/$/)
        if (match) {
          const segment = match[1]
          // Skip known sub-paths
          if (
            ['category', 'tag', 'model', 'creators'].some((p) =>
              page.includes(`/workflows/${p}/`)
            )
          )
            return true
          // Include if it has a share_id suffix (12 hex chars after last hyphen)
          const lastHyphen = segment.lastIndexOf('-')
          if (lastHyphen === -1) return false // No hyphen = legacy redirect
          const candidate = segment.slice(lastHyphen + 1)
          if (candidate.length === 12 && /^[0-9a-f]+$/.test(candidate))
            return true
          return false // Has hyphen but not a valid share_id = legacy redirect
        }
        return true
      }
    }),
    vue()
  ],
  output: 'static',
  adapter: vercel({
    webAnalytics: { enabled: true },
    skewProtection: true
  }),
  // Build performance optimizations
  build: {
    // Increase concurrency for faster builds on multi-core systems
    concurrency: Math.max(1, os.cpus().length),
    // Inline small stylesheets automatically
    inlineStylesheets: 'auto'
  },
  // HTML compression
  compressHTML: true,
  // Image optimization settings
  image: {
    service: {
      entrypoint: 'astro/assets/services/sharp',
      config: {
        // Limit input pixels to prevent memory issues with large images
        limitInputPixels: 268402689 // ~16384x16384
      }
    }
  },
  // Responsive images for automatic srcset generation (now stable in Astro 5)
  // Note: responsiveImages was moved from experimental to stable in Astro 5.x
  vite: {
    plugins: [tailwindcss()],
    build: {
      chunkSizeWarningLimit: 1000
    },
    optimizeDeps: {
      include: ['web-vitals']
    },
    css: {
      devSourcemap: false
    }
  }
})
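The sitemap `filter` callback above decides which URLs reach the sitemap. Pulled out as a standalone TypeScript function (a restatement for testing under the hypothetical name `includeInSitemap`, not an export of the config), its behaviour on sample URLs is easy to check.

```typescript
// Sub-paths that are listing pages, not workflow detail pages.
const SUB_PATHS = ['category', 'tag', 'model', 'creators']

// Hypothetical standalone restatement of the sitemap `filter` callback.
function includeInSitemap(page: string): boolean {
  // Drop OG image routes.
  if (page.includes('/workflows/og/') || page.includes('/workflows/og.png')) {
    return false
  }
  // Only URLs ending in /workflows/{segment}/ need the share_id check.
  const match = page.match(/\/workflows\/([^/]+)\/$/)
  if (!match) return true
  if (SUB_PATHS.some((p) => page.includes(`/workflows/${p}/`))) return true
  const segment = match[1] ?? ''
  // Canonical pages end in "-" plus a 12-character lowercase hex share_id.
  const lastHyphen = segment.lastIndexOf('-')
  if (lastHyphen === -1) return false
  const candidate = segment.slice(lastHyphen + 1)
  return candidate.length === 12 && /^[0-9a-f]+$/.test(candidate)
}
```

One consequence worth checking: a creator profile URL such as `/workflows/alice/` has no share_id suffix, so this logic returns `false` for it. Whether that drops the creator pages added through `customPages` depends on whether `@astrojs/sitemap` applies `filter` to custom pages too, which this sketch does not verify.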

@@ -1,22 +0,0 @@
# 3D Generation
3D generation creates three-dimensional models — meshes, point clouds, or multi-view images — from text or image inputs. This enables rapid prototyping of 3D assets without manual modeling. In ComfyUI, several approaches exist: image-to-3D (lifting a single photo into a mesh), text-to-3D (generating a 3D object from a description), and multi-view generation (producing consistent views of an object that can be reconstructed into 3D).
## How It Works in ComfyUI
- Key nodes involved: Model-specific loaders (`TripoSR`, `InstantMesh`, `StableZero123`), `LoadImage`, `Save3D` / `Preview3D`, `CRM` nodes
- Typical workflow pattern: Load image → Load 3D model → Run inference → Preview 3D result → Export mesh
## Key Settings
- **Inference steps**: Number of denoising/reconstruction steps. More steps generally improve quality but increase generation time.
- **Elevation angle**: Camera elevation for multi-view generation, controlling the vertical viewing angle of the generated views.
- **Guidance scale**: How closely the model follows the input image or text. Higher values increase fidelity to the input but may reduce diversity.
- **Output format**: Export format for the 3D mesh — OBJ, GLB, and PLY are common options, each suited to different downstream tools.
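For diffusion-based methods that use classifier-free guidance (Zero123-style multi-view models, for instance), the guidance scale $s$ typically combines a conditional and an unconditional noise prediction at every denoising step. This is the general classifier-free guidance formula, not something read from a specific ComfyUI node:

```math
\hat{\epsilon}_t = \epsilon_\theta(x_t, \varnothing) + s \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)
```

Here $c$ is the conditioning (for example the input image and target camera pose, or a text prompt) and $\varnothing$ is the empty condition. With $s = 1$ the result is exactly the conditional prediction; larger $s$ extrapolates further toward the condition, which is why higher values increase fidelity to the input at the cost of diversity.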
## Tips
- Clean single-object images on white or simple backgrounds work best for image-to-3D conversion.
- Multi-view approaches (like Zero123) often produce better geometry than single-view methods.
- Post-process generated meshes in Blender for cleanup, retopology, or texturing before production use.
- Start with TripoSR for quick results — it generates meshes in seconds and is a good baseline to compare against other methods.

@@ -1,374 +0,0 @@
{
"text-to-image": [
"01_get_started_text_to_image",
"api_bfl_flux2_max_sofa_swap",
"api_bfl_flux_1_kontext_max_image",
"api_bfl_flux_1_kontext_multiple_images_input",
"api_bfl_flux_1_kontext_pro_image",
"api_bfl_flux_pro_t2i",
"api_bytedance_seedream4",
"api_flux2",
"api_from_photo_2_miniature",
"api_google_gemini_image",
"api_grok_text_to_image",
"api_ideogram_v3_t2i",
"api_kling_omni_image",
"api_luma_photon_i2i",
"api_luma_photon_style_ref",
"api_nano_banana_pro",
"api_openai_dall_e_2_inpaint",
"api_openai_dall_e_2_t2i",
"api_openai_dall_e_3_t2i",
"api_openai_fashion_billboard_generator",
"api_openai_image_1_i2i",
"api_openai_image_1_inpaint",
"api_openai_image_1_multi_inputs",
"api_openai_image_1_t2i",
"api_recraft_image_gen_with_color_control",
"api_recraft_image_gen_with_style_control",
"api_recraft_style_reference",
"api_recraft_vector_gen",
"api_runway_reference_to_image",
"api_runway_text_to_image",
"api_stability_ai_i2i",
"api_stability_ai_sd3.5_i2i",
"api_stability_ai_sd3.5_t2i",
"api_stability_ai_stable_image_ultra_t2i",
"api_wan_text_to_image",
"default",
"flux1_dev_uso_reference_image_gen",
"flux1_krea_dev",
"flux_canny_model_example",
"flux_depth_lora_example",
"flux_dev_checkpoint_example",
"flux_dev_full_text_to_image",
"flux_fill_inpaint_example",
"flux_fill_outpaint_example",
"flux_redux_model_example",
"flux_schnell",
"flux_schnell_full_text_to_image",
"hidream_e1_1",
"hidream_e1_full",
"hidream_i1_dev",
"hidream_i1_fast",
"hidream_i1_full",
"image-qwen_image_edit_2511_lora_inflation",
"image_anima_preview",
"image_chroma1_radiance_text_to_image",
"image_chroma_text_to_image",
"image_flux2",
"image_flux2_fp8",
"image_flux2_klein_image_edit_4b_base",
"image_flux2_klein_image_edit_4b_distilled",
"image_flux2_klein_image_edit_9b_base",
"image_flux2_klein_image_edit_9b_distilled",
"image_flux2_klein_text_to_image",
"image_flux2_text_to_image",
"image_flux2_text_to_image_9b",
"image_kandinsky5_t2i",
"image_lotus_depth_v1_1",
"image_netayume_lumina_t2i",
"image_newbieimage_exp0_1-t2i",
"image_omnigen2_image_edit",
"image_omnigen2_t2i",
"image_ovis_text_to_image",
"image_qwen_Image_2512",
"image_qwen_image",
"image_qwen_image_2512_with_2steps_lora",
"image_qwen_image_controlnet_patch",
"image_qwen_image_instantx_controlnet",
"image_qwen_image_instantx_inpainting_controlnet",
"image_qwen_image_union_control_lora",
"image_z_image",
"image_z_image_turbo",
"image_z_image_turbo_fun_union_controlnet",
"sd3.5_large_blur",
"sd3.5_large_canny_controlnet_example",
"sd3.5_large_depth",
"sd3.5_simple_example",
"sdxl_refiner_prompt_example",
"sdxl_revision_text_prompts",
"sdxl_simple_example",
"sdxlturbo_example",
"templates-9grid_social_media-v2.0"
],
"img2img": [
"02_qwen_Image_edit_subgraphed",
"api_luma_photon_i2i",
"api_meshy_multi_image_to_model",
"api_openai_image_1_i2i",
"api_runway_reference_to_image",
"api_stability_ai_i2i",
"api_stability_ai_sd3.5_i2i",
"flux1_dev_uso_reference_image_gen",
"flux_canny_model_example",
"flux_depth_lora_example",
"flux_fill_inpaint_example",
"flux_fill_outpaint_example",
"flux_kontext_dev_basic",
"flux_redux_model_example",
"image_chrono_edit_14B",
"image_qwen_image_edit",
"image_qwen_image_edit_2509",
"image_qwen_image_instantx_controlnet",
"image_qwen_image_instantx_inpainting_controlnet",
"sd3.5_large_blur",
"sd3.5_large_canny_controlnet_example",
"sd3.5_large_depth"
],
"inpainting": [
"api_openai_dall_e_2_inpaint",
"api_openai_image_1_inpaint",
"api_stability_ai_audio_inpaint",
"flux_fill_inpaint_example",
"flux_fill_outpaint_example",
"image_flux.1_fill_dev_OneReward",
"image_qwen_image_instantx_inpainting_controlnet",
"video_wan2_2_14B_fun_inpaint",
"video_wan2_2_5B_fun_inpaint",
"video_wan_vace_inpainting",
"wan2.1_fun_inp"
],
"outpainting": [
"api_bria_image_outpainting",
"flux_fill_outpaint_example",
"image_flux.1_fill_dev_OneReward",
"video_wan_vace_outpainting"
],
"controlnet": [
"02_qwen_Image_edit_subgraphed",
"flux_canny_model_example",
"flux_depth_lora_example",
"flux_redux_model_example",
"image_lotus_depth_v1_1",
"image_qwen_image_controlnet_patch",
"image_qwen_image_edit_2509",
"image_qwen_image_instantx_controlnet",
"image_qwen_image_instantx_inpainting_controlnet",
"image_qwen_image_union_control_lora",
"image_z_image_turbo_fun_union_controlnet",
"sd3.5_large_canny_controlnet_example",
"sd3.5_large_depth",
"utility-depthAnything-v2-relative-video",
"utility-frame_interpolation-film",
"utility-lineart-video",
"utility-normal_crafter-video",
"utility-openpose-video",
"video_ltx2_canny_to_video",
"video_ltx2_depth_to_video",
"video_ltx2_pose_to_video",
"wan2.1_fun_control"
],
"upscaling": [
"api_topaz_image_enhance",
"api_topaz_video_enhance",
"api_wavespeed_flshvsr_video_upscale",
"api_wavespped_image_upscale",
"api_wavespped_seedvr2_ai_image_fix",
"ultility_hitpaw_general_image_enhance",
"ultility_hitpaw_video_enhance",
"utility-gan_upscaler",
"utility-topaz_landscape_upscaler",
"utility_interpolation_image_upscale",
"utility_nanobanana_pro_ai_image_fix",
"utility_nanobanana_pro_illustration_upscale",
"utility_nanobanana_pro_product_upscale",
"utility_recraft_creative_image_upscale",
"utility_recraft_crisp_image_upscale",
"utility_seedvr2_image_upscale",
"utility_seedvr2_video_upscale",
"utility_topaz_illustration_upscale",
"utility_video_upscale"
],
"video-generation": [
"03_video_wan2_2_14B_i2v_subgraphed",
"api_bytedace_seedance1_5_flf2v",
"api_bytedace_seedance1_5_image_to_video",
"api_bytedace_seedance1_5_text_to_video",
"api_bytedance_flf2v",
"api_bytedance_image_to_video",
"api_bytedance_text_to_video",
"api_grok_video",
"api_grok_video_edit",
"api_hailuo_minimax_i2v",
"api_hailuo_minimax_t2v",
"api_hailuo_minimax_video",
"api_kling2_6_i2v",
"api_kling2_6_t2v",
"api_kling_effects",
"api_kling_flf",
"api_kling_i2v",
"api_kling_motion_control",
"api_kling_omni_edit_video",
"api_kling_omni_i2v",
"api_kling_omni_t2v",
"api_kling_omni_v2v",
"api_ltxv_image_to_video",
"api_ltxv_text_to_video",
"api_luma_i2v",
"api_luma_t2v",
"api_moonvalley_image_to_video",
"api_moonvalley_text_to_video",
"api_moonvalley_video_to_video_motion_transfer",
"api_moonvalley_video_to_video_pose_control",
"api_openai_sora_video",
"api_pixverse_i2v",
"api_pixverse_t2v",
"api_pixverse_template_i2v",
"api_runway_first_last_frame",
"api_runway_gen3a_turbo_image_to_video",
"api_runway_gen4_turo_image_to_video",
"api_topaz_video_enhance",
"api_veo2_i2v",
"api_veo3",
"api_vidu_image_to_video",
"api_vidu_q2_flf2v",
"api_vidu_q2_i2v",
"api_vidu_q2_r2v",
"api_vidu_q2_t2v",
"api_vidu_q3_image_to_video",
"api_vidu_q3_text_to_video",
"api_vidu_reference_to_video",
"api_vidu_start_end_to_video",
"api_vidu_text_to_video",
"api_vidu_video_extension",
"api_wan2_6_i2v",
"api_wan2_6_t2v",
"api_wan_image_to_video",
"api_wan_r2v",
"api_wan_text_to_video",
"api_wavespeed_flshvsr_video_upscale",
"gsc_starter_2",
"hunyuan_video_text_to_video",
"image_to_video_wan",
"ltxv_image_to_video",
"ltxv_text_to_video",
"template-Animation_Trajectory_Control_Wan_ATI",
"templates-3D_logo_texture_animation",
"templates-6-key-frames",
"templates-car_product",
"templates-photo_to_product_vid",
"templates-sprite_sheet",
"templates-stitched_vid_contact_sheet",
"templates-textured_logo_elements",
"templates-textured_logotype-v2.1",
"text_to_video_wan",
"txt_to_image_to_video",
"ultility_hitpaw_video_enhance",
"utility-depthAnything-v2-relative-video",
"utility-frame_interpolation-film",
"utility-gan_upscaler",
"utility-lineart-video",
"utility-normal_crafter-video",
"utility-openpose-video",
"utility_seedvr2_video_upscale",
"utility_video_upscale",
"video-wan21_scail",
"video_humo",
"video_hunyuan_video_1.5_720p_i2v",
"video_hunyuan_video_1.5_720p_t2v",
"video_kandinsky5_i2v",
"video_kandinsky5_t2v",
"video_ltx2_canny_to_video",
"video_ltx2_depth_to_video",
"video_ltx2_i2v",
"video_ltx2_i2v_distilled",
"video_ltx2_pose_to_video",
"video_ltx2_t2v",
"video_ltx2_t2v_distilled",
"video_wan2.1_alpha_t2v_14B",
"video_wan2.1_fun_camera_v1.1_1.3B",
"video_wan2.1_fun_camera_v1.1_14B",
"video_wan2_1_infinitetalk",
"video_wan2_2_14B_animate",
"video_wan2_2_14B_flf2v",
"video_wan2_2_14B_fun_camera",
"video_wan2_2_14B_fun_control",
"video_wan2_2_14B_fun_inpaint",
"video_wan2_2_14B_i2v",
"video_wan2_2_14B_s2v",
"video_wan2_2_14B_t2v",
"video_wan2_2_5B_fun_control",
"video_wan2_2_5B_fun_inpaint",
"video_wan2_2_5B_ti2v",
"video_wan_ati",
"video_wan_vace_14B_ref2v",
"video_wan_vace_14B_t2v",
"video_wan_vace_14B_v2v",
"video_wan_vace_flf2v",
"video_wan_vace_inpainting",
"video_wan_vace_outpainting",
"video_wanmove_480p",
"video_wanmove_480p_hallucination",
"wan2.1_flf2v_720_f16",
"wan2.1_fun_control",
"wan2.1_fun_inp"
],
"audio-generation": [
"05_audio_ace_step_1_t2a_song_subgraphed",
"api_kling2_6_i2v",
"api_kling2_6_t2v",
"api_stability_ai_audio_inpaint",
"api_stability_ai_audio_to_audio",
"api_stability_ai_text_to_audio",
"api_vidu_q3_image_to_video",
"api_vidu_q3_text_to_video",
"audio-chatterbox_tts",
"audio-chatterbox_tts_dialog",
"audio-chatterbox_tts_multilingual",
"audio-chatterbox_vc",
"audio_ace_step_1_5_checkpoint",
"audio_ace_step_1_5_split",
"audio_ace_step_1_5_split_4b",
"audio_ace_step_1_m2m_editing",
"audio_ace_step_1_t2a_instrumentals",
"audio_ace_step_1_t2a_song",
"audio_stable_audio_example",
"utility-audioseparation",
"video_wan2_1_infinitetalk",
"video_wan2_2_14B_s2v"
],
"3d-generation": [
"04_hunyuan_3d_2.1_subgraphed",
"3d_hunyuan3d-v2.1",
"3d_hunyuan3d_image_to_model",
"3d_hunyuan3d_multiview_to_model",
"3d_hunyuan3d_multiview_to_model_turbo",
"api_from_photo_2_miniature",
"api_hunyuan3d_image_to_model",
"api_hunyuan3d_text_to_model",
"api_meshy_image_to_model",
"api_meshy_multi_image_to_model",
"api_meshy_text_to_model",
"api_rodin_gen2",
"api_rodin_image_to_model",
"api_rodin_multiview_to_model",
"api_tripo3_0_image_to_model",
"api_tripo3_0_text_to_model",
"api_tripo_image_to_model",
"api_tripo_multiview_to_model",
"api_tripo_text_to_model",
"templates-3D_logo_texture_animation",
"templates-qwen_multiangle"
],
"lora": [
"flux_depth_lora_example",
"image-qwen_image_edit_2511_lora_inflation",
"image_qwen_image_2512_with_2steps_lora",
"image_qwen_image_union_control_lora"
],
"embeddings": [],
"ip-adapter": [
"api_kling_omni_i2v",
"api_kling_omni_image",
"api_kling_omni_v2v",
"api_magnific_image_style_transfer",
"api_recraft_style_reference",
"api_vidu_q2_r2v",
"api_wan_r2v",
"templates-product_ad-v2.0"
],
"samplers": [],
"cfg": [],
"vae": []
}

@@ -1,22 +0,0 @@
# Audio Generation
Audio generation in ComfyUI covers creating speech (text-to-speech), music, and sound effects from text prompts or reference audio. Dedicated audio models run within ComfyUI's node graph, letting you integrate audio creation into larger multimedia workflows — for example, generating a video and its soundtrack in a single pipeline.
## How It Works in ComfyUI
- Key nodes involved: Model-specific nodes (`CosyVoice` nodes for TTS, `StableAudio` nodes for music/SFX), audio preview and save nodes, `AudioScheduler`
- Typical workflow pattern: Load audio model → Provide text/reference input → Generate audio → Preview/save audio
## Key Settings
- **Sample rate**: Output audio quality, typically 24000–48000 Hz. Higher rates capture more detail but produce larger files.
- **Duration**: Length of generated audio in seconds. Longer durations may reduce quality or coherence depending on the model.
- **Voice reference**: For voice cloning, a short audio clip of the target voice (3–10 seconds of clean speech works best).
- **Text input**: The text to be spoken (TTS) or the description of the desired audio (music/SFX generation).
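To make the sample-rate trade-off concrete, here is a minimal sketch that estimates the size of uncompressed PCM audio from sample rate, duration, channel count, and bit depth. It assumes 16-bit stereo PCM and ignores the small WAV header; the function name is hypothetical, and compressed formats such as FLAC or MP3 will be much smaller.

```python
def pcm_size_bytes(sample_rate_hz: int, duration_s: float,
                   channels: int = 2, bits_per_sample: int = 16) -> int:
    """Approximate size of raw PCM audio data (no container header)."""
    samples_per_channel = int(sample_rate_hz * duration_s)
    bytes_per_sample = bits_per_sample // 8
    return samples_per_channel * channels * bytes_per_sample


if __name__ == "__main__":
    for rate in (24000, 44100, 48000):
        size_mb = pcm_size_bytes(rate, duration_s=30) / 1_000_000
        print(f"{rate} Hz, 30 s stereo: {size_mb:.2f} MB")
```

Data size grows linearly with the sample rate: moving from 24000 Hz to 48000 Hz doubles the size of the same clip.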
## Tips
- CosyVoice and F5-TTS are popular choices for text-to-speech in ComfyUI, each with dedicated custom nodes.
- Stable Audio Open handles music and sound effect generation from text descriptions.
- Use clean, noise-free reference audio clips for voice cloning to get the best results.
- Keep text inputs short and well-punctuated for the highest quality speech output — long paragraphs may degrade in naturalness.

@@ -1,23 +0,0 @@
# CFG / Guidance Scale
Classifier-Free Guidance (CFG) controls how strongly the model follows your text prompt versus generating freely. Higher CFG values produce outputs that adhere more closely to the prompt but can cause oversaturation and artifacts, while lower values yield more natural-looking images at the cost of reduced prompt control. Finding the right balance is essential for every workflow.
## How It Works in ComfyUI
- Key nodes: `KSampler` (the `cfg` parameter), `ModelSamplingDiscrete` (for advanced noise schedule configurations)
- During each sampling step, the model generates both a conditioned prediction (with your prompt) and an unconditioned prediction (without it). CFG scales the difference between the two — higher values push the output further toward the conditioned prediction, amplifying prompt influence.
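The combination step described above is usually written as `uncond + cfg * (cond - uncond)`. The sketch below applies it element-wise to two toy prediction vectors in plain Python; real samplers apply the same formula to whole latent tensors, and the numbers here are made up purely for illustration.

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], cfg: float) -> List[float]:
    """Classifier-free guidance: move from the unconditioned prediction
    toward (and, for cfg > 1, past) the conditioned prediction."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [0.8, 0.2, -0.4]    # toy prediction with the prompt
uncond = [0.5, 0.1, -0.1]  # toy prediction without the prompt

print(apply_cfg(cond, uncond, 1.0))  # equals cond, up to float rounding
print(apply_cfg(cond, uncond, 7.0))  # extrapolated far past cond
```

With `cfg = 1.0` the formula reduces to the conditioned prediction alone; larger values extrapolate beyond it, which is where the oversaturation at high CFG values comes from.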
## Key Settings
- **cfg** (1.0–30.0): The guidance scale value. Recommended ranges vary by model architecture:
  - SD 1.5 / SDXL: 7–8 is the standard starting point
  - Flux: 1.0–4.0 (Flux uses much lower guidance)
  - Video models (e.g., Wan, HunyuanVideo): 3.5–5.0
## Tips
- Start at 7 for SD-based models and 3.5 for Flux, then adjust based on results
- Values above ~12 for SD models typically cause color oversaturation, harsh contrast, and visible artifacts
- Values below ~3 for SD models tend to produce blurry or incoherent results
- Some distilled models, such as Flux Dev, take guidance through an embedding baked into the model rather than through traditional CFG; for these, the guidance value is usually set separately (in ComfyUI, with the `FluxGuidance` node) and `cfg` is left at 1.0
- When experimenting, change CFG in increments of 0.5–1.0 to see its impact clearly

@@ -1,28 +0,0 @@
# ControlNet
ControlNet guides image generation using structural conditions extracted from reference images — such as edge maps, depth information, or human poses. Instead of relying solely on text prompts for composition, ControlNet lets you specify the spatial layout precisely. This bridges the gap between text-to-image flexibility and the structural precision needed for professional workflows.
## How It Works in ComfyUI
- Key nodes involved: `ControlNetLoader`, `ControlNetApplyAdvanced`, preprocessor nodes (`CannyEdgePreprocessor`, `DepthAnythingPreprocessor`, `DWPosePreprocessor`, `LineartPreprocessor`)
- Typical workflow pattern: Load reference image → preprocess to extract condition (edges/depth/pose) → load ControlNet model → apply condition to sampling → generate image with structural guidance
## ControlNet Types
- **Canny**: Detects edges to preserve outlines and shapes
- **Depth**: Captures spatial depth for accurate foreground/background placement
- **OpenPose**: Extracts human body and hand poses for character positioning
- **Normal Map**: Encodes surface orientation for consistent lighting and geometry
- **Lineart**: Follows line drawings and illustrations as generation guides
- **Scribble**: Uses rough sketches as loose compositional guides
## Key Settings
- **Strength**: Controls how strongly the condition guides generation (0.0–1.0). Values of 0.5–1.0 are typical. Higher values enforce the structure more rigidly; lower values allow the model more creative freedom.
- **start_percent / end_percent**: Controls when the ControlNet activates during the sampling process. Starting at 0.0 and ending at 1.0 applies guidance throughout. Ending earlier (e.g., 0.8) lets the model refine fine details freely in final steps.
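As a rough mental model, the sketch below maps `start_percent` / `end_percent` onto sampler step indices, assuming the percentages are spread evenly over the steps. This is an approximation: ComfyUI converts the percentages to noise levels rather than step numbers, so the exact cut-off step can differ slightly depending on the scheduler. The function name is hypothetical.

```python
def active_steps(steps: int, start_percent: float, end_percent: float) -> list:
    """Return the step indices at which a ControlNet would be applied,
    assuming sampling progress at step i is simply i / steps."""
    if not 0.0 <= start_percent <= end_percent <= 1.0:
        raise ValueError("expected 0.0 <= start_percent <= end_percent <= 1.0")
    return [i for i in range(steps)
            if start_percent <= i / steps < end_percent]


print(len(active_steps(20, 0.0, 0.8)))  # guided steps out of 20
```

Under this simplification, 20 steps with `end_percent = 0.8` means guidance covers the first 16 steps and the last 4 run unguided, which is where the model is free to refine fine detail.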
## Tips
- Always preprocess your input image with the appropriate preprocessor node before feeding it to ControlNet. Raw images will not produce correct conditioning.
- Combine multiple ControlNets for precise control — for example, Depth for spatial layout plus OpenPose for character positioning. Stack them by chaining `ControlNetApplyAdvanced` nodes.
- If your generation looks distorted or overcooked, lower the ControlNet strength. Values above 0.8 can fight with the text prompt and produce artifacts.

@@ -1,19 +0,0 @@
# Textual Embeddings
Textual embeddings are learned text representations that encode specific concepts, styles, or objects into the CLIP text encoder's vocabulary. These tiny files (~10–100 KB) effectively add new "words" to your prompt vocabulary, letting you reference a complex visual concept (a particular art style, a specific character, or a set of undesirable artifacts) with a single token. Because they operate at the text-encoding level, embeddings integrate seamlessly with your existing prompts and require no changes to the model itself.
## How It Works in ComfyUI
- Key nodes: `CLIPTextEncode` — reference embeddings directly in your prompt text using the syntax `embedding:name_of_embedding`
- Typical workflow pattern: Place embedding files in `ComfyUI/models/embeddings/` → type `embedding:name_of_embedding` inside your positive or negative prompt in a `CLIPTextEncode` node → connect to sampler as usual
## Key Settings
- **Prompt weighting**: Embeddings have no dedicated strength slider, but you can adjust their influence with prompt weighting syntax, e.g., `(embedding:name_of_embedding:1.2)` to increase strength or `(embedding:name_of_embedding:0.6)` to soften it
- **Placement**: Add embeddings to the negative prompt to suppress unwanted features, or to the positive prompt to invoke a learned concept
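If you assemble prompts from a script (for example, when generating workflows programmatically), a small helper keeps the `embedding:` and weighting syntax consistent. This is only a string-formatting sketch; the helper name is hypothetical, and the embedding names are the examples from the tips in this section.

```python
def embedding_token(name: str, weight: float = 1.0) -> str:
    """Format an embedding reference for a CLIPTextEncode prompt.

    A weight of 1.0 uses the plain form; any other weight uses the
    (embedding:name:weight) prompt-weighting syntax.
    """
    if weight == 1.0:
        return f"embedding:{name}"
    return f"(embedding:{name}:{weight:g})"


negative = ", ".join([
    "blurry, low quality",
    embedding_token("EasyNegative"),
    embedding_token("bad-hands-5", 0.6),
])
print(negative)
# blurry, low quality, embedding:EasyNegative, (embedding:bad-hands-5:0.6)
```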
## Tips
- Embeddings are commonly used in negative prompts (e.g., `embedding:EasyNegative`, `embedding:bad-hands-5`) to reduce common artifacts like malformed hands or distorted faces
- Make sure the embedding matches your base model version — an SD 1.5 embedding will not work correctly with an SDXL checkpoint
- You can combine multiple embeddings with regular text in the same prompt for fine-grained control

@@ -1,20 +0,0 @@
# Image-to-Image
Image-to-image (img2img) transforms an existing image using a text prompt while preserving the original structure and composition. Instead of starting from pure noise, the source image is encoded into latent space and partially noised, then the sampler denoises it guided by your prompt. This lets you restyle photos, refine AI-generated images, or apply creative modifications while keeping the overall layout intact.
## How It Works in ComfyUI
- Key nodes involved: `LoadImage`, `VAEEncode`, `CLIPTextEncode` (positive + negative), `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load source image → encode to latent with VAE → encode text prompts → sample with partial denoise → decode latent to image → save
## Key Settings
- **Denoise Strength**: The most important setting, ranging from 0.0 to 1.0. Lower values (0.2–0.4) preserve more of the original image with subtle changes. Higher values (0.6–0.8) allow more creative freedom but deviate further from the source. A value of 1.0 effectively ignores the input image entirely.
- **Steps**: Number of sampling steps. 20–30 is typical. Fewer steps may be sufficient at low denoise values since less transformation is needed.
- **CFG Scale**: Controls prompt adherence, same as text-to-image. 7–8 is a standard starting point.
## Tips
- Start with a denoise strength of 0.5 and adjust up or down based on how much change you want. This gives a balanced mix of original structure and new content.
- Your input image resolution should match the model's training resolution. Resize or crop your source image to 512×512 (SD 1.5) or 1024×1024 (SDXL) before loading to avoid quality issues.
- Use img2img iteratively: generate an initial text-to-image result, then refine it with img2img at low denoise to fix details without losing the overall composition.
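To see why the denoise value matters so much, the sketch below models partial denoising the way ComfyUI's `KSampler` is understood to handle it: build a longer schedule of `int(steps / denoise)` steps and run only the last `steps` of them, so sampling starts from a partially noised image instead of pure noise. Treat this as an illustration of the idea, not a reference for the implementation; the function name is hypothetical.

```python
def img2img_schedule(steps: int, denoise: float) -> tuple:
    """Return (total_schedule_steps, skipped_steps) for a partial denoise.

    Assumes the sampler builds a schedule of int(steps / denoise) steps
    and runs only the final `steps` of it, skipping the noisiest part.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0.0, 1.0]")
    if denoise >= 0.9999:
        return steps, 0  # full denoise: start from pure noise
    total = int(steps / denoise)
    return total, total - steps


for d in (1.0, 0.75, 0.5, 0.3):
    total, skipped = img2img_schedule(20, d)
    print(f"denoise={d}: {skipped} of {total} schedule steps skipped")
```

Lower denoise skips more of the high-noise part of the schedule, which is why less of the original structure is changed.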

@@ -1,21 +0,0 @@
# Inpainting
Inpainting selectively regenerates parts of an image using a mask while leaving the rest untouched. You paint a mask over the area you want to change, provide a text prompt describing the desired replacement, and the model fills in only the masked region. This is essential for fixing defects, replacing objects, or refining specific details in an otherwise finished image.
## How It Works in ComfyUI
- Key nodes involved: `LoadImage`, `VAEEncodeForInpaint` (shown as "VAE Encode (for Inpainting)"), `CLIPTextEncode` (positive + negative), `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load image + mask → encode with inpainting-aware VAE node → encode text prompts → sample → decode → save
- The mask can be created using ComfyUI's built-in mask editor or loaded from an external image
## Key Settings
- **grow_mask_by**: Expands the mask by a number of pixels, helping the regenerated area blend smoothly with the surrounding image. 6–8 pixels is typical. Too little causes visible seams; too much affects areas you wanted to keep.
- **Denoise Strength**: For inpainting, higher values (0.7–1.0) generally work best since you want the masked region to be fully regenerated. Lower values may produce inconsistent blending.
- **Checkpoint**: Dedicated inpainting models like `512-inpainting-ema` produce significantly better edge blending than standard checkpoints.
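Growing a mask is a morphological dilation. The sketch below shows the idea on a tiny binary mask in plain Python, using a square window of `radius` pixels around each masked pixel. It is a generic dilation for illustration only: the node works on tensors, and how many pixels per side a given `grow_mask_by` value adds may differ from this `radius`.

```python
def dilate_mask(mask: list, radius: int) -> list:
    """Grow a binary mask (list of rows of 0/1) with a square window.

    A pixel becomes 1 if any pixel within `radius` rows/columns of it is 1.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


mask = [[0] * 7 for _ in range(7)]
mask[3][3] = 1  # a single masked pixel in the centre
for row in dilate_mask(mask, 2):
    print("".join("#" if v else "." for v in row))
```

Each extra pixel of growth gives the sampler more surrounding context to blend across, which is why a slightly grown mask hides seams better than a tight one.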
## Tips
- Always expand your mask slightly beyond the target area. Tight masks create hard edges that look unnatural against the surrounding image.
- Describe what you want to appear in the masked region, not what you want to remove. For example, prompt "a clear blue sky" rather than "remove the bird."
- Use inpainting-specific checkpoints whenever possible. Standard models can inpaint but often struggle with seamless blending at mask boundaries.

@@ -1,21 +0,0 @@
# IP-Adapter
IP-Adapter (Image Prompt Adapter) uses reference images to guide generation style, composition, or subject instead of — or alongside — text prompts. Rather than describing what you want in words, you show the model an image, enabling "image prompting." This is especially powerful for transferring artistic style, maintaining character consistency across generations, or conveying visual concepts that are difficult to express in text.
## How It Works in ComfyUI
- Key nodes: `IPAdapterModelLoader`, `IPAdapterApply` (or `IPAdapterAdvanced`), `CLIPVisionLoader`, `CLIPVisionEncode`, `PrepImageForClipVision`
- Typical workflow pattern: Load IP-Adapter model + CLIP Vision model → prepare and encode reference image → apply adapter to the main model → connect to sampler → decode
## Key Settings
- **weight** (0.0–1.0): Controls the influence of the reference image on the output. A range of 0.5–0.8 is typical; higher values make the output closer to the reference
- **weight_type**: Determines how the reference is interpreted — `standard` for general use, `style transfer` for artistic style without copying content, `composition` for layout guidance
- **start_at / end_at** (0.0–1.0): Controls when the adapter is active during sampling. Limiting the range (e.g., 0.0–0.8) can improve prompt responsiveness while retaining reference influence
## Tips
- Use the `style transfer` weight type when you want to borrow an artistic style without reproducing the reference image's content
- Combine IP-Adapter with a text prompt for the best results — the text adds detail and specificity on top of the visual guidance
- Face-specific IP-Adapter models (e.g., `ip-adapter-faceid`) exist for portrait consistency across multiple generations
- Lower the weight if your output looks too similar to the reference image

@@ -1,20 +0,0 @@
# LoRA
LoRA (Low-Rank Adaptation) is a technique for fine-tuning a base model's behavior using a small add-on file rather than retraining the entire model. LoRAs adjust a model's style, teach it specific subjects, or introduce new concepts, all in a file typically just 10–200 MB, compared to multi-gigabyte full checkpoints. This makes them easy to share, swap, and combine. In ComfyUI, you load LoRAs on top of a checkpoint and control how strongly they influence the output.
## How It Works in ComfyUI
- Key nodes involved: `LoraLoader` (loads one LoRA and applies it to both MODEL and CLIP), `LoraLoaderModelOnly` (applies to MODEL only, skipping CLIP for faster loading)
- Typical workflow pattern: Load checkpoint → LoraLoader (attach LoRA) → CLIP Text Encode → KSampler → VAE Decode. Chain multiple `LoraLoader` nodes to stack LoRAs.
## Key Settings
- **strength_model**: Controls how much the LoRA affects the diffusion model. Range 0.0–1.0; typical values are 0.6–1.0. Higher values apply the LoRA effect more strongly.
- **strength_clip**: Controls how much the LoRA affects text encoding. Usually set to the same value as strength_model, but can be adjusted independently for fine control.
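Under the hood, a LoRA stores pairs of small matrices, `up` (B, shape d×r) and `down` (A, shape r×k), whose product is added to an original weight matrix W (d×k). The sketch below applies the commonly used update `W + strength * (alpha / r) * (B @ A)` with toy numbers in plain Python; the `alpha / r` scaling is the usual convention rather than something stated in this section, and real loaders apply the update per layer on GPU tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply_lora(w, up, down, strength=1.0, alpha=None):
    """Return W + strength * (alpha / rank) * (up @ down).

    w: d x k base weight, up: d x r, down: r x k.
    If alpha is None, it defaults to the rank (scale of 1.0).
    """
    rank = len(down)
    scale = (alpha if alpha is not None else rank) / rank
    delta = matmul(up, down)
    return [[w[i][j] + strength * scale * delta[i][j]
             for j in range(len(w[0]))] for i in range(len(w))]


def lora_params(d, k, rank):
    """Values stored by one LoRA pair versus the full d x k matrix."""
    return rank * (d + k), d * k


w = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
up = [[1.0], [2.0]]           # d x r with rank r = 1
down = [[0.5, 0.5]]           # r x k

print(apply_lora(w, up, down, strength=0.7))
print(lora_params(4096, 4096, 16))  # (131072, 16777216)
```

The parameter count explains the small file sizes: a rank-16 update to a 4096×4096 layer stores about 131 thousand values instead of almost 17 million.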
## Tips
- Start with strength 0.7 and adjust up or down based on results — too high can cause oversaturation or artifacts.
- Stacking too many LoRAs simultaneously can cause visual artifacts or conflicting styles; two or three is usually a safe limit.
- Ensure the LoRA matches your base model architecture — SD 1.5 LoRAs will not work with SDXL checkpoints, and vice versa.
- Many LoRAs require specific trigger words in your prompt to activate; always check the LoRA's documentation or model card.

@@ -1,20 +0,0 @@
# Outpainting
Outpainting extends an image beyond its original borders, generating new content that seamlessly continues the existing scene. Unlike inpainting which replaces content within an image, outpainting adds content outside the frame — expanding the canvas in any direction. This is useful for changing aspect ratios, adding environmental context, or creating panoramic compositions from a single image.
## How It Works in ComfyUI
- Key nodes involved: `LoadImage`, `ImagePadForOutpaint`, `VAEEncodeForInpaint`, `CLIPTextEncode` (positive + negative), `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load image → pad image with transparent/noised borders → encode with inpainting VAE node (padded area becomes the mask) → encode text prompts → sample → decode → save
## Key Settings
- **Padding Pixels**: The number of pixels to extend on each side, typically 64–256. Smaller increments produce more coherent results since the model has more context relative to the new area.
- **Denoise Strength**: Use high values (0.8–1.0) for outpainted regions since the padded area is essentially blank and needs full generation.
- **Feathering**: Controls the gradient blend between the original image and the new content. Higher feathering values create smoother transitions and reduce visible seams.
## Tips
- Outpaint in stages rather than all at once. Extending by 128 pixels at a time and iterating produces far more coherent results than trying to add 512 pixels in a single pass.
- Use a lower CFG scale (5–6) for outpainting. This allows the model to generate more natural, context-aware extensions rather than forcing strict prompt adherence that may clash with the existing image.
- Include scene context in your prompt that matches the original image. If the source shows an indoor room, describe the room's style and lighting so the extension feels continuous.
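The staged approach from the first tip can be planned ahead of time. The sketch below splits a total extension of the right edge into passes of at most 128 pixels and tracks the canvas size after each pass. It requires multiples of 8 because Stable Diffusion style VAEs downsample by 8 in each direction; the function name and the 128-pixel default are illustrative choices, not ComfyUI settings.

```python
def plan_outpaint(width: int, height: int, extend_right: int,
                  max_step: int = 128) -> list:
    """Plan staged outpainting passes that extend the right edge.

    Returns a list of (pad_pixels, new_width, height) tuples, one per pass.
    """
    if extend_right % 8 != 0 or max_step % 8 != 0:
        raise ValueError("pad sizes should be multiples of 8")
    passes = []
    remaining = extend_right
    while remaining > 0:
        pad = min(max_step, remaining)
        width += pad
        remaining -= pad
        passes.append((pad, width, height))
    return passes


for pad, w, h in plan_outpaint(1024, 1024, extend_right=512):
    print(f"pad {pad}px -> canvas {w}x{h}")
```

Extending by 512 pixels this way takes four passes, each of which sees the previous pass's result as context.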

@@ -1,21 +0,0 @@
# Samplers & Schedulers
Samplers are the algorithms that iteratively denoise a random latent into a coherent image, while schedulers control the noise schedule — how much noise is removed at each step. Together they determine the image's quality, speed, and visual character. Choosing the right combination is one of the most impactful decisions in any generation workflow.
## How It Works in ComfyUI
- Key nodes: `KSampler` (main sampling node), `KSamplerAdvanced` (provides control over start/end steps for multi-pass workflows)
- Typical workflow pattern: Load model → connect conditioning → configure sampler/scheduler/steps → sample → decode
## Key Settings
- **sampler_name**: The denoising algorithm. Common choices include `euler` (fast, good baseline), `euler_ancestral` (more creative variation), `dpmpp_2m` (balanced quality and speed), `dpmpp_2m_sde` (high quality, slightly slower), `dpmpp_3m_sde` (very high quality), and `uni_pc` (fast convergence)
- **scheduler**: Controls the noise reduction curve. `normal` is linear, `karras` front-loads noise reduction for better detail, `exponential` and `sgm_uniform` (recommended for SDXL) are also available
- **steps** (1–100): Number of denoising iterations. 20–30 is typical; more steps give diminishing returns. Distilled models such as Flux Schnell and LCM need far fewer (about 4 for Flux Schnell, 4–8 for LCM)
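The `karras` scheduler follows the noise schedule from Karras et al. (2022), which interpolates between the maximum and minimum noise levels in `sigma ** (1 / rho)` space. The sketch below reproduces that formula with `rho = 7`; the `sigma_min` and `sigma_max` defaults are the values commonly quoted for SD 1.5 and are assumptions here, since each model defines its own range.

```python
def karras_sigmas(n: int, sigma_min: float = 0.0292,
                  sigma_max: float = 14.6146, rho: float = 7.0) -> list:
    """Karras et al. (2022) noise schedule with a final 0.0 appended."""
    if n < 2:
        raise ValueError("need at least 2 steps")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    sigmas = [
        (max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho
        for i in range(n)
    ]
    return sigmas + [0.0]


sigmas = karras_sigmas(10)
print([round(s, 3) for s in sigmas])
```

Printing the list shows large drops between the first few noise levels and small ones near the end, which is the front-loaded behavior described for `karras`.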
## Tips
- `euler` + `normal` is the safest starting combination for any model
- `dpmpp_2m` + `karras` is a popular choice when you want higher quality with minimal speed cost
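- Ancestral samplers (`euler_ancestral` and other `_ancestral` variants) and SDE samplers (`_sde` variants) add fresh noise at every step, so they do not converge as you add steps: changing the step count can change the image noticeably. This is useful for exploration, but less predictable when refining a result
- Distilled models such as Flux Schnell and LCM converge much faster; using 20+ steps with them wastes time without improving quality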

@@ -1,21 +0,0 @@
# Text-to-Image Generation
Text-to-image is the foundational workflow in ComfyUI: you provide a text description (prompt) and the system generates an image from scratch. This is the starting point for most generative AI art. A diffusion model iteratively denoises a random latent image, guided by your text prompt encoded through CLIP, to produce a coherent image matching your description.
## How It Works in ComfyUI
- Key nodes involved: `CheckpointLoaderSimple`, `CLIPTextEncode` (positive + negative), `EmptyLatentImage`, `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load checkpoint → encode text prompts → create empty latent → sample → decode latent to image → save
## Key Settings
- **Resolution**: Must match the model's training resolution. Use 512×512 for SD 1.5, 1024×1024 for SDXL and Flux models. Mismatched resolutions produce artifacts like duplicated limbs or distorted compositions.
- **Steps**: Number of denoising iterations. 20–30 steps is a good balance between quality and speed. More steps refine details but with diminishing returns beyond 30.
- **CFG Scale**: Controls how strongly the sampler follows your prompt. 7–8 is the typical range. Higher values increase prompt adherence but can introduce oversaturation or artifacts.
- **Seed**: Determines the initial random noise. A fixed seed produces reproducible results, which is useful for iterating on prompts while keeping composition consistent.
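For non-square images, a common approach (an assumption here, not a ComfyUI feature) is to keep the total pixel count close to the model's training resolution and round both sides to a multiple of 64. The hypothetical helper below does that for a target aspect ratio.

```python
import math


def fit_resolution(aspect_w: int, aspect_h: int, base: int = 1024,
                   multiple: int = 64) -> tuple:
    """Pick (width, height) with roughly base*base pixels for a given ratio."""
    target_pixels = base * base
    height = math.sqrt(target_pixels * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


print(fit_resolution(1, 1))            # (1024, 1024) for SDXL / Flux
print(fit_resolution(16, 9))           # (1344, 768) widescreen
print(fit_resolution(1, 1, base=512))  # (512, 512) for SD 1.5
```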
## Tips
- Start with simple, descriptive prompts before adding stylistic modifiers. Complex prompts can conflict and produce muddy results.
- Use the negative prompt `CLIPTextEncode` to specify what you want to avoid (e.g., "blurry, low quality, deformed hands") — this significantly improves output quality.
- Always match your `EmptyLatentImage` resolution to the model you loaded. A 768×768 image on an SD 1.5 checkpoint will produce noticeably worse results than 512×512.

@@ -1,21 +0,0 @@
# Upscaling
Upscaling increases image resolution while adding detail, turning a small generated image into a large, sharp result. In ComfyUI, there are two main approaches: model-based upscaling, which uses trained AI models (like RealESRGAN or 4x-UltraSharp) to intelligently enlarge an image in one pass, and latent-based upscaling, which works in latent space with a KSampler to add new detail during the enlargement process. Model-based is faster, while latent-based offers more creative control.
## How It Works in ComfyUI
- Key nodes involved: `UpscaleModelLoader`, `ImageUpscaleWithModel`, `ImageScaleBy`, `LatentUpscale`, `VAEDecodeTiled`
- Typical workflow pattern: Generate image → Upscale model loader → ImageUpscaleWithModel → Save image (model-based), or Generate latent → LatentUpscale → KSampler (lower denoise) → VAEDecode → Save image (latent-based)
## Key Settings
- **Upscale model**: The AI model used for model-based upscaling. `RealESRGAN_x4plus` is a reliable general-purpose choice; `4x-UltraSharp` excels at photo-realistic detail.
- **Scale factor**: How much to enlarge — 2x and 4x are typical. Higher factors increase VRAM usage significantly.
- **tile_size**: For tiled decoding/encoding of very large images. Range 512–1024; smaller tiles use less VRAM but take longer.
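To see why smaller tiles are slower, the sketch below counts how many overlapping tiles are needed to cover an image. It assumes a simple grid with a fixed pixel overlap between neighboring tiles; this illustrates the trade-off and is not ComfyUI's exact tiling logic. The function name and the 64-pixel overlap are hypothetical.

```python
import math


def tile_count(width: int, height: int, tile: int, overlap: int = 64) -> int:
    """Number of tiles in a simple overlapping grid covering the image."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def tiles_along(length: int) -> int:
        if length <= tile:
            return 1
        return 1 + math.ceil((length - tile) / stride)

    return tiles_along(width) * tiles_along(height)


for size in (512, 768, 1024):
    print(size, tile_count(4096, 4096, size))
```

Scale factor compounds this: a 4x upscale of a 1024×1024 image produces 4096×4096, sixteen times the pixels, which is why higher factors raise memory use so quickly.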
## Tips
- Model-based upscaling is faster but less creative; latent upscaling paired with a KSampler adds genuinely new detail.
- Use `VAEDecodeTiled` for very large images to avoid out-of-memory errors.
- Chain two 2x upscales instead of one 4x for better overall quality.
- When using latent upscaling, set KSampler denoise to 0.3–0.5 to add detail without changing the composition.

@@ -1,20 +0,0 @@
# VAE (Variational Autoencoder)
The VAE encodes pixel images into a compact latent representation and decodes latents back into pixel images. All diffusion in Stable Diffusion and Flux happens in latent space — the VAE is the bridge between the images you see and the mathematical space where the model actually works. Every generation workflow ends with a VAE decode step to produce a viewable image.
## How It Works in ComfyUI
- Key nodes: `VAEDecode` (latent → image), `VAEEncode` (image → latent), `VAEDecodeTiled` (for large images to avoid out-of-memory errors), `VAELoader` (load a standalone VAE file)
- Typical workflow pattern: Most checkpoints include a built-in VAE, so the `VAEDecode` node can pull directly from the loaded checkpoint. To use a different VAE, add a `VAELoader` node and connect it to `VAEDecode` instead.
## Key Settings
- **tile_size** (for `VAEDecodeTiled`): Size of each tile when decoding in chunks. Default is 512; reduce if you still encounter memory issues
- **VAE choice**: VAE files are model-specific. Use `sdxl_vae.safetensors` for SDXL, `ae.safetensors` for Flux. Place files in `ComfyUI/models/vae/`
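The size difference between pixel space and latent space is easy to compute. The sketch below assumes the usual 8× spatial downsampling, with 4 latent channels for SD 1.5 / SDXL and 16 for Flux; other model families (video VAEs in particular) use different factors.

```python
def latent_shape(width: int, height: int, channels: int = 4,
                 downscale: int = 8) -> tuple:
    """Shape (channels, height, width) of the latent for an image."""
    if width % downscale or height % downscale:
        raise ValueError(f"width and height must be multiples of {downscale}")
    return channels, height // downscale, width // downscale


def compression_ratio(width: int, height: int, channels: int = 4) -> float:
    """How many RGB pixel values map to each latent value."""
    c, h, w = latent_shape(width, height, channels)
    return (width * height * 3) / (c * h * w)


print(latent_shape(1024, 1024))               # (4, 128, 128) SDXL
print(latent_shape(1024, 1024, channels=16))  # (16, 128, 128) Flux
print(compression_ratio(1024, 1024))          # 48.0
print(compression_ratio(1024, 1024, 16))      # 12.0
```

The 48× reduction for SD-style latents is why diffusion runs in latent space: the model works on far fewer values than the final image contains.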
## Tips
- If colors look washed out or slightly off, try loading an external VAE — the VAE baked into a checkpoint is not always optimal, especially for community fine-tunes
- Use `VAEDecodeTiled` for images larger than ~2048 px on either side to prevent out-of-memory crashes
- SDXL and Flux each have their own VAE architecture — using the wrong one will produce corrupted output
- When doing img2img or inpainting, the `VAEEncode` node converts your input image into the latent space the model expects

@@ -1,22 +0,0 @@
# Video Generation
Video generation creates video content from text prompts (T2V), reference images (I2V), or existing video (V2V) using specialized video diffusion models. Unlike image generation, video models must maintain temporal coherence across frames, ensuring smooth motion and consistent subjects. ComfyUI supports several leading open-source video models including WAN 2.1 and HunyuanVideo, each with their own loader and latent nodes.
## How It Works in ComfyUI
- Key nodes involved: Model-specific loaders (e.g. `WAN` video nodes, `HunyuanVideo` nodes, `LTXVLoader`), `EmptyHunyuanLatentVideo` / `EmptyLTXVLatentVideo`, `KSampler`, `VHS_VideoCombine` (from Video Helper Suite)
- Typical workflow pattern: Load video model → Create empty video latent → KSampler (with video-aware scheduling) → VAE decode → VHS_VideoCombine → Save video
## Key Settings
- **Frame count**: Number of frames to generate. Typically 16–81 frames depending on the model; more frames require more VRAM and time.
- **Resolution**: Often 512×320 or 848×480 for T2V. Higher resolutions need significantly more resources.
- **FPS**: Frames per second for output, typically 8–24. Higher FPS gives smoother motion but requires more frames for the same duration.
- **Motion scale/strength**: Controls the amount of movement in the generated video. Lower values produce subtle motion; higher values produce more dynamic scenes.
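Frame count, FPS, and duration are linked by `duration = frames / fps`. Many video models also constrain the frame count because their VAEs compress time as well as space; Wan 2.1 and HunyuanVideo, for example, are commonly run with frame counts of the form 4n + 1 (such as 81). The sketch below assumes that 4n + 1 rule by default; check each model's own documentation for its actual constraint. Both function names are hypothetical.

```python
def nearest_valid_frames(frames: int, stride: int = 4) -> int:
    """Round a frame count to the nearest value of the form stride * n + 1."""
    n = max(0, round((frames - 1) / stride))
    return stride * n + 1


def duration_seconds(frames: int, fps: float) -> float:
    """Clip length in seconds for a given frame count and frame rate."""
    return frames / fps


frames = nearest_valid_frames(80)  # -> 81
print(frames, f"{duration_seconds(frames, 16):.2f} s at 16 fps")
```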
## Tips
- Start with fewer frames and lower resolution to test your prompt and settings before committing to a full-quality render.
- Image-to-video (I2V) typically gives better coherence than text-to-video (T2V) because the model has a visual anchor.
- Video Helper Suite (VHS) nodes are essential for loading, previewing, and saving video — install this custom node pack first.
- WAN 2.1 and HunyuanVideo are currently the leading open models for quality video generation in ComfyUI.

@@ -1,88 +0,0 @@
{
"Wan": "wan",
"Wan2.1": "wan",
"Wan2.2": "wan",
"Wan2.5": "wan",
"Wan2.6": "wan",
"Wan-Move": "wan",
"Motion Control": "wan",
"Flux": "flux",
"Flux.2": "flux",
"Flux.2 Dev": "flux",
"Flux.2 Klein": "flux",
"Kontext": "flux",
"BFL": "flux",
"SDXL": "sdxl",
"SD1.5": "sdxl",
"Stability": "sdxl",
"Reimagine": "sdxl",
"SD3.5": "sd3-5",
"SVD": "svd",
"Stable Audio": "stable-audio",
"Google": "gemini",
"Google Gemini": "gemini",
"Google Gemini Image": "gemini",
"Gemini3 Pro Image Preview": "gemini",
"Gemini-2.5-Flash": "gemini",
"Veo": "veo",
"Nano Banana Pro": "nano-banana-pro",
"nano-banana": "nano-banana-pro",
"OpenAI": "gpt-image-1",
"GPT-Image-1": "gpt-image-1",
"GPT-Image-1.5": "gpt-image-1",
"Qwen": "qwen",
"Qwen-Image": "qwen",
"Qwen-Image-Edit": "qwen",
"Qwen-Image-Layered": "qwen",
"Qwen-Image 2512": "qwen",
"Hunyuan Video": "hunyuan",
"Hunyuan3D": "hunyuan",
"Tencent": "hunyuan",
"LTX-2": "ltx-video",
"LTXV": "ltx-video",
"Lightricks": "ltx-video",
"ByteDance": "seedance",
"Seedance": "seedance",
"Seedream": "seedream",
"Seedream 4.0": "seedream",
"SeedVR2": "seedvr2",
"Vidu": "vidu",
"Vidu Q2": "vidu",
"Vidu Q3": "vidu",
"Kling": "kling",
"Kling O1": "kling",
"Kling2.6": "kling",
"ACE-Step": "ace-step",
"Chatter Box": "chatterbox",
"Recraft": "recraft",
"Runway": "runway",
"Luma": "luma",
"HiDream": "hidream",
"Tripo": "tripo",
"MiniMax": "minimax",
"Z-Image-Turbo": "z-image",
"Z-Image": "z-image",
"Grok": "grok",
"Moonvalley": "moonvalley",
"Topaz": "topaz",
"Kandinsky": "kandinsky",
"OmniGen": "omnigen",
"Magnific": "magnific",
"PixVerse": "pixverse",
"Meshy": "meshy",
"Rodin": "rodin",
"WaveSpeed": "wavespeed",
"Chroma": "chroma",
"BRIA": "bria",
"HitPaw": "hitpaw",
"NewBie": "newbie",
"Ovis-Image": "ovis-image",
"Ideogram": "ideogram",
"Anima": "anima",
"ChronoEdit": "chronoedit",
"Nvidia": "chronoedit",
"HuMo": "humo",
"FlashVSR": "flashvsr",
"Real-ESRGAN": "real-esrgan",
"Depth Anything\u00a0v2": "depth-anything-v2"
}
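The mapping above is keyed by display names, and at least one key (`Depth Anything\u00a0v2`) contains a non-breaking space, so an exact dictionary lookup on user-typed text can miss. A minimal sketch of a tolerant lookup, assuming the JSON is loaded as a dict; only a few entries are inlined to keep the example self-contained, and the helper names are hypothetical.

```python
import json
import unicodedata

# A small excerpt of the mapping above, as it appears in the JSON file.
RAW = '{"Flux.2 Dev": "flux", "Wan2.1": "wan", "Depth Anything\\u00a0v2": "depth-anything-v2"}'


def normalize(name: str) -> str:
    """Fold compatibility characters (NBSP -> space), collapse spaces, casefold."""
    folded = unicodedata.normalize("NFKC", name)
    return " ".join(folded.split()).casefold()


def build_index(mapping: dict) -> dict:
    """Re-key the mapping by normalized display name."""
    return {normalize(k): v for k, v in mapping.items()}


index = build_index(json.loads(RAW))
print(index.get(normalize("Depth Anything v2")))  # depth-anything-v2
print(index.get(normalize("wan2.1")))             # wan
```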

@@ -1,47 +0,0 @@
# ACE-Step
ACE-Step is a foundation model for music generation developed by ACE Studio and StepFun. It uses diffusion-based generation with a Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer to achieve state-of-the-art speed and musical coherence.
## Model Variants
### ACE-Step (3.5B)
- 3.5B parameter diffusion model
- DCAE encoder with linear transformer conditioning
- 27 or 60 inference steps recommended
- Apache 2.0 license
## Key Features
- 15x faster than LLM-based baselines (20 seconds for a 4-minute song on A100)
- Full-song generation with lyrics and structure
- Duration control for variable-length output
- Music remixing and style transfer
- Lyrics editing and vocal synthesis
- Supports 16+ languages including English, Chinese, Japanese, Korean, French, German, Spanish, and more
- Text-to-music from natural language descriptions
## Hardware Requirements
- RTX 3090: 12.76x real-time factor at 27 steps
- RTX 4090: 34.48x real-time factor at 27 steps
- NVIDIA A100: 27.27x real-time factor at 27 steps
- Apple M2 Max: 2.27x real-time factor at 27 steps
- Higher step counts (60) reduce speed by roughly half
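A real-time factor (RTF) of N means N seconds of audio are produced per second of compute, so generation time is roughly `duration / RTF`. The sketch below applies the 27-step figures listed above to a 4-minute song; it ignores model loading and other overhead, so treat the results as lower bounds.

```python
RTF_27_STEPS = {
    "RTX 4090": 34.48,
    "NVIDIA A100": 27.27,
    "RTX 3090": 12.76,
    "Apple M2 Max": 2.27,
}


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time for a clip, given a real-time factor."""
    return audio_seconds / rtf


song = 4 * 60  # a 4-minute song, in seconds
for device, rtf in RTF_27_STEPS.items():
    print(f"{device}: ~{generation_seconds(song, rtf):.1f} s")
```

By this estimate an A100 at 27 steps needs roughly 9 seconds for four minutes of audio; per the note above, 60 steps takes roughly twice as long.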
## Common Use Cases
- Original music generation from text descriptions
- Song remixing and style transfer
- Lyrics-based music creation
- Multi-language vocal music generation
- Rapid music prototyping for content creators
- Background music and soundtrack generation
## Key Parameters
- **steps**: Inference steps (27 for speed, 60 for quality)
- **duration**: Target audio length in seconds (up to ~5 minutes)
- **lyrics**: Song lyrics text input for vocal generation
- **prompt**: Natural language description of desired music style and mood
- **seed**: Random seed for reproducible generation (results are seed-sensitive)

@@ -1,46 +0,0 @@
# Anima
Anima is an API-based AI video generation platform that creates animated video content from text prompts, supporting character consistency and storyboard-driven workflows.
## Model Variants
### Anima Video Generation
- Cloud-based video generation service
- Supports multiple underlying AI models (Runway, Kling, Minimax, Luma)
- Integrated text, image, and audio generation pipeline
## Key Features
- AI character generation with persistent identity across scenes
- Storyboard-based workflow: script to visual scenes with narration
- Multi-model integration (GPT-4, Claude, Gemini for text; FLUX, MidJourney for images)
- Voice generation via ElevenLabs integration
- Music composition via Suno integration
- Autopilot mode for fully automated video creation
- Prompt enhancement for optimized output quality
- Template library for rapid content creation
- Scene-by-scene generation with character consistency
## Hardware Requirements
- No local hardware required (cloud-based service)
- Runs entirely through web API
- Browser-based interface for interactive use
## Common Use Cases
- Animated story series production
- Movie trailer and concept video creation
- Kids bedtime story animation
- Lofi music video generation
- Marketing and explainer video content
- Storyboard visualization
## Key Parameters
- **prompt**: Text description of the scene or story
- **character**: Selected or generated character for identity consistency
- **style**: Visual style preset (animation, cinematic, etc.)
- **duration**: Target video length
- **resolution**: Output video resolution


@@ -1,48 +0,0 @@
# BRIA AI
BRIA AI is an enterprise-focused visual generative AI platform that trains its models exclusively on licensed, ethically sourced data, ensuring commercially safe outputs with full IP indemnification.
## Model Variants
### BRIA Fibo
- Flagship hyper-controllable text-to-image model
- JSON-based control framework with 100+ disentangled visual attributes
- Supports lighting, depth, color, composition, and camera control
- Ideal for agentic workflows and enterprise-scale creative automation
### BRIA Text-to-Image Lite
- Fully private, self-hosted deployment of the Fibo pipeline
- Designed for regulated industries requiring total data control
- Runs on-premises with no external data transfer
## Key Features
- Trained on 100% licensed data from 20+ partners including Getty Images
- Full IP indemnification for commercial use
- Tri-layer content moderation for brand-safe outputs
- Patented attribution engine compensating data owners by usage
- ControlNet support for canny, depth, recoloring, and IP Adapter
- Multilingual prompt support
- Fine-tuning API for brand-specific customization
## Hardware Requirements
- Cloud-hosted API available (no local GPU required)
- Self-hosted Lite version supports deployment on AWS and Azure
- Open-source weights available on Hugging Face for local inference
## Common Use Cases
- Enterprise marketing and advertising content
- E-commerce product photography
- Brand-consistent visual asset generation
- Storyboarding and concept art for media production
## Key Parameters
- **prompt**: Text description of desired image
- **style**: Photorealistic, illustrative, or custom styles
- **guidance_methods**: ControlNet canny, depth, recoloring, IP Adapter
- **resolution**: Multiple aspect ratios supported


@@ -1,52 +0,0 @@
# Chatterbox
Chatterbox is a family of state-of-the-art open-source text-to-speech models developed by Resemble AI, featuring zero-shot voice cloning and emotion control.
## Model Variants
### Chatterbox Turbo
- 350M parameters, single-step mel decoding for low latency
- Paralinguistic tags for non-speech sounds ([laugh], [cough], [chuckle])
- English only, optimized for voice agents and production use
### Chatterbox (Original)
- 500M parameter Llama backbone, English only
- CFG and exaggeration control for emotion intensity
### Chatterbox Multilingual
- 500M parameters, 23 languages (Arabic, Chinese, French, German, Hindi, Japanese, Korean, Spanish, and more)
- Zero-shot voice cloning across languages
## Key Features
- Zero-shot voice cloning from a few seconds of reference audio
- Emotion exaggeration control (first open-source model with this feature)
- Built-in PerTh neural watermarking for responsible AI
- Sub-200ms latency for real-time applications
- Trained on 500K hours of cleaned speech data
- MIT license (free for commercial use)
- Outperforms ElevenLabs in subjective listening evaluations published by Resemble AI
## Hardware Requirements
- Minimum: NVIDIA GPU with CUDA support
- Turbo model requires less VRAM than original due to smaller architecture
- Runs on consumer GPUs (RTX 3060 and above)
- CPU inference possible but significantly slower
## Common Use Cases
- Voice cloning for content creation
- AI voice agents and assistants
- Audiobook narration
- Game and media dialogue generation
## Key Parameters
- **exaggeration**: Emotion intensity control (0.0 to 1.0, default 0.5)
- **cfg_weight**: Classifier-free guidance weight (0.0 to 1.0, default 0.5)
- **audio_prompt_path**: Path to reference audio clip for voice cloning
- **language_id**: Language code for multilingual model (e.g., "fr", "zh", "ja")

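The parameter ranges above can be checked before a generation call. The following is a minimal sketch. `ChatterboxSettings` and `problems()` are hypothetical helpers and are not part of the Chatterbox package. The ranges and defaults come from the list above. Allowing `"en"` on the English-only variants is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChatterboxSettings:
    """Hypothetical container for the generation knobs listed above."""

    exaggeration: float = 0.5                # emotion intensity, 0.0-1.0
    cfg_weight: float = 0.5                  # classifier-free guidance weight, 0.0-1.0
    audio_prompt_path: Optional[str] = None  # reference clip for zero-shot cloning
    language_id: Optional[str] = None        # e.g. "fr", "zh", "ja" (multilingual only)

    def problems(self, multilingual: bool) -> List[str]:
        """Return human-readable issues; an empty list means the settings look valid."""
        issues = []
        # Both sliders share the documented 0.0-1.0 range.
        for name in ("exaggeration", "cfg_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                issues.append(f"{name} must be in [0.0, 1.0], got {value}")
        # Turbo and the original model are English-only; only the
        # multilingual variant needs a language code.
        if multilingual and not self.language_id:
            issues.append("language_id is required for Chatterbox Multilingual")
        if not multilingual and self.language_id not in (None, "en"):
            issues.append("English-only variant: leave language_id unset")
        return issues
```

The helper returns a list of issues rather than raising on the first one, so a UI can report every problem at once.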

@@ -1,50 +0,0 @@
# Chroma
Chroma is an open-source 8.9 billion parameter text-to-image model based on the FLUX.1-schnell architecture, developed by Lodestone Rock and the community. It is fully Apache 2.0 licensed.
## Model Variants
### Chroma
- 8.9B parameter model based on FLUX.1-schnell
- Trained on a curated 5M-sample dataset (selected from 20M candidates)
- Apache 2.0 license for unrestricted use
- Supports both tag-based and natural language prompting
### Chroma XL
- Experimental merge and fine-tune based on NoobAI-XL (SDXL architecture)
- Designed for low CFG (2.5-3.0) and a low step count (8-12 steps)
- Optimized for fast generation on consumer hardware
## Key Features
- Fully open-source with Apache 2.0 licensing
- Diverse training data spanning anime, artistic, and photographic styles
- Community-driven development with public training logs
- Compatible with FLUX ecosystem (VAE, T5 text encoder)
- ComfyUI workflow support
- LoRA and fine-tuning compatible
- GGUF quantized versions available for lower VRAM
## Hardware Requirements
- Base model: 24GB VRAM recommended (BF16)
- Q8_0 quantized: ~13GB VRAM
- Q4_0 quantized: ~7GB VRAM
- Requires FLUX.1 VAE and T5 text encoder
## Common Use Cases
- Open-source text-to-image generation
- Artistic and stylized image creation
- Community model fine-tuning and experimentation
- LoRA training for custom styles and characters
## Key Parameters
- **prompt**: Text description or tag-based prompt
- **steps**: Inference steps (15-30 recommended)
- **cfg_scale**: Guidance scale (1-4, model uses low CFG)
- **resolution**: Output resolution (1024x1024 default)
- **guidance**: Flux-style guidance parameter (around 4)

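You can sanity-check the quantized VRAM figures by estimating the weight footprint as parameters × bits per weight ÷ 8. This sketch assumes the GGML block layouts. Q8_0 stores 32 int8 values plus one fp16 scale, which is 8.5 bits per weight. Q4_0 stores 32 4-bit values plus one fp16 scale, which is 4.5 bits per weight. The estimate covers the transformer weights only. The VRAM figures above are higher, presumably because they also include activations, the T5 encoder, the VAE, or runtime overhead.

```python
# Assumed bits per weight for GGML-style formats:
#   BF16: 16 bits, no block overhead
#   Q8_0: 32 int8 values + one fp16 scale = 34 bytes per 32 weights
#   Q4_0: 32 4-bit values + one fp16 scale = 18 bytes per 32 weights
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5
    "q4_0": 18 * 8 / 32,  # 4.5
}


def weight_gb(num_params: float, fmt: str) -> float:
    """Approximate size of the model weights alone, in GB (1e9 bytes)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


CHROMA_PARAMS = 8.9e9
for fmt in ("bf16", "q8_0", "q4_0"):
    print(f"{fmt}: {weight_gb(CHROMA_PARAMS, fmt):.1f} GB")
```

For Chroma's 8.9B parameters, this gives about 17.8 GB for BF16, 9.5 GB for Q8_0, and 5.0 GB for Q4_0.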

@@ -1,58 +0,0 @@
# ChronoEdit
ChronoEdit is an image editing framework by NVIDIA that reframes editing as a video generation task, using temporal reasoning to ensure physically plausible and consistent edits.
## Model Variants
### ChronoEdit-14B
- Full 14 billion parameter model for maximum quality
- Built on pretrained video diffusion model architecture
- Requires ~34GB VRAM (38GB with temporal reasoning enabled)
### ChronoEdit-2B
- Compact 2 billion parameter variant for efficiency
- Maintains core temporal reasoning capabilities
- Lower VRAM requirements for broader hardware compatibility
### ChronoEdit-14B 8-Step Distilled LoRA
- Distilled variant requiring only 8 inference steps
- Faster generation with minimal quality loss
- Uses flow-shift 2.0 and guidance-scale 1.0
## Key Features
- Treats image editing as a video generation task for temporal consistency
- Temporal reasoning tokens simulate intermediate editing trajectories
- Ensures physically plausible edits (object interactions, lighting, shadows)
- Two-stage pipeline: temporal reasoning stage followed by editing frame generation
- Prompt enhancer integration for improved editing instructions
- LoRA fine-tuning support via DiffSynth-Studio
- Upscaler LoRA available for super-resolution editing
- PaintBrush LoRA for sketch-to-object editing
- Apache-2.0 license
## Hardware Requirements
- 14B model: 34GB VRAM minimum (38GB with temporal reasoning)
- 2B model: 12GB+ VRAM estimated
- Supports model offloading to reduce peak VRAM
- Linux only (not supported on Windows/macOS)
## Common Use Cases
- Physically consistent image editing (add/remove/modify objects)
- World simulation for autonomous driving and robotics
- Visualizing editing trajectories and reasoning
- Image super-resolution via upscaler LoRA
- Sketch-to-object conversion via PaintBrush LoRA
## Key Parameters
- **prompt**: Text description of the desired edit
- **num_inference_steps**: Denoising steps (default ~50, or 8 with distilled LoRA)
- **guidance_scale**: Prompt adherence strength (default ~7.5, or 1.0 with distilled LoRA)
- **flow_shift**: Flow matching shift parameter (2.0 for distilled LoRA)
- **enable_temporal_reasoning**: Toggle temporal reasoning stage for better consistency

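The base sampler and the 8-step distilled LoRA use quite different settings. The sketch below picks between them using the values listed above. `chronoedit_settings` and `min_vram_gb_14b` are hypothetical helpers. The base-sampler numbers are the approximate defaults ("~50", "~7.5") quoted above, not verified upstream defaults.

```python
def chronoedit_settings(distilled_lora: bool, temporal_reasoning: bool = True) -> dict:
    """Hypothetical helper returning sampler settings for ChronoEdit-14B."""
    if distilled_lora:
        # 8-step distilled LoRA: few steps, guidance 1.0, flow shift 2.0
        settings = {"num_inference_steps": 8, "guidance_scale": 1.0, "flow_shift": 2.0}
    else:
        # Approximate defaults for the full sampler, as listed above
        settings = {"num_inference_steps": 50, "guidance_scale": 7.5}
    settings["enable_temporal_reasoning"] = temporal_reasoning
    return settings


def min_vram_gb_14b(temporal_reasoning: bool) -> int:
    """14B model: 34 GB minimum, 38 GB with the temporal reasoning stage enabled."""
    return 38 if temporal_reasoning else 34
```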

@@ -1,60 +0,0 @@
# Depth Anything V2
Depth Anything V2 is a monocular depth estimation model trained on 595K synthetic labeled images and 62M+ real unlabeled images, providing robust relative depth maps from single images.
## Model Variants
### Depth-Anything-V2-Small
- Lightweight variant for fast inference
- ViT-S (Small) encoder backbone
- Suitable for real-time applications
### Depth-Anything-V2-Base
- Mid-range variant balancing speed and accuracy
- ViT-B (Base) encoder backbone
### Depth-Anything-V2-Large
- High-accuracy variant for detailed depth maps
- ViT-L (Large) encoder backbone; the DPT decoder uses 256 feature channels
- Recommended for most production use cases
### Depth-Anything-V2-Giant
- Maximum accuracy variant
- ViT-G (Giant) encoder backbone
- Highest computational requirements
## Key Features
- More fine-grained depth detail than Depth Anything V1
- More robust than V1 and Stable Diffusion-based alternatives (Marigold, Geowizard)
- 10× faster than SD-based depth estimation models
- Trained on large-scale synthetic + real data mixture
- Produces relative (not metric) depth maps by default
- DPT (Dense Prediction Transformer) decoder architecture
## Hardware Requirements
- Small: 2GB VRAM minimum
- Base: 4GB VRAM minimum
- Large: 6GB VRAM recommended
- Giant: 12GB+ VRAM recommended
- CPU inference supported for smaller variants
## Common Use Cases
- Depth map generation for compositing and VFX
- ControlNet depth conditioning for image generation
- 3D scene understanding and reconstruction
- Foreground/background separation
- Augmented reality occlusion
- Video depth estimation for parallax effects
## Key Parameters
- **encoder**: Model size variant (vits, vitb, vitl, vitg)
- **input_size**: Processing resolution (higher = more detail, more VRAM)
- **output_type**: Raw depth array or normalized visualization

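The model produces relative depth by default, so the values have no physical units. A "normalized visualization" output therefore usually means min-max scaling to 8-bit grey levels. This is a minimal pure-Python sketch of that step. `normalize_depth` is a hypothetical helper, and a real pipeline would do the same on a NumPy array or tensor.

```python
def normalize_depth(depth: list[list[float]]) -> list[list[int]]:
    """Min-max scale a relative depth map to 8-bit grey levels (0-255)."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no depth information; return all zeros.
        return [[0 for _ in row] for row in depth]
    return [[round((v - lo) / span * 255) for v in row] for row in depth]
```

Because the scaling is per image, grey levels are not comparable across frames. For video parallax effects, you would normally normalize with a range shared across the clip.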

@@ -1,50 +0,0 @@
# FlashVSR
FlashVSR is a diffusion-based streaming video super-resolution framework that achieves near real-time 4× upscaling through one-step inference with locality-constrained sparse attention.
## Model Variants
### FlashVSR v1
- Initial release of the one-step streaming VSR model
- Built on Wan2.1 1.3B video diffusion backbone
- 4× super-resolution optimized
### FlashVSR v1.1
- Enhanced stability and fidelity over v1
- Improved artifact handling across different aspect ratios
- Recommended for production use
## Key Features
- One-step diffusion inference (no multi-step denoising required)
- Streaming architecture with KV cache for sequential frame processing
- Locality-Constrained Sparse Attention (LCSA) prevents artifacts at high resolutions
- Tiny Conditional Decoder (TC Decoder) achieves 7× faster decoding than standard WanVAE
- Three-stage distillation pipeline from multi-step to single-step inference
- Runs at ~17 FPS for 768×1408 videos on a single A100 GPU
- Up to 12× speedup over prior one-step diffusion VSR models
- Scales reliably to ultra-high resolutions
## Hardware Requirements
- Minimum: 24GB VRAM (A100 or similar recommended)
- Optimized for NVIDIA A100 GPUs
- Significant VRAM required for high-resolution video processing
- Multi-GPU inference not required but beneficial for throughput
## Common Use Cases
- Real-world video upscaling to 4K
- AI-generated video enhancement and artifact removal
- Long video super-resolution with temporal consistency
- Streaming video quality improvement
- Restoring compressed or low-resolution video footage
## Key Parameters
- **scale**: Upscaling factor (4× recommended for best results)
- **tile_size**: Spatial tiling for memory management (0 = auto)
- **input_resolution**: Source video resolution (outputs 4× larger)
- **model_version**: v1 or v1.1 checkpoint selection

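The 4× scale factor and the ~17 FPS A100 figure allow a quick estimate of output size and processing time. This sketch uses hypothetical helper names. It also assumes the 768×1408 benchmark refers to the output resolution, so the time estimate only holds near that size on A100-class hardware.

```python
def flashvsr_output_size(width: int, height: int, scale: int = 4) -> tuple[int, int]:
    """Output frame size for a given input size; FlashVSR targets 4x."""
    return width * scale, height * scale


def estimated_seconds(num_frames: int, fps_throughput: float = 17.0) -> float:
    """Rough wall-clock estimate from the ~17 FPS A100 figure quoted above."""
    return num_frames / fps_throughput


print(flashvsr_output_size(192, 352))      # 192x352 input -> 768x1408 output
print(round(estimated_seconds(240), 1))    # 10 s of 24 fps video, about 14.1 s
```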

@@ -1,98 +0,0 @@
# Flux
Flux is a family of state-of-the-art text-to-image and image editing models developed by Black Forest Labs (BFL).
## Model Variants
### Flux.1 Schnell
- Ultra-fast inference (1-4 steps)
- 12B parameter rectified flow transformer
- Apache 2.0 license (open source)
- Best for rapid prototyping and real-time applications
### Flux.1 Dev
- High-quality 12B parameter development model
- 20-50 steps for best results
- Non-commercial license for research
- Guidance-distilled for efficient generation
### Flux.1 Pro
- Highest quality Flux.1 outputs via commercial API
- Best prompt adherence and detail
### Flux.2 Dev
- 32B parameter rectified flow transformer
- Unified text-to-image, single-reference editing, and multi-reference editing
- No fine-tuning needed for character/object/style reference
- Up to 4MP photorealistic output with improved autoencoder
- Non-commercial license; quantized versions available for consumer GPUs
### Flux.2 Klein
- Fastest Flux model family — sub-second inference on modern hardware
- **Klein 4B**: ~8GB VRAM, Apache 2.0 license, ideal for edge deployment
- **Klein 9B**: Best quality-to-latency ratio, non-commercial license
- Base (undistilled) variants available for fine-tuning and LoRA training
- Supports text-to-image, single-reference editing, and multi-reference editing
### Flux.1 Kontext
- In-context image generation and editing via text instructions
- Available as Kontext Max (premium), Pro (API), and Dev (open-weights, 12B)
- Character consistency across multiple scenes without fine-tuning
- Typography manipulation and local editing within images
### Flux.1 Fill
- Dedicated inpainting and outpainting model
- Maintains consistency with surrounding image context
- Available as Fill Pro (API) and Fill Dev (open-weights)
### Flux Redux / Canny / Depth
- **Redux**: Image variation generation from reference images
- **Canny**: Edge-detection-based structural conditioning
- **Depth**: Depth-map-based structural conditioning for pose/layout control
## Key Features
- Excellent text rendering in images
- Strong prompt following and instruction adherence
- High resolution output (up to 4MP with Flux.2)
- Multi-reference editing: combine up to 6 reference images
- Consistent style and quality across generations
## Hardware Requirements
- Flux.2 Klein 4B: ~8GB VRAM (consumer GPUs like RTX 4070)
- Flux.2 Klein 9B: ~20GB VRAM
- Flux.1 models: 12GB VRAM minimum (fp16), 24GB recommended
- Flux.2 Dev: 64GB+ VRAM native, FP8 quantized ~40GB
- Quantized and weight-streaming options available for lower VRAM cards
## Common Use Cases
- Text-to-image generation
- Iterative image editing via text instructions
- Character-consistent multi-scene generation
- Inpainting and outpainting
- Style transfer and image variation
- Structural conditioning (canny, depth)
## Key Parameters
- **steps**: 1-4 (Schnell/Klein distilled), 20-50 (Dev/Base)
- **guidance_scale**: 3.5-4.0 typical for Flux.2, 3.5 for Flux.1
- **resolution**: Up to 2048x2048 (Flux.1), up to 4MP (Flux.2)
- **seed**: For reproducible generation
- **prompt_upsampling**: Optional LLM-based prompt enhancement (Flux.2)
## Blog References
- [FLUX.2 Day-0 Support in ComfyUI](../blog/flux2-day-0-support.md) — FLUX.2 with 4MP output, multi-reference consistency, professional text rendering
- [FLUX.2 [klein] 4B & 9B](../blog/flux2-klein-4b.md) — Fastest Flux models, sub-second inference, unified generation and editing
- [The Complete AI Upscaling Handbook](../blog/upscaling-handbook.md) — Benchmarks for upscaling workflows

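The step and guidance ranges in Key Parameters differ by variant. The sketch below encodes them as a lookup table and clamps a requested step count into the listed range. The variant keys and `flux_sampler_settings` are hypothetical names. Choosing 4.0 from the "3.5-4.0" range for Flux.2 Dev is an assumption. Guidance is left out where the list above gives no value.

```python
# variant: (min_steps, max_steps, typical guidance_scale or None if not listed above)
FLUX_PRESETS = {
    "flux1-schnell": (1, 4, None),
    "flux1-dev": (20, 50, 3.5),
    "flux2-dev": (20, 50, 4.0),
    "flux2-klein-distilled": (1, 4, None),
    "flux2-klein-base": (20, 50, None),
}


def flux_sampler_settings(variant: str, steps: int) -> dict:
    """Clamp the requested steps into the variant's range and attach guidance."""
    lo, hi, guidance = FLUX_PRESETS[variant]
    settings = {"steps": max(lo, min(hi, steps))}
    if guidance is not None:
        settings["guidance_scale"] = guidance
    return settings
```

Clamping rather than rejecting keeps one shared workflow usable across variants. For example, a 28-step request silently becomes 4 steps on Schnell.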

@@ -1 +0,0 @@
Flux is Black Forest Labs' family of text-to-image and image editing models. The lineup includes Flux.1 Schnell (ultra-fast, 1-4 steps, Apache 2.0), Flux.1 Dev (high-quality, 20-50 steps, non-commercial), Flux.1 Pro (commercial API), and the newer Flux.2 Dev (32B parameters, up to 4MP output, multi-reference editing without fine-tuning). Flux.2 Klein offers sub-second inference in 4B (~8GB VRAM, Apache 2.0) and 9B variants. Specialized models include Kontext (in-context editing, character consistency), Fill (inpainting/outpainting), Redux (image variations), and Canny/Depth (structural conditioning). Flux excels at text rendering in images, strong prompt adherence, and consistent multi-scene generation. VRAM ranges from ~8GB (Klein 4B) to 64GB+ (Flux.2 Dev native), with quantized options available. Key parameters: guidance_scale 3.5-4.0, resolution up to 4MP for Flux.2. Primary uses include text-to-image, iterative editing, style transfer, and structural conditioning.


@@ -1,75 +0,0 @@
# Gemini
Gemini is Google DeepMind's multimodal AI model family with native image generation, editing, and video generation capabilities, accessible in ComfyUI through API nodes.
## Model Variants
### Gemini 3 Pro Image Preview
- Most capable Gemini image model with advanced reasoning
- Complex multi-turn image generation and editing
- Up to 14 input images, native 4K output
- Also known as Nano Banana Pro
- Model ID: `gemini-3-pro-image-preview`
### Gemini 2.5 Flash Image
- Cost-effective image generation optimized for speed and low latency
- Character consistency, multi-image fusion, and prompt-based editing
- $0.039 per image (1290 output tokens per image)
- Model ID: `gemini-2.5-flash-image`
### Google Gemini (General)
- Multimodal model for text, image understanding, and generation
- Interleaved text-and-image output in conversational context
- Supports image input for analysis and editing tasks
### Veo 2
- Text-to-video and image-to-video generation
- 8-second video clips at 720p resolution
- Realistic physics simulation and cinematic styles
- Supports 16:9 and 9:16 aspect ratios
- Model ID: `veo-2.0-generate-001`
### Veo 3 / 3.1
- Latest video generation with native audio (dialogue, SFX, ambient)
- Up to 1080p and 4K resolution (Veo 3.1)
- Style reference images for aesthetic control
- 4, 6, or 8-second video duration options
## Key Features
- Native multimodal generation: text, images, and video in one model family
- World knowledge from Google Search for factually accurate image generation
- SynthID invisible watermarking on all generated content
- Multi-image fusion and character consistency across generations
- Clean text rendering across multiple languages
- Prompt-based image editing without masks or complex workflows
## Hardware Requirements
- No local GPU required — all models accessed via cloud API
- Available through ComfyUI API nodes, Google AI Studio, and Vertex AI
- Requires API key and network access
## Common Use Cases
- Text-to-image and image editing via API nodes
- Multi-turn conversational image generation
- Video generation from text prompts or reference images
- Product animation and social media video content
- Style-consistent character and brand asset generation
- Text rendering and translation in images
## Key Parameters
- **prompt**: Text description for generation or editing
- **aspect_ratio**: 1:1, 3:4, 4:3, 9:16, 16:9, 21:9 (images); 16:9, 9:16 (video)
- **temperature**: 0.0-2.0 (default 1.0 for image models)
- **durationSeconds**: 4-8 seconds for Veo models
- **sampleCount**: 1-4 output videos per request
- **seed**: Integer for reproducible generation
- **personGeneration**: Safety control — `allow_adult`, `dont_allow`, or `allow_all`

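Two quick checks follow from the figures above. First, $0.039 per image at 1290 output tokens implies an output price of roughly $30 per million tokens. Second, a Veo request can be validated against the listed ranges before it is sent. This is a sketch with hypothetical helper names. The camelCase `aspectRatio` key is an assumption chosen to match `durationSeconds`. The checker also does not enforce the 4/6/8-second restriction for Veo 3/3.1.

```python
# Figures quoted above for Gemini 2.5 Flash Image.
PRICE_PER_IMAGE_USD = 0.039
TOKENS_PER_IMAGE = 1290

VEO_ASPECT_RATIOS = {"16:9", "9:16"}
PERSON_GENERATION = {"allow_adult", "dont_allow", "allow_all"}


def implied_price_per_million_tokens() -> float:
    return PRICE_PER_IMAGE_USD / TOKENS_PER_IMAGE * 1_000_000


def image_batch_cost(num_images: int) -> float:
    return num_images * PRICE_PER_IMAGE_USD


def veo_request_problems(params: dict) -> list[str]:
    """Check a Veo request dict against the ranges listed above."""
    problems = []
    if not 4 <= params.get("durationSeconds", 8) <= 8:
        problems.append("durationSeconds must be between 4 and 8")
    if not 1 <= params.get("sampleCount", 1) <= 4:
        problems.append("sampleCount must be between 1 and 4")
    if params.get("aspectRatio", "16:9") not in VEO_ASPECT_RATIOS:
        problems.append("aspectRatio must be 16:9 or 9:16")
    if "personGeneration" in params and params["personGeneration"] not in PERSON_GENERATION:
        problems.append("personGeneration must be allow_adult, dont_allow, or allow_all")
    return problems
```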

@@ -1,62 +0,0 @@
# GPT-Image-1
GPT-Image-1 is OpenAI's natively multimodal image generation model, capable of generating and editing images from text and image inputs. It is accessed in ComfyUI through API nodes.
## Model Variants
### GPT-Image-1.5
- Latest and most advanced GPT Image model
- Best overall quality with superior instruction following
- High input fidelity for the first 5 input images
- Supports generate vs. edit action control
- Multi-turn editing via the Responses API
### GPT-Image-1
- Production-grade image generation and editing
- High input fidelity for the first input image
- Supports up to 16 input images for editing
- Up to 10 images per generation request
### GPT-Image-1-Mini
- Cost-effective variant for lower quality requirements
- Same API surface as GPT-Image-1
- Suitable for rapid prototyping and high-volume workloads
## Key Features
- Superior text rendering in generated images
- Real-world knowledge for accurate depictions
- Transparent background support (PNG and WebP)
- Mask-based inpainting with prompt guidance
- Multi-image editing: combine up to 16 reference images
- Streaming partial image output during generation
- Content moderation with adjustable strictness
## Hardware Requirements
- No local GPU required — cloud API service via OpenAI
- Accessed through ComfyUI API nodes
- Requires OpenAI API key and organization verification
## Common Use Cases
- Text-to-image generation with detailed prompts
- Image editing and compositing from multiple references
- Product photography and mockup generation
- Inpainting with mask-guided editing
- Transparent asset generation (stickers, logos, icons)
- Multi-turn iterative image refinement
## Key Parameters
- **prompt**: Text description up to 32,000 characters
- **size**: `1024x1024`, `1536x1024` (landscape), `1024x1536` (portrait), or `auto`
- **quality**: `low`, `medium`, `high`, or `auto` (affects cost and detail)
- **n**: Number of images to generate (1-10)
- **background**: `transparent`, `opaque`, or `auto`
- **output_format**: `png`, `jpeg`, or `webp`
- **moderation**: `auto` (default) or `low` (less restrictive)
- **input_fidelity**: `low` (default) or `high` for preserving input image details

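The enumerated values above translate directly into a pre-flight check on a request payload. This sketch only builds and validates a dictionary; it does not call the OpenAI API. `request_problems` and `build_payload` are hypothetical helpers. Rejecting JPEG output for transparent backgrounds follows from transparency being listed for PNG and WebP only.

```python
ALLOWED_VALUES = {
    "size": {"1024x1024", "1536x1024", "1024x1536", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "background": {"transparent", "opaque", "auto"},
    "output_format": {"png", "jpeg", "webp"},
    "moderation": {"auto", "low"},
    "input_fidelity": {"low", "high"},
}
MAX_PROMPT_CHARS = 32_000


def request_problems(payload: dict) -> list[str]:
    """List every way a payload violates the documented parameter values."""
    issues = []
    if len(payload.get("prompt", "")) > MAX_PROMPT_CHARS:
        issues.append("prompt exceeds 32,000 characters")
    if not 1 <= payload.get("n", 1) <= 10:
        issues.append("n must be between 1 and 10")
    for key, allowed in ALLOWED_VALUES.items():
        if key in payload and payload[key] not in allowed:
            issues.append(f"{key}={payload[key]!r} is not one of {sorted(allowed)}")
    # Transparency needs an alpha channel, which JPEG cannot carry.
    if payload.get("background") == "transparent" and payload.get("output_format") == "jpeg":
        issues.append("transparent background requires png or webp output")
    return issues


def build_payload(prompt: str, model: str = "gpt-image-1", **options) -> dict:
    """Assemble a payload and raise if any documented constraint is violated."""
    payload = {"model": model, "prompt": prompt, **options}
    issues = request_problems(payload)
    if issues:
        raise ValueError("; ".join(issues))
    return payload
```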

@@ -1,56 +0,0 @@
# Grok (Aurora)
Aurora is xAI's autoregressive image generation model integrated into Grok, excelling at photorealistic rendering and precise text instruction following.
## Model Variants
### grok-2-image-1212
- API-accessible image generation model
- Generates multiple images from text prompts
- $0.07 per generated image
- OpenAI and Anthropic SDK compatible
### Aurora (Consumer)
- Autoregressive mixture-of-experts network
- Trained on billions of text and image examples
- Available via Grok on X platform, web, iOS, and Android
### Grok Imagine
- Video and image generation model
- Described by xAI as state-of-the-art in quality for its cost and latency
- API available since January 2026
## Key Features
- Photorealistic image generation from text prompts
- Precise text rendering within images
- Accurate rendering of real-world entities, logos, and text
- Image editing via uploaded photos with text instructions
- Multi-image generation per request
- Native multimodal input support
## Hardware Requirements
- Cloud API-based (no local GPU required)
- All generation runs on xAI infrastructure
- API access via console.x.ai
## Common Use Cases
- Photorealistic image generation
- Text and logo rendering in images
- Image editing and style transfer
- Meme and social media content creation
- Product visualization
- Character and portrait generation
## Key Parameters
- **prompt**: Text description of desired image
- **model**: Model identifier (grok-2-image-1212)
- **n**: Number of images to generate
- **response_format**: Output format (url or b64_json)
- **size**: Image dimensions

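The API is described as OpenAI-SDK compatible. This sketch therefore assumes the OpenAI-style images response layout, `{"data": [{"b64_json": ...}]}`, when `response_format` is `b64_json`. It builds a request body, decodes returned images with the standard library, and prices a request at the $0.07 per image quoted above. The helper names are hypothetical, and no network call is made.

```python
import base64

PRICE_PER_IMAGE_USD = 0.07  # grok-2-image-1212, as quoted above


def build_request(prompt: str, n: int = 1) -> dict:
    """Request body using the parameter names listed above."""
    return {"model": "grok-2-image-1212", "prompt": prompt, "n": n, "response_format": "b64_json"}


def decode_images(response: dict) -> list[bytes]:
    """Extract raw image bytes from an OpenAI-style images response."""
    return [base64.b64decode(item["b64_json"]) for item in response.get("data", [])]


def request_cost(n: int) -> float:
    return n * PRICE_PER_IMAGE_USD
```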

@@ -1,55 +0,0 @@
# HiDream-I1
HiDream-I1 is a 17B parameter image generation foundation model by HiDream.ai that achieves state-of-the-art quality using a sparse diffusion transformer architecture.
## Model Variants
### HiDream-I1 Full
- Full 17B parameter sparse diffusion transformer
- Uses Llama-3.1-8B-Instruct and T5-XXL as text encoders
- Uses the VAE from FLUX.1 Schnell; released under the MIT license
### HiDream-I1 Dev
- Distilled variant, faster inference with minor quality tradeoff
### HiDream-I1 Fast
- Further distilled for maximum speed, best for rapid prototyping
### HiDream-E1
- Instruction-based image editing model
## Key Features
- State-of-the-art HPS v2.1 score (33.82), surpassing Flux.1-dev, DALL-E 3, and Midjourney V6
- Best-in-class prompt following on GenEval (0.83) and DPG-Bench (85.89)
- Multiple output styles: photorealistic, cartoon, artistic, and more
- Dual text encoding with Llama-3.1-8B-Instruct and T5-XXL for strong prompt adherence
- MIT license for commercial use
- Requires Flash Attention for optimal performance
## Hardware Requirements
- Minimum: 24GB VRAM (Full model), Dev and Fast variants run on lower VRAM
- Recommended: 40GB+ VRAM for Full model at high resolution
- CUDA 12.4+ recommended for Flash Attention
- Llama-3.1-8B-Instruct weights downloaded automatically
## Common Use Cases
- High-fidelity text-to-image generation
- Photorealistic image creation
- Artistic and stylized illustrations
- Instruction-based image editing (E1 variant)
- Commercial image generation
## Key Parameters
- **model_type**: Variant selection (full, dev, fast)
- **steps**: Inference steps (varies by variant; fewer for fast/dev)
- **cfg_scale**: Guidance scale for prompt adherence
- **resolution**: Output image dimensions
- **prompt**: Detailed text description of desired image

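The `model_type` and `steps` parameters are coupled: the distilled variants need fewer steps. This sketch maps each variant to a step count. The specific numbers (50, 28, 16) are assumed reference values and should be checked against the upstream HiDream-I1 README. `hidream_settings` is a hypothetical helper.

```python
# Assumed per-variant step counts; confirm against the upstream HiDream-I1 README.
HIDREAM_STEPS = {"full": 50, "dev": 28, "fast": 16}


def hidream_settings(model_type: str, width: int = 1024, height: int = 1024) -> dict:
    """Hypothetical helper assembling settings for one HiDream-I1 variant."""
    if model_type not in HIDREAM_STEPS:
        raise ValueError(f"unknown model_type {model_type!r}; expected one of {sorted(HIDREAM_STEPS)}")
    return {"model_type": model_type, "steps": HIDREAM_STEPS[model_type], "resolution": (width, height)}
```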

@@ -1,51 +0,0 @@
# HitPaw
HitPaw is an AI-powered visual enhancement platform providing image and video upscaling, restoration, and denoising through dedicated API services and desktop applications.
## Model Variants
### HitPaw Image Enhancer
- AI-powered photo enhancement with super-resolution up to 8x
- Face Clear Model: dual-model portrait upscaling (2x and 4x)
- Face Natural Model: texture-preserving portrait enhancement
- General Enhance Model: super-resolution for scenes and objects
- High Fidelity Model: premium upscaling for DSLR and AIGC images
- Generative Portrait/Enhance Models: diffusion-based restoration for heavily compressed images
### HitPaw Video Enhancer (VikPea)
- Frame-aware video restoration and ultra HD upscaling
- Face Soft Model: face-optimized noise and blur reduction
- Portrait Restore Model: multi-frame fusion for facial detail
- General Restore Model: GAN-based restoration for broad scenarios
- Ultra HD Model: premium upscaling from HD to ultra HD
- Generative Model: diffusion-driven repair for low-resolution video
## Key Features
- One-click portrait and scene enhancement
- Dual-model face and background processing pipelines
- Batch processing and API access for automated workflows
- Support for 30+ video input formats and 5 export formats
- Multi-frame face restoration for temporal consistency in video
- Denoising models for mobile and camera images
## Hardware Requirements
- Cloud API available (no local GPU required)
- Desktop apps for Windows, Mac, Android, and iOS
- API integration via HTTP-based interface
## Common Use Cases
- Upscaling AI-generated images to publication quality
- Restoring old or low-resolution photos and videos
- Enhancing portrait and landscape photography
- Video quality improvement for content creators
## Key Parameters
- **model**: Select enhancement model per content type
- **scale**: 2x or 4x super-resolution options
- **format**: Output format selection (mp4, mov, mkv, m4v, avi for video)


@@ -1,47 +0,0 @@
# HuMo
HuMo is a human-centric video generation model by ByteDance that produces videos from collaborative multi-modal conditioning using text, image, and audio inputs.
## Model Variants
### HuMo (Wan2.1-T2V-1.3B based)
- Built on the Wan2.1-T2V-1.3B video foundation model
- Supports Text+Image (TI), Text+Audio (TA), and Text+Image+Audio (TIA) modes
- Two-stage training: subject preservation then audio-visual sync
## Key Features
- Multi-modal conditioning: text, reference images, and audio simultaneously
- Subject identity preservation from reference images across frames
- Audio-driven lip synchronization with facial expression alignment
- Focus-by-predicting strategy for facial region attention during audio sync
- Time-adaptive guidance dynamically adjusts input weights across denoising steps
- Minimal-invasive image injection maintains base model prompt understanding
- Progressive two-stage training separates identity learning from audio sync
- Supports text-controlled appearance editing while preserving identity
## Hardware Requirements
- Minimum: 24GB VRAM (RTX 3090/4090 or similar)
- Multi-GPU inference supported via FSDP and sequence parallelism
- Whisper-large-v3 audio encoder required for audio modes
- Optional audio separator for cleaner speech input
## Common Use Cases
- Digital avatar and virtual presenter creation
- Audio-driven talking head generation
- Character-consistent video clips from reference photos
- Lip-synced dialogue video from audio tracks
- Prompted reenactment with identity preservation
- Text-controlled outfit and style changes on consistent subjects
## Key Parameters
- **mode**: Generation mode (TI, TA, or TIA)
- **scale_t**: Text guidance strength (default: 7.5)
- **scale_a**: Audio guidance strength (default: 2.0)
- **frames**: Number of output frames (97 at 25 FPS = ~4 seconds)
- **height/width**: Output resolution (480p or 720p supported)
- **steps**: Denoising steps (30-50 recommended)

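The `frames` example works out to 97 / 25 = 3.88 seconds. Note that 97 has the form 4k + 1. This sketch assumes HuMo inherits that frame-count constraint from the Wan2.1 video VAE, which compresses time by 4×. It rounds a target duration to the nearest valid count. Both helpers are hypothetical.

```python
def frames_for_duration(seconds: float, fps: int = 25) -> int:
    """Round a target duration to the nearest 4k+1 frame count (assumed Wan constraint)."""
    raw = seconds * fps
    k = max(0, round((raw - 1) / 4))
    return 4 * k + 1


def duration_seconds(frames: int, fps: int = 25) -> float:
    return frames / fps
```

Under this rule, asking for exactly 4 seconds at 25 FPS gives 101 frames rather than 97.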

@@ -1,75 +0,0 @@
# Hunyuan
Hunyuan is Tencent's family of open-source generative models spanning text-to-image, text-to-video, and 3D asset generation.
## Model Variants
### Hunyuan-DiT
- Text-to-image diffusion transformer with native Chinese and English support
- 1.5B parameter DiT architecture, native 1024x1024 resolution
- Bilingual text encoder for strong CJK text rendering in images
- v1.2 is the latest version with improved quality
### HunyuanVideo
- Large-scale text-to-video and image-to-video generation model
- 13B+ parameters; the largest open-source video generation model at its release
- Dual-stream to single-stream transformer architecture with full attention
- MLLM text encoder (decoder-only LLM) for better instruction following
- Causal 3D VAE with 4x temporal, 8x spatial, 16x channel compression
- Generates 720p video (1280x720) at up to 129 frames (~5s at 24fps)
- FP8 quantized weights available to reduce memory by ~10GB
- Outperforms Runway Gen-3, Luma 1.6 in professional evaluations
- 3 ComfyUI workflow templates available
### Hunyuan3D 2.0
- Image-to-3D and text-to-3D asset generation system
- Two-stage pipeline: Hunyuan3D-DiT (shape) + Hunyuan3D-Paint (texture)
- Flow-based diffusion transformer for geometry generation
- High-resolution texture synthesis with geometric and diffusion priors
- Outputs textured meshes in GLB/OBJ format
- Outperforms both open and closed-source 3D generation models
- 7 ComfyUI workflow templates available
## Key Features
- Native bilingual support (Chinese and English) across the family
- Strong text rendering in generated images (Hunyuan-DiT)
- State-of-the-art video generation quality (HunyuanVideo)
- End-to-end 3D asset creation with texturing (Hunyuan3D)
- Multi-resolution generation across all model types
- Prompt rewrite system for improved generation quality (HunyuanVideo)
## Hardware Requirements
- Hunyuan-DiT: 11GB VRAM minimum (fp16), 16GB recommended
- HunyuanVideo 540p (544x960): 45GB VRAM minimum
- HunyuanVideo 720p (720x1280): 60GB VRAM minimum, 80GB recommended
- HunyuanVideo FP8: Saves ~10GB compared to fp16 weights
- Hunyuan3D 2.0: 16-24GB VRAM for shape + texture pipeline
## Common Use Cases
- Bilingual content creation and marketing materials
- Asian-style artwork and illustrations
- Text-in-image generation (Chinese/English)
- High-quality video generation from text or image prompts
- 3D asset creation for games, design, and prototyping
- Textured mesh generation from reference images
## Key Parameters
- **steps**: 25-50 for Hunyuan-DiT (default 40), 50 for HunyuanVideo
- **cfg_scale**: 5-8 for DiT (6 typical), 6.0 embedded for HunyuanVideo
- **flow_shift**: 7.0 for HunyuanVideo flow matching scheduler
- **video_length**: 129 frames for HunyuanVideo (~5s at 24fps)
- **resolution**: 1024x1024 for DiT, 720x1280 or 544x960 for video
- **negative_prompt**: Recommended for Hunyuan-DiT quality control
## Blog References
- [HunyuanVideo Native Support](../blog/hunyuanvideo-native-support.md) — 13B parameter video model, dual-stream transformer, MLLM text encoder
- [HunyuanVideo 1.5 Native Support](../blog/hunyuanvideo-15-native-support.md) — Lightweight 8.3B model, 720p output, runs on 24GB consumer GPUs
- [Hunyuan3D 2.0 and MultiView Native Support](../blog/hunyuan3d-20-native-support.md) — 3D model generation with PBR materials, 1.1B parameter multi-view model

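The causal 3D VAE figures determine the latent size that the HunyuanVideo transformer works on. This sketch assumes that the "causal" 4× temporal compression keeps the first frame separately, giving (T - 1) / 4 + 1 latent frames. It also reads the "16" as the latent channel count, which is our interpretation. The function name is hypothetical.

```python
def hunyuan_latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """(channels, latent_frames, latent_height, latent_width) for the causal 3D VAE."""
    if (frames - 1) % 4 != 0:
        raise ValueError("causal 4x temporal compression expects 4k+1 frames")
    if height % 8 or width % 8:
        raise ValueError("height and width must be divisible by 8")
    return 16, (frames - 1) // 4 + 1, height // 8, width // 8
```

For the default 129-frame 720p clip, this gives a 16 × 33 × 90 × 160 latent, and 129 frames play for 129 / 24 = 5.375 seconds.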

@@ -1 +0,0 @@
Hunyuan is Tencent's open-source generative model family spanning text-to-image, text-to-video, and 3D generation. Hunyuan-DiT is a 1.5B parameter text-to-image model with native Chinese and English support and strong CJK text rendering at 1024x1024 (11-16GB VRAM). HunyuanVideo, at 13B+ parameters the largest open-source video model at its release, generates 720p video up to 129 frames (~5s at 24fps) using a dual-stream transformer with an MLLM text encoder; it requires 45-80GB VRAM depending on resolution (FP8 saves ~10GB). Hunyuan3D 2.0 handles image-to-3D and text-to-3D generation via a two-stage pipeline producing textured GLB/OBJ meshes (16-24GB VRAM). Key strengths: bilingual content creation, state-of-the-art video quality surpassing Runway Gen-3, and end-to-end 3D asset creation. Typical parameters: 25-50 steps for DiT, 50 steps for video, cfg_scale 5-8.

@@ -1,52 +0,0 @@
# Ideogram
Ideogram is an AI image generation platform founded by former Google Brain researchers, known for industry-leading text rendering accuracy in generated images. It reportedly achieves about 90% text rendering accuracy, compared with roughly 30% for competing tools.
## Model Variants
### Ideogram 3.0
- Latest generation released March 2025
- Highest ELO rating in human evaluations across diverse prompts
- Style References support with up to 3 reference images
- Random style feature with 4.3 billion style presets
- Batch generation for scaled content production
### Ideogram 2.0
- Previous generation model
- Available as alternative option in the platform
- Solid text rendering and general image quality
## Key Features
- Best-in-class text rendering with accurate typography and spelling
- Handles complex, multi-line text compositions and curved surfaces
- Style modes: Realistic, Anime, 3D, Watercolor, Typography
- Magic Prompt for automatic prompt enhancement
- Canvas editing for post-generation refinement
- Upscaler up to 8K resolution in 2x increments
- Color palette control for brand consistency
- API available for programmatic integration
## Hardware Requirements
- Cloud API only (no local GPU required)
- API pricing at approximately $0.06 per image
- Web interface with credit-based subscription plans
## Common Use Cases
- Marketing materials with branded text and logos
- Social media graphics with text overlays
- Product packaging and label design
- Event posters, flyers, and invitations
- Book covers and editorial design
## Key Parameters
- **prompt**: Text description with quoted text for typography
- **model**: Version selection (2.0 or 3.0)
- **style**: Realistic, Anime, 3D, Watercolor, Typography
- **aspect_ratio**: 16 aspect ratio options available
- **magic_prompt**: Toggle for automatic prompt enhancement
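Because the prompt carries the text to render in quotes, it can help to assemble prompts programmatically. The sketch below builds a request dictionary from the parameter names listed above. `build_request` and the dictionary layout are hypothetical and do not reproduce Ideogram's actual API schema. Leaving `magic_prompt` off by default, so the prompt is sent as written, is a choice made for this sketch.

```python
STYLES = {"Realistic", "Anime", "3D", "Watercolor", "Typography"}
MODELS = {"2.0", "3.0"}


def build_request(scene: str, text: str, *, model: str = "3.0",
                  style: str = "Typography", aspect_ratio: str = "1:1",
                  magic_prompt: bool = False) -> dict:
    """Assemble a hypothetical request dict using the parameter names listed above."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    if style not in STYLES:
        raise ValueError(f"style must be one of {sorted(STYLES)}")
    # Wrap the exact words to render in double quotes. Any inner double quotes
    # are replaced with single quotes so the quoted span stays unambiguous.
    literal = text.replace('"', "'")
    return {
        "prompt": f'{scene} with the text "{literal}"',
        "model": model,
        "style": style,
        "aspect_ratio": aspect_ratio,
        "magic_prompt": magic_prompt,
    }


if __name__ == "__main__":
    print(build_request("A vintage bakery poster", "Fresh Bread Daily"))
```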

@@ -1,51 +0,0 @@
# Kandinsky
Kandinsky is a family of open-source diffusion models for video and image generation, developed by Kandinsky Lab (Sber AI, Russia). The models support both English and Russian text prompts.
## Model Variants
### Kandinsky 5.0 Video Pro (19B)
- HD video at 1280x768, 24fps (5 or 10 seconds)
- Controllable camera motion via LoRA
- Top-1 open-source T2V model on LMArena
### Kandinsky 5.0 Video Lite (2B)
- Lightweight model, ranked #1 among open-source models in its class
- CFG-distilled (2x faster) and diffusion-distilled (6x faster) variants
- Strongest understanding of Russian concepts among open-source models
### Kandinsky 5.0 Image Lite (6B)
- HD image output (1280x768, 1024x1024)
- Strong text rendering; image editing variant available
## Key Features
- Bilingual support (English and Russian prompts)
- Flow Matching architecture with MIT license
- Camera control via trained LoRAs
- ComfyUI and Diffusers integration
- MagCache acceleration for faster inference
## Hardware Requirements
- Video Lite: 12GB VRAM minimum with optimizations
- Video Pro: 24GB+ VRAM recommended
- NF4 quantization and FlashAttention 2/3 or SDPA supported
## Common Use Cases
- Open-source video generation research
- Russian and English bilingual content creation
- Camera-controlled video synthesis
- Image generation with text rendering
- Fine-tuning with custom LoRAs
## Key Parameters
- **prompt**: Text description in English or Russian
- **num_frames**: Number of output frames, chosen to yield a 5s or 10s clip
- **resolution**: Output resolution (up to 1280x768)
- **steps**: Inference steps (varies by distillation level)

@@ -1,64 +0,0 @@
# Kling
Kling is a video and image generation platform developed by Kuaishou Technology. It offers text-to-video, image-to-video, video editing, audio generation, and virtual try-on capabilities through both a creative studio and a developer API.
## Model Variants
### Kling O1
- First unified multimodal video model combining generation and editing
- Built on Multimodal Visual Language (MVL) framework
- Accepts text, image, video, and subject inputs in a single prompt
- Supports video inpainting, outpainting, style re-rendering, and shot extension
- Character and scene consistency via Element Library with director-like memory
- Generates 3-10 second videos at up to 2K resolution
### Kling 2.6
- Simultaneous audio-visual generation in a single pass
- Produces video with speech, sound effects, and ambient sounds together
- Supports Chinese and English voice generation
- Video content up to 10 seconds with synchronized audio
- Deep semantic alignment between audio and visual dynamics
### Kling (Base Models)
- Text-to-video and image-to-video with Standard and Professional modes
- Multi-image-to-video with multiple reference inputs
- Camera control with 6 basic movements and 4 master shots
- Video extension, lip-sync, and avatar generation
- Start and end frame generation for controlled transitions
## Key Features
- Unified generation and editing in a single model (O1)
- Simultaneous audio-visual generation (2.6)
- Multi-subject consistency across shots and angles
- Conversational editing via natural language prompts
- Video effects center for special effects and transformations
- Virtual try-on and image recognition capabilities
- DeepSeek integration for prompt optimization
## Hardware Requirements
- Cloud API only; no local hardware required
- Accessed via klingai.com creative studio or API platform
- Standard and Professional generation modes (speed vs. quality tradeoff)
## Common Use Cases
- Film and television pre-production and shot generation
- Social media content creation with audio
- E-commerce product videos and virtual try-on
- Advertising with one-click ad generation
- Video post-production editing via text prompts
- Multi-character narrative video creation
## Key Parameters
- **prompt**: Text description with positive and negative prompts
- **mode**: Standard (fast) or Professional (high quality)
- **duration**: Video length (3-10 seconds for O1, up to 10s for 2.6)
- **aspect_ratio**: Width-to-height ratio for output
- **camera_control**: Predefined camera movements and master shots
- **creativity_strength**: Balance between reference fidelity and creative variation
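Duration limits differ by model, so a client-side check can catch invalid requests before they are submitted. The sketch below validates a job against the limits listed above. `KlingJob` and its field names are hypothetical, not Kling's API. The list gives no lower bound for 2.6, so the 1-second minimum used here is a placeholder assumption.

```python
from dataclasses import dataclass

# Duration limits in seconds, taken from the parameter list above.
# The lower bound for 2.6 is not documented; 1 s is a placeholder assumption.
DURATION_LIMITS = {"o1": (3, 10), "2.6": (1, 10)}
MODES = {"standard", "professional"}


@dataclass
class KlingJob:
    """Hypothetical client-side job description; field names are illustrative."""
    model: str
    prompt: str
    duration: int
    mode: str = "standard"
    negative_prompt: str = ""

    def validate(self) -> None:
        """Raise ValueError if a setting falls outside the documented options."""
        if self.model not in DURATION_LIMITS:
            raise ValueError(f"unknown model {self.model!r}")
        low, high = DURATION_LIMITS[self.model]
        if not low <= self.duration <= high:
            raise ValueError(
                f"duration {self.duration}s outside {low}-{high}s for {self.model}"
            )
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}")
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")


if __name__ == "__main__":
    job = KlingJob(model="o1", prompt="A paper boat drifting down a rainy street", duration=5)
    job.validate()
    print("ok:", job)
```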

@@ -1,68 +0,0 @@
# LTX-Video
LTX-Video is Lightricks' open-source DiT-based video generation model, the first capable of generating high-quality videos in real-time.
## Model Variants
### LTX-Video 2 (v0.9.7/v0.9.8)
- Major quality upgrade over the original release
- Available in 2B and 13B parameter sizes
- 13B dev: highest quality, requires more VRAM
- 13B distilled: faster inference, fewer steps needed, slight quality trade-off
- 2B distilled: lightweight option for lower VRAM usage
- FP8 quantized versions available for all sizes (13B-dev, 13B-distilled, 2B-distilled)
- Multi-condition generation: condition on multiple images or video segments at specific frames
- Spatial and temporal upscaler models for enhanced resolution and frame rate
- ICLoRA adapters for depth, pose, and canny edge conditioning
- 9 workflow templates available
### LTX-Video 0.9.1/0.9.6
- Original public releases with 2B parameter DiT architecture
- Text-to-video and image-to-video modes
- 768x512 native resolution at 24fps
- 0.9.6 distilled variant: 15x faster, real-time capable, no CFG required
- Foundation for community fine-tunes
## Key Features
- Real-time video generation on high-end GPUs (first DiT model to achieve this)
- Generates 30 FPS video at 1216x704 resolution faster than playback speed
- Multi-condition generation with per-frame image/video conditioning and strength control
- Temporal VAE for smooth, consistent motion
- Multi-scale rendering pipeline mixing dev and distilled models for speed-quality balance
- Latent upsampling pipeline for progressive resolution enhancement
## Hardware Requirements
- 2B model: 12GB VRAM minimum, 16GB recommended
- 2B distilled FP8: 8-10GB VRAM
- 13B model: 24-32GB VRAM (fp16)
- 13B FP8: 16-20GB VRAM
- 13B distilled: less VRAM than 13B dev, ideal for rapid iterations
- 32GB+ system RAM recommended for all variants
## Common Use Cases
- Short-form video content and social media clips
- Image-to-video animation from reference frames
- Video-to-video transformation and extension
- Multi-condition video generation (start/end frame, keyframes)
- Depth, pose, and edge-conditioned video generation via ICLoRA
- Rapid video prototyping and creative experimentation
## Key Parameters
- **num_frames**: Output frame count of the form 8k + 1, i.e. one more than a multiple of 8 (e.g. 97, 161, 257)
- **steps**: 30-50 for dev models, 8-15 for distilled variants
- **cfg_scale**: 3-5 typical for dev, not required for distilled
- **width/height**: Divisible by 32, best under 720x1280 for 13B
- **denoise_strength**: 0.3-0.5 when using latent upsampler refinement pass
- **conditioning_strength**: Per-condition strength for multi-condition generation (default 1.0)
- **seed**: For reproducible generation
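The frame-count and resolution constraints above can be enforced with two small rounding helpers. This is a minimal sketch, and the helper names are hypothetical rather than LTX-Video or ComfyUI functions. Frames snap to the nearest 8k + 1 value. Width and height round down to a multiple of 32, which keeps the result within the requested size and therefore within the 13B memory guidance.

```python
def snap_num_frames(requested: int) -> int:
    """Round a frame count to the nearest value of the form 8*k + 1."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    k = round((requested - 1) / 8)
    return 8 * k + 1


def snap_dimension(value: int, multiple: int = 32) -> int:
    """Round a width or height down to a multiple of 32, never below 32."""
    if value < 1:
        raise ValueError("value must be positive")
    return max(multiple, (value // multiple) * multiple)


if __name__ == "__main__":
    for frames in (97, 100, 160, 257):
        print(frames, "->", snap_num_frames(frames))
    print(snap_dimension(1216), "x", snap_dimension(720))
```

For example, a request for 160 frames becomes 161, and a 720-pixel height becomes 704, which matches the 1216x704 resolution mentioned above.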
## Blog References
- [LTX-Video 0.9.5 Day-1 Support](../blog/ltx-video-095-support.md) — Commercial license (OpenRail-M), multi-frame control, improved quality
- [LTX-2: Open Source Audio-Video AI](../blog/ltx-2-open-source-audio-video.md) — Synchronized audio-video generation, NVFP4 for 3x speed / 60% less VRAM

@@ -1 +0,0 @@
LTX-Video is Lightricks' open-source DiT-based video generation model, the first to achieve real-time video generation. LTX-Video 2 (v0.9.7/0.9.8) is available in 2B and 13B parameter sizes, with dev, distilled, and FP8 quantized variants. It supports multi-condition generation with per-frame image/video conditioning, spatial and temporal upscalers, and ICLoRA adapters for depth, pose, and canny conditioning. The 2B model needs 12-16GB VRAM (8-10GB FP8), while the 13B model requires 24-32GB (16-20GB FP8). It generates 30fps video at 1216x704 faster than playback speed. Earlier versions (0.9.1/0.9.6) established the 2B foundation with a 15x faster distilled variant. Primary uses: short-form video, image-to-video animation, video extension, and multi-condition keyframe generation. Key parameters: 30-50 steps for dev, 8-15 for distilled, cfg_scale 3-5, frame counts of the form 8k+1.

@@ -1,50 +0,0 @@
# Luma
Luma AI develops video and image generation models through its Dream Machine platform, powered by the Ray model family and Photon image model.
## Model Variants
### Ray3 / Ray3.14
- Native 1080p video with reasoning-driven generation
- Billed as the first model to generate native 16-bit HDR video
- Character reference, Modify Video, and Draft Mode (5x faster)
### Ray2
- Production-ready text-to-video and image-to-video
- 5-9 second output at 24fps with coherent motion
### Photon
- Image generation with strong prompt following
- Character and visual reference support
- 1080p output at $0.016 per image
## Key Features
- Reasoning capability for understanding creative intent
- Visual annotation for precise layout and motion control
- HDR generation with 16-bit EXR export for pro workflows
- Keyframe control, video extension, looping, and camera control
## Hardware Requirements
- API-only access via Luma AI API
- No local hardware requirements
- Available through Dream Machine web and iOS app
## Common Use Cases
- Cinematic video production and storytelling
- Commercial advertising and product videos
- Visual effects with Modify Video workflows
- HDR content for professional post-production
## Key Parameters
- **prompt**: Text description for video generation
- **keyframes**: Start and/or end frame images
- **aspect_ratio**: Output dimensions and ratio
- **loop**: Enable seamless looping
- **camera_control**: Camera movement via text instructions

@@ -1,47 +0,0 @@
# Magnific
Magnific is an AI-powered image upscaler and enhancer that uses generative AI to hallucinate new details and textures during the upscaling process.
## Model Variants
### Magnific Creative Upscaler
- Generative upscaling up to 16x (max 10,000px per dimension)
- AI engines: Illusio (illustration), Sharpy (photography), Sparkle (balanced)
- Adds hallucinated details guided by text prompts
### Magnific Precision Upscaler
- Faithful high-fidelity upscaling without creative reinterpretation
- Clean enlargement that stays true to the source image
### Mystic Image Generator
- Photorealistic text-to-image/image-to-image with LoRA styles at up to 4K
## Key Features
- Creativity slider controls AI-hallucinated detail level
- HDR control for micro-contrast and crispness
- Resemblance slider to balance fidelity vs. creative enhancement
- Optimized modes for portraits, illustrations, video games, and film
- API hosted on Freepik with Skin Enhancer endpoint
## Hardware Requirements
- Cloud-only service with no local hardware requirements
- API available through Freepik's developer platform
- Subscription-based with credit system
## Common Use Cases
- Upscaling AI-generated images for print and production
- Enhancing low-resolution concept art and illustrations
- Restoring old or compressed photographs
## Key Parameters
- **Creativity**: level of new detail hallucination (0-10)
- **HDR**: micro-contrast and sharpness (-10 to 10)
- **Resemblance**: fidelity to source image (-10 to 10)
- **Scale Factor**: 2x, 4x, 8x, or 16x magnification
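The 10,000px-per-dimension cap determines which scale factors a given input can use. The sketch below chooses the largest listed factor that fits and clamps the sliders to the ranges above. The function names and dictionary keys are hypothetical. Whether the service rejects out-of-range values or clamps them itself is not documented here, so this sketch assumes client-side clamping.

```python
from typing import Optional

VALID_SCALES = (2, 4, 8, 16)
MAX_SIDE = 10_000  # maximum output pixels per dimension, from the notes above


def max_scale_for(width: int, height: int) -> Optional[int]:
    """Return the largest listed scale that keeps both sides within MAX_SIDE."""
    fitting = [s for s in VALID_SCALES
               if width * s <= MAX_SIDE and height * s <= MAX_SIDE]
    return max(fitting) if fitting else None


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))


def upscale_settings(width: int, height: int, creativity: float = 0,
                     hdr: float = 0, resemblance: float = 0) -> dict:
    """Build hypothetical upscale settings for an input of width x height."""
    scale = max_scale_for(width, height)
    if scale is None:
        raise ValueError(f"{width}x{height} exceeds {MAX_SIDE}px even at 2x")
    return {
        "scale_factor": scale,
        "output_size": (width * scale, height * scale),
        "creativity": clamp(creativity, 0, 10),
        "hdr": clamp(hdr, -10, 10),
        "resemblance": clamp(resemblance, -10, 10),
    }


if __name__ == "__main__":
    print(upscale_settings(1024, 768, creativity=15, hdr=3))
```

For example, a 1024x768 input cannot use 16x (16,384px wide), so 8x is the largest allowed factor.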

@@ -1,49 +0,0 @@
# Meshy
Meshy is a popular AI 3D model generator enabling text-to-3D and image-to-3D creation with PBR textures and production-ready exports.
## Model Variants
### Meshy-6
- Latest generation with highest quality geometry
- Supports symmetry and pose control (A-pose, T-pose)
- Configurable polygon counts up to 300,000
### Meshy-5
- Previous generation with art style support
- Realistic and sculpture style options
## Key Features
- Text-to-3D with two-stage workflow (preview mesh, then refine textures)
- Image-to-3D from photos, sketches, or illustrations
- Multi-image input for multi-view reconstruction
- AI texturing with PBR maps (diffuse, roughness, metallic, normal)
- Automatic rigging and 500+ animation motion library
- Smart remesh with quad or triangle topology control
- Export in FBX, GLB, OBJ, STL, 3MF, USDZ, BLEND formats
## Hardware Requirements
- Cloud API-based (no local GPU required)
- All generation runs on Meshy servers
- API available on Pro tier and above
## Common Use Cases
- Game development asset creation
- 3D printing and prototyping
- Film and VFX previsualization
- VR/AR content development
- Product design and e-commerce
## Key Parameters
- **prompt**: Text description up to 600 characters
- **ai_model**: Model version (meshy-5, meshy-6, latest)
- **topology**: Mesh type (quad or triangle)
- **target_polycount**: 100 to 300,000 polygons
- **enable_pbr**: Generate PBR material maps
- **pose_mode**: Character pose (a-pose, t-pose, or none)
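Several of these parameters have fixed option sets or numeric bounds, so a payload can be checked locally before it is sent. The sketch below uses the parameter names from the list above. The endpoint, any other required fields, and the defaults chosen here (triangle topology, 30,000 polygons) are assumptions and are not taken from Meshy's documentation. A pose mode of "none" is represented by omitting the key.

```python
from typing import Optional

AI_MODELS = {"meshy-5", "meshy-6", "latest"}
TOPOLOGIES = {"quad", "triangle"}
POSE_MODES = {"a-pose", "t-pose"}
MAX_PROMPT_CHARS = 600


def build_text_to_3d_payload(prompt: str, *, ai_model: str = "latest",
                             topology: str = "triangle",
                             target_polycount: int = 30_000,
                             enable_pbr: bool = True,
                             pose_mode: Optional[str] = None) -> dict:
    """Validate settings against the documented limits and return a payload dict."""
    if not prompt.strip() or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt must be 1-{MAX_PROMPT_CHARS} characters")
    if ai_model not in AI_MODELS:
        raise ValueError(f"ai_model must be one of {sorted(AI_MODELS)}")
    if topology not in TOPOLOGIES:
        raise ValueError(f"topology must be one of {sorted(TOPOLOGIES)}")
    if not 100 <= target_polycount <= 300_000:
        raise ValueError("target_polycount must be between 100 and 300,000")
    payload = {
        "prompt": prompt,
        "ai_model": ai_model,
        "topology": topology,
        "target_polycount": target_polycount,
        "enable_pbr": enable_pbr,
    }
    # Omit pose_mode entirely when no pose is requested.
    if pose_mode is not None:
        if pose_mode not in POSE_MODES:
            raise ValueError(f"pose_mode must be one of {sorted(POSE_MODES)}")
        payload["pose_mode"] = pose_mode
    return payload


if __name__ == "__main__":
    print(build_text_to_3d_payload("A low-poly wooden treasure chest", pose_mode=None))
```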

@@ -1,58 +0,0 @@
# MiniMax
MiniMax is a multi-modal AI company known for the Hailuo video generation models and Image-01, offering API-based video and image creation.
## Model Variants
### Hailuo 2.3
- Latest video model with improved body movement and facial expressions
- Supports anime, illustration, ink-wash, and game-CG styles
- 768p or 1080p resolution, 6 or 10 second clips
- Available in Quality and Fast variants
### Hailuo 2.0 (Hailuo 02)
- Native 1080p with Noise-aware Compute Redistribution (NCR)
- 2.5x efficiency improvement over predecessors
- Last-frame conditioning support
### Image-01
- Text-to-image generation with multiple output sizes
### T2V-01-Director
- Enhanced camera control with natural language commands
- Pan, zoom, tracking shot, and shake directives
## Key Features
- Text-to-video and image-to-video generation
- Up to 1080p resolution at 25fps
- Video clips up to 10 seconds
- Camera control with natural language commands
- Subject consistency with reference images
- Text-to-image generation with Image-01
## Hardware Requirements
- Cloud API-based (no local GPU required)
- All generation runs on MiniMax servers
- API access via platform.minimax.io
## Common Use Cases
- Social media video content creation
- Cinematic short film production
- Product advertising and e-commerce videos
- Anime and illustrated content
- Character-driven narrative scenes
## Key Parameters
- **prompt**: Text description for generation
- **model**: Model selection (hailuo-2.3, hailuo-02, image-01)
- **resolution**: Output resolution (768p or 1080p)
- **duration**: Clip length (6 or 10 seconds for video)
- **first_frame_image**: Reference image for image-to-video

@@ -1,762 +0,0 @@
{
"generated": "2026-02-07",
"totalModels": 87,
"categories": {
"specific_model": [
{
"name": "Wan",
"category": "specific_model",
"templateCount": 36,
"priority": 108,
"docFile": "wan",
"hasExistingDoc": true
},
{
"name": "Nano Banana Pro",
"category": "specific_model",
"templateCount": 29,
"priority": 87,
"docFile": "nano-banana-pro",
"hasExistingDoc": false
},
{
"name": "Flux",
"category": "specific_model",
"templateCount": 24,
"priority": 72,
"docFile": "flux",
"hasExistingDoc": true
},
{
"name": "SDXL",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "sdxl",
"hasExistingDoc": true
},
{
"name": "ACE-Step",
"category": "specific_model",
"templateCount": 7,
"priority": 21,
"docFile": "ace-step",
"hasExistingDoc": false
},
{
"name": "Seedance",
"category": "specific_model",
"templateCount": 6,
"priority": 18,
"docFile": "seedance",
"hasExistingDoc": false
},
{
"name": "Seedream",
"category": "specific_model",
"templateCount": 5,
"priority": 15,
"docFile": "seedream",
"hasExistingDoc": false
},
{
"name": "HiDream",
"category": "specific_model",
"templateCount": 5,
"priority": 15,
"docFile": "hidream",
"hasExistingDoc": false
},
{
"name": "Stable Audio",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "stable-audio",
"hasExistingDoc": false
},
{
"name": "Chatter Box",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "chatterbox",
"hasExistingDoc": false
},
{
"name": "Z-Image-Turbo",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "z-image",
"hasExistingDoc": false
},
{
"name": "Kandinsky",
"category": "specific_model",
"templateCount": 3,
"priority": 9,
"docFile": "kandinsky",
"hasExistingDoc": false
},
{
"name": "OmniGen",
"category": "specific_model",
"templateCount": 3,
"priority": 9,
"docFile": "omnigen",
"hasExistingDoc": false
},
{
"name": "SeedVR2",
"category": "specific_model",
"templateCount": 3,
"priority": 9,
"docFile": "seedvr2",
"hasExistingDoc": false
},
{
"name": "Chroma",
"category": "specific_model",
"templateCount": 2,
"priority": 6,
"docFile": "chroma",
"hasExistingDoc": false
},
{
"name": "ChronoEdit",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "chronoedit",
"hasExistingDoc": false
},
{
"name": "HuMo",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "humo",
"hasExistingDoc": false
},
{
"name": "NewBie",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "newbie",
"hasExistingDoc": false
},
{
"name": "Ovis-Image",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "ovis-image",
"hasExistingDoc": false
}
],
"provider_name": [
{
"name": "Google",
"category": "provider_name",
"templateCount": 29,
"priority": 0,
"mapsTo": ["gemini", "veo", "nano-banana-pro"],
"hasExistingDoc": false
},
{
"name": "BFL",
"category": "provider_name",
"templateCount": 28,
"priority": 0,
"mapsTo": ["flux"],
"hasExistingDoc": false
},
{
"name": "Stability",
"category": "provider_name",
"templateCount": 19,
"priority": 0,
"mapsTo": ["sdxl", "stable-audio", "reimagine"],
"hasExistingDoc": false
},
{
"name": "ByteDance",
"category": "provider_name",
"templateCount": 11,
"priority": 0,
"mapsTo": ["seedance", "seedvr2", "seedream"],
"hasExistingDoc": false
},
{
"name": "OpenAI",
"category": "provider_name",
"templateCount": 11,
"priority": 0,
"mapsTo": ["gpt-image-1"],
"hasExistingDoc": false
},
{
"name": "Lightricks",
"category": "provider_name",
"templateCount": 9,
"priority": 0,
"mapsTo": ["ltx-video"],
"hasExistingDoc": false
},
{
"name": "Tencent",
"category": "provider_name",
"templateCount": 5,
"priority": 0,
"mapsTo": ["hunyuan"],
"hasExistingDoc": false
},
{
"name": "Qwen",
"category": "provider_name",
"templateCount": 2,
"priority": 0,
"mapsTo": ["qwen"],
"hasExistingDoc": true
},
{
"name": "Nvidia",
"category": "provider_name",
"templateCount": 1,
"priority": 0,
"mapsTo": [],
"hasExistingDoc": false
}
],
"api_only": [
{
"name": "Vidu",
"category": "api_only",
"templateCount": 10,
"priority": 20,
"docFile": "vidu",
"hasExistingDoc": false
},
{
"name": "Kling",
"category": "api_only",
"templateCount": 9,
"priority": 18,
"docFile": "kling",
"hasExistingDoc": false
},
{
"name": "Recraft",
"category": "api_only",
"templateCount": 6,
"priority": 12,
"docFile": "recraft",
"hasExistingDoc": false
},
{
"name": "Runway",
"category": "api_only",
"templateCount": 5,
"priority": 10,
"docFile": "runway",
"hasExistingDoc": false
},
{
"name": "Tripo",
"category": "api_only",
"templateCount": 5,
"priority": 10,
"docFile": "tripo",
"hasExistingDoc": false
},
{
"name": "GPT-Image-1",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "gpt-image-1",
"hasExistingDoc": false
},
{
"name": "MiniMax",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "minimax",
"hasExistingDoc": false
},
{
"name": "Grok",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "grok",
"hasExistingDoc": false
},
{
"name": "Luma",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "luma",
"hasExistingDoc": false
},
{
"name": "Moonvalley",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "moonvalley",
"hasExistingDoc": false
},
{
"name": "Topaz",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "topaz",
"hasExistingDoc": false
},
{
"name": "PixVerse",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "pixverse",
"hasExistingDoc": false
},
{
"name": "Meshy",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "meshy",
"hasExistingDoc": false
},
{
"name": "Rodin",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "rodin",
"hasExistingDoc": false
},
{
"name": "Magnific",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "magnific",
"hasExistingDoc": false
},
{
"name": "WaveSpeed",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "wavespeed",
"hasExistingDoc": false
},
{
"name": "BRIA",
"category": "api_only",
"templateCount": 2,
"priority": 4,
"docFile": "bria",
"hasExistingDoc": false
},
{
"name": "Veo",
"category": "api_only",
"templateCount": 2,
"priority": 4,
"docFile": "veo",
"hasExistingDoc": false
},
{
"name": "HitPaw",
"category": "api_only",
"templateCount": 2,
"priority": 4,
"docFile": "hitpaw",
"hasExistingDoc": false
},
{
"name": "Z-Image",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "z-image",
"hasExistingDoc": false
},
{
"name": "Anima",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "anima",
"hasExistingDoc": false
},
{
"name": "Reimagine",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "reimagine",
"mapsTo": ["stability"],
"hasExistingDoc": false
},
{
"name": "Ideogram",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "ideogram",
"hasExistingDoc": false
},
{
"name": "Gemini3 Pro Image Preview",
"category": "api_only",
"templateCount": 16,
"priority": 32,
"docFile": "gemini",
"hasExistingDoc": false
}
],
"utility_model": [
{
"name": "SVD",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "svd",
"hasExistingDoc": false
},
{
"name": "Real-ESRGAN",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "real-esrgan",
"hasExistingDoc": false
},
{
"name": "Depth Anything v2",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "depth-anything-v2",
"hasExistingDoc": false
},
{
"name": "FlashVSR",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "flashvsr",
"hasExistingDoc": false
}
],
"variant": [
{
"name": "Wan2.1",
"category": "variant",
"templateCount": 21,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Wan2.2",
"category": "variant",
"templateCount": 15,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Qwen-Image-Edit",
"category": "variant",
"templateCount": 11,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "LTX-2",
"category": "variant",
"templateCount": 9,
"priority": 0,
"mapsTo": "ltx-video",
"hasExistingDoc": true
},
{
"name": "Qwen-Image",
"category": "variant",
"templateCount": 7,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "Hunyuan3D",
"category": "variant",
"templateCount": 7,
"priority": 0,
"mapsTo": "hunyuan",
"hasExistingDoc": true
},
{
"name": "Google Gemini Image",
"category": "variant",
"templateCount": 6,
"priority": 0,
"mapsTo": "gemini",
"hasExistingDoc": false
},
{
"name": "Flux.2 Klein",
"category": "variant",
"templateCount": 6,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": true
},
{
"name": "Kling O1",
"category": "variant",
"templateCount": 5,
"priority": 0,
"mapsTo": "kling",
"hasExistingDoc": false
},
{
"name": "Vidu Q2",
"category": "variant",
"templateCount": 5,
"priority": 0,
"mapsTo": "vidu",
"hasExistingDoc": false
},
{
"name": "SD3.5",
"category": "variant",
"templateCount": 4,
"priority": 0,
"mapsTo": "sdxl",
"hasExistingDoc": false
},
{
"name": "Google Gemini",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "gemini",
"hasExistingDoc": false
},
{
"name": "Flux.2 Dev",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": true
},
{
"name": "Flux.2",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": true
},
{
"name": "Wan2.5",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Kontext",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": false
},
{
"name": "Wan2.6",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Hunyuan Video",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "hunyuan",
"hasExistingDoc": true
},
{
"name": "Vidu Q3",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "vidu",
"hasExistingDoc": false
},
{
"name": "LTXV",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "ltx-video",
"hasExistingDoc": true
},
{
"name": "Qwen-Image-Layered",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "SD1.5",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "sdxl",
"hasExistingDoc": false
},
{
"name": "Gemini-2.5-Flash",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "gemini",
"hasExistingDoc": false
},
{
"name": "Qwen-Image 2512",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "Seedream 4.0",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "seedream",
"hasExistingDoc": false
},
{
"name": "GPT-Image-1.5",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "gpt-image-1",
"hasExistingDoc": false
},
{
"name": "Kling2.6",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "kling",
"hasExistingDoc": false
},
{
"name": "Wan-Move",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Motion Control",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": false
}
],
"skip": [
{
"name": "None",
"category": "skip",
"templateCount": 1,
"priority": 0,
"hasExistingDoc": false
},
{
"name": "nano-banana",
"category": "skip",
"templateCount": 1,
"priority": 0,
"note": "Duplicate of Nano Banana Pro",
"hasExistingDoc": false
}
]
},
"priorityOrder": [
"wan",
"nano-banana-pro",
"flux",
"gemini",
"ace-step",
"vidu",
"kling",
"seedance",
"seedream",
"hidream",
"sdxl",
"stable-audio",
"chatterbox",
"z-image",
"recraft",
"runway",
"tripo",
"kandinsky",
"omnigen",
"seedvr2",
"gpt-image-1",
"minimax",
"grok",
"luma",
"moonvalley",
"topaz",
"chroma",
"pixverse",
"meshy",
"rodin",
"magnific",
"wavespeed",
"bria",
"veo",
"hitpaw",
"newbie",
"ovis-image",
"chronoedit",
"humo",
"anima",
"reimagine",
"ideogram",
"svd",
"real-esrgan",
"depth-anything-v2",
"flashvsr"
]
}
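In this manifest, `priority` equals `templateCount` times 3 for `specific_model` entries, times 2 for `api_only`, and times 1 for `utility_model`. It is 0 for every other category. `priorityOrder` lists `docFile` slugs from highest to lowest priority. The sketch below shows one way to derive such a list and check the counts. It is an illustration, not the script that generated this file. Entries without a `docFile` are skipped, and each slug is kept at its first, highest-priority occurrence (`z-image` appears twice). Ties are broken here by category order, so slugs with equal priority, such as `kling` and `seedance` (both 18), may come out in a different order than in the file.

```python
import json

# Observed weights: priority = templateCount * weight; other categories use 0.
WEIGHTS = {"specific_model": 3, "api_only": 2, "utility_model": 1}


def expected_priority(entry: dict) -> int:
    """Recompute an entry's priority from its category and template count."""
    return entry["templateCount"] * WEIGHTS.get(entry["category"], 0)


def derive_priority_order(manifest: dict) -> list:
    """Return docFile slugs by descending priority, keeping each slug once."""
    entries = [
        entry
        for group in manifest["categories"].values()
        for entry in group
        if "docFile" in entry
    ]
    # list.sort is stable even with reverse=True, so ties keep category order.
    entries.sort(key=lambda entry: entry["priority"], reverse=True)
    order = []
    for entry in entries:
        if entry["docFile"] not in order:
            order.append(entry["docFile"])
    return order


def count_models(manifest: dict) -> int:
    """Count every entry across all categories (compare with totalModels)."""
    return sum(len(group) for group in manifest["categories"].values())


# A small excerpt of the manifest, used to exercise the helpers.
SAMPLE = json.loads("""
{
  "totalModels": 5,
  "categories": {
    "specific_model": [
      {"name": "Wan", "category": "specific_model", "templateCount": 36, "priority": 108, "docFile": "wan"},
      {"name": "Z-Image-Turbo", "category": "specific_model", "templateCount": 4, "priority": 12, "docFile": "z-image"}
    ],
    "api_only": [
      {"name": "Vidu", "category": "api_only", "templateCount": 10, "priority": 20, "docFile": "vidu"},
      {"name": "Z-Image", "category": "api_only", "templateCount": 1, "priority": 2, "docFile": "z-image"}
    ],
    "variant": [
      {"name": "Wan2.1", "category": "variant", "templateCount": 21, "priority": 0, "mapsTo": "wan"}
    ]
  }
}
""")

if __name__ == "__main__":
    print(derive_priority_order(SAMPLE))
    print(count_models(SAMPLE) == SAMPLE["totalModels"])
```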

@@ -1,53 +0,0 @@
# Moonvalley (Marey)
Marey is Moonvalley's AI video generation model for professional filmmakers, delivering studio-grade quality and trained exclusively on licensed footage.
## Model Variants
### Marey Realism v1.5
- Latest production model with cinematic detail
- 1080p resolution at 24fps, up to 5-second clips
- Available via ComfyUI native nodes and fal.ai
### Marey Director Controls
- 3D-aware camera control from single images
- Motion transfer from reference videos
- Trajectory control for object path definition
- Pose transfer and keyframing with multi-image timeline
## Key Features
- Text-to-video and image-to-video generation
- Camera control with 3D scene understanding
- Motion transfer from reference video clips
- Trajectory control via drawn paths
- Pose transfer for expressive character animation
- Shot extension for seamless duration increase
- Commercially safe (trained on licensed data only)
## Hardware Requirements
- Cloud API-based (no local GPU required)
- Available via Moonvalley platform, ComfyUI, and fal.ai
- Subscription tiers starting at $14.99/month
## Common Use Cases
- Professional film and commercial production
- Cinematic B-roll generation
- Previsualization and storyboarding
- Music video and social media content
- Product advertising with dynamic camera
- Animation and character-driven storytelling
## Key Parameters
- **prompt**: Text description of desired video scene
- **image**: Reference image for image-to-video mode
- **camera_control**: Camera movement specification
- **motion_reference**: Video reference for motion transfer
- **trajectory**: Drawn path for object movement
- **duration**: Clip length (up to 5 seconds)
- **resolution**: Output resolution (up to 1080p at 24fps)

@@ -1,53 +0,0 @@
# Nano Banana Pro
Nano Banana Pro is Google DeepMind's flagship image generation and editing model, accessed through ComfyUI's API nodes. Internally it is the Gemini 3 Pro Image model, designed for production-ready high-fidelity visuals.
## Model Variants
### Nano Banana Pro (Gemini 3 Pro Image)
- State-of-the-art reasoning-powered image generation
- Supports up to 14 reference image inputs
- Native 4K output resolution (up to 4096x4096)
- Complex multi-turn image generation and editing
- Model ID: `gemini-3-pro-image-preview`
### Gemini 2.5 Flash Image (Nano Banana)
- Cost-effective alternative optimized for speed
- Balanced price-to-performance for interactive workflows
- Character consistency and prompt-based editing
- Model ID: `gemini-2.5-flash-image`
## Key Features
- **World knowledge**: Generates accurate real-world images using Google Search's knowledge base
- **Text rendering**: Clean text generation with detection and translation across 10 languages
- **Multi-image fusion**: Blend up to 14 input images into a single coherent output
- **Studio controls**: Adjust angles, focus, color grading in generated images
- **Character consistency**: Maintain subject identity across multiple generations
- **Prompt-based editing**: Targeted transformations via natural language instructions
## Hardware Requirements
- No local GPU required — runs as a cloud API service
- Accessed via ComfyUI API nodes (requires ComfyUI login and network access)
- Available on Comfy Cloud or local ComfyUI with API node support
## Common Use Cases
- High-fidelity text-to-image generation
- Multi-reference style transfer and image blending
- Product visualization and mockups
- Sketch-to-image and blueprint-to-3D visualization
- Text rendering and translation in images
- Iterative prompt-based image editing
## Key Parameters
- **prompt**: Text description of desired image or edit
- **aspect_ratio**: Supported ratios include 1:1, 3:2, 4:3, 9:16, 16:9, 21:9
- **temperature**: 0.0-2.0 (default 1.0)
- **topP**: 0.0-1.0 (default 0.95)
- **max_output_tokens**: Up to 32,768 tokens per response
- **input images**: Up to 14 reference images per prompt
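The limits listed above can be checked client-side before a request is sent. The sketch below is a minimal validator assuming a hypothetical `NanoBananaRequest` container and `validate` helper (neither is the real ComfyUI node or Gemini API schema). It treats only the six listed aspect ratios as valid, even though the list above may not be exhaustive.

```python
from dataclasses import dataclass, field

# Limits taken from the Key Parameters list above. The aspect ratios listed there
# may not be exhaustive; this sketch treats only those six as valid.
SUPPORTED_ASPECT_RATIOS = {"1:1", "3:2", "4:3", "9:16", "16:9", "21:9"}
MAX_REFERENCE_IMAGES = 14
MAX_OUTPUT_TOKENS = 32_768


@dataclass
class NanoBananaRequest:
    """Hypothetical request container; not the real ComfyUI node or Gemini API schema."""
    prompt: str
    aspect_ratio: str = "1:1"
    temperature: float = 1.0  # documented default
    top_p: float = 0.95       # documented default
    max_output_tokens: int = MAX_OUTPUT_TOKENS
    input_images: list = field(default_factory=list)  # e.g. raw image bytes


def validate(req: NanoBananaRequest) -> list:
    """Return human-readable problems; an empty list means the request looks valid."""
    errors = []
    if not req.prompt.strip():
        errors.append("prompt must not be empty")
    if req.aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        errors.append(f"unsupported aspect_ratio {req.aspect_ratio!r}")
    if not 0.0 <= req.temperature <= 2.0:
        errors.append("temperature must be within 0.0-2.0")
    if not 0.0 <= req.top_p <= 1.0:
        errors.append("top_p must be within 0.0-1.0")
    if not 1 <= req.max_output_tokens <= MAX_OUTPUT_TOKENS:
        errors.append(f"max_output_tokens must be within 1-{MAX_OUTPUT_TOKENS}")
    if len(req.input_images) > MAX_REFERENCE_IMAGES:
        errors.append(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    return errors
```

For example, `validate(NanoBananaRequest(prompt="product shot", temperature=2.5))` returns a single error about the temperature range. Collecting all problems at once, rather than raising on the first, makes it easier to report every issue in a workflow UI.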

@@ -1,43 +0,0 @@
# NewBie
NewBie image Exp0.1 is a 3.5B parameter open-source text-to-image model built on the Next-DiT architecture, developed by the NewBie-AI community. It is specifically pretrained on high-quality anime data for detailed and visually striking anime-style image generation.
## Model Variants
### NewBie image Exp0.1
- 3.5B parameter DiT model based on Next-DiT architecture
- Uses Gemma3-4B-it as primary text encoder with Jina CLIP v2 for pooled features
- FLUX.1-dev 16-channel VAE for rich color rendering and fine texture detail
- Supports natural language, tags, and XML structured prompts
- Non-commercial community license (Newbie-NC-1.0) for model weights
## Key Features
- Exceptional anime and ACG (Anime, Comics, Games) style generation
- XML structured prompting for improved attribute binding and element disentanglement
- Strong multi-character scene generation with accurate attribute assignment
- ComfyUI integration via dedicated custom nodes
- LoRA training support with community trainer
- Built on research from the Lumina architecture family
## Hardware Requirements
- Minimum: 12GB VRAM (bfloat16 or float16)
- Recommended: 24GB VRAM for comfortable generation
- Requires Gemma3-4B-it and Jina CLIP v2 text encoders
- Python 3.10, PyTorch 2.6.0+, Transformers 4.57.1+
## Common Use Cases
- Anime and illustration generation
- Character design with precise attribute control
- Multi-character scene composition
- Fan art and creative anime artwork
## Key Parameters
- **num_inference_steps**: 28 recommended
- **height/width**: 1024x1024 native resolution
- **prompt_format**: Natural language, tags, or XML structured
- **torch_dtype**: bfloat16 recommended (float16 fallback)

@@ -1,53 +0,0 @@
# OmniGen2
OmniGen2, developed by VectorSpaceLab, is a multimodal generation model with dual decoding pathways for text and image, built on the Qwen2.5-VL foundation model.
## Model Variants
### OmniGen2
- 3B vision-language encoder (Qwen2.5-VL) + 4B image decoder
- Dual decoding with unshared parameters for text and image
- Decoupled image tokenizer
- Apache 2.0 license
### OmniGen v1
- Earlier single-pathway architecture
- Fewer capabilities than OmniGen2
- Superseded by OmniGen2
## Key Features
- Text-to-image generation with high fidelity and aesthetics
- Instruction-guided image editing (state-of-the-art among open-source models)
- In-context generation combining multiple reference inputs (humans, objects, scenes)
- Visual understanding inherited from Qwen2.5-VL
- CPU offload support reduces VRAM usage by nearly 50%
- Sequential CPU offload available for under 3GB VRAM (slower inference)
- Supports negative prompts and configurable guidance scales
## Hardware Requirements
- Minimum: NVIDIA RTX 3090 or equivalent (~17GB VRAM)
- With CPU offload: ~9GB VRAM
- With sequential CPU offload: under 3GB VRAM (significantly slower)
- Flash Attention optional but recommended for best performance
- CUDA 12.4+ recommended
- Default output resolution: 1024x1024
## Common Use Cases
- Text-to-image generation
- Instruction-based photo editing
- Subject-driven image generation from reference photos
- Multi-image composition and in-context editing
## Key Parameters
- **text_guidance_scale**: Controls adherence to text prompt (CFG)
- **image_guidance_scale**: Controls similarity to reference image (1.2-2.0 for editing, 2.5-3.0 for in-context)
- **num_inference_step**: Diffusion steps (default 50)
- **max_pixels**: Maximum total pixel count for input images (default 1024x1024)
- **negative_prompt**: Text describing undesired qualities (e.g., "blurry, low quality, watermark")
- **scheduler**: ODE solver choice (euler or dpmsolver++)
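The two guidance scales above can be understood through dual classifier-free guidance. The sketch below assumes OmniGen2 combines three predictions in the InstructPix2Pix style: unconditional, reference-image-only, and image-plus-text. That exact formula is an assumption and should be checked against the OmniGen2 source. The function name `dual_guidance` is hypothetical, and it operates on plain Python lists instead of tensors.

```python
def dual_guidance(uncond, image_cond, full_cond, image_guidance_scale, text_guidance_scale):
    """Combine three model predictions element-wise (assumed InstructPix2Pix-style dual CFG).

    uncond:     prediction with neither text nor reference image
    image_cond: prediction conditioned on the reference image(s) only
    full_cond:  prediction conditioned on both text and reference image(s)
    """
    if not (len(uncond) == len(image_cond) == len(full_cond)):
        raise ValueError("all predictions must have the same length")
    return [
        # Image term pulls toward the reference; text term pulls toward the prompt.
        u + image_guidance_scale * (i - u) + text_guidance_scale * (c - i)
        for u, i, c in zip(uncond, image_cond, full_cond)
    ]
```

With both scales at 1.0 the result reduces to `full_cond`. Under this assumption, raising `image_guidance_scale` (for example into the 1.2-2.0 editing range) strengthens the pull toward the reference image independently of how strongly the text prompt is enforced.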

@@ -1,43 +0,0 @@
# Ovis-Image
Ovis-Image is a 7B text-to-image model by AIDC-AI, built on Ovis-U1, optimized for high-quality text rendering in generated images. It achieves state-of-the-art results on the CVTG-2K text rendering benchmark while remaining compact enough for single-GPU deployment.
## Model Variants
### Ovis-Image-7B
- 2B (Ovis2.5-2B) + 7B parameter architecture
- State-of-the-art on CVTG-2K benchmark for text rendering accuracy
- Competitive with 20B+ models (Qwen-Image) and GPT-4o on text-centric tasks
- Uses FLUX-based autoencoder for latent encoding
- Apache 2.0 license
## Key Features
- Excellent text rendering with correct spelling and consistent typography
- High fidelity on text-heavy, layout-sensitive prompts
- Handles posters, banners, logos, UI mockups, and infographics
- Supports diverse fonts, sizes, and aspect ratios
- Strong performance on both English and Chinese text generation
- Available via Diffusers library with OvisImagePipeline
## Hardware Requirements
- Minimum: 16GB VRAM (bfloat16)
- Recommended: 24GB VRAM for comfortable use
- Fits on a single high-end GPU
- Tested with Python 3.10, PyTorch 2.6.0, Transformers 4.57.1
## Common Use Cases
- Generating posters and banners with accurate text
- Logo and brand asset creation
- UI mockup and infographic generation
- Marketing materials with embedded typography
## Key Parameters
- **num_inference_steps**: 50 recommended
- **guidance_scale**: 5.0
- **resolution**: 1024x1024 native
- **negative_prompt**: Supported for quality control

@@ -1,46 +0,0 @@
# PixVerse
PixVerse is an AI video generation platform founded in 2023 and backed by Alibaba, offering text-to-video and image-to-video capabilities with over 100 million registered users.
## Model Variants
### PixVerse V5.5
- Latest model, with improved fidelity across text-to-video, image-to-video, and video modification
### PixVerse R1
- Real-time AI video generation model
- Interactive control where users direct character actions as video unfolds
### PixVerse V4.5 / V5
- Previous generation models with strong cinematic quality and trending effects
## Key Features
- Text-to-video generation from natural language prompts
- Image-to-video animation with realistic physics simulation
- Fusion mode combining up to 3 images into one video
- Key frame control and video extension with AI continuity
- AI Video Modify for text-prompt-based editing
## Hardware Requirements
- Cloud-based platform with no local GPU required
- Web app at app.pixverse.ai and mobile apps (iOS/Android)
- API at platform.pixverse.ai for developer integration
## Common Use Cases
- Social media content creation (AI Kiss, Hug, Dance effects)
- Marketing and promotional video production
- Old photo revival and animation
- Cinematic narrative and stylistic art generation
## Key Parameters
- **prompt**: Text description of the desired video content
- **duration**: Video length (typically 5s clips)
- **resolution**: Output quality (360p to 720p+)
- **aspect_ratio**: 16:9, 9:16, 1:1, and other ratios

@@ -1,77 +0,0 @@
# Qwen
Qwen is Alibaba's family of vision-language and image generation models, spanning visual understanding, image editing, and image generation.
## Model Variants
### Qwen2.5-VL
- Multimodal vision-language model from the Qwen team
- Available in 3B, 7B, and 72B parameter sizes
- Image understanding, video comprehension (videos longer than an hour), and visual localization
- Visual agent capabilities: computer use, phone use, dynamic tool calling
- Structured output generation for invoices, forms, and tables
- Dynamic resolution and frame rate training for video understanding
- Optimized ViT encoder with window attention, SwiGLU, and RMSNorm
### Qwen-Image-Edit
- Specialized image editing model with instruction-following
- Supports inpainting, outpainting, style transfer, and content-aware edits
- 11 workflow templates available
### Qwen-Image
- Text-to-image generation model from the Qwen family
- 7 workflow templates available
### Qwen-Image-Layered
- Layered image generation for composable outputs
- Generates images with separate foreground/background layers
- 2 workflow templates available
### Qwen-Image 2512
- Specific variant optimized for particular generation tasks
- 1 workflow template available
## Key Features
- Strong visual understanding with state-of-the-art benchmark results
- Native multi-language support including Chinese and English
- Visual agent capabilities for computer and phone interaction
- Video event capture with temporal segment pinpointing
- Bounding box and point-based visual localization
- Structured JSON output for document and table extraction
- Instruction-based image editing with precise control
## Hardware Requirements
- 3B model: 6-8GB VRAM
- 7B model: 16GB VRAM, flash_attention_2 recommended for multi-image/video
- 72B model: Multi-GPU setup required (80GB+ per GPU)
- Context length: 32,768 tokens default, extendable to 64K+ with YaRN
- Dynamic pixel budget: 256-1280 tokens per image (configurable min/max pixels)
## Common Use Cases
- Image editing based on text instructions
- Visual question answering and image description
- Long video comprehension and event extraction
- Document OCR and structured data extraction
- Visual agent tasks (screen interaction, UI navigation)
- Layered image generation for design workflows
- Text-to-image generation with strong prompt following
## Key Parameters
- **max_new_tokens**: Controls output length for VL model responses
- **min_pixels / max_pixels**: Control image token budget (e.g. 256x28x28 to 1280x28x28)
- **temperature**: Generation diversity for text outputs
- **resized_height / resized_width**: Direct image dimension control (rounded to the nearest multiple of 28)
- **fps**: Frame rate for video input processing in Qwen2.5-VL
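The pixel budget above maps image size to visual tokens at roughly one token per 28×28 pixel area, so 256×28×28 pixels corresponds to 256 tokens. The sketch below is modeled on that behavior: it snaps dimensions to multiples of 28, then rescales while preserving aspect ratio until the area fits between `min_pixels` and `max_pixels`. The function names are hypothetical and this is not the official Qwen preprocessing code.

```python
import math

PATCH = 28  # one visual token per 28x28 pixel area (per the budget described above)


def round_to(x, f):
    return round(x / f) * f


def floor_to(x, f):
    return math.floor(x / f) * f


def ceil_to(x, f):
    return math.ceil(x / f) * f


def fit_pixel_budget(height, width,
                     min_pixels=256 * PATCH * PATCH,
                     max_pixels=1280 * PATCH * PATCH):
    """Return (h, w), both multiples of 28, with h*w inside [min_pixels, max_pixels]."""
    h = max(PATCH, round_to(height, PATCH))
    w = max(PATCH, round_to(width, PATCH))
    if h * w > max_pixels:
        # Shrink uniformly, rounding down so the area stays under the cap.
        scale = math.sqrt(height * width / max_pixels)
        h = max(PATCH, floor_to(height / scale, PATCH))
        w = max(PATCH, floor_to(width / scale, PATCH))
    elif h * w < min_pixels:
        # Grow uniformly, rounding up so the area reaches the floor.
        scale = math.sqrt(min_pixels / (height * width))
        h = ceil_to(height * scale, PATCH)
        w = ceil_to(width * scale, PATCH)
    return h, w


def visual_tokens(h, w):
    return (h // PATCH) * (w // PATCH)
```

For a 1000×1000 input, `fit_pixel_budget(1000, 1000)` returns `(980, 980)`, which is 35 × 35 = 1225 tokens, just under the 1280-token default cap. Rounding down when shrinking and up when growing is what keeps the result inside the budget.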
## Blog References
- [Qwen Image Edit 2511 & Qwen Image Layered](../blog/qwen-image-edit-2511.md) — Better character consistency, RGBA layer decomposition, built-in LoRA support

@@ -1 +0,0 @@
Qwen is Alibaba's family of vision-language and image generation models. Qwen2.5-VL is a multimodal vision-language model available in 3B (6-8GB VRAM), 7B (16GB), and 72B (multi-GPU 80GB+) sizes, capable of image understanding, hour-long video comprehension, visual localization, visual agent tasks (computer/phone use), and structured JSON output for document extraction. Qwen-Image-Edit provides instruction-based image editing with inpainting, outpainting, and style transfer. Qwen-Image handles text-to-image generation, while Qwen-Image-Layered produces composable foreground/background layer outputs. The family features native Chinese/English support, strong prompt following, and state-of-the-art visual understanding benchmarks. Key parameters include dynamic pixel budgets (256-1280 tokens per image), configurable frame rates for video input, and temperature for text diversity. Primary uses: image editing, visual QA, video comprehension, document OCR, and layered image generation.

@@ -1,61 +0,0 @@
# Real-ESRGAN
Real-ESRGAN is a practical image and video super-resolution model that extends ESRGAN with improved training on pure synthetic data for real-world restoration.
## Model Variants
### RealESRGAN_x4plus
- General-purpose 4× upscaling model for real-world images
- RRDB (Residual-in-Residual Dense Block) architecture
- Handles noise, blur, JPEG compression artifacts
### RealESRGAN_x4plus_anime_6B
- Optimized for anime and illustration images
- Smaller 6-block model for faster inference
- Better edge preservation for line art
### RealESRGAN_x2plus
- 2× upscaling variant for moderate enlargement
- Lower risk of hallucinated details
### realesr-animevideov3
- Lightweight model designed for anime video frames
- Temporal consistency for video processing
## Key Features
- Trained entirely on synthetic degradation data (no paired real-world data needed)
- Second-order degradation modeling simulates real-world compression chains
- GFPGAN integration for face enhancement during upscaling
- Tiling support for processing large images with limited VRAM
- FP16 (half precision) inference for faster processing
- NCNN Vulkan portable executables for cross-platform GPU support (Intel/AMD/NVIDIA)
- Supports 2×, 3×, and 4× upscaling with arbitrary output scale via LANCZOS4 resize
## Hardware Requirements
- Minimum: 2GB VRAM with tiling enabled
- Recommended: 4GB+ VRAM for comfortable use
- NCNN Vulkan build runs on any GPU with Vulkan support
- CPU inference supported but significantly slower
## Common Use Cases
- Upscaling old or low-resolution photographs
- Enhancing compressed web images
- Anime and manga image upscaling
- Video frame super-resolution
- Restoring degraded historical images
- Pre-processing for print from low-resolution sources
## Key Parameters
- **outscale**: Final upsampling scale factor (default: 4)
- **tile**: Tile size for memory management (0 = no tiling)
- **face_enhance**: Enable GFPGAN face enhancement (default: false)
- **model_name**: Select model variant (RealESRGAN_x4plus, anime_6B, etc.)
- **denoise_strength**: Balance between noise removal and detail preservation (applies only to the realesr-general-x4v3 model)
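The `tile` parameter works by upscaling the image piece by piece so that only one tile sits in VRAM at a time. The sketch below shows the general technique. Each tile is read with a small overlap (an assumed `tile_pad` border) so the model sees context at tile edges, and that border is cropped off after upscaling. A nearest-neighbour function stands in for the network, and all names here are illustrative rather than Real-ESRGAN's actual code.

```python
def upscale_nearest(img, scale):
    """Stand-in for the network: nearest-neighbour upscaling of a 2D list."""
    return [[row[x // scale] for x in range(len(row) * scale)]
            for row in img for _ in range(scale)]


def tiled_upscale(img, scale, tile, tile_pad, model=upscale_nearest):
    """Upscale `img` tile by tile; tile <= 0 means no tiling (whole image at once)."""
    if tile <= 0:
        return model(img, scale)
    h, w = len(img), len(img[0])
    out = [[None] * (w * scale) for _ in range(h * scale)]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            # Padded input region, clipped to the image bounds.
            py0, px0 = max(y0 - tile_pad, 0), max(x0 - tile_pad, 0)
            py1, px1 = min(y1 + tile_pad, h), min(x1 + tile_pad, w)
            patch = [row[px0:px1] for row in img[py0:py1]]
            up = model(patch, scale)
            # Crop the padding back off, working in output coordinates.
            oy, ox = (y0 - py0) * scale, (x0 - px0) * scale
            for dy in range((y1 - y0) * scale):
                out[y0 * scale + dy][x0 * scale:x1 * scale] = \
                    up[oy + dy][ox:ox + (x1 - x0) * scale]
    return out
```

With a position-independent stand-in like nearest-neighbour, the tiled result matches the whole-image result exactly. With a real network, the padding reduces (but may not fully eliminate) visible seams at tile boundaries, so smaller tiles trade speed and seam risk for lower memory use.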

@@ -1,50 +0,0 @@
# Recraft
Recraft is an AI image generation platform known for its V3 model and unique ability to produce both raster and vector (SVG) images from text prompts.
## Model Variants
### Recraft V3
- Top-ranked model on Artificial Analysis Text-to-Image Leaderboard
- Supports raster image generation at $0.04 per image
- Supports vector SVG generation at $0.08 per image
- Accurate text rendering at any size in generated images
### Recraft 20B
- More cost-effective variant at $0.022 per raster image
- Vector generation at $0.044 per image
- Suitable for high-volume production workflows
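The per-image prices above make it easy to estimate batch costs when choosing between the two models. The sketch below uses those listed prices, which may have changed since this page was written. The dictionary keys are illustrative labels, not Recraft API model identifiers. `Decimal` is used so that fractional-cent prices like $0.022 multiply without float rounding error.

```python
from decimal import Decimal

# Per-image USD prices as listed above (verify against current Recraft pricing).
# Keys are illustrative labels, not Recraft API model identifiers.
PRICES = {
    ("v3", "raster"): Decimal("0.04"),
    ("v3", "vector"): Decimal("0.08"),
    ("20b", "raster"): Decimal("0.022"),
    ("20b", "vector"): Decimal("0.044"),
}


def estimate_cost(model: str, fmt: str, count: int) -> Decimal:
    """Estimated USD cost for `count` generations of the given model and format."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return PRICES[(model, fmt)] * count
```

At these prices, 1,000 SVG generations cost $80 on Recraft V3 versus $44 on Recraft 20B. Vector output costs twice as much as raster on both models.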
## Key Features
- Native vector SVG image generation from text prompts
- Accurate text rendering (headlines, labels, signs) in images
- Custom brand style creation from reference images
- Generation in exact brand colors for brand consistency
- AI-powered image vectorization (PNG/JPG to SVG)
- Background removal, creative upscaling, and crisp upscaling
- Multiple style presets: photorealism, clay, retro-pop, hand-drawn, 80s
## Hardware Requirements
- API-only access via Recraft API
- No local hardware requirements
- Available through Recraft Studio web interface
## Common Use Cases
- Logo and icon design (SVG output)
- Brand-consistent marketing asset generation
- Poster and advertisement creation with text
- Scalable vector illustrations for web and print
- Product mockup generation
- SEO blog imagery at scale
## Key Parameters
- **prompt**: Text description of the desired image
- **style**: Visual style (realistic_image, digital_illustration, vector_illustration, icon)
- **colors**: Brand color palette for consistent output
- **format**: Output format (raster PNG/JPG or vector SVG)

Some files were not shown because too many files have changed in this diff.