Compare commits


104 Commits

Author SHA1 Message Date
snomiao
83de5a222e feat: use demowright showTitleCard API for early title overlay
- Import showTitleCard/hideTitleCard from demowright/video-script
- Replace page.evaluate() hack with official demowright API
- CI clones demowright feat/show-title-card-api branch
- demowright PR: https://github.com/snomiao/demowright/pull/11
2026-04-13 09:24:44 +00:00
snomiao
2faadaeab0 fix: early title card covers setup, remove unstable ffmpeg trim
Show title card via page.evaluate() IMMEDIATELY before setup code runs.
Setup (setSetting, setupWorkflowsDirectory) executes behind the card.
Card is removed before createVideoScript() renders its own title.
This ensures the title card is visible from the first frame of the video.
2026-04-13 09:18:43 +00:00
snomiao
cb921ada71 fix: remove autoplay so browser plays video with sound 2026-04-13 07:21:29 +00:00
snomiao
884270c46f feat: show verdict banner with failure reason for non-reproduced bugs
NOT_REPRODUCIBLE and INCONCLUSIVE verdicts now display a prominent
banner with the agent's summary and evidence explaining why the bug
could not be reproduced. Default video playback speed changed to 1x.
2026-04-13 06:27:05 +00:00
snomiao
51e48b55b3 fix: default video playback speed 1x instead of 0.5x 2026-04-13 06:19:56 +00:00
snomiao
07f6611cc8 trigger: re-run QA for #10766 2026-04-12 13:51:17 +00:00
snomiao
55acfdd43b feat: video trim, deploy video-script, require videoScript, verdict mismatch
1. ffmpeg -ss 7: trim ComfyUI loading screen from demo video
2. Deploy video-script.spec.ts to report site + show in HTML
3. videoScript is REQUIRED when verdict=REPRODUCED (prompt + tool desc)
4. Log warning when E2E and video review verdicts disagree
2026-04-12 10:25:18 +00:00
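Making `videoScript` mandatory for a REPRODUCED verdict amounts to a small validation step on the `done()` tool arguments. This is a sketch only: `DoneArgs` and `validateDoneArgs` are hypothetical names, not the pipeline's actual schema.

```typescript
// Verdict values as they appear in this log.
type Verdict = "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE";

// Hypothetical shape of the done() tool arguments (illustrative field names).
interface DoneArgs {
  verdict: Verdict;
  summary: string;
  testCode: string;
  videoScript?: string;
}

// Returns validation errors; an empty array means the call is accepted.
function validateDoneArgs(args: DoneArgs): string[] {
  const errors: string[] = [];
  const hasVideoScript =
    args.videoScript !== undefined && args.videoScript.trim() !== "";
  if (args.verdict === "REPRODUCED" && !hasVideoScript) {
    errors.push("videoScript is required when verdict=REPRODUCED");
  }
  return errors;
}
```

Returning a list instead of throwing lets the tool hand all problems back to the agent in one response.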
snomiao
09ea4ae302 fix: title card first — setup before createVideoScript
- Move all setup code (setSetting, setupWorkflowsDirectory) BEFORE
  createVideoScript() so title card is first frame in video
- Add explicit rules: pace() first, waitForTimeout after actions,
  5s hold on final evidence
2026-04-11 23:51:18 +00:00
snomiao
d698347dfd fix: narrate-then-act pattern for video scripts
Call pace() FIRST in each segment so narration finishes before actions.
Add waitForTimeout(2000) after actions for viewer comprehension.
2026-04-11 23:17:17 +00:00
snomiao
7129c2d702 fix: add demowright/video-script exports + separate videoScript tool
- Patch demowright package.json exports in CI to add ./video-script entry
- Add videoScript param to done() tool so Claude provides separate video script
- Phase 2 uses video-script.spec.ts when available
2026-04-11 21:40:02 +00:00
snomiao
2d8c71d683 feat: separate videoScript in done() tool for Phase 2
- Add videoScript optional param to done() tool
- Claude writes test code (no demowright) + video script (with demowright) separately
- Phase 2 uses video-script.spec.ts if agent provided it, else fallback
- Fixes demowright import failures in Phase 1 (test env has no demowright config)
2026-04-11 20:40:52 +00:00
snomiao
231878918d feat: teach Claude to use demowright createVideoScript
- Add createVideoScript API docs to qa-agent system prompt
- Claude now writes tests with native video scripts (title, segment, outro)
- Phase 2 detects createVideoScript in test code → skip regex injection
- Fallback: still injects annotate() for legacy tests without video scripts
2026-04-11 18:28:57 +00:00
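Taken together with the `videoScript` commits above, the Phase 2 decision reduces to three cases. It uses the agent's separate video script if one exists. Otherwise it runs test code that already calls `createVideoScript` as-is, or it falls back to narration injection for legacy tests. This is a hedged sketch. `planPhase2` and its field names are hypothetical, and detecting `createVideoScript` with a regex is an assumption about how the check is done.

```typescript
// Hypothetical inputs to Phase 2 (illustrative names only).
interface Phase2Inputs {
  testCode: string; // contents of reproduce.spec.ts
  videoScript?: string; // contents of video-script.spec.ts, if provided
}

interface Phase2Plan {
  specFile: "video-script.spec.ts" | "reproduce.spec.ts";
  injectNarration: boolean;
}

function planPhase2(inputs: Phase2Inputs): Phase2Plan {
  // 1. A separate video script from the agent always wins.
  if (inputs.videoScript !== undefined && inputs.videoScript.trim() !== "") {
    return { specFile: "video-script.spec.ts", injectNarration: false };
  }
  // 2. Test code that already builds a native video script needs no injection.
  if (/\bcreateVideoScript\s*\(/.test(inputs.testCode)) {
    return { specFile: "reproduce.spec.ts", injectNarration: false };
  }
  // 3. Legacy tests: fall back to injected narration.
  return { specFile: "reproduce.spec.ts", injectNarration: true };
}
```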
snomiao
5228245561 fix: safe step narration + research log in report
- Replace broken annotate() code-wrapping with safe narrate-before-step
  (inserts narrate call before each step comment, no code wrapping)
- Show research-log.json and reproduce.spec.ts in report HTML
- Fix Phase 2 "No tests found" caused by syntax errors in injected code
2026-04-11 16:37:43 +00:00
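The narrate-before-step approach only adds lines and never wraps existing code, so it cannot produce the syntax errors that broke Phase 2. Below is a minimal sketch of that transformation, assuming steps are marked with `// Step N:` comments. `insertStepNarration` is a hypothetical helper, and `narrate(page, text)` is a placeholder call shape, not a confirmed demowright signature.

```typescript
// Matches a step comment such as "  // Step 2: Reload the page" and captures
// its indentation and text.
const STEP_COMMENT = /^(\s*)\/\/\s*(Step \d+:.*)$/;

// Inserts a narration call on its own line before every step comment.
// The step's own code is copied through unchanged.
function insertStepNarration(source: string): string {
  const out: string[] = [];
  for (const line of source.split("\n")) {
    const match = STEP_COMMENT.exec(line);
    if (match) {
      const indent = match[1];
      const text = match[2].trim();
      // JSON.stringify yields a safely quoted string literal.
      out.push(`${indent}await narrate(page, ${JSON.stringify(text)});`);
    }
    out.push(line);
  }
  return out.join("\n");
}
```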
snomiao
03f2e1d183 fix: intro subtitle + slower demo video
- Use annotate() instead of narrate() for intro (shows subtitle + TTS)
- Increase actionDelay from 500ms to 1500ms for human-readable pacing
- Add testcloud.comfy.org as cloud fallback server
- Fix knip ignoreDependencies
2026-04-10 23:21:52 +00:00
snomiao
a3271cfde8 fix: remove stale knip ignoreDependencies for QA deps 2026-04-10 22:29:22 +00:00
snomiao
3eb8ed0748 feat: auto-clone ComfyUI + findComfyUIDir fallback chain
Server startup: comfy-cli → TEST_COMFYUI_DIR → .comfy-qa/ComfyUI →
auto-clone from GitHub as last resort. Health check before QA run.
2026-04-10 22:27:28 +00:00
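The fallback chain can be modeled as an ordered list of resolvers: the first usable directory wins, and `undefined` tells the caller to auto-clone. In this sketch the `findComfyUIDir` signature, the `DirSource` type, and the `main.py` check are assumptions. The comfy-cli resolver is a stub because the commit does not show how that path is queried.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// One link in the lookup chain: a label for logging plus a resolver.
interface DirSource {
  name: string;
  resolve: () => string | undefined;
}

// Assumption: a ComfyUI checkout is recognized by its main.py entry point.
function looksLikeComfyUI(dir: string): boolean {
  return existsSync(join(dir, "main.py"));
}

// Returns the first candidate that passes the check, or undefined so the
// caller can fall back to cloning ComfyUI from GitHub.
function findComfyUIDir(
  sources: DirSource[],
  isComfyUIDir: (dir: string) => boolean = looksLikeComfyUI,
): { source: string; dir: string } | undefined {
  for (const { name, resolve } of sources) {
    const dir = resolve();
    if (dir && isComfyUIDir(dir)) return { source: name, dir };
  }
  return undefined;
}

// Chain in the order listed in the commit (comfy-cli resolver is a stub).
const defaultSources: DirSource[] = [
  { name: "comfy-cli", resolve: () => undefined },
  { name: "TEST_COMFYUI_DIR", resolve: () => process.env.TEST_COMFYUI_DIR },
  { name: ".comfy-qa/ComfyUI", resolve: () => join(".comfy-qa", "ComfyUI") },
];
```

Injecting `isComfyUIDir` keeps the chain testable without touching the filesystem.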
snomiao
faac186ebe feat: auto-start ComfyUI server via comfy-cli or python
Tries: comfy launch --background --cpu → python main.py --cpu →
clear error with setup instructions. Health check with 120s timeout.
2026-04-10 22:24:51 +00:00
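Here is a sketch of the health-check loop with the 120 s timeout. It takes an injected `check` function so it does not assume a particular endpoint. `waitForHealthy` and the 1 s polling interval are illustrative choices, not taken from the commit.

```typescript
// Polls `check` until it reports healthy or `timeoutMs` elapses.
// Errors thrown by `check` (e.g. connection refused while the server boots)
// are treated as "not ready yet".
async function waitForHealthy(
  check: () => Promise<boolean>,
  timeoutMs = 120_000,
  intervalMs = 1_000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await check()) return true;
    } catch {
      // Server not accepting connections yet; keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Possible check (assumption: the server answers on ComfyUI's usual
// http://127.0.0.1:8188 once ready):
//   const ok = await waitForHealthy(async () => (await fetch("http://127.0.0.1:8188/")).ok);
```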
snomiao
d96d582ce2 fix: use REST API for issue/PR detection
gh pr view --json number falsely matches issues. Use gh api
repos/{repo}/issues/{n} with has("pull_request") for reliable detection.
2026-04-10 21:35:23 +00:00
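The detection relies on GitHub's REST API treating every pull request as an issue and adding a `pull_request` key only to pull requests. In this minimal sketch, `detectType` borrows its name from the helper mentioned in 76ef88f18b, but its signature here is an assumption. `isPullRequest` and the trimmed `IssueResponse` type are hypothetical. The `gh api ... --jq` invocation mirrors the command described above.

```typescript
import { execFileSync } from "node:child_process";

// Minimal subset of the "get an issue" response body.
interface IssueResponse {
  number: number;
  title: string;
  pull_request?: { url: string };
}

// Classify a parsed response: only pull requests carry `pull_request`.
function detectType(issue: IssueResponse): "issue" | "pr" {
  return issue.pull_request != null ? "pr" : "issue";
}

// Same check done by gh: `--jq` evaluates the expression and prints true/false.
function isPullRequest(repo: string, n: number): boolean {
  const out = execFileSync(
    "gh",
    ["api", `repos/${repo}/issues/${n}`, "--jq", 'has("pull_request")'],
    { encoding: "utf8" },
  );
  return out.trim() === "true";
}
```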
snomiao
76ef88f18b refactor: tidy QA CLI — extract functions, validate target, dedup
- Extract fetchIssue/fetchPR/resolveTarget/detectType helpers
- Move constants to top (SCRIPT_DIR, RECORD_SCRIPT, DEFAULT_REPO)
- Add VALID_TARGETS validation for -t/--target
- Shared writeTmpFile/resolveOutputDir/logHeader/exit utilities
- Dispatch via runUncommitted/runTarget for clearer flow
2026-04-10 21:14:46 +00:00
snomiao
b42fd03213 fix: show help on parse errors for agent-friendly CLI 2026-04-10 20:44:43 +00:00
snomiao
e22daf415a feat: rewrite QA CLI with node:util parseArgs
- Replace hand-written arg parser with parseArgs (zero deps)
- Add -t/--target <head|base|both> for PR phase selection
- Add --uncommitted for testing local changes
- Cleaner help output with examples for all modes
2026-04-10 20:43:22 +00:00
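The sketch below shows `parseArgs` configured for the flags listed above. It also includes the `VALID_TARGETS` check from the later refactor and a help-on-error wrapper. Option names come from the commit messages. `parseQaArgs`, `runCli`, and the default target of `head` (the PR default described in 86e61d7f16) are assumptions about how these pieces fit together.

```typescript
import { parseArgs } from "node:util";

const VALID_TARGETS = ["head", "base", "both"] as const;
type Target = (typeof VALID_TARGETS)[number];

interface QaOptions {
  input?: string; // issue/PR URL or number
  target: Target;
  uncommitted: boolean;
}

// parseArgs runs in strict mode by default, so unknown flags throw.
function parseQaArgs(argv: string[]): QaOptions {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true,
    options: {
      target: { type: "string", short: "t" },
      uncommitted: { type: "boolean" },
    },
  });
  const target = (values.target as string | undefined) ?? "head";
  if (!(VALID_TARGETS as readonly string[]).includes(target)) {
    throw new Error(`Invalid --target "${target}"; expected ${VALID_TARGETS.join("|")}`);
  }
  return {
    input: positionals[0],
    target: target as Target,
    uncommitted: values.uncommitted === true,
  };
}

// Agent-friendly behavior: print the error and the help text instead of a stack trace.
function runCli(argv: string[], helpText: string): QaOptions | undefined {
  try {
    return parseQaArgs(argv);
  } catch (err) {
    console.error(err instanceof Error ? err.message : String(err));
    console.log(helpText);
    process.exitCode = 1;
    return undefined;
  }
}
```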
snomiao
86e61d7f16 feat: add --base flag for PR before/after testing
PR default: test head only (demonstrate the fix)
PR --base: also reproduce bug on base before testing head
Issue: always runs reproduce mode
2026-04-10 20:36:16 +00:00
snomiao
218e44d112 feat: auto-detect issue vs PR + save PR base/head refs
- Remove pr: prefix, just `pnpm qa 10253` auto-detects via gh API
- Fetch PR baseRefOid/headRefOid and save to refs.json
- Output dir simplified to .comfy-qa/<number>/
- Extract runQaRecord() for future before/after dual-phase PR testing
2026-04-10 20:34:11 +00:00
snomiao
14187df785 feat: simplified QA CLI + move scripts to skill folder
- Add `pnpm qa <url|number>` entry point with dotenv auto-loading
  Accepts GitHub issue/PR URLs, shorthand (10253, pr:10270)
  Auto-fetches data via `gh` CLI, outputs to .comfy-qa/
- Move qa-* scripts from scripts/ to .claude/skills/comfy-qa/scripts/
- Fix TypeScript errors: add type annotations, fix browserTestFile ref
- Add .comfy-qa/ to .gitignore
2026-04-10 20:12:08 +00:00
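Parsing the accepted input forms (issue/PR URL, bare number, `pr:` shorthand) could look like the sketch below. `parseTarget` and `ParsedTarget` are hypothetical. A bare number leaves `kind` unset because, per the later commits, the type is then detected through the GitHub API.

```typescript
interface ParsedTarget {
  repo?: string; // "owner/name" when a URL was given
  number: number;
  kind?: "issue" | "pr"; // undefined: detect via the API
}

function parseTarget(input: string): ParsedTarget {
  const trimmed = input.trim();
  // https://github.com/<owner>/<repo>/(issues|pull)/<n>
  const url = /^https:\/\/github\.com\/([^/]+\/[^/]+)\/(issues|pull)\/(\d+)/.exec(trimmed);
  if (url) {
    return { repo: url[1], number: Number(url[3]), kind: url[2] === "pull" ? "pr" : "issue" };
  }
  // "10253" or "pr:10270"
  const short = /^(pr:)?(\d+)$/.exec(trimmed);
  if (short) {
    return { number: Number(short[2]), kind: short[1] ? "pr" : undefined };
  }
  throw new Error(`Unrecognized target: ${input}`);
}
```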
snomiao
d821a94c59 docs: add QA skill env vars to .env_example
GEMINI_API_KEY (required) — PR analysis, video review, TTS narration
ANTHROPIC_API_KEY (optional locally) — Claude Agent SDK auto-detects
Claude Code session; required in CI as GitHub Actions secret

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 19:54:42 +00:00
snomiao
4fd9d723e0 feat: annotate() wraps each test step with TTS narration + subtitle
Uses demowright's annotate(page, text, callback) pattern to narrate
each "// Step N: ..." block while running the actions in parallel.
The viewer hears "Step 1: Save the workflow" while seeing the save
happen. Falls back to running the code directly if annotate fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 19:50:16 +00:00
snomiao
448c76c318 feat: autoplay with sound, remove muted default
Videos now autoplay with TTS narration audible by default.
Removed muted attribute from video tags and the JS pause-on-load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 19:47:46 +00:00
snomiao
691242afcd feat: enable demowright TTS narration + audio capture + autoAnnotate
- audio: true — captures browser audio + TTS narration
- autoAnnotate: true — auto-generates narration from test steps
- actionDelay: 500ms (was 300ms) for more readable pacing
- Gemini TTS auto-detected via GEMINI_API_KEY in register.cjs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 17:20:50 +00:00
snomiao
49efcfe36a fix: regenerate lockfile with new deps in workspace catalog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 11:27:13 +00:00
snomiao
60bf788050 fix: use withDemowright() config + add @google/generative-ai dep
- Switch from broken register.cjs to withDemowright() config wrapper
  that creates temp playwright.qa.config.ts for Phase 2 recording
- Add @google/generative-ai as explicit dev dependency (was only
  transitive, causing ESLint import-x/no-unresolved error)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 11:22:26 +00:00
snomiao
14e194ab34 fix: add --no-sandbox to PLAYWRIGHT_LOCAL launch options
Headless Chrome crashes in some environments without sandbox flags.
Only applies when PLAYWRIGHT_LOCAL=1 (QA Phase 2 recording).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 08:56:26 +00:00
snomiao
8664d2201e fix: use demowright from npm (^2.0.9) instead of local link
The local link (link:/tmp/demowright-test) doesn't exist in CI.
Changed to npm package for HUD overlay in QA test videos:
- Visible cursor with click ripples
- Keystroke display as HUD badges
- Auto-slowdown between actions
- Subtitles and TTS narration support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 08:51:10 +00:00
sno
c1f7e03c1c feat: enable demowright TTS audio capture in QA videos
- Set QA_HUD_AUDIO=1 to enable demowright audio pipeline
- Install ffmpeg in qa-before/after jobs for audio muxing
- After Phase 2, check .demowright/ for rendered MP4 with audio
- Add narration detection to Gemini video review prompt
- Increase Phase 2 timeout for TTS fetch latency

Amp-Thread-ID: https://ampcode.com/threads/T-019d7181-cd11-7255-9f13-fecc658d2751
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
snomiao
10d6e93197 fix: patch demowright register.cjs to use dist/setup.mjs, simplify Phase 2
register.cjs imports ./src/setup.ts (TS source) which doesn't work
when installed as a dependency. Patch it to ./dist/setup.mjs after build.

Simplify Phase 2 back to --require demowright/register approach (avoids
ESM/CJS issues with withDemowright config).
2026-04-10 08:51:10 +00:00
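The register.cjs patch amounts to a single path substitution applied after the package is built. This minimal TypeScript sketch assumes the file references the literal string `./src/setup.ts`. `patchRegister` and `patchRegisterFile` are hypothetical, and the actual CI step may perform the substitution differently (for example with `sed`).

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Point the TS-source import at the built ESM entry instead.
function patchRegister(source: string): string {
  return source.split("./src/setup.ts").join("./dist/setup.mjs");
}

// Rewrites the file only when something changed; returns whether it did.
function patchRegisterFile(path: string): boolean {
  const before = readFileSync(path, "utf8");
  const after = patchRegister(before);
  if (after !== before) writeFileSync(path, after);
  return after !== before;
}
```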
snomiao
cc85e699ae feat: use demowright withDemowright() config for proper HUD integration
Uses withDemowright() to wrap a dedicated Playwright config instead of
NODE_OPTIONS register approach. This properly patches Browser.newContext
so cursor overlay, keystroke badges, and action delays work reliably.

Injects annotate() title card showing the issue title at video start.
Falls back to register approach if demowright config fails.

Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
snomiao
2394016036 feat: inject demowright annotate() title card into reproduce videos
Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
snomiao
72abaefe42 chore: retrigger QA runs 2026-04-10 08:51:10 +00:00
snomiao
d5b6ceead7 fix: install demowright from GitHub repo for latest unpublished changes
Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
snomiao
94561270ef fix: make analyze-pr non-blocking, use apt for ffmpeg
Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
snomiao
e04d050ad3 fix: use apt-get for ffmpeg install (johnvansickle.com unreliable)
Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
snomiao
c930bbaf8e feat: use demowright for reproduce videos, combine E2E+video verdicts
- Replace hand-rolled cursor overlay and waitForTimeout injection with
  demowright's automatic HUD (cursor, keystroke display, 300ms delay)
- Install demowright in both qa-before and qa-after CI jobs
- Always check video review verdicts (not just as fallback)
- Upgrade reproducedBy to 'both' when E2E and video agree
- Prevent false REPRODUCED from discovery/debug tests
- Ban discovery tests in agent system prompt

Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
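The sketch below combines the two verdicts while keeping Playwright assertions as the source of truth. Only the `'both'` upgrade and the disagreement warning (55acfdd43b) are described in the log. Three parts are assumptions: a video-only REPRODUCED never upgrades the result, the `'e2e'` value of `reproducedBy`, and the warning text.

```typescript
type Verdict = "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE";

interface CombinedVerdict {
  verdict: Verdict;
  reproducedBy?: "e2e" | "both";
  mismatch: boolean;
}

function combineVerdicts(e2e: Verdict, video: Verdict): CombinedVerdict {
  const mismatch = e2e !== video;
  if (mismatch) {
    console.warn(`E2E verdict ${e2e} disagrees with video review verdict ${video}`);
  }
  // Assertions decide; the video review can only corroborate a REPRODUCED result.
  if (e2e !== "REPRODUCED") return { verdict: e2e, mismatch };
  return {
    verdict: e2e,
    reproducedBy: video === "REPRODUCED" ? "both" : "e2e",
    mismatch,
  };
}
```

Letting the video only corroborate is one way to honor the "0 false positives" goal. A vision model that imagines a bug cannot flip the result on its own.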
snomiao
7520a47b5a feat: add automated QA pipeline with E2E test-driven bug reproduction
Three-phase pipeline triggered by labels (qa-changes, qa-full, qa-issue):
1. Research: Claude writes Playwright E2E tests to reproduce reported bugs
2. Reproduce: Deterministic replay with video recording
3. Report: Deploy results to Cloudflare Pages with badges

Key design decisions:
- Playwright assertions are source of truth (not AI vision)
- Agent has readFixture/readTest tools to discover project patterns
- Bug-specific assertions required (trivial assertions banned)
- Main branch dist cached by SHA to speed up before/after comparisons
- QA deps installed inline in CI (no package.json changes needed)

Verified across 48 runs (22 PRs + 26 issues) with 0 false positives.

Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
2026-04-10 08:51:10 +00:00
Alexander Brown
0132c77c7d test: harden 82 Playwright specs for deterministic CI runs (#10967)
## Summary

Harden 98 E2E spec files and 8 fixtures/helpers for deterministic CI
runs by replacing race-prone patterns with retry-safe alternatives.

No source code changes -- only `browser_tests/` is touched.

## Changes

- **E2E spec hardening** (98 spec files, 6 fixtures, 2 helpers):

  | Fix class | Sites | Examples |
  |-----------|-------|----------|
  | `expect(await ...)` -> `expect.poll()` | ~153 | interaction, defaultKeybindings, workflows, featureFlags |
  | `const x = await loc.count(); expect(x)` -> `toHaveCount()` | ~19 | menu, linkInteraction, assets, bottomPanelShortcuts |
  | `nextFrame()` -> `waitForHidden()` after menu clicks | ~22 | contextMenu, rightClickMenu, subgraphHelper |
  | Redundant `nextFrame()` removed | many | defaultKeybindings, minimap, builderSaveFlow |
  | `expect(async () => { ... }).toPass()` retry blocks | 5 | interaction (graphdialog dismiss guard) |
  | `force:true` removed from `BaseDialog.close()` | 1 | BaseDialog fixture |
  | ContextMenu `waitForHidden` simplified (check-then-act race removed) | 1 | ContextMenu fixture |
  | Non-deterministic node order -> proximity-based selection | 1 | interaction (toggle dom widget) |
  | Tight poll timeout (250ms) -> >=2000ms | 2 | templates |

- **Helper improvements**: Exposed locator getters on
`ComfyPage.domWidgets`, `ToastHelper.toastErrors`, and
`WorkflowsSidebarTab.activeWorkflowLabel` so callers can use retrying
assertions (`toHaveCount()`, `toHaveText()`) directly.

- **Flake pattern catalog**: Added section 7 table to
`browser_tests/FLAKE_PREVENTION_RULES.md` documenting 8 pattern classes
for reviewers and future authors.

- **Docs**: Fixed bad examples in `browser_tests/README.md` to use
`expect.poll()`.

- **Breaking**: None
- **Dependencies**: None
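The `expect(await ...)` -> `expect.poll()` row replaces a single read with repeated reads that continue until the condition holds or a timeout expires. The framework-free sketch below shows only that retry idea. `pollUntil` is hypothetical and is not how Playwright implements `expect.poll()` or `toHaveCount()`. The 2000 ms default echoes the minimum timeout in the table.

```typescript
// Re-reads a value until `predicate` accepts it, or throws after `timeoutMs`.
async function pollUntil<T>(
  read: () => Promise<T> | T,
  predicate: (value: T) => boolean,
  { timeoutMs = 2000, intervalMs = 50 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let last = await read();
  while (!predicate(last)) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms; last value: ${String(last)}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    last = await read();
  }
  return last;
}
```

A single `expect(await read())` snapshots one moment, so it fails if the UI has not caught up yet. The loop above tolerates that delay up to the timeout.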

## Review Focus

- All fixes follow the rules in
`browser_tests/FLAKE_PREVENTION_RULES.md`
- No behavioral changes to tests -- only timing/retry strategy is
updated
- The `ContextMenu.waitForHidden` simplification removes a
swallowed-error anti-pattern; both locators now use direct `waitFor({
state: 'hidden' })`

---------

Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: github-actions <github-actions@github.com>
2026-04-09 20:50:56 -07:00
Terry Jia
63eab15c4f Range editor (#10936)
BE change https://github.com/Comfy-Org/ComfyUI/pull/13322

## Summary
Add RANGE widget for image levels adjustment
- Add RangeEditor widget with three display modes: plain, gradient, and
histogram
- Support optional midpoint (gamma) control for non-linear midtone
adjustment
- Integrate histogram display from upstream node outputs

## Screenshots (if applicable)
<img width="1450" height="715" alt="image"
src="https://github.com/user-attachments/assets/864976af-9eb7-4dd0-9ce1-2f5d7f003117"
/>
<img width="1431" height="701" alt="image"
src="https://github.com/user-attachments/assets/7ee2af65-f87a-407b-8bf2-6ec59a1dff59"
/>
<img width="705" height="822" alt="image"
src="https://github.com/user-attachments/assets/7bcb8f17-795f-498a-9f8a-076ed6c05a98"
/>

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10936-Range-editor-33b6d73d365081089e8be040b40f6c8a)
by [Unito](https://www.unito.io)
2026-04-09 18:37:40 -07:00
Terry Jia
277ee5c32e test: add E2E tests for Load3D model upload and drag-drop and basic e2e for 3d viewer (#10957)
## Summary
Add tests verifying real model loading:
- Upload cube.obj via file chooser button
- Drag-and-drop cube.obj onto the 3D canvas
- Add data-testid to LoadingOverlay for stable test selectors.
Add tests verifying 3D viewer opening:
- Open viewer from Load3D node via expand button, verify canvas and
controls sidebar
- Cancel button closes the viewer dialog

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10957-test-add-E2E-tests-for-Load3D-model-upload-and-drag-drop-and-basic-e2e-for-3d-viewer-33c6d73d3650810c8ff8ed656a5164a6)
by [Unito](https://www.unito.io)

---------

Co-authored-by: github-actions <github-actions@github.com>
2026-04-09 21:36:07 -04:00
Dante
e8787dee9d fix: prevent node context menu from overflowing viewport on desktop (#10854)
## Summary

The node "More Options" (⋮) context menu had `md:max-h-none
md:overflow-y-visible` responsive overrides that removed the height
constraint and scrollability on desktop (768px+). When the menu had many
items (e.g., KSampler), items below the viewport fold were inaccessible
with no scrollbar.

Removed the desktop overrides so `max-h-[80vh] overflow-y-auto` applies
at all screen sizes.

- Fixes #10824

## Red-Green Verification

| Commit | CI Status | Purpose |
|--------|-----------|---------|
| `test: add failing test for node context menu viewport overflow` | 🔴 Red | Proves the test catches the bug |
| `fix: prevent node context menu from overflowing viewport on desktop` | 🟢 Green | Proves the fix resolves the bug |

## Test Plan

- [ ] CI red on test-only commit
- [ ] CI green on fix commit
- [ ] Manual verification: zoom out to 50%, open node More Options menu,
verify last item ("Remove") is scrollable

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10854-fix-prevent-node-context-menu-from-overflowing-viewport-on-desktop-3396d73d365081989403c981847aeda6)
by [Unito](https://www.unito.io)
2026-04-10 10:31:04 +09:00
Dante
ba0bab3e50 test: add E2E tests for ManagerDialog (#10970)
## Summary
- Add Playwright E2E tests for the ManagerDialog component which had
zero test coverage
- Covers dialog opening, pack browsing, search filtering, tab
navigation, sort controls, search mode switching, error states, install
action, info panel, and close behavior
- All API calls (Algolia search, Manager installed packs, queue) are
mocked with typed responses

Part of the FixIt Burndown test coverage initiative.

## Test plan
- [ ] CI browser tests pass
- [ ] Tests validate core ManagerDialog user flows with mocked APIs
- [ ] No regressions in existing tests

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10970-test-add-E2E-tests-for-ManagerDialog-33c6d73d365081468a4ad8fc1894f05b)
by [Unito](https://www.unito.io)
2026-04-10 10:26:35 +09:00
Kelly Yang
bbb07053c4 test: add E2E tests for CanvasModeSelector toolbar component (#10934)
Adds `browser_tests/tests/canvasModeSelector.spec.ts`, covering the
canvas toolbar mode-selector component that was introduced with no E2E
coverage.
Covers:
- Trigger button: toolbar visibility, `aria-expanded` state, icon
reflects active mode
- Popover lifecycle: open on click, close on re-click / item selection /
Escape
- Mode switching: clicking Hand/Select drives `canvas.state.readOnly`;
clicking the active item is a no-op
- ARIA state: `aria-checked` and roving `tabindex` track active mode,
including state driven by external commands
- Keyboard navigation: ArrowDown/Up with wraparound, Escape restores
focus to trigger — all using `toBeFocused()` retrying assertions
- Focus management: popover auto-focuses the checked item on open
- Keybinding integration: `H` / `V` keys update both
`canvas.state.readOnly` and the trigger icon
- Shortcut hint display: both menu items render non-empty key-sequence
hints

22 tests across 7 `describe` groups. All selectors are ARIA-driven.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10934-test-add-E2E-tests-for-CanvasModeSelector-toolbar-component-33b6d73d3650819cb2cfdca22bf0b9a5)
by [Unito](https://www.unito.io)
2026-04-09 20:44:30 -04:00
Christian Byrne
97fca566fb fix: use || instead of ?? and server type in WebcamCapture upload path (#11000)
## Description

Fixes the WebcamCapture image upload path construction that was still
broken on cloud environments after #10220.

### Root cause

The cloud `/upload/image` endpoint returns:
```json
{ "name": "hash.png", "subfolder": "", "type": "input" }
```

The previous fix used `??` (nullish coalescing), which doesn't catch
empty strings:
- `subfolder: ""` → `"" ?? "webcam"` = `""` → path becomes `/hash.png`
(wrong)
- `type` was hardcoded as `[temp]` but cloud stores as `input` → file
not found

### Fix

- `??` → `||` so empty strings fall back to defaults
- Use `data.type` from server response instead of hardcoding `[temp]`
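Here is the difference between the two operators on the cloud response shown above, as a standalone sketch. The path format is inferred from the `webcam/... [temp]` path in the QA evidence. The function names are hypothetical, and the `"temp"` fallback for a missing `type` is an assumption.

```typescript
// Shape of the cloud /upload/image response quoted above.
interface UploadResponse {
  name: string;
  subfolder?: string;
  type?: string;
}

// Before: `??` only replaces null/undefined, so "" survives; type is hardcoded.
function buildPathBefore(data: UploadResponse): string {
  return `${data.subfolder ?? "webcam"}/${data.name} [temp]`;
}

// After: `||` also replaces "", and the type comes from the server.
function buildPathAfter(data: UploadResponse): string {
  return `${data.subfolder || "webcam"}/${data.name} [${data.type || "temp"}]`;
}

const cloud: UploadResponse = { name: "hash.png", subfolder: "", type: "input" };
console.log(buildPathBefore(cloud)); // "/hash.png [temp]"
console.log(buildPathAfter(cloud)); // "webcam/hash.png [input]"
```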

### QA evidence

Prod (cloud/1.42): `ImageDownloadError: the input file
'webcam/1775685296883.png [temp]' doesn't exist`
Staging (cloud/1.42): `ImageDownloadError: Failed to validate images`

### Related

- Fixes the remaining issue from #10220

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-11000-fix-use-instead-of-and-server-type-in-WebcamCapture-upload-path-33d6d73d36508156b93cfce0aae8e017)
by [Unito](https://www.unito.io)
2026-04-09 16:08:45 -07:00
Christian Byrne
c6b8883e61 [chore] Update Ingest API types from cloud@48d94b7 (#10925)
## Automated Ingest API Type Update

This PR updates the Ingest API TypeScript types and Zod schemas from the
latest cloud OpenAPI specification.

- Cloud commit: 48d94b7
- Generated using @hey-api/openapi-ts with Zod plugin

These types cover cloud-only endpoints (workspaces, billing, secrets,
assets, tasks, etc.).
Overlapping endpoints shared with the local ComfyUI Python backend are
excluded.

---------

Co-authored-by: MillerMedia <7741082+MillerMedia@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>
2026-04-09 14:41:53 -07:00
Christian Byrne
8487c13f14 feat: integrate Typeform survey into feedback button (#10890)
## Summary

Replace Zendesk feedback URLs with Typeform survey (`q7azbWPi`) in the
action bar feedback button and Help Center menu for Cloud/Nightly
distributions.

## Changes

- **What**:
  - `cloudFeedbackTopbarButton.ts`: Replace `buildFeedbackUrl()` (Zendesk)
    with the direct Typeform survey URL. Remove unused Zendesk import.
  - `HelpCenterMenuContent.vue`: The feedback menu item now opens the
    Typeform URL for Cloud/Nightly builds and falls back to
    `Comfy.ContactSupport` (Zendesk) for other distributions. Added an
    external link icon for Cloud/Nightly.
  - The Help menu item and `Comfy.ContactSupport` command are unchanged;
    support flows still route to Zendesk.

## Review Focus

- Gating logic: `isCloud || isNightly` correctly limits Typeform
redirect to intended distributions
- Help item intentionally unchanged (support ≠ feedback)

Ticket: COM-17992

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10890-feat-integrate-Typeform-survey-into-feedback-button-33a6d73d36508185abbfe57e7a36b5f7)
by [Unito](https://www.unito.io)
2026-04-09 14:40:52 -07:00
jaeone94
809da9c11c fix: use cloud assets for asset widget default value (#10983)
## Summary

In cloud mode, asset-supported nodes (e.g. CheckpointLoaderSimple) used
the server's `object_info` combo options as their default widget value.
These options list local files on the backend which may not exist in the
user's cloud asset library. When the missing-model pipeline runs (on
undo, reload, or tab switch), it checks widget values against cloud
assets and correctly flags these local-only files as missing — producing
errors that appear to be false positives but are actually valid
detections of unusable defaults.

This PR changes the default value source from server combo options to
the cloud assets store.

## Default Value Behavior (Before → After)

### Cloud + asset-supported widgets (changed)

| Condition | Before | After |
|-----------|--------|-------|
| Assets cached, `inputSpec.default` in assets | `inputSpec.default` | `inputSpec.default` |
| Assets cached, `inputSpec.default` not in assets | `inputSpec.default` | `assets[0]` |
| Assets cached, no `inputSpec.default`, `options` exist | `options[0]` | `assets[0]` |
| Assets not cached, `inputSpec.default` exists | `inputSpec.default` | `undefined` → "Select model" |
| Assets not cached, no `inputSpec.default`, `options` exist | `options[0]` | `undefined` → "Select model" |
| Assets not cached, no `inputSpec.default`, no `options` | `undefined` → "Select model" | `undefined` → "Select model" |

### Cloud + non-asset widgets (unchanged)

| Condition | Behavior |
|-----------|----------|
| `inputSpec.default` exists | `inputSpec.default` |
| `options` exist | `options[0]` |
| `remote` input | `"Loading..."` |
| None | `undefined` |

### OSS (unchanged)

| Condition | Behavior |
|-----------|----------|
| `inputSpec.default` exists | `inputSpec.default` |
| `options` exist | `options[0]` |
| `remote` input | `"Loading..."` |
| None | `undefined` |

## Root Cause

1. `addComboWidget` called `getDefaultValue(inputSpec)` which returns
`inputSpec.options[0]` — a local file from `object_info`
2. In cloud mode, `shouldUseAssetBrowser()` creates an asset widget with
this local filename as default
3. The model (e.g.
`dynamicrafter/controlnet/dc-sketch_encoder_fp16.safetensors`) exists on
the server but not in the user's cloud asset library
4. On undo/reload, `verifyAssetSupportedCandidates()` checks the widget
value against cloud assets → not found → marked as missing

## Changes

### Production (`useComboWidget.ts`)
- New `resolveCloudDefault(nodeType, specDefault)` function encapsulates
cloud default resolution
- Default priority: `inputSpec.default` (if found in cloud assets) →
first cloud asset → `undefined` (shows "Select model" placeholder)
- Edge case guards: `!= null` check for falsy defaults, `|| undefined`
for empty `getAssetFilename` return
- Server combo options (`object_info`) are no longer used as defaults
for asset widgets
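The default priority can be sketched as a pure function. The real `resolveCloudDefault(nodeType, specDefault)` reads from the assets store. This hypothetical `resolveCloudDefaultSketch` instead takes the cached asset filenames directly, with `undefined` meaning "not cached". Treating an empty cached list like an uncached one is an assumption.

```typescript
// Returns the widget default for a cloud asset widget, or undefined so the
// UI shows the "Select model" placeholder.
function resolveCloudDefaultSketch(
  assetFilenames: string[] | undefined,
  specDefault: string | undefined,
): string | undefined {
  // Assets not cached: never fall back to server (object_info) options.
  if (!assetFilenames || assetFilenames.length === 0) return undefined;
  // `!= null` so a falsy-but-present default is still considered.
  if (specDefault != null && assetFilenames.includes(specDefault)) {
    return specDefault;
  }
  // First cloud asset; `|| undefined` guards against an empty filename.
  return assetFilenames[0] || undefined;
}
```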

### Unit Tests (`useComboWidget.test.ts`)
- 6 scenarios covering all default value paths:
  - Cloud assets loaded, no `inputSpec.default` → `assets[0]`
  - Cloud assets loaded, `inputSpec.default` in assets → uses default
  - Cloud assets loaded, `inputSpec.default` not in assets → `assets[0]`
  - No cloud assets, with `inputSpec.default` → placeholder
  - No cloud assets, with server options → placeholder
  - Asset widget creation verification
- Test helper refactored: assertions moved from helper to each test for
clarity

### E2E Test (`cloud-asset-default.spec.ts`)
- New `@cloud` tagged test verifying CheckpointLoaderSimple uses first
cloud asset, not server default
- Fixture extension stubs `/api/assets` before app loads (local backend
returns 503 for this endpoint)
- Uses typed mock data from existing `assetFixtures.ts`

## Scope

- **Cloud only**: All changes gated behind `isCloud` +
`shouldUseAssetBrowser()`
- **OSS impact**: None — code path is not entered in non-cloud builds
- **Breaking changes**: None — `useComboWidget` export signature
unchanged

## Review Focus
- Should the `/api/assets` stub in the E2E fixture extension be moved
into `ComfyPage` for all `@cloud` tests?

## Record

Before

https://github.com/user-attachments/assets/994162a0-b56a-4e84-9b1c-d0f0068196d5

After

https://github.com/user-attachments/assets/ba299990-9bd3-4565-bd09-bffac3db60a9
2026-04-09 15:44:04 +09:00
Dante
65d1313443 fix: preserve CustomCombo options through clone and paste (#10853)
## Summary

- Fix `CustomCombo` copy/paste so the combo keeps its option list and
selected value
- Scope the fix to `src/extensions/core/customWidgets.ts` instead of
changing LiteGraph core deserialization
- Replace the previous round-trip test with a regression test that
exercises the actual clone/paste lifecycle

- Fixes #9927

## Root Cause

`CustomCombo` option widgets override `value` to read from
`widgetValueStore`.
During `node.clone()` and clipboard paste, `configure()` restores widget
values before the new node is added to the graph and before those
widgets are registered in the store.
That meant the option widgets read back as empty while `updateCombo()`
was rebuilding the combo state, so `comboWidget.options.values` became
blank on the pasted node.

## Fix

Keep a local fallback value for each generated `option*` widget in
`customWidgets.ts`.
The getter now returns the store-backed value when available and falls
back to the locally restored value before store registration.
This preserves the option list during `clone().serialize()` and paste
without hard-coding `CustomCombo` behavior into
`LGraphNode.configure()`.
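The following is a self-contained model of the fallback getter. The option widget keeps a local copy of its value and reads from the store only once it is registered. `WidgetValueStore` and `OptionWidget` here are simplified stand-ins, not the actual ComfyUI or LiteGraph classes.

```typescript
// Stand-in store: a widget's value is only readable after registration.
class WidgetValueStore {
  private values = new Map<string, unknown>();
  register(id: string, value: unknown): void {
    this.values.set(id, value);
  }
  has(id: string): boolean {
    return this.values.has(id);
  }
  get(id: string): unknown {
    return this.values.get(id);
  }
  set(id: string, value: unknown): void {
    this.values.set(id, value);
  }
}

// Option widget whose value prefers the store, falling back to a local copy
// while unregistered (e.g. while configure() runs before graph.add()).
class OptionWidget {
  private localValue: unknown = "";
  constructor(
    private readonly id: string,
    private readonly store: WidgetValueStore,
  ) {}

  get value(): unknown {
    return this.store.has(this.id) ? this.store.get(this.id) : this.localValue;
  }

  set value(next: unknown) {
    this.localValue = next; // always keep the fallback current
    if (this.store.has(this.id)) this.store.set(this.id, next);
  }
}
```

Without `localValue`, the getter would return `undefined` between `configure()` and `graph.add()`. That empty read is how the pasted combo lost its options.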

## Why No E2E Test

This regression happens in the internal LiteGraph clipboard lifecycle:
`clone() -> serialize() -> createNode() -> configure() -> graph.add()`.
The failing state is the transient pre-add relationship between
`CustomCombo`'s store-backed option widgets and
`comboWidget.options.values`, which is not directly exposed through a
stable DOM assertion in the current Playwright suite.
A focused unit regression test is the most direct way to cover that
lifecycle without depending on brittle canvas interaction timing.

## Test Plan

- [x] Regression test covers `clone().serialize() -> createNode() ->
configure() -> graph.add()` for `CustomCombo`
- [ ] CI on the latest two commits (`81ac6d2ce`, `94147caf1`)
- [ ] Manual: create `CustomCombo` -> add `alpha`, `beta`, `gamma` ->
select `beta` -> copy/paste -> verify the pasted combo still shows all
three options and keeps `beta` selected
2026-04-09 12:35:20 +09:00
Alexander Brown
f90d6cf607 test: migrate 132 test files from @vue/test-utils to @testing-library/vue (#10965)
## Summary

Migrate 132 test files from `@vue/test-utils` (VTU) to
`@testing-library/vue` (VTL) with `@testing-library/user-event`,
adopting user-centric behavioral testing patterns across the codebase.

## Changes

- **What**: Systematic migration of component/unit tests from VTU's
`mount`/`wrapper` API to VTL's `render`/`screen`/`userEvent` API across
132 files in `src/`
- **Breaking**: None — test-only changes, no production code affected

### Migration breakdown

| Batch | Files | Description |
|-------|-------|-------------|
| 1 | 19 | Simple render/assert tests |
| 2A | 16 | Interactive tests with user events |
| 2B-1 | 14 | Interactive tests (continued) |
| 2B-2 | 32 | Interactive tests (continued) |
| 3A–3E | 51 | Complex tests (stores, composables, heavy mocking) |
| Lint fix | 7 | `await` on `fireEvent` calls for `no-floating-promises` |
| Review fixes | 15 | Address CodeRabbit feedback (3 rounds) |

### Review feedback addressed

- Removed class-based assertions (`text-ellipsis`, `pr-3`, `.pi-save`,
`.skeleton`, `.bg-black\/15`, Tailwind utilities) in favor of
behavioral/accessible queries
- Added null guards before `querySelector` casts
- Added `expect(roots).toHaveLength(N)` guards before indexed NodeList
access
- Wrapped fake timer tests in `try/finally` for guaranteed cleanup
- Split double-render tests into focused single-render tests
- Replaced CSS class selectors with
`screen.getByText`/`screen.getByRole` queries
- Updated stubs to use semantic `role`/`aria-label` instead of CSS
classes
- Consolidated redundant edge-case tests
- Removed manual `document.body.appendChild` in favor of VTL container
management
- Used distinct mock return values to verify command wiring

### VTU holdouts (2 files)

These files intentionally retain `@vue/test-utils` because their
components use `<script setup>` without `defineExpose`, making internal
computed properties and methods inaccessible via VTL:

1. **`NodeWidgets.test.ts`** — partial VTU for `vm.processedWidgets`
2. **`WidgetSelectDropdown.test.ts`** — full VTU for heavy
`wrapper.vm.*` access

## Follow-up

Deferred items (`ComponentProps` typing, camelCase listener props)
tracked in #10966.

## Review Focus

- Test correctness: all migrated tests preserve original behavioral
coverage
- VTL idioms: proper use of `screen` queries, `userEvent`, and
accessibility-based selectors
- The 2 VTU holdout files are intentional, not oversights

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10965-test-migrate-132-test-files-from-vue-test-utils-to-testing-library-vue-33c6d73d36508199a6a7e513cf5d8296)
by [Unito](https://www.unito.io)

---------

Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: Christian Byrne <cbyrne@comfy.org>
2026-04-08 19:21:42 -07:00
Christian Byrne
2c34d955cb feat(website): add zh-CN translations for homepage and secondary pages (#10157)
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10157-feat-website-add-zh-CN-translations-for-homepage-and-secondary-pages-3266d73d3650811f918cc35eca62a4bc)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-08 19:18:19 -07:00
Christian Byrne
8b6c1b3649 refactor: consolidate SubscriptionTier type (#10487)
## Summary

Consolidate the `SubscriptionTier` type from 3 independent definitions
into a single source of truth in `tierPricing.ts`.

## Changes

- **What**: Exported `SubscriptionTier` from `tierPricing.ts`. Removed
hand-written unions from `workspaceApi.ts` (lines 80-88),
`PricingTable.vue`, and `PricingTableWorkspace.vue`. All now import from
the canonical location.
- **Files**: 4 files changed (type-only, ~5 net lines)

## Review Focus

- This is a type-only change — `pnpm typecheck` is the primary
validation
- If the OpenAPI schema ever adds tiers, there is now one place to
update

## Stack

PR 5/5: #10483 → #10484 → #10485 → #10486 → **This PR**
2026-04-08 19:17:44 -07:00
Christian Byrne
026aeb71b2 refactor: decompose MembersPanelContent into focused components (#10486)
## Summary

Decompose the 562-line `MembersPanelContent.vue` into focused
single-responsibility components.

## Changes

- **What**: Extracted `RoleBadge.vue`, `MemberListItem.vue`,
`PendingInvitesList.vue`, and `MemberUpsellBanner.vue` from
`MembersPanelContent.vue`. Added `RoleBadge.test.ts`. The parent
component is slimmed from 562 → ~120 lines.
- **Files**: 6 files changed (4 new components + 1 new test + 1
refactored)

## Review Focus

- Component boundaries — each extracted component has a clear single
responsibility
- `MembersPanelContent.vue` still orchestrates all behavior; extracted
components are presentational
- Visual QA needed: workspace settings panel should look and behave
identically

## Stack

PR 4/5: #10483 → #10484 → #10485 → **This PR** → #10487

---------

Co-authored-by: Alexander Brown <drjkl@comfy.org>
Co-authored-by: GitHub Action <action@github.com>
2026-04-08 18:57:11 -07:00
Alexander Brown
d96a7d2b32 fix: resolve lint/knip warnings and upgrade oxlint, oxfmt, knip (#10973)
## Changes

- Fix unsafe optional chaining warnings in 2 test files
- Promote `no-unsafe-optional-chaining` to error in oxlintrc
- Remove stale knip ignores (useGLSLRenderer, website deps, astro entry)
- Remove `vue/no-dupe-keys` from oxlintrc (removed from oxlint vue
plugin; `eslint/no-dupe-keys` covers it)
- Un-export unused `UniformSource`/`UniformSources` interfaces
- Dedupe pnpm lockfile

## Dependency Upgrades

| Package | Before | After |
|---------|--------|-------|
| knip | 6.0.1 | 6.3.1 |
| oxlint | 1.55.0 | 1.59.0 |
| oxfmt | 0.40.0 | 0.44.0 |
| eslint-plugin-oxlint | 1.55.0 | 1.59.0 |
| oxlint-tsgolint | 0.17.0 | 0.20.0 |

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10973-fix-resolve-lint-knip-warnings-and-upgrade-oxlint-oxfmt-knip-33c6d73d36508135a773f0a174471cf9)
by [Unito](https://www.unito.io)

---------

Co-authored-by: Amp <amp@ampcode.com>
2026-04-08 18:30:37 -07:00
Comfy Org PR Bot
1720aa0286 1.44.0 (#10974)
Minor version increment to 1.44.0

**Base branch:** `main`

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10974-1-44-0-33c6d73d365081d98a3bd646d3374b3b)
by [Unito](https://www.unito.io)

---------

Co-authored-by: christian-byrne <72887196+christian-byrne@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
2026-04-08 18:13:31 -07:00
Christian Byrne
c671a33182 fix(ci): resolve pnpm version conflict in version bump workflow (#10972)
## Summary

Removes hardcoded `version: 10` from `pnpm/action-setup` and instead
injects the `packageManager` field into `package.json` when absent
(legacy `core/*` branches).

## Why

PR #10952 re-added `version: 10` to fix old branches lacking
`packageManager`. But `main` now has **both** `version: 10` (workflow)
and `packageManager: pnpm@10.33.0` (`package.json`), causing
`pnpm/action-setup` to error with:

> Multiple versions of pnpm specified

Failed run:
https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/24158869559

This fix handles both cases:
- **`main`**: has `packageManager` → action reads it directly, no
conflict
- **`core/1.42` etc**: missing `packageManager` → step injects it before
the action runs
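The injection step can be sketched as a pure function over `package.json`. This is a TypeScript illustration, not the workflow's actual step, which runs inside GitHub Actions. The fallback value `pnpm@10.33.0` is borrowed from `main`'s `package.json` as an assumption, and the package names in the usage lines are placeholders.

```typescript
// Illustrative sketch of "inject packageManager only when it is absent".
interface PackageJson {
  name?: string
  packageManager?: string
  [key: string]: unknown
}

// Assumed fallback, taken from the version main declares.
const FALLBACK_PACKAGE_MANAGER = "pnpm@10.33.0"

function ensurePackageManager(pkg: PackageJson, fallback: string = FALLBACK_PACKAGE_MANAGER): PackageJson {
  // main already declares it: leave it alone so pnpm/action-setup reads it
  // and there is exactly one version source.
  if (typeof pkg.packageManager === "string" && pkg.packageManager.length > 0) {
    return pkg
  }
  // Legacy core/* branches: add the field before the action runs.
  return { ...pkg, packageManager: fallback }
}

const mainPkg = ensurePackageManager({ name: "example", packageManager: "pnpm@10.33.0" })
const legacyPkg = ensurePackageManager({ name: "example" })
```

Because the workflow no longer passes `version: 10`, each branch ends up with a single source of truth for the pnpm version.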

E2E test not applicable — this is a CI workflow configuration change
with no user-facing behavior.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10972-fix-ci-resolve-pnpm-version-conflict-in-version-bump-workflow-33c6d73d36508112802df75c0dd5ea50)
by [Unito](https://www.unito.io)
2026-04-08 15:13:33 -07:00
Alexander Brown
25d1ac7456 test: reorganize subgraph test suite into composable domain specs (#10759)
## Summary

Reorganize the subgraph test suite so browser tests are thin
representative user journeys while lower-level Vitest suites own
combinatorics, migration edge cases, and data-shape semantics.

## Changes

- **What**: Migrate 17 flat subgraph browser specs into 10
domain-organized specs under `browser_tests/tests/subgraph/`, move
redundant semantic coverage down to 8 Vitest owner suites, delete all
legacy flat files
- **Browser specs** (54 tests): `subgraphSlots`, `subgraphPromotion`,
`subgraphPromotionDom`, `subgraphSerialization`, `subgraphNavigation`,
`subgraphNested`, `subgraphLifecycle`, `subgraphCrud`, `subgraphSearch`,
`subgraphOperations`
- **Vitest owners** (230 tests): `SubgraphNode.test.ts` (rename/label
propagation), `subgraphNodePromotion.test.ts`,
`promotedWidgetView.test.ts`, `SubgraphSerialization.test.ts`
(duplicate-ID remap), `SubgraphWidgetPromotion.test.ts` (legacy
hydration), `subgraphNavigationStore*.test.ts` (viewport cache,
workflow-switch), `subgraphStore.test.ts` (search aliases, description)
- **Net effect**: browser suite shrinks from ~96 scattered tests to 54
focused journeys

## Review Focus

- Coverage ownership split: each browser test has a unique UI-only
failure mode; semantic coverage lives in Vitest
- `subgraphPromotionDom.spec.ts` forces LiteGraph mode and uses
`canvas.openSubgraph()` instead of `navigateIntoSubgraph()` to avoid a
wrapper-specific DOM overlay duplication issue — entry-affordance
coverage lives in `subgraphNavigation.spec.ts`
- No product code changes — test-only migration

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10759-test-reorganize-subgraph-test-suite-into-composable-domain-specs-3336d73d365081b0a56bcbf809b1f584)
by [Unito](https://www.unito.io)

---------

Co-authored-by: Amp <amp@ampcode.com>
2026-04-08 15:04:33 -07:00
Johnpaul Chiwetelu
2189172f15 fix: Add timeout and abort mechanism for image upload (#9226) (#9491)
Closes #9226

## Summary

Image uploads had no timeout or abort mechanism, meaning a stalled
upload could hang indefinitely with no user feedback. This adds a
2-minute timeout using `AbortController` and shows a user-friendly toast
message when the upload times out.

## Changes

- `src/composables/node/useNodeImageUpload.ts`: Added `AbortController`
with a 120-second timeout to the `uploadFile` function. The abort signal
is passed to `fetchApi`. In the `handleUpload` error handler,
`AbortError` is now caught separately to display a localized timeout
message.
- `src/locales/en/main.json`: Added `uploadTimedOut` i18n translation
key.

---
Automated by coderabbit-fixer

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9491-fix-Add-timeout-and-abort-mechanism-for-image-upload-9226-31b6d73d365081d7a7d7f7016f3a71c6)
by [Unito](https://www.unito.io)

---------

Co-authored-by: CodeRabbit Fixer <coderabbit-fixer@automated.bot>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Christian Byrne <cbyrne@comfy.org>
2026-04-08 21:27:32 +00:00
Christian Byrne
9b769656ac test: add ShareWorkflow dialog E2E tests (DLG-05) (#10588)
## Summary
Adds Playwright E2E tests for the ShareWorkflow dialog component and its
various states.

## Tests added
- Dialog opens and shows unsaved state for new workflows (save prompt)
- Ready state shows create link button for saved but unpublished
workflows
- Shared state shows copy URL field with share link after publishing
- Stale state shows update link button when workflow modified after
publishing
- Close button dismisses dialog
- Create link transitions dialog from ready to shared state
- Tab switching between share link and publish to hub (when
comfyHubUploadEnabled)
- Tab aria-selected states update correctly on switch

## Approach
- Share dialog is gated behind `isCloud` (compile-time constant), so
tests invoke it directly via `page.evaluate()` importing
`useShareDialog`
- Share service API calls (`/api/userdata/*/publish`,
`/api/assets/from-workflow`) mocked via `page.route()` for deterministic
state testing
- Dialog state (loading → unsaved → ready → shared → stale) controlled
by mock responses
- Feature flags set via `serverFeatureFlags.value` for tab visibility
testing
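A pure derivation function is one way to picture how the mocked responses drive the five dialog states. The `ShareStatus` fields below are hypothetical and do not come from the share service's real response types. The sketch only encodes the transitions the tests describe: save prompt, create link, copy URL, and update link.

```typescript
// Hypothetical model; field names are illustrative only.
type ShareDialogState = "loading" | "unsaved" | "ready" | "shared" | "stale"

interface ShareStatus {
  loaded: boolean // share-service response has arrived
  isSaved: boolean // workflow has been saved at least once
  publishedAt: number | null // time of the last publish, if any
  modifiedAt: number // time of the last local save
}

function deriveShareState(s: ShareStatus): ShareDialogState {
  if (!s.loaded) return "loading"
  if (!s.isSaved) return "unsaved" // prompt to save first
  if (s.publishedAt === null) return "ready" // offer "create link"
  // Published but edited afterwards: offer "update link".
  return s.modifiedAt > s.publishedAt ? "stale" : "shared"
}
```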

## Notes
- All pre-existing TS2307 errors are `.vue` module resolution — no new
type errors
- Tests cover the 5 dialog states: loading, unsaved, ready, shared,
stale

## Task
Part of Test Coverage Q2 Overhaul (DLG-05).

## Conventions
- Uses Vue nodes with new menu enabled
- Tests read as user stories
- No full-page screenshots
- Proper waits, no sleeps
- All API calls mocked

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10588-test-add-ShareWorkflow-dialog-E2E-tests-DLG-05-3306d73d365081a0ab15f333707e493b)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-08 14:14:47 -07:00
Christian Byrne
934f1487bd feat: add Comfy Design Standards Figma reference for agents (#10696)
## Summary

Add design standards instructions so agents consult the canonical [Comfy
Design
Standards](https://www.figma.com/design/QreIv5htUaSICNuO2VBHw0/Comfy-Design-Standards)
Figma file before implementing user-facing features.

### Changes

- **`docs/guidance/design-standards.md`** — Auto-loaded guidance for
`src/components/**/*.vue` and `src/views/**/*.vue` with Figma MCP fetch
instructions, section node IDs, and component set references
- **`AGENTS.md`** — Added Design Standards section and Figma link in
External Resources

### Design

All content is fetched **live from Figma** via the Figma MCP tool —
designers can update the file and agents will always see the latest
version. No hardcoded design rules that can go stale.

Ref: COM-17639

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10696-feat-add-Comfy-Design-Standards-Figma-reference-for-agents-3326d73d36508181844fdcaa5c17cf00)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-08 14:13:57 -07:00
Christian Byrne
6f98fe5ba7 docs: add staging environment setup instructions to CONTRIBUTING.md (#10775)
## Summary

Add a "Testing with Cloud & Staging Environments" section to
CONTRIBUTING.md documenting how to test partner/API nodes that require
cloud backend authentication.

## Changes

- **What**: New section in CONTRIBUTING.md between "Dev Server" and
  "Access dev server on touch devices" explaining two approaches for
  staging/cloud development:
  1. Frontend approach: `pnpm dev:cloud` or a custom
     `DEV_SERVER_COMFYUI_URL` in `.env`
  2. Backend approach: the `--comfy-api-base https://stagingapi.comfy.org`
     flag

## Review Focus

- Accuracy of the `--comfy-api-base` backend flag documentation (sourced
from internal Slack discussion)
- Whether the section placement and level of detail is appropriate

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10775-docs-add-staging-environment-setup-instructions-to-CONTRIBUTING-md-3346d73d36508112bcd4df6ecb7f83c6)
by [Unito](https://www.unito.io)
2026-04-08 14:11:01 -07:00
Rizumu Ayaka
44c3d08b56 perf: add preload and content-visibility attribute to media preview for improved performance (#10806)
## Summary

This pull request improves the performance of large workflows that
contain many media previews.

Added `content-visibility: auto`, which lets the browser skip rendering
media previews that are outside the viewport.

Added `preload="metadata"`, which makes `<video>` and `<audio>` elements
preload only their metadata instead of the full content.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10806-perf-add-preload-and-content-visibility-attribute-to-media-preview-for-improved-perfor-3356d73d365081238ce8f1a82f8694ec)
by [Unito](https://www.unito.io)
2026-04-08 14:08:47 -07:00
Dante
537e4bc4f2 test: add E2E tests for settings dialog (#10797)
## Summary
- Adds Playwright E2E tests for the Settings dialog, covering behaviors
  not tested elsewhere:
  - About panel displays version badges (navigates to About, verifies
    content)
  - A boolean setting toggled through the UI persists its value (uses
    search and a ToggleSwitch click, verifies via API)
  - Dialog can be closed via the close button (complements the existing
    Escape-key test)

## Test plan
- [ ] `pnpm test:browser:local -- --grep "Settings dialog"` passes
- [ ] No overlap with existing tests in `dialog.spec.ts` or
`useSettingSearch.spec.ts`

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10797-test-add-E2E-tests-for-settings-dialog-3356d73d365081e4a3adcd3979048444)
by [Unito](https://www.unito.io)
2026-04-09 05:29:53 +09:00
pythongosssss
4b0b8e7240 test: App mode - Welcome screen state (#10747)
## Summary

Adds tests for validating welcome screen state

## Changes

- **What**: Add a util function to clear the graph

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10747-test-App-mode-Welcome-screen-state-3336d73d365081f0ba27d567c3c81505)
by [Unito](https://www.unito.io)
2026-04-08 12:33:51 -07:00
Alexander Brown
3b78dfbe1c test: migrate browser_tests/ to @e2e/ path alias and add lint rule (#10958)
## Summary

Complete the @e2e/ path alias migration started in #10735 by converting
all 354 remaining relative imports and adding a lint rule to prevent
backsliding.

## Changes

- **What**: Migrate all relative imports in browser_tests/ to use
`@e2e/` (intra-directory) and `@/` (src/ imports) path aliases. Add
`no-restricted-imports` ESLint rule banning `./` and `../` imports in
`browser_tests/**/*.ts`. Suppress pre-existing oxlint `no-eval` and
`no-console` warnings exposed by touching those files.

## Review Focus

- ESLint flat-config merging: the `@playwright/test` ban and
relative-import ban are in two separate blocks to avoid last-match-wins
collision with the `useI18n`/`useVirtualList` blocks higher in the
config.
- The `['./**', '../**']` glob patterns (not `['./*', '../*']`) are
needed to catch multi-level relative paths like `../../../src/foo`.
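The relative-import ban can be sketched as a standalone ESLint flat-config block that uses the `./**` and `../**` patterns described above. The `files` glob and the message text are assumptions, and the real block in the repository's config may differ in wording and placement.

```typescript
// Hypothetical flat-config block; files glob and message are illustrative.
const browserTestsRelativeImportBan = {
  files: ["browser_tests/**/*.ts"],
  rules: {
    "no-restricted-imports": [
      "error",
      {
        patterns: [
          {
            // "./**" and "../**" rather than "./*" and "../*", so that
            // multi-level paths such as "../../../src/foo" are also caught.
            group: ["./**", "../**"],
            message: "Use the @e2e/ or @/ path aliases instead of relative imports."
          }
        ]
      }
    ]
  }
}

// Kept as its own array entry (e.g. in eslint.config.ts) rather than merged
// into another block that also configures no-restricted-imports.
const configBlocks = [browserTestsRelativeImportBan]
```

The separate block matters because, in flat config, a later block that sets `no-restricted-imports` with its own options replaces the options from an earlier matching block. The two pattern lists are not merged.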

Follows up on #10735

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10958-test-migrate-browser_tests-to-e2e-path-alias-and-add-lint-rule-33c6d73d365081649d1be771eac986fd)
by [Unito](https://www.unito.io)

Co-authored-by: Amp <amp@ampcode.com>
2026-04-08 11:28:59 -07:00
pythongosssss
036be1c7e9 test: App mode - Pruning tests (#10805)
## Summary

Adds tests verifying that deleting a node automatically removes its
selections from app mode

## Changes

- **What**:
  - always prune when entering app builder (fix)
  - add tests (delete output node, delete input node, change dynamic
    widget value)
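The pruning behavior these tests cover can be sketched as a filter over the saved selection. `AppModeSelection` and `pruneSelection` are hypothetical names. The graph is modeled as a map from node ID to the widget names that node currently has. That is enough to cover all three test cases: a deleted output node, a deleted input node, and a dynamic widget that disappears after a value change.

```typescript
// Hypothetical shapes; not the app-mode store's real types.
interface AppModeSelection {
  inputs: Array<{ nodeId: number; widgetName: string }>
  outputs: number[] // IDs of selected output nodes
}

// nodeId -> names of the widgets that node currently has
type GraphWidgets = ReadonlyMap<number, ReadonlySet<string>>

function pruneSelection(selection: AppModeSelection, graph: GraphWidgets): AppModeSelection {
  return {
    // Keep an input only if its node still exists and still has the widget.
    inputs: selection.inputs.filter((input) => graph.get(input.nodeId)?.has(input.widgetName) ?? false),
    // Keep an output only if its node still exists.
    outputs: selection.outputs.filter((id) => graph.has(id))
  }
}
```

Running the prune on every entry to the app builder, as this PR does, means stale references are dropped even if the deletion happened while the builder was closed.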

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10805-test-App-mode-Pruning-tests-3356d73d365081bcb12fc226af31a724)
by [Unito](https://www.unito.io)
2026-04-08 10:42:47 -07:00
Terry Jia
b494392265 fix: record audio node not releasing microphone in Vue mode (#10829)
## Summary

Root cause: setting `modelValue` on recording completion triggers
NodeWidgets' `updateHandler` → `widget.callback` → the LiteGraph recording
handler (`getUserMedia`), which opens a second microphone stream that is
never cleaned up.

Fix:
- Stop writing to modelValue (keep defineModel to absorb parent v-model
without triggering the callback)
- On recording complete, set blob URL on litegraph audioUI DOM element
instead of uploading — let the original serializeValue (uploadAudio.ts)
handle upload at serialization time
- Remove registerWidgetSerialization to stop overriding litegraph's
serializeValue
- Move cleanup() before async onRecordingComplete in useAudioRecorder
- Dispose waveform AudioContext on stop
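The ordering change in `useAudioRecorder` can be shown with a small framework-free sketch. `TrackLike` stands in for `MediaStreamTrack`, and `finishRecording` is a hypothetical name. The tracks are stopped synchronously before the `await`, so a failing or slow completion callback cannot keep the microphone open.

```typescript
// Hypothetical sketch; TrackLike mimics the parts of MediaStreamTrack used here.
interface TrackLike {
  readyState: "live" | "ended"
  stop(): void
}

function releaseTracks(tracks: TrackLike[]): void {
  // Stopping every track of the stream releases the capture device.
  for (const track of tracks) track.stop()
}

async function finishRecording(tracks: TrackLike[], onRecordingComplete: () => Promise<void>): Promise<void> {
  // Release first: if the async callback throws or never settles,
  // the microphone stream is already closed.
  releaseTracks(tracks)
  await onRecordingComplete()
}
```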

## Screenshots (if applicable)
before


https://github.com/user-attachments/assets/1e464ea1-53ed-44e2-973b-97eebc63fb76


after

https://github.com/user-attachments/assets/badc8a3f-0761-43bd-a899-d8924f413028

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10829-fix-record-audio-node-not-releasing-microphone-in-Vue-mode-3366d73d36508106b4a4dda31501ec4d)
by [Unito](https://www.unito.io)

---------

Co-authored-by: Alexander Brown <drjkl@comfy.org>
2026-04-08 04:31:40 -04:00
jaeone94
3f375bea9c test: comprehensive E2E tests for error dialog, overlay, and errors tab (#10848)
## Summary

Comprehensive Playwright E2E tests for the error systems: ErrorDialog,
ErrorOverlay, and the errors tab (missing nodes, models, media,
execution errors).

## Changes

- **What**:
- **ErrorDialog** (`errorDialog.spec.ts`, 7 tests): configure/prompt
error triggers, Show Report, Copy to Clipboard, Find Issues on GitHub,
Contact Support
- **ErrorOverlay** (`errorOverlay.spec.ts`, 12 tests): error count
labels, per-type button labels (missing nodes/models/media/multiple),
See Errors flow (open panel, dismiss, close), undo/redo persistence
- **Errors tab — common** (`errorsTab.spec.ts`, 3 tests): tab
visibility, search/filter execution errors
- **Errors tab — Missing nodes** (`errorsTabMissingNodes.spec.ts`, 5
tests): MissingNodeCard, packs group, expand/collapse, locate button
- **Errors tab — Missing models** (`errorsTabMissingModels.spec.ts`, 6
tests): group display, model name, expand/referencing nodes, clipboard
copy, OSS Copy URL/Download buttons
- **Errors tab — Missing media** (`errorsTabMissingMedia.spec.ts`, 7
tests): migrated from `missingMedia.spec.ts` with detection,
upload/library/cancel flows, locate
- **Errors tab — Execution** (`errorsTabExecution.spec.ts`, 2 tests):
Find on GitHub/Copy buttons, runtime error panel
- **Shared helpers**: `ErrorsTabHelper.ts` (openErrorsTabViaSeeErrors),
`clipboardSpy.ts` (interceptClipboardWrite/getClipboardText)
- **Component changes**: added `data-testid` to
`ErrorDialogContent.vue`, `FindIssueButton.vue`, `MissingModelRow.vue`,
`MissingModelCard.vue`
- **Selectors**: registered all new test IDs in `selectors.ts`
- **Test assets**: `missing_nodes_and_media.json` (compound errors),
`missing_models_with_nodes.json` (expand/locate)
- **Migrations**: error tests from `dialog.spec.ts` → dedicated files,
`errorOverlaySeeErrors.spec.ts` → `errorOverlay.spec.ts`,
`missingMedia.spec.ts` → `errorsTabMissingMedia.spec.ts`

## Review Focus

- OSS tests (`@oss` tag) verify Download/Copy URL buttons appear for
models with embedded URLs.
- The `missing_models.json` fixture must remain without nodes — adding
`CheckpointLoaderSimple` nodes causes directory mismatch in
`enrichWithEmbeddedMetadata` that prevents URL enrichment. A separate
`missing_models_with_nodes.json` fixture is used for expand/locate
tests.

## Cloud tests deferred

Missing model cloud environment tests (`@cloud` tag — hidden buttons,
import-unsupported notice) are deferred to a follow-up PR. The
`comfyPage` fixture cannot bypass the Firebase auth guard in cloud
builds, causing `window.app` initialization timeout. A separate infra PR
is needed to add cloud auth bypass to the fixture.

## Bug Discovery

During testing, a bug was found where the **Locate button for missing
nodes in subgraphs fails on initial workflow load**.
`collectMissingNodes` in `loadGraphData` captures execution IDs using
pre-`configure()` JSON node IDs, but `configure()` triggers subgraph
node ID deduplication (PR #8762, always-on since PR #9510) which remaps
colliding IDs. This will be addressed in a follow-up PR.
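The following is a simplified, hypothetical illustration of why the captured IDs go stale. If IDs are recorded from the raw JSON and deduplication later renumbers colliding nodes, the recorded ID no longer identifies the subgraph node. `dedupeNodeIds` below is not the real deduplication algorithm from PR #8762. It only reassigns colliding IDs to show the ordering problem.

```typescript
// Hypothetical illustration only; not the real dedup implementation.
interface NodeRef {
  graph: string // "root" or a subgraph name
  id: number
}

// Give every node whose ID is already taken a fresh, unused ID.
function dedupeNodeIds(nodes: NodeRef[]): Map<NodeRef, number> {
  const used = new Set<number>()
  let next = Math.max(0, ...nodes.map((n) => n.id)) + 1
  const remap = new Map<NodeRef, number>()
  for (const node of nodes) {
    if (used.has(node.id)) {
      remap.set(node, next)
      node.id = next
      next += 1
    }
    used.add(node.id)
  }
  return remap
}
```

Any ID captured before this runs (as `collectMissingNodes` does before `configure()`) still holds the old value. A lookup by that ID then finds either nothing or the root-graph node that kept it.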

- Fixes #10847 (tracked, fix pending in separate PR)

## Testing

- 42 new/migrated E2E tests across 8 spec files
- All OSS tests pass locally and in CI

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10848-test-comprehensive-E2E-tests-for-error-dialog-overlay-and-errors-tab-3386d73d36508137a5e4cec8b12fa2fa)
by [Unito](https://www.unito.io)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:45:17 +09:00
Dante
c084089fc8 fix: update upload dialog status when async download completes (#10838)
## Summary

- Upload Model dialog stays stuck in "Processing" after async download
completes because the wizard never watches for task completion
- `useUploadModelWizard.ts` sets `uploadStatus = 'processing'` but has
no watcher linking it to `assetDownloadStore.lastCompletedDownload`
- Added a `watch` on `lastCompletedDownload` that transitions
`uploadStatus` to `'success'` when the tracked task finishes

- Fixes #10609

## Root Cause

`uploadModel()` (line 249) sets `uploadStatus = 'processing'` when the
async task starts, but control flow ends there. The `assetDownloadStore`
receives WebSocket completion events and updates
`lastCompletedDownload`, but the wizard never watches this reactive
state.

## Fix

Added a `watch` inside the async branch that monitors
`assetDownloadStore.lastCompletedDownload`. When the completed task ID
matches the upload's task ID, it transitions `uploadStatus` from
`'processing'` to `'success'` and refreshes model caches. The watcher
auto-disposes after firing.
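The fix relies on Vue's `watch()`, which returns a stop handle. The sketch below reproduces the same "watch, match the task ID, then dispose" pattern without Vue so that it can run standalone. `DownloadStoreSketch` and `trackAsyncUpload` are hypothetical stand-ins for `assetDownloadStore` and the wizard logic, and the model-cache refresh is omitted.

```typescript
// Framework-free sketch; the real code uses Vue's watch() on
// assetDownloadStore.lastCompletedDownload.
type UploadStatus = "idle" | "processing" | "success"
type CompletedDownload = { taskId: string }

class DownloadStoreSketch {
  lastCompletedDownload: CompletedDownload | null = null
  private listeners = new Set<(d: CompletedDownload) => void>()

  // Simulates a WebSocket completion event.
  complete(taskId: string): void {
    const download = { taskId }
    this.lastCompletedDownload = download
    for (const listener of [...this.listeners]) listener(download)
  }

  // Returns a stop handle, like the one watch() returns.
  watchLastCompleted(cb: (d: CompletedDownload) => void): () => void {
    this.listeners.add(cb)
    return () => {
      this.listeners.delete(cb)
    }
  }
}

function trackAsyncUpload(store: DownloadStoreSketch, taskId: string, state: { uploadStatus: UploadStatus }): void {
  state.uploadStatus = "processing"
  const stop = store.watchLastCompleted((download) => {
    if (download.taskId !== taskId) return // some other download finished
    state.uploadStatus = "success"
    stop() // auto-dispose once the tracked task completes
  })
}
```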

## Red-Green Verification

| Commit | CI Status | Purpose |
|--------|-----------|---------|
| `test: add failing test for upload dialog stuck in processing state` | 🔴 Red | Proves the test catches the bug |
| `fix: update upload dialog status when async download completes` | 🟢 Green | Proves the fix resolves the bug |

## Test Plan

- [x] CI red on test-only commit
- [x] CI green on fix commit
- [x] Unit test: `updates uploadStatus to success when async download
completes`
- [ ] Manual: Import model via URL → verify dialog transitions from
Processing to Success (requires `--enable-assets`)

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10838-fix-update-upload-dialog-status-when-async-download-completes-3376d73d365081b6a5b1d2be3804ce4b)
by [Unito](https://www.unito.io)
2026-04-08 13:09:02 +09:00
Alexander Brown
4cb83353cb test: stabilize flaky Playwright tests (#10817)
Identifies and fixes flaky Playwright E2E tests to improve test
reliability.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10817-test-stabilize-flaky-Playwright-tests-3366d73d365081ada40de73ce11af625)
by [Unito](https://www.unito.io)

---------

Co-authored-by: Amp <amp@ampcode.com>
2026-04-07 19:47:27 -07:00
Terry Jia
d73c4406ed test: add basic E2E tests for Load3D node (#10731)
## Summary
Add Playwright tests covering widget rendering, controls menu
interaction, background color change, and recording controls visibility.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10731-test-add-basic-E2E-tests-for-Load3D-node-3336d73d36508194bff9eb2a7c9356b9)
by [Unito](https://www.unito.io)
2026-04-07 22:41:12 -04:00
Dante
ccdde8697c test: add E2E regression tests for workflow tab save bug (#10815)
## Summary

- Adds E2E regression tests for the bug fixed in PR #10745 where closing
an inactive workflow tab would save the active workflow's content into
the closing tab's file
- Three test scenarios covering the full range of the bug surface:
  1. Closing an unmodified inactive tab preserves both workflows
  2. Closing a modified inactive tab with "Save" preserves its own content
     (not the active tab's)
  3. Closing an unsaved inactive tab with "Save As" preserves its own
     content
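The bug class these tests guard against can be sketched with a hypothetical tab model. When a tab is closed with "Save", the content written must come from the closing tab, not from whichever tab is active. `WorkflowTab` and `closeTabWithSave` are illustrative names and are not the app's workflow store API.

```typescript
// Hypothetical tab model for illustration only.
interface WorkflowTab {
  path: string
  content: string
  modified: boolean
}

function closeTabWithSave(tabs: WorkflowTab[], closingPath: string, files: Map<string, string>): WorkflowTab[] {
  const closing = tabs.find((tab) => tab.path === closingPath)
  if (!closing) return tabs
  if (closing.modified) {
    // Write the closing tab's own content. The regression wrote the
    // active tab's content to this path instead.
    files.set(closing.path, closing.content)
  }
  return tabs.filter((tab) => tab !== closing)
}
```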

## Linked Issues

- Regression coverage for #10745 / Comfy-Org/ComfyUI#13230

## Test plan

- [ ] `pnpm exec playwright test
browser_tests/tests/topbar/workflowTabSave.spec.ts` passes against the
current main (with the fix)

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10815-test-add-E2E-regression-tests-for-workflow-tab-save-bug-3366d73d365081eab409ed303620a959)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-08 10:40:12 +09:00
Christian Byrne
194baf7aee fix(ci): restore pnpm version input for core/* branch compatibility (#10952)
## Summary

Restores `version: 10` to the `pnpm/action-setup` step in
`release-version-bump.yaml`.

## Why

PR #10687 removed the explicit `version: 10` input, relying on
`packageManager` in `package.json` instead. However, this workflow
checks out a **target branch** (e.g. `core/1.42`, `core/1.41`) that
predates #10687 and lacks the `packageManager` field — causing the
action to fail with:

> Error: No pnpm version is specified.

Failed run:
https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/24110739586

Adding `version: 10` back is safe — the action only errors when
`version` and `packageManager` **conflict**, and `pnpm@10.x` is
compatible.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10952-fix-ci-restore-pnpm-version-input-for-core-branch-compatibility-33c6d73d3650819282bbf6c194d0d2f1)
by [Unito](https://www.unito.io)
2026-04-07 17:45:42 -07:00
Comfy Org PR Bot
5770837e07 1.43.15 (#10951)
Patch version increment to 1.43.15

**Base branch:** `main`

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10951-1-43-15-33c6d73d36508183b029ccd217a8403d)
by [Unito](https://www.unito.io)

Co-authored-by: christian-byrne <72887196+christian-byrne@users.noreply.github.com>
2026-04-07 17:24:14 -07:00
Johnpaul Chiwetelu
4078f8be8f test: add E2E test for subgraph duplicate independent widget values (#10949)
## Summary
- Add E2E test verifying duplicated subgraphs maintain independent
widget values (convert CLIP node to subgraph, duplicate, set different
text in each, assert no bleed)
- Extract `clickMenuItemExact` and `openForVueNode` into `ContextMenu`
fixture for reuse across Vue node tests
- Refactor `contextMenu.spec.ts` to delegate to the new fixture methods

## Test plan
- [x] `pnpm typecheck:browser` passes
- [x] `pnpm lint` passes
- [x] New test passes locally (`pnpm test:browser:local --
browser_tests/tests/subgraph/subgraphDuplicateIndependentValues.spec.ts`)

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10949-test-add-E2E-test-for-subgraph-duplicate-independent-widget-values-33b6d73d3650818191c1f78ef8db4455)
by [Unito](https://www.unito.io)
2026-04-08 00:20:23 +01:00
Christian Byrne
ff0453416a ci: pass CODECOV_TOKEN and add codecov.yml for PR comments (#10774)
## What

Follow-up to #10575. Pass `CODECOV_TOKEN` secret to codecov upload
action and add `codecov.yml` config so Codecov posts coverage diff
comments on PRs.

## Changes

- `ci-tests-unit.yaml`: add `token: ${{ secrets.CODECOV_TOKEN }}`
- `codecov.yml`: configure PR comment layout (header, diff, flags,
files)

## Manual Step Required

An admin needs to add the `CODECOV_TOKEN` secret to the repo:

1. Go to [codecov.io](https://app.codecov.io) → sign in → find
`Comfy-Org/ComfyUI_frontend` → Settings → General → copy the Repository
Upload Token
2. Go to [repo
secrets](https://github.com/Comfy-Org/ComfyUI_frontend/settings/secrets/actions)
→ New repository secret → name: `CODECOV_TOKEN`, value: the token

## Testing

Config-only change. Once the secret is added, the next PR will get a
Codecov coverage comment.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10774-ci-pass-CODECOV_TOKEN-and-add-codecov-yml-for-PR-comments-3346d73d36508169bac5e61eecc94063)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-07 15:52:34 -07:00
Benjamin Lu
fd9c67ade8 test: type asset helper update payload (#10751)
What changed

- Typed the `PUT /assets/:id` mock body in `AssetHelper` as
`AssetUpdatePayload` instead of treating it as an untyped record.

Why

- Keeps the mock aligned with the frontend update contract used by the
asset service.
- Narrows the helper without changing behavior, so follow-up typing work
can build on a smaller base.

Validation

- `pnpm typecheck`
- `pnpm typecheck:browser`

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10751-test-type-asset-helper-update-payload-3336d73d365081fd8f1bc0c7c49c5ddb)
by [Unito](https://www.unito.io)
2026-04-07 14:26:56 -07:00
Christian Byrne
83f4e7060a test(infra): AssetHelper with builder pattern + deterministic fixtures (#10545)
## What

Adds `AssetHelper` — a builder-pattern helper for mocking asset-related
API endpoints in Playwright E2E tests, plus deterministic fixture data.

## Why

12+ asset-related API endpoints need mocking for asset browser tests
(PNL-02), cloud dialog testing (DLG-08), and other asset-dependent E2E
scenarios. Random mock data from existing `createMockAssets()` is
unsuitable for deterministic E2E assertions.

## What's included

### `AssetHelper.ts` (307 LOC)
- Fluent builder API: `assetHelper.withModels(3).withImages(5).mock()`
- Stateful mock store (Map) for upload→verify→delete flows
- Endpoint coverage: GET/POST/PUT/DELETE `/assets`, download progress
- `mockError()` for error state testing
- `clearMocks()` cleanup matching QueueHelper/FeatureFlagHelper pattern

### `assetFixtures.ts` (304 LOC)
- 11 stable named constants (checkpoints, loras, VAE, embedding, inputs,
outputs)
- Factory functions: `generateModels()`, `generateInputFiles()`,
`generateOutputAssets()`
- Fixed IDs/dates/sizes — no randomness, safe for screenshot comparisons
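One way to produce fixtures that are safe for screenshot comparisons is to derive every field from the index and a fixed epoch. Only the `generateModels()` name comes from the PR. The `ModelFixture` shape, the ID format and the base date below are assumptions.

```typescript
// Hypothetical fixture shape; field names are illustrative only.
interface ModelFixture {
  id: string
  name: string
  size: number
  created_at: string
}

// Fixed epoch so every run (and every screenshot) sees the same dates
const BASE_TIME = Date.UTC(2025, 0, 1)

function generateModels(count: number): ModelFixture[] {
  return Array.from({ length: count }, (_, i) => ({
    id: `test-model-${String(i + 1).padStart(3, '0')}`,
    name: `model_${i + 1}.safetensors`,
    size: 1000000 * (i + 1), // deterministic, grows with the index
    created_at: new Date(BASE_TIME + i * 60000).toISOString() // one minute apart
  }))
}
```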

### ComfyPage integration
- Available as `comfyPage.assets` in all tests

## Testing
- TypeScript compiles clean
- Follows existing QueueHelper/FeatureFlagHelper conventions

## Unblocks
- PNL-02: Asset browser tests (@Jaewon Yoon)
- DLG-08: Assets modal / cloud dialog testing

Part of: Test Coverage Q2 Overhaul

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10545-test-infra-AssetHelper-with-builder-pattern-deterministic-fixtures-32f6d73d365081d3985ef079ff3dbede)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-07 14:23:36 -07:00
AustinMroz
31c789c242 Support svg outputs in assets panel (#10470)
A trivial fix to support SVG outputs in the assets panel.

| Before | After |
| ------ | ----- |
| <img width="360" alt="before" src="https://github.com/user-attachments/assets/2fca84f7-40f1-4966-b3dd-96facb8a4067" /> | <img width="360" alt="after" src="https://github.com/user-attachments/assets/cad1a9fc-f511-43bc-8895-80d931baad1c" /> |

Note: SVGs do not display on cloud, either in nodes or in the assets
panel, because they are served with the incorrect content type `text/plain`.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10470-Support-svg-outputs-in-assets-panel-32d6d73d3650815fbba1fe788297e715)
by [Unito](https://www.unito.io)
2026-04-07 13:53:37 -07:00
pythongosssss
1b05927ff4 test: App mode - setting widget value test (#10746)
## Summary

Adds a test that sets various types of widgets in app mode, then
validates that the `/prompt` API is called with the expected values.

## Changes

- **What**:
  - extract duplicated `enableLinearMode`
  - add `AppModeWidgetHelper` for setting values

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10746-test-App-mode-setting-widget-value-test-3336d73d365081739598fb5280d0127e)
by [Unito](https://www.unito.io)
2026-04-07 13:53:01 -07:00
AustinMroz
97853aa8b0 Update autogrow to always show one optional beyond min (#10748)
It can be a little unclear that autogrow inputs will add more slots as
connections are made. To assist with this, the first optional input
beyond the minimum is always displayed. This ensures users always see a
slot with an optional indicator.
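The rule above can be expressed as a small pure function. This is a hypothetical model, not the actual autogrow implementation. It assumes a group with `min` required slots and a `max` cap, and it tracks the index of the last connected slot.

```typescript
// Hypothetical model of an autogrow input group; names are illustrative.
interface AutogrowState {
  min: number // required slots, always shown
  max: number // hard cap on the number of slots
  highestConnected: number // index of the last connected slot, -1 if none
}

function visibleSlotCount({ min, max, highestConnected }: AutogrowState): number {
  // One empty slot after the last connection, so the group can keep growing
  const afterConnections = highestConnected + 2
  // Always at least one optional slot beyond the required minimum
  const withOptional = min + 1
  return Math.min(max, Math.max(afterConnections, withOptional))
}
```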

| Before | After |
| ------ | ----- |
| <img width="360" alt="before" src="https://github.com/user-attachments/assets/1dd1241e-c6a4-46a6-a0b9-08b568decd10" /> | <img width="360" alt="after" src="https://github.com/user-attachments/assets/79650f9a-7cc6-4484-83a3-2b25e2f1af33" /> |

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10748-Update-autogrow-to-always-show-one-optional-beyond-min-3336d73d36508184bcc6c79381a62436)
by [Unito](https://www.unito.io)
2026-04-07 12:53:32 -07:00
AustinMroz
ac922fe6aa Remove flex styling from svg (#10941)
This cleans up a minor warning from the dev server:
` WARN  Removing unexpected style on "svg": flex`

The icon is displayed in the header of some templates and is not
visually impacted by the change.
<img width="268" height="365" alt="image"
src="https://github.com/user-attachments/assets/a0c03e85-5275-4671-b903-8458d7ba3517"
/>

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10941-Remove-flex-styling-from-svg-33b6d73d36508113a7dce77c269fe4cb)
by [Unito](https://www.unito.io)
2026-04-07 12:31:21 -07:00
pythongosssss
6f8e58bfa5 test: App Mode - widget reordering (#10685)
## Summary

Adds tests that simulate the user sorting the inputs and ensure the new
order is persisted and visible in app mode.

## Changes

- **What**:
  - rework input/output selection helpers to accept widget/node names
  - extract additional shared helpers

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10685-test-App-Mode-widget-reordering-3316d73d3650813b8348da5f29fd01f8)
by [Unito](https://www.unito.io)
2026-04-07 11:56:43 -07:00
Arthur R Longbottom
6cd3b59d5f fix: don't override loadGraphData viewport on cache miss (#10810)
## Summary

Fix regression from #10247 where template workflows (e.g. LTX2.3) loaded
with a broken viewport.

## Problem

`restoreViewport()` called `fitView()` on every cache miss via rAF. This
raced with `loadGraphData`'s own viewport restore (`extra.ds` for saved
workflows, or its own `fitView()` for templates at line 1266 of app.ts).
The second `fitView()` overwrote the correct viewport, causing templates
with subgraphs to display incorrectly.

## Fix

On cache miss, check if any nodes are already visible in the current
viewport before calling `fitView()`. If `loadGraphData` already
positioned things correctly, we don't override it. Only intervene when
the viewport is genuinely empty (first visit to a subgraph with no prior
cached state AND no loadGraphData restore).
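The guard described above amounts to an intersection test before `fitView()`. In this sketch, `Rect`, `restoreViewportOnCacheMiss` and the bounds arguments are hypothetical stand-ins for the LiteGraph node bounds and visible area that `subgraphNavigationStore.ts` actually works with. In the real code the check runs inside the deferred `requestAnimationFrame` callback.

```typescript
// Hypothetical axis-aligned rectangle in canvas coordinates
interface Rect {
  x: number
  y: number
  width: number
  height: number
}

function intersects(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  )
}

// On a cache miss, only fit the view when no node is visible yet, so a
// viewport that loadGraphData already restored is left untouched.
// Returns true when fitView was called.
function restoreViewportOnCacheMiss(
  nodeBounds: Rect[],
  visibleArea: Rect,
  fitView: () => void
): boolean {
  const anyVisible = nodeBounds.some((node) => intersects(node, visibleArea))
  if (!anyVisible) fitView()
  return !anyVisible
}
```

Because a successful `fitView()` makes nodes visible, a second call becomes a no-op. This is the idempotence the PR relies on to sidestep the timing race.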

## Review Focus

Single-file change in `subgraphNavigationStore.ts`. The visibility check
mirrors the same pattern used in `app.ts:1272-1281` where loadGraphData
itself checks for visible nodes.

## E2E Regression Test

The existing Playwright tests in
`browser_tests/tests/subgraphViewport.spec.ts` (added in #10247) already
cover viewport restoration after subgraph navigation. The specific
regression (template load viewport race) is not practically testable in
E2E because:
1. Template loading requires the backend's template API which returns
different templates per environment
2. The race condition depends on exact timing between `loadGraphData`'s
viewport restore and the rAF-deferred `restoreViewport` — Playwright
cannot reliably reproduce frame-level timing races
3. The fix is a guard condition (skip fitView if nodes visible) that
makes the behavior idempotent regardless of timing

## Alternative to #10790

This can replace the full revert in #10790 — it preserves the viewport
persistence feature while fixing the template regression.

Fixes regression from #10247
2026-04-07 10:01:54 -07:00
pythongosssss
0b83926c3e fix: Ensure zero uuid root graphs get assigned a valid id (#10825)
## Summary

Fixes an issue where handlers would be leaked causing Vue node rendering
to be corrupted (Vue nodes would not render) due to the
00000000-0000-0000-0000-000000000000 ID being used on the root graph.

## Changes

- **What**:
  - `LGraph.clear()` skips store cleanup for the zero UUID, leaking
    handlers: stale `onNodeAdded` hooks then overwrite the node
    manager/handlers during operations such as undo
  - Ensure that graph configuration assigns a valid ID to root graphs
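A sketch of the ID normalization, assuming a helper that runs while the root graph is configured. `ensureValidRootGraphId` is a hypothetical name. `crypto.randomUUID()` is the standard Web Crypto call, available in browsers and in Node 19+.

```typescript
const ZERO_UUID = '00000000-0000-0000-0000-000000000000'

// Swap a missing id or the zero UUID for a fresh random UUID, so that
// per-graph store cleanup in clear() is not skipped for the root graph.
function ensureValidRootGraphId(id: string | undefined): string {
  if (!id || id === ZERO_UUID) return globalThis.crypto.randomUUID()
  return id
}
```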

## Screenshots (if applicable)

Before the fix, pressing Ctrl+Z after entering a subgraph:
<img width="1011" height="574" alt="image"
src="https://github.com/user-attachments/assets/1ff4692b-b961-4777-bf2d-9b981e311f91"
/>

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10825-fix-Ensure-zero-uuid-root-graphs-get-assigned-a-valid-id-3366d73d3650817d8603c71ffb5e5742)
by [Unito](https://www.unito.io)

---------

Co-authored-by: jaeone94 <89377375+jaeone94@users.noreply.github.com>
Co-authored-by: Alexander Brown <drjkl@comfy.org>
2026-04-07 08:50:13 -07:00
Dante
858946b0f5 fix: use getAssetFilename in ModelInfoPanel filename field (#10836)
# Summary

* Model Info sidebar panel displays `asset.name` (registry name) instead
of the actual filename from `user_metadata.filename`
* Other UI components (asset cards, widgets, missing model scan)
correctly use `getAssetFilename()` which prefers
`user_metadata.filename` over `asset.name`
* One-line template fix: `{{ asset.name }}` → `{{
getAssetFilename(asset) }}`
* Fixes #10598 
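The preference order described above, sketched with a trimmed-down, hypothetical `AssetItem` shape. The real `getAssetFilename()` may apply additional checks. Treating an empty or non-string `filename` as missing is an assumption here.

```typescript
// Hypothetical, trimmed-down asset shape
interface AssetItem {
  name: string
  user_metadata?: { filename?: unknown }
}

// Prefer the actual file path from user_metadata.filename; fall back to
// the registry display name when it is absent or not a non-empty string.
function getAssetFilename(asset: AssetItem): string {
  const filename = asset.user_metadata?.filename
  return typeof filename === 'string' && filename.length > 0 ? filename : asset.name
}
```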

# Bug

`ModelInfoPanel.vue:35` used raw `asset.name` for the "File Name" field.
When `user_metadata.filename` differs from `asset.name` (e.g. registry
name vs actual path like `checkpoints/v1-5-pruned.safetensors`), users
see inconsistent filenames across the UI.

# AS-IS / TO-BE

<img width="800" height="600" alt="before-after-10598"
src="https://github.com/user-attachments/assets/15beb6c8-4bad-4ed2-9c85-6f8c7c0b6d3e"
/>


| | File Name field shows |
| :--- | :--- |
| **AS-IS** (bug) | `sdxl-lightning-4step`, the raw `asset.name` (registry display name) |
| **TO-BE** (fix) | `checkpoints/sdxl_lightning_4step.safetensors`, from `getAssetFilename(asset)` (actual file path) |

# Red-Green Verification

| Commit | CI Status | Purpose |
| :--- | :--- | :--- |
| `test: add failing test for ModelInfoPanel showing wrong filename` | 🔴 Red | Proves the test catches the bug |
| `fix: use getAssetFilename in ModelInfoPanel filename field` | 🟢 Green | Proves the fix resolves the bug |

# Test Plan

- [x] CI red on test-only commit
- [x] CI green on fix commit
- [x] Unit test: `prefers user_metadata.filename over asset.name for
filename field`
- [ ] Manual: open Asset Browser → click a model → verify File Name in
Model Info panel matches the actual file path (requires
`--enable-assets`)
2026-04-07 21:26:47 +09:00
Christian Byrne
c5b183086d test: add unit tests for commandStore, extensionStore, widgetStore (STORE-04) (#10647)
## Summary

Adds 43 unit tests covering three priority Pinia stores that previously
had zero test coverage.

### commandStore (18 tests)
- `registerCommand` / `registerCommands` — single and batch
registration, duplicate warning
- `getCommand` — retrieval and undefined for missing
- `execute` — successful execution, metadata passing, error handler
delegation, missing command error
- `isRegistered` — presence check
- `loadExtensionCommands` — extension command registration with source,
skip when no commands
- `ComfyCommandImpl` — label/icon/tooltip resolution (string vs
function), menubarLabel defaulting

### extensionStore (16 tests)
- `registerExtension` — name validation, duplicate detection, disabled
extension warning
- `isExtensionEnabled` / `loadDisabledExtensionNames` — enable/disable
lifecycle
- Always-disabled hardcoded extensions (pysssss.Locking,
pysssss.SnapToGrid, pysssss.FaviconStatus, KJNodes.browserstatus)
- `enabledExtensions` — computed filter
- `isExtensionReadOnly` — hardcoded list check
- `inactiveDisabledExtensionNames` — ghost extension tracking
- Core extension capture and `hasThirdPartyExtensions` detection
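A rough model of the enable/disable rules these tests exercise. The four hard-coded names come from the list above. The function signatures, and the reading of "ghost extension tracking" as "disabled names with no registered extension", are assumptions.

```typescript
// Always-disabled extensions named in the test list above
const ALWAYS_DISABLED: ReadonlySet<string> = new Set([
  'pysssss.Locking',
  'pysssss.SnapToGrid',
  'pysssss.FaviconStatus',
  'KJNodes.browserstatus'
])

// Hypothetical check: enabled unless hard-disabled or disabled by the user
function isExtensionEnabled(name: string, userDisabled: ReadonlySet<string>): boolean {
  return !ALWAYS_DISABLED.has(name) && !userDisabled.has(name)
}

// Hypothetical "ghost" tracking: disabled names no registered extension uses
function inactiveDisabledExtensionNames(
  userDisabled: ReadonlySet<string>,
  registered: ReadonlySet<string>
): string[] {
  return Array.from(userDisabled).filter((name) => !registered.has(name))
}
```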

### widgetStore (9 tests)
- Core widget availability via `ComfyWidgets`
- Custom widget registration and core/custom precedence
- `inputIsWidget` for both v1 array and v2 object InputSpec formats

## Part of
Test Coverage Q2 Overhaul — Phase 5 (Unit & Component Tests)

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10647-test-add-unit-tests-for-commandStore-extensionStore-widgetStore-STORE-04-3316d73d365081e0b4f6ce913130e489)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: Alexander Brown <drjkl@comfy.org>
Co-authored-by: Alexander Brown <448862+DrJKL@users.noreply.github.com>
2026-04-07 08:42:16 +00:00
Comfy Org PR Bot
5872885cc5 1.43.14 (#10928)
Patch version increment to 1.43.14

**Base branch:** `main`

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10928-1-43-14-33b6d73d365081158ba6ec0814165770)
by [Unito](https://www.unito.io)

---------

Co-authored-by: christian-byrne <72887196+christian-byrne@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Christian Byrne <cbyrne@comfy.org>
2026-04-06 23:18:53 -07:00
jaeone94
a8d23275d9 fix: prevent subgraph node position corruption during graph transitions (#10828)
## Summary

Fix subgraph internal node positions being permanently corrupted when
entering a subgraph after a draft workflow reload. The corruption
accumulated across page refreshes, causing nodes to progressively drift
apart or compress together.

## Changes

- **What**: In the `ResizeObserver` callback
(`useVueNodeResizeTracking.ts`), node positions are now read from the
Layout Store (source of truth, initialized from LiteGraph) instead of
reverse-converting DOM screen coordinates via `getBoundingClientRect()`
+ `clientPosToCanvasPos()`. The fallback to DOM-based conversion is
retained only for nodes not yet present in the Layout Store.
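A sketch of the position lookup. It assumes a `Map` stands in for the Layout Store and that the canvas element sits at the client origin. `clientPosToCanvasPos` here mirrors LiteGraph's screen-to-canvas conversion (divide by `scale`, then subtract `offset`). `resolveNodePosition` and `ViewTransform` are hypothetical names.

```typescript
type Point = [number, number]

// Hypothetical subset of LiteGraph's view transform (canvas.ds)
interface ViewTransform {
  scale: number
  offset: Point
}

// Screen -> canvas conversion, assuming the canvas element sits at the
// client origin: divide by scale, then subtract the offset.
function clientPosToCanvasPos([x, y]: Point, ds: ViewTransform): Point {
  return [x / ds.scale - ds.offset[0], y / ds.scale - ds.offset[1]]
}

// Prefer the Layout Store position, which does not depend on the current
// viewport; only fall back to DOM conversion for nodes the store lacks.
function resolveNodePosition(
  nodeId: string,
  layoutPositions: ReadonlyMap<string, Point>,
  domTopLeft: Point,
  ds: ViewTransform
): Point {
  return layoutPositions.get(nodeId) ?? clientPosToCanvasPos(domTopLeft, ds)
}
```

For example, a node at canvas (100, 100) under scale 1 is drawn at client (100, 100). Converting that point with the parent graph's stale scale of 2 yields (50, 50), which is the kind of drift the Layout Store lookup avoids.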

## Root Cause

`ResizeObserver` was using `getBoundingClientRect()` to get DOM element
positions, then converting them to canvas coordinates via
`clientPosToCanvasPos()`. This conversion depends on the current
`canvas.ds.scale` and `canvas.ds.offset`.

During graph transitions (e.g., entering a subgraph from a draft-loaded
workflow), the canvas viewport was stale — it still had the **parent
graph's zoom level** because `fitView()` hadn't run yet (it's scheduled
via `requestAnimationFrame`). The `ResizeObserver` callback fired before
`fitView`, converting DOM positions using the wrong scale/offset, and
writing the corrupted positions to the Layout Store. The `useLayoutSync`
writeback then permanently overwrote the LiteGraph node positions.

The corruption accumulated across sessions:
1. Load workflow → enter subgraph → `ResizeObserver` writes corrupted
positions
2. Draft auto-saves the corrupted positions to localStorage
3. Page refresh → draft loads with corrupted positions → enter subgraph
→ positions corrupted further
4. Each cycle amplifies the drift based on the parent graph's zoom level

This is the same class of bug that PR #9121 fixed for **slot** positions
— the DOM→canvas coordinate conversion is inherently fragile during
viewport transitions. This PR applies the same principle to **node**
positions.

## Why This Only Affects `main` (No Backport Needed)

This bug requires two features that only exist on `main`, not on
`core/1.41` or `core/1.42`:

1. **PR #10247** changed `subgraphNavigationStore`'s watcher to `flush:
'sync'` and added `requestAnimationFrame(fitView)` on viewport cache
miss. This creates the timing window where `ResizeObserver` fires before
`fitView` corrects the canvas scale.
2. **PR #6811** added hash-based subgraph auto-entry on page load, which
triggers graph transitions during the draft reload flow.

On 1.41/1.42, `restoreViewport` does nothing on cache miss (no `fitView`
scheduling), and the watcher uses default async flush — so the
`ResizeObserver` never runs with a stale viewport.

## Review Focus

- The core change is small: use `nodeLayout.position` (already in the
Layout Store from `initializeFromLiteGraph`) instead of computing
position from `getBoundingClientRect()`. This eliminates the dependency
on canvas scale/offset being up-to-date during `ResizeObserver`
callbacks.
- The fallback path (`getBoundingClientRect` → `clientPosToCanvasPos`)
is retained for nodes not yet in the Layout Store (e.g., first render of
a newly created node). At that point the canvas transform is stable, so
the conversion is safe.
- Unit tests updated to reflect that position is no longer overwritten
from DOM when Layout Store already has the position.
- E2E test added: load subgraph workflow → enter subgraph → reload
(draft) → verify positions preserved.

## E2E Test Fixes

- `subgraphDraftPositions.spec.ts`: replaced `comfyPage.setup({
clearStorage: false })` with `page.reload()` + explicit draft
persistence polling. The `setup()` method performs a full navigation via
`goto()` which bypassed the draft auto-load flow.
- `SubgraphHelper.packAllInteriorNodes`: replaced `canvas.click()` with
`dispatchEvent('pointerdown'/'pointerup')`. The position fix places
subgraph nodes at their correct locations, which now overlap with DOM
widget textareas that intercept pointer events.

## Test Plan

- [x] Unit tests pass (`useVueNodeResizeTracking.test.ts`)
- [x] E2E test: `subgraphDraftPositions.spec.ts` — draft reload
preserves subgraph node positions
- [x] Manual: load workflow with subgraph, zoom in/out on root graph,
enter subgraph, verify no position drift
- [x] Manual: repeat with page refresh (draft reload) — positions should
be stable across reloads
- [x] Manual: drag nodes inside subgraph — positions should update
correctly
- [x] Manual: create new node inside subgraph — position should be set
correctly (fallback path)

## Screenshots
Before
<img width="1331" height="879" alt="Screenshot 2026-04-03 at 3.56.48 AM"
src="https://github.com/user-attachments/assets/377d1b2e-6d47-4884-8181-920e22fa6541"
/>

After
<img width="1282" height="715" alt="Screenshot 2026-04-03 at 3.58.24 AM"
src="https://github.com/user-attachments/assets/34528f6c-0225-4538-9383-227c849bccad"
/>


┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10828-fix-prevent-subgraph-node-position-corruption-during-graph-transitions-3366d73d365081418502dbb78da54013)
by [Unito](https://www.unito.io)

---------

Co-authored-by: Alexander Brown <drjkl@comfy.org>
2026-04-07 14:34:56 +09:00
Christian Byrne
84f401bbe9 docs: update backport-management skill with v1.43 session learnings (#10927)
## Summary

Update backport-management skill with learnings from the 2026-04-06
backport session (29 PRs across core/1.42, cloud/1.42, core/1.41,
cloud/1.41).

## Changes

- **What**: Captures operational learnings into the backport skill
reference docs
- Branch scope clarification: app mode and Firebase auth go to both
core+cloud branches, not cloud-only
- Accept-theirs regex warning: produces broken hybrids on component
rewrites (PrimeVue to Reka UI migrations); use `git show SHA:path`
instead
- Missing dependency pattern: cherry-picks can silently bring in code
referencing composables/components not on the target branch
- Fix PRs are expected: plan for 1 fix PR per branch after wave
verification
- `--no-verify` required for worktree commits/pushes (husky hooks fail
in /tmp/ worktrees)
- Automation success varies wildly by branch: core/1.42 got 69%
auto-PRs, cloud/1.42 got 4%
- Test-then-resolve batch pattern for efficient handling of
low-automation branches
- Slack-compatible final deliverables: plain text format replacing
mermaid diagrams (no emojis, tables, headers, or bold)
- Updated conflict triage table with component rewrite, import-only, and
locale/JSON conflict types
- Interactive approval flow replacing static decisions.md for human
review

## Review Focus

Documentation-only change to internal skill files. No production code
affected.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10927-docs-update-backport-management-skill-with-v1-43-session-learnings-33a6d73d3650811aa7cffed4b2d730b0)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-06 18:23:21 -07:00
Christian Byrne
0b689a3c3c chore: remove pull-model registry type gen workflow (#9957)
## Summary

Removes `api-update-registry-api-types.yaml` — the workflow that cloned
the private `comfy-api` repo using a PAT to generate registry API types.
This is a security risk: a PAT with private repo access stored on a
public repo.

Type generation now happens in `comfy-api` and pushes PRs to this repo
instead.

### Action needed after merge

- Remove the `COMFY_API_PAT` secret from this repo's settings (Settings
→ Secrets → Actions)

### Depends on

- https://github.com/Comfy-Org/comfy-api/pull/937 (must merge first)
- Refs: COM-16785

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9957-chore-remove-pull-model-registry-type-gen-workflow-3246d73d365081cd9dbad03597817f05)
by [Unito](https://www.unito.io)
2026-04-06 18:21:38 -07:00
Christian Byrne
32ff1a5bdb feat(website): add SEO, sitemap, redirects, CI workflow, and Vercel config (#10156)
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10156-feat-website-add-SEO-sitemap-redirects-CI-workflow-and-Vercel-config-3266d73d3650816ab9eaebd11072d481)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-04-06 18:20:28 -07:00
Christian Byrne
0a5f281291 refactor: reconcile workspaceAuthStore with authStore (#10485)
## Summary

Add dedicated auth token priority tests and reconcile workspace token
resolution between `authStore` and `workspaceAuthStore`.

## Changes

- **What**: Created `src/stores/__tests__/authTokenPriority.test.ts`
with 6 scenarios covering the full priority chain: workspace token →
Firebase token → API key → null. Added `getWorkspaceToken()` method to
`workspaceAuthStore`. Replaced raw `sessionStorage` reads in `authStore`
with proper store delegation.
- **Files**: 3 files changed (+ 1 new test file)
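The priority chain that the new tests cover can be written as a short fall-through. `TokenSources`, `AuthToken` and this `getAuthToken` signature are hypothetical. The real stores expose their own methods (such as the new `getWorkspaceToken()`), and Firebase token retrieval is assumed to be asynchronous.

```typescript
// Hypothetical token sources standing in for workspaceAuthStore,
// the Firebase session, and the stored API key
interface TokenSources {
  getWorkspaceToken(): string | null
  getFirebaseToken(): Promise<string | null>
  getApiKey(): string | null
}

interface AuthToken {
  source: 'workspace' | 'firebase' | 'apiKey'
  value: string
}

// Priority: workspace token -> Firebase token -> API key -> null
async function getAuthToken(src: TokenSources): Promise<AuthToken | null> {
  const workspace = src.getWorkspaceToken()
  if (workspace) return { source: 'workspace', value: workspace }

  const firebase = await src.getFirebaseToken()
  if (firebase) return { source: 'firebase', value: firebase }

  const apiKey = src.getApiKey()
  if (apiKey) return { source: 'apiKey', value: apiKey }

  return null
}
```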

## Review Focus

- Token priority chain in `authTokenPriority.test.ts` — validates
workspace > Firebase > API key ordering
- The `getAuthToken()` and `getAuthHeader()` methods now delegate to
`workspaceAuthStore` instead of reading sessionStorage directly

## Stack

PR 3/5: #10483 → #10484 → **This PR** → #10486 → #10487

Co-authored-by: Alexander Brown <drjkl@comfy.org>
2026-04-06 18:18:20 -07:00
Christian Byrne
61af482cd4 docs: document when to use page.evaluate vs user actions in browser tests (#10658)
## Summary

Document acceptable vs avoidable `page.evaluate` patterns in Playwright
E2E tests, with migration candidates for existing offenders.

## Changes

- **What**: Add "When to Use `page.evaluate`" section to
`docs/guidance/playwright.md` with acceptable/avoid/preferred guidance
and 8 migration candidates identified from audit

## Review Focus

Whether the migration candidate list covers the right tests and whether
the acceptable/avoid boundary is drawn correctly.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10658-docs-document-when-to-use-page-evaluate-vs-user-actions-in-browser-tests-3316d73d36508186be90f263f36daf75)
by [Unito](https://www.unito.io)
2026-04-06 16:56:43 -07:00
Christian Byrne
3a73ce72bb refactor: replace inline getByRole('dialog') with page objects in E2E tests (#10724)
## Summary

Replace remaining inline `getByRole('dialog')` calls in E2E tests with
page object accessors, following the pattern from PR #10586.

**Note:** The original scope covered 9 calls across 5 files. Three of
those refactors (ConfirmDialog, MediaLightbox, TemplatesDialog) were
already merged into main via other PRs, so this PR now covers only the
remaining 3 files.

## Changes

- **ComfyNodeSearchBox.ts**: Extract a `root` locator on
`ComfyNodeSearchFilterSelectionPanel` so `header` getter uses
`this.root` instead of an inline `page.getByRole('dialog')`
- **AppModeHelper.ts**: Add `imagePickerPopover` getter that locates the
PrimeVue Popover (`role="dialog"`) filtered by a button named "All"
- **appModeDropdownClipping.spec.ts**: Replace 4-line inline
`getByRole('dialog')` chain with `comfyPage.appMode.imagePickerPopover`

## Review Focus

Pure test infrastructure refactor — no production code changes. Each
page object follows existing conventions.

Fixes #10723
2026-04-06 16:42:09 -07:00
Benjamin Lu
f3cbbb8654 [codex] merge hashed auth user data into GTM auth events (#10778)
Merge the email dataLayer push event into the existing signup/login push.
Hash the email with SHA-256 and deduplicate the hashing logic into a
shared util.

AI summary below

---

## Summary
- attach hashed auth email to the `login` / `sign_up` GTM event payload
instead of pushing a separate standalone `user_data` message
- add a shared telemetry email hashing utility and reuse it for both GTM
(`SHA-256`) and Impact (`SHA-1`)
- extend tests to cover merged auth payloads, both hash algorithms, and
fallback behavior when Web Crypto is unavailable
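A sketch of a shared hashing utility built on the standard Web Crypto `crypto.subtle.digest` API, which supports both `SHA-256` and `SHA-1`. Two details are assumptions: the trim-and-lowercase normalization, and returning `undefined` when Web Crypto is missing. The fallback was chosen so that plaintext email is never emitted. All function names are hypothetical.

```typescript
type EmailHashAlgorithm = 'SHA-256' | 'SHA-1'

// Trim and lowercase so the same address always hashes to the same value
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase()
}

// Hex-encoded digest of the normalized email, or undefined when Web Crypto
// is unavailable (assumed fallback: never fall back to plaintext).
async function hashEmail(
  email: string,
  algorithm: EmailHashAlgorithm
): Promise<string | undefined> {
  const subtle = globalThis.crypto?.subtle
  if (!subtle) return undefined
  const bytes = new TextEncoder().encode(normalizeEmail(email))
  const digest = await subtle.digest(algorithm, bytes)
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('')
}

// GTM uses SHA-256 and Impact uses SHA-1, per the PR description
const hashEmailForGtm = (email: string) => hashEmail(email, 'SHA-256')
const hashEmailForImpact = (email: string) => hashEmail(email, 'SHA-1')
```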

## Why
The current GTM auth flow was splitting `user_data` and the auth event
into separate dataLayer pushes, which makes GTM preview harder to
interpret and relies on carried-over dataLayer state instead of the
event payload that actually fired.

There was also an implementation gap: the previous GTM change was
intended to hash email client-side, but the provider was still sending
normalized plaintext email into `dataLayer`.

## Impact
Auth events now carry hashed email on the same event payload that GTM
tags consume, so the preview timeline is cleaner and downstream matching
can rely on the auth event directly.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10778-codex-merge-hashed-auth-user-data-into-GTM-auth-events-3346d73d36508197b57deada345dbeaa)
by [Unito](https://www.unito.io)
2026-04-06 16:39:49 -07:00
Christian Byrne
e2b07f3e9a feat(website): add Wave 4 secondary pages (#10145)
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10145-feat-website-add-Wave-4-secondary-pages-3266d73d3650818c9101c7d2086c21ba)
by [Unito](https://www.unito.io)
2026-04-06 14:42:24 -07:00
Comfy Org PR Bot
2856a30b50 1.43.13 (#10857)
Patch version increment to 1.43.13

**Base branch:** `main`

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10857-1-43-13-33a6d73d3650812598bec1f89696cfa5)
by [Unito](https://www.unito.io)

Co-authored-by: christian-byrne <72887196+christian-byrne@users.noreply.github.com>
Co-authored-by: Christian Byrne <cbyrne@comfy.org>
2026-04-06 13:16:25 -07:00
Dante
64f75f0727 test(assets): add E2E tests for delete confirmation flow (#10785)
## Summary

Add E2E Playwright tests for the asset delete confirmation dialog flow.

## Changes

- **What**: New `Assets sidebar - delete confirmation` describe block in
`assets.spec.ts` covering right-click delete showing confirmation
dialog, confirming delete removes asset with success toast, and
cancelling delete preserves asset. Added `mockDeleteHistory()` to
`AssetsHelper` to intercept POST `/api/history` delete payloads and
update mock state.
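The mock state update can be sketched as a pure function over the in-memory job list. It assumes the delete payload follows ComfyUI's `POST /history` body shape (`{ "delete": [ids] }`). `MockJob`, `HistoryDeleteBody` and `applyHistoryDelete` are hypothetical names.

```typescript
// Hypothetical mock job record; the real AssetsHelper state may differ.
interface MockJob {
  id: string
  status: 'completed' | 'failed'
}

// Assumed delete payload, following ComfyUI's POST /history body shape
interface HistoryDeleteBody {
  delete?: string[]
}

// Returns the job list without the deleted ids, so a later /api/jobs
// response built from it no longer includes them.
function applyHistoryDelete(jobs: readonly MockJob[], body: HistoryDeleteBody): MockJob[] {
  const toDelete = new Set(body.delete ?? [])
  return jobs.filter((job) => !toDelete.has(job.id))
}
```

The handler could equally splice the mock array in place. Returning a fresh array here just keeps the function easy to test in isolation.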

## Review Focus

Tests use existing `ConfirmDialog` page object and `AssetsHelper` mock
infrastructure. The `mockDeleteHistory` handler removes jobs from the
in-memory mock array so subsequent `/api/jobs` fetches reflect the
deletion.

Fixes #10781

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10785-test-assets-add-E2E-tests-for-delete-confirmation-flow-3356d73d365081fb90c8e2a69de3a666)
by [Unito](https://www.unito.io)
2026-04-06 10:17:23 -07:00
Dante
0d535631a5 refactor(test): extract dialog page objects from inline getByRole usage (#10822)
## Summary

Extract inline `getByRole('dialog')` calls across E2E tests into
reusable page objects.

## Changes

- **What**: Extract `ConfirmDialog` class from `ComfyPage.ts` into
`browser_tests/fixtures/components/ConfirmDialog.ts` with new `save`
button locator. Add `MediaLightbox` and `TemplatesDialog` page objects.
Refactor 4 test files to use these page objects instead of raw dialog
locators.
- **Skipped**: `appModeDropdownClipping.spec.ts` uses
`getByRole('dialog')` for a PrimeVue Popover (not a true dialog), left
as-is.

## Review Focus

The `ConfirmDialog.click()` method now supports a `save` action used by
`workflowPersistence.spec.ts`, which also waits for the dialog mask to
disappear and workflow service to settle.

Fixes #10723

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10822-refactor-test-extract-dialog-page-objects-from-inline-getByRole-usage-3366d73d365081b3bc0ee7ef0ddce658)
by [Unito](https://www.unito.io)
2026-04-06 09:49:32 -07:00
797 changed files with 33022 additions and 37592 deletions


@@ -11,10 +11,11 @@ Cherry-pick backport management for Comfy-Org/ComfyUI_frontend stable release br
1. **Discover** — Collect candidates from Slack bot + git log gap (`reference/discovery.md`)
2. **Analyze** — Categorize MUST/SHOULD/SKIP, check deps (`reference/analysis.md`)
3. **Plan** — Order by dependency (leaf fixes first), group into waves per branch
4. **Execute** — Label-driven automation → worktree fallback for conflicts (`reference/execution.md`)
5. **Verify** — After each wave, verify branch integrity before proceeding
6. **Log & Report** — Generate session report with mermaid diagram (`reference/logging.md`)
3. **Human Review** — Present candidates in batches for interactive approval (see Interactive Approval Flow)
4. **Plan** — Order by dependency (leaf fixes first), group into waves per branch
5. **Execute** — Label-driven automation → worktree fallback for conflicts (`reference/execution.md`)
6. **Verify** — After each wave, verify branch integrity before proceeding
7. **Log & Report** — Generate session report (`reference/logging.md`)
## System Context
@@ -37,16 +38,29 @@ Cherry-pick backport management for Comfy-Org/ComfyUI_frontend stable release br
**Critical: Match PRs to the correct target branches.**
| Branch prefix | Scope | Example |
| ------------- | ------------------------------ | ----------------------------------------- |
| `cloud/*` | Cloud-hosted ComfyUI only | App mode, cloud auth, cloud-specific UI |
| `core/*` | Local/self-hosted ComfyUI only | Core editor, local workflows, node system |
| Branch prefix | Scope | Example |
| ------------- | ------------------------------ | ------------------------------------------------- |
| `cloud/*` | Cloud-hosted ComfyUI only | Team workspaces, cloud queue, cloud-only login |
| `core/*` | Local/self-hosted ComfyUI only | Core editor, local workflows, node system |
| Both | Shared infrastructure | App mode, Firebase auth (API nodes), payment URLs |
**⚠️ NEVER backport cloud-only PRs to `core/*` branches.** Cloud-only changes (app mode, cloud auth, cloud billing UI, cloud-specific API calls) are irrelevant to local users and waste effort. Before backporting any PR to a `core/*` branch, check:
### What Goes Where
- Does the PR title/description mention "app mode", "cloud", or cloud-specific features?
- Does the PR only touch files like `appModeStore.ts`, cloud auth, or cloud-specific components?
- If yes → skip for `core/*` branches (may still apply to `cloud/*` branches)
**Both core + cloud:**
- **App mode** PRs — app mode is NOT cloud-only
- **Firebase auth** PRs — Firebase auth is on core for API nodes
- **Payment redirect** PRs — payment infrastructure shared
- **Bug fixes** touching shared components
**Cloud-only (skip for core):**
- Team workspaces
- Cloud queue virtualization
- Hide API key login
- Cloud-specific UI behind cloud feature flags
**⚠️ NEVER backport cloud-only PRs to `core/*` branches.** But do NOT assume "app mode" or "Firebase" = cloud-only. Check the actual files changed.
## ⚠️ Gotchas (Learn from Past Sessions)
@@ -67,6 +81,32 @@ The `pr-backport.yaml` action reports more conflicts than reality. `git cherry-p
12 or 27 conflicting files can be trivial (snapshots, new files). **Categorize conflicts first**, then decide. See Conflict Triage below.
### Accept-Theirs Can Produce Broken Hybrids
When a PR **rewrites a component** (e.g., PrimeVue → Reka UI), the accept-theirs regex produces a broken mix of old and new code. The template may reference new APIs while the script still has old imports, or vice versa.
**Detection:** Content conflicts with 4+ conflict markers in a single `.vue` file, especially when imports change between component libraries.
**Fix:** Instead of accept-theirs regex, use `git show MERGE_SHA:path/to/file > path/to/file` to get the complete correct version from the merge commit on main. This bypasses the conflict entirely.
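The detection rule above can be scripted. This is a minimal sketch. It assumes a POSIX shell and that the conflicted files still contain their `<<<<<<<` markers. The function name `flag_component_rewrites`, the threshold of 4, and the `from 'primevue` / `from 'reka-ui` import patterns are illustrative choices taken from the heuristic. They do not come from an existing helper in the repo.

```bash
# Hypothetical helper (not part of any existing script): flag conflicted .vue
# files that look like component rewrites, i.e. 4+ conflict markers, or
# imports from both PrimeVue and Reka UI in the same file.
flag_component_rewrites() {
  for file in "$@"; do
    case "$file" in
      *.vue) ;;
      *) continue ;;  # only components are candidates
    esac
    # Count lines that open a conflict hunk ("<<<<<<< HEAD" during a cherry-pick).
    markers=$(grep -c '^<<<<<<< ' "$file")
    if [ "$markers" -ge 4 ]; then
      echo "REWRITE $file ($markers markers)"
    elif grep -q "from 'primevue" "$file" && grep -q "from 'reka-ui" "$file"; then
      echo "REWRITE $file (mixed PrimeVue/Reka UI imports)"
    fi
  done
}
```

Run it against the unmerged paths, for example `flag_component_rewrites $(git diff --name-only --diff-filter=U)`. Restore every flagged file with `git show MERGE_SHA:path > path` instead of running the regex on it.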
### Cherry-Picks Can Reference Missing Dependencies
When PR A on main depends on code introduced by PR B (which was merged before A), cherry-picking A brings in code that references B's additions. The cherry-pick succeeds but the branch is broken.
**Common pattern:** Composables, component files, or type definitions introduced by an earlier PR and used by the cherry-picked PR.
**Detection:** `pnpm typecheck` fails with "Cannot find module" or "is not defined" errors after cherry-pick.
**Fix:** Use `git show MERGE_SHA:path/to/missing/file > path/to/missing/file` to bring the missing files from main. Always verify with typecheck.
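A small filter can turn typecheck output into a list of files to pull from main. This sketch rests on two assumptions. First, the errors use TypeScript's `Cannot find module '...'` wording (TS2307). Second, the project maps the `@/` alias to `src/`. `missing_modules` is a hypothetical name. It does not catch "is not defined" errors, so those still need a manual look.

```bash
# Hypothetical helper: read typecheck output on stdin and print each
# unresolved module specifier once, with the "@/" alias rewritten to "src/"
# (assumes tsconfig maps "@/*" to "src/*").
missing_modules() {
  sed -n "s/.*Cannot find module '\([^']*\)'.*/\1/p" |
    sed 's#^@/#src/#' |
    sort -u
}
```

A typical call is `pnpm typecheck 2>&1 | missing_modules`. The specifiers carry no file extension. To find the real file name on main before running `git show`, list the directory, for example with `git ls-tree -r --name-only $MERGE_SHA src/composables/queue/`.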
### Use `--no-verify` for Worktree Pushes
Husky hooks fail in worktrees (can't find lint-staged config). Always use `git push --no-verify` and `git commit --no-verify` when working in `/tmp/` worktrees.
### Automation Success Varies Wildly by Branch
In the 2026-04-06 session: core/1.42 got 18/26 auto-PRs, cloud/1.42 got only 1/25. The cloud branch has more divergence. **Always plan for manual fallback** — don't assume automation will handle most PRs.
## Conflict Triage
**Always categorize before deciding to skip. High conflict count ≠ hard conflicts.**
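One way to categorize conflicts is to read the two-letter unmerged codes from `git status --porcelain`: `UU` means both sides modified the file, `AA` means both sides added it, and `DU`/`UD` mean one side deleted it. The sketch below assumes locale files live under `src/locales/` and that binary conflicts are PNG snapshots. `classify_conflicts` is a hypothetical helper. For the modify/delete bucket, a human still has to choose between `git add` and `git rm`.

```bash
# Hypothetical helper: bucket unmerged paths into the triage categories.
# Reads porcelain v1 lines ("XY path") on stdin, prints "<category> <path>".
# Path-based rules (PNG snapshots, locale JSON) take precedence over the code.
classify_conflicts() {
  while IFS= read -r line; do
    code=$(printf '%s' "$line" | cut -c1-2)
    path=$(printf '%s' "$line" | cut -c4-)
    case "$code" in
      DD|AU|UD|UA|DU|AA|UU) ;;
      *) continue ;;  # not an unmerged path
    esac
    case "$path" in
      *.png) echo "binary $path"; continue ;;
      src/locales/*.json) echo "locale $path"; continue ;;
    esac
    case "$code" in
      UU) echo "content $path" ;;
      AA) echo "add/add $path" ;;
      DU|UD) echo "modify/delete $path" ;;  # one side deleted, the other modified
      *) echo "other $path" ;;
    esac
  done
}
```

To group the output by category, run `git status --porcelain | classify_conflicts | sort`.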
@@ -77,6 +117,8 @@ The `pr-backport.yaml` action reports more conflicts than reality. `git cherry-p
| **Modify/delete (new file)** | PR introduces files not on target | `git add $FILE` — keep the new file |
| **Modify/delete (removed)** | Target removed files the PR modifies | `git rm $FILE` — file no longer relevant |
| **Content conflicts** | Marker-based (`<<<<<<<`) | Accept theirs via python regex (see below) |
| **Component rewrites** | 4+ markers in `.vue`, library change | Use `git show SHA:path > path` — do NOT accept-theirs |
| **Import-only conflicts** | Only import lines differ | Keep both imports if both used; remove unused after |
| **Add/add** | Both sides added same file | Accept theirs, verify no logic conflict |
| **Locale/JSON files** | i18n key additions | Accept theirs, validate JSON after |
@@ -103,7 +145,7 @@ Skip these without discussion:
- **Test-only / lint rule changes** — Not user-facing
- **Revert pairs** — If PR A reverted by PR B, skip both. If fixed version (PR C) exists, backport only C.
- **Features not on target branch** — e.g., Painter, GLSLShader, appModeStore on core/1.40
- **Cloud-only PRs on core/\* branches** — Team workspaces, cloud queue, cloud-only login. (Note: app mode and Firebase auth are NOT cloud-only — see Branch Scope Rules)
## Wave Verification
@@ -122,6 +164,18 @@ git worktree remove /tmp/verify-TARGET --force
If typecheck or tests fail, stop and investigate before continuing. A broken branch after wave N means all subsequent waves will compound the problem.
### Fix PRs Are Normal
Expect to create 1 fix PR per branch after verification. Common issues:
1. **Component rewrite hybrids** — accept-theirs produced broken `.vue` files. Fix: overwrite with correct version from merge commit via `git show SHA:path > path`
2. **Missing dependency files** — cherry-pick brought in code referencing composables/components not on the branch. Fix: add missing files from merge commit
3. **Missing type properties** — cherry-picked code uses interface properties not yet on the branch (e.g., `key` on `ConfirmDialogOptions`). Fix: add the property to the interface
4. **Unused imports** — conflict resolution kept imports that the branch doesn't use. Fix: remove unused imports
5. **Wrong types from conflict resolution** — e.g., `{ top: number; right: number }` vs `{ top: number; left: number }`. Fix: match the return type of the actual function
Create a fix PR on a branch from the target, verify typecheck passes, wait for CI to pass, then merge with `--squash --admin`.
### Never Admin-Merge Without CI
In a previous bulk session, all 69 backport PRs were merged with `gh pr merge --squash --admin`, bypassing required CI checks. This shipped 3 test failures to a release branch. **Lesson: `--admin` skips all branch protection, including required status checks.** Only use `--admin` after confirming CI has passed (e.g., `gh pr checks $PR` shows all green), or rely on auto-merge (`--auto --squash`) which waits for CI by design.
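Here is a minimal sketch of the safe path. It assumes the `gh` CLI is authenticated for the repo. `gh pr checks --watch --fail-fast` blocks until the checks finish and exits non-zero on the first failure, so the admin merge only runs on a green PR. `merge_when_green` is a hypothetical wrapper name.

```bash
# Hypothetical wrapper: wait for the PR's checks, and admin-merge only if
# they all pass. A failing check makes `gh pr checks` exit non-zero, which
# skips the merge.
merge_when_green() {
  pr="$1"
  if gh pr checks "$pr" --watch --fail-fast; then
    gh pr merge "$pr" --squash --admin
  else
    echo "CI failed for #$pr; not merging" >&2
    return 1
  fi
}
```

The wrapper only guarantees that CI ran. `--admin` still bypasses every other protection. When no one needs to watch, `gh pr merge $PR --auto --squash` gives the same wait without keeping a shell open.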
@@ -135,6 +189,43 @@ Large backport sessions (50+ PRs) are expensive and error-prone. Prefer continuo
- Reserve session-style bulk backporting for catching up after gaps
- When a release branch is created, immediately start the continuous process
## Interactive Approval Flow
After analysis, present ALL candidates (MUST, SHOULD, and borderline) to the human for interactive review before execution. Do not write a static decisions.md — collect approvals in conversation.
### Batch Presentation
Present PRs in batches of 5-10, grouped by theme (visual bugs, interaction bugs, cloud/auth, data correctness, etc.). Use this table format:
```
# | PR | Title | Target | Rec | Context
----+--------+------------------------------------------+---------------+------+--------
1 | #12345 | fix: broken thing | core+cloud/42 | Y | Description here. Why it matters. Agent reasoning.
2 | #12346 | fix: another issue | core/42 | N | Only affects removed feature. Not on target branch.
```
Each row includes:
- PR number and title
- Target branches
- Agent recommendation: `Rec: Y` or `Rec: N` with brief reasoning
- 2-3 sentence context: what the PR does, why it matters (or doesn't)
### Human Response Format
- `Y` — approve for backport
- `N` — skip
- `?` — investigate (agent shows PR description, files changed, detailed take, then re-asks)
- Any freeform question or comment triggers discussion before moving on
- Bulk responses accepted (e.g. `1 Y, 2 Y, 3 N, 4 ?`)
### Rules
- ALL candidates are reviewed, not just MUST items
- When human responds `?`, show the PR description, files changed, and agent's detailed analysis, then re-ask for their decision
- When human asks a question about a PR, answer with context and recommendation, then wait for their decision
- Do not proceed to execution until all batches are reviewed and every candidate has a Y or N
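If replies are collected by a script rather than read directly, a bulk reply can be normalized with a few lines of shell. This is a hypothetical helper (`parse_bulk_reply`). It assumes rows are comma-separated `<row> <decision>` pairs, as in the example above. Anything else goes to stderr so it can be handled as a freeform question.

```bash
# Hypothetical helper: turn "1 Y, 2 Y, 3 N, 4 ?" into one "<row> <decision>"
# line per candidate. Unrecognized entries are reported on stderr.
parse_bulk_reply() {
  printf '%s\n' "$1" | tr ',' '\n' | while read -r row decision extra; do
    [ -z "$row" ] && continue
    case "$decision" in
      Y|y) echo "$row Y" ;;
      N|n) echo "$row N" ;;
      '?') echo "$row ?" ;;  # investigate, then re-ask
      *) echo "unparsed: $row $decision $extra" >&2 ;;
    esac
  done
}
```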
## Quick Reference
### Label-Driven Automation (default path)
@@ -150,13 +241,96 @@ gh api repos/Comfy-Org/ComfyUI_frontend/issues/$PR/labels \
```bash
git worktree add /tmp/backport-$BRANCH origin/$BRANCH
cd /tmp/backport-$BRANCH
# For each PR:
git fetch origin $BRANCH
git checkout -b backport-$PR-to-$BRANCH origin/$BRANCH
git cherry-pick -m 1 $MERGE_SHA
# Resolve conflicts (see Conflict Triage)
git push origin backport-$PR-to-$BRANCH --no-verify
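NEW_PR=$(gh pr create --base $BRANCH --head backport-$PR-to-$BRANCH \
--title "[backport $BRANCH] $TITLE (#$PR)" \
--body "Backport of #$PR. [conflict notes]" | grep -oP '\d+$')
# Give GitHub a moment to register the checks, then merge only if CI passes
sleep 25
gh pr checks $NEW_PR --watch --fail-fast && gh pr merge $NEW_PR --squash --admin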
```
### Efficient Batch: Test-Then-Resolve Pattern
When many PRs need manual cherry-pick (e.g., cloud branches), test all first:
```bash
cd /tmp/backport-$BRANCH
for pr in "${ORDER[@]}"; do
  # Look up the merge commit for this PR
  SHA=$(gh pr view $pr --json mergeCommit --jq .mergeCommit.oid)
  git checkout -b test-$pr origin/$BRANCH
  if git cherry-pick -m 1 $SHA 2>/dev/null; then
    echo "CLEAN: $pr"
  else
    echo "CONFLICT: $pr"
    git cherry-pick --abort
  fi
  git checkout --detach HEAD
  git branch -D test-$pr
done
```
Then process clean PRs in a batch loop, conflicts individually.
### PR Title Convention
```
[backport TARGET_BRANCH] Original Title (#ORIGINAL_PR)
```
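A tiny helper keeps the title format consistent across manual and scripted backports. `backport_title` is a hypothetical name. The original title can be fetched with `gh pr view $PR --json title --jq .title`.

```bash
# Hypothetical helper: build "[backport TARGET_BRANCH] Original Title (#ORIGINAL_PR)".
# $1 = target branch, $2 = original PR title, $3 = original PR number
backport_title() {
  printf '[backport %s] %s (#%s)\n' "$1" "$2" "$3"
}
```

For example: `gh pr create --base $BRANCH --head backport-$PR-to-$BRANCH --title "$(backport_title $BRANCH "$TITLE" $PR)" --body "Backport of #$PR."`.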
## Final Deliverables (Slack-Compatible)
After execution completes, generate two files in `~/temp/backport-session/`. Both must be **Slack-compatible plain text** — no emojis, no markdown tables, no headers (`#`), no bold (`**`), no inline code. Use plain dashes, indentation, and line breaks only.
### 1. Author Accountability Report
File: `backport-author-accountability.md`
Lists all backported PRs grouped by original author (via `gh pr view $PR --json author`). Surfaces who should be self-labeling.
```
Backport Session YYYY-MM-DD -- PRs that should have been labeled by authors
- author-login
    - #1234 fix: short title
    - #5678 fix: another title
- other-author
    - #9012 fix: some other fix
```
Authors sorted alphabetically, 4-space indent for nested items.
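The list can be generated rather than typed. This sketch assumes one tab-separated `author number title` line per backported PR, which `gh pr view $PR --json author,number,title --jq '[.author.login, .number, .title] | @tsv'` produces. `format_accountability` is a hypothetical helper.

```bash
# Hypothetical helper: read "author<TAB>number<TAB>title" lines on stdin and
# print the Slack-compatible accountability list, with authors sorted
# alphabetically and PRs nested under each author with a 4-space indent.
format_accountability() {
  session_date="$1"
  echo "Backport Session $session_date -- PRs that should have been labeled by authors"
  sort -t "$(printf '\t')" -k1,1 -k2,2n |
    awk -F '\t' '
      $1 != prev { print "- " $1; prev = $1 }
      { print "    - #" $2 " " $3 }
    '
}
```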
### 2. Slack Status Update
File: `slack-status-update.md`
A shareable summary of the session. Structure:
```
Backport session complete -- YYYY-MM-DD
[1-sentence summary: N PRs backported to which branches. All pass typecheck.]
Branches updated:
- core/X.XX: N PRs + N fix PRs (N auto, N manual)
- cloud/X.XX: N PRs + N fix PRs (N auto, N manual)
- ...
N total PRs created and merged (N backports + N fix PRs).
Notable fixes included:
- [category]: [list of fixes]
- ...
Conflict patterns encountered:
- [pattern and how it was resolved]
- ...
N authors had PRs backported. See author accountability list for details.
```
No emojis, no tables, no bold, no headers. Plain text that pastes cleanly into Slack.


@@ -23,10 +23,10 @@ For SHOULD items with conflicts: if conflict resolution requires more than trivi
**Before categorizing, filter by branch scope:**
| Target branch | Skip if PR is... |
| ------------- | ----------------------------------------------------------------------------------------------------------------- |
| `core/*` | Cloud-only (team workspaces, cloud queue, cloud-only login). Note: app mode and Firebase auth are NOT cloud-only. |
| `cloud/*` | Local-only features not present on cloud branch |
Cloud-only PRs backported to `core/*` are wasted effort — `core/*` branches serve local/self-hosted users who never see cloud features. Check PR titles, descriptions, and files changed for cloud-specific indicators.
@@ -61,8 +61,6 @@ done
## Human Review Checkpoint
Use the Interactive Approval Flow (see SKILL.md) to review all candidates interactively. Do not write a static decisions.md for the human to edit — instead, present batches of 5-10 PRs with context and recommendations, and collect Y/N/? responses in conversation.
All candidates must be reviewed (MUST, SHOULD, and borderline items), not just a subset.


@@ -73,14 +73,22 @@ for PR in ${CONFLICT_PRS[@]}; do
git cherry-pick -m 1 $MERGE_SHA
# If conflict — NEVER skip based on file count alone!
# Categorize conflicts first: binary PNGs, modify/delete, content, add/add, component rewrites
# See SKILL.md Conflict Triage table for resolution per type.
# For component rewrites (4+ markers in a .vue file, library migration):
# DO NOT use accept-theirs regex — it produces broken hybrids.
# Instead, use the complete file from the merge commit:
# git show $MERGE_SHA:path/to/file > path/to/file
# For simple content conflicts, accept theirs:
# python3 -c "import re; ..."
# Resolve all conflicts, then:
git add .
GIT_EDITOR=true git cherry-pick --continue
git push origin backport-$PR-to-TARGET --no-verify
NEW_PR=$(gh pr create --base TARGET_BRANCH --head backport-$PR-to-TARGET \
--title "[backport TARGET] TITLE (#$PR)" \
--body "Backport of #$PR..." | grep -oP '\d+$')
@@ -114,7 +122,30 @@ source ~/.nvm/nvm.sh && nvm use 24 && pnpm install && pnpm typecheck && pnpm tes
git worktree remove /tmp/verify-TARGET --force
```
If verification fails, **do not skip** — create a fix PR:
```bash
# Stay in the verify worktree
git checkout -b fix-backport-TARGET origin/TARGET_BRANCH
# Common fixes:
# 1. Component rewrite hybrids: overwrite with merge commit version
git show MERGE_SHA:path/to/Component.vue > path/to/Component.vue
# 2. Missing dependency files
git show MERGE_SHA:path/to/missing.ts > path/to/missing.ts
# 3. Missing type properties: edit the interface
# 4. Unused imports: delete the import lines
git add -A
git commit --no-verify -m "fix: resolve backport typecheck issues on TARGET"
git push origin fix-backport-TARGET --no-verify
FIX_PR=$(gh pr create --base TARGET_BRANCH --head fix-backport-TARGET --title "fix: resolve backport typecheck issues on TARGET" --body "..." | grep -oP '\d+$')
gh pr checks $FIX_PR --watch --fail-fast && gh pr merge $FIX_PR --squash --admin
```
Do not proceed to the next branch until typecheck passes.
## Conflict Resolution Patterns
@@ -142,7 +173,35 @@ git rm $FILE
git checkout --theirs $FILE && git add $FILE
```
### 4. Component Rewrites (DO NOT accept-theirs)
When a PR completely rewrites a component (e.g., PrimeVue → Reka UI), accept-theirs produces
a broken hybrid with mismatched template/script sections.
```bash
# Use the complete correct file from the merge commit instead:
git show $MERGE_SHA:src/components/input/MultiSelect.vue > src/components/input/MultiSelect.vue
git show $MERGE_SHA:src/components/input/SingleSelect.vue > src/components/input/SingleSelect.vue
git add src/components/input/MultiSelect.vue src/components/input/SingleSelect.vue
```
**Detection:** 4+ conflict markers in a single `.vue` file, imports changing between component
libraries (PrimeVue → Reka UI, etc.), template structure completely different on each side.
### 5. Missing Dependencies After Cherry-Pick
Cherry-picks can succeed but leave the branch broken because the PR's code on main
references composables/components introduced by an earlier PR.
```bash
# Add the missing file from the merge commit:
git show $MERGE_SHA:src/composables/queue/useJobDetailsHover.ts > src/composables/queue/useJobDetailsHover.ts
git show $MERGE_SHA:src/components/builder/BuilderSaveDialogContent.vue > src/components/builder/BuilderSaveDialogContent.vue
```
**Detection:** `pnpm typecheck` fails with "Cannot find module" or "X is not defined" after cherry-pick succeeds cleanly.
### 6. Locale Files
Usually adding new i18n keys — accept theirs, validate JSON:
@@ -176,8 +235,14 @@ gh pr checks $PR --watch --fail-fast && gh pr merge $PR --squash --admin
8. **Always validate JSON** after resolving locale file conflicts
9. **Dep refresh PRs** — skip on stable branches. Risk of transitive dep regressions outweighs audit cleanup. Cherry-pick individual CVE fixes instead.
10. **Verify after each wave** — run `pnpm typecheck && pnpm test:unit` on the target branch after merging a batch. Catching breakage early prevents compounding errors.
11. **App mode and Firebase auth are NOT cloud-only** — they go to both core and cloud branches. Only team workspaces, cloud queue, and cloud-specific login are cloud-only.
12. **Never admin-merge without CI**: `--admin` bypasses all branch protections, including required status checks. A bulk session of 69 admin-merges shipped 3 test failures. Always wait for CI to pass first, or use `--auto --squash`, which waits by design.
13. **Accept-theirs regex breaks component rewrites** — when a PR migrates between component libraries (PrimeVue → Reka UI), the regex produces a broken hybrid. Use `git show SHA:path > path` to get the complete correct version instead.
14. **Cherry-picks can silently bring in missing-dependency code** — if PR A references a composable introduced by PR B, cherry-picking A succeeds but typecheck fails. Always run typecheck after each wave and add missing files from the merge commit.
15. **Fix PRs are expected** — plan for 1 fix PR per branch to resolve typecheck issues from conflict resolutions. This is normal, not a failure.
16. **Use `--no-verify` in worktrees** — husky hooks fail in `/tmp/` worktrees. Always push/commit with `--no-verify`.
17. **Automation success varies by branch** — core/1.42 got 18/26 auto-PRs (69%), cloud/1.42 got 1/25 (4%). Cloud branches diverge more. Plan for manual fallback.
18. **Test-then-resolve pattern** — for branches with low automation success, run a dry-run loop to classify clean vs conflict PRs before processing. This is much faster than resolving conflicts serially.
## CI Failure Triage


@@ -2,26 +2,25 @@
## During Execution
Maintain `execution-log.md` with per-branch tables (this is internal, markdown tables are fine here):
```markdown
| PR# | Title | Status | Backport PR | Notes |
| ----- | ----- | ------ | ----------- | ------- |
| #XXXX | Title | merged | #YYYY | Details |
```
## Wave Verification Log
Track verification results per wave within execution-log.md:
```markdown
Wave N Verification -- TARGET_BRANCH
- PRs merged: #A, #B, #C
- Typecheck: pass / fail
- Fix PR: #YYYY (if needed)
- Issues found: (if any)
- Human review needed: (list any non-trivial conflict resolutions)
```
## Session Report Template
@@ -63,40 +62,42 @@ Track verification results per wave:
- Feature branches that need tracking for future sessions?
```
## Final Deliverables
After all branches are complete and verified, generate these files in `~/temp/backport-session/`:
### 1. execution-log.md (internal)
Per-branch tables with PR#, title, status, backport PR#, notes. Markdown tables are fine — this is for internal tracking, not Slack.
### 2. backport-author-accountability.md (Slack-compatible)
See SKILL.md "Final Deliverables" section. Plain text, no emojis/tables/headers/bold. Authors sorted alphabetically with PRs nested under each.
### 3. slack-status-update.md (Slack-compatible)
See SKILL.md "Final Deliverables" section. Plain text summary that pastes cleanly into Slack. Includes branch counts, notable fixes, conflict patterns, author count.
## Slack Formatting Rules
Both shareable files (author accountability + status update) must follow these rules:
- No emojis (no checkmarks, no arrows, no icons)
- No markdown tables (use plain lists with dashes)
- No headers (no # or ##)
- No bold (`**`) or italic (`_`)
- No inline code backticks
- Use -- instead of em dash
- Use plain dashes (-) for lists with 4-space indent for nesting
- Line breaks between sections for readability
These files should paste directly into a Slack message and look clean.
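Before pasting, a quick check can catch most of these rules mechanically. This is a sketch, and `slack_lint` is a hypothetical name. It flags headers, bold, backticks, table rows, and the em dash character. Emoji and italic detection are left out because they are hard to match reliably with awk, so those still need a visual check.

```bash
# Hypothetical checker: print "<file>:<line>: <rule>" for each violation and
# exit non-zero if any were found. The em dash is passed in as raw UTF-8 bytes.
slack_lint() {
  em=$(printf '\342\200\224')
  awk -v em="$em" '
    /^#/          { print FILENAME ":" FNR ": header"; bad = 1 }
    /\*\*/        { print FILENAME ":" FNR ": bold"; bad = 1 }
    /`/           { print FILENAME ":" FNR ": inline code"; bad = 1 }
    /^[ \t]*\|/   { print FILENAME ":" FNR ": table"; bad = 1 }
    index($0, em) { print FILENAME ":" FNR ": em dash (use --)"; bad = 1 }
    END           { exit bad }
  ' "$@"
}
```

Running `slack_lint ~/temp/backport-session/backport-author-accountability.md ~/temp/backport-session/slack-status-update.md` prints nothing and exits 0 when both files are clean.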
## Files to Track
All in `~/temp/backport-session/`.
- `execution-plan.md` -- approved PRs with merge SHAs (input)
- `execution-log.md` -- real-time status with per-branch tables (internal)
- `backport-author-accountability.md` -- PRs grouped by author (Slack-compatible)
- `slack-status-update.md` -- session summary (Slack-compatible)


@@ -0,0 +1,278 @@
---
name: reproduce-issue
description: 'Reproduce a GitHub issue by researching prerequisites, setting up the environment (custom nodes, workflows, settings), and interactively exploring ComfyUI via playwright-cli until the bug is confirmed. Then records a clean demo video.'
---
# Issue Reproduction Skill
Reproduce a reported GitHub issue against a running ComfyUI instance. This skill uses an interactive, agent-driven approach — not a static script. You will research, explore, retry, and adapt until the bug is reproduced, then record a clean demo.
## Architecture
Two videos are produced:
1. **Research video** — the full exploration session: installing deps, trying things, failing, retrying, figuring out the bug. Valuable for debugging context.
2. **Reproduce video** — a clean, minimal recording of just the reproduction steps. This is the demo you'd attach to the issue.
```
Phase 1: Research → Read issue, understand prerequisites
Phase 2: Environment → Install custom nodes, load workflows, configure settings
Phase 3: Explore → [VIDEO 1: research] Interactively try to reproduce (retries OK)
Phase 4: Record → [VIDEO 2: reproduce] Clean recording of just the minimal repro steps
Phase 5: Report → Generate a structured reproduction report
```
## Prerequisites
- ComfyUI server running (ask user for URL, default: `http://127.0.0.1:8188`)
- `playwright-cli` installed: `npm install -g @playwright/cli@latest`
- `gh` CLI (authenticated, for reading issues)
- ComfyUI backend with Python environment (for installing custom nodes)
## Phase 1: Research the Issue
1. Fetch the issue details:
```bash
gh issue view <number> --repo Comfy-Org/ComfyUI_frontend --json title,body,comments
```
2. Extract from the issue body:
- **Reproduction steps** (the exact sequence)
- **Prerequisites**: specific workflows, custom nodes, settings, models
- **Environment**: OS, browser, ComfyUI version
- **Media**: screenshots or videos showing the bug
3. Search the codebase for related code:
- Find the feature/component mentioned in the issue
- Understand how it works currently
- Identify what state the UI needs to be in
## Phase 2: Environment Setup
Set up everything the issue requires BEFORE attempting reproduction.
### Custom Nodes
If the issue mentions custom nodes:
```bash
# Find the custom node repo
# Clone into ComfyUI's custom_nodes directory
cd <comfyui_path>/custom_nodes
git clone <custom_node_repo_url>
# Install dependencies if needed
cd <custom_node_name>
pip install -r requirements.txt 2>/dev/null || true
# Restart ComfyUI server to load the new nodes
```
### Workflows
If the issue references a specific workflow:
```bash
# Download workflow JSON if a URL is provided
curl -L "<workflow_url>" -o /tmp/test-workflow.json
# Save it to the user's workflows directory via the userdata API
# (it then shows up in the Workflows sidebar)
curl -X POST "http://127.0.0.1:8188/api/userdata/workflows%2Ftest-workflow.json" \
--data-binary @/tmp/test-workflow.json
```
Or load via playwright-cli:
```bash
playwright-cli goto "http://127.0.0.1:8188"
# Drag-and-drop or use File > Open to load the workflow
```
### Settings
If the issue requires specific settings:
```bash
# Use playwright-cli to open settings and change them
playwright-cli press "Control+,"
playwright-cli snapshot
# Find and modify the relevant setting
```
## Phase 3: Interactive Exploration — Research Video
Start recording the **research video** (Video 1). This captures the full exploration — mistakes, retries, dead ends — all valuable context.
```bash
# Open browser and start video recording
playwright-cli open "http://127.0.0.1:8188"
playwright-cli video-start
# Take a snapshot to see current state
playwright-cli snapshot
# Interact based on what you see
playwright-cli click <ref>
playwright-cli fill <ref> "text"
playwright-cli press "Control+s"
# Check results
playwright-cli snapshot
playwright-cli screenshot --filename=/tmp/qa/research-step-1.png
```
### Key Principles
- **Observe before acting**: Always `snapshot` before interacting
- **Retry and adapt**: If a step fails, try a different approach
- **Document what works**: Keep notes on which steps trigger the bug
- **Don't give up**: Try multiple approaches if the first doesn't work
- **Establish prerequisites**: Many bugs require specific UI state:
- Save a workflow first (File > Save)
- Make changes to dirty the workflow
- Open multiple tabs
- Add specific node types
- Change settings
- Resize the window
### Common ComfyUI Interactions via playwright-cli
| Action | Command |
| ------------------- | -------------------------------------------------------------- |
| Open hamburger menu | `playwright-cli click` on the C logo button |
| Navigate menu | `playwright-cli hover <ref>` then `playwright-cli click <ref>` |
| Add node | Double-click canvas → type node name → select from results |
| Connect nodes | Drag from output slot to input slot |
| Save workflow | `playwright-cli press "Control+s"` |
| Save As | Menu > File > Save As |
| Select node | Click on the node |
| Delete node | Select → `playwright-cli press "Delete"` |
| Right-click menu | `playwright-cli click <ref> --button right` |
| Keyboard shortcut | `playwright-cli press "Control+z"` |
## Phase 4: Record Clean Demo — Reproduce Video (max 5 minutes)
Once the bug is confirmed, **stop the research video** and **close the research browser**:
```bash
playwright-cli video-stop
playwright-cli close
```
Now start a **fresh browser session** for the clean reproduce video (Video 2).
**IMPORTANT constraints:**
- **Max 5 minutes** — the reproduce video must be short and focused
- **No environment setup** — server, user, and custom nodes are already set up from Phases 2-3. Just log in and go.
- **No exploration** — you already know the exact steps. Execute them quickly and precisely.
- **Start video recording immediately**, execute steps, stop. Don't leave the recording running while thinking.
1. **Open browser and start recording**:
```bash
playwright-cli open "http://127.0.0.1:8188"
playwright-cli video-start
```
2. **Execute only the minimal reproduction steps** — no exploration, no mistakes. Just the clean sequence that demonstrates the bug. You already know exactly what works from Phase 3.
3. **Take key screenshots** at critical moments:
```bash
playwright-cli screenshot --filename=/tmp/qa/before-bug.png
# ... trigger the bug ...
playwright-cli screenshot --filename=/tmp/qa/bug-visible.png
```
4. **Stop recording and close** immediately after the bug is demonstrated:
```bash
playwright-cli video-stop
playwright-cli close
```
## Phase 5: Generate Report
Create a reproduction report at `tmp/qa/reproduce-report.md`:
```markdown
# Issue Reproduction Report
- **Issue**: <issue_url>
- **Title**: <issue_title>
- **Date**: <today>
- **Status**: Reproduced / Not Reproduced / Partially Reproduced
## Environment
- ComfyUI Server: <url>
- OS: <os>
- Custom Nodes Installed: <list or "none">
- Settings Changed: <list or "none">
## Prerequisites
List everything that had to be set up before the bug could be triggered:
1. ...
2. ...
## Reproduction Steps
Minimal steps to reproduce (the clean sequence):
1. ...
2. ...
3. ...
## Expected Behavior
<from the issue>
## Actual Behavior
<what actually happened>
## Evidence
- Research video: `research-video/video.webm` (full exploration session)
- Reproduce video: `reproduce-video/video.webm` (clean minimal repro)
- Screenshots: `before-bug.png`, `bug-visible.png`
## Root Cause Analysis (if identified)
<code pointers, hypothesis about what's going wrong>
## Notes
<any additional observations, workarounds discovered, related issues>
```
## Handling Failures
If the bug **cannot be reproduced**:
1. Document what you tried and why it didn't work
2. Check if the issue was already fixed (search git log for related commits)
3. Check if it's environment-specific (OS, browser, specific version)
4. Set report status to "Not Reproduced" with detailed notes
5. The report is still valuable — it saves others from repeating the same investigation
## CI Integration
In CI, this skill runs as a Claude Code agent with:
- `ANTHROPIC_API_KEY` for Claude
- `GEMINI_API_KEY` for initial issue analysis (optional)
- ComfyUI server pre-started in the container
- `playwright-cli` pre-installed
The CI workflow:
1. Gemini generates a reproduce guide (markdown) from the issue
2. Claude agent receives the guide and runs this skill
3. Claude explores interactively, installs dependencies, retries
4. Claude records a clean demo once reproduced
5. Video and report are uploaded as artifacts


@@ -0,0 +1,283 @@
---
name: comfy-qa
description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
---
# ComfyUI Frontend QA Skill
Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.
## Architecture Overview
The QA pipeline uses a **three-phase approach**:
1. **RESEARCH** — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, no hallucination)
2. **REPRODUCE** — Deterministic replay of the research test with video recording
3. **REPORT** — Deploy results to Cloudflare Pages with badge, video, and verdict
### Key Design Decision
Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. This was abandoned after discovering **AI reviewers hallucinate** — Gemini reported "REPRODUCED" when videos showed idle screens. The current approach uses **Playwright assertions** as the source of truth: if the test passes, the bug is proven.
## Prerequisites
- Node.js 22+
- `pnpm` package manager
- `gh` CLI (authenticated)
- Playwright browsers: `npx playwright install chromium`
- Environment variables:
- `GEMINI_API_KEY` — for PR analysis and video review
- `ANTHROPIC_API_KEY` — for Claude Agent SDK (research phase)
- `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — for report deployment
## Pipeline Scripts
| Script | Role | Model |
| --------------------------------- | ------------------------------------------------------- | ----------------------------- |
| `scripts/qa-analyze-pr.ts` | Deep PR/issue analysis → QA guide | gemini-3.1-pro-preview |
| `scripts/qa-agent.ts` | Research phase: Claude writes E2E tests | claude-sonnet-4-6 (Agent SDK) |
| `scripts/qa-record.ts` | Before/after video recording with Gemini-driven actions | gemini-3.1-pro-preview |
| `scripts/qa-reproduce.ts` | Deterministic replay with narration | gemini-3-flash-preview |
| `scripts/qa-video-review.ts` | Video comparison review | gemini-3-flash-preview |
| `scripts/qa-generate-test.ts` | Regression test generation from QA report | gemini-3-flash-preview |
| `scripts/qa-deploy-pages.sh` | Deploy to Cloudflare Pages + badge | — |
| `scripts/qa-batch.sh` | Batch-trigger QA for multiple issues | — |
| `scripts/qa-report-template.html` | Report site (light/dark, seekbar, copy badge) | — |
## Triggering QA
### Via GitHub Labels
- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after comparison)
- **`qa-full`** — Full QA (3-OS matrix, after-only)
- **`qa-issue`** — Reproduce a bug from an issue
### Via Batch Script
```bash
# Trigger QA for specific issue numbers
./scripts/qa-batch.sh 10394 10238 9996
# From a triage file (top 5 Tier 1 issues)
./scripts/qa-batch.sh --from tmp/issues.md --top 5
# Preview without pushing
./scripts/qa-batch.sh --dry-run 10394
# Clean up old trigger branches
./scripts/qa-batch.sh --cleanup
```
### Via Workflow Dispatch
Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).
## CI Workflow (`.github/workflows/pr-qa.yaml`)
```
resolve-matrix → analyze-pr ──┐
├→ qa-before (main branch, worktree build)
├→ qa-after (PR branch)
└→ report (video review, deploy, comment)
```
Before/after jobs run **in parallel** on separate runners for clean isolation.
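GitHub Actions starts a job once every job in its `needs:` list has finished, so the layout above runs in waves. The sketch below computes those waves from a dependency map. The map is an assumption: the diagram only shows the fan-out from `analyze-pr`, and here `report` is also assumed to wait for both recording jobs, since it reviews their videos.

```typescript
// Dependency map for the jobs in the diagram. The claim that `report` waits
// for both recording jobs is an assumption; the diagram only shows the fan-out.
const prQaNeeds: Record<string, string[]> = {
  'resolve-matrix': [],
  'analyze-pr': ['resolve-matrix'],
  'qa-before': ['analyze-pr'],
  'qa-after': ['analyze-pr'],
  report: ['analyze-pr', 'qa-before', 'qa-after']
}

// Groups jobs into waves. Every job in a wave has all of its `needs`
// satisfied by earlier waves, so jobs within one wave can run concurrently.
function executionWaves(needs: Record<string, string[]>): string[][] {
  const finished = new Set<string>()
  const pending = new Set(Object.keys(needs))
  const waves: string[][] = []
  while (pending.size > 0) {
    const ready = [...pending].filter((job) =>
      needs[job].every((dep) => finished.has(dep))
    )
    if (ready.length === 0) {
      // Either a cycle or a dependency on a job that is not in the map.
      throw new Error(`Unsatisfiable needs: ${[...pending].join(', ')}`)
    }
    for (const job of ready) {
      pending.delete(job)
      finished.add(job)
    }
    waves.push(ready)
  }
  return waves
}
```

For this map, the third wave is `['qa-before', 'qa-after']`. Those are the two jobs the paragraph above describes as running in parallel on separate runners.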
### Issue Reproduce Mode
For issues (not PRs), the pipeline:
1. Fetches the issue body and comments
2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
3. Runs the research phase (Claude writes E2E test to reproduce)
4. Records video of the test execution
5. Posts results as a comment on the issue
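Step 1 can be done with `gh issue view <number> --repo <owner/repo> --json title,body,comments`. The sketch below flattens that JSON into a single context string for the prompts. The `formatIssueContext` name, the output layout, and the 8000-character cap are all hypothetical. Only the JSON fields it reads come from `gh`.

```typescript
// Subset of the JSON printed by
// `gh issue view <n> --repo <owner/repo> --json title,body,comments`.
// `author` is treated as possibly null here, to be safe.
interface GhIssue {
  title: string
  body: string
  comments: Array<{ author: { login: string } | null; body: string }>
}

// Hypothetical formatter: title, body, then each comment tagged with its author.
function formatIssueContext(issue: GhIssue, maxChars = 8000): string {
  const parts = [`# ${issue.title}`, issue.body.trim() || '(no description)']
  for (const comment of issue.comments) {
    const login = comment.author?.login ?? 'ghost'
    parts.push(`--- @${login}:\n${comment.body.trim()}`)
  }
  const text = parts.join('\n\n')
  // Cap the size so a long thread does not crowd out the rest of the prompt.
  return text.length > maxChars
    ? `${text.slice(0, maxChars)}\n...(truncated)`
    : text
}
```

A caller would parse the output of `execFileSync('gh', [...], { encoding: 'utf-8' })` with `JSON.parse` and pass the result in.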
## Running Locally
### Step 1: Environment Setup
```bash
# Ensure ComfyUI server is running
# Default: http://127.0.0.1:8188
# Install Playwright browsers
npx playwright install chromium
```
### Step 2: Analyze the Issue/PR
```bash
# For a PR
pnpm exec tsx scripts/qa-analyze-pr.ts \
--pr-number 10394 \
--repo Comfy-Org/ComfyUI_frontend \
--output-dir qa-guides
# For an issue
pnpm exec tsx scripts/qa-analyze-pr.ts \
--pr-number 10394 \
--repo Comfy-Org/ComfyUI_frontend \
--output-dir qa-guides \
--type issue
```
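Issues are also passed through `--pr-number`; the only difference is the extra `--type issue`. When a wrapper launches the analyzer, building the argument vector explicitly avoids shell-quoting problems. Below is a hypothetical helper that covers the flags shown above plus the script's optional `--model`:

```typescript
// Hypothetical argv builder for `pnpm exec tsx scripts/qa-analyze-pr.ts`.
interface AnalyzeArgs {
  number: number // PR or issue number; both go through --pr-number
  repo: string // owner/repo
  outputDir: string
  type?: 'pr' | 'issue'
  model?: string
}

function buildAnalyzeArgv(a: AnalyzeArgs): string[] {
  const argv = [
    'exec', 'tsx', 'scripts/qa-analyze-pr.ts',
    '--pr-number', String(a.number),
    '--repo', a.repo,
    '--output-dir', a.outputDir
  ]
  // Only issues need an explicit type in the examples above.
  if (a.type === 'issue') argv.push('--type', 'issue')
  if (a.model) argv.push('--model', a.model)
  return argv
}
```

With `node:child_process`, `execFileSync('pnpm', buildAnalyzeArgv({ ... }), { stdio: 'inherit' })` runs the command without going through a shell.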
### Step 3: Record Before/After
```bash
# Before (main branch)
pnpm exec tsx scripts/qa-record.ts \
--mode before \
--diff /tmp/pr-diff.txt \
--output-dir /tmp/qa-before \
--qa-guide qa-guides/qa-guide-1.json
# After (PR branch)
pnpm exec tsx scripts/qa-record.ts \
--mode after \
--diff /tmp/pr-diff.txt \
--output-dir /tmp/qa-after \
--qa-guide qa-guides/qa-guide-1.json
```
### Step 4: Review Videos
```bash
pnpm exec tsx scripts/qa-video-review.ts \
--artifacts-dir /tmp/qa-artifacts \
--video-file qa-session.mp4 \
--before-video qa-before-session.mp4 \
--output-dir /tmp/video-reviews \
--pr-context /tmp/pr-context.txt
```
## Research Phase Details (`qa-agent.ts`)
Claude receives:
- The issue description and comments
- A QA guide from `qa-analyze-pr.ts`
- An accessibility tree snapshot of the current UI
Claude's tools:
- **`inspect(selector?)`** — Read a11y tree to discover element selectors
- **`writeTest(code)`** — Write a Playwright `.spec.ts` file
- **`runTest()`** — Execute the test and get pass/fail + errors
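- **`done(verdict, reproducedBy, summary, evidence, testCode, videoScript?)`** — Finish with verdict; `videoScript` (a narrated demowright demo) is required when the verdict is REPRODUCED
Two read-only tools, `readFixture(path)` and `readTest(path)`, let Claude read helper source under `browser_tests/fixtures/` and existing specs under `browser_tests/tests/` before writing its own test. The test uses the project's Playwright fixtures (`comfyPageFixture`), giving access to `comfyPage.page`, `comfyPage.menu`, `comfyPage.settings`, etc.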
### Verdict Logic
- **REPRODUCED** — Test passes (asserting the bug exists) → bug is proven
- **NOT_REPRODUCIBLE** — Claude exhausted attempts, test cannot pass
- **INCONCLUSIVE** — Agent timed out or encountered infrastructure issues
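Auto-completion: if a test passed but `done()` was never called, the pipeline auto-completes with REPRODUCED, provided that the passing test contains a bug-specific assertion. Passing tests are ignored if their assertions are trivially true (`toBeDefined()`, `toBeGreaterThan(0)`) or if their titles mark them as inspect/find/debug/discovery runs.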
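The verdict rules above, including the auto-completion filter, can be restated as pure functions. This is a simplified paraphrase of the logic in `qa-agent.ts`, not an export of it. `hasBugSpecificAssertion` reuses the script's regular expressions, while `resolveVerdict` and its `AgentState` input are hypothetical names introduced for illustration.

```typescript
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

// Same heuristics as qa-agent.ts: require an expect(), reject trivially true
// assertions, and reject tests whose titles look like discovery/debug runs.
function hasBugSpecificAssertion(code: string): boolean {
  const title = code.match(/test\(['"`]([^'"`]+)/)?.[1] ?? ''
  return (
    /expect\s*\(/.test(code) &&
    !/^\s*expect\([^)]+\)\.toBeDefined\(\)/m.test(code) &&
    !/^\s*expect\([^)]+\)\.toBeGreaterThan\(0\)/m.test(code) &&
    !/Inspect|Find|Debug|discover/i.test(title)
  )
}

// Hypothetical snapshot of the agent's state when the run ends.
interface AgentState {
  doneVerdict?: Verdict // set only if the agent called done()
  doneTestCode?: string
  passingTests: string[] // code of every passing runTest(), in order
}

function resolveVerdict(s: AgentState): { verdict: Verdict; testCode: string } {
  // An explicit done() call always wins.
  if (s.doneVerdict) {
    return { verdict: s.doneVerdict, testCode: s.doneTestCode ?? '' }
  }
  // Otherwise, auto-complete from the latest passing, bug-specific test.
  const qualifying = s.passingTests.filter((code) => hasBugSpecificAssertion(code))
  const latest = qualifying[qualifying.length - 1]
  return latest !== undefined
    ? { verdict: 'REPRODUCED', testCode: latest }
    : { verdict: 'INCONCLUSIVE', testCode: '' }
}
```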
## Manual QA (Fallback)
When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use **playwright-cli** for manual browser interaction:
```bash
# Install
npm install -g @playwright/cli@latest
# Open browser and navigate
playwright-cli open http://127.0.0.1:8188
# Get element references
playwright-cli snapshot
# Interact
playwright-cli click e1
playwright-cli fill e2 "test text"
playwright-cli press Escape
playwright-cli screenshot --filename=f.png
```
Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation to refresh refs.
## Manual QA Test Plan
Whether QA runs manually through playwright-cli or through the automated pipeline, systematically cover each area below.
### Application Load & Routes
| Test | Steps |
| ----------------- | ------------------------------------------------------------ |
| Root route loads | Navigate to `/` — GraphView should render with canvas |
| User select route | Navigate to `/user-select` — user selection UI should appear |
| 404 handling | Navigate to `/nonexistent` — should handle gracefully |
### Canvas & Graph View
| Test | Steps |
| ------------------------- | -------------------------------------------------------------- |
| Canvas renders | The LiteGraph canvas is visible and interactive |
| Pan canvas | Click and drag on empty canvas area |
| Zoom in/out | Use scroll wheel or Alt+=/Alt+- |
| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
| Delete node | Select a node, press Delete key |
| Connect nodes | Drag from output slot to input slot |
| Copy/Paste | Select nodes, Ctrl+C then Ctrl+V |
| Undo/Redo | Make changes, Ctrl+Z to undo, Ctrl+Y to redo |
| Context menus | Right-click node vs empty canvas — different menus |
### Sidebar Tabs
| Test | Steps |
| ----------------- | ------------------------------------- |
| Workflows tab | Press W — workflows sidebar opens |
| Node Library tab | Press N — node library opens |
| Model Library tab | Press M — model library opens |
| Tab toggle | Press same key again — sidebar closes |
| Search in sidebar | Type in search box — results filter |
### Settings Dialog
| Test | Steps |
| ---------------- | ---------------------------------------------------- |
| Open settings | Press Ctrl+, or click settings button |
| Change a setting | Toggle a boolean setting — it persists after closing |
| Search settings | Type in settings search box — results filter |
| Close settings | Press Escape or click close button |
### Execution & Queue
| Test | Steps |
| -------------- | ----------------------------------------------------- |
| Queue prompt | Load default workflow, click Queue — execution starts |
| Queue progress | Progress indicator shows during execution |
| Interrupt | Press Ctrl+Alt+Enter during execution — interrupts |
## Report Site
Deployed to Cloudflare Pages at `https://comfy-qa.pages.dev/<branch>/`.
Features:
- Light/dark theme
- Seekable video player with preload
- Copy badge button (markdown)
- Date-stamped badges (e.g., `QA0327`)
- Vertical box badge for issues and PRs
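The date stamp and report link are simple to derive. This sketch rests on several assumptions. The label is taken to be `QA` plus a zero-padded month and day, inferred from the `QA0327` example and computed in UTC here. The branch name is assumed to go into the path verbatim. The copied markdown is assumed to be a plain text link, although the real badge button may emit a different format. All helper names are hypothetical.

```typescript
// Assumed label format: "QA" + MM + DD, e.g. 27 March -> "QA0327".
function qaBadgeLabel(date: Date): string {
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0')
  const dd = String(date.getUTCDate()).padStart(2, '0')
  return `QA${mm}${dd}`
}

// Report location as documented: https://comfy-qa.pages.dev/<branch>/
function reportUrl(branch: string): string {
  return `https://comfy-qa.pages.dev/${branch}/`
}

// Hypothetical markdown for the copy button: a text link labelled with the date stamp.
function badgeMarkdown(branch: string, date: Date): string {
  return `[${qaBadgeLabel(date)}](${reportUrl(branch)})`
}
```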
## Known Issues & Troubleshooting
See `docs/qa/TROUBLESHOOTING.md` for common failures:
- `set -euo pipefail` + grep with no match → append `|| true`
- `__name is not defined` in `page.evaluate` → use `addScriptTag`
- Cursor not visible in videos → monkey-patch `page.mouse` methods
- Agent not calling `done()` → auto-complete from passing test
## Backlog
See `docs/qa/backlog.md` for planned improvements:
- **Type B comparison**: Different commits for regression detection
- **Type C comparison**: Cross-browser testing
- **Pre-seed assets**: Upload test images before recording
- **Lazy a11y tree**: Reduce token usage with `inspect(selector)` vs full dump
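Selector filtering already exists in `qa-agent.ts`. Its `inspect` tool keeps only the aria-snapshot lines that contain the selector (case-insensitively, at most 15) and falls back to a truncated full tree when nothing matches. The part that remains unfiltered is the initial 3000-character snapshot embedded in the system prompt. Below is a standalone copy of that filter, with the limits lifted into parameters. The function and parameter names are ours, not the script's.

```typescript
// Hypothetical standalone version of the `inspect` tool's filtering step.
// `ariaText` is the YAML-like string returned by Playwright's locator.ariaSnapshot().
function filterAriaSnapshot(
  ariaText: string,
  selector?: string,
  maxMatches = 15,
  fullTreeChars = 3000,
  fallbackChars = 2000
): string {
  // No selector: return the truncated full tree, as inspect() does.
  if (!selector) return ariaText.slice(0, fullTreeChars)
  const needle = selector.toLowerCase()
  const matches = ariaText
    .split('\n')
    .filter((line) => line.toLowerCase().includes(needle))
  return matches.length > 0
    ? `Found "${selector}":\n${matches.slice(0, maxMatches).join('\n')}`
    : `"${selector}" not found. Full tree:\n${ariaText.slice(0, fallbackChars)}`
}
```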

@@ -0,0 +1,673 @@
#!/usr/bin/env tsx
/**
* QA Research Phase — Claude writes & debugs E2E tests to reproduce bugs
*
* Instead of driving a browser interactively, Claude:
* 1. Reads the issue + a11y snapshot of the UI
* 2. Writes a Playwright E2E test (.spec.ts) that reproduces the bug
* 3. Runs the test → reads errors → rewrites → repeats until it works
* 4. Outputs the passing test + verdict
*
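* Tools:
* - inspect(selector) — read a11y tree to understand UI state
* - readFixture(path) and readTest(path) read fixture helpers and existing specs
* - writeTest(code) — write a Playwright test file
* - runTest() — execute the test and get results
* - done(verdict, reproducedBy, summary, evidence, testCode, videoScript?) — finish with the working test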
*/
import type { Page } from '@playwright/test'
/* eslint-disable import-x/no-unresolved */
// @ts-expect-error — claude-agent-sdk has no type declarations for vue-tsc
import { query, tool, createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk'
/* eslint-enable import-x/no-unresolved */
import { z } from 'zod'
import { mkdirSync, readFileSync, writeFileSync } from 'fs'
import { execSync } from 'child_process'
// ── Types ──
interface ResearchOptions {
page: Page
issueContext: string
qaGuide: string
outputDir: string
serverUrl: string
anthropicApiKey?: string
maxTurns?: number
timeBudgetMs?: number
}
export type ReproMethod = 'e2e_test' | 'video' | 'both' | 'none'
export interface ResearchResult {
verdict: 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'
reproducedBy: ReproMethod
summary: string
evidence: string
testCode: string
videoScript?: string
log: Array<{
turn: number
timestampMs: number
toolName: string
toolInput: unknown
toolResult: string
}>
}
// ── Main research function ──
export async function runResearchPhase(
opts: ResearchOptions
): Promise<ResearchResult> {
const { page, issueContext, qaGuide, outputDir, serverUrl, anthropicApiKey } =
opts
const maxTurns = opts.maxTurns ?? 50
let agentDone = false
let finalVerdict: ResearchResult['verdict'] = 'INCONCLUSIVE'
let finalReproducedBy: ReproMethod = 'none'
let finalSummary = 'Agent did not complete'
let finalEvidence = ''
let finalTestCode = ''
let finalVideoScript = ''
let turnCount = 0
let lastPassedTurn = -1
const startTime = Date.now()
const researchLog: ResearchResult['log'] = []
const testDir = `${outputDir}/research`
mkdirSync(testDir, { recursive: true })
const testPath = `${testDir}/reproduce.spec.ts`
// Get initial a11y snapshot for context
let initialA11y = ''
try {
initialA11y = await page.locator('body').ariaSnapshot({ timeout: 5000 })
initialA11y = initialA11y.slice(0, 3000)
} catch {
initialA11y = '(could not capture initial a11y snapshot)'
}
// ── Tool: inspect ──
const inspectTool = tool(
'inspect',
'Read the current accessibility tree to understand UI state. Use this to discover element names, roles, and selectors for your test.',
{
selector: z
.string()
.optional()
.describe(
'Optional filter — only show elements matching this name/role. Omit for full tree.'
)
},
async (args: { selector?: string }) => {
let resultText: string
try {
const ariaText = await page
.locator('body')
.ariaSnapshot({ timeout: 5000 })
if (args.selector) {
const lines = ariaText.split('\n')
const matches = lines.filter((l: string) =>
l.toLowerCase().includes(args.selector!.toLowerCase())
)
resultText =
matches.length > 0
? `Found "${args.selector}":\n${matches.slice(0, 15).join('\n')}`
: `"${args.selector}" not found. Full tree:\n${ariaText.slice(0, 2000)}`
} else {
resultText = ariaText.slice(0, 3000)
}
} catch (e) {
resultText = `inspect failed: ${e instanceof Error ? e.message : e}`
}
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'inspect',
toolInput: args,
toolResult: resultText.slice(0, 500)
})
return { content: [{ type: 'text' as const, text: resultText }] }
}
)
// ── Tool: readFixture ──
const readFixtureTool = tool(
'readFixture',
'Read a fixture or helper file from browser_tests/fixtures/ to understand the API. Use this to discover available methods on comfyPage helpers before writing your test.',
{
path: z
.string()
.describe(
'Relative path within browser_tests/fixtures/, e.g. "helpers/CanvasHelper.ts" or "components/Topbar.ts" or "ComfyPage.ts"'
)
},
async (args: { path: string }) => {
let resultText: string
try {
const fullPath = `${projectRoot}/browser_tests/fixtures/${args.path}`
const content = readFileSync(fullPath, 'utf-8')
resultText = content.slice(0, 4000)
if (content.length > 4000) {
resultText += `\n\n... (truncated, ${content.length} total chars)`
}
} catch (e) {
resultText = `Could not read fixture: ${e instanceof Error ? e.message : e}`
}
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'readFixture',
toolInput: args,
toolResult: resultText.slice(0, 500)
})
return { content: [{ type: 'text' as const, text: resultText }] }
}
)
// ── Tool: readTest ──
const readTestTool = tool(
'readTest',
'Read an existing E2E test file from browser_tests/tests/ to learn patterns and conventions used in this project.',
{
path: z
.string()
.describe(
'Relative path within browser_tests/tests/, e.g. "workflow.spec.ts" or "subgraph.spec.ts"'
)
},
async (args: { path: string }) => {
let resultText: string
try {
const fullPath = `${projectRoot}/browser_tests/tests/${args.path}`
const content = readFileSync(fullPath, 'utf-8')
resultText = content.slice(0, 4000)
if (content.length > 4000) {
resultText += `\n\n... (truncated, ${content.length} total chars)`
}
} catch (e) {
// List available test files if the path doesn't exist
try {
const { readdirSync } = await import('fs')
const files = readdirSync(`${projectRoot}/browser_tests/tests/`)
.filter((f: string) => f.endsWith('.spec.ts'))
.slice(0, 30)
resultText = `File not found: ${args.path}\n\nAvailable test files:\n${files.join('\n')}`
} catch {
resultText = `Could not read test: ${e instanceof Error ? e.message : e}`
}
}
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'readTest',
toolInput: args,
toolResult: resultText.slice(0, 500)
})
return { content: [{ type: 'text' as const, text: resultText }] }
}
)
// ── Tool: writeTest ──
const writeTestTool = tool(
'writeTest',
'Write a Playwright E2E test file that reproduces the bug. The test should assert the broken behavior exists.',
{
code: z
.string()
.describe('Complete Playwright test file content (.spec.ts)')
},
async (args: { code: string }) => {
writeFileSync(testPath, args.code)
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'writeTest',
toolInput: { path: testPath, codeLength: args.code.length },
toolResult: `Test written to ${testPath} (${args.code.length} chars)`
})
return {
content: [
{
type: 'text' as const,
text: `Test written to ${testPath}. Use runTest() to execute it.`
}
]
}
}
)
// ── Tool: runTest ──
// Place test in browser_tests/ so Playwright config finds fixtures
const projectRoot = process.cwd()
const browserTestPath = `${projectRoot}/browser_tests/tests/qa-reproduce.spec.ts`
const runTestTool = tool(
'runTest',
'Run the Playwright test and get results. Returns stdout/stderr including assertion errors.',
{},
async () => {
turnCount++
// Copy the test to browser_tests/tests/ where Playwright expects it
const { copyFileSync } = await import('fs')
try {
copyFileSync(testPath, browserTestPath)
} catch {
// directory may not exist
mkdirSync(`${projectRoot}/browser_tests/tests`, { recursive: true })
copyFileSync(testPath, browserTestPath)
}
let resultText: string
try {
const output = execSync(
`cd "${projectRoot}" && npx playwright test browser_tests/tests/qa-reproduce.spec.ts --reporter=list --timeout=30000 --retries=0 --workers=1 2>&1`,
{
timeout: 90000,
encoding: 'utf-8',
env: {
...process.env,
COMFYUI_BASE_URL: serverUrl
}
}
)
resultText = `TEST PASSED:\n${output.slice(-1500)}`
} catch (e) {
const err = e as { stdout?: string; stderr?: string; message?: string }
const output = (err.stdout || '') + '\n' + (err.stderr || '')
resultText = `TEST FAILED:\n${output.slice(-2000)}`
}
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'runTest',
toolInput: { testPath },
toolResult: resultText.slice(0, 1000)
})
// Auto-save passing test code for fallback completion — but only if
// the test contains a bug-specific assertion (not just a discovery/debug test)
if (resultText.startsWith('TEST PASSED')) {
try {
const code = readFileSync(browserTestPath, 'utf-8')
const hasBugAssertion =
/expect\s*\(/.test(code) &&
!/^\s*expect\([^)]+\)\.toBeDefined\(\)/m.test(code) &&
!/^\s*expect\([^)]+\)\.toBeGreaterThan\(0\)/m.test(code) &&
!/Inspect|Find|Debug|discover/i.test(
code.match(/test\(['"`]([^'"`]+)/)?.[1] ?? ''
)
if (hasBugAssertion) {
finalTestCode = code
lastPassedTurn = turnCount
}
} catch {
// ignore
}
resultText +=
'\n\n⚠ Test PASSED — call done() now with verdict REPRODUCED and the test code. Do NOT write more tests.'
}
return { content: [{ type: 'text' as const, text: resultText }] }
}
)
// ── Tool: done ──
const doneTool = tool(
'done',
'Finish research with verdict and the final test code.',
{
verdict: z.enum(['REPRODUCED', 'NOT_REPRODUCIBLE', 'INCONCLUSIVE']),
reproducedBy: z
.enum(['e2e_test', 'video', 'both', 'none'])
.describe(
'How the bug was proven: e2e_test = Playwright assertion passed, video = visual evidence only, both = both methods, none = not reproduced'
),
summary: z.string().describe('What you found and why'),
evidence: z.string().describe('Test output that proves the verdict'),
testCode: z
.string()
.describe(
'Final Playwright test code. If REPRODUCED, this test asserts the bug exists and passes.'
),
videoScript: z
.string()
.optional()
.describe(
'Demowright video script for Phase 2 demo recording. REQUIRED when verdict is REPRODUCED. A separate test file using createVideoScript with title, segments, and outro. Do NOT include demowright imports in testCode.'
)
},
async (args: {
verdict: ResearchResult['verdict']
reproducedBy: ReproMethod
summary: string
evidence: string
testCode: string
videoScript?: string
}) => {
agentDone = true
finalVerdict = args.verdict
finalReproducedBy = args.reproducedBy
finalSummary = args.summary
finalEvidence = args.evidence
finalTestCode = args.testCode
finalVideoScript = args.videoScript ?? ''
writeFileSync(testPath, args.testCode)
if (args.videoScript) {
writeFileSync(`${outputDir}/video-script.spec.ts`, args.videoScript)
}
return {
content: [
{ type: 'text' as const, text: `Research complete: ${args.verdict}` }
]
}
}
)
// ── MCP Server ──
const server = createSdkMcpServer({
name: 'qa-research',
version: '1.0.0',
tools: [
inspectTool,
readFixtureTool,
readTestTool,
writeTestTool,
runTestTool,
doneTool
]
})
// ── System prompt ──
const systemPrompt = `You are a senior QA engineer who writes Playwright E2E tests to reproduce reported bugs.
## Your tools
- inspect(selector?) — Read the accessibility tree to understand the current UI. Use to discover selectors, element names, and UI state.
- readFixture(path) — Read fixture source code from browser_tests/fixtures/. Use to discover available methods. E.g. "helpers/CanvasHelper.ts", "components/Topbar.ts", "ComfyPage.ts"
- readTest(path) — Read an existing test from browser_tests/tests/ to learn patterns. E.g. "workflow.spec.ts". Pass any name to list available files.
- writeTest(code) — Write a Playwright test file (.spec.ts)
- runTest() — Execute the test and get results (pass/fail + errors)
- done(verdict, reproducedBy, summary, evidence, testCode, videoScript) — Finish with the final test (videoScript is REQUIRED when the verdict is REPRODUCED)
## Workflow
1. Read the issue description carefully
2. Use inspect() to understand the current UI state and discover element selectors
3. If unsure about the fixture API, use readFixture() to read the relevant helper source code
4. If unsure about test patterns, use readTest() to read an existing test for reference
5. Write a Playwright test that:
   - Performs the exact reproduction steps from the issue
   - Asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
6. Run the test with runTest()
7. If it fails: read the error, fix the test, run again (max 5 attempts)
8. Call done() with the final verdict and test code
## Test writing guidelines
- Import the project fixture: \`import { comfyPageFixture as test } from '../fixtures/ComfyPage'\`
- Import expect: \`import { expect } from '@playwright/test'\`
- The fixture provides \`comfyPage\` which has all the helpers listed below
- If the bug IS present, the test should PASS. If the bug is fixed, the test would FAIL.
- Keep tests focused and minimal — test ONLY the reported bug
- Write ONE test, not multiple. Focus on the single clearest reproduction.
- The test file will be placed in browser_tests/tests/qa-reproduce.spec.ts
- Use \`comfyPage.nextFrame()\` after interactions that trigger UI updates
- NEVER use \`page.waitForTimeout()\` in testCode — use Locator actions and retrying assertions instead (the videoScript below is the one place where fixed pauses are expected)
- ALWAYS call done() when finished, even if the test passed — do not keep iterating after a passing test
- Use \`expect.poll()\` for async assertions: \`await expect.poll(() => comfyPage.nodeOps.getGraphNodesCount()).toBe(8)\`
- CRITICAL: Your assertions must be SPECIFIC TO THE BUG. A test that asserts \`expect(count).toBeGreaterThan(0)\` proves nothing — it would pass even without the bug. Instead assert the exact broken state, e.g. \`expect(clonedWidgets).toHaveLength(0)\` (missing widgets) or \`expect(zIndex).toBeLessThan(parentZIndex)\` (wrong z-order). If a test passes trivially, it's a false positive.
- NEVER write "debug", "discovery", or "inspect node types" tests. These waste turns and produce false REPRODUCED verdicts. If you need to discover node type names, use inspect() or readFixture() — not a passing test.
- If you cannot write a bug-specific assertion, call done() with verdict NOT_REPRODUCIBLE and explain why.
## ComfyPage Fixture API Reference
### Core properties
- \`comfyPage.page\` — raw Playwright Page
- \`comfyPage.canvas\` — Locator for #graph-canvas
- \`comfyPage.queueButton\` — "Queue Prompt" button
- \`comfyPage.runButton\` — "Run" button (new UI)
- \`comfyPage.confirmDialog\` — ConfirmDialog (has .confirm, .delete, .overwrite, .reject locators + .click(name) method)
- \`comfyPage.nextFrame()\` — wait for next requestAnimationFrame
- \`comfyPage.setup()\` — navigate + wait for app ready (called automatically by fixture)
### Menu (comfyPage.menu)
- \`comfyPage.menu.topbar\` — Topbar helper:
  - \`.triggerTopbarCommand(['File', 'Save As'])\` — navigate menu hierarchy
  - \`.openTopbarMenu()\` / \`.closeTopbarMenu()\` — open/close hamburger
  - \`.openSubmenu('File')\` — hover to open submenu, returns submenu Locator
  - \`.getTabNames()\` — get all open workflow tab names
  - \`.getActiveTabName()\` — get active tab name
  - \`.getWorkflowTab(name)\` — get tab Locator
  - \`.closeWorkflowTab(name)\` — close a tab
  - \`.saveWorkflow(name)\` / \`.saveWorkflowAs(name)\` / \`.exportWorkflow(name)\`
  - \`.switchTheme('dark' | 'light')\`
- \`comfyPage.menu.workflowsTab\` — WorkflowsSidebarTab:
  - \`.open()\` / \`.close()\` — toggle workflows sidebar
  - \`.getTopLevelSavedWorkflowNames()\` — list saved workflow names
- \`comfyPage.menu.nodeLibraryTab\` — NodeLibrarySidebarTab
- \`comfyPage.menu.assetsTab\` — AssetsSidebarTab
### Canvas (comfyPage.canvasOps)
- \`.click({x, y})\` — click at position on canvas
- \`.rightClick(x, y)\` — right-click (opens context menu)
- \`.doubleClick()\` — double-click canvas (opens node search)
- \`.clickEmptySpace()\` — click known empty area
- \`.dragAndDrop(source, target)\` — drag from source to target position
- \`.pan(offset, safeSpot?)\` — pan canvas by offset
- \`.zoom(deltaY, steps?)\` — zoom via scroll wheel
- \`.resetView()\` — reset zoom/pan to default
- \`.getScale()\` / \`.setScale(n)\` — get/set canvas zoom
- \`.getNodeCenterByTitle(title)\` — get screen coords of node center
- \`.disconnectEdge()\` / \`.connectEdge()\` — default graph edge operations
### Node Operations (comfyPage.nodeOps)
- \`.getGraphNodesCount()\` — count all nodes
- \`.getSelectedGraphNodesCount()\` — count selected nodes
- \`.getNodes()\` — get all nodes
- \`.getFirstNodeRef()\` — get NodeReference for first node
- \`.getNodeRefById(id)\` — get NodeReference by ID
- \`.getNodeRefsByType(type)\` — get all nodes of a type
- \`.waitForGraphNodes(count)\` — wait until node count matches
### Settings (comfyPage.settings)
- \`.setSetting(id, value)\` — change a ComfyUI setting
- \`.getSetting(id)\` — read current setting value
### Keyboard (comfyPage.keyboard)
- \`.undo()\` / \`.redo()\` — Ctrl+Z / Ctrl+Y
- \`.bypass()\` — Ctrl+B
- \`.selectAll()\` — Ctrl+A
- \`.ctrlSend(key)\` — send Ctrl+key
### Workflow (comfyPage.workflow)
- \`.loadWorkflow(name)\` — load from browser_tests/assets/{name}.json
- \`.setupWorkflowsDirectory(structure)\` — setup test directory
- \`.deleteWorkflow(name)\`
- \`.isCurrentWorkflowModified()\` — check dirty state
### Context Menu (comfyPage.contextMenu)
- \`.openFor(locator)\` — right-click locator and wait for menu
- \`.clickMenuItem(name)\` — click a menu item by name
- \`.isVisible()\` — check if context menu is showing
- \`.assertHasItems(items)\` — assert menu contains items
### Other helpers
- \`comfyPage.settingDialog\` — SettingDialog component
- \`comfyPage.searchBox\` / \`comfyPage.searchBoxV2\` — node search
- \`comfyPage.toast\` — ToastHelper (\`.visibleToasts\`)
- \`comfyPage.subgraph\` — SubgraphHelper
- \`comfyPage.vueNodes\` — VueNodeHelpers
- \`comfyPage.bottomPanel\` — BottomPanel
- \`comfyPage.clipboard\` — ClipboardHelper
- \`comfyPage.dragDrop\` — DragDropHelper
### Available fixture files (use readFixture to explore)
- ComfyPage.ts — main fixture with all helpers
- helpers/CanvasHelper.ts, NodeOperationsHelper.ts, WorkflowHelper.ts
- helpers/KeyboardHelper.ts, SettingsHelper.ts, SubgraphHelper.ts
- components/Topbar.ts, ContextMenu.ts, SettingDialog.ts, SidebarTab.ts
## Video Script (IMPORTANT — provide via done() tool)
When calling done(), provide a \`videoScript\` parameter with a SEPARATE test file that uses demowright's createVideoScript.
Do NOT put demowright imports in testCode — they won't resolve in Phase 1.
The videoScript is a complete, standalone Playwright test file for Phase 2 demo recording:
\`\`\`typescript
import { comfyPageFixture as test } from '../fixtures/ComfyPage'
import { createVideoScript, showTitleCard, hideTitleCard } from 'demowright/video-script'
test('Demo: Bug Title', async ({ comfyPage }) => {
// Show title card IMMEDIATELY — covers the screen while setup runs behind it
await showTitleCard(comfyPage.page, 'Bug Title Here', { subtitle: 'Issue #NNNN' })
// Setup runs while title card is visible
await comfyPage.settings.setSetting('Comfy.UseNewMenu', 'Top')
await comfyPage.workflow.setupWorkflowsDirectory({})
// Remove early title card before script starts (script will show its own)
await hideTitleCard(comfyPage.page)
const script = createVideoScript()
.title('Bug Title Here', { subtitle: 'Issue #NNNN', durationMs: 4000 })
.segment('Step 1: description of what we do', async (pace) => {
await pace() // narration finishes FIRST
await comfyPage.menu.topbar.saveWorkflow('name') // THEN action
await comfyPage.page.waitForTimeout(2000) // pause for viewer
})
.segment('Bug evidence: what we see proves the bug', async (pace) => {
await pace()
await comfyPage.page.waitForTimeout(5000) // hold on evidence
})
.outro({ text: 'Bug Reproduced', subtitle: 'Summary' })
await script.render(comfyPage.page)
})
\`\`\`
Key API:
- \`.title(text, {subtitle?, durationMs?})\` — title card (4s default)
- \`.segment(narrationText, async (pace) => { await pace(); ...actions... })\` — TTS narrated step
- \`.outro({text?, subtitle?, durationMs?})\` — ending card
- \`pace()\` — wait for narration audio to finish
CRITICAL TIMING: Call \`await pace()\` FIRST in each segment, BEFORE the Playwright actions.
This makes the narration play and finish, THEN the actions execute — so viewers hear what's about
to happen before it happens. Pattern:
\`\`\`typescript
.segment('Now we save the workflow as a new name', async (pace) => {
await pace() // narration finishes first
await comfyPage.menu.topbar.saveWorkflowAs('new-name') // then action happens
await comfyPage.page.waitForTimeout(2000) // pause so viewer sees the result
})
\`\`\`
IMPORTANT RULES for videoScript:
1. You MUST provide videoScript when verdict is REPRODUCED — every reproduced bug needs a narrated demo
2. Call showTitleCard() BEFORE setup, run setup behind it, call hideTitleCard() before createVideoScript() — see example
3. Call \`await pace()\` FIRST in each segment callback, BEFORE actions
4. Add \`waitForTimeout(2000)\` after each action so viewers can see the result
5. Final evidence segment: hold for 5+ seconds
6. Reproduce the same steps as testCode but slower with clear narration
## Current UI state (accessibility tree)
${initialA11y}
${qaGuide ? `## QA Analysis Guide\n${qaGuide}\n` : ''}
## Issue to Reproduce
${issueContext}`
// ── Run the agent ──
console.warn('Starting research phase (Claude writes E2E tests)...')
try {
for await (const message of query({
prompt:
'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, readFixture() or readTest() if you need to understand the fixture API or see existing test patterns, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
options: {
model: 'claude-sonnet-4-6',
systemPrompt,
...(anthropicApiKey ? { apiKey: anthropicApiKey } : {}),
maxTurns,
mcpServers: { 'qa-research': server },
allowedTools: [
'mcp__qa-research__inspect',
'mcp__qa-research__readFixture',
'mcp__qa-research__readTest',
'mcp__qa-research__writeTest',
'mcp__qa-research__runTest',
'mcp__qa-research__done'
]
}
})) {
if (message.type === 'assistant' && message.message?.content) {
for (const block of message.message.content) {
if ('text' in block && block.text) {
console.warn(` Claude: ${block.text.slice(0, 200)}`)
}
if ('name' in block) {
console.warn(
` Tool: ${block.name}(${JSON.stringify(block.input).slice(0, 100)})`
)
}
}
}
if (agentDone) break
}
} catch (e) {
const errMsg = e instanceof Error ? e.message : String(e)
console.warn(`Research error: ${errMsg}`)
// Detect billing/auth errors and surface them clearly
if (
errMsg.includes('Credit balance is too low') ||
errMsg.includes('insufficient_quota') ||
errMsg.includes('rate_limit')
) {
finalSummary = `API error: ${errMsg.slice(0, 200)}`
finalEvidence = 'Agent could not start due to API billing/auth issue'
console.warn(
'::error::Anthropic API credits exhausted — cannot run research phase'
)
}
}
// Auto-complete: if a test passed but done() was never called, use the passing test
if (!agentDone && lastPassedTurn >= 0 && finalTestCode) {
console.warn(
`Auto-completing: test passed at turn ${lastPassedTurn} but done() was not called`
)
finalVerdict = 'REPRODUCED'
finalReproducedBy = 'e2e_test'
finalSummary = `Test passed at turn ${lastPassedTurn} (auto-completed — agent did not call done())`
finalEvidence = `Test passed with exit code 0`
}
const result: ResearchResult = {
verdict: finalVerdict,
reproducedBy: finalReproducedBy,
summary: finalSummary,
evidence: finalEvidence,
testCode: finalTestCode,
videoScript: finalVideoScript || undefined,
log: researchLog
}
writeFileSync(`${testDir}/research-log.json`, JSON.stringify(result, null, 2))
console.warn(
`Research complete: ${finalVerdict} (${researchLog.length} tool calls)`
)
return result
}

@@ -0,0 +1,84 @@
import { describe, expect, it } from 'vitest'
import { extractMediaUrls } from './qa-analyze-pr'

describe('extractMediaUrls', () => {
  it('extracts markdown image URLs', () => {
    const text = '![screenshot](https://example.com/image.png)'
    expect(extractMediaUrls(text)).toEqual(['https://example.com/image.png'])
  })

  it('extracts multiple markdown images', () => {
    const text = [
      '![before](https://example.com/before.png)',
      'Some text',
      '![after](https://example.com/after.jpg)'
    ].join('\n')
    expect(extractMediaUrls(text)).toEqual([
      'https://example.com/before.png',
      'https://example.com/after.jpg'
    ])
  })

  it('extracts raw URLs with media extensions', () => {
    const text = 'Check this: https://cdn.example.com/demo.mp4 for details'
    expect(extractMediaUrls(text)).toEqual(['https://cdn.example.com/demo.mp4'])
  })

  it('extracts GitHub user-attachments URLs', () => {
    const text =
      'https://github.com/user-attachments/assets/abc12345-6789-0def-1234-567890abcdef'
    expect(extractMediaUrls(text)).toEqual([
      'https://github.com/user-attachments/assets/abc12345-6789-0def-1234-567890abcdef'
    ])
  })

  it('extracts private-user-images URLs', () => {
    const text =
      'https://private-user-images.githubusercontent.com/12345/abcdef-1234?jwt=token123'
    expect(extractMediaUrls(text)).toEqual([
      'https://private-user-images.githubusercontent.com/12345/abcdef-1234?jwt=token123'
    ])
  })

  it('extracts URLs with query parameters', () => {
    const text = 'https://example.com/image.png?w=800&h=600'
    expect(extractMediaUrls(text)).toEqual([
      'https://example.com/image.png?w=800&h=600'
    ])
  })

  it('deduplicates URLs', () => {
    const text = [
      '![img](https://example.com/same.png)',
      '![img2](https://example.com/same.png)',
      'Also https://example.com/same.png'
    ].join('\n')
    expect(extractMediaUrls(text)).toEqual(['https://example.com/same.png'])
  })

  it('returns empty array for empty input', () => {
    expect(extractMediaUrls('')).toEqual([])
  })

  it('returns empty array for text with no media URLs', () => {
    expect(extractMediaUrls('Just some text without any URLs')).toEqual([])
  })

  it('handles mixed media types', () => {
    const text = [
      '![screen](https://example.com/screenshot.png)',
      'Video: https://example.com/demo.webm',
      '![gif](https://example.com/animation.gif)'
    ].join('\n')
    const urls = extractMediaUrls(text)
    expect(urls).toContain('https://example.com/screenshot.png')
    expect(urls).toContain('https://example.com/demo.webm')
    expect(urls).toContain('https://example.com/animation.gif')
  })

  it('ignores non-http URLs in markdown', () => {
    const text = '![local](./local-image.png)'
    expect(extractMediaUrls(text)).toEqual([])
  })
})

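The `pattern.lastIndex = 0` reset inside `extractMediaUrls` exists because `MEDIA_URL_PATTERNS` holds module-level regexes with the `g` flag, and `RegExp.prototype.exec` on such a regex resumes from `lastIndex` rather than from the start of the string. The standalone sketch below reuses the script's markdown-image pattern; the `IMAGE` constant, the `collect` helper and the sample text are illustrative only. It shows that one stray `test()` call is enough to make a later scan skip the first match unless `lastIndex` is reset.

```typescript
// Same regex as the markdown-image entry in MEDIA_URL_PATTERNS (global flag).
const IMAGE = /!\[[^\]]*\]\(([^)]+)\)/g

const sample =
  '![a](https://example.com/a.png) and ![b](https://example.com/b.png)'

// Collect capture group 1 of every match, optionally resetting lastIndex first.
function collect(text: string, reset: boolean): string[] {
  if (reset) IMAGE.lastIndex = 0
  const found: string[] = []
  let match: RegExpExecArray | null
  while ((match = IMAGE.exec(text)) !== null) {
    found.push(match[1])
  }
  // The final exec() returned null, which also sets lastIndex back to 0.
  return found
}

// test() on a global regex moves lastIndex past the first match.
IMAGE.test(sample)
const skipped = collect(sample, false)

IMAGE.test(sample)
const complete = collect(sample, true)

console.log(skipped) // [ 'https://example.com/b.png' ]
console.log(complete) // [ 'https://example.com/a.png', 'https://example.com/b.png' ]
```

Building a fresh `new RegExp(pattern.source, pattern.flags)` per call would avoid the shared state entirely; resetting `lastIndex` is the lighter option the script takes.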

@@ -0,0 +1,799 @@
#!/usr/bin/env tsx
/**
 * QA PR Analysis Script
 *
 * Analyzes a PR (or, with --type issue, a bug report) using Gemini Pro to
 * generate targeted QA guides: before/after guides for a PR, or a single
 * reproduction guide for an issue. Fetches the thread, downloads attached
 * media, and produces structured test plans.
 *
 * Usage:
 *   pnpm exec tsx scripts/qa-analyze-pr.ts \
 *     --pr-number 10270 \
 *     --repo owner/repo \
 *     --output-dir qa-guides/ \
 *     [--model gemini-3.1-pro-preview] \
 *     [--type pr|issue]
 *
 * Env: GEMINI_API_KEY (required)
 */
import { execSync } from 'node:child_process'
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs'
import { resolve } from 'node:path'
import { fileURLToPath } from 'node:url'
import { GoogleGenerativeAI } from '@google/generative-ai'

// ── Types ──

interface QaGuideStep {
  action: string
  description: string
  expected_before?: string
  expected_after?: string
}

interface QaGuide {
  summary: string
  test_focus: string
  prerequisites: string[]
  steps: QaGuideStep[]
  visual_checks: string[]
}

interface PrThread {
  title: string
  body: string
  labels: string[]
  issueComments: string[]
  reviewComments: string[]
  reviews: string[]
  diff: string
}

type TargetType = 'pr' | 'issue'

interface Options {
  prNumber: string
  repo: string
  outputDir: string
  model: string
  apiKey: string
  mediaBudgetBytes: number
  maxVideoBytes: number
  type: TargetType
}

// ── CLI parsing ──

function parseArgs(): Options {
  const args = process.argv.slice(2)
  const opts: Partial<Options> = {
    model: 'gemini-3.1-pro-preview',
    apiKey: process.env.GEMINI_API_KEY || '',
    mediaBudgetBytes: 20 * 1024 * 1024,
    maxVideoBytes: 10 * 1024 * 1024,
    type: 'pr'
  }

  for (let i = 0; i < args.length; i++) {
    switch (args[i]) {
      case '--pr-number':
        opts.prNumber = args[++i]
        break
      case '--repo':
        opts.repo = args[++i]
        break
      case '--output-dir':
        opts.outputDir = args[++i]
        break
      case '--model':
        opts.model = args[++i]
        break
      case '--type': {
        // Reject unknown values instead of silently falling back to PR mode.
        const value = args[++i]
        if (value !== 'pr' && value !== 'issue') {
          console.error(`Invalid --type "${value}" (expected pr or issue)`)
          process.exit(1)
        }
        opts.type = value
        break
      }
      case '--help':
        console.warn(
          'Usage: qa-analyze-pr.ts --pr-number <num> --repo <owner/repo> --output-dir <path> [--model <model>] [--type pr|issue]'
        )
        process.exit(0)
    }
  }

  if (!opts.prNumber || !opts.repo || !opts.outputDir) {
    console.error(
      'Required: --pr-number <num> --repo <owner/repo> --output-dir <path>'
    )
    process.exit(1)
  }
  if (!opts.apiKey) {
    console.error('GEMINI_API_KEY environment variable is required')
    process.exit(1)
  }
  return opts as Options
}
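
// ── PR thread fetching ──

function ghExec(cmd: string): string {
  try {
    return execSync(cmd, {
      encoding: 'utf-8',
      timeout: 30_000,
      // execSync's default maxBuffer is 1 MiB; large PR diffs can exceed it,
      // which would make this helper silently return ''.
      maxBuffer: 64 * 1024 * 1024,
      stdio: ['pipe', 'pipe', 'pipe']
    }).trim()
  } catch (err) {
    console.warn(`gh command failed: ${cmd}`)
    console.warn((err as Error).message)
    return ''
  }
}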

function fetchPrThread(prNumber: string, repo: string): PrThread {
  console.warn('Fetching PR thread...')
  const prView = ghExec(
    `gh pr view ${prNumber} --repo ${repo} --json title,body,labels`
  )
  const prData = prView
    ? JSON.parse(prView)
    : { title: '', body: '', labels: [] }

  // With --paginate alone, gh prints one JSON array per page back to back,
  // which is not valid JSON once there is more than one page. --slurp wraps
  // the pages in an outer array, and .flat() merges them.
  const issueCommentsRaw = ghExec(
    `gh api repos/${repo}/issues/${prNumber}/comments --paginate --slurp`
  )
  const issueComments: string[] = issueCommentsRaw
    ? JSON.parse(issueCommentsRaw)
        .flat()
        .map((c: { body: string }) => c.body)
    : []
  const reviewCommentsRaw = ghExec(
    `gh api repos/${repo}/pulls/${prNumber}/comments --paginate --slurp`
  )
  const reviewComments: string[] = reviewCommentsRaw
    ? JSON.parse(reviewCommentsRaw)
        .flat()
        .map((c: { body: string }) => c.body)
    : []
  const reviewsRaw = ghExec(
    `gh api repos/${repo}/pulls/${prNumber}/reviews --paginate --slurp`
  )
  const reviews: string[] = reviewsRaw
    ? JSON.parse(reviewsRaw)
        .flat()
        .filter((r: { body: string }) => r.body)
        .map((r: { body: string }) => r.body)
    : []
  const diff = ghExec(`gh pr diff ${prNumber} --repo ${repo}`)

  console.warn(
    `PR #${prNumber}: "${prData.title}" | ` +
      `${issueComments.length} issue comments, ` +
      `${reviewComments.length} review comments, ` +
      `${reviews.length} reviews, ` +
      `diff: ${diff.length} chars`
  )
  return {
    title: prData.title || '',
    body: prData.body || '',
    labels: (prData.labels || []).map((l: { name: string }) => l.name),
    issueComments,
    reviewComments,
    reviews,
    diff
  }
}

interface IssueThread {
  title: string
  body: string
  labels: string[]
  comments: string[]
}

function fetchIssueThread(issueNumber: string, repo: string): IssueThread {
  console.warn('Fetching issue thread...')
  const issueView = ghExec(
    `gh issue view ${issueNumber} --repo ${repo} --json title,body,labels`
  )
  const issueData = issueView
    ? JSON.parse(issueView)
    : { title: '', body: '', labels: [] }
  // --slurp wraps each page in an outer array so multi-page output stays
  // valid JSON; .flat() merges the pages.
  const commentsRaw = ghExec(
    `gh api repos/${repo}/issues/${issueNumber}/comments --paginate --slurp`
  )
  const comments: string[] = commentsRaw
    ? JSON.parse(commentsRaw)
        .flat()
        .map((c: { body: string }) => c.body)
    : []
  console.warn(
    `Issue #${issueNumber}: "${issueData.title}" | ` +
      `${comments.length} comments`
  )
  return {
    title: issueData.title || '',
    body: issueData.body || '',
    labels: (issueData.labels || []).map((l: { name: string }) => l.name),
    comments
  }
}

// ── Media extraction ──

const MEDIA_EXTENSIONS = /\.(png|jpg|jpeg|gif|webp|mp4|webm|mov)$/i

const MEDIA_URL_PATTERNS = [
  // Markdown images: ![alt](url)
  /!\[[^\]]*\]\(([^)]+)\)/g,
  // GitHub user-attachments
  /https:\/\/github\.com\/user-attachments\/assets\/[a-f0-9-]+/g,
  // Private user images
  /https:\/\/private-user-images\.githubusercontent\.com\/[^\s)"]+/g,
  // Raw URLs with media extensions (standalone or in text)
  /(?<!="|=')https?:\/\/[^\s)<>"]+\.(?:png|jpg|jpeg|gif|webp|mp4|webm|mov)(?:\?[^\s)<>"]*)?/gi
]

export function extractMediaUrls(text: string): string[] {
  if (!text) return []
  const urls = new Set<string>()
  for (const pattern of MEDIA_URL_PATTERNS) {
    // Reset lastIndex for global patterns
    pattern.lastIndex = 0
    let match: RegExpExecArray | null
    while ((match = pattern.exec(text)) !== null) {
      // For markdown images, the URL is in capture group 1
      const url = match[1] || match[0]
      // Clean trailing markdown/html artifacts
      const cleaned = url.replace(/[)>"'\s]+$/, '')
      if (cleaned.startsWith('http')) {
        urls.add(cleaned)
      }
    }
  }
  return [...urls]
}

// ── Media downloading ──

const ALLOWED_MEDIA_DOMAINS = [
  'github.com',
  'raw.githubusercontent.com',
  'user-images.githubusercontent.com',
  'private-user-images.githubusercontent.com',
  'objects.githubusercontent.com',
  'github.githubassets.com'
]

function isAllowedMediaDomain(url: string): boolean {
  try {
    const hostname = new URL(url).hostname
    return ALLOWED_MEDIA_DOMAINS.some(
      (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
    )
  } catch {
    return false
  }
}

async function downloadMedia(
  urls: string[],
  outputDir: string,
  budgetBytes: number,
  maxVideoBytes: number
): Promise<Array<{ path: string; mimeType: string }>> {
  const downloaded: Array<{ path: string; mimeType: string }> = []
  let totalBytes = 0
  const mediaDir = resolve(outputDir, 'media')
  mkdirSync(mediaDir, { recursive: true })

  for (const url of urls) {
    if (totalBytes >= budgetBytes) {
      console.warn(
        `Media budget exhausted (${totalBytes} bytes), skipping rest`
      )
      break
    }
    if (!isAllowedMediaDomain(url)) {
      console.warn(`Skipping non-GitHub URL: ${url.slice(0, 80)}`)
      continue
    }
    try {
      const response = await fetch(url, {
        signal: AbortSignal.timeout(15_000),
        headers: { Accept: 'image/*,video/*' },
        redirect: 'follow'
      })
      if (!response.ok) {
        console.warn(`Failed to download ${url}: ${response.status}`)
        continue
      }
      const contentLength = response.headers.get('content-length')
      if (contentLength) {
        const declaredSize = Number.parseInt(contentLength, 10)
        if (declaredSize > budgetBytes - totalBytes) {
          console.warn(
            `Content-Length ${declaredSize} would exceed budget, skipping ${url}`
          )
          continue
        }
      }
      const contentType = response.headers.get('content-type') || ''
      const buffer = Buffer.from(await response.arrayBuffer())
      // Skip oversized videos
      const isVideo =
        contentType.startsWith('video/') || /\.(mp4|webm|mov)$/i.test(url)
      if (isVideo && buffer.length > maxVideoBytes) {
        console.warn(
          `Skipping large video ${url} (${(buffer.length / 1024 / 1024).toFixed(1)}MB > ${(maxVideoBytes / 1024 / 1024).toFixed(0)}MB cap)`
        )
        continue
      }
      if (totalBytes + buffer.length > budgetBytes) {
        console.warn(`Would exceed budget, skipping ${url}`)
        continue
      }
      const ext = guessExtension(url, contentType)
      const filename = `media-${downloaded.length}${ext}`
      const filepath = resolve(mediaDir, filename)
      writeFileSync(filepath, buffer)
      totalBytes += buffer.length
      const mimeType = contentType.split(';')[0].trim() || guessMimeType(ext)
      downloaded.push({ path: filepath, mimeType })
      console.warn(
        `Downloaded: ${url.slice(0, 80)}... (${(buffer.length / 1024).toFixed(0)}KB)`
      )
    } catch (err) {
      console.warn(`Failed to download ${url}: ${(err as Error).message}`)
    }
  }

  console.warn(
    `Downloaded ${downloaded.length}/${urls.length} media files ` +
      `(${(totalBytes / 1024 / 1024).toFixed(1)}MB)`
  )
  return downloaded
}

function guessExtension(url: string, contentType: string): string {
  const urlMatch = url.match(MEDIA_EXTENSIONS)
  if (urlMatch) return urlMatch[0].toLowerCase()
  const typeMap: Record<string, string> = {
    'image/png': '.png',
    'image/jpeg': '.jpg',
    'image/gif': '.gif',
    'image/webp': '.webp',
    'video/mp4': '.mp4',
    'video/webm': '.webm',
    'video/quicktime': '.mov'
  }
  // Normalize headers such as "Image/PNG; charset=binary" before the lookup.
  return typeMap[contentType.split(';')[0].trim().toLowerCase()] || '.bin'
}

function guessMimeType(ext: string): string {
  const map: Record<string, string> = {
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.gif': 'image/gif',
    '.webp': 'image/webp',
    '.mp4': 'video/mp4',
    '.webm': 'video/webm',
    '.mov': 'video/quicktime'
  }
  return map[ext] || 'application/octet-stream'
}

// ── Gemini analysis ──

function buildIssueAnalysisPrompt(issue: IssueThread): string {
  const allText = [
    `# Issue: ${issue.title}`,
    '',
    '## Description',
    issue.body,
    '',
    issue.comments.length > 0
      ? `## Comments\n${issue.comments.join('\n\n---\n\n')}`
      : ''
  ]
    .filter(Boolean)
    .join('\n')
  return `You are a senior QA engineer analyzing a bug report for ComfyUI frontend — a node-based visual workflow editor for AI image generation (Vue 3 + TypeScript).
The UI has:
- A large canvas (1280x720 viewport) showing a node graph centered at ~(640, 400)
- Nodes are boxes with input/output slots connected by wires
- A hamburger menu (top-left C logo) with File, Edit, Help submenus
- Sidebars (Workflows, Node Library, Models)
- A topbar with workflow tabs and Queue button
- The default workflow loads with these nodes (approximate center coordinates):
- Load Checkpoint (~150, 300), CLIP Text Encode x2 (~450, 250 and ~450, 450)
- Empty Latent Image (~450, 600), KSampler (~750, 350), VAE Decode (~1000, 350), Save Image (~1200, 350)
- Right-clicking ON a node shows node actions (Clone, Bypass, Convert, etc.)
- Right-clicking on EMPTY canvas shows Add Node menu — different from node context menu
Your task: Generate a DETAILED reproduction guide (8-15 steps) to trigger this bug on main.
${allText}
## Available test actions
Each step must use one of these actions:
### Menu actions
- "openMenu" — clicks the Comfy hamburger menu (top-left C logo)
- "hoverMenuItem" — hovers a top-level menu item to open submenu (label required)
- "clickMenuItem" — clicks an item in the visible submenu (label required)
### Element actions (by visible text)
- "click" — clicks an element by visible text (text required)
- "rightClick" — right-clicks an element to open context menu (text required)
- "doubleClick" — double-clicks an element or coordinates (text or x,y)
- "fillDialog" — fills dialog input and presses Enter (text required)
- "pressKey" — presses a keyboard key (key required: Escape, Tab, Delete, Enter, etc.)
### Canvas actions (by coordinates — viewport is 1280x720)
- "clickCanvas" — click at coordinates (x, y required)
- "rightClickCanvas" — right-click at coordinates (x, y required)
- "doubleClick" — double-click at coordinates to open node search (x, y)
- "dragCanvas" — drag from one point to another (fromX, fromY, toX, toY)
- "scrollCanvas" — scroll wheel for zoom (x, y, deltaY: negative=zoom in, positive=zoom out)
### Utility
- "wait" — waits briefly (ms required, max 3000)
- "screenshot" — takes a screenshot (name required)
## Common ComfyUI interactions
- Right-click a node → context menu with Clone, Bypass, Remove, Colors, etc.
- Double-click empty canvas → opens node search dialog
- Ctrl+C / Ctrl+V → copy/paste selected nodes
- Delete key → remove selected node
- Ctrl+G → group selected nodes
- Drag from output slot to input slot → create connection
- Click a node to select it, Shift+click for multi-select
## Output format
Return a JSON object with exactly one key: "reproduce", containing:
{
"summary": "One sentence: what bug this issue reports",
"test_focus": "Specific behavior to reproduce",
"prerequisites": ["e.g. Load default workflow"],
"steps": [
{
"action": "clickCanvas",
"description": "Click on first node to select it",
"expected_before": "What should happen if the bug is present"
}
],
"visual_checks": ["Specific visual evidence of the bug to look for"]
}
## Rules
- Generate 8-15 DETAILED steps that actually trigger the reported bug.
- Follow the issue's reproduction steps PRECISELY — translate them into available actions.
- Use canvas coordinates for node interactions (nodes are typically in the center area 300-900 x 200-500).
- Take screenshots BEFORE and AFTER critical actions to capture the bug state.
- Do NOT just open a menu and screenshot — actually perform the full reproduction sequence.
- Do NOT include login steps.
- Output ONLY valid JSON, no markdown fences or explanation.`
}

function buildAnalysisPrompt(thread: PrThread): string {
  const allText = [
    `# PR: ${thread.title}`,
    '',
    '## Description',
    thread.body,
    '',
    thread.issueComments.length > 0
      ? `## Issue Comments\n${thread.issueComments.join('\n\n---\n\n')}`
      : '',
    thread.reviewComments.length > 0
      ? `## Review Comments\n${thread.reviewComments.join('\n\n---\n\n')}`
      : '',
    thread.reviews.length > 0
      ? `## Reviews\n${thread.reviews.join('\n\n---\n\n')}`
      : '',
    '',
    '## Diff (truncated)',
    '```',
    thread.diff.slice(0, 8000),
    '```'
  ]
    .filter(Boolean)
    .join('\n')
  return `You are a senior QA engineer analyzing a pull request for ComfyUI frontend (a Vue 3 + TypeScript web application for AI image generation workflows).
Your task: Generate TWO targeted QA test guides — one for BEFORE the PR (main branch) and one for AFTER (PR branch).
${allText}
## Available test actions
Each step must use one of these actions:
- "openMenu" — clicks the Comfy hamburger menu (top-left C logo)
- "hoverMenuItem" — hovers a top-level menu item to open submenu (label required)
- "clickMenuItem" — clicks an item in the visible submenu (label required)
- "fillDialog" — fills dialog input and presses Enter (text required)
- "pressKey" — presses a keyboard key (key required)
- "click" — clicks an element by visible text (text required)
- "wait" — waits briefly (ms required, max 3000)
- "screenshot" — takes a screenshot (name required)
## Output format
Return a JSON object with exactly two keys: "before" and "after", each containing:
{
"summary": "One sentence: what this PR changes",
"test_focus": "Specific behaviors to verify in this recording",
"prerequisites": ["e.g. Load default workflow"],
"steps": [
{
"action": "openMenu",
"description": "Open the main menu to check file options",
"expected_before": "Old behavior description (before key only)",
"expected_after": "New behavior description (after key only)"
}
],
"visual_checks": ["Specific visual elements to look for"]
}
## Rules
- BEFORE guide: 2-4 steps, under 15 seconds. Show OLD/missing behavior.
- AFTER guide: 3-6 steps, under 30 seconds. Prove the fix/feature works.
- Focus on the SPECIFIC behavior changed by this PR, not generic testing.
- Use information from PR description, screenshots, and comments to understand intended behavior.
- Include at least one screenshot step in each guide.
- Do NOT include login steps.
- Menu pattern: openMenu -> hoverMenuItem -> clickMenuItem or screenshot.
- Output ONLY valid JSON, no markdown fences or explanation.`
}

async function analyzeWithGemini(
  thread: PrThread,
  media: Array<{ path: string; mimeType: string }>,
  model: string,
  apiKey: string
): Promise<{ before: QaGuide; after: QaGuide }> {
  const genAI = new GoogleGenerativeAI(apiKey)
  const geminiModel = genAI.getGenerativeModel({ model })
  const prompt = buildAnalysisPrompt(thread)
  const parts: Array<
    { text: string } | { inlineData: { mimeType: string; data: string } }
  > = [{ text: prompt }]

  // Add media as inline data
  for (const item of media) {
    try {
      const buffer = readFileSync(item.path)
      parts.push({
        inlineData: {
          mimeType: item.mimeType,
          data: buffer.toString('base64')
        }
      })
    } catch (err) {
      console.warn(
        `Failed to read media ${item.path}: ${(err as Error).message}`
      )
    }
  }

  console.warn(
    `Sending to ${model}: ${prompt.length} chars text, ${media.length} media files`
  )
  const result = await geminiModel.generateContent({
    contents: [{ role: 'user', parts }],
    generationConfig: {
      temperature: 0.2,
      maxOutputTokens: 8192,
      responseMimeType: 'application/json'
    }
  })

  let text = result.response.text()
  // Strip markdown fences if present
  text = text
    .replace(/^```(?:json)?\n?/gm, '')
    .replace(/```$/gm, '')
    .trim()
  console.warn('Gemini response received')
  console.warn('Raw response (first 500 chars):', text.slice(0, 500))
  const parsed = JSON.parse(text)

  // Handle different response shapes from Gemini
  let before: QaGuide
  let after: QaGuide
  if (Array.isArray(parsed) && parsed.length >= 2) {
    // Array format: [before, after]
    before = parsed[0]
    after = parsed[1]
  } else if (parsed.before && parsed.after) {
    // Object format: { before, after }
    before = parsed.before
    after = parsed.after
  } else {
    // Try nested wrapper keys
    const inner = parsed.qa_guide ?? parsed.guides ?? parsed
    if (inner.before && inner.after) {
      before = inner.before
      after = inner.after
    } else {
      console.warn(
        'Full response:',
        JSON.stringify(parsed, null, 2).slice(0, 2000)
      )
      throw new Error(
        `Unexpected response shape. Got keys: ${Object.keys(parsed).join(', ')}`
      )
    }
  }
  return { before, after }
}

async function analyzeIssueWithGemini(
  issue: IssueThread,
  media: Array<{ path: string; mimeType: string }>,
  model: string,
  apiKey: string
): Promise<QaGuide> {
  const genAI = new GoogleGenerativeAI(apiKey)
  const geminiModel = genAI.getGenerativeModel({ model })
  const prompt = buildIssueAnalysisPrompt(issue)
  const parts: Array<
    { text: string } | { inlineData: { mimeType: string; data: string } }
  > = [{ text: prompt }]

  for (const item of media) {
    try {
      const buffer = readFileSync(item.path)
      parts.push({
        inlineData: {
          mimeType: item.mimeType,
          data: buffer.toString('base64')
        }
      })
    } catch (err) {
      console.warn(
        `Failed to read media ${item.path}: ${(err as Error).message}`
      )
    }
  }

  console.warn(
    `Sending to ${model}: ${prompt.length} chars text, ${media.length} media files`
  )
  const result = await geminiModel.generateContent({
    contents: [{ role: 'user', parts }],
    generationConfig: {
      temperature: 0.2,
      maxOutputTokens: 8192,
      responseMimeType: 'application/json'
    }
  })

  let text = result.response.text()
  text = text
    .replace(/^```(?:json)?\n?/gm, '')
    .replace(/```$/gm, '')
    .trim()
  console.warn('Gemini response received')
  console.warn('Raw response (first 500 chars):', text.slice(0, 500))
  const parsed = JSON.parse(text)
  const guide: QaGuide =
    parsed.reproduce ?? parsed.qa_guide?.reproduce ?? parsed
  return guide
}

// ── Main ──

async function main() {
  const opts = parseArgs()
  mkdirSync(opts.outputDir, { recursive: true })
  if (opts.type === 'issue') {
    await analyzeIssue(opts)
  } else {
    await analyzePr(opts)
  }
}

async function analyzeIssue(opts: Options) {
  const issue = fetchIssueThread(opts.prNumber, opts.repo)
  const allText = [issue.body, ...issue.comments].join('\n')
  const mediaUrls = extractMediaUrls(allText)
  console.warn(`Found ${mediaUrls.length} media URLs`)
  const media = await downloadMedia(
    mediaUrls,
    opts.outputDir,
    opts.mediaBudgetBytes,
    opts.maxVideoBytes
  )
  const guide = await analyzeIssueWithGemini(
    issue,
    media,
    opts.model,
    opts.apiKey
  )
  // Issue mode produces a single reproduction guide, saved under the
  // "before" file name.
  const beforePath = resolve(opts.outputDir, 'qa-guide-before.json')
  writeFileSync(beforePath, JSON.stringify(guide, null, 2))
  console.warn(`Wrote QA guide:`)
  console.warn(` Reproduce: ${beforePath}`)
}

async function analyzePr(opts: Options) {
  const thread = fetchPrThread(opts.prNumber, opts.repo)
  const allText = [
    thread.body,
    ...thread.issueComments,
    ...thread.reviewComments,
    ...thread.reviews
  ].join('\n')
  const mediaUrls = extractMediaUrls(allText)
  console.warn(`Found ${mediaUrls.length} media URLs`)
  const media = await downloadMedia(
    mediaUrls,
    opts.outputDir,
    opts.mediaBudgetBytes,
    opts.maxVideoBytes
  )
  const guides = await analyzeWithGemini(thread, media, opts.model, opts.apiKey)
  const beforePath = resolve(opts.outputDir, 'qa-guide-before.json')
  const afterPath = resolve(opts.outputDir, 'qa-guide-after.json')
  writeFileSync(beforePath, JSON.stringify(guides.before, null, 2))
  writeFileSync(afterPath, JSON.stringify(guides.after, null, 2))
  console.warn(`Wrote QA guides:`)
  console.warn(` Before: ${beforePath}`)
  console.warn(` After: ${afterPath}`)
}

function isExecutedAsScript(metaUrl: string): boolean {
  const modulePath = fileURLToPath(metaUrl)
  const scriptPath = process.argv[1] ? resolve(process.argv[1]) : ''
  return modulePath === scriptPath
}

// Only run main() when invoked directly, so tests can import extractMediaUrls.
if (isExecutedAsScript(import.meta.url)) {
  main().catch((err) => {
    console.error('QA analysis failed:', err)
    process.exit(1)
  })
}

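`isAllowedMediaDomain` accepts a URL only when its parsed hostname equals an allowlisted domain or ends with `.` plus that domain. The leading dot matters: a bare `endsWith('github.com')` would also accept look-alike hosts such as `evilgithub.com`, and parsing with `new URL` defeats tricks like putting `github.com` at the start of an attacker's hostname. The standalone sketch below copies the check as `strictAllowed`; the `ALLOWED` subset, the hypothetical `naiveAllowed` variant and the sample URLs are illustrative only.

```typescript
// Subset of ALLOWED_MEDIA_DOMAINS from qa-analyze-pr.ts.
const ALLOWED = ['github.com', 'objects.githubusercontent.com']

// Mirrors the script: exact hostname match, or a subdomain of an allowed domain.
function strictAllowed(url: string): boolean {
  try {
    const hostname = new URL(url).hostname
    return ALLOWED.some(
      (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
    )
  } catch {
    // new URL() throws on strings that are not absolute URLs.
    return false
  }
}

// Hypothetical naive variant without the leading dot, for comparison only.
function naiveAllowed(url: string): boolean {
  try {
    const hostname = new URL(url).hostname
    return ALLOWED.some((domain) => hostname.endsWith(domain))
  } catch {
    return false
  }
}

const samples = [
  'https://github.com/user-attachments/assets/abc',
  'https://evilgithub.com/payload.png',
  'https://github.com.attacker.example/image.png',
  'not a url'
]

for (const url of samples) {
  console.log(url, 'strict:', strictAllowed(url), 'naive:', naiveAllowed(url))
}
```

Only the second sample separates the two checks: the naive suffix test accepts `evilgithub.com`, while the dot-prefixed test rejects it.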

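Both Gemini helpers cast the `JSON.parse` result straight to `QaGuide`, and `analyzeIssueWithGemini` falls back to the raw object when no `reproduce` key is present, so a malformed response only surfaces later, when the recording step tries to use the guide. A runtime type guard is one way to fail fast. The `isQaGuide` and `isStringArray` helpers below are hypothetical (not part of the script); the sketch re-declares the two interfaces so it stands alone and checks every required field of `QaGuide`.

```typescript
interface QaGuideStep {
  action: string
  description: string
  expected_before?: string
  expected_after?: string
}

interface QaGuide {
  summary: string
  test_focus: string
  prerequisites: string[]
  steps: QaGuideStep[]
  visual_checks: string[]
}

// True when value is an array whose elements are all strings (empty is fine).
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string')
}

// Hypothetical guard: required QaGuide fields only; expected_* stay optional.
function isQaGuide(value: unknown): value is QaGuide {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  const steps = v.steps
  return (
    typeof v.summary === 'string' &&
    typeof v.test_focus === 'string' &&
    isStringArray(v.prerequisites) &&
    isStringArray(v.visual_checks) &&
    Array.isArray(steps) &&
    steps.every((step: unknown) => {
      if (typeof step !== 'object' || step === null) return false
      const s = step as Record<string, unknown>
      return typeof s.action === 'string' && typeof s.description === 'string'
    })
  )
}

// Example: unwrap an issue-mode response and validate it before writing.
const raw =
  '{"reproduce":{"summary":"s","test_focus":"f","prerequisites":[],' +
  '"steps":[{"action":"wait","description":"pause"}],"visual_checks":["v"]}}'
const parsed: unknown = JSON.parse(raw)
const candidate =
  typeof parsed === 'object' && parsed !== null && 'reproduce' in parsed
    ? (parsed as { reproduce: unknown }).reproduce
    : parsed
if (!isQaGuide(candidate)) {
  throw new Error('Gemini response does not match the QaGuide shape')
}
console.log(candidate.steps.length) // 1
```

Throwing here keeps a bad model response from producing a guide file that only fails later, inside the recording run.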
@@ -0,0 +1,413 @@
#!/usr/bin/env bash
# Deploy QA report to Cloudflare Pages.
# Expected env vars: CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID, RAW_BRANCH,
# BEFORE_SHA, AFTER_SHA, TARGET_NUM, TARGET_TYPE, REPO, RUN_ID
# Writes outputs to GITHUB_OUTPUT: badge_status, url
set -euo pipefail
npm install -g wrangler@4.74.0 >/dev/null 2>&1
DEPLOY_DIR=$(mktemp -d)
mkdir -p "$DEPLOY_DIR"
for os in Linux macOS Windows; do
  DIR="qa-artifacts/qa-report-${os}-${RUN_ID}"
  for prefix in qa qa-before; do
    VID="${DIR}/${prefix}-session.mp4"
    if [ -f "$VID" ]; then
      DEST="$DEPLOY_DIR/${prefix}-${os}.mp4"
      cp "$VID" "$DEST"
      echo "Found ${prefix} ${os} video ($(du -h "$VID" | cut -f1))"
    fi
  done
  # Copy multi-pass session videos (qa-session-1, qa-session-2, etc.)
  for numbered in "$DIR"/qa-session-[0-9].mp4; do
    [ -f "$numbered" ] || continue
    NUM=$(basename "$numbered" | sed 's/qa-session-\([0-9]\).mp4/\1/')
    DEST="$DEPLOY_DIR/qa-${os}-pass${NUM}.mp4"
    cp "$numbered" "$DEST"
    echo "Found pass ${NUM} ${os} video ($(du -h "$numbered" | cut -f1))"
  done
  # Generate GIF thumbnail from after video (or first pass)
  THUMB_SRC="$DEPLOY_DIR/qa-${os}.mp4"
  [ ! -f "$THUMB_SRC" ] && THUMB_SRC="$DEPLOY_DIR/qa-${os}-pass1.mp4"
  if [ -f "$THUMB_SRC" ]; then
    ffmpeg -y -ss 10 -i "$THUMB_SRC" -t 8 \
      -vf "fps=8,scale=480:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=64[p];[s1][p]paletteuse=dither=bayer" \
      -loop 0 "$DEPLOY_DIR/qa-${os}-thumb.gif" 2>/dev/null \
      || echo "GIF generation failed for ${os} (non-fatal)"
  fi
done
# Build video cards and report sections
CARDS=""
# shellcheck disable=SC2034 # accessed via eval
ICONS_Linux="&#x1F427;" ICONS_macOS="&#x1F34E;" ICONS_Windows="&#x1FA9F;"
CARD_COUNT=0
DL_ICON="<svg width=14 height=14 viewBox='0 0 24 24' fill=none stroke=currentColor stroke-width=2><path d='M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4'/><polyline points='7 10 12 15 17 10'/><line x1=12 y1=15 x2=12 y2=3 /></svg>"
for os in Linux macOS Windows; do
  eval "ICON=\$ICONS_${os}"
  OS_LOWER=$(echo "$os" | tr '[:upper:]' '[:lower:]')
  HAS_BEFORE=$([ -f "$DEPLOY_DIR/qa-before-${os}.mp4" ] && echo 1 || echo 0)
  HAS_AFTER=$( { [ -f "$DEPLOY_DIR/qa-${os}.mp4" ] || [ -f "$DEPLOY_DIR/qa-${os}-pass1.mp4" ]; } && echo 1 || echo 0)
  [ "$HAS_AFTER" = "0" ] && continue
  # Collect all reports for this platform (single + multi-pass)
  REPORT_FILES=""
  REPORT_LINK=""
  REPORT_HTML=""
  for rpt in "video-reviews/${OS_LOWER}-qa-video-report.md" "video-reviews/${OS_LOWER}-pass"*-qa-video-report.md; do
    [ -f "$rpt" ] && REPORT_FILES="${REPORT_FILES} ${rpt}"
  done
  if [ -n "$REPORT_FILES" ]; then
    # Copy each report and concatenate them (HTML-escaped) for the inline review panel
    COMBINED_MD=""
    for rpt in $REPORT_FILES; do
      cp "$rpt" "$DEPLOY_DIR/$(basename "$rpt")"
      RPT_MD=$(sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' "$rpt")
      [ -n "$COMBINED_MD" ] && COMBINED_MD="${COMBINED_MD}&#10;&#10;---&#10;&#10;"
      COMBINED_MD="${COMBINED_MD}${RPT_MD}"
    done
    FIRST_REPORT=$(echo "$REPORT_FILES" | awk '{print $1}')
    FIRST_BASENAME=$(basename "$FIRST_REPORT")
    REPORT_LINK="<a class=dl href=${FIRST_BASENAME}><svg width=14 height=14 viewBox='0 0 24 24' fill=none stroke=currentColor stroke-width=2><path d='M14 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V8z'/><polyline points='14 2 14 8 20 8'/><line x1=16 y1=13 x2=8 y2=13 /><line x1=16 y1=17 x2=8 y2=17 /></svg>Report</a>"
    REPORT_HTML="<details class=report open><summary><svg width=14 height=14 viewBox='0 0 24 24' fill=none stroke=currentColor stroke-width=2><circle cx=12 cy=12 r=10 /><line x1=12 y1=16 x2=12 y2=12 /><line x1=12 y1=8 x2=12.01 y2=8 /></svg> AI Comparative Review</summary><div class=report-body data-md>${COMBINED_MD}</div></details>"
  fi
  if [ "$HAS_BEFORE" = "1" ]; then
    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
  elif [ -f "$DEPLOY_DIR/qa-${os}.mp4" ]; then
    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
  else
    PASS_VIDEOS=""
    for pass_vid in "$DEPLOY_DIR/qa-${os}-pass"[0-9].mp4; do
      [ -f "$pass_vid" ] || continue
      PASS_NUM=$(basename "$pass_vid" | sed "s/qa-${os}-pass\([0-9]\).mp4/\1/")
      PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls preload=auto><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
    done
    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison>${PASS_VIDEOS}</div>${REPORT_HTML}</div>"
  fi
  CARD_COUNT=$((CARD_COUNT + 1))
done
# Build commit info and target link for the report header
COMMIT_HTML=""
REPO_URL="https://github.com/${REPO}"
if [ -n "${TARGET_NUM:-}" ]; then
  if [ "$TARGET_TYPE" = "issue" ]; then
    COMMIT_HTML="<a href=${REPO_URL}/issues/${TARGET_NUM} class=sha title='Issue'>Issue #${TARGET_NUM}</a>"
  else
    COMMIT_HTML="<a href=${REPO_URL}/pull/${TARGET_NUM} class=sha title='Pull Request'>PR #${TARGET_NUM}</a>"
  fi
fi
if [ -n "${BEFORE_SHA:-}" ]; then
  SHORT_BEFORE="${BEFORE_SHA:0:7}"
  COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${BEFORE_SHA} class=sha title='main branch'>main @ ${SHORT_BEFORE}</a>"
fi
if [ -n "${AFTER_SHA:-}" ]; then
  SHORT_AFTER="${AFTER_SHA:0:7}"
  AFTER_LABEL="PR"
  [ -n "${TARGET_NUM:-}" ] && AFTER_LABEL="#${TARGET_NUM}"
  COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${AFTER_SHA} class=sha title='PR head commit'>${AFTER_LABEL} @ ${SHORT_AFTER}</a>"
fi
if [ -n "${PIPELINE_SHA:-}" ]; then
  SHORT_PIPE="${PIPELINE_SHA:0:7}"
  COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${PIPELINE_SHA} class=sha title='QA pipeline version'>QA @ ${SHORT_PIPE}</a>"
fi
[ -n "$COMMIT_HTML" ] && COMMIT_HTML=" &middot; ${COMMIT_HTML}"
RUN_LINK=""
if [ -n "${RUN_URL:-}" ]; then
  RUN_LINK=" &middot; <a href=\"${RUN_URL}\" class=sha title=\"GitHub Actions run\">CI Job</a>"
fi
# Timing info
DEPLOY_TIME=$(date -u '+%Y-%m-%d %H:%M UTC')
TIMING_HTML=""
if [ -n "${RUN_START_TIME:-}" ]; then
  TIMING_HTML=" &middot; <span class=sha title='Pipeline timing'>${RUN_START_TIME} &rarr; ${DEPLOY_TIME}</span>"
fi
# Generate index.html from template
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
TEMPLATE="$SCRIPT_DIR/qa-report-template.html"
# Write dynamic content to temp files for safe substitution
# Cloudflare Pages _headers file — enable range requests for video seeking
cat > "$DEPLOY_DIR/_headers" <<'HEADERSEOF'
/*.mp4
  Accept-Ranges: bytes
  Cache-Control: public, max-age=86400
HEADERSEOF
# Build purpose description from pr-context.txt
PURPOSE_HTML=""
if [ -f pr-context.txt ]; then
  # Extract the title line (HTML-escaped, since it is interpolated into the page)
  PR_TITLE=$(grep -m1 '^Title:' pr-context.txt 2>/dev/null | sed 's/^Title: //; s/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' || true)
  if [ "$TARGET_TYPE" = "issue" ]; then
    PURPOSE_LABEL="Issue #${TARGET_NUM}"
    PURPOSE_VERB="reports"
  else
    PURPOSE_LABEL="PR #${TARGET_NUM}"
    PURPOSE_VERB="aims to"
  fi
  # Get the first ~400 chars of the description body (after the "Description:" line)
  PR_DESC=$(sed -n '/^Description:/,/^###/p' pr-context.txt 2>/dev/null | grep -v '^Description:\|^###' | head -5 | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' | tr '\n' ' ' | head -c 400 || true)
  [ -z "$PR_DESC" ] && PR_DESC=$(sed -n '3,8p' pr-context.txt 2>/dev/null | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' | tr '\n' ' ' | head -c 400 || true)
  # Build requirements from QA guide JSON
  REQS_HTML=""
  QA_GUIDE=$(ls qa-guides/qa-guide-*.json 2>/dev/null | head -1 || true)
  if [ -f "$QA_GUIDE" ]; then
    # The inline Python must keep its own indentation; it runs at module level.
    PREREQS=$(python3 -c "
import json, sys, html
try:
    g = json.load(open(sys.argv[1]))
    prereqs = g.get('prerequisites', [])
    steps = g.get('steps', [])
    focus = g.get('test_focus', '')
    parts = []
    if focus:
        parts.append('<strong>Test focus:</strong> ' + html.escape(focus))
    if prereqs:
        parts.append('<strong>Prerequisites:</strong> ' + ', '.join(html.escape(p) for p in prereqs))
    if steps:
        parts.append('<strong>Steps:</strong> ' + ' → '.join(html.escape(s.get('description', str(s))) for s in steps[:6]))
        if len(steps) > 6:
            parts[-1] += ' → ...'
    print('<br>'.join(parts))
except Exception:
    pass
" "$QA_GUIDE" 2>/dev/null || true)
    [ -n "$PREREQS" ] && REQS_HTML="<div class=purpose-reqs>${PREREQS}</div>"
  fi
  PURPOSE_HTML="<div class=purpose><div class=purpose-label>${PURPOSE_LABEL} ${PURPOSE_VERB}</div><strong>${PR_TITLE}</strong><br>${PR_DESC}${REQS_HTML}</div>"
fi
echo -n "$COMMIT_HTML" > "$DEPLOY_DIR/.commit_html"
echo -n "$CARDS" > "$DEPLOY_DIR/.cards_html"
echo -n "$RUN_LINK" > "$DEPLOY_DIR/.run_link"
# Badge HTML with copy button (placeholder URL filled after deploy)
echo -n '<div class="badge-bar"><img src="badge.svg" alt="QA Badge" class="badge-img"/><button class="copy-badge" title="Copy badge markdown" onclick="copyBadge()"><svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><rect x=9 y=9 width=13 height=13 rx=2 /><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg></button></div>' > "$DEPLOY_DIR/.badge_html"
echo -n "${TIMING_HTML:-}" > "$DEPLOY_DIR/.timing_html"
echo -n "$PURPOSE_HTML" > "$DEPLOY_DIR/.purpose_html"
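# Substitute the {{PLACEHOLDER}} markers in the template; fragments are read from
# files so their HTML never passes through shell quoting
python3 -c "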
import sys, pathlib
d = pathlib.Path(sys.argv[1])
t = pathlib.Path(sys.argv[2]).read_text()
t = t.replace('{{COMMIT_HTML}}', (d / '.commit_html').read_text())
t = t.replace('{{CARDS}}', (d / '.cards_html').read_text())
t = t.replace('{{RUN_LINK}}', (d / '.run_link').read_text())
t = t.replace('{{BADGE_HTML}}', (d / '.badge_html').read_text())
t = t.replace('{{TIMING_HTML}}', (d / '.timing_html').read_text())
t = t.replace('{{PURPOSE_HTML}}', (d / '.purpose_html').read_text())
sys.stdout.write(t)
" "$DEPLOY_DIR" "$TEMPLATE" > "$DEPLOY_DIR/index.html"
rm -f "$DEPLOY_DIR/.commit_html" "$DEPLOY_DIR/.cards_html" "$DEPLOY_DIR/.run_link" "$DEPLOY_DIR/.badge_html" "$DEPLOY_DIR/.timing_html" "$DEPLOY_DIR/.purpose_html"
cat > "$DEPLOY_DIR/404.html" <<'ERROREOF'
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><title>404</title>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600&display=swap" rel=stylesheet>
<style>:root{--bg:oklch(8% 0.02 265);--fg:oklch(45% 0.01 265);--err:oklch(62% 0.22 25)}*{margin:0;padding:0;box-sizing:border-box}body{background:var(--bg);color:var(--fg);font-family:'Inter',system-ui,sans-serif;display:flex;align-items:center;justify-content:center;min-height:100vh}div{text-align:center}h1{color:var(--err);font-size:clamp(3rem,8vw,5rem);font-weight:700;letter-spacing:-.04em;margin-bottom:.5rem}p{font-size:1rem;max-width:32ch;line-height:1.5}</style>
</head><body><div><h1>404</h1><p>File not found. The QA recording may have failed or been cancelled.</p></div></body></html>
ERROREOF
# Copy research log to deploy dir if it exists (first match wins)
for rlog in qa-artifacts/*/research/research-log.json qa-artifacts/*/*/research/research-log.json qa-artifacts/before/*/research/research-log.json; do
  if [ -f "$rlog" ]; then
    cp "$rlog" "$DEPLOY_DIR/research-log.json"
    echo "Found research log: $rlog"
    break
  fi
done
# Copy generated test code to deploy dir
for tfile in qa-artifacts/*/research/reproduce.spec.ts qa-artifacts/*/*/research/reproduce.spec.ts qa-artifacts/before/*/research/reproduce.spec.ts; do
  if [ -f "$tfile" ]; then
    cp "$tfile" "$DEPLOY_DIR/reproduce.spec.ts"
    echo "Found test code: $tfile"
    break
  fi
done
# Copy video script if available
for vsfile in qa-artifacts/*/video-script.spec.ts qa-artifacts/*/*/video-script.spec.ts qa-artifacts/before/*/video-script.spec.ts; do
  if [ -f "$vsfile" ]; then
    cp "$vsfile" "$DEPLOY_DIR/video-script.spec.ts"
    echo "Found video script: $vsfile"
    break
  fi
done
# Generate badge SVGs into the deploy dir
# Priority: research-log.json verdict (a11y-verified) > video review verdict (AI interpretation)
REPRO_COUNT=0 INCONC_COUNT=0 NOT_REPRO_COUNT=0 TOTAL_REPORTS=0
# Try the research log first (ground truth from a11y assertions)
RESEARCH_VERDICT=""
REPRO_METHOD=""
if [ -f "$DEPLOY_DIR/research-log.json" ]; then
  # 'or' maps a JSON null to the default instead of printing None
  RESEARCH_VERDICT=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print(d.get('verdict') or '')" "$DEPLOY_DIR/research-log.json" 2>/dev/null || true)
  REPRO_METHOD=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print(d.get('reproducedBy') or 'none')" "$DEPLOY_DIR/research-log.json" 2>/dev/null || true)
  echo "Research verdict (a11y-verified): ${RESEARCH_VERDICT:-none} (by: ${REPRO_METHOD:-none})"
  if [ -n "$RESEARCH_VERDICT" ]; then
    TOTAL_REPORTS=1
    case "$RESEARCH_VERDICT" in
      REPRODUCED) REPRO_COUNT=1 ;;
      NOT_REPRODUCIBLE) NOT_REPRO_COUNT=1 ;;
      INCONCLUSIVE) INCONC_COUNT=1 ;;
    esac
  fi
fi
# Check video review verdicts (always, not just as a fallback)
VIDEO_REPRODUCED=false
if [ -d video-reviews ]; then
  for rpt in video-reviews/*-qa-video-report.md; do
    [ -f "$rpt" ] || continue
    VERDICT_JSON=$(grep -oP '"verdict":\s*"[A-Z_]+' "$rpt" 2>/dev/null | tail -1 | grep -oP '[A-Z_]+$' || true)
    if [ -n "$VERDICT_JSON" ]; then
      echo "Video review verdict: $VERDICT_JSON ($(basename "$rpt"))"
      [ "$VERDICT_JSON" = "REPRODUCED" ] && VIDEO_REPRODUCED=true
      # Only count the video as a separate report when there is no research log
      if [ -z "$RESEARCH_VERDICT" ]; then
        TOTAL_REPORTS=$((TOTAL_REPORTS + 1))
        case "$VERDICT_JSON" in
          REPRODUCED) REPRO_COUNT=$((REPRO_COUNT + 1)) ;;
          NOT_REPRODUCIBLE) NOT_REPRO_COUNT=$((NOT_REPRO_COUNT + 1)) ;;
          INCONCLUSIVE) INCONC_COUNT=$((INCONC_COUNT + 1)) ;;
        esac
      fi
    elif [ -z "$RESEARCH_VERDICT" ]; then
      TOTAL_REPORTS=$((TOTAL_REPORTS + 1))
      # Fallback: grep the Summary section (older reports have no ## Verdict)
      SUMM=$(sed -n '/^## Summary/,/^## /p' "$rpt" 2>/dev/null | head -15)
      if echo "$SUMM" | grep -iq 'INCONCLUSIVE'; then
        INCONC_COUNT=$((INCONC_COUNT + 1))
      elif echo "$SUMM" | grep -iq 'not reproduced\|could not reproduce\|could not be confirmed\|unable to reproduce\|fails\? to reproduce\|fails\? to perform\|was NOT\|NOT visible\|not observed\|fail.* to demonstrate\|does not demonstrate\|steps were not performed\|never.*tested\|never.*accessed\|not.* confirmed'; then
        NOT_REPRO_COUNT=$((NOT_REPRO_COUNT + 1))
      elif echo "$SUMM" | grep -iq 'reproduc\|confirm'; then
        REPRO_COUNT=$((REPRO_COUNT + 1))
        VIDEO_REPRODUCED=true
      fi
    fi
  done
fi
# Upgrade the reproduction method to "both" when E2E and video agree
if [ "$REPRO_METHOD" = "e2e_test" ] && [ "$VIDEO_REPRODUCED" = "true" ]; then
  REPRO_METHOD="both"
  echo "Upgraded reproducedBy to 'both' (E2E + video review agree)"
elif [ -z "$RESEARCH_VERDICT" ] && [ "$VIDEO_REPRODUCED" = "true" ]; then
  REPRO_METHOD="video"
fi
# Reports that are neither reproduced nor not-reproducible (inconclusive or unparsed) count as failures
FAIL_COUNT=$((TOTAL_REPORTS - REPRO_COUNT - NOT_REPRO_COUNT))
[ "$FAIL_COUNT" -lt 0 ] && FAIL_COUNT=0
echo "DEBUG verdict: repro=${REPRO_COUNT} not_repro=${NOT_REPRO_COUNT} inconc=${INCONC_COUNT} fail=${FAIL_COUNT} total=${TOTAL_REPORTS}"
# Warn on a verdict mismatch between E2E and video review
if [ -n "$RESEARCH_VERDICT" ]; then
  VIDEO_VERDICT=$(grep -ohP '"verdict":\s*"[A-Z_]+' video-reviews/*-qa-video-report.md 2>/dev/null | tail -1 | grep -oP '[A-Z_]+$' || true)
  if [ -n "$VIDEO_VERDICT" ] && [ "$RESEARCH_VERDICT" != "$VIDEO_VERDICT" ]; then
    echo "⚠ Verdict mismatch: E2E=$RESEARCH_VERDICT vs Video=$VIDEO_VERDICT (E2E takes priority)"
  fi
fi
echo "Verdict: ${REPRO_COUNT}✓ ${NOT_REPRO_COUNT}✗ ${FAIL_COUNT}⚠ / ${TOTAL_REPORTS}"
# Badge text:
#   Single pass: "REPRODUCED" / "NOT REPRODUCIBLE" / "INCONCLUSIVE"
#   Multi pass:  "2✓ 1⚠ / 3" (zero counts omitted), colored by the best outcome
REPRO_RESULT="" REPRO_COLOR="#9f9f9f"
if [ "$TOTAL_REPORTS" -le 1 ]; then
  # Single report: simple label
  if [ "$REPRO_COUNT" -gt 0 ]; then
    REPRO_RESULT="REPRODUCED" REPRO_COLOR="#2196f3"
  elif [ "$NOT_REPRO_COUNT" -gt 0 ]; then
    REPRO_RESULT="NOT REPRODUCIBLE" REPRO_COLOR="#9f9f9f"
  elif [ "$FAIL_COUNT" -gt 0 ]; then
    REPRO_RESULT="INCONCLUSIVE" REPRO_COLOR="#9f9f9f"
  fi
else
  # Multi pass: show the breakdown "X✓ Y✗ Z⚠ / N"
  PARTS=""
  [ "$REPRO_COUNT" -gt 0 ] && PARTS="${REPRO_COUNT}✓"
  [ "$NOT_REPRO_COUNT" -gt 0 ] && PARTS="${PARTS:+${PARTS} }${NOT_REPRO_COUNT}✗"
  [ "$FAIL_COUNT" -gt 0 ] && PARTS="${PARTS:+${PARTS} }${FAIL_COUNT}⚠"
  REPRO_RESULT="${PARTS} / ${TOTAL_REPORTS}"
  # Color based on the best outcome
  if [ "$REPRO_COUNT" -gt 0 ]; then
    REPRO_COLOR="#2196f3"
  elif [ "$NOT_REPRO_COUNT" -gt 0 ]; then
    REPRO_COLOR="#9f9f9f"
  fi
fi
# Badge label: "#NUM QAMMDD" using today's UTC date (e.g. "QA0327" on March 27)
QA_DATE=$(date -u '+%m%d')
BADGE_LABEL="QA${QA_DATE}"
[ -n "${TARGET_NUM:-}" ] && BADGE_LABEL="#${TARGET_NUM} QA${QA_DATE}"
# For PRs, also derive fix quality from the reported risk level
FIX_RESULT="" FIX_COLOR="#4c1"
if [ "$TARGET_TYPE" != "issue" ]; then
  # Try the structured JSON "risk" field first
  ALL_RISKS=$(grep -ohP '"risk":\s*"[a-z]+' video-reviews/*.md 2>/dev/null | grep -oP '[a-z]+$' || true)
  if [ -n "$ALL_RISKS" ]; then
    # Use the worst risk across all reports
    if echo "$ALL_RISKS" | grep -q 'high'; then
      FIX_RESULT="MAJOR ISSUES" FIX_COLOR="#e05d44"
    elif echo "$ALL_RISKS" | grep -q 'medium'; then
      FIX_RESULT="MINOR ISSUES" FIX_COLOR="#dfb317"
    elif echo "$ALL_RISKS" | grep -q 'low'; then
      FIX_RESULT="APPROVED" FIX_COLOR="#4c1"
    fi
  else
    # Fallback: grep the "## Overall Risk" section
    RISK_TEXT=""
    if [ -d video-reviews ]; then
      RISK_TEXT=$(sed -n '/^## Overall Risk/,/^## /p' video-reviews/*.md 2>/dev/null | sed 's/\*//g' | head -20 || true)
    fi
    RISK_FIRST=$(echo "$RISK_TEXT" | grep -oiP '^\s*(high|medium|moderate|low|minimal|critical)' | head -1 | tr '[:upper:]' '[:lower:]' || true)
    if [ -n "$RISK_FIRST" ]; then
      case "$RISK_FIRST" in
        *low*|*minimal*) FIX_RESULT="APPROVED" FIX_COLOR="#4c1" ;;
        *medium*|*moderate*) FIX_RESULT="MINOR ISSUES" FIX_COLOR="#dfb317" ;;
        *high*|*critical*) FIX_RESULT="MAJOR ISSUES" FIX_COLOR="#e05d44" ;;
      esac
    elif echo "$RISK_TEXT" | grep -iq 'no.*risk\|approved\|looks good'; then
      FIX_RESULT="APPROVED" FIX_COLOR="#4c1"
    fi
  fi
fi
# Always use vertical box badge
/tmp/gen-badge-box.sh "$DEPLOY_DIR/badge.svg" "$BADGE_LABEL" \
  "$REPRO_COUNT" "$NOT_REPRO_COUNT" "$FAIL_COUNT" "$TOTAL_REPORTS" \
  "$FIX_RESULT" "$FIX_COLOR" "$REPRO_METHOD"
BADGE_STATUS="${REPRO_RESULT:-UNKNOWN}${FIX_RESULT:+ | Fix: ${FIX_RESULT}}"
echo "badge_status=${BADGE_STATUS:-FINISHED}" >> "$GITHUB_OUTPUT"
# Remove files exceeding the Cloudflare Pages 25MB per-file limit to prevent silent deploy failures
MAX_SIZE=$((25 * 1024 * 1024))
find "$DEPLOY_DIR" -type f -size +${MAX_SIZE}c | while read -r big_file; do
  SIZE_MB=$(( $(stat -c%s "$big_file") / 1024 / 1024 ))
  echo "Removing oversized file: $(basename "$big_file") (${SIZE_MB}MB > 25MB limit)"
  rm "$big_file"
done
# Sanitize the branch name: alphanumerics and single hyphens, at most 28 chars.
# Truncate before trimming so the cut cannot leave a trailing hyphen.
BRANCH=$(echo "$RAW_BRANCH" | sed 's/[^a-zA-Z0-9-]/-/g' | sed 's/--*/-/g' | cut -c1-28 | sed 's/^-//;s/-$//')
DEPLOY_OUTPUT=$(wrangler pages deploy "$DEPLOY_DIR" \
  --project-name="comfy-qa" \
  --branch="$BRANCH" 2>&1) || true
echo "$DEPLOY_OUTPUT" | tail -5
URL=$(echo "$DEPLOY_OUTPUT" | grep -oE 'https://[a-zA-Z0-9.-]+\.pages\.dev\S*' | head -1 || true)
FALLBACK_URL="https://${BRANCH}.comfy-qa.pages.dev"
echo "url=${URL:-$FALLBACK_URL}" >> "$GITHUB_OUTPUT"
echo "Deployed to: ${URL:-$FALLBACK_URL}"
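The badge logic in the shell step above gives the research log's a11y-verified verdict priority and only counts video-review verdicts when no research log exists; everything that is neither reproduced nor not-reproducible ends up in the failure bucket. A minimal TypeScript sketch of that tally and of the resulting badge text, assuming the verdicts have already been parsed into strings (`null` for an unparseable report). The `tallyVerdicts` and `badgeText` names are hypothetical and not part of the workflow.

```typescript
// Hypothetical port of the shell step's counting rules.
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

interface Tally {
  repro: number
  notRepro: number
  fail: number // inconclusive or unparseable reports
  total: number
}

function tallyVerdicts(research: Verdict | null, videos: (Verdict | null)[]): Tally {
  // The research log is ground truth: it is the single counted report and videos are ignored
  const counted: (Verdict | null)[] = research ? [research] : videos
  const repro = counted.filter((v) => v === 'REPRODUCED').length
  const notRepro = counted.filter((v) => v === 'NOT_REPRODUCIBLE').length
  const total = counted.length
  return { repro, notRepro, fail: Math.max(0, total - repro - notRepro), total }
}

function badgeText(t: Tally): string {
  if (t.total <= 1) {
    if (t.repro > 0) return 'REPRODUCED'
    if (t.notRepro > 0) return 'NOT REPRODUCIBLE'
    return t.fail > 0 ? 'INCONCLUSIVE' : ''
  }
  // Multi-report breakdown; zero counts are omitted, as in the shell version
  const parts: string[] = []
  if (t.repro > 0) parts.push(`${t.repro}✓`)
  if (t.notRepro > 0) parts.push(`${t.notRepro}✗`)
  if (t.fail > 0) parts.push(`${t.fail}⚠`)
  return `${parts.join(' ')} / ${t.total}`
}

console.log(badgeText(tallyVerdicts(null, ['REPRODUCED', 'REPRODUCED', 'INCONCLUSIVE'])))
// prints "2✓ 1⚠ / 3"
```

Expressing the rules as pure functions makes the priority order easy to unit-test without running the whole deploy step.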

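For PRs, the fix-quality half of the badge takes the worst structured `risk` value found across all video reviews. The same reduction as a hedged TypeScript sketch; the `worstRisk` and `fixBadge` names are hypothetical, while the labels and colors are copied from the shell step. Unlike the step's substring `grep`, this version only accepts the exact values `low`, `medium` and `high`.

```typescript
// Hypothetical helpers mirroring the "worst risk wins" rule.
type Risk = 'low' | 'medium' | 'high'

const SEVERITY: Record<Risk, number> = { low: 0, medium: 1, high: 2 }

// Type guard: only the three levels the badge understands are counted
const isRisk = (r: string): r is Risk => r === 'low' || r === 'medium' || r === 'high'

function worstRisk(risks: string[]): Risk | null {
  let worst: Risk | null = null
  for (const r of risks) {
    if (isRisk(r) && (worst === null || SEVERITY[r] > SEVERITY[worst])) worst = r
  }
  return worst
}

function fixBadge(risk: Risk | null): { result: string; color: string } {
  switch (risk) {
    case 'high':
      return { result: 'MAJOR ISSUES', color: '#e05d44' }
    case 'medium':
      return { result: 'MINOR ISSUES', color: '#dfb317' }
    case 'low':
      return { result: 'APPROVED', color: '#4c1' }
    default:
      return { result: '', color: '#4c1' } // no structured risk found
  }
}

console.log(fixBadge(worstRisk(['low', 'high', 'medium'])).result) // MAJOR ISSUES
```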

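The deploy step turns the raw branch name into a Cloudflare Pages branch alias by replacing unsupported characters with hyphens, collapsing hyphen runs, capping the length at 28 characters and trimming edge hyphens. A small TypeScript sketch of that sanitization, assuming ASCII-only input after the first replacement; `branchAlias` is a hypothetical name. Truncating before stripping hyphens guarantees the 28-character cut cannot leave a trailing hyphen.

```typescript
// Hypothetical port of the BRANCH=... sed/cut pipeline.
function branchAlias(rawBranch: string, maxLen = 28): string {
  return rawBranch
    .replace(/[^a-zA-Z0-9-]/g, '-') // any other character becomes a hyphen
    .replace(/-+/g, '-') // collapse runs of hyphens
    .slice(0, maxLen) // length cap used by the workflow
    .replace(/^-|-$/g, '') // no leading/trailing hyphen, even after truncation
}

console.log(branchAlias('feat/show-title-card-api')) // feat-show-title-card-api
```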
@@ -0,0 +1,208 @@
#!/usr/bin/env tsx
/**
 * Generates a Playwright regression test (.spec.ts) from a QA report + PR diff.
 * Uses Gemini to produce a test that asserts the UI/UX behavior verified during QA.
 *
 * Usage:
 *   pnpm exec tsx scripts/qa-generate-test.ts \
 *     --qa-report <path>   QA video review report (markdown)
 *     --pr-diff <path>     PR diff file
 *     --output <path>      Output .spec.ts file path
 *     --model <name>       Gemini model (default: gemini-3-flash-preview)
 */
import { readFile, writeFile } from 'node:fs/promises'
import { basename, resolve } from 'node:path'
import { GoogleGenerativeAI } from '@google/generative-ai'

interface CliOptions {
  qaReport: string
  prDiff: string
  output: string
  model: string
}

const DEFAULTS: CliOptions = {
  qaReport: '',
  prDiff: '',
  output: '',
  model: 'gemini-3-flash-preview'
}
// ── Fixture API reference for the prompt ────────────────────────────
const FIXTURE_API = `
## ComfyUI Playwright Test Fixture API
Import pattern:
\`\`\`typescript
import { expect } from '@playwright/test'
import { comfyPageFixture as test } from '../fixtures/ComfyPage'
\`\`\`
### Available helpers on \`comfyPage\`:
- \`comfyPage.page\` — raw Playwright Page
- \`comfyPage.menu.topbar\` — Topbar helper:
  - \`.getTabNames(): Promise<string[]>\` — get all open tab names
  - \`.getActiveTabName(): Promise<string>\` — get active tab name
  - \`.saveWorkflow(name)\` — Save via File > Save dialog
  - \`.saveWorkflowAs(name)\` — Save via File > Save As dialog
  - \`.exportWorkflow(name)\` — Export via File > Export dialog
  - \`.triggerTopbarCommand(path: string[])\` — e.g. ['File', 'Save As']
  - \`.getWorkflowTab(name)\` — get a tab locator by name
  - \`.closeWorkflowTab(name)\` — close a tab
  - \`.openTopbarMenu()\` — open the hamburger menu
  - \`.openSubmenu(label)\` — hover to open a submenu
- \`comfyPage.menu.workflowsTab\` — Workflows sidebar:
  - \`.open()\` / \`.close()\` — toggle sidebar
  - \`.getTopLevelSavedWorkflowNames()\` — list saved workflows
  - \`.getPersistedItem(name)\` — get a workflow item locator
- \`comfyPage.workflow\` — WorkflowHelper:
  - \`.loadWorkflow(name)\` — load from browser_tests/assets/{name}.json
  - \`.setupWorkflowsDirectory(structure)\` — setup test directory
  - \`.deleteWorkflow(name)\` — delete a workflow
  - \`.isCurrentWorkflowModified(): Promise<boolean>\` — check dirty state
  - \`.getUndoQueueSize()\` / \`.getRedoQueueSize()\`
- \`comfyPage.settings.setSetting(key, value)\` — change settings
- \`comfyPage.keyboard\` — KeyboardHelper:
  - \`.undo()\` / \`.redo()\` / \`.bypass()\`
- \`comfyPage.nodeOps\` — NodeOperationsHelper
- \`comfyPage.canvas\` — CanvasHelper
- \`comfyPage.contextMenu\` — ContextMenu
- \`comfyPage.toast\` — ToastHelper
- \`comfyPage.confirmDialog\` — confirmation dialog
- \`comfyPage.nextFrame()\` — wait for Vue re-render
### Test patterns:
- Use \`test.describe('Name', { tag: '@ui' }, () => { ... })\` for UI tests
- Use \`test.beforeEach\` to set up common state (settings, workflow dir)
- Use \`expect(locator).toHaveScreenshot('name.png')\` for visual assertions
- Use \`expect(locator).toBeVisible()\` / \`.toHaveText()\` for behavioral assertions
- Use \`comfyPage.workflow.setupWorkflowsDirectory({})\` to ensure clean state
`
// ── Prompt builder ──────────────────────────────────────────────────
function buildPrompt(qaReport: string, prDiff: string): string {
  return `You are a Playwright test generator for the ComfyUI frontend.
Your task: Generate a single .spec.ts regression test file that asserts the UI/UX behavior
described in the QA report below. The test must:
1. Use the ComfyUI Playwright fixture API (documented below)
2. Test UI/UX behavior ONLY — element visibility, tab names, dialog states, workflow states
3. NOT test code implementation details
4. Be concise — only test the behavior that the PR changed
5. Follow existing test conventions (see API reference)
${FIXTURE_API}
## QA Video Review Report
${qaReport}
## PR Diff (for context on what changed)
${prDiff.slice(0, 8000)}
## Output Requirements
- Output ONLY the .spec.ts file content — no markdown fences, no explanations
- Start with imports, end with closing brace
- Use descriptive test names that explain the expected behavior
- Add screenshot assertions where visual verification matters
- Keep it focused: 2-5 test cases covering the core behavioral change
- Use \`test.beforeEach\` for common setup (settings, workflow directory)
- Tag the describe block with \`{ tag: '@ui' }\` or \`{ tag: '@workflow' }\` as appropriate
`
}
// ── Gemini call ─────────────────────────────────────────────────────
async function generateTest(
  qaReport: string,
  prDiff: string,
  model: string
): Promise<string> {
  const apiKey = process.env.GEMINI_API_KEY
  if (!apiKey) throw new Error('GEMINI_API_KEY env var required')
  const genAI = new GoogleGenerativeAI(apiKey)
  const genModel = genAI.getGenerativeModel({ model })
  const prompt = buildPrompt(qaReport, prDiff)
  console.warn(`Sending prompt to ${model} (${prompt.length} chars)...`)
  const result = await genModel.generateContent({
    contents: [{ role: 'user', parts: [{ text: prompt }] }],
    generationConfig: {
      temperature: 0.2,
      maxOutputTokens: 8192
    }
  })
  const text = result.response.text()
  // Strip markdown fences if the model wraps its output anyway. Trim first so
  // surrounding whitespace does not stop the anchored patterns from matching.
  return text
    .trim()
    .replace(/^```(?:typescript|ts)?\s*\n/, '')
    .replace(/\n?```$/, '')
    .trim()
}
// ── CLI ─────────────────────────────────────────────────────────────
function parseArgs(): CliOptions {
  const args = process.argv.slice(2)
  const opts = { ...DEFAULTS }
  // Return the value that follows a flag; exit instead of storing undefined
  const valueFor = (flag: string, value: string | undefined): string => {
    if (value === undefined || value.startsWith('--')) {
      console.error(`Missing value for ${flag}. Run with --help for usage.`)
      process.exit(1)
    }
    return value
  }
  for (let i = 0; i < args.length; i++) {
    const flag = args[i]
    switch (flag) {
      case '--qa-report':
        opts.qaReport = valueFor(flag, args[++i])
        break
      case '--pr-diff':
        opts.prDiff = valueFor(flag, args[++i])
        break
      case '--output':
        opts.output = valueFor(flag, args[++i])
        break
      case '--model':
        opts.model = valueFor(flag, args[++i])
        break
      case '--help':
        console.warn(`Usage:
  pnpm exec tsx scripts/qa-generate-test.ts [options]
Options:
  --qa-report <path>   QA video review report (markdown) [required]
  --pr-diff <path>     PR diff file [required]
  --output <path>      Output .spec.ts path [required]
  --model <name>       Gemini model (default: gemini-3-flash-preview)`)
        process.exit(0)
      default:
        console.warn(`Ignoring unknown argument: ${flag}`)
    }
  }
  if (!opts.qaReport || !opts.prDiff || !opts.output) {
    console.error('Missing required args. Run with --help for usage.')
    process.exit(1)
  }
  return opts
}
async function main() {
  const opts = parseArgs()
  const qaReport = await readFile(resolve(opts.qaReport), 'utf-8')
  const prDiff = await readFile(resolve(opts.prDiff), 'utf-8')
  console.warn(
    `QA report: ${basename(opts.qaReport)} (${qaReport.length} chars)`
  )
  console.warn(`PR diff: ${basename(opts.prDiff)} (${prDiff.length} chars)`)
  const testCode = await generateTest(qaReport, prDiff, opts.model)
  const outputPath = resolve(opts.output)
  await writeFile(outputPath, testCode + '\n')
  console.warn(`Generated test: ${outputPath} (${testCode.length} chars)`)
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})
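The hand-rolled `parseArgs()` loop in this script could alternatively lean on Node's built-in `util.parseArgs`, which rejects unknown flags in strict mode. A sketch of that swapped-in approach, assuming Node 18.3 or newer; `parseCli` is a hypothetical name, `CliOptions` is redeclared so the block is self-contained, and `--help` handling is left out for brevity.

```typescript
import { parseArgs } from 'node:util'

// Same shape as the script's CliOptions, redeclared for a standalone sketch
interface CliOptions {
  qaReport: string
  prDiff: string
  output: string
  model: string
}

function parseCli(argv: string[]): CliOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      'qa-report': { type: 'string' },
      'pr-diff': { type: 'string' },
      output: { type: 'string' },
      model: { type: 'string' }
    },
    strict: true // throw on unknown flags instead of silently ignoring them
  })
  const qaReport = values['qa-report']
  const prDiff = values['pr-diff']
  const output = values.output
  if (!qaReport || !prDiff || !output) {
    throw new Error('Missing required args: --qa-report, --pr-diff and --output')
  }
  return { qaReport, prDiff, output, model: values.model ?? 'gemini-3-flash-preview' }
}

console.log(parseCli(['--qa-report', 'report.md', '--pr-diff', 'pr.diff', '--output', 'out.spec.ts']))
```

The trade-off is a dependency on a newer Node runtime in exchange for fewer hand-written edge cases.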

File diff suppressed because it is too large

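On the report page, `copyBadge()` turns the current page URL into badge markdown by dropping the last path segment with a regex. A pure-function sketch of that derivation; `badgeMarkdown` is a hypothetical name, and it swaps the regex for the standard `URL` constructor, which resolves `.` against the page URL and so also drops any query string or fragment.

```typescript
// Hypothetical pure version of the page's copyBadge() URL logic.
function badgeMarkdown(pageUrl: string): string {
  // Resolving "." yields the directory URL, without query or fragment
  const base = new URL('.', pageUrl).href
  return `[![QA Badge](${base}badge.svg)](${base})`
}

console.log(badgeMarkdown('https://my-branch.comfy-qa.pages.dev/index.html'))
// [![QA Badge](https://my-branch.comfy-qa.pages.dev/badge.svg)](https://my-branch.comfy-qa.pages.dev/)
```

The hostname in the usage line is illustrative only.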

@@ -0,0 +1,164 @@
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><title>QA Session Recordings</title>
<link rel=preconnect href=https://fonts.googleapis.com><link rel=preconnect href=https://fonts.gstatic.com crossorigin><link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel=stylesheet>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<style>
:root{--bg:oklch(97% 0.01 265);--surface:oklch(100% 0 0);--surface-up:oklch(94% 0.01 265);--fg:oklch(15% 0.02 265);--fg-muted:oklch(40% 0.01 265);--fg-dim:oklch(55% 0.01 265);--primary:oklch(50% 0.21 265);--primary-up:oklch(45% 0.21 265);--primary-glow:oklch(55% 0.15 265);--ok:oklch(45% 0.18 155);--err:oklch(50% 0.22 25);--border:oklch(85% 0.01 265);--border-faint:oklch(90% 0.01 265);--r:0.75rem;--r-lg:1rem;--ease-out:cubic-bezier(0.22,1,0.36,1);--dur-base:250ms;--dur-slow:500ms;--font:'Inter',system-ui,sans-serif;--font-mono:'JetBrains Mono',monospace}
@media(prefers-color-scheme:dark){:root{--bg:oklch(8% 0.02 265);--surface:oklch(12% 0.02 265);--surface-up:oklch(16% 0.02 265);--fg:oklch(96% 0.01 95);--fg-muted:oklch(65% 0.01 265);--fg-dim:oklch(45% 0.01 265);--primary:oklch(62% 0.21 265);--primary-up:oklch(68% 0.21 265);--primary-glow:oklch(62% 0.15 265);--ok:oklch(62% 0.18 155);--err:oklch(62% 0.22 25);--border:oklch(22% 0.02 265);--border-faint:oklch(15% 0.01 265)}}
*{margin:0;padding:0;box-sizing:border-box}
body{background:var(--bg);color:var(--fg);font-family:var(--font);min-height:100vh;padding:clamp(1.5rem,4vw,3rem) clamp(1rem,3vw,2rem);position:relative}
@media(prefers-color-scheme:dark){body::after{content:'';position:fixed;inset:0;pointer-events:none;opacity:.03;background:url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='.85' numOctaves='4' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23n)'/%3E%3C/svg%3E")}}
.container{max-width:1200px;margin:0 auto}
header{display:flex;align-items:center;gap:1rem;margin-bottom:clamp(1.5rem,4vw,3rem);padding-bottom:1.25rem;border-bottom:1px solid var(--border)}
.header-icon{width:36px;height:36px;display:grid;place-items:center;background:linear-gradient(135deg,oklch(100% 0 0/.06),oklch(100% 0 0/.02));backdrop-filter:blur(12px);border:1px solid oklch(100% 0 0/.1);border-radius:var(--r);flex-shrink:0}
.header-icon svg{color:var(--primary)}
h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em;background:linear-gradient(135deg,var(--fg),var(--fg-muted));-webkit-background-clip:text;-webkit-text-fill-color:transparent;background-clip:text}
.meta{color:var(--fg-dim);font-size:.8125rem;margin-top:.15rem;letter-spacing:.01em}
.grid{display:grid;grid-template-columns:repeat(auto-fill,minmax(min(480px,100%),1fr));gap:1.5rem}
.card{background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg);overflow:hidden;transition:border-color var(--dur-base) var(--ease-out),box-shadow var(--dur-base) var(--ease-out),transform var(--dur-base) var(--ease-out)}
.card:hover{border-color:var(--primary);box-shadow:0 4px 16px oklch(0% 0 0/.1);transform:translateY(-2px)}
.video-wrap{position:relative;background:var(--surface);border-bottom:1px solid var(--border-faint)}
.video-wrap video{width:100%;display:block;aspect-ratio:16/9;object-fit:contain}
.card-body{padding:.75rem 1rem;display:flex;align-items:center;justify-content:space-between}
.platform{display:flex;align-items:center;gap:.5rem;font-weight:600;font-size:.9375rem;letter-spacing:-.01em}
.icon{font-size:1.125rem}
.links{display:flex;gap:.5rem}
.dl{color:var(--fg-muted);text-decoration:none;font-size:.75rem;font-weight:500;display:inline-flex;align-items:center;gap:.3rem;padding:.25rem .6rem;border-radius:9999px;border:1px solid var(--border);background:oklch(100% 0 0/.03);transition:all var(--dur-base) var(--ease-out)}
.dl:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.08)}
.badge{font-size:.6875rem;font-weight:600;padding:.2rem .625rem;border-radius:9999px;text-transform:uppercase;letter-spacing:.05em}
.card-header{padding:.75rem 1rem;display:flex;align-items:center;justify-content:space-between;border-bottom:1px solid var(--border-faint)}
.comparison{display:grid;grid-template-columns:1fr 1fr;gap:0}
.comp-panel{border-right:1px solid var(--border-faint)}
.comp-panel:last-child{border-right:none}
.comp-label{padding:.4rem .75rem;font-size:.7rem;font-weight:600;text-transform:uppercase;letter-spacing:.05em;color:var(--fg-muted);background:var(--surface);display:flex;align-items:center;gap:.4rem}
.comp-tag{font-size:.6rem;padding:.1rem .4rem;border-radius:9999px;font-weight:600}
.comp-panel:first-child .comp-tag{background:oklch(65% 0.01 265/.15);color:var(--fg-muted);border:1px solid var(--border)}
.comp-panel:last-child .comp-tag{background:oklch(62% 0.18 155/.15);color:var(--ok);border:1px solid oklch(62% 0.18 155/.25)}
.comp-dl{padding:.4rem .75rem;display:flex;justify-content:center}
.report{border-top:1px solid var(--border-faint);padding:.75rem 1rem;font-size:.8125rem}
.report summary{cursor:pointer;color:var(--fg-muted);font-weight:500;display:flex;align-items:center;gap:.4rem;user-select:none;transition:color var(--dur-base) var(--ease-out)}
.report summary:hover{color:var(--fg)}
.report summary svg{flex-shrink:0;opacity:.5}
.report[open] summary{margin-bottom:.75rem;padding-bottom:.5rem;border-bottom:1px solid var(--border-faint)}
.report-body{line-height:1.7;color:color-mix(in oklch,var(--fg) 80%,var(--bg));overflow-x:auto}
.report-body h1,.report-body h2{margin:1.25rem 0 .5rem;color:var(--fg);font-size:1rem;font-weight:600;letter-spacing:-.02em;border-bottom:1px solid var(--border-faint);padding-bottom:.4rem}
.report-body h3{margin:.75rem 0 .4rem;color:var(--fg);font-size:.875rem;font-weight:600}
.report-body p{margin:.4rem 0}
.report-body ul,.report-body ol{margin:.4rem 0 .4rem 1.5rem}
.report-body li{margin:.25rem 0}
.report-body code{background:var(--surface-up);padding:.125rem .375rem;border-radius:.25rem;font-size:.7rem;font-family:var(--font-mono);border:1px solid var(--border-faint)}
.report-body h3+p>code:first-child{background:oklch(62% 0.22 25/.15);color:var(--err);border-color:oklch(62% 0.22 25/.25)}
.report-body h3+p>code:nth-child(2){background:oklch(62% 0.21 265/.15);color:var(--primary-up);border-color:oklch(62% 0.21 265/.25)}
.report-body h3+p>code:nth-child(3){background:oklch(65% 0.01 265/.15);color:var(--fg-muted);border-color:var(--border)}
.report-body table{width:100%;border-collapse:collapse;margin:.75rem 0;font-size:.75rem;border:1px solid var(--border);border-radius:var(--r);overflow:hidden}
.report-body th,.report-body td{border:1px solid var(--border-faint);padding:.5rem .75rem;text-align:left;vertical-align:top;word-wrap:break-word}
.report-body th{background:var(--surface-up);color:var(--fg);font-weight:600;font-size:.6875rem;text-transform:uppercase;letter-spacing:.05em;position:sticky;top:0;white-space:nowrap}
.report-body tr:nth-child(even){background:color-mix(in oklch,var(--surface) 50%,transparent)}
.report-body tr:hover{background:color-mix(in oklch,var(--surface-up) 50%,transparent)}
.report-body strong{color:var(--fg)}
.report-body hr{border:none;border-top:1px solid var(--border-faint);margin:1rem 0}
@keyframes fade-up{from{opacity:0;transform:translateY(16px)}to{opacity:1;transform:translateY(0)}}
.reveal{animation:fade-up var(--dur-slow) var(--ease-out) both;animation-delay:calc(var(--i,0) * 120ms)}
@media(prefers-reduced-motion:reduce){.reveal{animation:none}}
@media(max-width:480px){.grid{grid-template-columns:1fr}.card-body{flex-wrap:wrap;gap:.5rem}}
.sha{color:var(--primary);text-decoration:none;font-family:var(--font-mono);font-size:.75rem;font-weight:500;padding:.1rem .4rem;border-radius:.25rem;background:oklch(62% 0.21 265/.08);border:1px solid oklch(62% 0.21 265/.15);transition:all var(--dur-base) var(--ease-out)}
.sha:hover{background:oklch(62% 0.21 265/.15);border-color:var(--primary)}
.badge-bar{display:flex;align-items:center;gap:.5rem;margin-bottom:1rem}
.badge-img{height:20px;display:block}
.copy-badge{background:oklch(100% 0 0/.06);border:1px solid var(--border);color:var(--fg-muted);padding:.3rem .4rem;border-radius:var(--r);cursor:pointer;display:inline-flex;align-items:center;transition:all var(--dur-base) var(--ease-out)}
.copy-badge:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.1)}
.copy-badge.copied{color:var(--ok);border-color:var(--ok)}
.vseek{width:100%;padding:0 .75rem;background:var(--surface);border-top:1px solid var(--border-faint);position:relative;height:24px;display:flex;align-items:center}
.vseek input[type=range]{-webkit-appearance:none;appearance:none;width:100%;height:4px;background:var(--border);border-radius:2px;outline:none;cursor:pointer;position:relative;z-index:2}
.vseek input[type=range]::-webkit-slider-thumb{-webkit-appearance:none;width:12px;height:12px;border-radius:50%;background:var(--primary);cursor:pointer;border:2px solid var(--bg);box-shadow:0 0 4px oklch(0% 0 0/.3)}
.vseek input[type=range]::-moz-range-thumb{width:12px;height:12px;border-radius:50%;background:var(--primary);cursor:pointer;border:2px solid var(--bg)}
.vseek .vbuf{position:absolute;left:.75rem;right:.75rem;height:4px;border-radius:2px;pointer-events:none;top:50%;transform:translateY(-50%)}
.vseek .vbuf-bar{height:100%;background:oklch(62% 0.21 265/.25);border-radius:2px;transition:width 200ms linear}
.vctrl{display:flex;align-items:center;gap:.375rem;padding:.5rem .75rem;background:var(--surface);border-top:1px solid var(--border-faint);flex-wrap:wrap}
.vctrl button{background:oklch(100% 0 0/.06);border:1px solid var(--border);color:var(--fg-muted);font-size:.6875rem;font-weight:600;font-family:var(--font-mono);padding:.25rem .5rem;border-radius:.25rem;cursor:pointer;transition:all var(--dur-base) var(--ease-out);white-space:nowrap}
.vctrl button:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.1)}
.vctrl button.active{color:var(--primary);border-color:var(--primary);background:oklch(62% 0.21 265/.15)}
.vctrl .vtime{font-family:var(--font-mono);font-size:.6875rem;color:var(--fg-dim);min-width:10ch;text-align:center}
.vctrl .vsep{width:1px;height:1rem;background:var(--border);flex-shrink:0}
.vctrl .vhint{font-size:.6rem;color:var(--fg-dim);margin-left:auto}
.purpose{background:linear-gradient(135deg,oklch(100% 0 0/.04),oklch(100% 0 0/.02));border:1px solid oklch(100% 0 0/.08);border-radius:var(--r-lg);padding:1rem 1.25rem;margin-bottom:1.5rem;font-size:.85rem;line-height:1.7;color:color-mix(in oklch,var(--fg) 80%,var(--bg))}
.purpose strong{color:var(--fg);font-weight:600}
.purpose .purpose-label{font-size:.7rem;font-weight:600;text-transform:uppercase;letter-spacing:.05em;color:var(--fg-muted);margin-bottom:.4rem}
.purpose .purpose-reqs{margin-top:.75rem;padding-top:.75rem;border-top:1px solid oklch(100% 0 0/.06);font-size:.8rem;color:color-mix(in oklch,var(--fg) 68%,var(--bg));line-height:1.8}
</style></head><body><div class=container>
<header><div class=header-icon><svg width=20 height=20 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2 stroke-linecap=round stroke-linejoin=round><polygon points="23 7 16 12 23 17 23 7"/><rect x=1 y=5 width=15 height=14 rx=2 ry=2/></svg></div><div><h1>QA Session Recordings</h1><div class=meta>ComfyUI Frontend &middot; Automated QA{{COMMIT_HTML}}{{RUN_LINK}}{{TIMING_HTML}}</div>{{BADGE_HTML}}</div></header>
{{PURPOSE_HTML}}<div class=grid>{{CARDS}}</div>
<div id=research-section style="margin-top:2rem"></div>
</div><script>
// Load research-log.json, reproduce.spec.ts and video-script.spec.ts if they were deployed
(async()=>{
  const sec=document.getElementById('research-section');
  // Escape untrusted text (agent output, source code) before inserting it as HTML
  const esc=s=>String(s).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;');
  // Collapsible panel showing escaped preformatted text
  const panel=(title,text,style)=>`<details${style?` style="${style}"`:''}><summary style="cursor:pointer;font-weight:600;font-size:1rem;padding:.75rem 1rem;background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg)">${title}</summary><div style="padding:1rem;background:var(--surface);border:1px solid var(--border);border-top:0;border-radius:0 0 var(--r-lg) var(--r-lg);overflow:auto;max-height:600px"><pre style="font-family:var(--font-mono);font-size:.8rem;line-height:1.6;white-space:pre-wrap">${esc(text)}</pre></div></details>`;
  try{
    const [logRes,testRes,vsRes]=await Promise.allSettled([fetch('research-log.json'),fetch('reproduce.spec.ts'),fetch('video-script.spec.ts')]);
    let html='';
    if(logRes.status==='fulfilled'&&logRes.value.ok){
      const log=await logRes.value.json();
      // Show a verdict banner with the agent's summary and evidence for non-reproduced results
      if(log.verdict&&log.verdict!=='REPRODUCED'){
        const colors={NOT_REPRODUCIBLE:{bg:'oklch(25% 0.08 25)',border:'oklch(40% 0.15 25)',icon:'✗'},INCONCLUSIVE:{bg:'oklch(25% 0.06 80)',border:'oklch(40% 0.12 80)',icon:'⚠'}};
        const c=colors[log.verdict]||colors.INCONCLUSIVE;
        const evidence=log.evidence?(typeof log.evidence==='string'?log.evidence:JSON.stringify(log.evidence,null,2)):'';
        // The banner background is dark in both themes, so the text color is pinned light
        html+=`<div style="margin-bottom:1.5rem;padding:1.25rem;background:${c.bg};border:1px solid ${c.border};border-radius:var(--r-lg);color:oklch(96% 0.01 95)"><div style="font-size:1.25rem;font-weight:700;margin-bottom:.5rem">${c.icon} ${esc(String(log.verdict).replace(/_/g,' '))}</div><div style="font-size:.9rem;line-height:1.6;opacity:.9">${esc(log.summary||'No details available.')}</div>${evidence?`<div style="margin-top:.75rem;padding:.75rem;background:oklch(0% 0 0/.2);border-radius:var(--r);font-family:var(--font-mono);font-size:.8rem;white-space:pre-wrap;max-height:200px;overflow:auto">${esc(evidence)}</div>`:''}</div>`;
      }
      html+=panel(`Research Log &mdash; ${esc(log.verdict||'?')} (${(log.log||[]).length||'?'} tool calls, ${((log.elapsedMs||0)/1000).toFixed(1)}s)`,JSON.stringify(log,null,2),'margin-bottom:1.5rem');
    }
    if(testRes.status==='fulfilled'&&testRes.value.ok){
      html+=panel('E2E Test Code (reproduce.spec.ts)',await testRes.value.text());
    }
    if(vsRes.status==='fulfilled'&&vsRes.value.ok){
      html+=panel('Video Script (video-script.spec.ts)',await vsRes.value.text(),'margin-top:1rem');
    }
    if(html)sec.innerHTML=html;
  }catch(e){console.warn('research load failed',e)}
})();
</script><script>
function copyBadge(){const u=location.href.replace(/\/[^/]*$/,'/');const b=u+'badge.svg';const md='[![QA Badge]('+b+')]('+u+')';navigator.clipboard.writeText(md).then(()=>{const btn=document.querySelector('.copy-badge');btn.classList.add('copied');btn.innerHTML='<svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><polyline points="20 6 9 17 4 12"/></svg>';setTimeout(()=>{btn.classList.remove('copied');btn.innerHTML='<svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><rect x=9 y=9 width=13 height=13 rx=2/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>'},2000)})}
document.querySelectorAll('[data-md]').forEach(el=>{const t=el.textContent;el.removeAttribute('data-md');el.innerHTML=marked.parse(t)});
const FPS=30,FT=1/FPS,SPEEDS=[0.1,0.25,0.5,1,1.5,2];
document.querySelectorAll('.video-wrap video').forEach(v=>{
v.playbackRate=1;
const c=document.createElement('div');c.className='vctrl';
const btn=(label,fn)=>{const b=document.createElement('button');b.textContent=label;b.onclick=fn;c.appendChild(b);return b};
const sep=()=>{const s=document.createElement('div');s.className='vsep';c.appendChild(s)};
const time=document.createElement('span');time.className='vtime';time.textContent='0:00.000';
btn('\u23EE',()=>{v.currentTime=0});
btn('\u25C0\u25C0',()=>{v.currentTime=Math.max(0,v.currentTime-FT*10)});
btn('\u25C0',()=>{v.pause();v.currentTime=Math.max(0,v.currentTime-FT)});
const playBtn=btn('\u25B6',()=>{v.paused?v.play():v.pause()});
btn('\u25B6\u25B6',()=>{v.pause();v.currentTime+=FT});
btn('\u25B6\u25B6\u25B6',()=>{v.currentTime+=FT*10});
sep();
const spdBtns=SPEEDS.map(s=>{const b=btn(s+'x',()=>{v.playbackRate=s;spdBtns.forEach(x=>x.classList.remove('active'));b.classList.add('active')});if(s===1)b.classList.add('active');return b});
sep();c.appendChild(time);
const hint=document.createElement('span');hint.className='vhint';hint.textContent='\u2190\u2192 frame \u2022 space play';c.appendChild(hint);
// Custom seekbar — works even without server range request support
const seekWrap=document.createElement('div');seekWrap.className='vseek';
const seekBar=document.createElement('input');seekBar.type='range';seekBar.min=0;seekBar.max=1000;seekBar.value=0;seekBar.step=1;
const bufWrap=document.createElement('div');bufWrap.className='vbuf';
const bufBar=document.createElement('div');bufBar.className='vbuf-bar';bufBar.style.width='0%';
bufWrap.appendChild(bufBar);seekWrap.appendChild(bufWrap);seekWrap.appendChild(seekBar);
let seeking=false;
seekBar.oninput=()=>{seeking=true;if(v.duration){v.currentTime=v.duration*(seekBar.value/1000)}};
seekBar.onchange=()=>{seeking=false};
v.closest('.video-wrap').after(seekWrap);
seekWrap.after(c);
v.ontimeupdate=()=>{
const m=Math.floor(v.currentTime/60),s=Math.floor(v.currentTime%60),ms=Math.floor((v.currentTime%1)*1000);
time.textContent=m+':'+(s<10?'0':'')+s+'.'+String(ms).padStart(3,'0');
if(!seeking&&v.duration){seekBar.value=Math.round((v.currentTime/v.duration)*1000)}
};
v.onprogress=v.onloadeddata=()=>{if(v.buffered.length&&v.duration){bufBar.style.width=(v.buffered.end(v.buffered.length-1)/v.duration*100)+'%'}};
v.onplay=()=>{playBtn.textContent='\u23F8'};v.onpause=()=>{playBtn.textContent='\u25B6'};
v.parentElement.addEventListener('keydown',e=>{
if(e.key==='ArrowLeft'){e.preventDefault();v.pause();v.currentTime=Math.max(0,v.currentTime-FT)}
if(e.key==='ArrowRight'){e.preventDefault();v.pause();v.currentTime+=FT}
if(e.key===' '){e.preventDefault();v.paused?v.play():v.pause()}
});
v.parentElement.setAttribute('tabindex','0');
});
</script></body></html>
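The player script above handles frame stepping, the 0 to 1000 seek-bar scale, and the `m:ss.mmm` readout inline. Below is a minimal TypeScript sketch of the same arithmetic as pure, testable functions. It assumes the fixed 30 fps of the `FPS` constant, and the function names are hypothetical. There is one deliberate difference: the sketch rounds to whole milliseconds before splitting, because `Math.floor((t % 1) * 1000)` can lose a millisecond to floating-point error. For `t = 2.3`, `t % 1` evaluates to `0.2999999999999998`, so the inline readout shows `.299`.

```typescript
// Hypothetical helpers mirroring the inline player logic.
const FPS = 30
const FRAME_SECONDS = 1 / FPS
const SEEK_MAX = 1000

/** Format seconds as m:ss.mmm, rounding to whole milliseconds first. */
function formatTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000))
  const m = Math.floor(totalMs / 60_000)
  const s = Math.floor((totalMs % 60_000) / 1000)
  const ms = totalMs % 1000
  return `${m}:${String(s).padStart(2, '0')}.${String(ms).padStart(3, '0')}`
}

/** Move by a whole number of frames, clamped to [0, duration]. */
function stepFrames(current: number, frames: number, duration: number): number {
  const target = current + frames * FRAME_SECONDS
  // Before metadata loads, duration is NaN; skip the upper clamp in that case.
  const upper = Number.isFinite(duration) ? duration : Number.POSITIVE_INFINITY
  return Math.min(upper, Math.max(0, target))
}

/** Map a seek-bar value (0..SEEK_MAX) to a media time. */
function seekValueToTime(value: number, duration: number): number {
  return duration * (value / SEEK_MAX)
}

/** Map a media time back to a seek-bar value. */
function timeToSeekValue(time: number, duration: number): number {
  return duration > 0 ? Math.round((time / duration) * SEEK_MAX) : 0
}

console.log(formatTimestamp(2.3)) // 0:02.300
console.log(stepFrames(0.1, -10, 5)) // 0
console.log(timeToSeekValue(2.5, 10)) // 250
```

In the page script, a caller would write `v.currentTime = stepFrames(v.currentTime, -10, v.duration)` and `time.textContent = formatTimestamp(v.currentTime)`.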

@@ -0,0 +1,253 @@
#!/usr/bin/env tsx
/**
* QA Reproduce Phase — Deterministic replay of research plan with narration
*
* Takes a reproduction plan from the research phase and replays it:
* 1. Execute each action deterministically (no AI decisions)
* 2. Capture a11y snapshot before/after each action
* 3. Gemini describes what visually changed (narration for humans)
* 4. Output: narration-log.json with full evidence chain
*/
import type { Page } from '@playwright/test'
import { GoogleGenerativeAI } from '@google/generative-ai'
import { mkdirSync, writeFileSync } from 'fs'
import type { ActionResult } from './qa-record.js'
// ── Types ──
interface ReproductionStep {
action: Record<string, unknown> & { action: string }
expectedAssertion: string
}
interface NarrationEntry {
step: number
action: string
params: Record<string, unknown>
result: ActionResult
a11yBefore: unknown
a11yAfter: unknown
assertionExpected: string
assertionPassed: boolean
assertionActual: string
geminiNarration: string
timestampMs: number
}
export interface NarrationLog {
entries: NarrationEntry[]
allAssertionsPassed: boolean
}
interface ReproduceOptions {
page: Page
plan: ReproductionStep[]
geminiApiKey: string
outputDir: string
}
// ── A11y helpers ──
interface A11yNode {
role: string
name: string
value?: string
checked?: boolean
disabled?: boolean
expanded?: boolean
children?: A11yNode[]
}
function searchA11y(node: A11yNode | null, selector: string): A11yNode | null {
if (!node) return null
const sel = selector.toLowerCase()
if (
node.name?.toLowerCase().includes(sel) ||
node.role?.toLowerCase().includes(sel)
) {
return node
}
if (node.children) {
for (const child of node.children) {
const found = searchA11y(child, selector)
if (found) return found
}
}
return null
}
function summarizeA11y(node: A11yNode | null): string {
if (!node) return 'null'
const parts = [`role=${node.role}`, `name="${node.name}"`]
if (node.value !== undefined) parts.push(`value="${node.value}"`)
if (node.checked !== undefined) parts.push(`checked=${node.checked}`)
if (node.disabled) parts.push('disabled')
if (node.expanded !== undefined) parts.push(`expanded=${node.expanded}`)
return `{${parts.join(', ')}}`
}
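Playwright's `locator.ariaSnapshot()` resolves to a YAML string with lines such as `- dialog "Settings":`. It does not return an object tree, so a walker over `A11yNode.children` only works on tree-shaped snapshots. Below is a minimal sketch of the same case-insensitive lookup performed over the YAML text. `findSnapshotLine` is a hypothetical name, and the sample snapshot is illustrative rather than captured from ComfyUI.

```typescript
/**
 * Return the first line of a YAML aria snapshot that mentions `selector`
 * (case-insensitive), trimmed, or null when nothing matches.
 */
function findSnapshotLine(snapshot: string | null, selector: string): string | null {
  if (!snapshot) return null
  const needle = selector.toLowerCase()
  const line = snapshot
    .split('\n')
    .find((candidate) => candidate.toLowerCase().includes(needle))
  return line === undefined ? null : line.trim()
}

// Illustrative snapshot text in Playwright's aria-snapshot YAML style.
const sample = [
  '- dialog "Settings":',
  '  - heading "Settings" [level=2]',
  '  - checkbox "Enable feature" [checked]'
].join('\n')

console.log(findSnapshotLine(sample, 'checkbox')) // - checkbox "Enable feature" [checked]
console.log(findSnapshotLine(sample, 'toast')) // null
```

Like the tree walker, this sketch matches a role or a name anywhere in the line and returns the outermost match first.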
// ── Subtitle overlay ──
async function showSubtitle(page: Page, text: string, step: number) {
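// encodeURIComponent leaves `'` unescaped, so percent-encode it explicitly;
// otherwise a quote in the text would terminate the '...' literal below.
const encoded = encodeURIComponent(
text.slice(0, 120).replace(/\n/g, ' ')
).replace(/'/g, '%27')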
await page.addScriptTag({
content: `(function(){
var id='qa-subtitle';
var el=document.getElementById(id);
if(!el){
el=document.createElement('div');
el.id=id;
Object.assign(el.style,{position:'fixed',bottom:'32px',left:'50%',transform:'translateX(-50%)',zIndex:'2147483646',maxWidth:'90%',padding:'6px 14px',borderRadius:'6px',background:'rgba(0,0,0,0.8)',color:'rgba(255,255,255,0.95)',fontSize:'12px',fontFamily:'system-ui,sans-serif',fontWeight:'400',lineHeight:'1.4',pointerEvents:'none',textAlign:'center',whiteSpace:'normal'});
document.body.appendChild(el);
}
el.textContent='['+${step}+'] '+decodeURIComponent('${encoded}');
})()`
})
}
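`showSubtitle` embeds caller-supplied text inside a single-quoted JavaScript literal in the injected script. `encodeURIComponent` leaves `A-Z a-z 0-9 - _ . ! ~ * ' ( )` unescaped, so the apostrophe needs separate handling. Below is a minimal sketch of an alternative that serializes the text with `JSON.stringify`, which always yields a valid double-quoted JavaScript string literal. `buildSubtitleScript` is a hypothetical helper, and the overlay styling is omitted.

```typescript
/**
 * Build the script injected by page.addScriptTag for a subtitle overlay.
 * Styling is omitted; only the text handling is shown.
 */
function buildSubtitleScript(text: string, step: number): string {
  // Truncate and flatten newlines, then let JSON.stringify escape quotes,
  // backslashes and control characters into a valid string literal.
  const label = JSON.stringify(`[${step}] ${text.slice(0, 120).replace(/\n/g, ' ')}`)
  return `(function () {
  var el = document.getElementById('qa-subtitle');
  if (!el) {
    el = document.createElement('div');
    el.id = 'qa-subtitle';
    document.body.appendChild(el);
  }
  el.textContent = ${label};
})()`
}
```

In Playwright, the result would be passed as `await page.addScriptTag({ content: buildSubtitleScript(text, step) })`. This approach avoids the encode/decode round trip because the browser parses the literal directly.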
// ── Gemini visual narration ──
async function geminiDescribe(
page: Page,
geminiApiKey: string,
focus: string
): Promise<string> {
try {
const screenshot = await page.screenshot({ type: 'jpeg', quality: 70 })
const genAI = new GoogleGenerativeAI(geminiApiKey)
const model = genAI.getGenerativeModel({ model: 'gemini-3-flash-preview' })
const result = await model.generateContent([
{
text: `Describe in 1-2 sentences what you see on this ComfyUI screen. Focus on: ${focus}. Be factual — only describe what is visible.`
},
{
inlineData: {
mimeType: 'image/jpeg',
data: screenshot.toString('base64')
}
}
])
return result.response.text().trim()
} catch (e) {
return `(Gemini narration failed: ${e instanceof Error ? e.message.slice(0, 50) : e})`
}
}
// ── Main reproduce function ──
export async function runReproducePhase(
opts: ReproduceOptions
): Promise<NarrationLog> {
const { page, plan, geminiApiKey, outputDir } = opts
const { executeAction } = await import('./qa-record.js')
const narrationDir = `${outputDir}/narration`
mkdirSync(narrationDir, { recursive: true })
const entries: NarrationEntry[] = []
const startMs = Date.now()
console.warn(`Reproduce phase: replaying ${plan.length} steps...`)
for (let i = 0; i < plan.length; i++) {
const step = plan[i]
const actionObj = step.action
const elapsed = Date.now() - startMs
// Show subtitle
await showSubtitle(page, `Step ${i + 1}: ${actionObj.action}`, i + 1)
console.warn(` [${i + 1}/${plan.length}] ${actionObj.action}`)
// Capture a11y BEFORE
const a11yBefore = await page
.locator('body')
.ariaSnapshot({ timeout: 3000 })
.catch(() => null)
// Execute action
const result = await executeAction(
page,
actionObj as Parameters<typeof executeAction>[1],
outputDir
)
await new Promise((r) => setTimeout(r, 500))
// Capture a11y AFTER
const a11yAfter = await page
.locator('body')
.ariaSnapshot({ timeout: 3000 })
.catch(() => null)
// Check assertion
let assertionPassed = false
let assertionActual = ''
if (step.expectedAssertion) {
// Parse the expected assertion — e.g. "Settings dialog: visible" or "tab count: 2"
const parts = step.expectedAssertion.split(':').map((s) => s.trim())
const selectorName = parts[0]
const expectedState = parts.slice(1).join(':').trim()
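// ariaSnapshot() resolves to a YAML string, not an A11yNode tree, so
// match the selector against individual snapshot lines instead.
const needle = selectorName.toLowerCase()
const found = a11yAfter
? (a11yAfter.split('\n').find((line) => line.toLowerCase().includes(needle)) ?? null)
: null
assertionActual = found ? found.trim() : 'NOT FOUND'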
if (expectedState === 'visible' || expectedState === 'exists') {
assertionPassed = found !== null
} else if (expectedState === 'hidden' || expectedState === 'gone') {
assertionPassed = found === null
} else {
// Generic: check if the actual state contains the expected text
assertionPassed = assertionActual
.toLowerCase()
.includes(expectedState.toLowerCase())
}
console.warn(
` Assertion: "${step.expectedAssertion}" → ${assertionPassed ? '✓ PASS' : '✗ FAIL'} (actual: ${assertionActual})`
)
}
// Gemini narration (visual description for humans)
const geminiNarration = await geminiDescribe(
page,
geminiApiKey,
`What changed after ${actionObj.action}?`
)
entries.push({
step: i + 1,
action: actionObj.action,
params: actionObj,
result,
a11yBefore,
a11yAfter,
assertionExpected: step.expectedAssertion,
assertionPassed,
assertionActual,
geminiNarration,
timestampMs: elapsed
})
}
// Final screenshot
await page.screenshot({ path: `${outputDir}/reproduce-final.png` })
const log: NarrationLog = {
entries,
allAssertionsPassed: entries.every((e) => e.assertionPassed)
}
writeFileSync(
`${narrationDir}/narration-log.json`,
JSON.stringify(log, null, 2)
)
console.warn(
`Reproduce phase complete: ${entries.filter((e) => e.assertionPassed).length}/${entries.length} assertions passed`
)
return log
}
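The `expectedAssertion` strings handled in the loop above follow a `selector: state` shape, for example `Settings dialog: visible` or `tab count: 2`. Below is a minimal sketch of that parsing and evaluation as pure functions. The names are hypothetical, and the sketch deliberately differs from the loop in two places. First, a missing element fails every state other than `hidden`/`gone`. Second, steps without an assertion are left out of the pass count instead of being counted as failures.

```typescript
interface ParsedAssertion {
  selector: string
  expected: string
}

/** Split on the first colon only, so "time: 0:05" keeps "0:05" as the state. */
function parseAssertion(raw: string): ParsedAssertion {
  const index = raw.indexOf(':')
  if (index === -1) return { selector: raw.trim(), expected: '' }
  return {
    selector: raw.slice(0, index).trim(),
    expected: raw.slice(index + 1).trim()
  }
}

/** Evaluate an expected state against the matched snapshot text (or null). */
function evaluateAssertion(expected: string, actual: string | null): boolean {
  const state = expected.toLowerCase()
  if (state === 'visible' || state === 'exists') return actual !== null
  if (state === 'hidden' || state === 'gone') return actual === null
  // Generic states: the matched line must mention the expected text.
  return actual !== null && actual.toLowerCase().includes(state)
}

/** Count only steps that declared an assertion. */
function summarizeAssertions(steps: Array<{ expected: string; passed: boolean }>): string {
  const checked = steps.filter((step) => step.expected.length > 0)
  const passed = checked.filter((step) => step.passed).length
  return `${passed}/${checked.length} assertions passed`
}

console.log(parseAssertion('Settings dialog: visible'))
console.log(evaluateAssertion('checked', '- checkbox "Enable" [checked]')) // true
```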

@@ -0,0 +1,150 @@
import { describe, expect, it } from 'vitest'
import {
extractPlatformFromArtifactDirName,
pickLatestVideosByPlatform,
selectVideoCandidateByFile
} from './qa-video-review'
describe('extractPlatformFromArtifactDirName', () => {
it('extracts and normalizes known qa artifact directory names', () => {
expect(
extractPlatformFromArtifactDirName('qa-report-Windows-22818315023')
).toBe('windows')
expect(
extractPlatformFromArtifactDirName('qa-report-macOS-22818315023')
).toBe('macos')
expect(
extractPlatformFromArtifactDirName('qa-report-Linux-22818315023')
).toBe('linux')
})
it('falls back to slugifying unknown directory names', () => {
expect(extractPlatformFromArtifactDirName('custom platform run')).toBe(
'custom-platform-run'
)
})
})
describe('pickLatestVideosByPlatform', () => {
it('keeps only the latest candidate per platform', () => {
const selected = pickLatestVideosByPlatform([
{
platformName: 'windows',
videoPath: '/tmp/windows-old.mp4',
mtimeMs: 100
},
{
platformName: 'windows',
videoPath: '/tmp/windows-new.mp4',
mtimeMs: 200
},
{
platformName: 'linux',
videoPath: '/tmp/linux.mp4',
mtimeMs: 150
}
])
expect(selected).toEqual([
{
platformName: 'linux',
videoPath: '/tmp/linux.mp4',
mtimeMs: 150
},
{
platformName: 'windows',
videoPath: '/tmp/windows-new.mp4',
mtimeMs: 200
}
])
})
})
describe('selectVideoCandidateByFile', () => {
it('selects a single candidate by artifacts-relative path', () => {
const selected = selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
},
{
platformName: 'linux',
videoPath: '/tmp/qa-artifacts/qa-report-Linux-1/qa-session.mp4',
mtimeMs: 200
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: 'qa-report-Linux-1/qa-session.mp4'
}
)
expect(selected).toEqual({
platformName: 'linux',
videoPath: '/tmp/qa-artifacts/qa-report-Linux-1/qa-session.mp4',
mtimeMs: 200
})
})
it('throws when basename matches multiple videos', () => {
expect(() =>
selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
},
{
platformName: 'linux',
videoPath: '/tmp/qa-artifacts/qa-report-Linux-1/qa-session.mp4',
mtimeMs: 200
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: 'qa-session.mp4'
}
)
).toThrow('matched 2 videos')
})
it('throws when there is no matching video', () => {
expect(() =>
selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: 'qa-report-macOS-1/qa-session.mp4'
}
)
).toThrow('No video matched')
})
it('throws when video file is missing', () => {
expect(() =>
selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: ' '
}
)
).toThrow('--video-file is required')
})
})

@@ -0,0 +1,771 @@
#!/usr/bin/env tsx
import { mkdir, readFile, stat, writeFile } from 'node:fs/promises'
import { basename, dirname, extname, relative, resolve } from 'node:path'
import { fileURLToPath } from 'node:url'
import { GoogleGenerativeAI } from '@google/generative-ai'
import { globSync } from 'glob'
interface CliOptions {
artifactsDir: string
videoFile: string
beforeVideo: string
outputDir: string
model: string
requestTimeoutMs: number
dryRun: boolean
prContext: string
targetUrl: string
passLabel: string
}
interface VideoCandidate {
platformName: string
videoPath: string
mtimeMs: number
}
const DEFAULT_OPTIONS: CliOptions = {
artifactsDir: './tmp/qa-artifacts',
videoFile: '',
beforeVideo: '',
outputDir: './tmp',
model: 'gemini-3-flash-preview',
requestTimeoutMs: 300_000,
dryRun: false,
prContext: '',
targetUrl: '',
passLabel: ''
}
const USAGE = `Usage:
pnpm exec tsx scripts/qa-video-review.ts [options]
Options:
--artifacts-dir <path> Artifacts root directory
(default: ./tmp/qa-artifacts)
--video-file <name-or-path> Video file to analyze (required)
(supports basename or relative/absolute path)
--before-video <path> Before video (main branch) for comparison
When provided, sends both videos to Gemini
for comparative before/after analysis
--output-dir <path> Output directory for markdown reports
(default: ./tmp)
--model <name> Gemini model
(default: gemini-3-flash-preview)
--request-timeout-ms <n> Request timeout in milliseconds
(default: 300000)
--pr-context <file> File with PR context (title, body, diff)
for PR-aware review
--target-url <url> Issue or PR URL to include in the report
--pass-label <label> Label for multi-pass reports (e.g. pass1)
Output becomes {platform}-{label}-qa-video-report.md
--dry-run Discover videos and output targets only
--help Show this help text
Environment:
GEMINI_API_KEY Required unless --dry-run
`
function parsePositiveInteger(rawValue: string, flagName: string): number {
const parsedValue = Number.parseInt(rawValue, 10)
if (!Number.isInteger(parsedValue) || parsedValue <= 0) {
throw new Error(`Invalid value for ${flagName}: "${rawValue}"`)
}
return parsedValue
}
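`Number.parseInt` stops at the first non-digit character. As a result, `parsePositiveInteger('300s', ...)` above accepts the value as `300`. Below is a stricter sketch that assumes the flag should accept only plain decimal digits. `parseStrictPositiveInteger` is a hypothetical name.

```typescript
function parseStrictPositiveInteger(rawValue: string, flagName: string): number {
  const trimmed = rawValue.trim()
  // Reject signs, decimals, exponents and trailing units such as "300s".
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`Invalid value for ${flagName}: "${rawValue}"`)
  }
  const parsedValue = Number(trimmed)
  if (!Number.isSafeInteger(parsedValue) || parsedValue <= 0) {
    throw new Error(`Invalid value for ${flagName}: "${rawValue}"`)
  }
  return parsedValue
}

console.log(parseStrictPositiveInteger('300000', '--request-timeout-ms')) // 300000
```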
function parseCliOptions(args: string[]): CliOptions {
const options: CliOptions = { ...DEFAULT_OPTIONS }
for (let index = 0; index < args.length; index += 1) {
const argument = args[index]
const nextValue = args[index + 1]
const requireValue = (flagName: string): string => {
if (!nextValue || nextValue.startsWith('--')) {
throw new Error(`Missing value for ${flagName}`)
}
index += 1
return nextValue
}
if (argument === '--help') {
process.stdout.write(USAGE)
process.exit(0)
}
if (argument === '--artifacts-dir') {
options.artifactsDir = requireValue(argument)
continue
}
if (argument === '--video-file') {
options.videoFile = requireValue(argument)
continue
}
if (argument === '--output-dir') {
options.outputDir = requireValue(argument)
continue
}
if (argument === '--model') {
options.model = requireValue(argument)
continue
}
if (argument === '--request-timeout-ms') {
options.requestTimeoutMs = parsePositiveInteger(
requireValue(argument),
argument
)
continue
}
if (argument === '--before-video') {
options.beforeVideo = requireValue(argument)
continue
}
if (argument === '--pr-context') {
options.prContext = requireValue(argument)
continue
}
if (argument === '--target-url') {
options.targetUrl = requireValue(argument)
continue
}
if (argument === '--pass-label') {
options.passLabel = requireValue(argument)
continue
}
if (argument === '--dry-run') {
options.dryRun = true
continue
}
throw new Error(`Unknown argument: ${argument}`)
}
return options
}
function normalizePlatformName(value: string): string {
const slug = value
.trim()
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-+|-+$/g, '')
return slug.length > 0 ? slug : 'unknown-platform'
}
export function extractPlatformFromArtifactDirName(dirName: string): string {
const matchedValue = dirName.match(/^qa-report-(.+?)(?:-\d+)?$/i)?.[1]
return normalizePlatformName(matchedValue ?? dirName)
}
function extractPlatformFromVideoPath(videoPath: string): string {
const artifactDirName = basename(dirname(videoPath))
return extractPlatformFromArtifactDirName(artifactDirName)
}
export function pickLatestVideosByPlatform(
candidates: VideoCandidate[]
): VideoCandidate[] {
const latestByPlatform = new Map<string, VideoCandidate>()
for (const candidate of candidates) {
const current = latestByPlatform.get(candidate.platformName)
if (!current || candidate.mtimeMs > current.mtimeMs) {
latestByPlatform.set(candidate.platformName, candidate)
}
}
return [...latestByPlatform.values()].sort((a, b) =>
a.platformName.localeCompare(b.platformName)
)
}
function toProjectRelativePath(targetPath: string): string {
const relativePath = relative(process.cwd(), targetPath)
if (relativePath.startsWith('.')) {
return relativePath
}
return `./${relativePath}`
}
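`toProjectRelativePath` returns any path that starts with `.` unchanged. A file under a dot-directory such as `.github/` therefore comes back without the `./` prefix. Below is a sketch of a variant that skips the prefix only for parent-relative results. It takes the base directory as a parameter, which is a hypothetical signature, and it uses the POSIX helpers from `node:path` so the output does not depend on the host OS.

```typescript
import { posix } from 'node:path'

/** Display targetPath relative to baseDir, prefixing "./" for paths inside it. */
function toRelativeDisplayPath(baseDir: string, targetPath: string): string {
  const relativePath = posix.relative(baseDir, targetPath)
  if (relativePath === '') return '.'
  // Parent-relative results ("..", "../x") already read as relative paths.
  if (relativePath === '..' || relativePath.startsWith('../')) {
    return relativePath
  }
  return `./${relativePath}`
}

console.log(toRelativeDisplayPath('/repo', '/repo/.github/qa.mp4')) // ./.github/qa.mp4
console.log(toRelativeDisplayPath('/repo', '/other/qa.mp4')) // ../other/qa.mp4
```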
function errorToString(error: unknown): string {
return error instanceof Error ? error.message : String(error)
}
function normalizePathForMatch(value: string): string {
return value.replaceAll('\\', '/').replace(/^\.\/+/, '')
}
export function selectVideoCandidateByFile(
candidates: VideoCandidate[],
options: { artifactsDir: string; videoFile: string }
): VideoCandidate {
const requestedValue = options.videoFile.trim()
if (requestedValue.length === 0) {
throw new Error('--video-file is required')
}
const artifactsRoot = resolve(options.artifactsDir)
const requestedAbsolutePath = resolve(requestedValue)
const requestedPathKey = normalizePathForMatch(requestedValue)
const matches = candidates.filter((candidate) => {
const candidateAbsolutePath = resolve(candidate.videoPath)
if (candidateAbsolutePath === requestedAbsolutePath) {
return true
}
const candidateBaseName = basename(candidate.videoPath)
if (candidateBaseName === requestedValue) {
return true
}
const relativeToCwd = normalizePathForMatch(
relative(process.cwd(), candidateAbsolutePath)
)
if (relativeToCwd === requestedPathKey) {
return true
}
const relativeToArtifacts = normalizePathForMatch(
relative(artifactsRoot, candidateAbsolutePath)
)
return relativeToArtifacts === requestedPathKey
})
if (matches.length === 1) {
return matches[0]
}
if (matches.length === 0) {
const availableVideos = candidates.map((candidate) =>
toProjectRelativePath(candidate.videoPath)
)
throw new Error(
[
`No video matched --video-file "${options.videoFile}".`,
'Available videos:',
...availableVideos.map((videoPath) => `- ${videoPath}`)
].join('\n')
)
}
throw new Error(
[
`--video-file "${options.videoFile}" matched ${matches.length} videos.`,
'Please pass a more specific path.',
...matches.map((match) => `- ${toProjectRelativePath(match.videoPath)}`)
].join('\n')
)
}
async function collectVideoCandidates(
artifactsDir: string
): Promise<VideoCandidate[]> {
const absoluteArtifactsDir = resolve(artifactsDir)
const videoPaths = globSync('**/qa-session{,-[0-9]}.mp4', {
cwd: absoluteArtifactsDir,
absolute: true,
nodir: true
}).sort()
const candidates = await Promise.all(
videoPaths.map(async (videoPath) => {
const videoStat = await stat(videoPath)
return {
platformName: extractPlatformFromVideoPath(videoPath),
videoPath,
mtimeMs: videoStat.mtimeMs
}
})
)
return candidates
}
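The glob `**/qa-session{,-[0-9]}.mp4` expands its brace group into two alternatives. It matches `qa-session.mp4` and `qa-session-0.mp4` through `qa-session-9.mp4`, but not two-digit suffixes such as `qa-session-10.mp4`. Below is a sketch of the same basename rule as a regular expression, which is handy for checking names without touching the filesystem. `isQaSessionVideo` is a hypothetical helper.

```typescript
// Equivalent of the basename part of qa-session{,-[0-9]}.mp4.
const QA_SESSION_VIDEO = /^qa-session(?:-[0-9])?\.mp4$/

function isQaSessionVideo(fileName: string): boolean {
  return QA_SESSION_VIDEO.test(fileName)
}

for (const name of ['qa-session.mp4', 'qa-session-2.mp4', 'qa-session-10.mp4', 'qa-session.webm']) {
  console.log(name, isQaSessionVideo(name))
}
```

The glob only picks up `.mp4` files, even though `getMimeType` also knows other containers.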
function getMimeType(filePath: string): string {
const ext = extname(filePath).toLowerCase()
const mimeMap: Record<string, string> = {
'.mp4': 'video/mp4',
'.webm': 'video/webm',
'.mov': 'video/quicktime',
'.avi': 'video/x-msvideo',
'.mkv': 'video/x-matroska',
'.m4v': 'video/mp4'
}
return mimeMap[ext] || 'video/mp4'
}
function buildReviewPrompt(options: {
platformName: string
videoPath: string
prContext: string
isComparative: boolean
}): string {
const { platformName, videoPath, prContext, isComparative } = options
if (isComparative) {
return buildComparativePrompt(platformName, videoPath, prContext)
}
return buildSingleVideoPrompt(platformName, videoPath, prContext)
}
function buildComparativePrompt(
platformName: string,
videoPath: string,
prContext: string
): string {
const lines = [
'You are a senior QA engineer performing a BEFORE/AFTER comparison review.',
'',
'You are given TWO videos:',
'- **Video 1 (BEFORE)**: The main branch BEFORE the PR. This shows the OLD behavior.',
'- **Video 2 (AFTER)**: The PR branch AFTER the changes. This shows the NEW behavior.',
'',
'Both videos show the same test steps executed on different code versions.',
''
]
if (prContext) {
lines.push('## PR Context', prContext, '')
}
lines.push(
'## Your Task',
`Platform: "${platformName}". After video: ${toProjectRelativePath(videoPath)}.`,
'',
'1. **BEFORE video**: Does it demonstrate the old behavior or bug that the PR aims to fix?',
' Describe what you observe — this establishes the baseline.',
'2. **AFTER video**: Does it prove the PR fix works? Is the intended new behavior visible?',
'3. **Comparison**: What specifically changed between before and after?',
'4. **Regressions**: Did the PR introduce any new problems visible in the AFTER video',
' that were NOT present in the BEFORE video?',
'',
'Note: Brief black frames during page transitions are NORMAL.',
'Note: Small cyan/purple dashed labels prefixed with "QA:" are annotations placed by the automated test script — they are NOT part of the application UI. Do not treat them as bugs or evidence.',
'Report only concrete, visible differences. Avoid speculation.',
'',
'Return markdown with these sections exactly:',
'## Summary',
'(What the PR changes, whether BEFORE confirms the old behavior, whether AFTER proves the fix)',
'',
'## Behavior Changes',
'Summarize ALL behavioral differences as a markdown TABLE:',
'| Behavior | Before (main) | After (PR) | Verdict |',
'',
'- **Behavior**: short name for the behavior (e.g. "Save shortcut label", "Menu hover style")',
'- **Before (main)**: how it works/looks in the BEFORE video',
'- **After (PR)**: how it works/looks in the AFTER video',
'- **Verdict**: `Fixed`, `Improved`, `Changed`, `Regression`, or `No Change`',
'',
'One row per distinct behavior. Include both changed AND unchanged key behaviors',
'that were tested, so reviewers can confirm nothing was missed.',
'',
'## Timeline Comparison',
'Present a chronological frame-by-frame comparison as a markdown TABLE:',
'| Time | Type | Severity | Before (main) | After (PR) |',
'',
'- **Time**: timestamp or range from the videos (e.g. `0:05-0:08`)',
'- **Type**: category such as `Visual`, `Behavior`, `Layout`, `Text`, `Animation`, `Menu`, `State`',
'- **Severity**: `None` (neutral change), `Fixed` (bug resolved), `Regression`, `Minor`, `Major`',
'- **Before (main)**: what is observed in the BEFORE video at that time',
'- **After (PR)**: what is observed in the AFTER video at that time',
'',
'Include one row per distinct observable difference. If behavior is identical at a timestamp,',
'omit that row. Focus on meaningful differences, not narrating every frame.',
'',
'## Confirmed Issues',
'For each issue, use this exact format:',
'',
'### [Short issue title]',
'`SEVERITY` `TIMESTAMP` `Confidence: LEVEL`',
'',
'[Description — specify whether it appears in BEFORE, AFTER, or both]',
'',
'**Evidence:** [What you observed at the given timestamp in which video]',
'',
'**Suggested Fix:** [Actionable recommendation]',
'',
'---',
'',
'## Possible Issues (Needs Human Verification)',
'## Overall Risk',
'(Assess whether the PR achieves its goal based on the before/after comparison)',
'',
'## Verdict',
'End your report with this EXACT JSON block (no markdown fence):',
'{"verdict": "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE", "risk": "low" | "medium" | "high", "confidence": "high" | "medium" | "low"}',
'- REPRODUCED: the before video confirms the old behavior and the after video shows the fix working',
'- NOT_REPRODUCIBLE: the before video does not show the reported bug',
'- INCONCLUSIVE: the videos do not adequately demonstrate the behavior change'
)
return lines.filter(Boolean).join('\n')
}
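The prompt builders finish with `lines.filter(Boolean).join('\n')`. Because `''` is falsy, that filter removes the conditional empty entry it targets, but it also removes every `''` placed as a blank separator, so sections end up joined without blank lines. Below is a sketch of a joiner that keeps empty strings and drops only explicit omissions. It assumes conditional lines are written as `null` or `undefined`, and `joinPromptLines` is hypothetical.

```typescript
type PromptLine = string | null | undefined

/** Join prompt lines, keeping '' as a blank separator and skipping null/undefined. */
function joinPromptLines(lines: PromptLine[]): string {
  return lines.filter((line): line is string => line != null).join('\n')
}

const hasContext = false
console.log(
  JSON.stringify(
    joinPromptLines(['## Summary', hasContext ? '(What the PR changes)' : null, '', '## Overall Risk'])
  )
) // "## Summary\n\n## Overall Risk"
```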
function buildSingleVideoPrompt(
platformName: string,
videoPath: string,
prContext: string
): string {
const lines = [
'You are a senior QA engineer reviewing a UI test session recording.',
'',
'## ANTI-HALLUCINATION RULES (READ FIRST)',
'- Describe ONLY what you can directly observe in the video frames',
'- NEVER infer or assume what "must have happened" between frames',
'- If a step is not visible in the video, say "NOT SHOWN" — do not guess',
'- Your job is to be a CAMERA — report facts, not interpretations',
''
]
const isIssueContext =
prContext &&
/^### Issue #|^Title:.*\bbug\b|^This video attempts to reproduce/im.test(
prContext
)
if (prContext) {
lines.push(
'## Phase 1: Blind Observation (describe what you SEE and HEAR)',
'First, describe every UI interaction chronologically WITHOUT knowing the expected outcome:',
'- What elements does the user click/hover/type?',
'- What dialogs/menus open and close?',
'- What keyboard indicators appear? (look for subtitle overlays)',
'- What is the BEFORE state and AFTER state of each action?',
'- **Audio**: Does the video have a TTS narration audio track? If yes, transcribe what the voice says. This narration describes the bug being reproduced.',
'',
'## Phase 2: Compare against expected behavior',
'Now compare your observations against the context below.',
'Only claim a match if your Phase 1 observations EXPLICITLY support it.',
''
)
if (isIssueContext) {
lines.push(
'## Issue Context',
prContext,
'',
'## Comparison Questions',
'1. Did the video perform the reproduction steps described in the issue?',
'2. Did your Phase 1 observations show the reported bug behavior?',
'3. If the steps were not performed or the bug was not visible, say INCONCLUSIVE.',
''
)
} else {
lines.push(
'## PR Context',
prContext,
'',
'## Comparison Questions',
'1. Did the video test the specific behavior the PR changes?',
'2. Did your Phase 1 observations show the expected before/after difference?',
'3. If the test was incomplete or inconclusive, say so honestly.',
''
)
}
}
lines.push(
`Review this QA session video for platform "${platformName}".`,
`Source video: ${toProjectRelativePath(videoPath)}.`,
'The video shows the full test session — analyze it chronologically.',
'Focus on UI regressions, broken states, visual glitches, unreadable text, missing labels/i18n, and clear workflow failures.',
'Note: Brief black frames during page transitions are NORMAL and should NOT be reported as issues.',
'Note: Small cyan/purple dashed labels prefixed with "QA:" are annotations placed by the automated test script — they are NOT part of the application UI. Do not treat them as bugs or evidence.',
'Report only concrete, visible problems and avoid speculation.',
'If confidence is low, mark it explicitly.',
'',
'Return markdown with these sections exactly:',
'## Summary',
isIssueContext
? '(Explain what bug was reported and whether the video confirms it is reproducible)'
: prContext
? '(Explain what the PR intended and whether the video confirms it works)'
: '',
'## Confirmed Issues',
'For each confirmed issue, use this exact format (one block per issue):',
'',
'### [Short issue title]',
'`HIGH` `01:03` `Confidence: High`',
'',
'[Description of the issue — what went wrong and what was expected]',
'',
'**Evidence:** [What you observed in the video at the given timestamp]',
'',
'**Suggested Fix:** [Actionable recommendation]',
'',
'---',
'',
'The first line after the heading MUST be exactly three backtick-wrapped labels:',
'`SEVERITY` `TIMESTAMP` `Confidence: LEVEL`',
'Do NOT use a table for issues — use the block format above.',
'## Possible Issues (Needs Human Verification)',
'## Overall Risk',
'',
'## Narration',
'If the video contains a TTS audio narration track, transcribe it here.',
'If there is no audio or the video is silent, write "No narration detected."',
'',
'## Verdict',
'End your report with this EXACT JSON block (no markdown fence):',
'{"verdict": "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE", "risk": "low" | "medium" | "high" | null, "confidence": "high" | "medium" | "low", "narrationDetected": true | false}',
'- REPRODUCED: the bug/behavior is clearly visible in the video',
'- NOT_REPRODUCIBLE: the steps were performed correctly but the bug was not observed',
'- INCONCLUSIVE: the reproduction steps were not performed or the video is insufficient',
'- narrationDetected: true if you heard TTS voice narration in the video, false if silent'
)
return lines.filter(Boolean).join('\n')
}
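Both prompts ask the model to end its report with a bare JSON object such as `{"verdict": "REPRODUCED", "risk": "low", "confidence": "high"}`. Below is a minimal sketch of reading that trailer back from the review text. It assumes the object is flat (no nested braces), takes the last valid match in case the model echoes the template earlier, and validates only the `verdict` field. `extractVerdict` and its return type are hypothetical and are not part of this script.

```typescript
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

interface VerdictBlock {
  verdict: Verdict
  risk: 'low' | 'medium' | 'high' | null
  confidence: 'high' | 'medium' | 'low'
  narrationDetected?: boolean
}

const VERDICTS: readonly string[] = ['REPRODUCED', 'NOT_REPRODUCIBLE', 'INCONCLUSIVE']

/** Return the last parseable flat {"verdict": ...} object in the text, or null. */
function extractVerdict(reviewText: string): VerdictBlock | null {
  const candidates = reviewText.match(/\{[^{}]*"verdict"[^{}]*\}/g) ?? []
  for (let i = candidates.length - 1; i >= 0; i -= 1) {
    try {
      const parsed = JSON.parse(candidates[i]) as Partial<VerdictBlock>
      if (typeof parsed.verdict === 'string' && VERDICTS.includes(parsed.verdict)) {
        return parsed as VerdictBlock
      }
    } catch {
      // Template echoes like {"verdict": "REPRODUCED" | ...} are not valid JSON.
    }
  }
  return null
}
```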
const MAX_VIDEO_BYTES = 100 * 1024 * 1024
async function readVideoFile(videoPath: string): Promise<Buffer> {
const fileStat = await stat(videoPath)
if (fileStat.size > MAX_VIDEO_BYTES) {
throw new Error(
`Video ${basename(videoPath)} is ${formatBytes(fileStat.size)}, exceeds ${formatBytes(MAX_VIDEO_BYTES)} limit`
)
}
return readFile(videoPath)
}
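`readVideoFile` checks the raw file size, but `requestGeminiReview` sends each video as base64 `inlineData`. Base64 emits 4 characters for every 3 input bytes, rounded up. Below is a sketch of the encoded length, which is useful if a request-size budget should account for the encoding overhead. `base64Length` is a hypothetical helper, and no particular API limit is assumed here.

```typescript
/** Length of the padded base64 encoding of `byteLength` bytes. */
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3)
}

const MAX_VIDEO_BYTES = 100 * 1024 * 1024
console.log(base64Length(MAX_VIDEO_BYTES)) // 139810136
console.log(Buffer.alloc(10).toString('base64').length === base64Length(10)) // true
```

For the 100 MB cap, the encoded payload is therefore roughly a third larger than the file on disk, and two such videos are sent in comparative mode.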
async function requestGeminiReview(options: {
apiKey: string
model: string
platformName: string
videoPath: string
beforeVideoPath: string
timeoutMs: number
prContext: string
}): Promise<string> {
const genAI = new GoogleGenerativeAI(options.apiKey)
const model = genAI.getGenerativeModel({ model: options.model })
const isComparative = options.beforeVideoPath.length > 0
const prompt = buildReviewPrompt({
platformName: options.platformName,
videoPath: options.videoPath,
prContext: options.prContext,
isComparative
})
const parts: Array<
{ text: string } | { inlineData: { mimeType: string; data: string } }
> = [{ text: prompt }]
if (isComparative) {
const beforeBuffer = await readVideoFile(options.beforeVideoPath)
parts.push(
{ text: 'Video 1 — BEFORE (main branch):' },
{
inlineData: {
mimeType: getMimeType(options.beforeVideoPath),
data: beforeBuffer.toString('base64')
}
}
)
}
const afterBuffer = await readVideoFile(options.videoPath)
if (isComparative) {
parts.push({ text: 'Video 2 — AFTER (PR branch):' })
}
parts.push({
inlineData: {
mimeType: getMimeType(options.videoPath),
data: afterBuffer.toString('base64')
}
})
const result = await model.generateContent(parts, {
timeout: options.timeoutMs
})
const response = result.response
const text = response.text()
if (!text || text.trim().length === 0) {
throw new Error('Gemini API returned no output text')
}
return text.trim()
}
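In comparative mode, `requestGeminiReview` sends the prompt first, followed by a text label and the inline data for each video, with BEFORE ahead of AFTER. This ordering lets the model tie "Video 1" and "Video 2" in the prompt to the right attachment. Below is a minimal sketch of that ordering as a pure function over already-encoded data. `buildReviewParts` and `EncodedVideo` are hypothetical names, the part shapes mirror the ones used above, and the label wording is simplified.

```typescript
type Part = { text: string } | { inlineData: { mimeType: string; data: string } }

interface EncodedVideo {
  mimeType: string
  base64: string
}

/** Prompt first; in comparative mode, labelled BEFORE then labelled AFTER. */
function buildReviewParts(prompt: string, after: EncodedVideo, before?: EncodedVideo): Part[] {
  const parts: Part[] = [{ text: prompt }]
  if (before) {
    parts.push(
      { text: 'Video 1 (BEFORE, main branch):' },
      { inlineData: { mimeType: before.mimeType, data: before.base64 } },
      { text: 'Video 2 (AFTER, PR branch):' }
    )
  }
  parts.push({ inlineData: { mimeType: after.mimeType, data: after.base64 } })
  return parts
}

const demoParts = buildReviewParts(
  'Review these videos.',
  { mimeType: 'video/mp4', base64: 'QUZURVI=' },
  { mimeType: 'video/mp4', base64: 'QkVGT1JF' }
)
console.log(demoParts.map((part) => ('text' in part ? part.text : `<${part.inlineData.mimeType}>`)))
```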
function formatBytes(bytes: number): string {
if (bytes < 1024) return `${bytes} B`
if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`
return `${(bytes / (1024 * 1024)).toFixed(1)} MB`
}
function buildReportMarkdown(input: {
platformName: string
model: string
videoPath: string
videoSizeBytes: number
beforeVideoPath?: string
beforeVideoSizeBytes?: number
reviewText: string
targetUrl?: string
}): string {
const headerLines = [
`# ${input.platformName} QA Video Report`,
'',
`- Generated at: ${new Date().toISOString()}`,
`- Model: \`${input.model}\``
]
if (input.targetUrl) {
headerLines.push(`- Target: ${input.targetUrl}`)
}
if (input.beforeVideoPath) {
headerLines.push(
`- Before video: \`${toProjectRelativePath(input.beforeVideoPath)}\` (${formatBytes(input.beforeVideoSizeBytes ?? 0)})`,
`- After video: \`${toProjectRelativePath(input.videoPath)}\` (${formatBytes(input.videoSizeBytes)})`,
'- Mode: **Comparative (before/after)**'
)
} else {
headerLines.push(
`- Source video: \`${toProjectRelativePath(input.videoPath)}\``,
`- Video size: ${formatBytes(input.videoSizeBytes)}`
)
}
headerLines.push('', '## AI Review', '')
return `${headerLines.join('\n')}${input.reviewText.trim()}\n`
}
async function reviewVideo(
video: VideoCandidate,
options: CliOptions,
apiKey: string
): Promise<void> {
let prContext = ''
if (options.prContext) {
try {
prContext = await readFile(options.prContext, 'utf-8')
process.stdout.write(
`[${video.platformName}] Loaded PR context from ${options.prContext}\n`
)
} catch {
process.stdout.write(
`[${video.platformName}] Warning: Could not read PR context file ${options.prContext}\n`
)
}
}
const beforeVideoPath = options.beforeVideo
? resolve(options.beforeVideo)
: ''
if (beforeVideoPath) {
const beforeStat = await stat(beforeVideoPath)
process.stdout.write(
`[${video.platformName}] Before video: ${toProjectRelativePath(beforeVideoPath)} (${formatBytes(beforeStat.size)})\n`
)
}
process.stdout.write(
`[${video.platformName}] Sending ${beforeVideoPath ? '2 videos (comparative)' : 'video'} to ${options.model}\n`
)
const reviewText = await requestGeminiReview({
apiKey,
model: options.model,
platformName: video.platformName,
videoPath: video.videoPath,
beforeVideoPath,
timeoutMs: options.requestTimeoutMs,
prContext
})
const videoStat = await stat(video.videoPath)
const passSegment = options.passLabel ? `-${options.passLabel}` : ''
const outputPath = resolve(
options.outputDir,
`${video.platformName}${passSegment}-qa-video-report.md`
)
const reportInput: Parameters<typeof buildReportMarkdown>[0] = {
platformName: video.platformName,
model: options.model,
videoPath: video.videoPath,
videoSizeBytes: videoStat.size,
reviewText,
targetUrl: options.targetUrl || undefined
}
if (beforeVideoPath) {
const beforeStat = await stat(beforeVideoPath)
reportInput.beforeVideoPath = beforeVideoPath
reportInput.beforeVideoSizeBytes = beforeStat.size
}
const reportMarkdown = buildReportMarkdown(reportInput)
await mkdir(dirname(outputPath), { recursive: true })
await writeFile(outputPath, reportMarkdown, 'utf-8')
process.stdout.write(
`[${video.platformName}] Wrote ${toProjectRelativePath(outputPath)}\n`
)
}
function isExecutedAsScript(metaUrl: string): boolean {
const modulePath = fileURLToPath(metaUrl)
const scriptPath = process.argv[1] ? resolve(process.argv[1]) : ''
return modulePath === scriptPath
}
async function main(): Promise<void> {
const options = parseCliOptions(process.argv.slice(2))
const candidates = await collectVideoCandidates(options.artifactsDir)
if (candidates.length === 0) {
process.stdout.write(
`No qa-session.mp4 files found under ${toProjectRelativePath(resolve(options.artifactsDir))}\n`
)
return
}
const selectedVideo = selectVideoCandidateByFile(candidates, {
artifactsDir: options.artifactsDir,
videoFile: options.videoFile
})
process.stdout.write(
`Selected ${selectedVideo.platformName}: ${toProjectRelativePath(selectedVideo.videoPath)}\n`
)
if (options.dryRun) {
process.stdout.write('\nDry run mode enabled, no API calls were made.\n')
return
}
const apiKey = process.env.GEMINI_API_KEY
if (!apiKey) {
throw new Error('GEMINI_API_KEY is required unless --dry-run is set')
}
await reviewVideo(selectedVideo, options, apiKey)
}
if (isExecutedAsScript(import.meta.url)) {
void main().catch((error: unknown) => {
const message = errorToString(error)
process.stderr.write(`qa-video-review failed: ${message}\n`)
process.exit(1)
})
}

@@ -0,0 +1,513 @@
#!/usr/bin/env tsx
/**
* QA CLI — simplified entry point for local & CI QA runs
*
* Usage:
* pnpm qa 10253 # auto-detects issue vs PR
* pnpm qa https://github.com/.../pull/10270
* pnpm qa 10270 -t base # test PR base (reproduce bug)
* pnpm qa 10270 -t both # test base + head
* pnpm qa --uncommitted # test local uncommitted changes
*
* Automatically loads .env.local / .env for GEMINI_API_KEY, ANTHROPIC_API_KEY.
* Results are written to .comfy-qa/<number>/ by default.
*/
import { parseArgs } from 'node:util'
import { config } from 'dotenv'
import { existsSync, mkdirSync, writeFileSync } from 'fs'
import { dirname, resolve } from 'path'
import { execSync, spawn, spawnSync } from 'child_process'
import { fileURLToPath } from 'url'
// ── Constants ──
const SCRIPT_DIR = dirname(fileURLToPath(import.meta.url))
const RECORD_SCRIPT = resolve(SCRIPT_DIR, 'qa-record.ts')
const DEFAULT_REPO = 'Comfy-Org/ComfyUI_frontend'
const VALID_TARGETS = ['head', 'base', 'both'] as const
const CLOUD_FALLBACK_URL = 'https://testcloud.comfy.org/'
type PrTarget = (typeof VALID_TARGETS)[number]
type TargetType = 'issue' | 'pr'
// ── Load .env.local / .env ──
for (const f of ['.env.local', '.env']) {
if (existsSync(f)) {
config({ path: f })
break
}
}
// ── Parse CLI ──
const { values, positionals } = tryParseArgs()
if (values.help) {
printUsage()
process.exit(0)
}
const serverUrl =
values.url || process.env.DEV_SERVER_COMFYUI_URL || 'http://127.0.0.1:8188'
const prTarget = values.target as PrTarget
if (!VALID_TARGETS.includes(prTarget)) {
console.error(
`Invalid --target "${prTarget}". Must be one of: ${VALID_TARGETS.join(', ')}`
)
process.exit(1)
}
// ── Ensure server is reachable (may fall back to cloud) ──
const resolvedServerUrl = await ensureServer(serverUrl)
// ── Dispatch by mode ──
if (values.uncommitted) {
runUncommitted()
} else {
const input = positionals[0]
if (!input) {
printUsage()
process.exit(1)
}
runTarget(input)
}
// ── Mode: uncommitted changes ──
function runUncommitted(): never {
const diff = shell('git diff && git diff --staged')
if (!diff.trim()) {
console.error('No uncommitted changes found')
process.exit(1)
}
const outputDir = resolveOutputDir('.comfy-qa/local')
const diffFile = writeTmpFile(outputDir, 'uncommitted.diff', diff)
logHeader({ label: 'uncommitted changes', outputDir })
const code = runQaRecord('after', diffFile, outputDir)
exit(code, outputDir)
}
// ── Mode: issue or PR by number/URL ──
function runTarget(input: string): never {
const { targetType, number, repo } = resolveTarget(input)
const outputDir = resolveOutputDir(`.comfy-qa/${number}`)
logHeader({
label: `${targetType} #${number} (${repo})`,
outputDir,
extra: targetType === 'pr' ? `Target: ${prTarget}` : undefined
})
const diffFile =
targetType === 'issue'
? fetchIssue(number, repo, outputDir)
: fetchPR(number, repo, outputDir)
let exitCode: number
if (targetType === 'issue') {
exitCode = runQaRecord('reproduce', diffFile, outputDir)
} else if (prTarget === 'both') {
exitCode = runPrBoth(diffFile, outputDir)
} else if (prTarget === 'base') {
exitCode = runQaRecord('before', diffFile, outputDir)
} else {
exitCode = runQaRecord('after', diffFile, outputDir)
}
exit(exitCode, outputDir)
}
// ── PR both phases ──
function runPrBoth(diffFile: string, outputDir: string): number {
console.warn('\n=== Phase 1: Reproduce bug on base ===')
const baseDir = resolve(outputDir, 'base')
mkdirSync(baseDir, { recursive: true })
const baseCode = runQaRecord('before', diffFile, baseDir)
if (baseCode !== 0) {
console.warn('Base phase failed, continuing to head...')
}
console.warn('\n=== Phase 2: Demonstrate fix on head ===')
const headDir = resolve(outputDir, 'head')
mkdirSync(headDir, { recursive: true })
return runQaRecord('after', diffFile, headDir)
}
// ── Target resolution ──
function resolveTarget(input: string): {
targetType: TargetType
number: string
repo: string
} {
const urlMatch = input.match(
/github\.com\/([^/]+\/[^/]+)\/(issues|pull)\/(\d+)/
)
if (urlMatch) {
return {
repo: urlMatch[1],
targetType: urlMatch[2] === 'pull' ? 'pr' : 'issue',
number: urlMatch[3]
}
}
if (/^\d+$/.test(input)) {
return {
repo: DEFAULT_REPO,
targetType: detectType(input, DEFAULT_REPO),
number: input
}
}
console.error(`Cannot parse target: ${input}`)
console.error('Expected a GitHub URL or issue/PR number')
printUsage()
process.exit(1)
}
function detectType(number: string, repo: string): TargetType {
try {
const result = execSync(
`gh api repos/${repo}/issues/${number} --jq 'has("pull_request")'`,
{ encoding: 'utf-8', timeout: 15000, stdio: ['pipe', 'pipe', 'pipe'] }
)
return result.trim() === 'true' ? 'pr' : 'issue'
} catch {
return 'issue'
}
}
// ── Data fetching ──
function fetchIssue(number: string, repo: string, outputDir: string): string {
console.warn(`Fetching issue #${number}...`)
const body = shell(
`gh issue view ${number} --repo ${repo} --json title,body,labels --jq '"Title: " + .title + "\\n\\nLabels: " + ([.labels[].name] | join(", ")) + "\\n\\n" + .body'`
)
return writeTmpFile(outputDir, `issue-${number}.txt`, body)
}
function fetchPR(number: string, repo: string, outputDir: string): string {
console.warn(`Fetching PR #${number}...`)
const prJson = shell(
`gh pr view ${number} --repo ${repo} --json title,body,baseRefName,headRefName,baseRefOid,headRefOid`
)
const pr = JSON.parse(prJson) as {
title: string
body: string
baseRefName: string
headRefName: string
baseRefOid: string
headRefOid: string
}
console.warn(` Base: ${pr.baseRefName} (${pr.baseRefOid.slice(0, 8)})`)
console.warn(` Head: ${pr.headRefName} (${pr.headRefOid.slice(0, 8)})`)
let diff = ''
try {
diff = shell(`gh pr diff ${number} --repo ${repo}`)
} catch {
console.warn('Could not fetch PR diff')
}
writeTmpFile(
outputDir,
'refs.json',
JSON.stringify(
{
base: { ref: pr.baseRefName, sha: pr.baseRefOid },
head: { ref: pr.headRefName, sha: pr.headRefOid }
},
null,
2
)
)
return writeTmpFile(
outputDir,
`pr-${number}.txt`,
`Title: ${pr.title}\n\n${pr.body}\n\n--- DIFF ---\n\n${diff}`
)
}
// ── QA record runner ──
function runQaRecord(
mode: string,
diffFile: string,
outputDir: string
): number {
console.warn(`\nStarting QA ${mode} mode...\n`)
const r = spawnSync(
'pnpm',
[
'exec',
'tsx',
RECORD_SCRIPT,
'--mode',
mode,
'--diff',
diffFile,
'--output-dir',
outputDir,
'--url',
resolvedServerUrl
],
{ stdio: 'inherit', env: process.env }
)
return r.status ?? 1
}
// ── Server management ──
async function ensureServer(url: string): Promise<string> {
if (await isReachable(url)) {
console.warn(`Server OK: ${url}`)
return url
}
console.warn(`Server not reachable at ${url}, attempting auto-start...`)
const port = new URL(url).port || '8188'
// Strategy 1: comfy-cli (pip install comfy-cli)
try {
execSync('which comfy', { stdio: 'pipe' })
console.warn('Starting ComfyUI via comfy-cli...')
const proc = spawn(
'comfy',
['launch', '--background', '--', '--cpu', '--port', port],
{
stdio: 'ignore',
detached: true
}
)
proc.unref()
await waitForServer(url, 120000)
return url
} catch {
// comfy-cli not available
}
// Strategy 2: python main.py from TEST_COMFYUI_DIR or .comfy-qa/ComfyUI
const comfyDir = findComfyUIDir()
if (comfyDir) {
console.warn(`Starting ComfyUI from ${comfyDir}...`)
const proc = spawn('python', ['main.py', '--cpu', '--port', port], {
cwd: comfyDir,
stdio: 'ignore',
detached: true
})
proc.unref()
await waitForServer(url, 120000)
return url
}
// Strategy 3: clone ComfyUI and start
const cloneDir = resolve('.comfy-qa/ComfyUI')
if (!existsSync(resolve(cloneDir, 'main.py'))) {
console.warn('No ComfyUI installation found, cloning...')
try {
execSync(
`git clone --depth 1 https://github.com/comfyanonymous/ComfyUI.git "${cloneDir}"`,
{ stdio: 'inherit', timeout: 120000 }
)
console.warn('Installing ComfyUI dependencies...')
execSync('pip install -r requirements.txt', {
cwd: cloneDir,
stdio: 'inherit',
timeout: 300000
})
} catch (err) {
console.warn(
`Clone/install failed: ${err instanceof Error ? err.message : err}`
)
}
}
if (existsSync(resolve(cloneDir, 'main.py'))) {
console.warn(`Starting ComfyUI from ${cloneDir}...`)
const proc = spawn('python', ['main.py', '--cpu', '--port', port], {
cwd: cloneDir,
stdio: 'ignore',
detached: true
})
proc.unref()
await waitForServer(url, 120000)
return url
}
// Strategy 4: fallback to testcloud
console.warn(`Local server failed. Falling back to ${CLOUD_FALLBACK_URL}`)
if (await isReachable(CLOUD_FALLBACK_URL)) {
console.warn(`Cloud server OK: ${CLOUD_FALLBACK_URL}`)
return CLOUD_FALLBACK_URL
}
console.error(`
No ComfyUI server available. Tried:
1. ${url} (not reachable)
2. comfy-cli (not installed)
3. Local ComfyUI installation (not found)
4. ${CLOUD_FALLBACK_URL} (not reachable)
Install: pip install comfy-cli && comfy install && comfy launch --cpu
`)
process.exit(1)
}
function findComfyUIDir(): string | undefined {
const candidates = [
process.env.TEST_COMFYUI_DIR,
resolve('.comfy-qa/ComfyUI'),
'/home/ComfyUI'
]
return candidates.find((d) => d && existsSync(resolve(d, 'main.py')))
}
async function isReachable(url: string): Promise<boolean> {
try {
const controller = new AbortController()
const timeout = setTimeout(() => controller.abort(), 5000)
const res = await fetch(url, { signal: controller.signal })
clearTimeout(timeout)
return res.ok || res.status === 200 || res.status === 304
} catch {
return false
}
}
async function waitForServer(url: string, timeoutMs: number): Promise<void> {
const start = Date.now()
while (Date.now() - start < timeoutMs) {
if (await isReachable(url)) {
console.warn('Server is ready')
return
}
await new Promise((r) => setTimeout(r, 2000))
}
console.error(`Server did not start within ${timeoutMs / 1000}s`)
process.exit(1)
}
// ── Utilities ──
function shell(cmd: string): string {
return execSync(cmd, { encoding: 'utf-8', timeout: 30000 })
}
function writeTmpFile(
outputDir: string,
filename: string,
content: string
): string {
const tmpDir = resolve(outputDir, '.tmp')
mkdirSync(tmpDir, { recursive: true })
const filePath = resolve(tmpDir, filename)
writeFileSync(filePath, content)
return filePath
}
function resolveOutputDir(defaultPath: string): string {
const dir = values.output ? resolve(values.output) : resolve(defaultPath)
mkdirSync(dir, { recursive: true })
return dir
}
function logHeader(opts: { label: string; outputDir: string; extra?: string }) {
console.warn(`QA target: ${opts.label}`)
console.warn(`Output: ${opts.outputDir}`)
console.warn(`Server: ${resolvedServerUrl}`)
if (values.ref) console.warn(`Ref: ${values.ref}`)
if (opts.extra) console.warn(opts.extra)
}
function exit(code: number, outputDir: string): never {
console.warn('\n=== QA Complete ===')
console.warn(`Results: ${outputDir}`)
try {
console.warn(shell(`ls -la "${outputDir}"`))
} catch {
// not critical
}
process.exit(code)
}
function tryParseArgs() {
try {
const parsed = parseArgs({
args: process.argv.slice(2),
options: {
target: { type: 'string', short: 't', default: 'head' },
uncommitted: { type: 'boolean', default: false },
url: { type: 'string', default: '' },
ref: { type: 'string', default: '' },
output: { type: 'string', short: 'o', default: '' },
help: { type: 'boolean', short: 'h', default: false }
},
allowPositionals: true,
strict: true
})
return {
values: parsed.values as {
target: string
uncommitted: boolean
url: string
ref: string
output: string
help: boolean
},
positionals: parsed.positionals
}
} catch (err) {
console.error(`Error: ${err instanceof Error ? err.message : err}\n`)
printUsage()
process.exit(1)
}
}
function printUsage() {
console.warn(`
QA CLI — Reproduce issues & test PRs for ComfyUI frontend
Usage:
pnpm qa <number|url> [options]
pnpm qa --uncommitted
Targets:
10253 Number (auto-detects issue vs PR via gh CLI)
https://github.com/Comfy-Org/ComfyUI_frontend/issues/10253
https://github.com/Comfy-Org/ComfyUI_frontend/pull/10270
Options:
-t, --target <head|base|both>
For PRs: which ref to test (default: head)
head — test the fix (PR head)
base — reproduce the bug (PR base)
both — base then head
--uncommitted Test local uncommitted changes
--url <url> ComfyUI server URL (default: from .env or http://127.0.0.1:8188)
--ref <ref> Git ref to test against
-o, --output <dir> Override output directory (default: .comfy-qa/<number>)
-h, --help Show this help
Environment (auto-loaded from .env.local or .env):
GEMINI_API_KEY Required — used for PR analysis, video review, TTS
ANTHROPIC_API_KEY Optional locally — Claude Agent SDK auto-detects Claude Code session
Examples:
pnpm qa 10253 # reproduce an issue
pnpm qa 10270 # test PR head (the fix)
pnpm qa 10270 -t base # reproduce bug on PR base
pnpm qa 10270 -t both # test base + head
pnpm qa --uncommitted # test local changes
`)
}

@@ -0,0 +1,246 @@
---
name: hardening-flaky-e2e-tests
description: 'Diagnoses and fixes flaky Playwright e2e tests by replacing race-prone patterns with retry-safe alternatives. Use when triaging CI flakes, hardening spec files, fixing timing races, or asked to stabilize browser tests. Triggers on: flaky, flake, harden, stabilize, race condition in e2e, intermittent failure.'
---
# Hardening Flaky E2E Tests
Fix flaky Playwright specs by identifying race-prone patterns and replacing them with retry-safe alternatives. This skill covers diagnosis, pattern matching, and mechanical transforms — not writing new tests (see `writing-playwright-tests` for that).
## Workflow
### 1. Gather CI Evidence
```bash
gh run list --workflow=ci-test.yaml --limit=5
gh run download <run-id> -n playwright-report
```
- Open `report.json` and search for `"status": "flaky"` entries.
- Collect file paths, test titles, and error messages.
- Do NOT trust green checks alone — flaky tests that passed on retry still need fixing.
- Use `error-context.md`, traces, and page snapshots before editing code.
- Pull the newest run after each push instead of assuming the flaky set is unchanged.
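To pull those entries out of a large report without scrolling, a small script can walk the suite tree. The sketch below assumes `report.json` was produced by Playwright's built-in `json` reporter (nested `suites`, each with `specs`, each with `tests` carrying a final `status`). The interface names and `collectFlaky` are illustrative, not part of this repo.

```typescript
// Minimal shapes of the Playwright JSON reporter output (only the fields read here).
interface ReportTest {
  projectName: string
  status: 'expected' | 'unexpected' | 'flaky' | 'skipped'
}
interface ReportSpec {
  title: string
  file: string
  line: number
  tests: ReportTest[]
}
interface ReportSuite {
  title: string
  specs: ReportSpec[]
  suites?: ReportSuite[]
}
interface Report {
  suites: ReportSuite[]
}

// Walks suites depth-first and returns "file:line › title [project]" for every
// test whose final status is "flaky" (failed at least once, then passed on retry).
function collectFlaky(report: Report): string[] {
  const out: string[] = []
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs) {
      for (const test of spec.tests) {
        if (test.status === 'flaky') {
          out.push(`${spec.file}:${spec.line} › ${spec.title} [${test.projectName}]`)
        }
      }
    }
    // Nested describe() blocks show up as child suites
    for (const child of suite.suites ?? []) visit(child)
  }
  report.suites.forEach(visit)
  return out
}
```

Usage, assuming the file sits in the current directory: `collectFlaky(JSON.parse(readFileSync('report.json', 'utf-8')))`, with `readFileSync` imported from `node:fs`. Only the fields the function reads are typed, so the report's other properties are simply ignored.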
### 2. Classify the Flake
Read the failing assertion and match it against the pattern table. Most flakes fall into one of these categories:
| # | Pattern | Signature in Code | Fix |
| --- | ------------------------------------- | --------------------------------------------------------- | ---------------------------------------------------------------- |
| 1 | **Snapshot-then-assert** | `expect(await evaluate()).toBe(x)` | `await expect.poll(() => evaluate()).toBe(x)` |
| 2 | **Immediate count** | `const n = await loc.count(); expect(n).toBe(3)` | `await expect(loc).toHaveCount(3)` |
| 3 | **nextFrame after menu click** | `clickMenuItem(x); nextFrame()` | `clickMenuItem(x); contextMenu.waitForHidden()` |
| 4 | **Tight poll timeout** | `expect.poll(..., { timeout: 250 })` | ≥2000 ms; prefer default 5000 ms |
| 5 | **Immediate evaluate after mutation** | `setSetting(k, v); expect(await evaluate()).toBe(x)` | `await expect.poll(() => evaluate()).toBe(x)` |
| 6 | **Screenshot without readiness** | `loadWorkflow(); nextFrame(); toHaveScreenshot()` | `waitForNodes()` or poll state first |
| 7 | **Non-deterministic node order** | `getNodeRefsByType('X')[0]` with >1 match | `getNodeRefById(id)` or guard `toHaveLength(1)` |
| 8 | **Fake readiness helper** | Helper clicks but doesn't assert state | Remove; poll the actual value |
| 9 | **Immediate graph state after drop** | `expect(await getLinkCount()).toBe(1)` | `await expect.poll(() => getLinkCount()).toBe(1)` |
| 10 | **Immediate boundingBox/layout read** | `const box = await loc.boundingBox(); expect(box!.width).toBe(x)` | `await expect.poll(() => loc.boundingBox().then((b) => b?.width)).toBe(x)` |
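Rows 1, 4, 5 and 9 share one root cause: the value is read once, immediately after the action, instead of being re-read until it settles. The retry loop that `expect.poll` adds can be sketched without Playwright as follows. This is an approximation for explanation only: `pollUntil`, the fixed 100 ms interval and the `Object.is` comparison (which mirrors `toBe`) are assumptions, and Playwright's real poller uses its own interval schedule and whichever matcher you chain.

```typescript
// Sketch: re-read an async value until it equals `expected` (Object.is, like toBe)
// or the deadline passes. Not Playwright's implementation.
async function pollUntil<T>(
  read: () => Promise<T>,
  expected: T,
  { timeoutMs = 5000, intervalMs = 100 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  let last: T = await read()
  while (!Object.is(last, expected)) {
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms; last value: ${String(last)}`)
    }
    // Wait before the next read so the app gets time to settle
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
    last = await read()
  }
  return last
}
```

The important part is that `read()` runs inside the loop, so a link count that updates a few frames after a drop still passes, while a 250 ms budget (row 4) can expire before a slow CI runner renders the change.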
### 3. Apply the Transform
#### Rule: Choose the Smallest Correct Assertion
- **Locator state** → use built-in retrying assertions: `toBeVisible()`, `toHaveText()`, `toHaveCount()`, `toHaveClass()`
- **Single async value** → `expect.poll(() => asyncFn()).toBe(expected)`
- **Multiple assertions that must settle together** → `expect(async () => { ... }).toPass()`
- **Never** use `waitForTimeout()` to hide a race.
```typescript
// ✅ Single value — use expect.poll
await expect
  .poll(() => comfyPage.page.evaluate(() => window.app!.graph.links.length))
  .toBe(3)
// ✅ Locator count — use toHaveCount
await expect(comfyPage.page.locator('.dom-widget')).toHaveCount(2)
// ✅ Multiple conditions — use toPass
await expect(async () => {
  expect(await node1.getValue()).toBe('foo')
  expect(await node2.getValue()).toBe('bar')
}).toPass({ timeout: 5000 })
```
#### Rule: Wait for the Real Readiness Boundary
Visible is not always ready. Prefer user-facing assertions when possible; poll internal state only when there is no UI surface to assert on.
Common readiness boundaries:
| After this action... | Wait for... |
| -------------------------------------- | ------------------------------------------------------------ |
| Canvas interaction (drag, click node) | `await comfyPage.nextFrame()` |
| Menu item click | `await contextMenu.waitForHidden()` |
| Workflow load | `await comfyPage.workflow.loadWorkflow(...)` (built-in wait) |
| Settings write | Poll the setting value with `expect.poll()` |
| Node pin/bypass/collapse toggle | `await expect.poll(() => nodeRef.isPinned()).toBe(true)` |
| Graph mutation (add/remove node, link) | Poll link/node count |
| Clipboard write | Poll pasted value |
| Screenshot | Ensure nodes are rendered: `waitForNodes()` or poll state |
#### Rule: Expose Locators for Retrying Assertions
When a helper returns a count via `await loc.count()`, callers can't use `toHaveCount()`. Replace such count methods with getters that expose the underlying `Locator`, so callers can use retrying assertions directly:
```typescript
// Helper exposes locator
get domWidgets(): Locator {
  return this.page.locator('.dom-widget')
}
// Caller uses retrying assertion
await expect(comfyPage.domWidgets).toHaveCount(2)
```
#### Rule: Fix Check-then-Act Races in Helpers
```typescript
// ❌ Race: count can change between check and waitFor
const count = await locator.count()
if (count > 0) {
  await locator.waitFor({ state: 'hidden' })
}
// ✅ Direct: waitFor handles both cases
await locator.waitFor({ state: 'hidden' })
```
#### Rule: Remove force:true from Clicks
`force: true` bypasses actionability checks, hiding real animation/visibility races. Remove it and fix the underlying timing issue.
```typescript
// ❌ Hides the race
await closeButton.click({ force: true })
// ✅ Surfaces the real issue — fix with proper wait
await closeButton.click()
await dialog.waitForHidden()
```
#### Rule: Handle Non-deterministic Element Order
When `getNodeRefsByType` returns multiple nodes, the order is not guaranteed. Don't use index `[0]` blindly.
```typescript
// ❌ Assumes order
const node = (await comfyPage.nodeOps.getNodeRefsByType('CLIPTextEncode'))[0]
// ✅ Pick the node nearest an expected position (or look it up by ID when known)
const nodes = await comfyPage.nodeOps.getNodeRefsByType('CLIPTextEncode')
let target = nodes[0]
let minDist = Infinity
for (const n of nodes) {
  const pos = await n.getPosition()
  const dist = Math.abs(pos.y - expectedY) // expectedY: where the test placed the node
  if (dist < minDist) {
    minDist = dist
    target = n
  }
}
```
Or guard the assumption:
```typescript
const nodes = await comfyPage.nodeOps.getNodeRefsByType('CLIPTextEncode')
expect(nodes).toHaveLength(1)
const node = nodes[0]
```
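If a spec does need the proximity search, keeping the choice in a pure function makes it deterministic and unit-testable. A sketch, assuming the positions have already been read (for example via each node ref's `getPosition()`); `pickClosestByY` and `Positioned` are hypothetical names, not existing fixture helpers.

```typescript
// Hypothetical helper: a candidate paired with its already-resolved y-coordinate.
interface Positioned<T> {
  item: T
  y: number
}

// Returns the item whose y is nearest to expectedY; ties keep the earlier candidate.
function pickClosestByY<T>(candidates: Positioned<T>[], expectedY: number): T {
  if (candidates.length === 0) {
    throw new Error('No candidates to choose from')
  }
  let best = candidates[0]
  let bestDist = Math.abs(best.y - expectedY)
  for (const c of candidates.slice(1)) {
    const dist = Math.abs(c.y - expectedY)
    if (dist < bestDist) {
      best = c
      bestDist = dist
    }
  }
  return best.item
}
```

Ties resolve to the earlier candidate, and since that order is exactly what is non-deterministic here, prefer the `toHaveLength(1)` guard when two nodes can be equally close. Usage: `pickClosestByY(await Promise.all(nodes.map(async (n) => ({ item: n, y: (await n.getPosition()).y }))), expectedY)`.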
#### Rule: Use toPass for Timing-sensitive Dismiss Guards
Some UI elements (e.g. LiteGraph's graphdialog) have built-in dismiss delays. Retry the entire dismiss action:
```typescript
// ✅ Retry click+assert together
await expect(async () => {
  await comfyPage.canvas.click({ position: { x: 10, y: 10 } })
  await expect(dialog).toBeHidden({ timeout: 500 })
}).toPass({ timeout: 5000 })
```
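Unlike `expect.poll`, `toPass` re-runs the action together with the assertion, so a click swallowed by the dismiss delay is simply retried on the next attempt. A minimal sketch of that retry shape, for explanation only (`retryBlock` and its fixed interval are assumptions, not Playwright's implementation):

```typescript
// Sketch of toPass()-style retrying: re-run the whole async block (action + assertion)
// until it completes without throwing; once the deadline passes, rethrow the last error.
async function retryBlock(
  block: () => Promise<void>,
  { timeoutMs = 5000, intervalMs = 100 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    try {
      await block()
      return
    } catch (err) {
      if (Date.now() >= deadline) throw err
      // Back off briefly, then retry the action as well as the assertion
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
    }
  }
}
```

Keep the inner assertion timeout short (500 ms above) so each attempt fails fast and the outer loop gets several tries within its budget.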
### 4. Keep Changes Narrow
- Shared helpers should drive setup to a stable boundary.
- Do not encode one-spec timing assumptions into generic helpers.
- If a race only matters to one spec, prefer a local wait in that spec.
- If a helper fails before the real test begins, remove or relax the brittle precondition and let downstream UI interaction prove readiness.
### 5. Verify Narrowly
```bash
# Targeted rerun with repetition
pnpm test:browser:local -- browser_tests/tests/myFile.spec.ts --repeat-each 10
# Single test by line number (avoids grep quoting issues on Windows)
pnpm test:browser:local -- browser_tests/tests/myFile.spec.ts:42
```
- Use `--repeat-each 10` for targeted flake verification (use 20 for single test cases).
- Verify with the smallest command that exercises the flaky path.
### 6. Watch CI E2E Runs
After pushing, use `gh` to monitor the E2E workflow:
```bash
# Find the run for the current branch
gh run list --workflow="CI: Tests E2E" --branch=$(git branch --show-current) --limit=1
# Watch it live (blocks until complete, streams logs)
gh run watch <run-id>
# One-liner: find and watch the latest E2E run for the current branch
gh run list --workflow="CI: Tests E2E" --branch=$(git branch --show-current) --limit=1 --json databaseId --jq ".[0].databaseId" | xargs gh run watch
```
On Windows (PowerShell):
```powershell
# One-liner equivalent
gh run watch (gh run list --workflow="CI: Tests E2E" --branch=$(git branch --show-current) --limit=1 --json databaseId --jq ".[0].databaseId")
```
After the run completes:
```bash
# Download the Playwright report artifact
gh run download <run-id> -n playwright-report
# View the run summary in browser
gh run view <run-id> --web
```
Also watch the unit test workflow in parallel if you changed helpers:
```bash
gh run list --workflow="CI: Tests Unit" --branch=$(git branch --show-current) --limit=1
```
### 7. Pre-merge Checklist
Before merging a flaky-test fix, confirm:
- [ ] The latest CI artifact was inspected directly
- [ ] The root cause is stated as a race or readiness mismatch
- [ ] The fix waits on the real readiness boundary
- [ ] The assertion primitive matches the job (poll vs toHaveCount vs toPass)
- [ ] The fix stays local unless a shared helper truly owns the race
- [ ] Local verification uses a targeted rerun
- [ ] No behavioral changes to the test — only timing/retry strategy updated
## Local Noise — Do Not Fix
These are local distractions, not CI root causes:
- Missing local input fixture files required by the test path
- Missing local models directory
- Teardown `EPERM` while restoring the local browser-test user data directory
- Local screenshot baseline differences on Windows
Rules:
- First confirm whether it blocks the exact flaky path under investigation.
- Do not commit temporary local assets used only for verification.
- Do not commit local screenshot baselines.

@@ -46,3 +46,11 @@ ALGOLIA_API_KEY=684d998c36b67a9a9fce8fc2d8860579
# SENTRY_ORG=comfy-org
# SENTRY_PROJECT=cloud-frontend-staging
# SENTRY_PROJECT_PROD= # prod project slug for sourcemap uploads
# ── QA Skill (scripts/qa-record.ts) ──
# Required for automated bug reproduction via `pnpm exec tsx scripts/qa-record.ts`
# GEMINI_API_KEY is required — used for PR analysis, video review, and TTS narration
GEMINI_API_KEY=
# ANTHROPIC_API_KEY is optional locally — Claude Agent SDK auto-detects Claude Code session
# Required in CI (set as GitHub Actions secret)
# ANTHROPIC_API_KEY=

@@ -1,107 +0,0 @@
# Description: When upstream comfy-api is updated, click dispatch to update the TypeScript type definitions in this repo
name: 'Api: Update Registry API Types'
on:
  # Manual trigger
  workflow_dispatch:
  # Triggered from comfy-api repo
  repository_dispatch:
    types: [comfy-api-updated]
jobs:
  update-registry-types:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
      - name: Install pnpm
        uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          node-version-file: '.nvmrc'
          cache: 'pnpm'
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
      - name: Checkout comfy-api repository
        uses: actions/checkout@v6
        with:
          repository: Comfy-Org/comfy-api
          path: comfy-api
          token: ${{ secrets.COMFY_API_PAT }}
          clean: true
      - name: Get API commit information
        id: api-info
        run: |
          cd comfy-api
          API_COMMIT=$(git rev-parse --short HEAD)
          echo "commit=${API_COMMIT}" >> $GITHUB_OUTPUT
          cd ..
      - name: Generate API types
        run: |
          echo "Generating TypeScript types from comfy-api@${{ steps.api-info.outputs.commit }}..."
          mkdir -p ./packages/registry-types/src
          pnpm dlx openapi-typescript ./comfy-api/openapi.yml --output ./packages/registry-types/src/comfyRegistryTypes.ts
      - name: Validate generated types
        run: |
          if [ ! -f ./packages/registry-types/src/comfyRegistryTypes.ts ]; then
            echo "Error: Types file was not generated."
            exit 1
          fi
          # Check if file is not empty
          if [ ! -s ./packages/registry-types/src/comfyRegistryTypes.ts ]; then
            echo "Error: Generated types file is empty."
            exit 1
          fi
      - name: Lint generated types
        run: |
          echo "Linting generated Comfy Registry API types..."
          pnpm lint:fix:no-cache -- ./packages/registry-types/src/comfyRegistryTypes.ts
      - name: Check for changes
        id: check-changes
        run: |
          if [[ -z $(git status --porcelain ./packages/registry-types/src/comfyRegistryTypes.ts) ]]; then
            echo "No changes to Comfy Registry API types detected."
            echo "changed=false" >> $GITHUB_OUTPUT
            exit 0
          else
            echo "Changes detected in Comfy Registry API types."
            echo "changed=true" >> $GITHUB_OUTPUT
          fi
      - name: Create Pull Request
        if: steps.check-changes.outputs.changed == 'true'
        uses: peter-evans/create-pull-request@c0f553fe549906ede9cf27b5156039d195d2ece0 # v8.1.0
        with:
          token: ${{ secrets.PR_GH_TOKEN }}
          commit-message: '[chore] Update Comfy Registry API types from comfy-api@${{ steps.api-info.outputs.commit }}'
          title: '[chore] Update Comfy Registry API types from comfy-api@${{ steps.api-info.outputs.commit }}'
          body: |
            ## Automated API Type Update
            This PR updates the Comfy Registry API types from the latest comfy-api OpenAPI specification.
            - API commit: ${{ steps.api-info.outputs.commit }}
            - Generated on: ${{ github.event.repository.updated_at }}
            These types are automatically generated using openapi-typescript.
          branch: update-registry-types-${{ steps.api-info.outputs.commit }}
          base: main
          labels: CNR
          delete-branch: true
          add-paths: |
            packages/registry-types/src/comfyRegistryTypes.ts

@@ -31,4 +31,5 @@ jobs:
uses: codecov/codecov-action@1af58845a975a7985b0beb0cbe6fbbb71a41dbad # v5.5.3
with:
files: coverage/lcov.info
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: false

.github/workflows/ci-website-build.yaml
@@ -0,0 +1,33 @@
# Description: Build and validate the marketing website (apps/website)
name: 'CI: Website Build'
on:
  push:
    branches: [main, master, website/*]
    paths:
      - 'apps/website/**'
      - 'packages/design-system/**'
      - 'pnpm-lock.yaml'
  pull_request:
    branches-ignore: [wip/*, draft/*, temp/*]
    paths:
      - 'apps/website/**'
      - 'packages/design-system/**'
      - 'pnpm-lock.yaml'
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Setup frontend
        uses: ./.github/actions/setup-frontend
      - name: Build website
        run: pnpm --filter @comfyorg/website build

@@ -1,182 +0,0 @@
name: Hub CI
on:
  push:
    branches: [main]
    paths:
      - 'apps/hub/**'
      - '.github/workflows/hub-ci.yaml'
  pull_request:
    branches: [main]
    paths:
      - 'apps/hub/**'
      - '.github/workflows/hub-ci.yaml'
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  pull-requests: write
jobs:
  lint:
    name: Lint & Check
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: apps/hub
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Setup frontend
        uses: ./.github/actions/setup-frontend
      - name: Astro Check
        run: pnpm run check
      - name: Unit Tests
        run: pnpm test
      - name: Validate Templates
        run: pnpm run validate:templates
        continue-on-error: true
  build:
    name: Build Hub
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: apps/hub
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Setup frontend
        uses: ./.github/actions/setup-frontend
      - name: Build site
        run: pnpm run build
        env:
          HUB_SKIP_SYNC: 'true'
          SKIP_AI_GENERATION: 'true'
      - name: Upload build artifact
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        with:
          name: hub-build
          path: apps/hub/dist
          retention-days: 1
  seo-audit:
    name: SEO Audit
    needs: build
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: apps/hub
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Setup frontend
        uses: ./.github/actions/setup-frontend
      - name: Download build artifact
        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
        with:
          name: hub-build
          path: apps/hub/dist
      - name: Validate sitemap
        id: sitemap
        continue-on-error: true
        run: |
          echo "## Sitemap Validation" >> $GITHUB_STEP_SUMMARY
          if pnpm run validate:sitemap 2>&1 | tee sitemap-output.txt; then
            echo "✅ Sitemap validation passed" >> $GITHUB_STEP_SUMMARY
            echo "status=passed" >> $GITHUB_OUTPUT
          else
            echo "❌ Sitemap validation failed" >> $GITHUB_STEP_SUMMARY
            echo "status=failed" >> $GITHUB_OUTPUT
          fi
      - name: Run SEO audit
        id: seo
        continue-on-error: true
        run: |
          echo "## SEO Audit" >> $GITHUB_STEP_SUMMARY
          if pnpm run audit:seo 2>&1 | tee seo-output.txt; then
            echo "✅ SEO audit passed" >> $GITHUB_STEP_SUMMARY
            echo "status=passed" >> $GITHUB_OUTPUT
          else
            echo "⚠️ SEO audit found issues" >> $GITHUB_STEP_SUMMARY
            echo "status=issues" >> $GITHUB_OUTPUT
          fi
      - name: Check internal links
        id: links
        continue-on-error: true
        run: |
          echo "## Link Check" >> $GITHUB_STEP_SUMMARY
          DIST_DIR="dist"
          if [ ! -d "$DIST_DIR" ]; then
            echo "⚠️ No build output found at $DIST_DIR" >> $GITHUB_STEP_SUMMARY
            echo "status=skipped" >> $GITHUB_OUTPUT
            exit 0
          fi
          BROKEN_FILE="broken-links.txt"
          : > "$BROKEN_FILE"
          BROKEN_COUNT=0
          TOTAL_COUNT=0
          for htmlfile in $(find "$DIST_DIR" -name '*.html' \
            -not -path "$DIST_DIR/ar/*" -not -path "$DIST_DIR/es/*" -not -path "$DIST_DIR/fr/*" \
            -not -path "$DIST_DIR/ja/*" -not -path "$DIST_DIR/ko/*" -not -path "$DIST_DIR/pt-BR/*" \
            -not -path "$DIST_DIR/ru/*" -not -path "$DIST_DIR/tr/*" -not -path "$DIST_DIR/zh/*" \
            -not -path "$DIST_DIR/zh-TW/*" | head -500); do
            hrefs=$(grep -oP 'href="(/[^"]*)"' "$htmlfile" | sed 's/href="//;s/"$//' || true)
            for href in $hrefs; do
              TOTAL_COUNT=$((TOTAL_COUNT + 1))
              clean="${href%%#*}"
              clean="${clean%%\?*}"
              if [ -z "$clean" ] || [ "$clean" = "/" ]; then continue; fi
              found=false
              if [[ "$clean" =~ \.[a-zA-Z0-9]+$ ]]; then
                [ -f "${DIST_DIR}${clean}" ] && found=true
              else
                base="${clean%/}"
                [ -f "${DIST_DIR}${base}/index.html" ] && found=true
                [ "$found" = false ] && [ -f "${DIST_DIR}${base}.html" ] && found=true
                [ "$found" = false ] && [ -f "${DIST_DIR}${clean}" ] && found=true
                [ "$found" = false ] && [ -d "${DIST_DIR}${base}" ] && found=true
              fi
              if [ "$found" = false ]; then
                BROKEN_COUNT=$((BROKEN_COUNT + 1))
                echo "- \`${href}\` (in ${htmlfile#${DIST_DIR}/})" >> "$BROKEN_FILE"
              fi
            done
          done
          if [ "$BROKEN_COUNT" -eq 0 ]; then
echo "✅ All internal links valid ($TOTAL_COUNT checked)" >> $GITHUB_STEP_SUMMARY
echo "status=passed" >> $GITHUB_OUTPUT
else
echo "❌ Found $BROKEN_COUNT broken internal links out of $TOTAL_COUNT" >> $GITHUB_STEP_SUMMARY
head -n 50 "$BROKEN_FILE" >> $GITHUB_STEP_SUMMARY
echo "status=failed" >> $GITHUB_OUTPUT
fi
- name: Upload SEO reports
if: always()
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: hub-seo-reports
path: |
apps/hub/seo-output.txt
apps/hub/seo-summary.json
apps/hub/broken-links.txt
if-no-files-found: ignore
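The "Check internal links" step above maps each root-relative `href` onto files in `dist/`. The sketch below restates those resolution rules in TypeScript. It assumes the build output has already been collected into two sets of `dist`-relative paths (`files` and `dirs`, hypothetical inputs; the shell step probes the filesystem with `[ -f ]` and `[ -d ]` instead).

```typescript
// Sketch of the "Check internal links" rules as pure functions.
// Assumption: `files` and `dirs` hold paths relative to dist/, each with a
// leading "/" (e.g. "/workflows/index.html", "/workflows").

/** Collects root-relative hrefs, like `grep -oP 'href="(/[^"]*)"'`. */
function extractRootRelativeHrefs(html: string): string[] {
  const re = /href="(\/[^"]*)"/g
  const hrefs: string[] = []
  let m: RegExpExecArray | null
  while ((m = re.exec(html)) !== null) hrefs.push(m[1])
  return hrefs
}

/**
 * Resolves one href against the build output.
 * Returns "skip" for links the shell loop ignores ("" or "/" after cleanup).
 */
function resolveInternalHref(
  href: string,
  files: ReadonlySet<string>,
  dirs: ReadonlySet<string>
): boolean | 'skip' {
  // Strip the #fragment, then the ?query (same order as the shell step).
  const clean = href.split('#')[0].split('?')[0]
  if (clean === '' || clean === '/') return 'skip'
  // A trailing ".<alnum>" is treated as a file extension: exact file lookup.
  if (/\.[a-zA-Z0-9]+$/.test(clean)) return files.has(clean)
  // Otherwise try <base>/index.html, <base>.html, the raw path, then a directory.
  const base = clean.endsWith('/') ? clean.slice(0, -1) : clean
  return (
    files.has(`${base}/index.html`) ||
    files.has(`${base}.html`) ||
    files.has(clean) ||
    dirs.has(base)
  )
}
```

Two properties of the shell step carry over. Any path ending in `.<alnum>` is treated as a file, so an extensionless page whose slug happens to end in something like `.1` would be looked up as a file. The `href="(/[^"]*)"` pattern also captures protocol-relative URLs such as `//cdn.example.com/x`, which would then be reported as broken internal links.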

@@ -1,68 +0,0 @@
name: Hub Cron Rebuild
on:
schedule:
# Every 15 minutes — rebuilds the site to pick up new UGC workflows
# for search index, sitemap, filter pages, and pre-rendered detail pages.
- cron: '*/15 * * * *'
workflow_dispatch:
concurrency:
group: hub-deploy-prod
cancel-in-progress: false
permissions:
contents: read
jobs:
rebuild:
runs-on: ubuntu-latest
env:
SKIP_AI_GENERATION: 'true'
PUBLIC_POSTHOG_KEY: ${{ secrets.HUB_POSTHOG_KEY }}
PUBLIC_GA_MEASUREMENT_ID: ${{ secrets.HUB_GA_MEASUREMENT_ID }}
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Checkout templates data
uses: actions/checkout@v6
with:
repository: Comfy-Org/workflow_templates
path: _workflow_templates
sparse-checkout: templates
token: ${{ secrets.GH_TOKEN }}
- name: Restore content cache
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: apps/hub/.content-cache
key: hub-content-cache-cron-prod-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
restore-keys: |
hub-content-cache-cron-prod-
- name: Sync templates
run: pnpm run sync
working-directory: apps/hub
env:
HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
- name: Build Astro site
run: pnpm run build
working-directory: apps/hub
env:
PUBLIC_HUB_API_URL: ${{ secrets.HUB_API_URL_PRODUCTION }}
PUBLIC_COMFY_CLOUD_URL: ${{ secrets.COMFY_CLOUD_URL_PRODUCTION }}
PUBLIC_APPROVED_ONLY: 'true'
- name: Deploy to Vercel
uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
with:
vercel-token: ${{ secrets.VERCEL_TOKEN }}
vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
working-directory: apps/hub
vercel-args: '--prebuilt --prod'
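The hub workflows restore `apps/hub/.content-cache` with an exact `key` built from `hashFiles(...)` plus a shorter `restore-keys` prefix. The following is a simplified model of that lookup based on the documented `actions/cache` behavior: an exact key hit wins; otherwise each restore key is tried in order as a prefix, and among several prefix matches the most recently created cache is used. It ignores cache versions and branch scoping, and `CacheEntry` is a hypothetical shape, not the action's API.

```typescript
// Simplified model of actions/cache lookup (ignores cache versions and the
// current-branch / default-branch scoping). CacheEntry is a hypothetical shape.
interface CacheEntry {
  key: string
  createdAt: number // creation time, larger = newer
}

function selectCache(
  key: string,
  restoreKeys: readonly string[],
  entries: readonly CacheEntry[]
): CacheEntry | undefined {
  // 1. Exact hit on the primary key (hashFiles() of templates + src).
  const exact = entries.find((e) => e.key === key)
  if (exact) return exact
  // 2. Each restore key, in order, as a prefix; the newest match wins.
  for (const prefix of restoreKeys) {
    const matches = entries.filter((e) => e.key.startsWith(prefix))
    if (matches.length > 0) {
      return matches.reduce((newest, e) => (e.createdAt > newest.createdAt ? e : newest))
    }
  }
  return undefined // cache miss: nothing is restored
}
```

Because any change under `_workflow_templates/templates/` or `apps/hub/src/` produces a new primary key, the restore-key fallback lets each rebuild start from the newest previous cache instead of from scratch.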

@@ -1,80 +0,0 @@
name: Deploy Hub
on:
workflow_dispatch:
inputs:
skip_ai:
description: 'Skip AI content generation'
type: boolean
default: false
force_regenerate:
description: 'Force regenerate all content (ignore cache)'
type: boolean
default: false
template_filter:
description: 'Regenerate specific template only (e.g. "flux_schnell")'
type: string
default: ''
concurrency:
group: hub-deploy-prod
cancel-in-progress: false
permissions:
contents: read
jobs:
build-deploy:
runs-on: ubuntu-latest
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
PUBLIC_POSTHOG_KEY: ${{ secrets.HUB_POSTHOG_KEY }}
PUBLIC_GA_MEASUREMENT_ID: ${{ secrets.HUB_GA_MEASUREMENT_ID }}
SKIP_AI_GENERATION: ${{ inputs.skip_ai && 'true' || '' }}
FORCE_AI_REGENERATE: ${{ inputs.force_regenerate && 'true' || '' }}
AI_TEMPLATE_FILTER: ${{ inputs.template_filter }}
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Checkout templates data
uses: actions/checkout@v6
with:
repository: Comfy-Org/workflow_templates
path: _workflow_templates
sparse-checkout: templates
token: ${{ secrets.GH_TOKEN }}
- name: Restore content cache
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: apps/hub/.content-cache
key: hub-content-cache-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
restore-keys: |
hub-content-cache-
- name: Sync templates
run: pnpm run sync
working-directory: apps/hub
env:
HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
- name: Build Astro site
run: pnpm run build
working-directory: apps/hub
env:
PUBLIC_HUB_API_URL: ${{ secrets.HUB_API_URL_PRODUCTION }}
PUBLIC_COMFY_CLOUD_URL: ${{ secrets.COMFY_CLOUD_URL_PRODUCTION }}
PUBLIC_APPROVED_ONLY: 'true'
- name: Deploy to Vercel
uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
with:
vercel-token: ${{ secrets.VERCEL_TOKEN }}
vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
working-directory: apps/hub
vercel-args: '--prebuilt --prod'
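The job-level `env` block of the Deploy Hub workflow uses `cond && 'true' || ''` as a ternary. GitHub Actions expressions short-circuit like JavaScript's `&&`/`||`, returning an operand rather than a boolean, so the idiom only behaves as a ternary when the middle operand is truthy. A minimal TypeScript model follows; JavaScript truthiness is close to, but not identical with, the Actions rules, and `actionsTernary` and `deployEnv` are hypothetical names.

```typescript
// Model of the `${{ cond && a || b }}` idiom with JavaScript's short-circuit
// operators. Hypothetical helper, not part of any Actions API.
function actionsTernary(cond: boolean, a: string, b: string): string {
  return (cond && a) || b
}

interface DeployInputs {
  skip_ai: boolean
  force_regenerate: boolean
  template_filter: string
}

// Mirrors the job-level env block of the Deploy Hub workflow.
function deployEnv(inputs: DeployInputs): Record<string, string> {
  return {
    SKIP_AI_GENERATION: actionsTernary(inputs.skip_ai, 'true', ''),
    FORCE_AI_REGENERATE: actionsTernary(inputs.force_regenerate, 'true', ''),
    AI_TEMPLATE_FILTER: inputs.template_filter
  }
}

// Pitfall: a falsy middle operand (e.g. an unset secret, which evaluates to '')
// makes the expression return the fallback even though the condition is true.
const pickedUrl = actionsTernary(true, '', 'https://production.example') // 'https://production.example'
```

Here the middle operand is the literal `'true'`, so the idiom is safe. The preview cron workflow uses the same pattern to choose API URLs (`matrix.api_env == 'test' && secrets.HUB_API_URL_PREVIEW || secrets.HUB_API_URL_PRODUCTION`); there, an empty or missing preview secret would silently fall through to the production URL.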

@@ -1,134 +0,0 @@
name: Hub Preview Cron
on:
schedule:
- cron: '*/15 * * * *'
workflow_dispatch:
permissions:
contents: read
pull-requests: write
jobs:
discover:
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.targets.outputs.matrix }}
steps:
- uses: actions/checkout@v6
- name: Build rebuild targets
id: targets
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
targets='[]'
# Main with production API (all workflows, no approved filter)
targets=$(echo "$targets" | jq -c '. + [{"ref": "main", "is_main": true, "pr": 0, "api_env": "production"}]')
# Main with test API
targets=$(echo "$targets" | jq -c '. + [{"ref": "main", "is_main": true, "pr": 0, "api_env": "test"}]')
# Find open PRs with the "preview-cron" label
prs=$(gh pr list --label "preview-cron" --state open --json number,headRefName)
for row in $(echo "$prs" | jq -c '.[]'); do
ref=$(echo "$row" | jq -r '.headRefName')
num=$(echo "$row" | jq -r '.number')
targets=$(echo "$targets" | jq -c \
--arg ref "$ref" --argjson num "$num" \
'. + [{"ref": $ref, "is_main": false, "pr": $num, "api_env": "test"}]')
done
echo "matrix={\"include\":$targets}" >> "$GITHUB_OUTPUT"
echo "### Rebuild targets" >> "$GITHUB_STEP_SUMMARY"
echo "$targets" | jq '.' >> "$GITHUB_STEP_SUMMARY"
rebuild:
needs: discover
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix: ${{ fromJson(needs.discover.outputs.matrix) }}
concurrency:
group: hub-preview-cron-${{ matrix.ref }}-${{ matrix.api_env }}
cancel-in-progress: true
env:
SKIP_AI_GENERATION: 'true'
steps:
- name: Checkout
uses: actions/checkout@v6
with:
ref: ${{ matrix.ref }}
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Checkout templates data
uses: actions/checkout@v6
with:
repository: Comfy-Org/workflow_templates
path: _workflow_templates
sparse-checkout: templates
token: ${{ secrets.GH_TOKEN }}
- name: Restore content cache
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: apps/hub/.content-cache
key: hub-content-cache-cron-${{ matrix.ref }}-${{ matrix.api_env }}-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
restore-keys: |
hub-content-cache-cron-${{ matrix.ref }}-${{ matrix.api_env }}-
- name: Sync templates
run: pnpm run sync:en-only
working-directory: apps/hub
env:
HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
- name: Build Astro site
run: pnpm run build
working-directory: apps/hub
env:
PUBLIC_HUB_API_URL: ${{ matrix.api_env == 'test' && secrets.HUB_API_URL_PREVIEW || secrets.HUB_API_URL_PRODUCTION }}
PUBLIC_COMFY_CLOUD_URL: ${{ matrix.api_env == 'test' && secrets.COMFY_CLOUD_URL_PREVIEW || secrets.COMFY_CLOUD_URL_PRODUCTION }}
- name: Deploy to Vercel
id: deploy
uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
with:
vercel-token: ${{ secrets.VERCEL_TOKEN }}
vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
working-directory: apps/hub
vercel-args: '--prebuilt'
- name: Alias main preview (prod API)
if: matrix.is_main && matrix.api_env == 'production' && secrets.HUB_PREVIEW_ALIAS
env:
PREVIEW_URL: ${{ steps.deploy.outputs.preview-url }}
ALIAS: ${{ secrets.HUB_PREVIEW_ALIAS }}
VERCEL_TOKEN_VAL: ${{ secrets.VERCEL_TOKEN }}
VERCEL_SCOPE: ${{ secrets.VERCEL_ORG_ID }}
run: |
npx vercel alias "$PREVIEW_URL" "$ALIAS" --token="$VERCEL_TOKEN_VAL" --scope="$VERCEL_SCOPE"
- name: Alias main preview (test API)
if: matrix.is_main && matrix.api_env == 'test' && secrets.HUB_PREVIEW_TEST_ALIAS
env:
PREVIEW_URL: ${{ steps.deploy.outputs.preview-url }}
ALIAS: ${{ secrets.HUB_PREVIEW_TEST_ALIAS }}
VERCEL_TOKEN_VAL: ${{ secrets.VERCEL_TOKEN }}
VERCEL_SCOPE: ${{ secrets.VERCEL_ORG_ID }}
run: |
npx vercel alias "$PREVIEW_URL" "$ALIAS" --token="$VERCEL_TOKEN_VAL" --scope="$VERCEL_SCOPE"
- name: Comment preview URL on PR
if: matrix.pr > 0
uses: marocchino/sticky-pull-request-comment@773744901bac0e8cbb5a0dc842800d45e9b2b405 # v2.9.4
with:
number: ${{ matrix.pr }}
header: hub-preview-cron
message: |
🔄 **Hub preview cron rebuilt:** ${{ steps.deploy.outputs.preview-url }}
_Last rebuild: ${{ github.event.head_commit.timestamp || 'manual trigger' }}_
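The `discover` job of the Hub Preview Cron workflow builds its rebuild matrix with a `jq` pipeline, and the `rebuild` job expands it with `fromJson(...)`. The same construction is sketched below as a TypeScript function. `buildRebuildMatrix` is a hypothetical name, and `OpenPr` mirrors the fields requested by `gh pr list --json number,headRefName`.

```typescript
// Sketch of the matrix built by the `discover` job's jq pipeline.
// OpenPr mirrors `gh pr list --json number,headRefName` output.
interface OpenPr {
  number: number
  headRefName: string
}

interface RebuildTarget {
  ref: string
  is_main: boolean
  pr: number // 0 for the two main-branch targets
  api_env: 'production' | 'test'
}

function buildRebuildMatrix(prs: readonly OpenPr[]): { include: RebuildTarget[] } {
  // main is always rebuilt twice: once against each API environment.
  const targets: RebuildTarget[] = [
    { ref: 'main', is_main: true, pr: 0, api_env: 'production' },
    { ref: 'main', is_main: true, pr: 0, api_env: 'test' }
  ]
  // Each PR passed in (already filtered by the preview-cron label) gets one test-API rebuild.
  for (const pr of prs) {
    targets.push({ ref: pr.headRefName, is_main: false, pr: pr.number, api_env: 'test' })
  }
  return { include: targets }
}

// The job then writes one line to $GITHUB_OUTPUT, consumed via fromJson(...).
const outputLine = `matrix=${JSON.stringify(buildRebuildMatrix([{ number: 42, headRefName: 'feat/x' }]))}`
```

Each `include` entry becomes one `rebuild` job, and the per-entry concurrency group (`ref` plus `api_env`) lets a newer run cancel an older rebuild of the same target. For PRs opened from forks, `headRefName` names a branch that exists only in the fork, so checking out `ref: ${{ matrix.ref }}` in the base repository would likely fail for those entries.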

@@ -1,74 +0,0 @@
name: Hub Preview
on:
pull_request:
paths:
- 'apps/hub/**'
workflow_dispatch:
concurrency:
group: hub-preview-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
permissions:
contents: read
pull-requests: write
jobs:
preview:
runs-on: ubuntu-latest
env:
SKIP_AI_GENERATION: 'true'
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Checkout templates data
uses: actions/checkout@v6
with:
repository: Comfy-Org/workflow_templates
path: _workflow_templates
sparse-checkout: templates
token: ${{ secrets.GH_TOKEN }}
- name: Restore content cache
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: apps/hub/.content-cache
key: hub-content-cache-preview-${{ hashFiles('_workflow_templates/templates/**', 'apps/hub/src/**') }}
restore-keys: |
hub-content-cache-preview-
- name: Sync templates
run: pnpm run sync:en-only
working-directory: apps/hub
env:
HUB_TEMPLATES_DIR: ${{ github.workspace }}/_workflow_templates/templates
- name: Build Astro site
run: pnpm run build
working-directory: apps/hub
env:
PUBLIC_HUB_API_URL: ${{ secrets.HUB_API_URL_PREVIEW }}
PUBLIC_COMFY_CLOUD_URL: ${{ secrets.COMFY_CLOUD_URL_PREVIEW }}
- name: Deploy preview to Vercel
id: deploy
uses: amondnet/vercel-action@16e87c0a08142b0d0d33b76aeaf20823c381b9b9 # v25.2.0
with:
vercel-token: ${{ secrets.VERCEL_TOKEN }}
vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
vercel-project-id: ${{ secrets.HUB_VERCEL_PROJECT_ID }}
working-directory: apps/hub
vercel-args: '--prebuilt'
- name: Comment preview URL
if: github.event_name == 'pull_request'
uses: marocchino/sticky-pull-request-comment@773744901bac0e8cbb5a0dc842800d45e9b2b405 # v2.9.4
with:
header: hub-vercel-preview
message: |
🚀 **Hub preview deployed:** ${{ steps.deploy.outputs.preview-url }}

1094
.github/workflows/pr-qa.yaml vendored Normal file

File diff suppressed because it is too large.

@@ -142,6 +142,20 @@ jobs:
fi
echo "✅ Branch '$BRANCH' exists"
- name: Ensure packageManager field exists
run: |
if ! grep -q '"packageManager"' package.json; then
# Old branches (e.g. core/1.42) predate the packageManager field.
# Inject it so pnpm/action-setup can resolve the version.
node -e "
const fs = require('fs');
const pkg = JSON.parse(fs.readFileSync('package.json','utf8'));
pkg.packageManager = 'pnpm@10.33.0';
fs.writeFileSync('package.json', JSON.stringify(pkg, null, 2) + '\n');
"
echo "Injected packageManager into package.json for legacy branch"
fi
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
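The "Ensure packageManager field exists" step patches `package.json` on legacy branches before `pnpm/action-setup` runs. Its logic is restated below as a pure TypeScript function (hypothetical name `ensurePackageManager`) that operates on the file's text rather than on the file itself.

```typescript
// Mirrors the inline `node -e` script: add packageManager only when absent.
function ensurePackageManager(
  pkgJson: string,
  packageManager = 'pnpm@10.33.0'
): { text: string; injected: boolean } {
  // Same guard as `grep -q '"packageManager"' package.json`: a substring test.
  if (pkgJson.includes('"packageManager"')) return { text: pkgJson, injected: false }
  const pkg = JSON.parse(pkgJson) as Record<string, unknown>
  pkg.packageManager = packageManager
  // Re-serialize with 2-space indentation and a trailing newline, as the step does.
  return { text: JSON.stringify(pkg, null, 2) + '\n', injected: true }
}
```

Like the `grep -q` guard, the presence check is a plain substring test, so the field counts as present if the string `"packageManager"` appears anywhere in the file. Re-serializing also normalizes indentation, which is harmless for a throwaway CI checkout.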

3
.gitignore vendored

@@ -67,6 +67,9 @@ dist.zip
/temp/
/tmp/
# QA local output
/.comfy-qa/
# Generated JSON Schemas
/schemas/

@@ -12,8 +12,6 @@
"playwright-report/*",
"src/extensions/core/*",
"src/scripts/*",
"apps/hub/scripts/**/*",
"apps/hub/src/scripts/*",
"src/types/generatedManagerTypes.ts",
"src/types/vue-shim.d.ts",
"test-results/*",
@@ -66,6 +64,7 @@
]
}
],
"no-unsafe-optional-chaining": "error",
"no-self-assign": "allow",
"no-unused-expressions": "off",
"no-unused-private-class-members": "off",
@@ -106,8 +105,7 @@
"allowInterfaces": "always"
}
],
"vue/no-import-compiler-macros": "error",
"vue/no-dupe-keys": "error"
"vue/no-import-compiler-macros": "error"
},
"overrides": [
{

@@ -179,6 +179,12 @@ This project uses **pnpm**. Always prefer scripts defined in `package.json` (e.g
24. Do not use function expressions if it's possible to use function declarations instead
25. Watch out for [Code Smells](https://wiki.c2.com/?CodeSmell) and refactor to avoid them
## Design Standards
Before implementing any user-facing feature, consult the [Comfy Design Standards](https://www.figma.com/design/QreIv5htUaSICNuO2VBHw0/Comfy-Design-Standards) Figma file. Use the Figma MCP to fetch it live — the file is the single source of truth and may be updated by designers at any time.
See `docs/guidance/design-standards.md` for Figma file keys, section node IDs, and component references.
## Testing Guidelines
See @docs/testing/\*.md for detailed patterns.
@@ -226,6 +232,7 @@ See @docs/testing/\*.md for detailed patterns.
- shadcn/vue: <https://www.shadcn-vue.com/>
- Reka UI: <https://reka-ui.com/>
- PrimeVue: <https://primevue.org>
- Comfy Design Standards: <https://www.figma.com/design/QreIv5htUaSICNuO2VBHw0/Comfy-Design-Standards>
- ComfyUI: <https://docs.comfy.org>
- Electron: <https://www.electronjs.org/docs/latest/>
- Wiki: <https://deepwiki.com/Comfy-Org/ComfyUI_frontend/1-overview>

@@ -62,6 +62,37 @@ python main.py --port 8188 --cpu
- Run `pnpm dev:electron` to start the dev server with electron API mocked
- Run `pnpm dev:cloud` to start the dev server against the cloud backend (instead of local ComfyUI server)
#### Testing with Cloud & Staging Environments
Some features — particularly **partner/API nodes** (e.g. BFL, OpenAI, Stability AI) — require a cloud backend for authentication and billing. Running these against a local ComfyUI instance will result in permission errors or logged-out states. There are two ways to connect to a cloud/staging backend:
**Option 1: Frontend — `pnpm dev:cloud`**
The simplest approach. This proxies all API requests to the test cloud environment:
```bash
pnpm dev:cloud
```
This sets `DEV_SERVER_COMFYUI_URL` to `https://testcloud.comfy.org/` automatically. You can also set this variable manually in your `.env` file to target a different environment:
```bash
# .env
DEV_SERVER_COMFYUI_URL=https://stagingcloud.comfy.org/
```
Any `*.comfy.org` URL automatically enables cloud mode, which includes the GCS media proxy needed for viewing generated images and videos. See [.env_example](.env_example) for all available cloud URLs.
**Option 2: Backend — `--comfy-api-base`**
Alternatively, launch the ComfyUI backend pointed at the staging API:
```bash
python main.py --comfy-api-base https://stagingapi.comfy.org --verbose
```
Then run `pnpm dev` as usual. This keeps the frontend in local mode but routes backend API calls through staging.
#### Access dev server on touch devices
Enable remote access to the dev server by setting `VITE_REMOTE_DEV` in `.env` to `true`.
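According to the README section above, any `*.comfy.org` URL in `DEV_SERVER_COMFYUI_URL` switches the dev server into cloud mode. The project's actual check lives in its dev-server configuration and is not shown in this diff. The following is an assumption-labeled sketch of one way to express such a check, using a hostname-suffix test (hypothetical `isCloudBackendUrl`).

```typescript
// Hypothetical helper: returns true when a dev-server target URL points at a
// *.comfy.org host (README: such URLs enable cloud mode). Not the project's code.
function isCloudBackendUrl(raw: string): boolean {
  let url: URL
  try {
    url = new URL(raw)
  } catch {
    return false // not an absolute URL, e.g. an empty .env value
  }
  // Compare the parsed (lower-cased) hostname, not the raw string.
  return url.hostname.endsWith('.comfy.org')
}
```

Matching on the parsed hostname rather than searching the raw string for `comfy.org` avoids treating a host such as `comfy.org.example.com` as a cloud backend.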

9
apps/hub/.gitignore vendored

@@ -1,9 +0,0 @@
dist/
.astro/
.content-cache/
src/content/templates/
public/workflows/thumbnails/
public/workflows/avatars/
public/previews/
public/search-index.json
knowledge/tutorials/

@@ -1,254 +0,0 @@
import { defineConfig } from 'astro/config'
import sitemap from '@astrojs/sitemap'
import vercel from '@astrojs/vercel'
import tailwindcss from '@tailwindcss/vite'
import fs from 'node:fs'
import path from 'node:path'
import os from 'node:os'
import vue from '@astrojs/vue'
// Build template date lookup at config time
const templatesDir = path.join(process.cwd(), 'src/content/templates')
const templateDates = new Map()
if (fs.existsSync(templatesDir)) {
const files = fs.readdirSync(templatesDir).filter((f) => f.endsWith('.json'))
for (const file of files) {
try {
const content = JSON.parse(
fs.readFileSync(path.join(templatesDir, file), 'utf-8')
)
if (content.name && content.date) {
templateDates.set(content.name, content.date)
}
} catch {
// Skip invalid JSON files
}
}
}
// Build timestamp used as lastmod fallback for pages without a specific date
const buildDate = new Date().toISOString()
// Supported locales (matches src/i18n/config.ts)
const locales = [
'en',
'zh',
'zh-TW',
'ja',
'ko',
'es',
'fr',
'ru',
'tr',
'ar',
'pt-BR'
]
const nonDefaultLocales = locales.filter((l) => l !== 'en')
// Custom sitemap pages for ISR routes not discovered at build time
const siteOrigin = (
process.env.PUBLIC_SITE_ORIGIN || 'https://www.comfy.org'
).replace(/\/$/, '')
// Creator profile pages — extract unique usernames from synced templates
const creatorUsernames = new Set()
if (fs.existsSync(templatesDir)) {
const files = fs.readdirSync(templatesDir).filter((f) => f.endsWith('.json'))
for (const file of files) {
try {
const content = JSON.parse(
fs.readFileSync(path.join(templatesDir, file), 'utf-8')
)
if (content.username) creatorUsernames.add(content.username)
} catch {
// Skip invalid JSON
}
}
}
const creatorPages = [...creatorUsernames].map(
(u) => `${siteOrigin}/workflows/${u}/`
)
const localeCustomPages = nonDefaultLocales.map(
(locale) => `${siteOrigin}/${locale}/workflows/`
)
const customPages = [...creatorPages, ...localeCustomPages]
// https://astro.build/config
export default defineConfig({
site: (process.env.PUBLIC_SITE_ORIGIN || 'https://www.comfy.org').replace(
/\/$/,
''
),
prefetch: {
prefetchAll: false,
defaultStrategy: 'hover'
},
i18n: {
defaultLocale: 'en',
locales: locales,
routing: {
prefixDefaultLocale: false // English at root, others prefixed (/zh/, /ja/, etc.)
}
},
integrations: [
sitemap({
// Use custom filename to avoid collision with Framer's /sitemap.xml
filenameBase: 'sitemap-workflows',
// Include Framer's marketing sitemap in the index
customSitemaps: ['https://www.comfy.org/sitemap.xml'],
// Include on-demand locale pages that aren't discovered at build time
customPages: customPages,
serialize(item) {
const url = new URL(item.url)
const pathname = url.pathname
// Template detail pages: /workflows/{slug}/ or /{locale}/workflows/{slug}/
const templateMatch = pathname.match(
/^(?:\/([a-z]{2}(?:-[A-Z]{2})?))?\/workflows\/([^/]+)\/?$/
)
if (templateMatch) {
const slug = templateMatch[2]
const date = templateDates.get(slug)
item.lastmod = date ? new Date(date).toISOString() : buildDate
// @ts-expect-error - sitemap types are stricter than actual API
item.changefreq = 'monthly'
item.priority = 0.8
return item
}
// Homepage
if (pathname === '/' || pathname === '') {
item.lastmod = buildDate
// @ts-expect-error - sitemap types are stricter than actual API
item.changefreq = 'daily'
item.priority = 1.0
return item
}
// Workflows index (including localized versions)
if (pathname.match(/^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/?$/)) {
item.lastmod = buildDate
// @ts-expect-error - sitemap types are stricter than actual API
item.changefreq = 'daily'
item.priority = 0.9
return item
}
// Category pages: /workflows/category/{type}/ or /{locale}/workflows/category/{type}/
if (
pathname.match(
/^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/category\//
)
) {
// @ts-expect-error - sitemap types are stricter than actual API
item.changefreq = 'weekly'
item.priority = 0.7
return item
}
// Model pages: /workflows/model/{model}/ or /{locale}/workflows/model/{model}/
if (
pathname.match(/^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/model\//)
) {
// @ts-expect-error - sitemap types are stricter than actual API
item.changefreq = 'weekly'
item.priority = 0.6
return item
}
// Tag pages: /workflows/tag/{tag}/ or /{locale}/workflows/tag/{tag}/
if (
pathname.match(/^(?:\/[a-z]{2}(?:-[A-Z]{2})?)?\/workflows\/tag\//)
) {
// @ts-expect-error - sitemap types are stricter than actual API
item.changefreq = 'weekly'
item.priority = 0.6
return item
}
// Default for other pages
// @ts-expect-error - sitemap types are stricter than actual API
item.changefreq = 'weekly'
item.priority = 0.5
return item
},
// Exclude OG image routes and legacy redirect pages from sitemap.
// Legacy redirects are /workflows/{slug}/ without a 12-char hex share_id suffix.
// Canonical detail pages are /workflows/{slug}-{shareId}/ (shareId = 12 hex chars).
filter: (page) => {
if (
page.includes('/workflows/og/') ||
page.includes('/workflows/og.png')
)
return false
// Check if this is a workflow detail path (not category/tag/model/creators)
const match = page.match(/\/workflows\/([^/]+)\/$/)
if (match) {
const segment = match[1]
// Skip known sub-paths
if (
['category', 'tag', 'model', 'creators'].some((p) =>
page.includes(`/workflows/${p}/`)
)
)
return true
// Include if it has a share_id suffix (12 hex chars after last hyphen)
const lastHyphen = segment.lastIndexOf('-')
if (lastHyphen === -1) return false // No hyphen = legacy redirect
const candidate = segment.slice(lastHyphen + 1)
if (candidate.length === 12 && /^[0-9a-f]+$/.test(candidate))
return true
return false // Has hyphen but not a valid share_id = legacy redirect
}
return true
}
}),
vue()
],
output: 'static',
adapter: vercel({
webAnalytics: { enabled: true },
skewProtection: true
}),
// Build performance optimizations
build: {
// Increase concurrency for faster builds on multi-core systems
concurrency: Math.max(1, os.cpus().length),
// Inline small stylesheets automatically
inlineStylesheets: 'auto'
},
// HTML compression
compressHTML: true,
// Image optimization settings
image: {
service: {
entrypoint: 'astro/assets/services/sharp',
config: {
// Limit input pixels to prevent memory issues with large images
limitInputPixels: 268402689 // ~16384x16384
}
}
},
// Responsive images for automatic srcset generation (now stable in Astro 5)
// Note: responsiveImages was moved from experimental to stable in Astro 5.x
vite: {
plugins: [tailwindcss()],
build: {
chunkSizeWarningLimit: 1000
},
optimizeDeps: {
include: ['web-vitals']
},
css: {
devSourcemap: false
}
}
})
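The `serialize` and `filter` hooks in the Astro config above encode two rules. The first is a priority/changefreq table keyed on the URL path. The second keeps a single-segment `/workflows/{segment}/` URL only when the segment ends in a 12-hex-character share ID. Both are restated below as standalone TypeScript functions with hypothetical names; the regexes are copied from the config, with the optional locale prefix factored out.

```typescript
// Optional locale prefix used by the config's regexes: /zh/, /zh-TW/, /pt-BR/, ...
const LOCALE = '(?:/[a-z]{2}(?:-[A-Z]{2})?)?'
const DETAIL = new RegExp(`^${LOCALE}/workflows/[^/]+/?$`)
const INDEX = new RegExp(`^${LOCALE}/workflows/?$`)
const CATEGORY = new RegExp(`^${LOCALE}/workflows/category/`)
const MODEL = new RegExp(`^${LOCALE}/workflows/model/`)
const TAG = new RegExp(`^${LOCALE}/workflows/tag/`)

interface SitemapHints {
  priority: number
  changefreq: 'daily' | 'weekly' | 'monthly'
}

// Same order of checks as serialize(): first match wins.
function sitemapHints(pathname: string): SitemapHints {
  if (DETAIL.test(pathname)) return { priority: 0.8, changefreq: 'monthly' }
  if (pathname === '/' || pathname === '') return { priority: 1.0, changefreq: 'daily' }
  if (INDEX.test(pathname)) return { priority: 0.9, changefreq: 'daily' }
  if (CATEGORY.test(pathname)) return { priority: 0.7, changefreq: 'weekly' }
  if (MODEL.test(pathname) || TAG.test(pathname)) return { priority: 0.6, changefreq: 'weekly' }
  return { priority: 0.5, changefreq: 'weekly' }
}

const SHARE_ID = /^[0-9a-f]{12}$/

// Same decisions as filter(): drop OG images and legacy redirect pages.
function keepInSitemap(page: string): boolean {
  if (page.includes('/workflows/og/') || page.includes('/workflows/og.png')) return false
  const match = page.match(/\/workflows\/([^/]+)\/$/)
  if (!match) return true // not a single-segment /workflows/{segment}/ URL
  if (['category', 'tag', 'model', 'creators'].some((p) => page.includes(`/workflows/${p}/`))) {
    return true
  }
  const segment = match[1]
  const lastHyphen = segment.lastIndexOf('-')
  if (lastHyphen === -1) return false // no hyphen: legacy redirect
  return SHARE_ID.test(segment.slice(lastHyphen + 1))
}
```

As written, the filter rejects every other single-segment path without a share ID. If the `filter` also sees the creator profile URLs added through `customPages` (`/workflows/{username}/`), those would be excluded too, which is worth checking against the `@astrojs/sitemap` version in use.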

@@ -1,22 +0,0 @@
# 3D Generation
3D generation creates three-dimensional models — meshes, point clouds, or multi-view images — from text or image inputs. This enables rapid prototyping of 3D assets without manual modeling. In ComfyUI, several approaches exist: image-to-3D (lifting a single photo into a mesh), text-to-3D (generating a 3D object from a description), and multi-view generation (producing consistent views of an object that can be reconstructed into 3D).
## How It Works in ComfyUI
- Key nodes involved: Model-specific loaders (`TripoSR`, `InstantMesh`, `StableZero123`), `LoadImage`, `Save3D` / `Preview3D`, `CRM` nodes
- Typical workflow pattern: Load image → Load 3D model → Run inference → Preview 3D result → Export mesh
## Key Settings
- **Inference steps**: Number of denoising/reconstruction steps. More steps generally improve quality but increase generation time.
- **Elevation angle**: Camera elevation for multi-view generation, controlling the vertical viewing angle of the generated views.
- **Guidance scale**: How closely the model follows the input image or text. Higher values increase fidelity to the input but may reduce diversity.
- **Output format**: Export format for the 3D mesh — OBJ, GLB, and PLY are common options, each suited to different downstream tools.
## Tips
- Clean single-object images on white or simple backgrounds work best for image-to-3D conversion.
- Multi-view approaches (like Zero123) often produce better geometry than single-view methods.
- Post-process generated meshes in Blender for cleanup, retopology, or texturing before production use.
- Start with TripoSR for quick results — it generates meshes in seconds and is a good baseline to compare against other methods.

@@ -1,374 +0,0 @@
{
"text-to-image": [
"01_get_started_text_to_image",
"api_bfl_flux2_max_sofa_swap",
"api_bfl_flux_1_kontext_max_image",
"api_bfl_flux_1_kontext_multiple_images_input",
"api_bfl_flux_1_kontext_pro_image",
"api_bfl_flux_pro_t2i",
"api_bytedance_seedream4",
"api_flux2",
"api_from_photo_2_miniature",
"api_google_gemini_image",
"api_grok_text_to_image",
"api_ideogram_v3_t2i",
"api_kling_omni_image",
"api_luma_photon_i2i",
"api_luma_photon_style_ref",
"api_nano_banana_pro",
"api_openai_dall_e_2_inpaint",
"api_openai_dall_e_2_t2i",
"api_openai_dall_e_3_t2i",
"api_openai_fashion_billboard_generator",
"api_openai_image_1_i2i",
"api_openai_image_1_inpaint",
"api_openai_image_1_multi_inputs",
"api_openai_image_1_t2i",
"api_recraft_image_gen_with_color_control",
"api_recraft_image_gen_with_style_control",
"api_recraft_style_reference",
"api_recraft_vector_gen",
"api_runway_reference_to_image",
"api_runway_text_to_image",
"api_stability_ai_i2i",
"api_stability_ai_sd3.5_i2i",
"api_stability_ai_sd3.5_t2i",
"api_stability_ai_stable_image_ultra_t2i",
"api_wan_text_to_image",
"default",
"flux1_dev_uso_reference_image_gen",
"flux1_krea_dev",
"flux_canny_model_example",
"flux_depth_lora_example",
"flux_dev_checkpoint_example",
"flux_dev_full_text_to_image",
"flux_fill_inpaint_example",
"flux_fill_outpaint_example",
"flux_redux_model_example",
"flux_schnell",
"flux_schnell_full_text_to_image",
"hidream_e1_1",
"hidream_e1_full",
"hidream_i1_dev",
"hidream_i1_fast",
"hidream_i1_full",
"image-qwen_image_edit_2511_lora_inflation",
"image_anima_preview",
"image_chroma1_radiance_text_to_image",
"image_chroma_text_to_image",
"image_flux2",
"image_flux2_fp8",
"image_flux2_klein_image_edit_4b_base",
"image_flux2_klein_image_edit_4b_distilled",
"image_flux2_klein_image_edit_9b_base",
"image_flux2_klein_image_edit_9b_distilled",
"image_flux2_klein_text_to_image",
"image_flux2_text_to_image",
"image_flux2_text_to_image_9b",
"image_kandinsky5_t2i",
"image_lotus_depth_v1_1",
"image_netayume_lumina_t2i",
"image_newbieimage_exp0_1-t2i",
"image_omnigen2_image_edit",
"image_omnigen2_t2i",
"image_ovis_text_to_image",
"image_qwen_Image_2512",
"image_qwen_image",
"image_qwen_image_2512_with_2steps_lora",
"image_qwen_image_controlnet_patch",
"image_qwen_image_instantx_controlnet",
"image_qwen_image_instantx_inpainting_controlnet",
"image_qwen_image_union_control_lora",
"image_z_image",
"image_z_image_turbo",
"image_z_image_turbo_fun_union_controlnet",
"sd3.5_large_blur",
"sd3.5_large_canny_controlnet_example",
"sd3.5_large_depth",
"sd3.5_simple_example",
"sdxl_refiner_prompt_example",
"sdxl_revision_text_prompts",
"sdxl_simple_example",
"sdxlturbo_example",
"templates-9grid_social_media-v2.0"
],
"img2img": [
"02_qwen_Image_edit_subgraphed",
"api_luma_photon_i2i",
"api_meshy_multi_image_to_model",
"api_openai_image_1_i2i",
"api_runway_reference_to_image",
"api_stability_ai_i2i",
"api_stability_ai_sd3.5_i2i",
"flux1_dev_uso_reference_image_gen",
"flux_canny_model_example",
"flux_depth_lora_example",
"flux_fill_inpaint_example",
"flux_fill_outpaint_example",
"flux_kontext_dev_basic",
"flux_redux_model_example",
"image_chrono_edit_14B",
"image_qwen_image_edit",
"image_qwen_image_edit_2509",
"image_qwen_image_instantx_controlnet",
"image_qwen_image_instantx_inpainting_controlnet",
"sd3.5_large_blur",
"sd3.5_large_canny_controlnet_example",
"sd3.5_large_depth"
],
"inpainting": [
"api_openai_dall_e_2_inpaint",
"api_openai_image_1_inpaint",
"api_stability_ai_audio_inpaint",
"flux_fill_inpaint_example",
"flux_fill_outpaint_example",
"image_flux.1_fill_dev_OneReward",
"image_qwen_image_instantx_inpainting_controlnet",
"video_wan2_2_14B_fun_inpaint",
"video_wan2_2_5B_fun_inpaint",
"video_wan_vace_inpainting",
"wan2.1_fun_inp"
],
"outpainting": [
"api_bria_image_outpainting",
"flux_fill_outpaint_example",
"image_flux.1_fill_dev_OneReward",
"video_wan_vace_outpainting"
],
"controlnet": [
"02_qwen_Image_edit_subgraphed",
"flux_canny_model_example",
"flux_depth_lora_example",
"flux_redux_model_example",
"image_lotus_depth_v1_1",
"image_qwen_image_controlnet_patch",
"image_qwen_image_edit_2509",
"image_qwen_image_instantx_controlnet",
"image_qwen_image_instantx_inpainting_controlnet",
"image_qwen_image_union_control_lora",
"image_z_image_turbo_fun_union_controlnet",
"sd3.5_large_canny_controlnet_example",
"sd3.5_large_depth",
"utility-depthAnything-v2-relative-video",
"utility-frame_interpolation-film",
"utility-lineart-video",
"utility-normal_crafter-video",
"utility-openpose-video",
"video_ltx2_canny_to_video",
"video_ltx2_depth_to_video",
"video_ltx2_pose_to_video",
"wan2.1_fun_control"
],
"upscaling": [
"api_topaz_image_enhance",
"api_topaz_video_enhance",
"api_wavespeed_flshvsr_video_upscale",
"api_wavespped_image_upscale",
"api_wavespped_seedvr2_ai_image_fix",
"ultility_hitpaw_general_image_enhance",
"ultility_hitpaw_video_enhance",
"utility-gan_upscaler",
"utility-topaz_landscape_upscaler",
"utility_interpolation_image_upscale",
"utility_nanobanana_pro_ai_image_fix",
"utility_nanobanana_pro_illustration_upscale",
"utility_nanobanana_pro_product_upscale",
"utility_recraft_creative_image_upscale",
"utility_recraft_crisp_image_upscale",
"utility_seedvr2_image_upscale",
"utility_seedvr2_video_upscale",
"utility_topaz_illustration_upscale",
"utility_video_upscale"
],
"video-generation": [
"03_video_wan2_2_14B_i2v_subgraphed",
"api_bytedace_seedance1_5_flf2v",
"api_bytedace_seedance1_5_image_to_video",
"api_bytedace_seedance1_5_text_to_video",
"api_bytedance_flf2v",
"api_bytedance_image_to_video",
"api_bytedance_text_to_video",
"api_grok_video",
"api_grok_video_edit",
"api_hailuo_minimax_i2v",
"api_hailuo_minimax_t2v",
"api_hailuo_minimax_video",
"api_kling2_6_i2v",
"api_kling2_6_t2v",
"api_kling_effects",
"api_kling_flf",
"api_kling_i2v",
"api_kling_motion_control",
"api_kling_omni_edit_video",
"api_kling_omni_i2v",
"api_kling_omni_t2v",
"api_kling_omni_v2v",
"api_ltxv_image_to_video",
"api_ltxv_text_to_video",
"api_luma_i2v",
"api_luma_t2v",
"api_moonvalley_image_to_video",
"api_moonvalley_text_to_video",
"api_moonvalley_video_to_video_motion_transfer",
"api_moonvalley_video_to_video_pose_control",
"api_openai_sora_video",
"api_pixverse_i2v",
"api_pixverse_t2v",
"api_pixverse_template_i2v",
"api_runway_first_last_frame",
"api_runway_gen3a_turbo_image_to_video",
"api_runway_gen4_turo_image_to_video",
"api_topaz_video_enhance",
"api_veo2_i2v",
"api_veo3",
"api_vidu_image_to_video",
"api_vidu_q2_flf2v",
"api_vidu_q2_i2v",
"api_vidu_q2_r2v",
"api_vidu_q2_t2v",
"api_vidu_q3_image_to_video",
"api_vidu_q3_text_to_video",
"api_vidu_reference_to_video",
"api_vidu_start_end_to_video",
"api_vidu_text_to_video",
"api_vidu_video_extension",
"api_wan2_6_i2v",
"api_wan2_6_t2v",
"api_wan_image_to_video",
"api_wan_r2v",
"api_wan_text_to_video",
"api_wavespeed_flshvsr_video_upscale",
"gsc_starter_2",
"hunyuan_video_text_to_video",
"image_to_video_wan",
"ltxv_image_to_video",
"ltxv_text_to_video",
"template-Animation_Trajectory_Control_Wan_ATI",
"templates-3D_logo_texture_animation",
"templates-6-key-frames",
"templates-car_product",
"templates-photo_to_product_vid",
"templates-sprite_sheet",
"templates-stitched_vid_contact_sheet",
"templates-textured_logo_elements",
"templates-textured_logotype-v2.1",
"text_to_video_wan",
"txt_to_image_to_video",
"ultility_hitpaw_video_enhance",
"utility-depthAnything-v2-relative-video",
"utility-frame_interpolation-film",
"utility-gan_upscaler",
"utility-lineart-video",
"utility-normal_crafter-video",
"utility-openpose-video",
"utility_seedvr2_video_upscale",
"utility_video_upscale",
"video-wan21_scail",
"video_humo",
"video_hunyuan_video_1.5_720p_i2v",
"video_hunyuan_video_1.5_720p_t2v",
"video_kandinsky5_i2v",
"video_kandinsky5_t2v",
"video_ltx2_canny_to_video",
"video_ltx2_depth_to_video",
"video_ltx2_i2v",
"video_ltx2_i2v_distilled",
"video_ltx2_pose_to_video",
"video_ltx2_t2v",
"video_ltx2_t2v_distilled",
"video_wan2.1_alpha_t2v_14B",
"video_wan2.1_fun_camera_v1.1_1.3B",
"video_wan2.1_fun_camera_v1.1_14B",
"video_wan2_1_infinitetalk",
"video_wan2_2_14B_animate",
"video_wan2_2_14B_flf2v",
"video_wan2_2_14B_fun_camera",
"video_wan2_2_14B_fun_control",
"video_wan2_2_14B_fun_inpaint",
"video_wan2_2_14B_i2v",
"video_wan2_2_14B_s2v",
"video_wan2_2_14B_t2v",
"video_wan2_2_5B_fun_control",
"video_wan2_2_5B_fun_inpaint",
"video_wan2_2_5B_ti2v",
"video_wan_ati",
"video_wan_vace_14B_ref2v",
"video_wan_vace_14B_t2v",
"video_wan_vace_14B_v2v",
"video_wan_vace_flf2v",
"video_wan_vace_inpainting",
"video_wan_vace_outpainting",
"video_wanmove_480p",
"video_wanmove_480p_hallucination",
"wan2.1_flf2v_720_f16",
"wan2.1_fun_control",
"wan2.1_fun_inp"
],
"audio-generation": [
"05_audio_ace_step_1_t2a_song_subgraphed",
"api_kling2_6_i2v",
"api_kling2_6_t2v",
"api_stability_ai_audio_inpaint",
"api_stability_ai_audio_to_audio",
"api_stability_ai_text_to_audio",
"api_vidu_q3_image_to_video",
"api_vidu_q3_text_to_video",
"audio-chatterbox_tts",
"audio-chatterbox_tts_dialog",
"audio-chatterbox_tts_multilingual",
"audio-chatterbox_vc",
"audio_ace_step_1_5_checkpoint",
"audio_ace_step_1_5_split",
"audio_ace_step_1_5_split_4b",
"audio_ace_step_1_m2m_editing",
"audio_ace_step_1_t2a_instrumentals",
"audio_ace_step_1_t2a_song",
"audio_stable_audio_example",
"utility-audioseparation",
"video_wan2_1_infinitetalk",
"video_wan2_2_14B_s2v"
],
"3d-generation": [
"04_hunyuan_3d_2.1_subgraphed",
"3d_hunyuan3d-v2.1",
"3d_hunyuan3d_image_to_model",
"3d_hunyuan3d_multiview_to_model",
"3d_hunyuan3d_multiview_to_model_turbo",
"api_from_photo_2_miniature",
"api_hunyuan3d_image_to_model",
"api_hunyuan3d_text_to_model",
"api_meshy_image_to_model",
"api_meshy_multi_image_to_model",
"api_meshy_text_to_model",
"api_rodin_gen2",
"api_rodin_image_to_model",
"api_rodin_multiview_to_model",
"api_tripo3_0_image_to_model",
"api_tripo3_0_text_to_model",
"api_tripo_image_to_model",
"api_tripo_multiview_to_model",
"api_tripo_text_to_model",
"templates-3D_logo_texture_animation",
"templates-qwen_multiangle"
],
"lora": [
"flux_depth_lora_example",
"image-qwen_image_edit_2511_lora_inflation",
"image_qwen_image_2512_with_2steps_lora",
"image_qwen_image_union_control_lora"
],
"embeddings": [],
"ip-adapter": [
"api_kling_omni_i2v",
"api_kling_omni_image",
"api_kling_omni_v2v",
"api_magnific_image_style_transfer",
"api_recraft_style_reference",
"api_vidu_q2_r2v",
"api_wan_r2v",
"templates-product_ad-v2.0"
],
"samplers": [],
"cfg": [],
"vae": []
}

@@ -1,22 +0,0 @@
# Audio Generation
Audio generation in ComfyUI covers creating speech (text-to-speech), music, and sound effects from text prompts or reference audio. Dedicated audio models run within ComfyUI's node graph, letting you integrate audio creation into larger multimedia workflows — for example, generating a video and its soundtrack in a single pipeline.
## How It Works in ComfyUI
- Key nodes involved: Model-specific nodes (`CosyVoice` nodes for TTS, `StableAudio` nodes for music/SFX), audio preview and save nodes, `AudioScheduler`
- Typical workflow pattern: Load audio model → Provide text/reference input → Generate audio → Preview/save audio
## Key Settings
- **Sample rate**: Samples per second in the output audio, typically 24000–48000 Hz. Higher rates capture more detail but produce larger files.
- **Duration**: Length of generated audio in seconds. Longer durations may reduce quality or coherence depending on the model.
- **Voice reference**: For voice cloning, a short audio clip of the target voice (3–10 seconds of clean speech works best).
- **Text input**: The text to be spoken (TTS) or the description of the desired audio (music/SFX generation).
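
The file-size side of the sample-rate trade-off is simple arithmetic. The sketch below assumes uncompressed 16-bit PCM, as in a plain WAV file, and ignores the small file header. The helper name `pcm_size_bytes` is hypothetical. Compressed formats such as FLAC or MP3 will come out smaller.

```python
def pcm_size_bytes(sample_rate_hz: int, duration_s: float,
                   channels: int = 1, bits_per_sample: int = 16) -> int:
    """Size of raw (uncompressed) PCM audio in bytes, excluding any file header."""
    bytes_per_sample = bits_per_sample // 8
    total_samples = round(sample_rate_hz * duration_s)
    return total_samples * channels * bytes_per_sample


# Compare common output rates for a 30-second stereo clip.
for rate in (24000, 44100, 48000):
    size = pcm_size_bytes(rate, duration_s=30, channels=2)
    print(f"{rate} Hz, 30 s stereo: {size / 1_000_000:.2f} MB")
```

Doubling the sample rate doubles the byte count, all else being equal.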
## Tips
- CosyVoice and F5-TTS are popular choices for text-to-speech in ComfyUI, each with dedicated custom nodes.
- Stable Audio Open handles music and sound effect generation from text descriptions.
- Use clean, noise-free reference audio clips for voice cloning to get the best results.
- Keep text inputs short and well-punctuated for the highest quality speech output — long paragraphs may degrade in naturalness.

@@ -1,23 +0,0 @@
# CFG / Guidance Scale
Classifier-Free Guidance (CFG) controls how strongly the model follows your text prompt versus generating freely. Higher CFG values produce outputs that adhere more closely to the prompt but can cause oversaturation and artifacts, while lower values yield more natural-looking images at the cost of reduced prompt control. Finding the right balance is essential for every workflow.
## How It Works in ComfyUI
- Key nodes: `KSampler` (the `cfg` parameter), `ModelSamplingDiscrete` (for advanced noise schedule configurations)
- During each sampling step, the model generates both a conditioned prediction (with your prompt) and an unconditioned prediction (without it). CFG scales the difference between the two — higher values push the output further toward the conditioned prediction, amplifying prompt influence.
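
In formula form, the per-step combination is `guided = uncond + cfg * (cond - uncond)`. The sketch below applies it to toy lists of numbers that stand in for the two noise predictions. Real samplers apply it to whole latent tensors, and ComfyUI's internal implementation may differ in detail. At `cfg = 1.0` the result is just the conditioned prediction.

```python
def apply_cfg(cond: list[float], uncond: list[float], cfg: float) -> list[float]:
    """Combine conditioned and unconditioned predictions with classifier-free guidance."""
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    # Start from the unconditioned prediction and move cfg times the
    # (cond - uncond) difference toward the prompt.
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [0.8, -0.2]   # toy prediction with the prompt
uncond = [0.5, 0.1]  # toy prediction without the prompt

print(apply_cfg(cond, uncond, 1.0))  # (approximately) the conditioned prediction
print(apply_cfg(cond, uncond, 7.0))  # the difference is amplified 7x
```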
## Key Settings
- **cfg** (1.0–30.0): The guidance scale value. Recommended ranges vary by model architecture:
  - SD 1.5 / SDXL: 7–8 is the standard starting point
  - Flux: 1.0–4.0 (Flux uses much lower guidance)
  - Video models (e.g., Wan, HunyuanVideo): 3.5–5.0
## Tips
- Start at 7 for SD-based models and 3.5 for Flux, then adjust based on results
- Values above ~12 for SD models typically cause color oversaturation, harsh contrast, and visible artifacts
- Values below ~3 for SD models tend to produce blurry or incoherent results
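- Some models handle guidance differently. Flux Dev takes guidance as an embedding baked into the model (set with the `FluxGuidance` node) rather than using traditional CFG, and Flux Schnell is distilled to run without guidance. For these, the `cfg` parameter is typically left at 1.0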
- When experimenting, change CFG in increments of 0.5–1.0 to see its impact clearly

@@ -1,28 +0,0 @@
# ControlNet
ControlNet guides image generation using structural conditions extracted from reference images — such as edge maps, depth information, or human poses. Instead of relying solely on text prompts for composition, ControlNet lets you specify the spatial layout precisely. This bridges the gap between text-to-image flexibility and the structural precision needed for professional workflows.
## How It Works in ComfyUI
- Key nodes involved: `ControlNetLoader`, `ControlNetApplyAdvanced`, preprocessor nodes (`CannyEdgePreprocessor`, `DepthAnythingPreprocessor`, `DWPosePreprocessor`, `LineartPreprocessor`)
- Typical workflow pattern: Load reference image → preprocess to extract condition (edges/depth/pose) → load ControlNet model → apply condition to sampling → generate image with structural guidance
## ControlNet Types
- **Canny**: Detects edges to preserve outlines and shapes
- **Depth**: Captures spatial depth for accurate foreground/background placement
- **OpenPose**: Extracts human body and hand poses for character positioning
- **Normal Map**: Encodes surface orientation for consistent lighting and geometry
- **Lineart**: Follows line drawings and illustrations as generation guides
- **Scribble**: Uses rough sketches as loose compositional guides
## Key Settings
- **Strength**: Controls how strongly the condition guides generation (0.0–1.0). Values of 0.5–1.0 are typical. Higher values enforce the structure more rigidly; lower values allow the model more creative freedom.
- **start_percent / end_percent**: Controls when the ControlNet activates during the sampling process. Starting at 0.0 and ending at 1.0 applies guidance throughout. Ending earlier (e.g., 0.8) lets the model refine fine details freely in final steps.
## Tips
- Always preprocess your input image with the appropriate preprocessor node before feeding it to ControlNet. Raw images will not produce correct conditioning.
- Combine multiple ControlNets for precise control — for example, Depth for spatial layout plus OpenPose for character positioning. Stack them by chaining `ControlNetApplyAdvanced` nodes.
- If your generation looks distorted or overcooked, lower the ControlNet strength. Values above 0.8 can fight with the text prompt and produce artifacts.
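
Chaining can be sketched in ComfyUI's API (prompt) format. In that format each node is a `class_type` plus `inputs`, and a link is written as `[node_id, output_index]`. This is a minimal sketch that assumes the condition images are already preprocessed. The node IDs, the helper name `controlnet_chain`, and the ControlNet file names are hypothetical placeholders. Each `ControlNetApplyAdvanced` node takes the previous positive/negative conditioning and passes updated conditioning on, so the last pair feeds `KSampler`.

```python
import json


def controlnet_chain(positive_ref: list, negative_ref: list, units: list[dict],
                     first_id: int = 100) -> tuple[dict, list, list]:
    """Chain ControlNetLoader + ControlNetApplyAdvanced pairs in API format.

    Each unit needs "model" (ControlNet file name) and "image" (a link to an
    already-preprocessed condition image); "strength", "start", and "end"
    are optional. Returns the nodes plus the final positive/negative links.
    """
    nodes = {}
    pos, neg = positive_ref, negative_ref
    for index, unit in enumerate(units):
        loader_id = str(first_id + 2 * index)
        apply_id = str(first_id + 2 * index + 1)
        nodes[loader_id] = {"class_type": "ControlNetLoader",
                            "inputs": {"control_net_name": unit["model"]}}
        nodes[apply_id] = {
            "class_type": "ControlNetApplyAdvanced",
            "inputs": {
                "positive": pos,
                "negative": neg,
                "control_net": [loader_id, 0],
                "image": unit["image"],
                "strength": unit.get("strength", 0.8),
                "start_percent": unit.get("start", 0.0),
                "end_percent": unit.get("end", 1.0),
            },
        }
        # Outputs 0 and 1 are the updated positive and negative conditioning.
        pos, neg = [apply_id, 0], [apply_id, 1]
    return nodes, pos, neg


# Hypothetical IDs: "6"/"7" are CLIPTextEncode nodes, "20"/"21" are
# depth and pose preprocessor outputs.
nodes, pos, neg = controlnet_chain(
    ["6", 0], ["7", 0],
    [{"model": "depth_controlnet.safetensors", "image": ["20", 0], "strength": 0.6},
     {"model": "openpose_controlnet.safetensors", "image": ["21", 0], "end": 0.8}],
)
print(json.dumps(nodes, indent=2))
print("KSampler positive/negative:", pos, neg)
```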

@@ -1,19 +0,0 @@
# Textual Embeddings
Textual embeddings are learned text representations that encode specific concepts, styles, or objects into the CLIP text encoder's vocabulary. These tiny files (~10–100 KB) effectively add new "words" to your prompt vocabulary. They let you reference complex visual concepts (a particular art style, a specific character, or a set of undesirable artifacts) with a single token. Because they operate at the text-encoding level, embeddings integrate seamlessly with your existing prompts and require no changes to the model itself.
## How It Works in ComfyUI
- Key nodes: `CLIPTextEncode` — reference embeddings directly in your prompt text using the syntax `embedding:name_of_embedding`
- Typical workflow pattern: Place embedding files in `ComfyUI/models/embeddings/` → type `embedding:name_of_embedding` inside your positive or negative prompt in a `CLIPTextEncode` node → connect to sampler as usual
## Key Settings
- **Prompt weighting**: Embeddings have no dedicated strength slider, but you can adjust their influence with prompt weighting syntax, e.g., `(embedding:name_of_embedding:1.2)` to increase strength or `(embedding:name_of_embedding:0.6)` to soften it
- **Placement**: Add embeddings to the negative prompt to suppress unwanted features, or to the positive prompt to invoke a learned concept
## Tips
- Embeddings are commonly used in negative prompts (e.g., `embedding:EasyNegative`, `embedding:bad-hands-5`) to reduce common artifacts like malformed hands or distorted faces
- Make sure the embedding matches your base model version — an SD 1.5 embedding will not work correctly with an SDXL checkpoint
- You can combine multiple embeddings with regular text in the same prompt for fine-grained control
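
Embeddings are just prompt syntax, so a prompt can be assembled as a plain string before it goes into `CLIPTextEncode`. The helpers below are hypothetical conveniences, not part of ComfyUI. They only format the `embedding:name` and `(embedding:name:weight)` syntax described above.

```python
def embedding_token(name: str, weight: float = 1.0) -> str:
    """Format an embedding reference, adding weight syntax only when needed."""
    token = f"embedding:{name}"
    if weight == 1.0:
        return token
    return f"({token}:{weight:g})"


def build_prompt(text: str, embeddings: dict[str, float]) -> str:
    """Join ordinary prompt text with weighted embedding references."""
    parts = [text] if text else []
    parts += [embedding_token(name, w) for name, w in embeddings.items()]
    return ", ".join(parts)


negative = build_prompt("blurry, low quality",
                        {"EasyNegative": 1.0, "bad-hands-5": 0.6})
print(negative)
# -> blurry, low quality, embedding:EasyNegative, (embedding:bad-hands-5:0.6)
```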

@@ -1,20 +0,0 @@
# Image-to-Image
Image-to-image (img2img) transforms an existing image using a text prompt while preserving the original structure and composition. Instead of starting from pure noise, the source image is encoded into latent space and partially noised, then the sampler denoises it guided by your prompt. This lets you restyle photos, refine AI-generated images, or apply creative modifications while keeping the overall layout intact.
## How It Works in ComfyUI
- Key nodes involved: `LoadImage`, `VAEEncode`, `CLIPTextEncode` (positive + negative), `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load source image → encode to latent with VAE → encode text prompts → sample with partial denoise → decode latent to image → save
## Key Settings
- **Denoise Strength**: The most important setting, ranging from 0.0 to 1.0. Lower values (0.2–0.4) preserve more of the original image with subtle changes. Higher values (0.6–0.8) allow more creative freedom but deviate further from the source. A value of 1.0 effectively ignores the input image entirely.
- **Steps**: Number of sampling steps. 20–30 is typical. Fewer steps may be sufficient at low denoise values since less transformation is needed.
- **CFG Scale**: Controls prompt adherence, same as text-to-image. 7–8 is a standard starting point.
## Tips
- Start with a denoise strength of 0.5 and adjust up or down based on how much change you want. This gives a balanced mix of original structure and new content.
- Your input image resolution should match the model's training resolution. Resize or crop your source image to 512×512 (SD 1.5) or 1024×1024 (SDXL) before loading to avoid quality issues.
- Use img2img iteratively: generate an initial text-to-image result, then refine it with img2img at low denoise to fix details without losing the overall composition.
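
The img2img pattern can be written as an API-format workflow. This is the JSON shape that ComfyUI's `/prompt` endpoint accepts under a `"prompt"` key. It is a sketch with these assumptions:
- The checkpoint and image file names are hypothetical.
- The image is already in ComfyUI's `input/` folder.
- `euler`/`normal` are just safe defaults.

The key difference from text-to-image is that `KSampler`'s `latent_image` comes from `VAEEncode` instead of `EmptyLatentImage`, and `denoise` is below 1.0.

```python
import json


def img2img_workflow(image_name: str, ckpt_name: str, positive: str,
                     negative: str, denoise: float = 0.5, steps: int = 25,
                     cfg: float = 7.0, seed: int = 0) -> dict:
    """Build an API-format img2img graph: LoadImage -> VAEEncode -> KSampler."""
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0.0 and 1.0")
    return {
        # Checkpoint outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "3": {"class_type": "VAEEncode",
              "inputs": {"pixels": ["2", 0], "vae": ["1", 2]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["3", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": denoise}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "img2img"}},
    }


wf = img2img_workflow("photo.png", "sd15_model.safetensors",
                      "oil painting of a harbor at dusk", "blurry")
print(json.dumps(wf, indent=2))
```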

@@ -1,21 +0,0 @@
# Inpainting
Inpainting selectively regenerates parts of an image using a mask while leaving the rest untouched. You paint a mask over the area you want to change, provide a text prompt describing the desired replacement, and the model fills in only the masked region. This is essential for fixing defects, replacing objects, or refining specific details in an otherwise finished image.
## How It Works in ComfyUI
- Key nodes involved: `LoadImage`, `VAEEncodeForInpaint`, `CLIPTextEncode` (positive + negative), `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load image + mask → encode with inpainting-aware VAE node → encode text prompts → sample → decode → save
- The mask can be created using ComfyUI's built-in mask editor or loaded from an external image
## Key Settings
- **grow_mask_by**: Expands the mask by a number of pixels, helping the regenerated area blend smoothly with the surrounding image. 6–8 pixels is typical. Too little causes visible seams; too much affects areas you wanted to keep.
- **Denoise Strength**: For inpainting, higher values (0.7–1.0) generally work best since you want the masked region to be fully regenerated. Lower values may produce inconsistent blending.
- **Checkpoint**: Dedicated inpainting models like `512-inpainting-ema` produce significantly better edge blending than standard checkpoints.
## Tips
- Always expand your mask slightly beyond the target area. Tight masks create hard edges that look unnatural against the surrounding image.
- Describe what you want to appear in the masked region, not what you want to remove. For example, prompt "a clear blue sky" rather than "remove the bird."
- Use inpainting-specific checkpoints whenever possible. Standard models can inpaint but often struggle with seamless blending at mask boundaries.
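
Here is a minimal sketch of the encoding half of an inpainting graph in API format. It assumes the mask was painted in ComfyUI's mask editor, which stores it in the image's alpha channel, so `LoadImage` returns it as its second output. The core node's class name is `VAEEncodeForInpaint` (display name "VAE Encode (for Inpainting)"). The node IDs, the helper name, and the VAE link are hypothetical. The resulting latent goes into `KSampler`'s `latent_image`, typically with `denoise` near 1.0.

```python
import json


def inpaint_encode_nodes(image_name: str, vae_ref: list, grow_mask_by: int = 6) -> dict:
    """API-format nodes: load an image plus mask and encode it for inpainting."""
    if grow_mask_by < 0:
        raise ValueError("grow_mask_by must be non-negative")
    return {
        "10": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "11": {
            "class_type": "VAEEncodeForInpaint",
            "inputs": {
                "pixels": ["10", 0],  # LoadImage output 0: IMAGE
                "vae": vae_ref,       # e.g. the checkpoint loader's VAE output
                "mask": ["10", 1],    # LoadImage output 1: MASK (from alpha)
                "grow_mask_by": grow_mask_by,
            },
        },
    }


# Hypothetical: node "1" is a CheckpointLoaderSimple (output 2 = VAE).
print(json.dumps(inpaint_encode_nodes("portrait_masked.png", ["1", 2], 8), indent=2))
```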

@@ -1,21 +0,0 @@
# IP-Adapter
IP-Adapter (Image Prompt Adapter) uses reference images to guide generation style, composition, or subject instead of — or alongside — text prompts. Rather than describing what you want in words, you show the model an image, enabling "image prompting." This is especially powerful for transferring artistic style, maintaining character consistency across generations, or conveying visual concepts that are difficult to express in text.
## How It Works in ComfyUI
- Key nodes: `IPAdapterModelLoader`, `IPAdapterApply` (or `IPAdapterAdvanced`), `CLIPVisionLoader`, `CLIPVisionEncode`, `PrepImageForClipVision`
- Typical workflow pattern: Load IP-Adapter model + CLIP Vision model → prepare and encode reference image → apply adapter to the main model → connect to sampler → decode
## Key Settings
- **weight** (0.0–1.0): Controls the influence of the reference image on the output. A range of 0.5–0.8 is typical; higher values make the output closer to the reference
- **weight_type**: Determines how the reference is interpreted: `standard` for general use, `style transfer` for artistic style without copying content, `composition` for layout guidance
- **start_at / end_at** (0.0–1.0): Controls when the adapter is active during sampling. Limiting the range (e.g., 0.0–0.8) can improve prompt responsiveness while retaining reference influence
## Tips
- Use the `style transfer` weight type when you want to borrow an artistic style without reproducing the reference image's content
- Combine IP-Adapter with a text prompt for the best results — the text adds detail and specificity on top of the visual guidance
- Face-specific IP-Adapter models (e.g., `ip-adapter-faceid`) exist for portrait consistency across multiple generations
- Lower the weight if your output looks too similar to the reference image

@@ -1,20 +0,0 @@
# LoRA
LoRA (Low-Rank Adaptation) is a technique for fine-tuning a base model's behavior using a small add-on file rather than retraining the entire model. LoRAs adjust a model's style, teach it specific subjects, or introduce new concepts. All of this fits in a file of typically just 10–200 MB, compared to multi-gigabyte full checkpoints. This makes them easy to share, swap, and combine. In ComfyUI, you load LoRAs on top of a checkpoint and control how strongly they influence the output.
## How It Works in ComfyUI
- Key nodes involved: `LoraLoader` (loads one LoRA and applies it to both MODEL and CLIP), `LoraLoaderModelOnly` (applies to MODEL only, skipping CLIP for faster loading)
- Typical workflow pattern: Load checkpoint → LoraLoader (attach LoRA) → CLIP Text Encode → KSampler → VAE Decode. Chain multiple `LoraLoader` nodes to stack LoRAs.
## Key Settings
- **strength_model**: Controls how much the LoRA affects the diffusion model. Values are usually kept between 0.0 and 1.0 (the node also accepts negative values and values above 1.0); typical values are 0.6–1.0. Higher values apply the LoRA effect more strongly.
- **strength_clip**: Controls how much the LoRA affects text encoding. Usually set to the same value as strength_model, but can be adjusted independently for fine control.
## Tips
- Start with strength 0.7 and adjust up or down based on results — too high can cause oversaturation or artifacts.
- Stacking too many LoRAs simultaneously can cause visual artifacts or conflicting styles; two or three is usually a safe limit.
- Ensure the LoRA matches your base model architecture — SD 1.5 LoRAs will not work with SDXL checkpoints, and vice versa.
- Many LoRAs require specific trigger words in your prompt to activate; always check the LoRA's documentation or model card.
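
To stack LoRAs, feed each `LoraLoader`'s MODEL and CLIP outputs into the next one. The sketch below builds that chain in ComfyUI's API format. The helper name, node IDs, and LoRA file names are hypothetical. The final MODEL/CLIP links would feed `KSampler` and `CLIPTextEncode`.

```python
import json


def chain_loras(model_ref: list, clip_ref: list, loras: list[tuple[str, float, float]],
                first_id: int = 20) -> tuple[dict, list, list]:
    """Build chained LoraLoader nodes in ComfyUI API format.

    Each entry in `loras` is (lora_file_name, strength_model, strength_clip).
    Returns the nodes plus the MODEL and CLIP links of the last LoRA.
    """
    nodes = {}
    for offset, (name, s_model, s_clip) in enumerate(loras):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": "LoraLoader",
            "inputs": {"model": model_ref, "clip": clip_ref, "lora_name": name,
                       "strength_model": s_model, "strength_clip": s_clip},
        }
        # LoraLoader outputs: 0 = MODEL, 1 = CLIP; the next LoRA builds on these.
        model_ref, clip_ref = [node_id, 0], [node_id, 1]
    return nodes, model_ref, clip_ref


# Hypothetical: node "1" is a CheckpointLoaderSimple (0 = MODEL, 1 = CLIP).
nodes, model_out, clip_out = chain_loras(
    ["1", 0], ["1", 1],
    [("style_lora.safetensors", 0.7, 0.7), ("character_lora.safetensors", 0.8, 0.8)],
)
print(json.dumps(nodes, indent=2))
print("connect KSampler to", model_out, "and CLIPTextEncode to", clip_out)
```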

@@ -1,20 +0,0 @@
# Outpainting
Outpainting extends an image beyond its original borders, generating new content that seamlessly continues the existing scene. Unlike inpainting, which replaces content within an image, outpainting adds content outside the frame, expanding the canvas in any direction. This is useful for changing aspect ratios, adding environmental context, or creating panoramic compositions from a single image.
## How It Works in ComfyUI
- Key nodes involved: `LoadImage`, `ImagePadForOutpaint`, `VAEEncodeForInpaint`, `CLIPTextEncode` (positive + negative), `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load image → pad the borders with `ImagePadForOutpaint` (it fills the new area with neutral gray and also outputs a mask covering it) → encode with `VAEEncodeForInpaint` using that mask → encode text prompts → sample → decode → save
## Key Settings
- **Padding Pixels**: The number of pixels to extend on each side, typically 64–256. Smaller increments produce more coherent results since the model has more context relative to the new area.
- **Denoise Strength**: Use high values (0.8–1.0) for outpainted regions since the padded area is essentially blank and needs full generation.
- **Feathering**: Controls the gradient blend between the original image and the new content. Higher feathering values create smoother transitions and reduce visible seams.
## Tips
- Outpaint in stages rather than all at once. Extending by 128 pixels at a time and iterating produces far more coherent results than trying to add 512 pixels in a single pass.
- Use a lower CFG scale (5–6) for outpainting. This allows the model to generate more natural, context-aware extensions rather than forcing strict prompt adherence that may clash with the existing image.
- Include scene context in your prompt that matches the original image. If the source shows an indoor room, describe the room's style and lighting so the extension feels continuous.
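
The staged approach from the first tip is easy to plan ahead of time. This hypothetical helper splits a rightward extension into 128-pixel passes and reports the canvas size after each one. Each pass would be its own round of `ImagePadForOutpaint`, inpainting encode, `KSampler`, and `VAEDecode`, with the previous result as input. Keeping pads at multiples of 8 is an assumption based on the 8x latent downscaling of SD-family VAEs.

```python
def plan_outpaint_passes(width: int, height: int, extra_right: int,
                         step: int = 128) -> list[tuple[int, int, int]]:
    """Split a rightward extension into passes of at most `step` pixels.

    Returns (pad_right, canvas_width, canvas_height) for each pass.
    """
    if step <= 0 or step % 8 or extra_right % 8:
        raise ValueError("step and extra_right should be multiples of 8")
    plan = []
    remaining = extra_right
    while remaining > 0:
        pad = min(step, remaining)
        width += pad
        remaining -= pad
        plan.append((pad, width, height))
    return plan


for pad, w, h in plan_outpaint_passes(768, 512, extra_right=512):
    print(f"pad right by {pad}px -> canvas {w}x{h}")
```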

@@ -1,21 +0,0 @@
# Samplers & Schedulers
Samplers are the algorithms that iteratively denoise a random latent into a coherent image, while schedulers control the noise schedule — how much noise is removed at each step. Together they determine the image's quality, speed, and visual character. Choosing the right combination is one of the most impactful decisions in any generation workflow.
## How It Works in ComfyUI
- Key nodes: `KSampler` (main sampling node), `KSamplerAdvanced` (provides control over start/end steps for multi-pass workflows)
- Typical workflow pattern: Load model → connect conditioning → configure sampler/scheduler/steps → sample → decode
## Key Settings
- **sampler_name**: The denoising algorithm. Common choices include `euler` (fast, good baseline), `euler_ancestral` (more creative variation), `dpmpp_2m` (balanced quality and speed), `dpmpp_2m_sde` (high quality, slightly slower), `dpmpp_3m_sde` (very high quality), and `uni_pc` (fast convergence)
- **scheduler**: Controls the noise reduction curve. `normal` is linear, `karras` front-loads noise reduction for better detail, `exponential` and `sgm_uniform` (recommended for SDXL) are also available
- **steps** (1–100): Number of denoising iterations. 20–30 is typical; more steps give diminishing returns. Few-step models such as Flux Schnell and LCM need far fewer (4–8 steps)
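
The scheduler description can be made concrete with the Karras et al. (2022) noise schedule that the `karras` option is named after. It interpolates linearly in sigma^(1/rho) space with rho = 7, which puts large noise drops at the start and small refinements at the end. The sigma range used here (about 0.0292 to 14.6146) is an assumption approximating SD 1.5; other models use different ranges. This is an illustration, not ComfyUI's own code.

```python
def karras_sigmas(n: int, sigma_min: float, sigma_max: float,
                  rho: float = 7.0) -> list[float]:
    """Karras noise schedule: n sigmas from sigma_max down to sigma_min, then 0.0."""
    if n < 2:
        raise ValueError("need at least 2 steps")
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    # Linear ramp in sigma**(1/rho) space, mapped back with **rho.
    sigmas = [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]
    return sigmas + [0.0]


sigmas = karras_sigmas(10, sigma_min=0.0292, sigma_max=14.6146)
drops = [a - b for a, b in zip(sigmas, sigmas[1:])]
print([round(s, 3) for s in sigmas])
print([round(d, 3) for d in drops])  # large early drops, small late ones
```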
## Tips
- `euler` + `normal` is the safest starting combination for any model
- `dpmpp_2m` + `karras` is a popular choice when you want higher quality with minimal speed cost
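- Ancestral samplers (`euler_ancestral`, any `_sde` variant) add fresh random noise at each step, so they do not converge as steps increase, and small setting changes such as the step count can alter the composition. This is useful for exploration, but less predictable when you want to refine a single result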
- Flux Schnell and LCM models converge much faster; using 20+ steps with them wastes time without improving quality

@@ -1,21 +0,0 @@
# Text-to-Image Generation
Text-to-image is the foundational workflow in ComfyUI: you provide a text description (prompt) and the system generates an image from scratch. This is the starting point for most generative AI art. A diffusion model iteratively denoises a random latent image, guided by your text prompt encoded through CLIP, to produce a coherent image matching your description.
## How It Works in ComfyUI
- Key nodes involved: `CheckpointLoaderSimple`, `CLIPTextEncode` (positive + negative), `EmptyLatentImage`, `KSampler`, `VAEDecode`, `SaveImage`
- Typical workflow pattern: Load checkpoint → encode text prompts → create empty latent → sample → decode latent to image → save
## Key Settings
- **Resolution**: Must match the model's training resolution. Use 512×512 for SD 1.5, 1024×1024 for SDXL and Flux models. Mismatched resolutions produce artifacts like duplicated limbs or distorted compositions.
- **Steps**: Number of denoising iterations. 20–30 steps is a good balance between quality and speed. More steps refine details but with diminishing returns beyond 30.
- **CFG Scale**: Controls how strongly the sampler follows your prompt. 7–8 is the typical range. Higher values increase prompt adherence but can introduce oversaturation or artifacts.
- **Seed**: Determines the initial random noise. A fixed seed produces reproducible results, which is useful for iterating on prompts while keeping composition consistent.
## Tips
- Start with simple, descriptive prompts before adding stylistic modifiers. Complex prompts can conflict and produce muddy results.
- Use the negative prompt `CLIPTextEncode` to specify what you want to avoid (e.g., "blurry, low quality, deformed hands") — this significantly improves output quality.
- Always match your `EmptyLatentImage` resolution to the model you loaded. A 768×768 image on an SD 1.5 checkpoint will produce noticeably worse results than 512×512.
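
The text-to-image pattern can be expressed as an API-format workflow and queued over HTTP. This sketch assumes a local ComfyUI server on the default port 8188 and a checkpoint named `v1-5-pruned-emaonly.safetensors`; substitute your own. The helper names are hypothetical. Checkpoint loader outputs are indexed 0 = MODEL, 1 = CLIP, 2 = VAE.

```python
import json
import urllib.request


def text_to_image_workflow(ckpt_name: str, positive: str, negative: str,
                           width: int = 512, height: int = 512, steps: int = 25,
                           cfg: float = 7.0, seed: int = 42) -> dict:
    """Minimal text-to-image graph in ComfyUI's API (prompt) format."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": steps,
                         "cfg": cfg, "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage", "inputs": {"images": ["6", 0], "filename_prefix": "t2i"}},
    }


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


wf = text_to_image_workflow("v1-5-pruned-emaonly.safetensors",
                            "a lighthouse on a cliff, golden hour, detailed",
                            "blurry, low quality, deformed hands")
print(json.dumps(wf, indent=2))
# queue_prompt(wf)  # uncomment with a ComfyUI server running
```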

@@ -1,21 +0,0 @@
# Upscaling
Upscaling increases image resolution while adding detail, turning a small generated image into a large, sharp result. In ComfyUI, there are two main approaches: model-based upscaling, which uses trained AI models (like RealESRGAN or 4x-UltraSharp) to intelligently enlarge an image in one pass, and latent-based upscaling, which works in latent space with a KSampler to add new detail during the enlargement process. Model-based is faster, while latent-based offers more creative control.
## How It Works in ComfyUI
- Key nodes involved: `UpscaleModelLoader`, `ImageUpscaleWithModel`, `ImageScaleBy`, `LatentUpscale`, `VAEDecodeTiled`
- Typical workflow pattern: Generate image → Upscale model loader → ImageUpscaleWithModel → Save image (model-based), or Generate latent → LatentUpscale → KSampler (lower denoise) → VAEDecode → Save image (latent-based)
## Key Settings
- **Upscale model**: The AI model used for model-based upscaling. `RealESRGAN_x4plus` is a reliable general-purpose choice; `4x-UltraSharp` excels at photo-realistic detail.
- **Scale factor**: How much to enlarge — 2x and 4x are typical. Higher factors increase VRAM usage significantly.
- **tile_size**: For tiled decoding/encoding of very large images. Range 512–1024; smaller tiles use less VRAM but take longer.
## Tips
- Model-based upscaling is faster but less creative; latent upscaling paired with a KSampler adds genuinely new detail.
- Use `VAEDecodeTiled` for very large images to avoid out-of-memory errors.
- Chain two 2x upscales instead of one 4x for better overall quality.
- When using latent upscaling, set KSampler denoise to 0.3–0.5 to add detail without changing the composition.
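
The model-based path, including the "two 2x passes" tip, can be sketched as API-format nodes that reuse one loaded upscale model. The model file name and node IDs are hypothetical. The output size multiplies by the model's native scale on every pass.

```python
import json


def model_upscale_nodes(image_ref: list, model_name: str, passes: int = 1,
                        first_id: int = 30) -> tuple[dict, list]:
    """Chain `passes` ImageUpscaleWithModel nodes sharing one UpscaleModelLoader."""
    loader_id = str(first_id)
    nodes = {loader_id: {"class_type": "UpscaleModelLoader",
                         "inputs": {"model_name": model_name}}}
    for i in range(passes):
        node_id = str(first_id + 1 + i)
        nodes[node_id] = {"class_type": "ImageUpscaleWithModel",
                          "inputs": {"upscale_model": [loader_id, 0],
                                     "image": image_ref}}
        image_ref = [node_id, 0]  # the next pass upscales this output
    return nodes, image_ref


def upscaled_size(width: int, height: int, model_scale: int, passes: int) -> tuple[int, int]:
    """Final pixel size after `passes` runs of a model with the given scale."""
    factor = model_scale ** passes
    return width * factor, height * factor


# Hypothetical: node "8" is a VAEDecode producing a 1024x1024 image.
nodes, final_image = model_upscale_nodes(["8", 0], "example_2x_model.pth", passes=2)
print(json.dumps(nodes, indent=2))
print("final image link:", final_image, "size:", upscaled_size(1024, 1024, 2, 2))
```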

@@ -1,20 +0,0 @@
# VAE (Variational Autoencoder)
The VAE encodes pixel images into a compact latent representation and decodes latents back into pixel images. All diffusion in Stable Diffusion and Flux happens in latent space — the VAE is the bridge between the images you see and the mathematical space where the model actually works. Every generation workflow ends with a VAE decode step to produce a viewable image.
## How It Works in ComfyUI
- Key nodes: `VAEDecode` (latent → image), `VAEEncode` (image → latent), `VAEDecodeTiled` (for large images to avoid out-of-memory errors), `VAELoader` (load a standalone VAE file)
- Typical workflow pattern: Most checkpoints include a built-in VAE, so the `VAEDecode` node can pull directly from the loaded checkpoint. To use a different VAE, add a `VAELoader` node and connect it to `VAEDecode` instead.
## Key Settings
- **tile_size** (for `VAEDecodeTiled`): Size of each tile when decoding in chunks. Default is 512; reduce if you still encounter memory issues
- **VAE choice**: VAE files are model-specific. Use `sdxl_vae.safetensors` for SDXL, `ae.safetensors` for Flux. Place files in `ComfyUI/models/vae/`
## Tips
- If colors look washed out or slightly off, try loading an external VAE — the VAE baked into a checkpoint is not always optimal, especially for community fine-tunes
- Use `VAEDecodeTiled` for images larger than ~2048 px on either side to prevent out-of-memory crashes
- SDXL and Flux each have their own VAE architecture — using the wrong one will produce corrupted output
- When doing img2img or inpainting, the `VAEEncode` node converts your input image into the latent space the model expects
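
Comparing latent shapes shows one reason the wrong VAE corrupts output. SD 1.5 and SDXL VAEs produce 4-channel latents, while Flux's VAE produces 16 channels; both shrink width and height by 8x. The lookup table in this hypothetical helper covers only those three families.

```python
LATENT_CHANNELS = {"sd1.5": 4, "sdxl": 4, "flux": 16}  # assumed per-family values
DOWNSCALE = 8  # each VAE here shrinks width and height by 8x


def latent_shape(family: str, width: int, height: int,
                 batch: int = 1) -> tuple[int, int, int, int]:
    """Return (batch, channels, height // 8, width // 8) for the given VAE family."""
    if width % DOWNSCALE or height % DOWNSCALE:
        raise ValueError("width and height should be multiples of 8")
    return (batch, LATENT_CHANNELS[family], height // DOWNSCALE, width // DOWNSCALE)


for family in ("sdxl", "flux"):
    print(family, latent_shape(family, 1024, 1024))
# A 4-channel latent does not match a VAE that expects 16 channels,
# which is one reason mixing VAE families fails.
```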

@@ -1,22 +0,0 @@
# Video Generation
Video generation creates video content from text prompts (T2V), reference images (I2V), or existing video (V2V) using specialized video diffusion models. Unlike image generation, video models must maintain temporal coherence across frames, ensuring smooth motion and consistent subjects. ComfyUI supports several leading open-source video models including WAN 2.1 and HunyuanVideo, each with their own loader and latent nodes.
## How It Works in ComfyUI
- Key nodes involved: Model-specific loaders (e.g. `WAN` video nodes, `HunyuanVideo` nodes, `LTXVLoader`), `EmptyHunyuanLatentVideo` / `EmptyLTXVLatentVideo`, `KSampler`, `VHS_VideoCombine` (from Video Helper Suite)
- Typical workflow pattern: Load video model → Create empty video latent → KSampler (with video-aware scheduling) → VAE decode → VHS_VideoCombine → Save video
## Key Settings
- **Frame count**: Number of frames to generate. Typically 16–81 frames depending on the model; more frames require more VRAM and time.
- **Resolution**: Often 512×320 or 848×480 for T2V. Higher resolutions need significantly more resources.
- **FPS**: Frames per second for output, typically 8–24. Higher FPS gives smoother motion but requires more frames for the same duration.
- **Motion scale/strength**: Controls the amount of movement in the generated video. Lower values produce subtle motion; higher values produce more dynamic scenes.
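
Many video VAEs compress time as well as space, so valid lengths follow a pattern rather than accepting any integer. As an assumption to check against your model's latent node: Wan and HunyuanVideo are commonly used with `4k + 1` frames (for example 81), and LTX-Video with `8k + 1` (for example 97). The hypothetical helpers below round a requested length up to that form and convert frames to clip duration at a given FPS.

```python
def valid_frame_count(requested: int, temporal_stride: int = 4) -> int:
    """Round `requested` up to the nearest length of the form stride * k + 1."""
    if requested < 1:
        raise ValueError("need at least one frame")
    k = -(-(requested - 1) // temporal_stride)  # ceiling division
    return temporal_stride * k + 1


def clip_seconds(frames: int, fps: float) -> float:
    """Playback length of `frames` frames at `fps` frames per second."""
    return frames / fps


for stride, wanted, fps in [(4, 80, 16), (8, 90, 24)]:
    frames = valid_frame_count(wanted, stride)
    print(f"stride {stride}: asked {wanted} -> {frames} frames, "
          f"{clip_seconds(frames, fps):.2f} s at {fps} fps")
```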
## Tips
- Start with fewer frames and lower resolution to test your prompt and settings before committing to a full-quality render.
- Image-to-video (I2V) typically gives better coherence than text-to-video (T2V) because the model has a visual anchor.
- Video Helper Suite (VHS) nodes are essential for loading, previewing, and saving video — install this custom node pack first.
- WAN 2.1 and HunyuanVideo are currently the leading open models for quality video generation in ComfyUI.

@@ -1,88 +0,0 @@
{
"Wan": "wan",
"Wan2.1": "wan",
"Wan2.2": "wan",
"Wan2.5": "wan",
"Wan2.6": "wan",
"Wan-Move": "wan",
"Motion Control": "wan",
"Flux": "flux",
"Flux.2": "flux",
"Flux.2 Dev": "flux",
"Flux.2 Klein": "flux",
"Kontext": "flux",
"BFL": "flux",
"SDXL": "sdxl",
"SD1.5": "sdxl",
"Stability": "sdxl",
"Reimagine": "sdxl",
"SD3.5": "sd3-5",
"SVD": "svd",
"Stable Audio": "stable-audio",
"Google": "gemini",
"Google Gemini": "gemini",
"Google Gemini Image": "gemini",
"Gemini3 Pro Image Preview": "gemini",
"Gemini-2.5-Flash": "gemini",
"Veo": "veo",
"Nano Banana Pro": "nano-banana-pro",
"nano-banana": "nano-banana-pro",
"OpenAI": "gpt-image-1",
"GPT-Image-1": "gpt-image-1",
"GPT-Image-1.5": "gpt-image-1",
"Qwen": "qwen",
"Qwen-Image": "qwen",
"Qwen-Image-Edit": "qwen",
"Qwen-Image-Layered": "qwen",
"Qwen-Image 2512": "qwen",
"Hunyuan Video": "hunyuan",
"Hunyuan3D": "hunyuan",
"Tencent": "hunyuan",
"LTX-2": "ltx-video",
"LTXV": "ltx-video",
"Lightricks": "ltx-video",
"ByteDance": "seedance",
"Seedance": "seedance",
"Seedream": "seedream",
"Seedream 4.0": "seedream",
"SeedVR2": "seedvr2",
"Vidu": "vidu",
"Vidu Q2": "vidu",
"Vidu Q3": "vidu",
"Kling": "kling",
"Kling O1": "kling",
"Kling2.6": "kling",
"ACE-Step": "ace-step",
"Chatter Box": "chatterbox",
"Recraft": "recraft",
"Runway": "runway",
"Luma": "luma",
"HiDream": "hidream",
"Tripo": "tripo",
"MiniMax": "minimax",
"Z-Image-Turbo": "z-image",
"Z-Image": "z-image",
"Grok": "grok",
"Moonvalley": "moonvalley",
"Topaz": "topaz",
"Kandinsky": "kandinsky",
"OmniGen": "omnigen",
"Magnific": "magnific",
"PixVerse": "pixverse",
"Meshy": "meshy",
"Rodin": "rodin",
"WaveSpeed": "wavespeed",
"Chroma": "chroma",
"BRIA": "bria",
"HitPaw": "hitpaw",
"NewBie": "newbie",
"Ovis-Image": "ovis-image",
"Ideogram": "ideogram",
"Anima": "anima",
"ChronoEdit": "chronoedit",
"Nvidia": "chronoedit",
"HuMo": "humo",
"FlashVSR": "flashvsr",
"Real-ESRGAN": "real-esrgan",
"Depth Anything\u00a0v2": "depth-anything-v2"
}

@@ -1,47 +0,0 @@
# ACE-Step
ACE-Step is a foundation model for music generation developed by ACE Studio and StepFun. It uses diffusion-based generation with a Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer to achieve state-of-the-art speed and musical coherence.
## Model Variants
### ACE-Step (3.5B)
- 3.5B parameter diffusion model
- DCAE encoder with linear transformer conditioning
- 27 or 60 inference steps recommended
- Apache 2.0 license
## Key Features
- 15x faster than LLM-based baselines (20 seconds for a 4-minute song on A100)
- Full-song generation with lyrics and structure
- Duration control for variable-length output
- Music remixing and style transfer
- Lyrics editing and vocal synthesis
- Supports 16+ languages including English, Chinese, Japanese, Korean, French, German, Spanish, and more
- Text-to-music from natural language descriptions
## Hardware Requirements
- RTX 3090: 12.76x real-time factor at 27 steps
- RTX 4090: 34.48x real-time factor at 27 steps
- NVIDIA A100: 27.27x real-time factor at 27 steps
- Apple M2 Max: 2.27x real-time factor at 27 steps
- Higher step counts (60) reduce speed by roughly half
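
A real-time factor (RTF) of N× means roughly N seconds of audio are produced per second of compute. Wall-clock time is therefore audio length divided by RTF. The sketch below turns the 27-step figures above into time estimates. For 60 steps it applies the "roughly half" rule from the list, which is an approximation and not a measured value. `generation_seconds` is a hypothetical helper.

```python
# Real-time factors at 27 steps, copied from the hardware list above.
RTF_27_STEPS = {
    "RTX 3090": 12.76,
    "RTX 4090": 34.48,
    "A100": 27.27,
    "M2 Max": 2.27,
}


def generation_seconds(audio_seconds: float, gpu: str, steps: int = 27) -> float:
    """Estimate wall-clock seconds needed to generate `audio_seconds` of music."""
    rtf = RTF_27_STEPS[gpu]
    if steps == 60:
        rtf /= 2  # rough "about half the speed" rule, not a benchmark
    elif steps != 27:
        raise ValueError("only 27 and 60 steps are characterised above")
    return audio_seconds / rtf
```

For a 4-minute (240 s) track at 27 steps, this gives about 7 s on an RTX 4090 and about 106 s on an M2 Max. With the halving rule at 60 steps, the A100 estimate (about 18 s) is in the same range as the 20-second figure quoted earlier.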
## Common Use Cases
- Original music generation from text descriptions
- Song remixing and style transfer
- Lyrics-based music creation
- Multi-language vocal music generation
- Rapid music prototyping for content creators
- Background music and soundtrack generation
## Key Parameters
- **steps**: Inference steps (27 for speed, 60 for quality)
- **duration**: Target audio length in seconds (up to ~5 minutes)
- **lyrics**: Song lyrics text input for vocal generation
- **prompt**: Natural language description of desired music style and mood
- **seed**: Random seed for reproducible generation (results are seed-sensitive)

@@ -1,46 +0,0 @@
# Anima
Anima is an API-based AI video generation platform that creates animated video content from text prompts, supporting character consistency and storyboard-driven workflows.
## Model Variants
### Anima Video Generation
- Cloud-based video generation service
- Supports multiple underlying AI models (Runway, Kling, Minimax, Luma)
- Integrated text, image, and audio generation pipeline
## Key Features
- AI character generation with persistent identity across scenes
- Storyboard-based workflow: script to visual scenes with narration
- Multi-model integration (GPT-4, Claude, Gemini for text; FLUX, Midjourney for images)
- Voice generation via ElevenLabs integration
- Music composition via Suno integration
- Autopilot mode for fully automated video creation
- Prompt enhancement for optimized output quality
- Template library for rapid content creation
- Scene-by-scene generation with character consistency
## Hardware Requirements
- No local hardware required (cloud-based service)
- Runs entirely through web API
- Browser-based interface for interactive use
## Common Use Cases
- Animated story series production
- Movie trailer and concept video creation
- Children's bedtime story animation
- Lofi music video generation
- Marketing and explainer video content
- Storyboard visualization
## Key Parameters
- **prompt**: Text description of the scene or story
- **character**: Selected or generated character for identity consistency
- **style**: Visual style preset (animation, cinematic, etc.)
- **duration**: Target video length
- **resolution**: Output video resolution

@@ -1,48 +0,0 @@
# BRIA AI
BRIA AI is an enterprise-focused visual generative AI platform that trains its models exclusively on licensed, ethically sourced data, ensuring commercially safe outputs with full IP indemnification.
## Model Variants
### BRIA Fibo
- Flagship hyper-controllable text-to-image model
- JSON-based control framework with 100+ disentangled visual attributes
- Supports lighting, depth, color, composition, and camera control
- Ideal for agentic workflows and enterprise-scale creative automation
### BRIA Text-to-Image Lite
- Fully private, self-hosted deployment of the Fibo pipeline
- Designed for regulated industries requiring total data control
- Runs on-premises with no external data transfer
## Key Features
- Trained on 100% licensed data from 20+ partners including Getty Images
- Full IP indemnification for commercial use
- Tri-layer content moderation for brand-safe outputs
- Patented attribution engine compensating data owners by usage
- ControlNet support for canny, depth, recoloring, and IP Adapter
- Multilingual prompt support
- Fine-tuning API for brand-specific customization
## Hardware Requirements
- Cloud-hosted API available (no local GPU required)
- Self-hosted Lite version supports deployment on AWS and Azure
- Open-source weights available on Hugging Face for local inference
## Common Use Cases
- Enterprise marketing and advertising content
- E-commerce product photography
- Brand-consistent visual asset generation
- Storyboarding and concept art for media production
## Key Parameters
- **prompt**: Text description of desired image
- **style**: Photorealistic, illustrative, or custom styles
- **guidance_methods**: ControlNet canny, depth, recoloring, IP Adapter
- **resolution**: Multiple aspect ratios supported

@@ -1,52 +0,0 @@
# Chatterbox
Chatterbox is a family of state-of-the-art open-source text-to-speech models developed by Resemble AI, featuring zero-shot voice cloning and emotion control.
## Model Variants
### Chatterbox Turbo
- 350M parameters, single-step mel decoding for low latency
- Paralinguistic tags for non-speech sounds ([laugh], [cough], [chuckle])
- English only, optimized for voice agents and production use
### Chatterbox (Original)
- 500M parameter Llama backbone, English only
- CFG and exaggeration control for emotion intensity
### Chatterbox Multilingual
- 500M parameters, 23 languages (Arabic, Chinese, French, German, Hindi, Japanese, Korean, Spanish, and more)
- Zero-shot voice cloning across languages
## Key Features
- Zero-shot voice cloning from a few seconds of reference audio
- Emotion exaggeration control (described by Resemble AI as the first open-source TTS model with this feature)
- Built-in PerTh neural watermarking for responsible AI
- Sub-200ms latency for real-time applications
- Trained on 500K hours of cleaned speech data
- MIT license (free for commercial use)
- Reported by Resemble AI to outperform ElevenLabs in subjective evaluations
## Hardware Requirements
- Minimum: NVIDIA GPU with CUDA support
- Turbo model requires less VRAM than original due to smaller architecture
- Runs on consumer GPUs (RTX 3060 and above)
- CPU inference possible but significantly slower
## Common Use Cases
- Voice cloning for content creation
- AI voice agents and assistants
- Audiobook narration
- Game and media dialogue generation
## Key Parameters
- **exaggeration**: Emotion intensity control (0.0 to 1.0, default 0.5)
- **cfg_weight**: Classifier-free guidance weight (0.0 to 1.0, default 0.5)
- **audio_prompt_path**: Path to reference audio clip for voice cloning
- **language_id**: Language code for multilingual model (e.g., "fr", "zh", "ja")
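
The parameters above have documented ranges and defaults, so a thin settings object can catch bad values before a model is loaded. The following is a minimal sketch. The ranges and defaults come from the list. The rule that only the multilingual model takes `language_id` is an assumption based on the other variants being English-only. `ChatterboxSettings` is hypothetical and is not part of the Chatterbox package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatterboxSettings:
    exaggeration: float = 0.5                 # emotion intensity, 0.0-1.0
    cfg_weight: float = 0.5                   # classifier-free guidance weight, 0.0-1.0
    audio_prompt_path: Optional[str] = None   # reference clip for zero-shot voice cloning
    language_id: Optional[str] = None         # e.g. "fr", "zh", "ja" (multilingual model only)

    def validate(self, multilingual: bool) -> None:
        """Raise ValueError if the settings fall outside the documented ranges."""
        for name in ("exaggeration", "cfg_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        # Assumed rule: only the multilingual model takes a language code.
        if multilingual and not self.language_id:
            raise ValueError("multilingual model: set language_id")
        if not multilingual and self.language_id is not None:
            raise ValueError("English-only models do not take language_id")
```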

@@ -1,50 +0,0 @@
# Chroma
Chroma is an open-source 8.9 billion parameter text-to-image model based on the FLUX.1-schnell architecture, developed by Lodestone Rock and the community. It is fully Apache 2.0 licensed.
## Model Variants
### Chroma
- 8.9B parameter model based on FLUX.1-schnell
- Trained on a curated 5M sample dataset (from 20M candidates)
- Apache 2.0 license for unrestricted use
- Supports both tag-based and natural language prompting
### Chroma XL
- Experimental merge and fine-tune based on NoobAI-XL (SDXL architecture)
- Low CFG (2.5-3.0) and low step count (8-12 steps)
- Optimized for fast generation on consumer hardware
## Key Features
- Fully open-source with Apache 2.0 licensing
- Diverse training data spanning anime, artistic, and photographic styles
- Community-driven development with public training logs
- Compatible with FLUX ecosystem (VAE, T5 text encoder)
- ComfyUI workflow support
- LoRA and fine-tuning compatible
- GGUF quantized versions available for lower VRAM
## Hardware Requirements
- Base model: 24GB VRAM recommended (BF16)
- Q8_0 quantized: ~13GB VRAM
- Q4_0 quantized: ~7GB VRAM
- Requires FLUX.1 VAE and T5 text encoder
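
The VRAM tiers above follow mostly from bytes per weight. The sketch below computes the size of the 8.9B transformer weights alone. It assumes the GGUF block layouts Q8_0 and Q4_0, where 32 weights share one fp16 scale, which works out to 8.5 and 4.5 bits per weight. `weight_gb` is a hypothetical helper.

```python
# Storage cost per weight. Q8_0 / Q4_0 are GGUF block formats:
# 32 weights plus one fp16 scale -> 8.5 and 4.5 bits per weight.
BITS_PER_WEIGHT = {"bf16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def weight_gb(n_params: float, fmt: str) -> float:
    """Size of the model weights alone, in decimal gigabytes."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9
```

For 8.9e9 parameters, this gives about 17.8 GB (BF16), 9.5 GB (Q8_0), and 5.0 GB (Q4_0). The gap between these weights-only figures and the ~24/13/7 GB guidance above is plausibly taken up by activations, the T5 encoder, and the VAE. That breakdown is an inference, not a measurement.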
## Common Use Cases
- Open-source text-to-image generation
- Artistic and stylized image creation
- Community model fine-tuning and experimentation
- LoRA training for custom styles and characters
## Key Parameters
- **prompt**: Text description or tag-based prompt
- **steps**: Inference steps (15-30 recommended)
- **cfg_scale**: Guidance scale (1-4, model uses low CFG)
- **resolution**: Output resolution (1024x1024 default)
- **guidance**: Flux-style guidance parameter (around 4)

@@ -1,58 +0,0 @@
# ChronoEdit
ChronoEdit is an image editing framework by NVIDIA that reframes editing as a video generation task, using temporal reasoning to ensure physically plausible and consistent edits.
## Model Variants
### ChronoEdit-14B
- Full 14 billion parameter model for maximum quality
- Built on pretrained video diffusion model architecture
- Requires ~34GB VRAM (38GB with temporal reasoning enabled)
### ChronoEdit-2B
- Compact 2 billion parameter variant for efficiency
- Maintains core temporal reasoning capabilities
- Lower VRAM requirements for broader hardware compatibility
### ChronoEdit-14B 8-Step Distilled LoRA
- Distilled variant requiring only 8 inference steps
- Faster generation with minimal quality loss
- Uses flow-shift 2.0 and guidance-scale 1.0
## Key Features
- Treats image editing as a video generation task for temporal consistency
- Temporal reasoning tokens simulate intermediate editing trajectories
- Ensures physically plausible edits (object interactions, lighting, shadows)
- Two-stage pipeline: temporal reasoning stage followed by editing frame generation
- Prompt enhancer integration for improved editing instructions
- LoRA fine-tuning support via DiffSynth-Studio
- Upscaler LoRA available for super-resolution editing
- PaintBrush LoRA for sketch-to-object editing
- Apache-2.0 license
## Hardware Requirements
- 14B model: 34GB VRAM minimum (38GB with temporal reasoning)
- 2B model: 12GB+ VRAM estimated
- Supports model offloading to reduce peak VRAM
- Linux only (not supported on Windows/macOS)
## Common Use Cases
- Physically consistent image editing (add/remove/modify objects)
- World simulation for autonomous driving and robotics
- Visualizing editing trajectories and reasoning
- Image super-resolution via upscaler LoRA
- Sketch-to-object conversion via PaintBrush LoRA
## Key Parameters
- **prompt**: Text description of the desired edit
- **num_inference_steps**: Denoising steps (default ~50, or 8 with distilled LoRA)
- **guidance_scale**: Prompt adherence strength (default ~7.5, or 1.0 with distilled LoRA)
- **flow_shift**: Flow matching shift parameter (2.0 for distilled LoRA)
- **enable_temporal_reasoning**: Toggle temporal reasoning stage for better consistency
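
The hardware thresholds and sampling values above can be combined into a small planning helper. This is a minimal sketch that uses only the numbers listed on this page. The 2B VRAM figure is itself an estimate, and model offloading can lower the real requirements, so the choice is conservative. `pick_variant` and `sampling_settings` are hypothetical names, not ChronoEdit APIs.

```python
from typing import Dict, Optional

# Sampling presets from the parameter list above (base values are approximate).
SAMPLING_PRESETS: Dict[str, Dict[str, float]] = {
    "base": {"num_inference_steps": 50, "guidance_scale": 7.5},
    "distilled_8step": {"num_inference_steps": 8, "guidance_scale": 1.0, "flow_shift": 2.0},
}


def pick_variant(vram_gb: float, temporal_reasoning: bool = False) -> Optional[str]:
    """Pick the largest variant whose listed VRAM requirement fits."""
    need_14b = 38 if temporal_reasoning else 34
    if vram_gb >= need_14b:
        return "ChronoEdit-14B"
    if vram_gb >= 12:  # estimated 2B requirement; reasoning overhead is not listed
        return "ChronoEdit-2B"
    return None  # consider model offloading instead


def sampling_settings(variant: str, use_distilled_lora: bool) -> Dict[str, float]:
    """Return a copy of the sampling preset for the chosen variant."""
    if use_distilled_lora:
        if variant != "ChronoEdit-14B":
            raise ValueError("the 8-step distilled LoRA is listed for the 14B model only")
        return dict(SAMPLING_PRESETS["distilled_8step"])
    return dict(SAMPLING_PRESETS["base"])
```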

@@ -1,60 +0,0 @@
# Depth Anything V2
Depth Anything V2 is a monocular depth estimation model trained on 595K synthetic labeled images and 62M+ real unlabeled images, providing robust relative depth maps from single images.
## Model Variants
### Depth-Anything-V2-Small
- Lightweight variant for fast inference
- ViT-S (Small) encoder backbone
- Suitable for real-time applications
### Depth-Anything-V2-Base
- Mid-range variant balancing speed and accuracy
- ViT-B (Base) encoder backbone
### Depth-Anything-V2-Large
- High-accuracy variant for detailed depth maps
- ViT-L (Large) encoder backbone with 256 output features
- Recommended for most production use cases
### Depth-Anything-V2-Giant
- Maximum accuracy variant
- ViT-G (Giant) encoder backbone
- Highest computational requirements
## Key Features
- More fine-grained depth detail than Depth Anything V1
- More robust than V1 and Stable Diffusion-based alternatives (Marigold, Geowizard)
- 10× faster than SD-based depth estimation models
- Trained on large-scale synthetic + real data mixture
- Produces relative (not metric) depth maps by default
- DPT (Dense Prediction Transformer) decoder architecture
## Hardware Requirements
- Small: 2GB VRAM minimum
- Base: 4GB VRAM minimum
- Large: 6GB VRAM recommended
- Giant: 12GB+ VRAM recommended
- CPU inference supported for smaller variants
## Common Use Cases
- Depth map generation for compositing and VFX
- ControlNet depth conditioning for image generation
- 3D scene understanding and reconstruction
- Foreground/background separation
- Augmented reality occlusion
- Video depth estimation for parallax effects
## Key Parameters
- **encoder**: Model size variant (vits, vitb, vitl, vitg)
- **input_size**: Processing resolution (higher = more detail, more VRAM)
- **output_type**: Raw depth array or normalized visualization
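
The maps are relative rather than metric, so their values only mean something in comparison to one another within a single image. A per-image min-max stretch is therefore a common way to produce the normalized visualization. Below is a minimal pure-Python sketch. It assumes the raw output is a 2D grid of floats, and `depth_to_uint8` is a hypothetical helper. It makes no claim about whether larger values mean nearer or farther.

```python
from typing import List


def depth_to_uint8(depth: List[List[float]]) -> List[List[int]]:
    """Min-max normalize a relative depth map to integers in 0..255 for display."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map has no relative structure to show.
        return [[0 for _ in row] for row in depth]
    return [[round((value - lo) / span * 255) for value in row] for row in depth]
```

Because the stretch is computed per image, the same gray level can correspond to different raw values in two images. For video, a shared min and max across frames avoids flicker.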

@@ -1,50 +0,0 @@
# FlashVSR
FlashVSR is a diffusion-based streaming video super-resolution framework that achieves near real-time 4× upscaling through one-step inference with locality-constrained sparse attention.
## Model Variants
### FlashVSR v1
- Initial release of the one-step streaming VSR model
- Built on Wan2.1 1.3B video diffusion backbone
- 4× super-resolution optimized
### FlashVSR v1.1
- Enhanced stability and fidelity over v1
- Improved artifact handling across different aspect ratios
- Recommended for production use
## Key Features
- One-step diffusion inference (no multi-step denoising required)
- Streaming architecture with KV cache for sequential frame processing
- Locality-Constrained Sparse Attention (LCSA) prevents artifacts at high resolutions
- Tiny Conditional Decoder (TC Decoder) achieves 7× faster decoding than standard WanVAE
- Three-stage distillation pipeline from multi-step to single-step inference
- Runs at ~17 FPS for 768×1408 videos on a single A100 GPU
- Up to 12× speedup over prior one-step diffusion VSR models
- Scales reliably to ultra-high resolutions
## Hardware Requirements
- Minimum: 24GB VRAM (A100 or similar recommended)
- Optimized for NVIDIA A100 GPUs
- Significant VRAM required for high-resolution video processing
- Multi-GPU inference not required but beneficial for throughput
## Common Use Cases
- Real-world video upscaling to 4K
- AI-generated video enhancement and artifact removal
- Long video super-resolution with temporal consistency
- Streaming video quality improvement
- Restoring compressed or low-resolution video footage
## Key Parameters
- **scale**: Upscaling factor (4× recommended for best results)
- **tile_size**: Spatial tiling for memory management (0 = auto)
- **input_resolution**: Source video resolution (output is 4× larger in each dimension, i.e. 16× the pixel count)
- **model_version**: v1 or v1.1 checkpoint selection
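
Because the scale factor applies to each side, planning a FlashVSR job is simple arithmetic. Below is a minimal sketch. It assumes the ~17 FPS A100 throughput quoted above, which was measured at 768×1408 and will differ at other sizes and on other GPUs. All three helper names are hypothetical.

```python
from typing import Tuple

SCALE = 4  # FlashVSR's 4x super-resolution factor


def upscaled_size(width: int, height: int, scale: int = SCALE) -> Tuple[int, int]:
    """Output frame size: each side is multiplied by the scale factor."""
    return width * scale, height * scale


def source_size_for(target_w: int, target_h: int, scale: int = SCALE) -> Tuple[int, int]:
    """Input size required to land exactly on a target output size."""
    if target_w % scale or target_h % scale:
        raise ValueError("target size must be divisible by the scale factor")
    return target_w // scale, target_h // scale


def processing_seconds(num_frames: int, fps: float = 17.0) -> float:
    """Rough wall-clock time, assuming the quoted ~17 FPS throughput holds."""
    return num_frames / fps
```

For example, a 192×352 source becomes 768×1408. A 3840×2160 (4K UHD) target needs a 960×540 source. A 10-second, 24 fps clip (240 frames) would take roughly 14 s at 17 FPS.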

@@ -1,98 +0,0 @@
# Flux
Flux is a family of state-of-the-art text-to-image and image editing models developed by Black Forest Labs (BFL).
## Model Variants
### Flux.1 Schnell
- Ultra-fast inference (1-4 steps)
- 12B parameter rectified flow transformer
- Apache 2.0 license (open source)
- Best for rapid prototyping and real-time applications
### Flux.1 Dev
- High-quality 12B parameter development model
- 20-50 steps for best results
- Non-commercial license for research
- Guidance-distilled for efficient generation
### Flux.1 Pro
- Highest quality Flux.1 outputs via commercial API
- Best prompt adherence and detail
### Flux.2 Dev
- 32B parameter rectified flow transformer
- Unified text-to-image, single-reference editing, and multi-reference editing
- No fine-tuning needed for character/object/style reference
- Up to 4MP photorealistic output with improved autoencoder
- Non-commercial license; quantized versions available for consumer GPUs
### Flux.2 Klein
- Fastest Flux model family — sub-second inference on modern hardware
- **Klein 4B**: ~8GB VRAM, Apache 2.0 license, ideal for edge deployment
- **Klein 9B**: Best quality-to-latency ratio, non-commercial license
- Base (undistilled) variants available for fine-tuning and LoRA training
- Supports text-to-image, single-reference editing, and multi-reference editing
### Flux.1 Kontext
- In-context image generation and editing via text instructions
- Available as Kontext Max (premium), Pro (API), and Dev (open-weights, 12B)
- Character consistency across multiple scenes without fine-tuning
- Typography manipulation and local editing within images
### Flux.1 Fill
- Dedicated inpainting and outpainting model
- Maintains consistency with surrounding image context
- Available as Fill Pro (API) and Fill Dev (open-weights)
### Flux Redux / Canny / Depth
- **Redux**: Image variation generation from reference images
- **Canny**: Edge-detection-based structural conditioning
- **Depth**: Depth-map-based structural conditioning for pose/layout control
## Key Features
- Excellent text rendering in images
- Strong prompt following and instruction adherence
- High resolution output (up to 4MP with Flux.2)
- Multi-reference editing: combine up to 6 reference images
- Consistent style and quality across generations
## Hardware Requirements
- Flux.2 Klein 4B: ~8GB VRAM (consumer GPUs like RTX 4070)
- Flux.2 Klein 9B: ~20GB VRAM
- Flux.1 models: 24GB VRAM recommended for fp16 weights (12B parameters × 2 bytes); around 12GB is workable with FP8 weights or offloading
- Flux.2 Dev: 64GB+ VRAM native, FP8 quantized ~40GB
- Quantized and weight-streaming options available for lower VRAM cards
## Common Use Cases
- Text-to-image generation
- Iterative image editing via text instructions
- Character-consistent multi-scene generation
- Inpainting and outpainting
- Style transfer and image variation
- Structural conditioning (canny, depth)
## Key Parameters
- **steps**: 1-4 (Schnell/Klein distilled), 20-50 (Dev/Base)
- **guidance_scale**: 3.5-4.0 typical for Flux.2, 3.5 for Flux.1
- **resolution**: Up to 2048x2048 (Flux.1), up to 4MP (Flux.2)
- **seed**: For reproducible generation
- **prompt_upsampling**: Optional LLM-based prompt enhancement (Flux.2)
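
Given a pixel budget such as the 4MP Flux.2 ceiling, the largest resolution for a given aspect ratio follows from a square root. The sketch below makes two assumptions. First, "4MP" means 4,000,000 pixels. Second, both sides should be multiples of 16, reflecting the 8× VAE downsampling and 2×2 latent patching used by Flux.1; the same constraint is assumed here for Flux.2. `fit_resolution` is a hypothetical helper.

```python
import math
from typing import Tuple


def fit_resolution(aspect_w: int, aspect_h: int, max_megapixels: float = 4.0,
                   multiple: int = 16) -> Tuple[int, int]:
    """Largest (width, height) with the given aspect ratio inside the pixel budget."""
    budget = max_megapixels * 1_000_000
    ratio = aspect_w / aspect_h
    width = math.sqrt(budget * ratio)
    height = math.sqrt(budget / ratio)
    # Rounding both sides down keeps width * height within the budget.
    return int(width // multiple) * multiple, int(height // multiple) * multiple
```

Under these assumptions, 1:1 gives 2000×2000 and 16:9 gives 2656×1488. If you treat a megapixel as 1024×1024 pixels instead, scale `max_megapixels` accordingly.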
## Blog References
- [FLUX.2 Day-0 Support in ComfyUI](../blog/flux2-day-0-support.md) — FLUX.2 with 4MP output, multi-reference consistency, professional text rendering
- [FLUX.2 [klein] 4B & 9B](../blog/flux2-klein-4b.md) — Fastest Flux models, sub-second inference, unified generation and editing
- [The Complete AI Upscaling Handbook](../blog/upscaling-handbook.md) — Benchmarks for upscaling workflows

@@ -1 +0,0 @@
Flux is Black Forest Labs' family of text-to-image and image editing models. The lineup includes Flux.1 Schnell (ultra-fast, 1-4 steps, Apache 2.0), Flux.1 Dev (high-quality, 20-50 steps, non-commercial), Flux.1 Pro (commercial API), and the newer Flux.2 Dev (32B parameters, up to 4MP output, multi-reference editing without fine-tuning). Flux.2 Klein offers sub-second inference in 4B (~8GB VRAM, Apache 2.0) and 9B variants. Specialized models include Kontext (in-context editing, character consistency), Fill (inpainting/outpainting), Redux (image variations), and Canny/Depth (structural conditioning). Flux excels at text rendering in images, strong prompt adherence, and consistent multi-scene generation. VRAM ranges from ~8GB (Klein 4B) to 64GB+ (Flux.2 Dev native), with quantized options available. Key parameters: guidance_scale 3.5-4.0, resolution up to 4MP for Flux.2. Primary uses include text-to-image, iterative editing, style transfer, and structural conditioning.

@@ -1,75 +0,0 @@
# Gemini
Gemini is Google DeepMind's multimodal AI model family with native image generation, editing, and video generation capabilities, accessible in ComfyUI through API nodes.
## Model Variants
### Gemini 3 Pro Image Preview
- Most capable Gemini image model with advanced reasoning
- Complex multi-turn image generation and editing
- Up to 14 input images, native 4K output
- Also known as Nano Banana Pro
- Model ID: `gemini-3-pro-image-preview`
### Gemini 2.5 Flash Image
- Cost-effective image generation optimized for speed and low latency
- Character consistency, multi-image fusion, and prompt-based editing
- $0.039 per image (1290 output tokens per image)
- Model ID: `gemini-2.5-flash-image`
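
The per-image price above is a token count multiplied by an output-token rate. The figures quoted imply about $30 per million output tokens, since 0.039 / 1290 × 1,000,000 ≈ 30.2. The sketch below uses $30 as an assumed default and is a budgeting aid only; check current pricing before relying on it. `flash_image_cost_usd` is a hypothetical helper.

```python
def flash_image_cost_usd(n_images: int, tokens_per_image: int = 1290,
                         usd_per_million_tokens: float = 30.0) -> float:
    """Estimated output-token cost for n generated images (rate is an assumption)."""
    return n_images * tokens_per_image * usd_per_million_tokens / 1_000_000
```

One image works out to about $0.0387, which rounds to the $0.039 quoted above. A batch of 1,000 images comes to about $38.70.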
### Google Gemini (General)
- Multimodal model for text, image understanding, and generation
- Interleaved text-and-image output in conversational context
- Supports image input for analysis and editing tasks
### Veo 2
- Text-to-video and image-to-video generation
- 8-second video clips at 720p resolution
- Realistic physics simulation and cinematic styles
- Supports 16:9 and 9:16 aspect ratios
- Model ID: `veo-2.0-generate-001`
### Veo 3 / 3.1
- Latest video generation with native audio (dialogue, SFX, ambient)
- Up to 1080p and 4K resolution (Veo 3.1)
- Style reference images for aesthetic control
- 4, 6, or 8-second video duration options
## Key Features
- Native multimodal generation: text, images, and video in one model family
- World knowledge from Google Search for factually accurate image generation
- SynthID invisible watermarking on all generated content
- Multi-image fusion and character consistency across generations
- Clean text rendering across multiple languages
- Prompt-based image editing without masks or complex workflows
## Hardware Requirements
- No local GPU required — all models accessed via cloud API
- Available through ComfyUI API nodes, Google AI Studio, and Vertex AI
- Requires API key and network access
## Common Use Cases
- Text-to-image and image editing via API nodes
- Multi-turn conversational image generation
- Video generation from text prompts or reference images
- Product animation and social media video content
- Style-consistent character and brand asset generation
- Text rendering and translation in images
## Key Parameters
- **prompt**: Text description for generation or editing
- **aspect_ratio**: 1:1, 3:4, 4:3, 9:16, 16:9, 21:9 (images); 16:9, 9:16 (video)
- **temperature**: 0.0-2.0 (default 1.0 for image models)
- **durationSeconds**: 4-8 seconds for Veo models
- **sampleCount**: 1-4 output videos per request
- **seed**: Integer for reproducible generation
- **personGeneration**: Safety control — `allow_adult`, `dont_allow`, or `allow_all`

@@ -1,62 +0,0 @@
# GPT-Image-1
GPT-Image-1 is OpenAI's natively multimodal image generation model, capable of generating and editing images from text and image inputs. It is accessed in ComfyUI through API nodes.
## Model Variants
### GPT-Image-1.5
- Latest and most advanced GPT Image model
- Best overall quality with superior instruction following
- High input fidelity for the first 5 input images
- Supports generate vs. edit action control
- Multi-turn editing via the Responses API
### GPT-Image-1
- Production-grade image generation and editing
- High input fidelity for the first input image
- Supports up to 16 input images for editing
- Up to 10 images per generation request
### GPT-Image-1-Mini
- Cost-effective variant for lower quality requirements
- Same API surface as GPT-Image-1
- Suitable for rapid prototyping and high-volume workloads
## Key Features
- Superior text rendering in generated images
- Real-world knowledge for accurate depictions
- Transparent background support (PNG and WebP)
- Mask-based inpainting with prompt guidance
- Multi-image editing: combine up to 16 reference images
- Streaming partial image output during generation
- Content moderation with adjustable strictness
## Hardware Requirements
- No local GPU required — cloud API service via OpenAI
- Accessed through ComfyUI API nodes
- Requires OpenAI API key and organization verification
## Common Use Cases
- Text-to-image generation with detailed prompts
- Image editing and compositing from multiple references
- Product photography and mockup generation
- Inpainting with mask-guided editing
- Transparent asset generation (stickers, logos, icons)
- Multi-turn iterative image refinement
## Key Parameters
- **prompt**: Text description up to 32,000 characters
- **size**: `1024x1024`, `1536x1024` (landscape), `1024x1536` (portrait), or `auto`
- **quality**: `low`, `medium`, `high`, or `auto` (affects cost and detail)
- **n**: Number of images to generate (1-10)
- **background**: `transparent`, `opaque`, or `auto`
- **output_format**: `png`, `jpeg`, or `webp`
- **moderation**: `auto` (default) or `low` (less restrictive)
- **input_fidelity**: `low` (default) or `high` for preserving input image details
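
Most of these parameters take a small fixed set of values, so requests can be checked locally before they are sent. The sketch below copies the allowed values from the list above. It rejects transparency combined with JPEG because transparency is listed for PNG and WebP only. The returned dict simply mirrors the parameter names; whether a given client accepts it as-is is an assumption. `build_image_request` is hypothetical and is not part of the OpenAI SDK.

```python
from typing import Any, Dict

# Allowed values, copied from the parameter list above.
ALLOWED = {
    "size": {"1024x1024", "1536x1024", "1024x1536", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "background": {"transparent", "opaque", "auto"},
    "output_format": {"png", "jpeg", "webp"},
    "moderation": {"auto", "low"},
    "input_fidelity": {"low", "high"},
}


def build_image_request(prompt: str, n: int = 1, **options: str) -> Dict[str, Any]:
    """Validate options against the documented values and return a request dict."""
    if not 1 <= len(prompt) <= 32_000:
        raise ValueError("prompt must be 1 to 32,000 characters")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    for key, value in options.items():
        if key not in ALLOWED:
            raise ValueError(f"unknown option: {key}")
        if value not in ALLOWED[key]:
            raise ValueError(f"{key}={value!r} is not one of {sorted(ALLOWED[key])}")
    # Transparency is listed for PNG and WebP only.
    if options.get("background") == "transparent" and options.get("output_format") == "jpeg":
        raise ValueError("transparent backgrounds need png or webp output")
    return {"model": "gpt-image-1", "prompt": prompt, "n": n, **options}
```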

@@ -1,56 +0,0 @@
# Grok (Aurora)
Aurora is xAI's autoregressive image generation model integrated into Grok, excelling at photorealistic rendering and precise text instruction following.
## Model Variants
### grok-2-image-1212
- API-accessible image generation model
- Generates multiple images from text prompts
- $0.07 per generated image
- OpenAI and Anthropic SDK compatible
### Aurora (Consumer)
- Autoregressive mixture-of-experts network
- Trained on billions of text and image examples
- Available via Grok on X platform, web, iOS, and Android
### Grok Imagine
- Video and image generation model
- State-of-the-art quality across cost and latency
- API available since January 2026
## Key Features
- Photorealistic image generation from text prompts
- Precise text rendering within images
- Accurate rendering of real-world entities, logos, and text
- Image editing via uploaded photos with text instructions
- Multi-image generation per request
- Native multimodal input support
## Hardware Requirements
- Cloud API-based (no local GPU required)
- All generation runs on xAI infrastructure
- API access via console.x.ai
## Common Use Cases
- Photorealistic image generation
- Text and logo rendering in images
- Image editing and style transfer
- Meme and social media content creation
- Product visualization
- Character and portrait generation
## Key Parameters
- **prompt**: Text description of desired image
- **model**: Model identifier (grok-2-image-1212)
- **n**: Number of images to generate
- **response_format**: Output format (url or b64_json)
- **size**: Image dimensions
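
Because the API is described as OpenAI SDK compatible, a request can be assembled as a plain dict that uses the parameter names above. The sketch below makes several assumptions. The payload shape is modeled on an OpenAI-style images request. `size` is left out because no accepted values are listed here. The cost uses the $0.07 per-image price quoted above. `build_payload` and `estimated_cost_usd` are hypothetical helpers.

```python
from typing import Any, Dict

MODEL = "grok-2-image-1212"
USD_PER_IMAGE = 0.07  # price quoted above for grok-2-image-1212


def build_payload(prompt: str, n: int = 1, response_format: str = "url") -> Dict[str, Any]:
    """Assemble an image-generation request using the parameter names listed above."""
    if not prompt:
        raise ValueError("prompt is required")
    if n < 1:
        raise ValueError("n must be at least 1")
    if response_format not in ("url", "b64_json"):
        raise ValueError("response_format must be 'url' or 'b64_json'")
    return {"model": MODEL, "prompt": prompt, "n": n, "response_format": response_format}


def estimated_cost_usd(n: int) -> float:
    """Estimated charge for n generated images at the quoted per-image price."""
    return n * USD_PER_IMAGE
```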

@@ -1,55 +0,0 @@
# HiDream-I1
HiDream-I1 is a 17B parameter image generation foundation model by HiDream.ai that achieves state-of-the-art quality using a sparse diffusion transformer architecture.
## Model Variants
### HiDream-I1 Full
- Full 17B parameter sparse diffusion transformer
- Uses four text encoders: CLIP-L, CLIP-G, T5-XXL, and Llama-3.1-8B-Instruct
- VAE from FLUX.1 Schnell; model released under the MIT license
### HiDream-I1 Dev
- Distilled variant, faster inference with minor quality tradeoff
### HiDream-I1 Fast
- Further distilled for maximum speed, best for rapid prototyping
### HiDream-E1
- Instruction-based image editing model
## Key Features
- State-of-the-art HPS v2.1 score (33.82), surpassing Flux.1-dev, DALL-E 3, and Midjourney V6
- Best-in-class prompt following on GenEval (0.83) and DPG-Bench (85.89)
- Multiple output styles: photorealistic, cartoon, artistic, and more
- Multi-encoder text conditioning (CLIP, T5-XXL, and Llama-3.1-8B-Instruct) for strong prompt adherence
- MIT license for commercial use
- Requires Flash Attention for optimal performance
## Hardware Requirements
- Minimum: 24GB VRAM (Full model), Dev and Fast variants run on lower VRAM
- Recommended: 40GB+ VRAM for Full model at high resolution
- CUDA 12.4+ recommended for Flash Attention
- Llama-3.1-8B-Instruct weights downloaded automatically
## Common Use Cases
- High-fidelity text-to-image generation
- Photorealistic image creation
- Artistic and stylized illustrations
- Instruction-based image editing (E1 variant)
- Commercial image generation
## Key Parameters
- **model_type**: Variant selection (full, dev, fast)
- **steps**: Inference steps (varies by variant; fewer for fast/dev)
- **cfg_scale**: Guidance scale for prompt adherence
- **resolution**: Output image dimensions
- **prompt**: Detailed text description of desired image

@@ -1,51 +0,0 @@
# HitPaw
HitPaw is an AI-powered visual enhancement platform providing image and video upscaling, restoration, and denoising through dedicated API services and desktop applications.
## Model Variants
### HitPaw Image Enhancer
- AI-powered photo enhancement with super-resolution up to 8x
- Face Clear Model: dual-model portrait upscaling (2x and 4x)
- Face Natural Model: texture-preserving portrait enhancement
- General Enhance Model: super-resolution for scenes and objects
- High Fidelity Model: premium upscaling for DSLR and AIGC images
- Generative Portrait/Enhance Models: diffusion-based restoration for heavily compressed images
### HitPaw Video Enhancer (VikPea)
- Frame-aware video restoration and ultra HD upscaling
- Face Soft Model: face-optimized noise and blur reduction
- Portrait Restore Model: multi-frame fusion for facial detail
- General Restore Model: GAN-based restoration for broad scenarios
- Ultra HD Model: premium upscaling from HD to ultra HD
- Generative Model: diffusion-driven repair for low-resolution video
## Key Features
- One-click portrait and scene enhancement
- Dual-model face and background processing pipelines
- Batch processing and API access for automated workflows
- Support for 30+ video input formats and 5 export formats
- Multi-frame face restoration for temporal consistency in video
- Denoising models for mobile and camera images
## Hardware Requirements
- Cloud API available (no local GPU required)
- Desktop apps for Windows, Mac, Android, and iOS
- API integration via HTTP-based interface
## Common Use Cases
- Upscaling AI-generated images to publication quality
- Restoring old or low-resolution photos and videos
- Enhancing portrait and landscape photography
- Video quality improvement for content creators
## Key Parameters
- **model**: Select enhancement model per content type
- **scale**: 2x or 4x super-resolution options
- **format**: Output format selection (mp4, mov, mkv, m4v, avi for video)

@@ -1,47 +0,0 @@
# HuMo
HuMo is a human-centric video generation model by ByteDance that produces videos from collaborative multi-modal conditioning using text, image, and audio inputs.
## Model Variants
### HuMo (Wan2.1-T2V-1.3B based)
- Built on the Wan2.1-T2V-1.3B video foundation model
- Supports Text+Image (TI), Text+Audio (TA), and Text+Image+Audio (TIA) modes
- Two-stage training: subject preservation then audio-visual sync
## Key Features
- Multi-modal conditioning: text, reference images, and audio simultaneously
- Subject identity preservation from reference images across frames
- Audio-driven lip synchronization with facial expression alignment
- Focus-by-predicting strategy for facial region attention during audio sync
- Time-adaptive guidance dynamically adjusts input weights across denoising steps
- Minimal-invasive image injection maintains base model prompt understanding
- Progressive two-stage training separates identity learning from audio sync
- Supports text-controlled appearance editing while preserving identity
## Hardware Requirements
- Minimum: 24GB VRAM (RTX 3090/4090 or similar)
- Multi-GPU inference supported via FSDP and sequence parallelism
- Whisper-large-v3 audio encoder required for audio modes
- Optional audio separator for cleaner speech input
## Common Use Cases
- Digital avatar and virtual presenter creation
- Audio-driven talking head generation
- Character-consistent video clips from reference photos
- Lip-synced dialogue video from audio tracks
- Prompted reenactment with identity preservation
- Text-controlled outfit and style changes on consistent subjects
## Key Parameters
- **mode**: Generation mode (TI, TA, or TIA)
- **scale_t**: Text guidance strength (default: 7.5)
- **scale_a**: Audio guidance strength (default: 2.0)
- **frames**: Number of output frames (97 at 25 FPS = ~4 seconds)
- **height/width**: Output resolution (480p or 720p supported)
- **steps**: Denoising steps (30-50 recommended)
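
The 97-frame default has the form 4n+1 (4 × 24 + 1). This is consistent with the 4× temporal compression of Wan2.1-family VAEs, which keep the first frame separately. The sketch below assumes that constraint holds for HuMo and converts between a target duration and a valid frame count at 25 FPS. Both helpers are hypothetical.

```python
import math


def frames_for_duration(seconds: float, fps: int = 25) -> int:
    """Largest 4n+1 frame count whose length does not exceed `seconds`."""
    raw = math.floor(seconds * fps + 1e-9)  # epsilon guards against float round-off
    if raw < 1:
        raise ValueError("duration too short for a single frame")
    return 4 * ((raw - 1) // 4) + 1


def duration_seconds(frames: int, fps: int = 25) -> float:
    """Clip length in seconds for a given frame count."""
    return frames / fps
```

A 4-second target gives 97 frames (3.88 s), and a 3-second target gives 73 frames.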

@@ -1,75 +0,0 @@
# Hunyuan
Hunyuan is Tencent's family of open-source generative models spanning text-to-image, text-to-video, and 3D asset generation.
## Model Variants
### Hunyuan-DiT
- Text-to-image diffusion transformer with native Chinese and English support
- 1.5B parameter DiT architecture, native 1024x1024 resolution
- Bilingual text encoder for strong CJK text rendering in images
- v1.2 is the latest version with improved quality
### HunyuanVideo
- Large-scale text-to-video and image-to-video generation model
- 13B+ parameters, the largest open-source video generation model at the time of its release
- Dual-stream to single-stream transformer architecture with full attention
- MLLM text encoder (decoder-only LLM) for better instruction following
- Causal 3D VAE with 4x temporal and 8x spatial compression into a 16-channel latent space
- Generates 720p video (1280x720) at up to 129 frames (~5s at 24fps)
- FP8 quantized weights available to reduce memory by ~10GB
- Outperformed Runway Gen-3 and Luma 1.6 in Tencent's professional human evaluations
- 3 workflow templates available
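The causal 3D VAE compression above can be made concrete with a little latent-shape arithmetic. This is a sketch, not HunyuanVideo code. It assumes the causal VAE encodes the first frame on its own and compresses each following group of 4 frames into one latent frame, so valid frame counts have the form 4k + 1. The function name `hunyuan_latent_shape` is hypothetical.

```python
def hunyuan_latent_shape(frames: int, height: int, width: int,
                         t_ratio: int = 4, s_ratio: int = 8,
                         channels: int = 16) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width)."""
    # Causal VAE (assumed): first frame encoded alone, then groups of t_ratio frames.
    if (frames - 1) % t_ratio != 0:
        raise ValueError(f"frames must be of the form {t_ratio}k + 1, got {frames}")
    if height % s_ratio or width % s_ratio:
        raise ValueError(f"height and width must be divisible by {s_ratio}")
    latent_frames = (frames - 1) // t_ratio + 1
    return channels, latent_frames, height // s_ratio, width // s_ratio


# 720p at the maximum length listed above (129 frames).
print(hunyuan_latent_shape(129, 720, 1280))  # (16, 33, 90, 160)
```

With the same frame count, the 544x960 setting gives a (16, 33, 68, 120) latent.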
### Hunyuan3D 2.0
- Image-to-3D and text-to-3D asset generation system
- Two-stage pipeline: Hunyuan3D-DiT (shape) + Hunyuan3D-Paint (texture)
- Flow-based diffusion transformer for geometry generation
- High-resolution texture synthesis with geometric and diffusion priors
- Outputs textured meshes in GLB/OBJ format
- Outperforms both open and closed-source 3D generation models
- 7 workflow templates available
## Key Features
- Native bilingual support (Chinese and English) across the family
- Strong text rendering in generated images (Hunyuan-DiT)
- State-of-the-art video generation quality (HunyuanVideo)
- End-to-end 3D asset creation with texturing (Hunyuan3D)
- Multi-resolution generation across all model types
- Prompt rewrite system for improved generation quality (HunyuanVideo)
## Hardware Requirements
- Hunyuan-DiT: 11GB VRAM minimum (fp16), 16GB recommended
- HunyuanVideo 540p (544x960): 45GB VRAM minimum
- HunyuanVideo 720p (720x1280): 60GB VRAM minimum, 80GB recommended
- HunyuanVideo FP8: Saves ~10GB compared to fp16 weights
- Hunyuan3D 2.0: 16-24GB VRAM for shape + texture pipeline
## Common Use Cases
- Bilingual content creation and marketing materials
- Asian-style artwork and illustrations
- Text-in-image generation (Chinese/English)
- High-quality video generation from text or image prompts
- 3D asset creation for games, design, and prototyping
- Textured mesh generation from reference images
## Key Parameters
- **steps**: 25-50 for Hunyuan-DiT (default 40), 50 for HunyuanVideo
- **cfg_scale**: 5-8 for DiT (6 typical); HunyuanVideo instead uses an embedded guidance scale of 6.0
- **flow_shift**: 7.0 for HunyuanVideo flow matching scheduler
- **video_length**: 129 frames for HunyuanVideo (~5s at 24fps)
- **resolution**: 1024x1024 for DiT, 720x1280 or 544x960 for video
- **negative_prompt**: Recommended for Hunyuan-DiT quality control
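`flow_shift` reshapes the flow-matching noise schedule. The sketch below uses the time-shift formula from SD3-style flow-matching schedulers such as diffusers' `FlowMatchEulerDiscreteScheduler`: sigma' = s * sigma / (1 + (s - 1) * sigma). It assumes HunyuanVideo's scheduler applies the same mapping. The helper `shifted_sigmas` is hypothetical. A shift of 7.0 pushes more of the 50 steps toward high noise levels.

```python
def shifted_sigmas(steps: int, shift: float) -> list[float]:
    """Linearly spaced sigmas from 1.0 down to 0.0, then time-shifted."""
    sigmas = [1.0 - i / steps for i in range(steps + 1)]  # 1.0 ... 0.0
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


schedule = shifted_sigmas(steps=50, shift=7.0)
# The unshifted midpoint (sigma = 0.5) maps to 7 * 0.5 / (1 + 6 * 0.5) = 0.875.
print(round(schedule[25], 3))  # 0.875
```

A shift of 1.0 leaves the linear schedule unchanged. Larger shifts keep the sampler at high noise for longer, which is typically where coarse layout and motion are decided.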
## Blog References
- [HunyuanVideo Native Support](../blog/hunyuanvideo-native-support.md) — 13B parameter video model, dual-stream transformer, MLLM text encoder
- [HunyuanVideo 1.5 Native Support](../blog/hunyuanvideo-15-native-support.md) — Lightweight 8.3B model, 720p output, runs on 24GB consumer GPUs
- [Hunyuan3D 2.0 and MultiView Native Support](../blog/hunyuan3d-20-native-support.md) — 3D model generation with PBR materials, 1.1B parameter multi-view model

@@ -1 +0,0 @@
Hunyuan is Tencent's open-source generative model family spanning text-to-image, text-to-video, and 3D generation. Hunyuan-DiT is a 1.5B parameter text-to-image model with native Chinese and English support and strong CJK text rendering at 1024x1024 (11-16GB VRAM). HunyuanVideo, with 13B+ parameters the largest open-source video model at its release, generates 720p video of up to 129 frames (~5s at 24fps) using a dual-stream transformer with an MLLM text encoder; it requires 45-80GB VRAM depending on resolution (FP8 saves ~10GB). Hunyuan3D 2.0 handles image-to-3D and text-to-3D generation via a two-stage pipeline producing textured GLB/OBJ meshes (16-24GB VRAM). Key strengths: bilingual content creation, state-of-the-art video quality that outperformed Runway Gen-3 in Tencent's evaluations, and end-to-end 3D asset creation. Typical parameters: 25-50 steps for DiT, 50 steps for video, cfg_scale 5-8 for DiT.

@@ -1,52 +0,0 @@
# Ideogram
Ideogram is an AI image generation platform founded by former Google Brain researchers, known for industry-leading text rendering accuracy in generated images. It is reported to achieve approximately 90% text rendering accuracy, compared to roughly 30% for competing tools.
## Model Variants
### Ideogram 3.0
- Latest generation released March 2025
- Highest Elo rating in human evaluations across diverse prompts
- Style References support with up to 3 reference images
- Random style feature with 4.3 billion style presets
- Batch generation for scaled content production
### Ideogram 2.0
- Previous generation model
- Available as alternative option in the platform
- Solid text rendering and general image quality
## Key Features
- Best-in-class text rendering with accurate typography and spelling
- Handles complex, multi-line text compositions and curved surfaces
- Style modes: Realistic, Anime, 3D, Watercolor, Typography
- Magic Prompt for automatic prompt enhancement
- Canvas editing for post-generation refinement
- Upscaler up to 8K resolution in 2x increments
- Color palette control for brand consistency
- API available for programmatic integration
## Hardware Requirements
- Cloud API only (no local GPU required)
- API pricing at approximately $0.06 per image
- Web interface with credit-based subscription plans
## Common Use Cases
- Marketing materials with branded text and logos
- Social media graphics with text overlays
- Product packaging and label design
- Event posters, flyers, and invitations
- Book covers and editorial design
## Key Parameters
- **prompt**: Text description with quoted text for typography
- **model**: Version selection (2.0 or 3.0)
- **style**: Realistic, Anime, 3D, Watercolor, Typography
- **aspect_ratio**: 16 aspect ratio options available
- **magic_prompt**: Toggle for automatic prompt enhancement

@@ -1,51 +0,0 @@
# Kandinsky
Kandinsky is a family of open-source diffusion models for video and image generation, developed by Kandinsky Lab (Sber AI, Russia). The models support both English and Russian text prompts.
## Model Variants
### Kandinsky 5.0 Video Pro (19B)
- HD video at 1280x768, 24fps (5 or 10 seconds)
- Controllable camera motion via LoRA
- Ranked the top open-source text-to-video model on LMArena
### Kandinsky 5.0 Video Lite (2B)
- Lightweight model, ranked #1 among open-source models in its size class
- CFG-distilled (2x faster) and diffusion-distilled (6x faster) variants
- Best Russian concept understanding in open source
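The ~2x speed-up of the CFG-distilled variant follows from how standard classifier-free guidance works. Each step evaluates the network twice, once with the prompt and once without, and blends the two predictions. A CFG-distilled model needs only one evaluation per step. The sketch below shows the textbook CFG formula and the evaluation count. It is a generic illustration, not Kandinsky's code, and the 50-step figure in the usage lines is hypothetical. The 6x figure for the diffusion-distilled variant comes from cutting the number of steps instead.

```python
from typing import Sequence


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def forward_passes(steps: int, cfg_distilled: bool) -> int:
    """Network evaluations per sample: 2 per step with CFG, 1 if CFG is distilled away."""
    return steps * (1 if cfg_distilled else 2)


print(cfg_combine([1.0, 2.0], [0.0, 1.0], scale=5.0))  # [5.0, 6.0]
print(forward_passes(50, cfg_distilled=False), forward_passes(50, cfg_distilled=True))  # 100 50
```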
### Kandinsky 5.0 Image Lite (6B)
- HD image output (1280x768, 1024x1024)
- Strong text rendering; image editing variant available
## Key Features
- Bilingual support (English and Russian prompts)
- Flow Matching architecture with MIT license
- Camera control via trained LoRAs
- ComfyUI and Diffusers integration
- MagCache acceleration for faster inference
## Hardware Requirements
- Video Lite: 12GB VRAM minimum with optimizations
- Video Pro: 24GB+ VRAM recommended
- NF4 quantization and FlashAttention 2/3 or SDPA supported
## Common Use Cases
- Open-source video generation research
- Russian and English bilingual content creation
- Camera-controlled video synthesis
- Image generation with text rendering
- Fine-tuning with custom LoRAs
## Key Parameters
- **prompt**: Text description in English or Russian
- **num_frames**: Number of output frames, determined by the clip length (5 or 10 seconds)
- **resolution**: Output resolution (up to 1280x768)
- **steps**: Inference steps (varies by distillation level)

@@ -1,64 +0,0 @@
# Kling
Kling is a video and image generation platform developed by Kuaishou Technology. It offers text-to-video, image-to-video, video editing, audio generation, and virtual try-on capabilities through both a creative studio and a developer API.
## Model Variants
### Kling O1
- First unified multimodal video model combining generation and editing
- Built on Multimodal Visual Language (MVL) framework
- Accepts text, image, video, and subject inputs in a single prompt
- Supports video inpainting, outpainting, style re-rendering, and shot extension
- Character and scene consistency via Element Library with director-like memory
- Generates 3-10 second videos at up to 2K resolution
### Kling 2.6
- Simultaneous audio-visual generation in a single pass
- Produces video with speech, sound effects, and ambient sounds together
- Supports Chinese and English voice generation
- Video content up to 10 seconds with synchronized audio
- Deep semantic alignment between audio and visual dynamics
### Kling (Base Models)
- Text-to-video and image-to-video with Standard and Professional modes
- Multi-image-to-video with multiple reference inputs
- Camera control with 6 basic movements and 4 master shots
- Video extension, lip-sync, and avatar generation
- Start and end frame generation for controlled transitions
## Key Features
- Unified generation and editing in a single model (O1)
- Simultaneous audio-visual generation (2.6)
- Multi-subject consistency across shots and angles
- Conversational editing via natural language prompts
- Video effects center for special effects and transformations
- Virtual try-on and image recognition capabilities
- DeepSeek integration for prompt optimization
## Hardware Requirements
- Cloud API only; no local hardware required
- Accessed via klingai.com creative studio or API platform
- Standard and Professional generation modes (speed vs. quality tradeoff)
## Common Use Cases
- Film and television pre-production and shot generation
- Social media content creation with audio
- E-commerce product videos and virtual try-on
- Advertising with one-click ad generation
- Video post-production editing via text prompts
- Multi-character narrative video creation
## Key Parameters
- **prompt**: Text description of the desired video; a separate negative prompt is also supported
- **mode**: Standard (fast) or Professional (high quality)
- **duration**: Video length (3-10 seconds for O1, up to 10s for 2.6)
- **aspect_ratio**: Width-to-height ratio for output
- **camera_control**: Predefined camera movements and master shots
- **creativity_strength**: Balance between reference fidelity and creative variation

@@ -1,68 +0,0 @@
# LTX-Video
LTX-Video is Lightricks' open-source DiT-based video generation model, billed as the first DiT-based model capable of generating high-quality video in real time.
## Model Variants
### LTX-Video 2 (v0.9.7/v0.9.8)
- Major quality upgrade over the original release
- Available in 2B and 13B parameter sizes
- 13B dev: highest quality, requires more VRAM
- 13B distilled: faster inference, fewer steps needed, slight quality trade-off
- 2B distilled: lightweight option for lower VRAM usage
- FP8 quantized versions available for all sizes (13B-dev, 13B-distilled, 2B-distilled)
- Multi-condition generation: condition on multiple images or video segments at specific frames
- Spatial and temporal upscaler models for enhanced resolution and frame rate
- ICLoRA adapters for depth, pose, and canny edge conditioning
- 9 workflow templates available
### LTX-Video 0.9.1/0.9.6
- Original public releases with 2B parameter DiT architecture
- Text-to-video and image-to-video modes
- 768x512 native resolution at 24fps
- 0.9.6 distilled variant: 15x faster, real-time capable, no CFG required
- Foundation for community fine-tunes
## Key Features
- Real-time video generation on high-end GPUs (first DiT model to achieve this)
- Generates 30 FPS video at 1216x704 resolution faster than playback speed
- Multi-condition generation with per-frame image/video conditioning and strength control
- Temporal VAE for smooth, consistent motion
- Multi-scale rendering pipeline mixing dev and distilled models for speed-quality balance
- Latent upsampling pipeline for progressive resolution enhancement
## Hardware Requirements
- 2B model: 12GB VRAM minimum, 16GB recommended
- 2B distilled FP8: 8-10GB VRAM
- 13B model: 24-32GB VRAM (fp16)
- 13B FP8: 16-20GB VRAM
- 13B distilled: less VRAM than 13B dev, ideal for rapid iterations
- 32GB+ system RAM recommended for all variants
## Common Use Cases
- Short-form video content and social media clips
- Image-to-video animation from reference frames
- Video-to-video transformation and extension
- Multi-condition video generation (start/end frame, keyframes)
- Depth, pose, and edge-conditioned video generation via ICLoRA
- Rapid video prototyping and creative experimentation
## Key Parameters
- **num_frames**: Output frame count of the form 8k + 1 (e.g. 97, 161, 257)
- **steps**: 30-50 for dev models, 8-15 for distilled variants
- **cfg_scale**: 3-5 typical for dev, not required for distilled
- **width/height**: Divisible by 32, best under 720x1280 for 13B
- **denoise_strength**: 0.3-0.5 when using latent upsampler refinement pass
- **conditioning_strength**: Per-condition strength for multi-condition generation (default 1.0)
- **seed**: For reproducible generation
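The frame and resolution constraints above are easy to violate when you type values by hand, so a small pre-flight check can help. This sketch assumes the constraints come from the LTX-Video VAE's 32x32 spatial and 8x temporal downscaling, with the first frame encoded on its own. All helper names are hypothetical. Frame counts are rounded up so that no requested footage is lost. Dimensions are rounded down so that the request stays within the size limits.

```python
def nearest_valid_frames(n: int) -> int:
    """Round n up to the next frame count of the form 8k + 1."""
    if n < 1:
        raise ValueError("frame count must be positive")
    return ((n - 1 + 7) // 8) * 8 + 1


def snap_to_32(x: int) -> int:
    """Round a dimension down to a multiple of 32 (minimum 32)."""
    return max(32, (x // 32) * 32)


def check_ltx_request(num_frames: int, width: int, height: int) -> list[str]:
    """Return human-readable problems; an empty list means the request looks valid."""
    problems = []
    if (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames {num_frames} -> use {nearest_valid_frames(num_frames)}")
    for name, value in (("width", width), ("height", height)):
        if value % 32 != 0:
            problems.append(f"{name} {value} -> use {snap_to_32(value)}")
    return problems


print(check_ltx_request(97, 1216, 704))   # []
print(check_ltx_request(100, 1280, 720))  # ['num_frames 100 -> use 105', 'height 720 -> use 704']
```

Note that 720 is not a multiple of 32. Snapping it down gives 704, the height used in the 1216x704 real-time setting.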
## Blog References
- [LTX-Video 0.9.5 Day-1 Support](../blog/ltx-video-095-support.md) — Commercial license (OpenRail-M), multi-frame control, improved quality
- [LTX-2: Open Source Audio-Video AI](../blog/ltx-2-open-source-audio-video.md) — Synchronized audio-video generation, NVFP4 for 3x speed / 60% less VRAM

@@ -1 +0,0 @@
LTX-Video is Lightricks' open-source DiT-based video generation model, billed as the first DiT-based model to achieve real-time video generation. LTX-Video 2 (v0.9.7/0.9.8) is available in 2B and 13B parameter sizes, with dev, distilled, and FP8 quantized variants. It supports multi-condition generation with per-frame image/video conditioning, spatial and temporal upscalers, and ICLoRA adapters for depth, pose, and canny conditioning. The 2B model needs 12-16GB VRAM (8-10GB FP8), while the 13B model requires 24-32GB (16-20GB FP8). It generates 30fps video at 1216x704 faster than playback speed. Earlier versions (0.9.1/0.9.6) established the 2B foundation with a 15x faster distilled variant. Primary uses: short-form video, image-to-video animation, video extension, and multi-condition keyframe generation. Key parameters: 30-50 steps for dev, 8-15 for distilled, cfg_scale 3-5 for dev, frame counts of the form 8k+1.

@@ -1,50 +0,0 @@
# Luma
Luma AI develops video and image generation models through its Dream Machine platform, powered by the Ray model family and Photon image model.
## Model Variants
### Ray3 / Ray3.14
- Native 1080p video with reasoning-driven generation
- Native 16-bit HDR video generation, which Luma bills as a world first
- Character reference, Modify Video, and Draft Mode (5x faster)
### Ray2
- Production-ready text-to-video and image-to-video
- 5-9 second output at 24fps with coherent motion
### Photon
- Image generation with strong prompt following
- Character and visual reference support
- 1080p output at $0.016 per image
## Key Features
- Reasoning capability for understanding creative intent
- Visual annotation for precise layout and motion control
- HDR generation with 16-bit EXR export for pro workflows
- Keyframe control, video extension, looping, and camera control
## Hardware Requirements
- API-only access via Luma AI API
- No local hardware requirements
- Available through Dream Machine web and iOS app
## Common Use Cases
- Cinematic video production and storytelling
- Commercial advertising and product videos
- Visual effects with Modify Video workflows
- HDR content for professional post-production
## Key Parameters
- **prompt**: Text description for video generation
- **keyframes**: Start and/or end frame images
- **aspect_ratio**: Output dimensions and ratio
- **loop**: Enable seamless looping
- **camera_control**: Camera movement via text instructions

@@ -1,47 +0,0 @@
# Magnific
Magnific is an AI-powered image upscaler and enhancer that uses generative AI to hallucinate new details and textures during the upscaling process.
## Model Variants
### Magnific Creative Upscaler
- Generative upscaling up to 16x (max 10,000px per dimension)
- AI engines: Illusio (illustration), Sharpy (photography), Sparkle (balanced)
- Adds hallucinated details guided by text prompts
### Magnific Precision Upscaler
- Faithful high-fidelity upscaling without creative reinterpretation
- Clean enlargement that stays true to the source image
### Mystic Image Generator
- Photorealistic text-to-image/image-to-image with LoRA styles at up to 4K
## Key Features
- Creativity slider controls AI-hallucinated detail level
- HDR control for micro-contrast and crispness
- Resemblance slider to balance fidelity vs. creative enhancement
- Optimized modes for portraits, illustrations, video games, and film
- API hosted on Freepik with Skin Enhancer endpoint
## Hardware Requirements
- Cloud-only service with no local hardware requirements
- API available through Freepik's developer platform
- Subscription-based with credit system
## Common Use Cases
- Upscaling AI-generated images for print and production
- Enhancing low-resolution concept art and illustrations
- Restoring old or compressed photographs
## Key Parameters
- **Creativity**: level of new detail hallucination (0-10)
- **HDR**: micro-contrast and sharpness (-10 to 10)
- **Resemblance**: fidelity to source image (-10 to 10)
- **Scale Factor**: 2x, 4x, 8x, or 16x magnification
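The 16x ceiling and the 10,000px per-dimension limit interact, so not every input can use every scale factor. This sketch picks the largest supported factor for a given input size. It assumes the limit applies to the output's longer side exactly as stated above. `max_scale_factor` is a hypothetical helper, not part of Magnific's API.

```python
from typing import Optional

SCALE_FACTORS = (2, 4, 8, 16)
MAX_SIDE_PX = 10_000  # per-dimension output limit listed above


def max_scale_factor(width: int, height: int) -> Optional[int]:
    """Largest supported factor whose output stays within MAX_SIDE_PX on both sides."""
    longest = max(width, height)
    allowed = [f for f in SCALE_FACTORS if longest * f <= MAX_SIDE_PX]
    return max(allowed) if allowed else None


print(max_scale_factor(1024, 768))   # 8  (16x would give 16,384px)
print(max_scale_factor(512, 512))    # 16
print(max_scale_factor(6000, 4000))  # None (even 2x exceeds 10,000px)
```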

@@ -1,49 +0,0 @@
# Meshy
Meshy is a popular AI 3D model generator enabling text-to-3D and image-to-3D creation with PBR textures and production-ready exports.
## Model Variants
### Meshy-6
- Latest generation with highest quality geometry
- Supports symmetry and pose control (A-pose, T-pose)
- Configurable polygon counts up to 300,000
### Meshy-5
- Previous generation with art style support
- Realistic and sculpture style options
## Key Features
- Text-to-3D with two-stage workflow (preview mesh, then refine textures)
- Image-to-3D from photos, sketches, or illustrations
- Multi-image input for multi-view reconstruction
- AI texturing with PBR maps (diffuse, roughness, metallic, normal)
- Automatic rigging and 500+ animation motion library
- Smart remesh with quad or triangle topology control
- Export in FBX, GLB, OBJ, STL, 3MF, USDZ, BLEND formats
## Hardware Requirements
- Cloud API-based (no local GPU required)
- All generation runs on Meshy servers
- API available on Pro tier and above
## Common Use Cases
- Game development asset creation
- 3D printing and prototyping
- Film and VFX previsualization
- VR/AR content development
- Product design and e-commerce
## Key Parameters
- **prompt**: Text description up to 600 characters
- **ai_model**: Model version (meshy-5, meshy-6, latest)
- **topology**: Mesh type (quad or triangle)
- **target_polycount**: 100 to 300,000 polygons
- **enable_pbr**: Generate PBR material maps
- **pose_mode**: Character pose (a-pose, t-pose, or none)
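A client-side check against the limits listed above can catch bad requests before they are sent. This is a hypothetical helper, not part of any Meshy SDK. It uses the parameter names as listed here and validates only the constraints stated in this document. The accepted `pose_mode` values mirror the list above ("a-pose", "t-pose", "none") rather than any confirmed API encoding.

```python
from typing import Any

AI_MODELS = {"meshy-5", "meshy-6", "latest"}
TOPOLOGIES = {"quad", "triangle"}
POSE_MODES = {"a-pose", "t-pose", "none"}  # as listed above; the API's exact encoding may differ


def validate_meshy_params(params: dict[str, Any]) -> list[str]:
    """Check a request dict against the documented limits; return a list of errors."""
    errors: list[str] = []
    prompt = params.get("prompt", "")
    if not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 600:
        errors.append(f"prompt has {len(prompt)} characters (max 600)")
    # Optional fields are only checked when present.
    if "ai_model" in params and params["ai_model"] not in AI_MODELS:
        errors.append(f"unknown ai_model {params['ai_model']!r}")
    if "topology" in params and params["topology"] not in TOPOLOGIES:
        errors.append(f"topology must be one of {sorted(TOPOLOGIES)}")
    if "target_polycount" in params:
        count = params["target_polycount"]
        if not isinstance(count, int) or not 100 <= count <= 300_000:
            errors.append("target_polycount must be an integer from 100 to 300,000")
    if "pose_mode" in params and params["pose_mode"] not in POSE_MODES:
        errors.append(f"pose_mode must be one of {sorted(POSE_MODES)}")
    if "enable_pbr" in params and not isinstance(params["enable_pbr"], bool):
        errors.append("enable_pbr must be a boolean")
    return errors


print(validate_meshy_params({"prompt": "a wooden chair", "ai_model": "meshy-6",
                             "topology": "quad", "target_polycount": 30_000}))  # []
print(validate_meshy_params({"prompt": "a chair", "target_polycount": 50}))
# ['target_polycount must be an integer from 100 to 300,000']
```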

@@ -1,58 +0,0 @@
# MiniMax
MiniMax is a multi-modal AI company known for the Hailuo video generation models and Image-01, offering API-based video and image creation.
## Model Variants
### Hailuo 2.3
- Latest video model with improved body movement and facial expressions
- Supports anime, illustration, ink-wash, and game-CG styles
- 768p or 1080p resolution, 6 or 10 second clips
- Available in Quality and Fast variants
### Hailuo 2.0 (Hailuo 02)
- Native 1080p with Noise-aware Compute Redistribution (NCR)
- 2.5x efficiency improvement over predecessors
- Last-frame conditioning support
### Image-01
- Text-to-image generation with multiple output sizes
### T2V-01-Director
- Enhanced camera control with natural language commands
- Pan, zoom, tracking shot, and shake directives
## Key Features
- Text-to-video and image-to-video generation
- Up to 1080p resolution at 25fps
- Video clips up to 10 seconds
- Camera control with natural language commands
- Subject consistency with reference images
- Text-to-image generation with Image-01
## Hardware Requirements
- Cloud API-based (no local GPU required)
- All generation runs on MiniMax servers
- API access via platform.minimax.io
## Common Use Cases
- Social media video content creation
- Cinematic short film production
- Product advertising and e-commerce videos
- Anime and illustrated content
- Character-driven narrative scenes
## Key Parameters
- **prompt**: Text description for generation
- **model**: Model selection (hailuo-2.3, hailuo-02, image-01)
- **resolution**: Output resolution (768p or 1080p)
- **duration**: Clip length (6 or 10 seconds for video)
- **first_frame_image**: Reference image for image-to-video

@@ -1,762 +0,0 @@
{
"generated": "2026-02-07",
"totalModels": 87,
"categories": {
"specific_model": [
{
"name": "Wan",
"category": "specific_model",
"templateCount": 36,
"priority": 108,
"docFile": "wan",
"hasExistingDoc": true
},
{
"name": "Nano Banana Pro",
"category": "specific_model",
"templateCount": 29,
"priority": 87,
"docFile": "nano-banana-pro",
"hasExistingDoc": false
},
{
"name": "Flux",
"category": "specific_model",
"templateCount": 24,
"priority": 72,
"docFile": "flux",
"hasExistingDoc": true
},
{
"name": "SDXL",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "sdxl",
"hasExistingDoc": true
},
{
"name": "ACE-Step",
"category": "specific_model",
"templateCount": 7,
"priority": 21,
"docFile": "ace-step",
"hasExistingDoc": false
},
{
"name": "Seedance",
"category": "specific_model",
"templateCount": 6,
"priority": 18,
"docFile": "seedance",
"hasExistingDoc": false
},
{
"name": "Seedream",
"category": "specific_model",
"templateCount": 5,
"priority": 15,
"docFile": "seedream",
"hasExistingDoc": false
},
{
"name": "HiDream",
"category": "specific_model",
"templateCount": 5,
"priority": 15,
"docFile": "hidream",
"hasExistingDoc": false
},
{
"name": "Stable Audio",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "stable-audio",
"hasExistingDoc": false
},
{
"name": "Chatter Box",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "chatterbox",
"hasExistingDoc": false
},
{
"name": "Z-Image-Turbo",
"category": "specific_model",
"templateCount": 4,
"priority": 12,
"docFile": "z-image",
"hasExistingDoc": false
},
{
"name": "Kandinsky",
"category": "specific_model",
"templateCount": 3,
"priority": 9,
"docFile": "kandinsky",
"hasExistingDoc": false
},
{
"name": "OmniGen",
"category": "specific_model",
"templateCount": 3,
"priority": 9,
"docFile": "omnigen",
"hasExistingDoc": false
},
{
"name": "SeedVR2",
"category": "specific_model",
"templateCount": 3,
"priority": 9,
"docFile": "seedvr2",
"hasExistingDoc": false
},
{
"name": "Chroma",
"category": "specific_model",
"templateCount": 2,
"priority": 6,
"docFile": "chroma",
"hasExistingDoc": false
},
{
"name": "ChronoEdit",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "chronoedit",
"hasExistingDoc": false
},
{
"name": "HuMo",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "humo",
"hasExistingDoc": false
},
{
"name": "NewBie",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "newbie",
"hasExistingDoc": false
},
{
"name": "Ovis-Image",
"category": "specific_model",
"templateCount": 1,
"priority": 3,
"docFile": "ovis-image",
"hasExistingDoc": false
}
],
"provider_name": [
{
"name": "Google",
"category": "provider_name",
"templateCount": 29,
"priority": 0,
"mapsTo": ["gemini", "veo", "nano-banana-pro"],
"hasExistingDoc": false
},
{
"name": "BFL",
"category": "provider_name",
"templateCount": 28,
"priority": 0,
"mapsTo": ["flux"],
"hasExistingDoc": false
},
{
"name": "Stability",
"category": "provider_name",
"templateCount": 19,
"priority": 0,
"mapsTo": ["sdxl", "stable-audio", "reimagine"],
"hasExistingDoc": false
},
{
"name": "ByteDance",
"category": "provider_name",
"templateCount": 11,
"priority": 0,
"mapsTo": ["seedance", "seedvr2", "seedream"],
"hasExistingDoc": false
},
{
"name": "OpenAI",
"category": "provider_name",
"templateCount": 11,
"priority": 0,
"mapsTo": ["gpt-image-1"],
"hasExistingDoc": false
},
{
"name": "Lightricks",
"category": "provider_name",
"templateCount": 9,
"priority": 0,
"mapsTo": ["ltx-video"],
"hasExistingDoc": false
},
{
"name": "Tencent",
"category": "provider_name",
"templateCount": 5,
"priority": 0,
"mapsTo": ["hunyuan"],
"hasExistingDoc": false
},
{
"name": "Qwen",
"category": "provider_name",
"templateCount": 2,
"priority": 0,
"mapsTo": ["qwen"],
"hasExistingDoc": true
},
{
"name": "Nvidia",
"category": "provider_name",
"templateCount": 1,
"priority": 0,
"mapsTo": [],
"hasExistingDoc": false
}
],
"api_only": [
{
"name": "Vidu",
"category": "api_only",
"templateCount": 10,
"priority": 20,
"docFile": "vidu",
"hasExistingDoc": false
},
{
"name": "Kling",
"category": "api_only",
"templateCount": 9,
"priority": 18,
"docFile": "kling",
"hasExistingDoc": false
},
{
"name": "Recraft",
"category": "api_only",
"templateCount": 6,
"priority": 12,
"docFile": "recraft",
"hasExistingDoc": false
},
{
"name": "Runway",
"category": "api_only",
"templateCount": 5,
"priority": 10,
"docFile": "runway",
"hasExistingDoc": false
},
{
"name": "Tripo",
"category": "api_only",
"templateCount": 5,
"priority": 10,
"docFile": "tripo",
"hasExistingDoc": false
},
{
"name": "GPT-Image-1",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "gpt-image-1",
"hasExistingDoc": false
},
{
"name": "MiniMax",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "minimax",
"hasExistingDoc": false
},
{
"name": "Grok",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "grok",
"hasExistingDoc": false
},
{
"name": "Luma",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "luma",
"hasExistingDoc": false
},
{
"name": "Moonvalley",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "moonvalley",
"hasExistingDoc": false
},
{
"name": "Topaz",
"category": "api_only",
"templateCount": 4,
"priority": 8,
"docFile": "topaz",
"hasExistingDoc": false
},
{
"name": "PixVerse",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "pixverse",
"hasExistingDoc": false
},
{
"name": "Meshy",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "meshy",
"hasExistingDoc": false
},
{
"name": "Rodin",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "rodin",
"hasExistingDoc": false
},
{
"name": "Magnific",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "magnific",
"hasExistingDoc": false
},
{
"name": "WaveSpeed",
"category": "api_only",
"templateCount": 3,
"priority": 6,
"docFile": "wavespeed",
"hasExistingDoc": false
},
{
"name": "BRIA",
"category": "api_only",
"templateCount": 2,
"priority": 4,
"docFile": "bria",
"hasExistingDoc": false
},
{
"name": "Veo",
"category": "api_only",
"templateCount": 2,
"priority": 4,
"docFile": "veo",
"hasExistingDoc": false
},
{
"name": "HitPaw",
"category": "api_only",
"templateCount": 2,
"priority": 4,
"docFile": "hitpaw",
"hasExistingDoc": false
},
{
"name": "Z-Image",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "z-image",
"hasExistingDoc": false
},
{
"name": "Anima",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "anima",
"hasExistingDoc": false
},
{
"name": "Reimagine",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "reimagine",
"mapsTo": ["stability"],
"hasExistingDoc": false
},
{
"name": "Ideogram",
"category": "api_only",
"templateCount": 1,
"priority": 2,
"docFile": "ideogram",
"hasExistingDoc": false
},
{
"name": "Gemini3 Pro Image Preview",
"category": "api_only",
"templateCount": 16,
"priority": 32,
"docFile": "gemini",
"hasExistingDoc": false
}
],
"utility_model": [
{
"name": "SVD",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "svd",
"hasExistingDoc": false
},
{
"name": "Real-ESRGAN",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "real-esrgan",
"hasExistingDoc": false
},
{
"name": "Depth Anything v2",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "depth-anything-v2",
"hasExistingDoc": false
},
{
"name": "FlashVSR",
"category": "utility_model",
"templateCount": 1,
"priority": 1,
"docFile": "flashvsr",
"hasExistingDoc": false
}
],
"variant": [
{
"name": "Wan2.1",
"category": "variant",
"templateCount": 21,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Wan2.2",
"category": "variant",
"templateCount": 15,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Qwen-Image-Edit",
"category": "variant",
"templateCount": 11,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "LTX-2",
"category": "variant",
"templateCount": 9,
"priority": 0,
"mapsTo": "ltx-video",
"hasExistingDoc": true
},
{
"name": "Qwen-Image",
"category": "variant",
"templateCount": 7,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "Hunyuan3D",
"category": "variant",
"templateCount": 7,
"priority": 0,
"mapsTo": "hunyuan",
"hasExistingDoc": true
},
{
"name": "Google Gemini Image",
"category": "variant",
"templateCount": 6,
"priority": 0,
"mapsTo": "gemini",
"hasExistingDoc": false
},
{
"name": "Flux.2 Klein",
"category": "variant",
"templateCount": 6,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": true
},
{
"name": "Kling O1",
"category": "variant",
"templateCount": 5,
"priority": 0,
"mapsTo": "kling",
"hasExistingDoc": false
},
{
"name": "Vidu Q2",
"category": "variant",
"templateCount": 5,
"priority": 0,
"mapsTo": "vidu",
"hasExistingDoc": false
},
{
"name": "SD3.5",
"category": "variant",
"templateCount": 4,
"priority": 0,
"mapsTo": "sdxl",
"hasExistingDoc": false
},
{
"name": "Google Gemini",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "gemini",
"hasExistingDoc": false
},
{
"name": "Flux.2 Dev",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": true
},
{
"name": "Flux.2",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": true
},
{
"name": "Wan2.5",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Kontext",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "flux",
"hasExistingDoc": false
},
{
"name": "Wan2.6",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Hunyuan Video",
"category": "variant",
"templateCount": 3,
"priority": 0,
"mapsTo": "hunyuan",
"hasExistingDoc": true
},
{
"name": "Vidu Q3",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "vidu",
"hasExistingDoc": false
},
{
"name": "LTXV",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "ltx-video",
"hasExistingDoc": true
},
{
"name": "Qwen-Image-Layered",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "SD1.5",
"category": "variant",
"templateCount": 2,
"priority": 0,
"mapsTo": "sdxl",
"hasExistingDoc": false
},
{
"name": "Gemini-2.5-Flash",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "gemini",
"hasExistingDoc": false
},
{
"name": "Qwen-Image 2512",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "qwen",
"hasExistingDoc": true
},
{
"name": "Seedream 4.0",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "seedream",
"hasExistingDoc": false
},
{
"name": "GPT-Image-1.5",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "gpt-image-1",
"hasExistingDoc": false
},
{
"name": "Kling2.6",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "kling",
"hasExistingDoc": false
},
{
"name": "Wan-Move",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": true
},
{
"name": "Motion Control",
"category": "variant",
"templateCount": 1,
"priority": 0,
"mapsTo": "wan",
"hasExistingDoc": false
}
],
"skip": [
{
"name": "None",
"category": "skip",
"templateCount": 1,
"priority": 0,
"hasExistingDoc": false
},
{
"name": "nano-banana",
"category": "skip",
"templateCount": 1,
"priority": 0,
"note": "Duplicate of Nano Banana Pro",
"hasExistingDoc": false
}
]
},
"priorityOrder": [
"wan",
"nano-banana-pro",
"flux",
"gemini",
"ace-step",
"vidu",
"kling",
"seedance",
"seedream",
"hidream",
"sdxl",
"stable-audio",
"chatterbox",
"z-image",
"recraft",
"runway",
"tripo",
"kandinsky",
"omnigen",
"seedvr2",
"gpt-image-1",
"minimax",
"grok",
"luma",
"moonvalley",
"topaz",
"chroma",
"pixverse",
"meshy",
"rodin",
"magnific",
"wavespeed",
"bria",
"veo",
"hitpaw",
"newbie",
"ovis-image",
"chronoedit",
"humo",
"anima",
"reimagine",
"ideogram",
"svd",
"real-esrgan",
"depth-anything-v2",
"flashvsr"
]
}
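The `priorityOrder` array ranks documentation targets, and each variant points at a target through `mapsTo`. Below is a minimal Python sketch of how a consumer might order variants by that ranking. The following are assumptions, not behavior confirmed by the original tooling:

- the function name;
- the trimmed inline data;
- the rule that targets missing from `priorityOrder` sort last (`qwen`, for example, is not in the list).

```python
from typing import Dict, List

def order_by_priority(variants: List[Dict], priority_order: List[str]) -> List[Dict]:
    """Sort variants by the rank of their `mapsTo` target in priority_order.

    Targets absent from priority_order (or variants without `mapsTo`) sort last.
    This is a hypothetical helper, not part of the original tooling.
    """
    rank = {name: i for i, name in enumerate(priority_order)}
    last = len(priority_order)
    return sorted(variants, key=lambda v: rank.get(v.get("mapsTo"), last))

# Trimmed excerpt of the JSON above (first entries of priorityOrder, a few variants).
priority_order = ["wan", "nano-banana-pro", "flux", "gemini", "ace-step",
                  "vidu", "kling", "seedance", "seedream", "hidream", "sdxl"]
variants = [
    {"name": "Qwen-Image 2512", "mapsTo": "qwen"},
    {"name": "Kling2.6", "mapsTo": "kling"},
    {"name": "SD1.5", "mapsTo": "sdxl"},
    {"name": "Wan-Move", "mapsTo": "wan"},
    {"name": "Gemini-2.5-Flash", "mapsTo": "gemini"},
]

ordered = order_by_priority(variants, priority_order)
```

Python's `sorted` is stable, so variants that share a target keep their original order from the JSON.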

@@ -1,53 +0,0 @@
# Moonvalley (Marey)
Marey is Moonvalley's AI video generation model for professional filmmakers, delivering studio-grade quality and trained exclusively on licensed footage.
## Model Variants
### Marey Realism v1.5
- Latest production model with cinematic detail
- 1080p resolution at 24fps, up to 5-second clips
- Available via ComfyUI native nodes and fal.ai
### Marey Director Controls
- 3D-aware camera control from single images
- Motion transfer from reference videos
- Trajectory control for object path definition
- Pose transfer and keyframing with multi-image timeline
## Key Features
- Text-to-video and image-to-video generation
- Camera control with 3D scene understanding
- Motion transfer from reference video clips
- Trajectory control via drawn paths
- Pose transfer for expressive character animation
- Shot extension for seamless duration increase
- Commercially safe (trained on licensed data only)
## Hardware Requirements
- Cloud API-based (no local GPU required)
- Available via Moonvalley platform, ComfyUI, and fal.ai
- Subscription tiers starting at $14.99/month
## Common Use Cases
- Professional film and commercial production
- Cinematic B-roll generation
- Previsualization and storyboarding
- Music video and social media content
- Product advertising with dynamic camera
- Animation and character-driven storytelling
## Key Parameters
- **prompt**: Text description of desired video scene
- **image**: Reference image for image-to-video mode
- **camera_control**: Camera movement specification
- **motion_reference**: Video reference for motion transfer
- **trajectory**: Drawn path for object movement
- **duration**: Clip length (up to 5 seconds)
- **resolution**: Output resolution (up to 1080p at 24fps)
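Marey clips are capped at 5 seconds at 24fps, so the frame budget of a request is simple arithmetic. The sketch below assumes a fixed 24fps and the 5-second cap listed above. `marey_frame_count` is a hypothetical helper, not part of any Moonvalley SDK.

```python
FPS = 24         # output frame rate listed above (assumed fixed)
MAX_SECONDS = 5  # maximum clip length listed above

def marey_frame_count(duration_s: float) -> int:
    """Return the number of frames in a clip of the given length (hypothetical helper)."""
    if not 0 < duration_s <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")
    return round(duration_s * FPS)
```

A full-length clip therefore has 120 frames. Longer sequences need the shot-extension feature rather than a larger `duration`.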

@@ -1,53 +0,0 @@
# Nano Banana Pro
Nano Banana Pro is Google DeepMind's flagship image generation and editing model, accessed through ComfyUI's API nodes. Internally it is the Gemini 3 Pro Image model, designed for production-ready high-fidelity visuals.
## Model Variants
### Nano Banana Pro (Gemini 3 Pro Image)
- State-of-the-art reasoning-powered image generation
- Supports up to 14 reference image inputs
- Native 4K output resolution (up to 4096x4096)
- Complex multi-turn image generation and editing
- Model ID: `gemini-3-pro-image-preview`
### Gemini 2.5 Flash Image (Nano Banana)
- Cost-effective alternative optimized for speed
- Balanced price-to-performance for interactive workflows
- Character consistency and prompt-based editing
- Model ID: `gemini-2.5-flash-image`
## Key Features
- **World knowledge**: Generates accurate real-world images using Google Search's knowledge base
- **Text rendering**: Clean text generation with detection and translation across 10 languages
- **Multi-image fusion**: Blend up to 14 input images into a single coherent output
- **Studio controls**: Adjust camera angle, focus, and color grading in generated images
- **Character consistency**: Maintain subject identity across multiple generations
- **Prompt-based editing**: Targeted transformations via natural language instructions
## Hardware Requirements
- No local GPU required — runs as a cloud API service
- Accessed via ComfyUI API nodes (requires ComfyUI login and network access)
- Available on Comfy Cloud or local ComfyUI with API node support
## Common Use Cases
- High-fidelity text-to-image generation
- Multi-reference style transfer and image blending
- Product visualization and mockups
- Sketch-to-image and blueprint-to-3D visualization
- Text rendering and translation in images
- Iterative prompt-based image editing
## Key Parameters
- **prompt**: Text description of desired image or edit
- **aspect_ratio**: Supported ratios include 1:1, 3:2, 4:3, 9:16, 16:9, 21:9
- **temperature**: 0.0-2.0 (default 1.0)
- **topP**: 0.0-1.0 (default 0.95)
- **max_output_tokens**: Up to 32,768 tokens per response
- **input images**: Up to 14 reference images per prompt
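Before a request leaves a workflow, the ranges listed above can be checked client-side. This is a hedged sketch. The class and field names are hypothetical, not the API node's actual inputs. The source says the supported ratios "include" the listed ones, so the allow-list below may be incomplete.

```python
from dataclasses import dataclass, field
from typing import List

# Ratios named above; the source says "include", so this allow-list may be incomplete.
SUPPORTED_RATIOS = {"1:1", "3:2", "4:3", "9:16", "16:9", "21:9"}
MAX_REFERENCE_IMAGES = 14
MAX_OUTPUT_TOKENS = 32_768

@dataclass
class NanoBananaProRequest:
    """Hypothetical client-side container mirroring the parameters listed above."""
    prompt: str
    aspect_ratio: str = "1:1"
    temperature: float = 1.0  # documented default
    top_p: float = 0.95       # documented default
    max_output_tokens: int = MAX_OUTPUT_TOKENS
    images: List[str] = field(default_factory=list)  # e.g. paths to reference images

    def validate(self) -> None:
        """Raise ValueError if any field falls outside the documented ranges."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.aspect_ratio not in SUPPORTED_RATIOS:
            raise ValueError(f"unsupported aspect_ratio {self.aspect_ratio!r}")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be in [0.0, 2.0]")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be in [0.0, 1.0]")
        if not 1 <= self.max_output_tokens <= MAX_OUTPUT_TOKENS:
            raise ValueError(f"max_output_tokens must be in [1, {MAX_OUTPUT_TOKENS}]")
        if len(self.images) > MAX_REFERENCE_IMAGES:
            raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images per prompt")
```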

@@ -1,43 +0,0 @@
# NewBie
NewBie image Exp0.1 is a 3.5B parameter open-source text-to-image model built on the Next-DiT architecture, developed by the NewBie-AI community. It is specifically pretrained on high-quality anime data for detailed and visually striking anime-style image generation.
## Model Variants
### NewBie image Exp0.1
- 3.5B parameter DiT model based on Next-DiT architecture
- Uses Gemma3-4B-it as primary text encoder with Jina CLIP v2 for pooled features
- FLUX.1-dev 16-channel VAE for rich color rendering and fine texture detail
- Supports natural language, tags, and XML structured prompts
- Non-commercial community license (Newbie-NC-1.0) for model weights
## Key Features
- Exceptional anime and ACG (Anime, Comics, Games) style generation
- XML structured prompting for improved attribute binding and element disentanglement
- Strong multi-character scene generation with accurate attribute assignment
- ComfyUI integration via dedicated custom nodes
- LoRA training support with community trainer
- Built on research from the Lumina architecture family
## Hardware Requirements
- Minimum: 12GB VRAM (bfloat16 or float16)
- Recommended: 24GB VRAM for comfortable generation
- Requires Gemma3-4B-it and Jina CLIP v2 text encoders
- Python 3.10, PyTorch 2.6.0+, Transformers 4.57.1+
## Common Use Cases
- Anime and illustration generation
- Character design with precise attribute control
- Multi-character scene composition
- Fan art and creative anime artwork
## Key Parameters
- **num_inference_steps**: 28 recommended
- **height/width**: 1024x1024 native resolution
- **prompt_format**: Natural language, tags, or XML structured
- **torch_dtype**: bfloat16 recommended (float16 fallback)

@@ -1,53 +0,0 @@
# OmniGen2
OmniGen2 is a multimodal generation model by VectorSpaceLab, with dual decoding pathways for text and image, built on the Qwen2.5-VL foundation.
## Model Variants
### OmniGen2
- 3B vision-language encoder (Qwen2.5-VL) + 4B image decoder
- Dual decoding with unshared parameters for text and image
- Decoupled image tokenizer
- Apache 2.0 license
### OmniGen v1
- Earlier single-pathway architecture
- Fewer capabilities than OmniGen2
- Superseded by OmniGen2
## Key Features
- Text-to-image generation with high fidelity and aesthetics
- Instruction-guided image editing (state-of-the-art among open-source models)
- In-context generation combining multiple reference inputs (humans, objects, scenes)
- Visual understanding inherited from Qwen2.5-VL
- CPU offload support reduces VRAM usage by nearly 50%
- Sequential CPU offload available for under 3GB VRAM (slower inference)
- Supports negative prompts and configurable guidance scales
## Hardware Requirements
- Minimum: NVIDIA RTX 3090 or equivalent (~17GB VRAM)
- With CPU offload: ~9GB VRAM
- With sequential CPU offload: under 3GB VRAM (significantly slower)
- Flash Attention optional but recommended for best performance
- CUDA 12.4+ recommended
- Default output resolution: 1024x1024
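The VRAM tiers above amount to a simple decision rule. The sketch below maps free VRAM to one of the three memory strategies, using the approximate thresholds from the list. The returned labels are hypothetical names, not OmniGen2 pipeline flags. Real headroom also depends on resolution and on whether Flash Attention is installed.

```python
def pick_offload_mode(free_vram_gb: float) -> str:
    """Map available VRAM to an OmniGen2 memory strategy.

    Thresholds follow the approximate figures above (~17GB full model, ~9GB with
    CPU offload, under 3GB with sequential offload). Labels are hypothetical.
    """
    if free_vram_gb <= 0:
        raise ValueError("free_vram_gb must be positive")
    if free_vram_gb >= 17:
        return "none"  # keep the whole model on the GPU
    if free_vram_gb >= 9:
        return "cpu_offload"  # roughly halves VRAM use
    return "sequential_cpu_offload"  # lowest VRAM use, significantly slower
```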
## Common Use Cases
- Text-to-image generation
- Instruction-based photo editing
- Subject-driven image generation from reference photos
- Multi-image composition and in-context editing
## Key Parameters
- **text_guidance_scale**: Controls adherence to text prompt (CFG)
- **image_guidance_scale**: Controls similarity to reference image (1.2-2.0 for editing, 2.5-3.0 for in-context)
- **num_inference_step**: Diffusion steps (default 50)
- **max_pixels**: Maximum total pixel count for input images (default 1024x1024)
- **negative_prompt**: Text describing undesired qualities (e.g., "blurry, low quality, watermark")
- **scheduler**: ODE solver choice (euler or dpmsolver++)

@@ -1,43 +0,0 @@
# Ovis-Image
Ovis-Image is a 7B text-to-image model by AIDC-AI, built on Ovis-U1, optimized for high-quality text rendering in generated images. It achieves state-of-the-art results on the CVTG-2K text rendering benchmark while remaining compact enough for single-GPU deployment.
## Model Variants
### Ovis-Image-7B
- 2B (Ovis2.5-2B) + 7B parameter architecture
- State-of-the-art on CVTG-2K benchmark for text rendering accuracy
- Competitive with 20B+ models (Qwen-Image) and GPT-4o on text-centric tasks
- Uses FLUX-based autoencoder for latent encoding
- Apache 2.0 license
## Key Features
- Excellent text rendering with correct spelling and consistent typography
- High fidelity on text-heavy, layout-sensitive prompts
- Handles posters, banners, logos, UI mockups, and infographics
- Supports diverse fonts, sizes, and aspect ratios
- Strong performance on both English and Chinese text generation
- Available via Diffusers library with OvisImagePipeline
## Hardware Requirements
- Minimum: 16GB VRAM (bfloat16)
- Recommended: 24GB VRAM for comfortable use
- Fits on a single high-end GPU
- Tested with Python 3.10, PyTorch 2.6.0, Transformers 4.57.1
## Common Use Cases
- Generating posters and banners with accurate text
- Logo and brand asset creation
- UI mockup and infographic generation
- Marketing materials with embedded typography
## Key Parameters
- **num_inference_steps**: 50 recommended
- **guidance_scale**: 5.0
- **resolution**: 1024x1024 native
- **negative_prompt**: Supported for quality control

@@ -1,46 +0,0 @@
# PixVerse
PixVerse is an AI video generation platform founded in 2023 and backed by Alibaba, offering text-to-video and image-to-video capabilities with over 100 million registered users.
## Model Variants
### PixVerse V5.5
- Latest model, with improved fidelity across text-to-video, image-to-video, and video modification
### PixVerse R1
- Real-time AI video generation model
- Interactive control where users direct character actions as video unfolds
### PixVerse V4.5 / V5
- Previous generation models with strong cinematic quality and trending effects
## Key Features
- Text-to-video generation from natural language prompts
- Image-to-video animation with realistic physics simulation
- Fusion mode combining up to 3 images into one video
- Key frame control and video extension with AI continuity
- AI Video Modify for text-prompt-based editing
## Hardware Requirements
- Cloud-based platform with no local GPU required
- Web app at app.pixverse.ai and mobile apps (iOS/Android)
- API at platform.pixverse.ai for developer integration
## Common Use Cases
- Social media content creation (AI Kiss, Hug, Dance effects)
- Marketing and promotional video production
- Old photo revival and animation
- Cinematic narrative and stylistic art generation
## Key Parameters
- **prompt**: text description of the desired video content
- **duration**: video length (typically 5s clips)
- **resolution**: output quality (360p to 720p+)
- **aspect_ratio**: 16:9, 9:16, 1:1, and other ratios

@@ -1,77 +0,0 @@
# Qwen
Qwen is Alibaba's family of vision-language and image generation models, spanning visual understanding, image editing, and image generation.
## Model Variants
### Qwen2.5-VL
- Multimodal vision-language model from the Qwen team
- Available in 3B, 7B, and 72B parameter sizes
- Image understanding, comprehension of videos longer than one hour, and visual localization
- Visual agent capabilities: computer use, phone use, dynamic tool calling
- Structured output generation for invoices, forms, and tables
- Dynamic resolution and frame rate training for video understanding
- Optimized ViT encoder with window attention, SwiGLU, and RMSNorm
### Qwen-Image-Edit
- Specialized image editing model with instruction-following
- Supports inpainting, outpainting, style transfer, and content-aware edits
- 11 workflow templates available
### Qwen-Image
- Text-to-image generation model from the Qwen family
- 7 workflow templates available
### Qwen-Image-Layered
- Layered image generation for composable outputs
- Generates images with separate foreground/background layers
- 2 workflow templates available
### Qwen-Image 2512
- Specific variant optimized for particular generation tasks
- 1 workflow template available
## Key Features
- Strong visual understanding with state-of-the-art benchmark results
- Native multi-language support including Chinese and English
- Visual agent capabilities for computer and phone interaction
- Video event capture with temporal segment pinpointing
- Bounding box and point-based visual localization
- Structured JSON output for document and table extraction
- Instruction-based image editing with precise control
## Hardware Requirements
- 3B model: 6-8GB VRAM
- 7B model: 16GB VRAM, flash_attention_2 recommended for multi-image/video
- 72B model: Multi-GPU setup required (80GB+ per GPU)
- Context length: 32,768 tokens default, extendable to 64K+ with YaRN
- Dynamic pixel budget: 256-1280 tokens per image (configurable min/max pixels)
## Common Use Cases
- Image editing based on text instructions
- Visual question answering and image description
- Long video comprehension and event extraction
- Document OCR and structured data extraction
- Visual agent tasks (screen interaction, UI navigation)
- Layered image generation for design workflows
- Text-to-image generation with strong prompt following
## Key Parameters
- **max_new_tokens**: Controls output length for VL model responses
- **min_pixels / max_pixels**: Control image token budget (e.g. 256x28x28 to 1280x28x28)
- **temperature**: Generation diversity for text outputs
- **resized_height / resized_width**: Direct image dimension control (rounded to the nearest multiple of 28)
- **fps**: Frame rate for video input processing in Qwen2.5-VL
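The parameters above imply one visual token per 28x28 pixel area: `min_pixels = 256x28x28` corresponds to 256 tokens. Each image is clamped into a `[min_pixels, max_pixels]` budget. The sketch below reproduces that idea in plain Python. It is modeled on the described behavior, not copied from the official `qwen_vl_utils` package, and the function names are hypothetical.

```python
import math

PATCH = 28  # pixels per token side, implied by the 256x28x28 .. 1280x28x28 budget above

def fit_to_budget(height: int, width: int,
                  min_pixels: int = 256 * PATCH * PATCH,
                  max_pixels: int = 1280 * PATCH * PATCH) -> tuple:
    """Round dimensions to multiples of 28, then rescale (keeping the aspect ratio
    approximately) so that height * width lies within [min_pixels, max_pixels]."""
    h = max(PATCH, round(height / PATCH) * PATCH)
    w = max(PATCH, round(width / PATCH) * PATCH)
    if h * w > max_pixels:
        # Shrink, flooring to multiples of 28 so the product stays under the cap.
        scale = math.sqrt((height * width) / max_pixels)
        h = max(PATCH, math.floor(height / scale / PATCH) * PATCH)
        w = max(PATCH, math.floor(width / scale / PATCH) * PATCH)
    elif h * w < min_pixels:
        # Enlarge, ceiling to multiples of 28 so the product reaches the floor.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / PATCH) * PATCH
        w = math.ceil(width * scale / PATCH) * PATCH
    return h, w

def visual_token_count(height: int, width: int) -> int:
    """Tokens for an image whose sides are multiples of 28."""
    return (height // PATCH) * (width // PATCH)
```

For example, a 560x840 image is already aligned and in budget, so it costs 20 x 30 = 600 tokens. A 1000x1000 image is scaled down until it fits under 1280 tokens.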
## Blog References
- [Qwen Image Edit 2511 & Qwen Image Layered](../blog/qwen-image-edit-2511.md) — Better character consistency, RGBA layer decomposition, built-in LoRA support

@@ -1 +0,0 @@
Qwen is Alibaba's family of vision-language and image generation models. Qwen2.5-VL is a multimodal vision-language model available in 3B (6-8GB VRAM), 7B (16GB), and 72B (multi-GPU 80GB+) sizes, capable of image understanding, hour-long video comprehension, visual localization, visual agent tasks (computer/phone use), and structured JSON output for document extraction. Qwen-Image-Edit provides instruction-based image editing with inpainting, outpainting, and style transfer. Qwen-Image handles text-to-image generation, while Qwen-Image-Layered produces composable foreground/background layer outputs. The family features native Chinese/English support, strong prompt following, and state-of-the-art visual understanding benchmarks. Key parameters include dynamic pixel budgets (256-1280 tokens per image), configurable frame rates for video input, and temperature for text diversity. Primary uses: image editing, visual QA, video comprehension, document OCR, and layered image generation.

@@ -1,61 +0,0 @@
# Real-ESRGAN
Real-ESRGAN is a practical image and video super-resolution model that extends ESRGAN for real-world restoration, trained purely on synthetic data.
## Model Variants
### RealESRGAN_x4plus
- General-purpose 4× upscaling model for real-world images
- RRDB (Residual-in-Residual Dense Block) architecture
- Handles noise, blur, JPEG compression artifacts
### RealESRGAN_x4plus_anime_6B
- Optimized for anime and illustration images
- Smaller 6-block model for faster inference
- Better edge preservation for line art
### RealESRGAN_x2plus
- 2× upscaling variant for moderate enlargement
- Lower risk of hallucinated details
### realesr-animevideov3
- Lightweight model designed for anime video frames
- Temporal consistency for video processing
## Key Features
- Trained entirely on synthetic degradation data (no paired real-world data needed)
- Second-order degradation modeling simulates real-world compression chains
- GFPGAN integration for face enhancement during upscaling
- Tiling support for processing large images with limited VRAM
- FP16 (half precision) inference for faster processing
- NCNN Vulkan portable executables for cross-platform GPU support (Intel/AMD/NVIDIA)
- Supports 2×, 3×, and 4× upscaling with arbitrary output scale via LANCZOS4 resize
## Hardware Requirements
- Minimum: 2GB VRAM with tiling enabled
- Recommended: 4GB+ VRAM for comfortable use
- NCNN Vulkan build runs on any GPU with Vulkan support
- CPU inference supported but significantly slower
## Common Use Cases
- Upscaling old or low-resolution photographs
- Enhancing compressed web images
- Anime and manga image upscaling
- Video frame super-resolution
- Restoring degraded historical images
- Pre-processing for print from low-resolution sources
## Key Parameters
- **outscale**: Final upsampling scale factor (default: 4)
- **tile**: Tile size for memory management (0 = no tiling)
- **face_enhance**: Enable GFPGAN face enhancement (default: false)
- **model_name**: Select model variant (RealESRGAN_x4plus, anime_6B, etc.)
- **denoise_strength**: Balance noise removal vs detail preservation (applies only to realesr-general-x4v3)
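Tiling and `outscale` determine how much work one image costs and how big the result is. The sketch below computes both:

- the number of tiles processed for a given `tile` size, where 0 means no tiling;
- the final output size for an `outscale` factor.

It is a simplification: the real tool also pads each tile to hide seams, and this sketch ignores that padding. The helper names are hypothetical.

```python
import math

def tile_count(width: int, height: int, tile: int) -> int:
    """Number of tiles for an image; tile=0 processes the whole image at once."""
    if tile < 0:
        raise ValueError("tile must be >= 0")
    if tile == 0:
        return 1
    return math.ceil(width / tile) * math.ceil(height / tile)

def output_size(width: int, height: int, outscale: float = 4.0) -> tuple:
    """Final dimensions after upscaling by `outscale` (default 4, as listed above)."""
    return round(width * outscale), round(height * outscale)
```

Smaller tiles lower peak VRAM (the 2GB minimum above assumes tiling) at the cost of more passes. A 1920x1080 frame with `tile=512` is processed as a 4x3 grid.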

@@ -1,50 +0,0 @@
# Recraft
Recraft is an AI image generation platform known for its V3 model and unique ability to produce both raster and vector (SVG) images from text prompts.
## Model Variants
### Recraft V3
- Top-ranked model on Artificial Analysis Text-to-Image Leaderboard
- Supports raster image generation at $0.04 per image
- Supports vector SVG generation at $0.08 per image
- Accurate text rendering at any size in generated images
### Recraft 20B
- More cost-effective variant at $0.022 per raster image
- Vector generation at $0.044 per image
- Suitable for high-volume production workflows
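The per-image prices above make batch budgeting straightforward. The sketch uses `Decimal` so cent amounts do not pick up floating-point error. The model/format keys are hypothetical labels, and the prices are copied from the list above and may have changed since.

```python
from decimal import Decimal

# Per-image USD prices as listed above; keys are hypothetical labels.
PRICES = {
    ("v3", "raster"): Decimal("0.04"),
    ("v3", "vector"): Decimal("0.08"),
    ("20b", "raster"): Decimal("0.022"),
    ("20b", "vector"): Decimal("0.044"),
}

def estimate_cost(model: str, fmt: str, count: int) -> Decimal:
    """Estimated USD cost of `count` generations for a model/format pair."""
    if count < 0:
        raise ValueError("count must be >= 0")
    return PRICES[(model, fmt)] * count
```

At these prices, vector output costs exactly twice as much as raster output for both models.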
## Key Features
- Native vector SVG image generation from text prompts
- Accurate text rendering (headlines, labels, signs) in images
- Custom brand style creation from reference images
- Generation in exact brand colors for brand consistency
- AI-powered image vectorization (PNG/JPG to SVG)
- Background removal, creative upscaling, and crisp upscaling
- Multiple style presets: photorealism, clay, retro-pop, hand-drawn, 80s
## Hardware Requirements
- API-only access via Recraft API
- No local hardware requirements
- Available through Recraft Studio web interface
## Common Use Cases
- Logo and icon design (SVG output)
- Brand-consistent marketing asset generation
- Poster and advertisement creation with text
- Scalable vector illustrations for web and print
- Product mockup generation
- SEO blog imagery at scale
## Key Parameters
- **prompt**: Text description of the desired image
- **style**: Visual style (realistic_image, digital_illustration, vector_illustration, icon)
- **colors**: Brand color palette for consistent output
- **format**: Output format (raster PNG/JPG or vector SVG)

@@ -1,57 +0,0 @@
# Rodin
Rodin is a 3D generation API by Hyper3D (DeemosTech) that creates production-ready 3D models from text or images with PBR materials.
## Model Variants
### Rodin Gen-2
- Most advanced model with 10 billion parameters
- Built on the BANG architecture
- 4x improved geometric mesh quality over Gen-1
- Generation time approximately 90 seconds
### Rodin Gen-1.5 Regular
- Detailed 3D assets with customizable quality
- Adjustable polygon counts and 2K textures
- Generation time approximately 70 seconds
### Rodin Sketch
- Fast prototyping with basic geometry and 1K textures
- GLB format only, generation in approximately 20 seconds
## Key Features
- Text-to-3D and image-to-3D generation
- Multi-view image input (up to 5 images) with fuse and concat modes
- PBR and Shaded material options
- Quad and triangle mesh modes
- HighPack add-on for 4K textures and high-poly models
- Bounding box ControlNet for dimension constraints
- T/A pose control for humanoid models
## Hardware Requirements
- Cloud API-based (no local GPU required)
- All generation runs on Hyper3D servers
- API key required via hyper3d.ai dashboard
## Common Use Cases
- Game asset production
- VR/AR content creation
- Product visualization
- Character modeling with pose control
- Rapid 3D prototyping
## Key Parameters
- **prompt**: Text description for text-to-3D mode
- **images**: Up to 5 reference images for image-to-3D
- **quality**: Detail level (high, medium, low, extra-low)
- **mesh_mode**: Face type (Quad or Raw triangles)
- **material**: Material type (PBR, Shaded, or All)
- **geometry_file_format**: Output format (glb, fbx, obj, stl, usdz)
- **seed**: Randomization seed (0-65535)
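The parameter list above has enough constraints to check a request locally before spending API credits. The following sketch validates those constraints and returns a plain dict:

- text-to-3D needs a prompt, image-to-3D needs images;
- at most 5 images;
- enumerated values for quality, mesh mode, material and file format;
- a seed in 0-65535.

It sends nothing. The function name, the defaults, and the dict layout are assumptions; the real Hyper3D payload format may differ.

```python
from typing import List, Optional

QUALITIES = {"high", "medium", "low", "extra-low"}
MESH_MODES = {"Quad", "Raw"}
MATERIALS = {"PBR", "Shaded", "All"}
FILE_FORMATS = {"glb", "fbx", "obj", "stl", "usdz"}
MAX_IMAGES = 5

def build_rodin_request(
    prompt: Optional[str] = None,
    images: Optional[List[str]] = None,
    quality: str = "medium",            # illustrative default, not the service default
    mesh_mode: str = "Raw",
    material: str = "PBR",
    geometry_file_format: str = "glb",
    seed: int = 0,
) -> dict:
    """Validate parameters against the documented ranges and return them as a dict."""
    images = images or []
    if not prompt and not images:
        raise ValueError("provide a prompt (text-to-3D) or images (image-to-3D)")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} reference images are allowed")
    for name, value, allowed in (
        ("quality", quality, QUALITIES),
        ("mesh_mode", mesh_mode, MESH_MODES),
        ("material", material, MATERIALS),
        ("geometry_file_format", geometry_file_format, FILE_FORMATS),
    ):
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if not 0 <= seed <= 65535:
        raise ValueError("seed must be in [0, 65535]")
    request = {"quality": quality, "mesh_mode": mesh_mode, "material": material,
               "geometry_file_format": geometry_file_format, "seed": seed}
    if prompt:
        request["prompt"] = prompt
    if images:
        request["images"] = images
    return request
```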

@@ -1,50 +0,0 @@
# Runway
Runway is a generative AI company producing state-of-the-art video generation models, accessible via API and web interface.
## Model Variants
### Gen-3 Alpha
- Text-to-video and image-to-video at 1280x768, 24fps
- 5 or 10 second output, extendable up to 40 seconds
- Photorealistic human character generation
### Gen-3 Alpha Turbo
- Faster, lower-cost variant (5 credits/sec vs 10)
- Requires input image; supports first, middle, and last keyframes
- Video extension up to 34 seconds total
### Gen-4 Turbo
- Latest generation with improved motion and prompt adherence
- Image reference support and text-to-image (gen4_image)
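The credit rates quoted above (10 credits/sec for Gen-3 Alpha, 5 for Turbo) and the 5- or 10-second durations make per-clip cost a single multiplication. A sketch follows. The model labels are hypothetical, and Gen-4 Turbo is omitted because no rate is given here.

```python
CREDITS_PER_SECOND = {"gen3a": 10, "gen3a_turbo": 5}  # rates quoted above; labels hypothetical
ALLOWED_DURATIONS = {5, 10}                           # seconds, per the parameter list

def runway_credits(model: str, duration_s: int) -> int:
    """Credits consumed by one generation of the given length."""
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5 or 10 seconds")
    return CREDITS_PER_SECOND[model] * duration_s
```

A 10-second Turbo clip costs the same as a 5-second Gen-3 Alpha clip.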
## Key Features
- Advanced camera controls (Motion Brush, Director Mode)
- C2PA provenance metadata for content authenticity
- Expressive human characters with gestures and emotions
- Wide range of cinematic styles and terminology support
## Hardware Requirements
- API-only access via Runway developer portal
- No local hardware requirements
- Enterprise tier available for higher rate limits
## Common Use Cases
- Film pre-visualization and storyboarding
- Commercial advertisement production
- Social media video content
- Visual effects and motion graphics
- Music video and artistic video creation
## Key Parameters
- **prompt**: Text description guiding video generation
- **duration**: Output length (5 or 10 seconds)
- **ratio**: Aspect ratio (1280:768 or 768:1280)
- **keyframes**: Start, middle, and/or end frame images

@@ -1,63 +0,0 @@
# Stable Diffusion 3.5
Stable Diffusion 3.5 is Stability AI's text-to-image model family based on the Multimodal Diffusion Transformer (MMDiT) architecture with rectified flow matching.
## Model Variants
### Stable Diffusion 3.5 Large
- 8.1 billion parameter MMDiT model
- Highest quality and prompt adherence in the SD family
- 1 megapixel native resolution (1024×1024)
- 28-50 inference steps recommended
### Stable Diffusion 3.5 Large Turbo
- Distilled version of SD 3.5 Large
- 4-step inference for fast generation
- Guidance scale of 0 (classifier-free guidance disabled)
- Comparable quality to full model at fraction of the time
### Stable Diffusion 3.5 Medium
- 2.5 billion parameter MMDiT-X architecture
- Designed for consumer hardware (9.9GB VRAM for transformer)
- Dual attention blocks in first 12 transformer layers
- Multi-resolution generation from 0.25 to 2 megapixels
- Skip Layer Guidance recommended for better coherency
## Key Features
- Three text encoders: CLIP ViT-L, OpenCLIP ViT-bigG (77 tokens each), T5-XXL (256 tokens)
- QK-normalization for stable training and easier fine-tuning
- Rectified flow matching replaces traditional DDPM/DDIM sampling
- Strong text rendering and typography in generated images
- Diverse output styles (photography, 3D, painting, line art)
- Highly customizable base for fine-tuning and LoRA training
- T5-XXL encoder optional (can be removed to save memory with minimal quality loss)
- Supports negative prompts for excluding unwanted elements
## Hardware Requirements
- Large: 24GB+ VRAM recommended (fp16), quantizable to fit smaller GPUs
- Large Turbo: 16GB+ VRAM recommended
- Medium: 10GB VRAM minimum (excluding text encoders)
- NF4 quantization available via bitsandbytes for low-VRAM GPUs
- CPU offloading supported via diffusers pipeline
## Common Use Cases
- Photorealistic image generation
- Artistic illustration and concept art
- Typography and text-heavy designs
- Product visualization
- Fine-tuning and LoRA development
- ControlNet-guided generation
## Key Parameters
- **steps**: 28-50 for Large, 4 for Large Turbo, 20-40 for Medium
- **guidance_scale**: 4.5-7.5 for Large/Medium, 0 for Large Turbo
- **max_sequence_length**: T5 token limit (77 or 256; a higher limit lets longer prompts be encoded without truncation)
- **resolution**: 1024×1024 native, flexible aspect ratios around 1MP
- **negative_prompt**: Text describing elements to exclude (not supported by Turbo)
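The recommended settings differ enough between variants that a small preset table helps avoid mistakes, such as running Turbo with CFG or a negative prompt. A sketch follows. The specific values are picked from the ranges above; choosing the lower end of each range is an arbitrary decision. The keys are meant for a diffusers-style pipeline call, so verify them against your installed version. Rejecting a negative prompt for Turbo is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SD35Preset:
    steps: int
    guidance_scale: float
    supports_negative_prompt: bool

# Values chosen from the recommended ranges above (lower end of each range).
PRESETS = {
    "large": SD35Preset(steps=28, guidance_scale=4.5, supports_negative_prompt=True),
    "large-turbo": SD35Preset(steps=4, guidance_scale=0.0, supports_negative_prompt=False),
    "medium": SD35Preset(steps=28, guidance_scale=4.5, supports_negative_prompt=True),
}

def sd35_settings(variant: str, negative_prompt: Optional[str] = None) -> dict:
    """Return generation kwargs for a variant at the native 1024x1024 resolution."""
    preset = PRESETS[variant]
    settings = {
        "num_inference_steps": preset.steps,
        "guidance_scale": preset.guidance_scale,
        "height": 1024,
        "width": 1024,
    }
    if negative_prompt:
        if not preset.supports_negative_prompt:
            # With guidance_scale 0, CFG is disabled and a negative prompt has no effect.
            raise ValueError(f"{variant} does not use negative prompts")
        settings["negative_prompt"] = negative_prompt
    return settings
```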

@@ -1,75 +0,0 @@
# Stable Diffusion
Stable Diffusion is Stability AI's family of open-source image and video generation models, spanning multiple architectures from U-Net to diffusion transformers.
## Model Variants
### SDXL (Stable Diffusion XL)
- Stability AI's flagship text-to-image model (3.5B-parameter base model; 6.6B with the refiner ensemble)
- Native 1024x1024 resolution with flexible aspect ratios around 1MP
- Two text encoders (CLIP ViT-L + OpenCLIP ViT-bigG)
- Optional refiner model for second-stage detail enhancement
- Turbo and Lightning distilled variants for 1-4 step generation
- Largest ecosystem of LoRAs, fine-tunes, and community models
### SD3.5 (Stable Diffusion 3.5)
- Diffusion transformer (DiT) architecture, successor to SDXL
- Three text encoders (CLIP ViT-L, OpenCLIP ViT-bigG, T5-XXL) for stronger prompt following
- Available in Large (8.1B) and Medium (2.5B) parameter sizes
- Improved text rendering and compositional accuracy over SDXL
- 4 workflow templates available
### SD1.5 (Stable Diffusion 1.5)
- The classic 512x512 latent diffusion model
- Single CLIP ViT-L text encoder, 860M parameter U-Net
- Still widely used for its massive LoRA and checkpoint ecosystem
- Lower VRAM requirements make it accessible on consumer hardware
- 2 workflow templates available
### SVD (Stable Video Diffusion)
- Image-to-video generation model based on Stable Diffusion
- Generates short video clips (14 or 25 frames) from a single image
- Related model for motion generation from static inputs
### Stability API Products
- Reimagine: Stability's API-based image variation and transformation service
## Key Features
- Excellent composition, layout, and photorealism (SDXL/SD3.5)
- Large open-source ecosystem with thousands of community fine-tunes
- Flexible aspect ratios and multi-resolution support
- Dual/triple CLIP text encoding for nuanced prompt interpretation
- Strong text rendering in SD3.5 via T5-XXL encoder
## Hardware Requirements
- SD1.5: 4-6GB VRAM (fp16), runs on most consumer GPUs
- SDXL Base: 8GB VRAM minimum (fp16), 12GB recommended
- SDXL Base + Refiner: 16GB+ VRAM
- SD3.5 Medium: 8-12GB VRAM
- SD3.5 Large: 16-24GB VRAM (fp16), quantized versions for 12GB cards
## Common Use Cases
- Photorealistic image generation
- Artistic illustrations and concept art
- Product photography and design
- Character and portrait generation
- LoRA-based custom style and subject training
- Image-to-video with SVD
## Key Parameters
- **steps**: 20-40 for SDXL base, 15-25 for refiner, 28+ for SD3.5
- **cfg_scale**: 5-10 (7 default for SDXL), 3.5-7 for SD3.5
- **sampler**: DPM++ 2M Karras and Euler are popular for SDXL; Euler for SD3.5
- **resolution**: 1024x1024 native for SDXL/SD3.5, 512x512 for SD1.5
- **clip_skip**: Often set to 1-2; important for SD1.5 LoRA compatibility
- **denoise_strength**: 0.7-0.8 when using the SDXL refiner (img2img)
- **negative_prompt**: Supported in SDXL/SD1.5; not used in SD3.5 by default
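"Flexible aspect ratios around 1MP" can be turned into concrete sizes by solving for a height whose area is about 1024x1024 at the requested ratio. The width and height are then snapped to a multiple. Snapping to multiples of 64 is a common community convention assumed here, not an official requirement stated in this document. For SD1.5, pass `target_pixels=512 * 512`.

```python
import math

def sdxl_dimensions(aspect_w: int, aspect_h: int,
                    target_pixels: int = 1024 * 1024, multiple: int = 64) -> tuple:
    """Width/height near target_pixels for the given aspect ratio, snapped to `multiple`."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)  # area = ratio * height^2
    width = height * ratio
    return round(width / multiple) * multiple, round(height / multiple) * multiple
```

For example, 16:9 yields 1344x768, which stays within about 2% of one megapixel.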

@@ -1 +0,0 @@
Stable Diffusion is Stability AI's open-source image and video generation family. SDXL is the flagship text-to-image model (3.5B-parameter base, 6.6B with the refiner ensemble; dual CLIP encoders) generating 1024x1024 images with the largest ecosystem of LoRAs and community fine-tunes; it requires 8-12GB VRAM, and Turbo/Lightning variants enable 1-4 step generation. SD3.5 is the DiT-based successor with triple text encoders (including T5-XXL) in Large (8.1B, 16-24GB) and Medium (2.5B, 8-12GB) sizes, offering improved text rendering and compositional accuracy. SD1.5 remains popular for its massive ecosystem at just 4-6GB VRAM (512x512). SVD handles image-to-video generation (14 or 25 frames). Key parameters: 20-40 steps for SDXL, cfg_scale 5-10 (7 default), DPM++ 2M Karras sampler. Primary uses: photorealistic generation, artistic illustration, product photography, character generation, and LoRA-based custom training.

@@ -1,64 +0,0 @@
# Seedance
Seedance is ByteDance's video generation model family, designed for cinematic, high-fidelity video creation from text and images. The 1.0 series established a standard for fluid motion and multi-shot consistency, while the 1.5 series adds native joint audio-visual generation.
## Model Variants
### Seedance 1.5 Pro
- Native audio-visual generation producing video and audio in a single pass
- Multilingual lip-sync supporting English, Mandarin, Japanese, Korean, and Spanish
- 1080p output with 5-12 second duration
- Advanced directorial camera controls (dolly zoom, tracking shots, whip pans)
- Captures micro-expressions, non-verbal cues, and emotional transitions
### Seedance 1.0 Pro
- Production-quality 1080p video generation
- Text-to-video and image-to-video with first and last frame control
- Native multi-shot storytelling with subject and style consistency across cuts
- Cinematic camera grammar interpretation (35mm film, noir lighting, drone shots)
- 2-12 second video duration at 24-30fps
### Seedance 1.0 Pro Fast
- Faster, more cost-effective version of 1.0 Pro
- Same capabilities with reduced generation time
### Seedance 1.0 Lite
- Optimized for speed and iteration at 720p or 1080p
- Lower cost per generation for rapid prototyping
## Key Features
- Smooth, stable motion with wide dynamic range for large-scale movements
- Native multi-shot storytelling maintaining consistency across transitions
- Diverse stylistic expression (photorealism, cyberpunk, illustration, pixel art)
- Precise prompt following for complex actions, multi-agent interactions, and camera work
- Joint audio-visual synthesis with environmental sounds and dialogue (1.5)
- Supports multiple aspect ratios (16:9, 9:16, 1:1, 4:3, 21:9, and more)
## Hardware Requirements
- Cloud API only; no local weights publicly available
- Accessed via seed.bytedance.com, Scenario, fal.ai, and other API providers
- 1080p 5-second video costs approximately $0.62 via fal.ai (Pro)
- Lite version available at lower cost ($0.18 per 720p 5-second video)
## Common Use Cases
- Cinematic shorts and scene previsualization
- Music video concept development
- Product demonstration and marketing videos
- Character-focused animation sequences
- Social media content with audio (1.5)
- Moodboard and style exploration for creative teams
## Key Parameters
- **prompt**: Text description of desired scene, action, and camera work
- **image_url**: Source image for image-to-video generation (first frame)
- **duration**: Video length (2-12 seconds for 1.0, 5-12 seconds for 1.5)
- **resolution**: 480p, 720p, or 1080p output
- **aspect_ratio**: 16:9, 9:16, 1:1, 4:3, 21:9, 9:21
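The duration range depends on the model generation: 2-12 seconds for 1.0 and 5-12 seconds for 1.5. A client can check it alongside resolution and aspect ratio before calling a provider. A sketch follows. The function name and version keys are hypothetical. The field names mirror the parameter list above, but each provider's exact payload format may differ. The aspect-ratio set comes from the parameter list and may be incomplete ("and more" above). Lite's 720p/1080p restriction is not modeled.

```python
from typing import Optional

DURATION_RANGES = {"1.0": (2, 12), "1.5": (5, 12)}  # seconds, per the list above
RESOLUTIONS = {"480p", "720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "21:9", "9:21"}

def build_seedance_request(version: str, prompt: str, duration: int,
                           resolution: str = "1080p", aspect_ratio: str = "16:9",
                           image_url: Optional[str] = None) -> dict:
    """Validate parameters for a Seedance generation and return them as a dict."""
    if version not in DURATION_RANGES:
        raise ValueError(f"unknown version {version!r}")
    lo, hi = DURATION_RANGES[version]
    if not lo <= duration <= hi:
        raise ValueError(f"Seedance {version} supports {lo}-{hi} second clips")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    request = {"prompt": prompt, "duration": duration,
               "resolution": resolution, "aspect_ratio": aspect_ratio}
    if image_url is not None:
        request["image_url"] = image_url  # first frame for image-to-video
    return request
```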

@@ -1,50 +0,0 @@
# Seedream
Seedream is ByteDance's text-to-image generation model, capable of producing high-quality images with strong text rendering, bilingual support (English and Chinese), and native high-resolution output.
## Model Variants
### Seedream 3.0
- Native 2K resolution output without post-processing
- Bilingual image generation (English and Chinese)
- 3-second end-to-end generation for 1K images
- Improved text rendering for small fonts and long text layouts
### Seedream 4.0
- Unified architecture for text-to-image and image editing
- Native output up to 4K resolution
- Multi-image reference input (up to 6 source images)
- 1.8-second inference for 2K images
- Batch input and output for multiple generations
- Natural language image editing capabilities
## Key Features
- Accurate text rendering in both English and Chinese
- Knowledge-driven generation for educational illustrations and charts
- Strong character consistency across multiple angles
- Prompt-based image editing without separate tools
- Versatile style support from photorealism to anime
- Leading scores on Artificial Analysis Image Arena
## Hardware Requirements
- API-only access via ByteDance Volcano Engine
- No local hardware requirements for end users
- Third-party API providers available (e.g., EvoLink)
## Common Use Cases
- Poster and advertisement design with embedded text
- E-commerce product photography
- Character design with multi-angle consistency
- Educational illustration and infographic generation
- Brand-consistent marketing materials
## Key Parameters
- **prompt**: Text description of the desired image
- **resolution**: Output resolution (up to 4K supported)
- **aspect_ratio**: Supports 16:9, 4:3, 1:1, and custom ratios
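
A client may need concrete pixel dimensions when it requests a custom ratio. The sketch below derives width and height from a target pixel count. Reading "2K" as about 2048x2048 pixels and rounding each side to a multiple of 8 are assumptions made for illustration. They are not documented Seedream constraints, and `dims_for_ratio` is a hypothetical helper.

```python
import math


def dims_for_ratio(ratio_w: int, ratio_h: int,
                   pixel_budget: int = 2048 * 2048, multiple: int = 8):
    """Width/height close to ratio_w:ratio_h with about pixel_budget pixels.

    The default budget (2048x2048, read as "2K") and the rounding multiple
    are illustrative assumptions.
    """
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("ratio terms must be positive")
    r = ratio_w / ratio_h
    width = math.sqrt(pixel_budget * r)   # w * h = budget and w / h = r
    height = math.sqrt(pixel_budget / r)

    def snap(x: float) -> int:
        # Nearest multiple of `multiple`, but never smaller than one multiple.
        return max(multiple, multiple * round(x / multiple))

    return snap(width), snap(height)


print(dims_for_ratio(16, 9))  # (2728, 1536)
```
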

# SeedVR2
SeedVR2 is a one-step diffusion-based video restoration model developed by ByteDance Seed and NTU S-Lab, published at ICLR 2026.
## Model Variants
### SeedVR2-3B
- 3B-parameter DiT with one-step inference for video and image upscaling
- Available in FP16, FP8, and GGUF quantized formats
### SeedVR2-7B
- 7B-parameter model, plus a Sharp variant for maximum detail
- Multi-GPU inference; supports 1080p and 2K on 4x H100-80GB
### SeedVR (Original)
- Multi-step diffusion model (CVPR 2025 Highlight)
- Arbitrary-resolution restoration without relying on a pretrained diffusion prior
## Key Features
- One-step inference achieving a 10x speedup over multi-step methods
- Adaptive window attention with dynamic sizing for high-resolution inputs
- Adversarial post-training against real data for faithful detail recovery
- ComfyUI integration via official SeedVR2 Video Upscaler nodes
- Apache 2.0 open-source license
## Hardware Requirements
- Minimum: 8-12GB VRAM with GGUF quantization and tiled VAE
- Recommended: 24GB+ VRAM (RTX 4090) for 3B model at 1080p
- High-end: 4x H100-80GB for 7B model at 2K resolution
## Common Use Cases
- Upscaling AI-generated video to 1080p or 4K
- Restoring degraded or compressed video footage
- Image super-resolution and detail recovery
## Key Parameters
- **resolution**: Target shortest-edge resolution (720, 1080, 2160)
- **batch_size**: Frames per batch; must be of the form 4n+1 (5, 9, 13, 17, 21)
- **seed**: Random seed for reproducible generation
- **color_fix_type**: wavelet, adain, hsv, or none
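
The 4n+1 constraint on `batch_size` is easy to enforce before a run. The sketch below uses hypothetical helper names and rounds an arbitrary request down to the nearest valid value.

```python
def is_valid_batch_size(n: int) -> bool:
    """True if n has the form 4k + 1 (1, 5, 9, 13, 17, 21, ...)."""
    return n >= 1 and n % 4 == 1


def round_down_batch_size(n: int) -> int:
    """Largest value of the form 4k + 1 that does not exceed n."""
    if n < 1:
        raise ValueError("batch_size must be at least 1")
    return n - (n - 1) % 4


print([round_down_batch_size(n) for n in (4, 5, 16, 20, 21)])  # [1, 5, 13, 17, 21]
```

Rounding down rather than up means the adjusted batch never holds more frames, and so never needs more VRAM, than the value originally requested.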

# Stable Audio Open
Stable Audio Open 1.0 is Stability AI's open-source text-to-audio model for generating sound effects, production elements, and short musical clips.
## Model Variants
### Stable Audio Open 1.0
- 1.2B-parameter latent diffusion model
- Transformer-based diffusion (DiT) architecture
- T5-base text encoder for conditioning
- Variational autoencoder for audio compression
- Stability AI Community License (non-commercial)
### Stable Audio (Commercial)
- Full-length music generation up to 3 minutes with audio-to-audio and inpainting
- Available via Stability AI platform API, commercial license
## Key Features
- Generates up to 47 seconds of stereo audio at 44.1kHz
- Text-prompted sound effects, drum beats, ambient sounds, and foley
- Variable-length output with timing control
- Fine-tunable on custom audio datasets
- Trained exclusively on Creative Commons-licensed audio (CC0, CC BY, CC Sampling+)
- Strong performance for sound effects and field recordings
- Compatible with both stable-audio-tools and diffusers libraries
## Hardware Requirements
- Minimum: 8GB VRAM (fp16)
- Recommended: 12GB+ VRAM for memory headroom during inference
- Half-precision (fp16) supported for reduced memory
- Chunked decoding available for memory-constrained setups
- Inference speed: 8-20 diffusion steps per second depending on GPU
## Common Use Cases
- Sound effect and foley generation
- Drum beats and instrument riff creation
- Ambient soundscapes and background audio
- Music production elements and samples
- Audio prototyping for film and game sound design
## Key Parameters
- **steps**: Number of inference steps (100-200 recommended)
- **cfg_scale**: Classifier-free guidance scale (typically 7)
- **seconds_total**: Target audio duration (up to 47 seconds)
- **seconds_start**: Start time offset for timing control
- **negative_prompt**: Text describing undesired audio qualities
- **sampler_type**: Diffusion sampler (dpmpp-3m-sde recommended)
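
The timing parameters map onto a per-clip conditioning record. The sketch below builds a list of dicts with `prompt`, `seconds_start`, and `seconds_total` keys, which mirrors the conditioning shape in the `stable-audio-tools` usage example as we understand it. Check this against the current model card. The sketch also computes how many samples per channel a clip of that length occupies at 44.1kHz, which helps when trimming output. The helper names are hypothetical, and the code does not call the library itself.

```python
SAMPLE_RATE = 44_100  # Hz, as stated above
MAX_SECONDS = 47      # longest clip Stable Audio Open 1.0 generates


def make_conditioning(prompt: str, seconds_total: float,
                      seconds_start: float = 0.0) -> list:
    """Build a single-item conditioning list after validating the timing."""
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    if seconds_start < 0:
        raise ValueError("seconds_start must be non-negative")
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def samples_per_channel(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Samples per channel for a clip of the given length (for trimming)."""
    return round(seconds * sample_rate)


cond = make_conditioning("128 BPM tech house drum loop", seconds_total=30)
print(samples_per_channel(30))  # 1323000
```
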

# Stable Video Diffusion
Stable Video Diffusion (SVD) is Stability AI's image-to-video diffusion model that generates short video clips from a single conditioning image. In user studies, SVD was preferred over GEN-2 and PikaLabs for video quality.
## Model Variants
### SVD-XT (25 frames)
- Generates 25 frames at 576x1024 resolution
- Finetuned from the 14-frame SVD base model
- Includes a finetuned, temporally consistent f8 decoder
- Standard frame-wise decoder also available
### SVD (14 frames)
- Original release generating 14 frames
- Foundation for community fine-tunes and extensions
- Same 576x1024 native resolution
## Key Features
- Image-to-video generation from a single still image
- Temporally consistent video output with finetuned decoder
- Preferred over GEN-2 and PikaLabs in human evaluation studies
- Invisible watermarking (via imWatermark) enabled by default in the reference implementation
- Latent diffusion architecture for efficient generation
## Hardware Requirements
- Minimum: 16GB VRAM
- Recommended: A100 80GB for full quality (tested configuration)
- Generation time on A100 80GB: ~100s for SVD, ~180s for SVD-XT
- Optimizations available for lower VRAM cards with quality tradeoffs
## Common Use Cases
- Animating still images into short video clips
- Product visualization and motion graphics
- Creative video experiments and art
- Research on generative video models
## Key Parameters
- **num_frames**: 14 (SVD) or 25 (SVD-XT)
- **resolution**: 576x1024 native
- **conditioning_frame**: Input image at same resolution
- **duration**: Up to ~4 seconds (25 frames)
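
Clip length follows from frame count and playback rate. The ~4-second figure for 25 frames corresponds to roughly 6 fps. The sketch below relates the two. The helper names are hypothetical, and the fps values are illustrative export rates, not settings taken from this document.

```python
NATIVE_FRAMES = {"svd": 14, "svd-xt": 25}  # frame counts listed above


def clip_seconds(variant: str, fps: float) -> float:
    """Playback length of one generated clip at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return NATIVE_FRAMES[variant] / fps


def fps_for_length(variant: str, seconds: float) -> float:
    """Frame rate needed for one clip to last the given number of seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return NATIVE_FRAMES[variant] / seconds


print(clip_seconds("svd-xt", 6.25))  # 4.0
print(fps_for_length("svd", 2.0))    # 7.0
```
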
## Limitations
- Short videos only (4 seconds maximum)
- No text-based control (image conditioning only)
- Cannot render legible text in output
- Faces and people may not generate properly
- May produce videos without motion or with very slow camera pans
