mirror of https://github.com/Comfy-Org/ComfyUI_frontend.git synced 2026-05-02 04:02:20 +00:00


snomiao 854f1c7da0 feat: readFixture/readTest tools, ANTHROPIC_API_KEY_QA, fix TS errors

- Add readFixture and readTest tools to qa-agent for fixture API discovery
- Enrich system prompt with comprehensive ComfyPage fixture API reference
- Switch CI to ANTHROPIC_API_KEY_QA secret
- Fix all TS errors in qa-agent.ts, qa-record.ts, qa-reproduce.ts
- Better error handling for API credit exhaustion
- Rewrite SKILL.md to reflect three-phase pipeline

Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800
Co-authored-by: Amp <amp@ampcode.com>

2026-04-01 06:44:34 +00:00


name	description
comfy-qa	Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.

ComfyUI Frontend QA Skill

Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.

Architecture Overview

The QA pipeline uses a three-phase approach:

RESEARCH — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, no hallucination)
REPRODUCE — Deterministic replay of the research test with video recording
REPORT — Deploy results to Cloudflare Pages with badge, video, and verdict

Key Design Decision

Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. That approach was abandoned because the AI reviewer hallucinated: Gemini reported "REPRODUCED" when the videos showed idle screens. The current approach treats Playwright assertions as the source of truth. Each test asserts the buggy behavior, so a passing test proves the bug exists.

Prerequisites

Node.js 22+
pnpm package manager
gh CLI (authenticated)
Playwright browsers: npx playwright install chromium
Environment variables:
- GEMINI_API_KEY — for PR analysis and video review
- ANTHROPIC_API_KEY — for Claude Agent SDK (research phase)
- CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID — for report deployment

Pipeline Scripts

Script	Role	Model
`scripts/qa-analyze-pr.ts`	Deep PR/issue analysis → QA guide	gemini-3.1-pro-preview
`scripts/qa-agent.ts`	Research phase: Claude writes E2E tests	claude-sonnet-4-6 (Agent SDK)
`scripts/qa-record.ts`	Before/after video recording with Gemini-driven actions	gemini-3.1-pro-preview
`scripts/qa-reproduce.ts`	Deterministic replay with narration	gemini-3-flash-preview
`scripts/qa-video-review.ts`	Video comparison review	gemini-3-flash-preview
`scripts/qa-generate-test.ts`	Regression test generation from QA report	gemini-3-flash-preview
`scripts/qa-deploy-pages.sh`	Deploy to Cloudflare Pages + badge	—
`scripts/qa-batch.sh`	Batch-trigger QA for multiple issues	—
`scripts/qa-report-template.html`	Report site (light/dark, seekbar, copy badge)	—

Triggering QA

Via GitHub Labels

qa-changes — Focused QA on a PR (Linux-only, before/after comparison)
qa-full — Full QA (3-OS matrix, after-only)
qa-issue — Reproduce a bug from an issue
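
As a rough illustration of how the three labels differ, the mapping below sketches one possible label-to-configuration lookup. The type names, the `resolveQaConfig` helper, the platform identifiers, and the platform chosen for `qa-issue` are all assumptions made for this example; the source only states "Linux-only" and "3-OS matrix".

```typescript
// Hypothetical types; the real workflow may model this differently.
type QaLabel = "qa-changes" | "qa-full" | "qa-issue";

interface QaRunConfig {
  mode: "focused" | "full" | "issue";
  platforms: string[]; // platform names are assumed, not taken from the workflow
  comparison: "before-after" | "after-only" | "reproduce";
}

const LABEL_CONFIG: Record<QaLabel, QaRunConfig> = {
  "qa-changes": { mode: "focused", platforms: ["linux"], comparison: "before-after" },
  "qa-full": { mode: "full", platforms: ["linux", "macos", "windows"], comparison: "after-only" },
  // The platform for issue reproduction is not stated in the source; Linux is a guess.
  "qa-issue": { mode: "issue", platforms: ["linux"], comparison: "reproduce" },
};

// Returns the configuration for the first recognised QA label, if any.
function resolveQaConfig(labels: string[]): QaRunConfig | undefined {
  const hit = labels.find((l): l is QaLabel =>
    Object.prototype.hasOwnProperty.call(LABEL_CONFIG, l),
  );
  return hit ? LABEL_CONFIG[hit] : undefined;
}
```

For example, `resolveQaConfig(["bug", "qa-full"])` would select the full, after-only configuration, while a label list without any QA label yields `undefined`.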

Via Batch Script

# Trigger QA for specific issue numbers
./scripts/qa-batch.sh 10394 10238 9996

# From a triage file (top 5 Tier 1 issues)
./scripts/qa-batch.sh --from tmp/issues.md --top 5

# Preview without pushing
./scripts/qa-batch.sh --dry-run 10394

# Clean up old trigger branches
./scripts/qa-batch.sh --cleanup

Via Workflow Dispatch

Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).

CI Workflow (`.github/workflows/pr-qa.yaml`)

resolve-matrix → analyze-pr ──┐
                               ├→ qa-before (main branch, worktree build)
                               ├→ qa-after  (PR branch)
                               └→ report (video review, deploy, comment)

Before/after jobs run in parallel on separate runners for clean isolation.

Issue Reproduce Mode

For issues (not PRs), the pipeline:

1. Fetches the issue body and comments
2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
3. Runs the research phase (Claude writes an E2E test to reproduce the bug)
4. Records video of the test execution
5. Posts results as a comment on the issue

Running Locally

Step 1: Environment Setup

# Ensure ComfyUI server is running
# Default: http://127.0.0.1:8188

# Install Playwright browsers
npx playwright install chromium

Step 2: Analyze the Issue/PR

# For a PR
pnpm exec tsx scripts/qa-analyze-pr.ts \
  --pr-number 10394 \
  --repo Comfy-Org/ComfyUI_frontend \
  --output-dir qa-guides

# For an issue
pnpm exec tsx scripts/qa-analyze-pr.ts \
  --pr-number 10394 \
  --repo Comfy-Org/ComfyUI_frontend \
  --output-dir qa-guides \
  --type issue
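
When the analysis step is driven from another script rather than typed by hand, the two invocations above can be assembled programmatically. The helper below is a minimal sketch; `AnalyzeOptions` and `buildAnalyzeArgs` are hypothetical names, and only the flags shown in the commands above are used.

```typescript
// Hypothetical option bag for this example.
interface AnalyzeOptions {
  number: number; // PR or issue number
  repo: string;
  outputDir: string;
  type?: "pr" | "issue";
}

// Builds the argument list for `pnpm`, mirroring the commands shown above.
// Note that the flag is `--pr-number` for issues as well.
function buildAnalyzeArgs(opts: AnalyzeOptions): string[] {
  const args = [
    "exec", "tsx", "scripts/qa-analyze-pr.ts",
    "--pr-number", String(opts.number),
    "--repo", opts.repo,
    "--output-dir", opts.outputDir,
  ];
  if (opts.type === "issue") args.push("--type", "issue");
  return args;
}
```

The resulting array could be passed to Node's `child_process.spawnSync("pnpm", args, { stdio: "inherit" })`, which avoids shell quoting problems.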

Step 3: Record Before/After

# Before (main branch)
pnpm exec tsx scripts/qa-record.ts \
  --mode before \
  --diff /tmp/pr-diff.txt \
  --output-dir /tmp/qa-before \
  --qa-guide qa-guides/qa-guide-1.json

# After (PR branch)
pnpm exec tsx scripts/qa-record.ts \
  --mode after \
  --diff /tmp/pr-diff.txt \
  --output-dir /tmp/qa-after \
  --qa-guide qa-guides/qa-guide-1.json

Step 4: Review Videos

pnpm exec tsx scripts/qa-video-review.ts \
  --artifacts-dir /tmp/qa-artifacts \
  --video-file qa-session.mp4 \
  --before-video qa-before-session.mp4 \
  --output-dir /tmp/video-reviews \
  --pr-context /tmp/pr-context.txt

Research Phase Details (`qa-agent.ts`)

Claude receives:

The issue description and comments
A QA guide from qa-analyze-pr.ts
An accessibility tree snapshot of the current UI

Claude's tools:

inspect(selector?) — Read a11y tree to discover element selectors
writeTest(code) — Write a Playwright .spec.ts file
runTest() — Execute the test and get pass/fail + errors
done(verdict, summary, evidence, testCode) — Finish with verdict

The test uses the project's Playwright fixtures (comfyPageFixture), giving access to comfyPage.page, comfyPage.menu, comfyPage.settings, etc.
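
To make the tool surface concrete, it could be typed roughly as follows. This is a sketch only: the parameter lists follow the tool list above, but the return types, the `TestRun` shape, the in-memory fake, and the `attempt` helper are assumptions and do not reflect the actual `qa-agent.ts` implementation.

```typescript
type Verdict = "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE";

// Assumed result shape for runTest(); the real script may return more detail.
interface TestRun {
  passed: boolean;
  errors: string[];
}

interface QaAgentTools {
  inspect(selector?: string): Promise<string>;
  writeTest(code: string): Promise<void>;
  runTest(): Promise<TestRun>;
  done(verdict: Verdict, summary: string, evidence: string, testCode: string): Promise<void>;
}

// In-memory stand-in that records calls instead of touching a browser.
function createFakeTools(result: TestRun): { tools: QaAgentTools; log: string[] } {
  const log: string[] = [];
  const tools: QaAgentTools = {
    async inspect(selector) {
      log.push(`inspect:${selector ?? "*"}`);
      return "(a11y tree placeholder)";
    },
    async writeTest() {
      log.push("writeTest");
    },
    async runTest() {
      log.push("runTest");
      return result;
    },
    async done(verdict) {
      log.push(`done:${verdict}`);
    },
  };
  return { tools, log };
}

// One write/run cycle: finish with REPRODUCED only when the test passes.
async function attempt(tools: QaAgentTools, testCode: string): Promise<TestRun> {
  await tools.writeTest(testCode);
  const run = await tools.runTest();
  if (run.passed) {
    await tools.done("REPRODUCED", "Test asserting the bug passed", "runTest output", testCode);
  }
  return run;
}
```

A fake like this makes it possible to exercise the control flow around the tools without a browser or an API key.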

Verdict Logic

REPRODUCED — Test passes (asserting the bug exists) → bug is proven
NOT_REPRODUCIBLE — Claude exhausted attempts, test cannot pass
INCONCLUSIVE — Agent timed out or encountered infrastructure issues

Auto-completion: if a test passed but done() was never called, the pipeline auto-completes with REPRODUCED.
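
The verdict rules, including auto-completion, can be summarised as a small pure function. This is a sketch under stated assumptions: the `AgentOutcome` fields and the precedence between the rules (an explicit `done()` verdict first, then a passing test, then timeouts or infrastructure errors) are illustrative choices, not taken from the pipeline code.

```typescript
type Verdict = "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE";

// Hypothetical summary of one agent run.
interface AgentOutcome {
  testPassed: boolean;   // a test asserting the bug passed at least once
  doneVerdict?: Verdict; // set only if the agent called done()
  timedOut: boolean;
  infraError: boolean;
}

function resolveVerdict(o: AgentOutcome): Verdict {
  if (o.doneVerdict) return o.doneVerdict;               // agent finished explicitly
  if (o.testPassed) return "REPRODUCED";                 // auto-completion
  if (o.timedOut || o.infraError) return "INCONCLUSIVE"; // no reliable signal
  return "NOT_REPRODUCIBLE";                             // attempts exhausted, no passing test
}
```

Keeping this logic free of I/O makes the verdict easy to unit test separately from the agent loop.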

Manual QA (Fallback)

When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use playwright-cli for manual browser interaction:

# Install
npm install -g @playwright/cli@latest

# Open browser and navigate
playwright-cli open http://127.0.0.1:8188

# Get element references
playwright-cli snapshot

# Interact
playwright-cli click e1
playwright-cli fill e2 "test text"
playwright-cli press Escape
playwright-cli screenshot --filename=f.png

Snapshots return element references (e1, e2, …). Always run snapshot after navigation to refresh refs.

Manual QA Test Plan

Whether you test by hand with playwright-cli or review what the automated pipeline covered, systematically check each area below.

Application Load & Routes

Test	Steps
Root route loads	Navigate to `/` — GraphView should render with canvas
User select route	Navigate to `/user-select` — user selection UI should appear
404 handling	Navigate to `/nonexistent` — should handle gracefully

Canvas & Graph View

Test	Steps
Canvas renders	The LiteGraph canvas is visible and interactive
Pan canvas	Click and drag on empty canvas area
Zoom in/out	Use scroll wheel or Alt+=/Alt+-
Add node via double-click	Double-click canvas to open search, type "KSampler", select it
Delete node	Select a node, press Delete key
Connect nodes	Drag from output slot to input slot
Copy/Paste	Select nodes, Ctrl+C then Ctrl+V
Undo/Redo	Make changes, Ctrl+Z to undo, Ctrl+Y to redo
Context menus	Right-click node vs empty canvas — different menus

Sidebar Tabs

Test	Steps
Workflows tab	Press W — workflows sidebar opens
Node Library tab	Press N — node library opens
Model Library tab	Press M — model library opens
Tab toggle	Press same key again — sidebar closes
Search in sidebar	Type in search box — results filter

Settings Dialog

Test	Steps
Open settings	Press Ctrl+, or click settings button
Change a setting	Toggle a boolean setting — it persists after closing
Search settings	Type in settings search box — results filter
Close settings	Press Escape or click close button

Execution & Queue

Test	Steps
Queue prompt	Load default workflow, click Queue — execution starts
Queue progress	Progress indicator shows during execution
Interrupt	Press Ctrl+Alt+Enter during execution — interrupts

Report Site

Deployed to Cloudflare Pages at https://comfy-qa.pages.dev/<branch>/.

Features:

Light/dark theme
Seekable video player with preload
Copy badge button (markdown)
Date-stamped badges (e.g., QA0327)
Vertical box badge for issues and PRs
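
For illustration, a date-stamped badge label and the report URL might be derived as shown below. Both helpers are hypothetical. The `QA` + month + day layout is inferred from the `QA0327` example, the use of UTC is an assumption, and the real deploy script may normalise branch names before building the URL.

```typescript
// Formats a date as "QA" followed by zero-padded month and day (UTC assumed).
function qaBadgeLabel(date: Date): string {
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `QA${mm}${dd}`;
}

// Inserts the branch name into the URL pattern given above, unchanged.
function reportUrl(branch: string): string {
  return `https://comfy-qa.pages.dev/${branch}/`;
}

console.log(qaBadgeLabel(new Date(Date.UTC(2026, 2, 27)))); // prints "QA0327"
```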

Known Issues & Troubleshooting

See docs/qa/TROUBLESHOOTING.md for common failures:

set -euo pipefail + grep with no match → append || true
__name is not defined in page.evaluate → use addScriptTag
Cursor not visible in videos → monkey-patch page.mouse methods
Agent not calling done() → auto-complete from passing test
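
The cursor workaround can be pictured as wrapping the mouse methods so that every pointer action first updates a visible marker. The sketch below uses a minimal `MouseLike` stand-in rather than Playwright's real `Mouse` type, and `patchMouse` and `onPointer` are hypothetical names. In a real recording, `onPointer` would presumably reposition an overlay element inside the page.

```typescript
type PointerFn = (x: number, y: number, ...rest: unknown[]) => Promise<void>;

// Minimal stand-in for the subset of mouse methods patched here.
interface MouseLike {
  move: PointerFn;
  click: PointerFn;
}

// Replaces move/click with wrappers that report the position before delegating.
function patchMouse(
  mouse: MouseLike,
  onPointer: (x: number, y: number) => Promise<void>,
): void {
  const originalMove = mouse.move.bind(mouse);
  const originalClick = mouse.click.bind(mouse);

  mouse.move = async (x, y, ...rest) => {
    await onPointer(x, y);
    await originalMove(x, y, ...rest);
  };
  mouse.click = async (x, y, ...rest) => {
    await onPointer(x, y);
    await originalClick(x, y, ...rest);
  };
}
```

Binding the original methods before reassigning them keeps the wrappers from calling themselves recursively.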

Backlog

See docs/qa/backlog.md for planned improvements:

Type B comparison: Different commits for regression detection
Type C comparison: Cross-browser testing
Pre-seed assets: Upload test images before recording
Lazy a11y tree: Reduce token usage with inspect(selector) vs full dump
