mirror of
https://github.com/Comfy-Org/ComfyUI_frontend.git
synced 2026-04-19 22:09:37 +00:00
Compare commits
1 Commits
sno-qa-102
...
sno-qa-986
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
1dd66315ed |
@@ -1,283 +1,133 @@
|
||||
---
|
||||
name: comfy-qa
|
||||
description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
|
||||
description: 'Comprehensive QA of ComfyUI frontend. Reproduces bugs via E2E tests, records narrated demo videos, deploys reports. Works in CI and locally via `pnpm qa`.'
|
||||
---
|
||||
|
||||
# ComfyUI Frontend QA Skill
|
||||
|
||||
Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.
|
||||
Automated quality assurance pipeline that reproduces reported bugs using Playwright E2E tests, records narrated demo videos with demowright, and deploys reports to Cloudflare Pages.
|
||||
|
||||
## Architecture Overview
|
||||
## Quick Start
|
||||
|
||||
The QA pipeline uses a **three-phase approach**:
|
||||
```bash
|
||||
# Reproduce an issue
|
||||
pnpm qa 10253
|
||||
|
||||
1. **RESEARCH** — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, no hallucination)
|
||||
2. **REPRODUCE** — Deterministic replay of the research test with video recording
|
||||
3. **REPORT** — Deploy results to Cloudflare Pages with badge, video, and verdict
|
||||
# Test a PR
|
||||
pnpm qa 10270
|
||||
|
||||
### Key Design Decision
|
||||
# Test PR base (reproduce bug)
|
||||
pnpm qa 10270 -t base
|
||||
|
||||
Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. This was abandoned after discovering **AI reviewers hallucinate** — Gemini reported "REPRODUCED" when videos showed idle screens. The current approach uses **Playwright assertions** as the source of truth: if the test passes, the bug is proven.
|
||||
# Test both base + head
|
||||
pnpm qa 10270 -t both
|
||||
|
||||
## Prerequisites
|
||||
# Test local uncommitted changes
|
||||
pnpm qa --uncommitted
|
||||
|
||||
- Node.js 22+
|
||||
- `pnpm` package manager
|
||||
- `gh` CLI (authenticated)
|
||||
- Playwright browsers: `npx playwright install chromium`
|
||||
- Environment variables:
|
||||
- `GEMINI_API_KEY` — for PR analysis and video review
|
||||
- `ANTHROPIC_API_KEY` — for Claude Agent SDK (research phase)
|
||||
- `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — for report deployment
|
||||
# Help
|
||||
pnpm qa -h
|
||||
```
|
||||
|
||||
Auto-loads `.env.local` / `.env` for `GEMINI_API_KEY` and `ANTHROPIC_API_KEY`.
|
||||
Auto-detects issue vs PR via GitHub API. Auto-starts ComfyUI if not running.
|
||||
|
||||
## Architecture
|
||||
|
||||
Three-phase pipeline:
|
||||
|
||||
1. **RESEARCH** — Claude Sonnet 4.6 (Agent SDK) writes Playwright E2E tests to reproduce bugs
|
||||
2. **RECORD** — Re-runs test with demowright for narrated demo video (title cards, TTS, subtitles)
|
||||
3. **REPORT** — Gemini reviews video, deploys to Cloudflare Pages with badge + verdict
|
||||
|
||||
Key principle: **Playwright assertions are truth** — no AI hallucination. If the test passes, the bug is proven.
|
||||
|
||||
## Pipeline Scripts
|
||||
|
||||
| Script | Role | Model |
|
||||
| --------------------------------- | ------------------------------------------------------- | ----------------------------- |
|
||||
| `scripts/qa-analyze-pr.ts` | Deep PR/issue analysis → QA guide | gemini-3.1-pro-preview |
|
||||
| `scripts/qa-agent.ts` | Research phase: Claude writes E2E tests | claude-sonnet-4-6 (Agent SDK) |
|
||||
| `scripts/qa-record.ts` | Before/after video recording with Gemini-driven actions | gemini-3.1-pro-preview |
|
||||
| `scripts/qa-reproduce.ts` | Deterministic replay with narration | gemini-3-flash-preview |
|
||||
| `scripts/qa-video-review.ts` | Video comparison review | gemini-3-flash-preview |
|
||||
| `scripts/qa-generate-test.ts` | Regression test generation from QA report | gemini-3-flash-preview |
|
||||
| `scripts/qa-deploy-pages.sh` | Deploy to Cloudflare Pages + badge | — |
|
||||
| `scripts/qa-batch.sh` | Batch-trigger QA for multiple issues | — |
|
||||
| `scripts/qa-report-template.html` | Report site (light/dark, seekbar, copy badge) | — |
|
||||
All scripts live in `.claude/skills/comfy-qa/scripts/`:
|
||||
|
||||
| Script | Role |
|
||||
| ------------------------- | ----------------------------------------------------- |
|
||||
| `qa.ts` | CLI entry point (`pnpm qa`) |
|
||||
| `qa-agent.ts` | Research phase: Claude writes E2E tests via Agent SDK |
|
||||
| `qa-record.ts` | Orchestrator: 3-phase pipeline |
|
||||
| `qa-deploy-pages.sh` | Cloudflare Pages deploy + badge generation |
|
||||
| `qa-report-template.html` | Report site template |
|
||||
| `qa-video-review.ts` | Gemini video review |
|
||||
| `qa-analyze-pr.ts` | Deep PR/issue analysis → QA guide |
|
||||
|
||||
## Triggering QA
|
||||
|
||||
### Via GitHub Labels
|
||||
|
||||
- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after comparison)
|
||||
- **`qa-full`** — Full QA (3-OS matrix, after-only)
|
||||
- **`qa-issue`** — Reproduce a bug from an issue
|
||||
|
||||
### Via Batch Script
|
||||
### Via CLI (`pnpm qa`)
|
||||
|
||||
```bash
|
||||
# Trigger QA for specific issue numbers
|
||||
./scripts/qa-batch.sh 10394 10238 9996
|
||||
|
||||
# From a triage file (top 5 Tier 1 issues)
|
||||
./scripts/qa-batch.sh --from tmp/issues.md --top 5
|
||||
|
||||
# Preview without pushing
|
||||
./scripts/qa-batch.sh --dry-run 10394
|
||||
|
||||
# Clean up old trigger branches
|
||||
./scripts/qa-batch.sh --cleanup
|
||||
pnpm qa 10253 # issue (auto-detect)
|
||||
pnpm qa 10270 # PR head
|
||||
pnpm qa 10270 -t base # PR base
|
||||
pnpm qa 10270 -t both # both
|
||||
pnpm qa --uncommitted # local changes
|
||||
```
|
||||
|
||||
### Via Workflow Dispatch
|
||||
### Via GitHub Labels
|
||||
|
||||
Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).
|
||||
- **`qa-issue`** — Reproduce a bug from an issue
|
||||
- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after)
|
||||
- **`qa-full`** — Full QA (3-OS matrix)
|
||||
|
||||
### Via Push to trigger branches
|
||||
|
||||
```bash
|
||||
git push origin sno-skills:sno-qa-10253 --force
|
||||
```
|
||||
|
||||
## Research Phase (`qa-agent.ts`)
|
||||
|
||||
Claude receives the issue/PR context + a11y tree snapshot + ComfyPage fixture API docs.
|
||||
|
||||
Tools:
|
||||
|
||||
- **`inspect(selector?)`** — Read a11y tree
|
||||
- **`readFixture(path)`** — Read fixture source code
|
||||
- **`readTest(path)`** — Read existing tests for patterns
|
||||
- **`writeTest(code)`** — Write a Playwright .spec.ts
|
||||
- **`runTest()`** — Execute and get pass/fail + errors
|
||||
- **`done(verdict, summary, evidence, testCode, videoScript?)`** — Finish
|
||||
|
||||
When `verdict=REPRODUCED`, Claude also provides a `videoScript` — a separate test file using demowright's `createVideoScript()` for professional narrated demo video with title cards, TTS segments, and outro.
|
||||
|
||||
## Video Recording (demowright)
|
||||
|
||||
Phase 2 uses the video script to record with:
|
||||
|
||||
- `showTitleCard()` / `hideTitleCard()` — covers setup/loading screen
|
||||
- `createVideoScript().title().segment().outro()` — structured narration
|
||||
- `pace()` — narration-then-action timing
|
||||
- TTS audio + subtitles + cursor overlay + key badges
|
||||
|
||||
## Report Site
|
||||
|
||||
Deployed to `https://sno-qa-{number}.comfy-qa.pages.dev/`
|
||||
|
||||
Features:
|
||||
|
||||
- Video player (1x default, adjustable speed)
|
||||
- Research log (verdict, tool calls, timing)
|
||||
- E2E test code + video script code
|
||||
- Verdict banner for NOT_REPRODUCIBLE/INCONCLUSIVE with failure reason
|
||||
- Copy badge button (markdown)
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- `GEMINI_API_KEY` — video review, TTS
|
||||
- `ANTHROPIC_API_KEY` — Claude Agent SDK (research phase)
|
||||
- `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — report deployment (CI only)
|
||||
- ComfyUI server running (auto-detected, or auto-started)
|
||||
|
||||
## CI Workflow (`.github/workflows/pr-qa.yaml`)
|
||||
|
||||
```
|
||||
resolve-matrix → analyze-pr ──┐
|
||||
├→ qa-before (main branch, worktree build)
|
||||
├→ qa-before (main branch)
|
||||
├→ qa-after (PR branch)
|
||||
└→ report (video review, deploy, comment)
|
||||
```
|
||||
|
||||
Before/after jobs run **in parallel** on separate runners for clean isolation.
|
||||
|
||||
### Issue Reproduce Mode
|
||||
|
||||
For issues (not PRs), the pipeline:
|
||||
|
||||
1. Fetches the issue body and comments
|
||||
2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
|
||||
3. Runs the research phase (Claude writes E2E test to reproduce)
|
||||
4. Records video of the test execution
|
||||
5. Posts results as a comment on the issue
|
||||
|
||||
## Running Locally
|
||||
|
||||
### Step 1: Environment Setup
|
||||
|
||||
```bash
|
||||
# Ensure ComfyUI server is running
|
||||
# Default: http://127.0.0.1:8188
|
||||
|
||||
# Install Playwright browsers
|
||||
npx playwright install chromium
|
||||
```
|
||||
|
||||
### Step 2: Analyze the Issue/PR
|
||||
|
||||
```bash
|
||||
# For a PR
|
||||
pnpm exec tsx scripts/qa-analyze-pr.ts \
|
||||
--pr-number 10394 \
|
||||
--repo Comfy-Org/ComfyUI_frontend \
|
||||
--output-dir qa-guides
|
||||
|
||||
# For an issue
|
||||
pnpm exec tsx scripts/qa-analyze-pr.ts \
|
||||
--pr-number 10394 \
|
||||
--repo Comfy-Org/ComfyUI_frontend \
|
||||
--output-dir qa-guides \
|
||||
--type issue
|
||||
```
|
||||
|
||||
### Step 3: Record Before/After
|
||||
|
||||
```bash
|
||||
# Before (main branch)
|
||||
pnpm exec tsx scripts/qa-record.ts \
|
||||
--mode before \
|
||||
--diff /tmp/pr-diff.txt \
|
||||
--output-dir /tmp/qa-before \
|
||||
--qa-guide qa-guides/qa-guide-1.json
|
||||
|
||||
# After (PR branch)
|
||||
pnpm exec tsx scripts/qa-record.ts \
|
||||
--mode after \
|
||||
--diff /tmp/pr-diff.txt \
|
||||
--output-dir /tmp/qa-after \
|
||||
--qa-guide qa-guides/qa-guide-1.json
|
||||
```
|
||||
|
||||
### Step 4: Review Videos
|
||||
|
||||
```bash
|
||||
pnpm exec tsx scripts/qa-video-review.ts \
|
||||
--artifacts-dir /tmp/qa-artifacts \
|
||||
--video-file qa-session.mp4 \
|
||||
--before-video qa-before-session.mp4 \
|
||||
--output-dir /tmp/video-reviews \
|
||||
--pr-context /tmp/pr-context.txt
|
||||
```
|
||||
|
||||
## Research Phase Details (`qa-agent.ts`)
|
||||
|
||||
Claude receives:
|
||||
|
||||
- The issue description and comments
|
||||
- A QA guide from `qa-analyze-pr.ts`
|
||||
- An accessibility tree snapshot of the current UI
|
||||
|
||||
Claude's tools:
|
||||
|
||||
- **`inspect(selector?)`** — Read a11y tree to discover element selectors
|
||||
- **`writeTest(code)`** — Write a Playwright `.spec.ts` file
|
||||
- **`runTest()`** — Execute the test and get pass/fail + errors
|
||||
- **`done(verdict, summary, evidence, testCode)`** — Finish with verdict
|
||||
|
||||
The test uses the project's Playwright fixtures (`comfyPageFixture`), giving access to `comfyPage.page`, `comfyPage.menu`, `comfyPage.settings`, etc.
|
||||
|
||||
### Verdict Logic
|
||||
|
||||
- **REPRODUCED** — Test passes (asserting the bug exists) → bug is proven
|
||||
- **NOT_REPRODUCIBLE** — Claude exhausted attempts, test cannot pass
|
||||
- **INCONCLUSIVE** — Agent timed out or encountered infrastructure issues
|
||||
|
||||
Auto-completion: if a test passed but `done()` was never called, the pipeline auto-completes with REPRODUCED.
|
||||
|
||||
## Manual QA (Fallback)
|
||||
|
||||
When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use **playwright-cli** for manual browser interaction:
|
||||
|
||||
```bash
|
||||
# Install
|
||||
npm install -g @playwright/cli@latest
|
||||
|
||||
# Open browser and navigate
|
||||
playwright-cli open http://127.0.0.1:8188
|
||||
|
||||
# Get element references
|
||||
playwright-cli snapshot
|
||||
|
||||
# Interact
|
||||
playwright-cli click e1
|
||||
playwright-cli fill e2 "test text"
|
||||
playwright-cli press Escape
|
||||
playwright-cli screenshot --filename=f.png
|
||||
```
|
||||
|
||||
Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation to refresh refs.
|
||||
|
||||
## Manual QA Test Plan
|
||||
|
||||
When performing manual QA (either via playwright-cli or the automated pipeline), systematically test each area below.
|
||||
|
||||
### Application Load & Routes
|
||||
|
||||
| Test | Steps |
|
||||
| ----------------- | ------------------------------------------------------------ |
|
||||
| Root route loads | Navigate to `/` — GraphView should render with canvas |
|
||||
| User select route | Navigate to `/user-select` — user selection UI should appear |
|
||||
| 404 handling | Navigate to `/nonexistent` — should handle gracefully |
|
||||
|
||||
### Canvas & Graph View
|
||||
|
||||
| Test | Steps |
|
||||
| ------------------------- | -------------------------------------------------------------- |
|
||||
| Canvas renders | The LiteGraph canvas is visible and interactive |
|
||||
| Pan canvas | Click and drag on empty canvas area |
|
||||
| Zoom in/out | Use scroll wheel or Alt+=/Alt+- |
|
||||
| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
|
||||
| Delete node | Select a node, press Delete key |
|
||||
| Connect nodes | Drag from output slot to input slot |
|
||||
| Copy/Paste | Select nodes, Ctrl+C then Ctrl+V |
|
||||
| Undo/Redo | Make changes, Ctrl+Z to undo, Ctrl+Y to redo |
|
||||
| Context menus | Right-click node vs empty canvas — different menus |
|
||||
|
||||
### Sidebar Tabs
|
||||
|
||||
| Test | Steps |
|
||||
| ----------------- | ------------------------------------- |
|
||||
| Workflows tab | Press W — workflows sidebar opens |
|
||||
| Node Library tab | Press N — node library opens |
|
||||
| Model Library tab | Press M — model library opens |
|
||||
| Tab toggle | Press same key again — sidebar closes |
|
||||
| Search in sidebar | Type in search box — results filter |
|
||||
|
||||
### Settings Dialog
|
||||
|
||||
| Test | Steps |
|
||||
| ---------------- | ---------------------------------------------------- |
|
||||
| Open settings | Press Ctrl+, or click settings button |
|
||||
| Change a setting | Toggle a boolean setting — it persists after closing |
|
||||
| Search settings | Type in settings search box — results filter |
|
||||
| Close settings | Press Escape or click close button |
|
||||
|
||||
### Execution & Queue
|
||||
|
||||
| Test | Steps |
|
||||
| -------------- | ----------------------------------------------------- |
|
||||
| Queue prompt | Load default workflow, click Queue — execution starts |
|
||||
| Queue progress | Progress indicator shows during execution |
|
||||
| Interrupt | Press Ctrl+Alt+Enter during execution — interrupts |
|
||||
|
||||
## Report Site
|
||||
|
||||
Deployed to Cloudflare Pages at `https://comfy-qa.pages.dev/<branch>/`.
|
||||
|
||||
Features:
|
||||
|
||||
- Light/dark theme
|
||||
- Seekable video player with preload
|
||||
- Copy badge button (markdown)
|
||||
- Date-stamped badges (e.g., `QA0327`)
|
||||
- Vertical box badge for issues and PRs
|
||||
|
||||
## Known Issues & Troubleshooting
|
||||
|
||||
See `docs/qa/TROUBLESHOOTING.md` for common failures:
|
||||
|
||||
- `set -euo pipefail` + grep with no match → append `|| true`
|
||||
- `__name is not defined` in `page.evaluate` → use `addScriptTag`
|
||||
- Cursor not visible in videos → monkey-patch `page.mouse` methods
|
||||
- Agent not calling `done()` → auto-complete from passing test
|
||||
|
||||
## Backlog
|
||||
|
||||
See `docs/qa/backlog.md` for planned improvements:
|
||||
|
||||
- **Type B comparison**: Different commits for regression detection
|
||||
- **Type C comparison**: Cross-browser testing
|
||||
- **Pre-seed assets**: Upload test images before recording
|
||||
- **Lazy a11y tree**: Reduce token usage with `inspect(selector)` vs full dump
|
||||
|
||||
Reference in New Issue
Block a user