docs: update SKILL.md with current CLI and architecture

2026-04-19 22:09:37 +00:00 · 2026-04-13 16:21:02 +00:00
1 changed files with 101 additions and 251 deletions
--- a/.claude/skills/comfy-qa/SKILL.md
+++ b/.claude/skills/comfy-qa/SKILL.md
@@ -1,283 +1,133 @@
 ---
 name: comfy-qa
-description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
+description: 'Comprehensive QA of ComfyUI frontend. Reproduces bugs via E2E tests, records narrated demo videos, deploys reports. Works in CI and locally via `pnpm qa`.'
 ---

 # ComfyUI Frontend QA Skill

-Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.
+Automated quality assurance pipeline that reproduces reported bugs using Playwright E2E tests, records narrated demo videos with demowright, and deploys reports to Cloudflare Pages.

-## Architecture Overview
+## Quick Start

-The QA pipeline uses a **three-phase approach**:
+```bash
+# Reproduce an issue
+pnpm qa 10253

-1. **RESEARCH** — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, no hallucination)
-2. **REPRODUCE** — Deterministic replay of the research test with video recording
-3. **REPORT** — Deploy results to Cloudflare Pages with badge, video, and verdict
+# Test a PR
+pnpm qa 10270

-### Key Design Decision
+# Test PR base (reproduce bug)
+pnpm qa 10270 -t base

-Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. This was abandoned after discovering **AI reviewers hallucinate** — Gemini reported "REPRODUCED" when videos showed idle screens. The current approach uses **Playwright assertions** as the source of truth: if the test passes, the bug is proven.
+# Test both base + head
+pnpm qa 10270 -t both

-## Prerequisites
+# Test local uncommitted changes
+pnpm qa --uncommitted

- Node.js 22+
- `pnpm` package manager
- `gh` CLI (authenticated)
- Playwright browsers: `npx playwright install chromium`
- Environment variables:
-  - `GEMINI_API_KEY` — for PR analysis and video review
-  - `ANTHROPIC_API_KEY` — for Claude Agent SDK (research phase)
-  - `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — for report deployment
+# Help
+pnpm qa -h
+```
+
+Auto-loads `.env.local` / `.env` for `GEMINI_API_KEY` and `ANTHROPIC_API_KEY`.
+Auto-detects issue vs PR via GitHub API. Auto-starts ComfyUI if not running.
+
+## Architecture
+
+Three-phase pipeline:
+
+1. **RESEARCH** — Claude Sonnet 4.6 (Agent SDK) writes Playwright E2E tests to reproduce bugs
+2. **RECORD** — Re-runs test with demowright for narrated demo video (title cards, TTS, subtitles)
+3. **REPORT** — Gemini reviews video, deploys to Cloudflare Pages with badge + verdict
+
+Key principle: **Playwright assertions are truth** — no AI hallucination. If the test passes, the bug is proven.

 ## Pipeline Scripts

-| Script                            | Role                                                    | Model                         |
-| --------------------------------- | ------------------------------------------------------- | ----------------------------- |
-| `scripts/qa-analyze-pr.ts`        | Deep PR/issue analysis → QA guide                       | gemini-3.1-pro-preview        |
-| `scripts/qa-agent.ts`             | Research phase: Claude writes E2E tests                 | claude-sonnet-4-6 (Agent SDK) |
-| `scripts/qa-record.ts`            | Before/after video recording with Gemini-driven actions | gemini-3.1-pro-preview        |
-| `scripts/qa-reproduce.ts`         | Deterministic replay with narration                     | gemini-3-flash-preview        |
-| `scripts/qa-video-review.ts`      | Video comparison review                                 | gemini-3-flash-preview        |
-| `scripts/qa-generate-test.ts`     | Regression test generation from QA report               | gemini-3-flash-preview        |
-| `scripts/qa-deploy-pages.sh`      | Deploy to Cloudflare Pages + badge                      | —                             |
-| `scripts/qa-batch.sh`             | Batch-trigger QA for multiple issues                    | —                             |
-| `scripts/qa-report-template.html` | Report site (light/dark, seekbar, copy badge)           | —                             |
+All scripts live in `.claude/skills/comfy-qa/scripts/`:
+
+| Script                    | Role                                                  |
+| ------------------------- | ----------------------------------------------------- |
+| `qa.ts`                   | CLI entry point (`pnpm qa`)                           |
+| `qa-agent.ts`             | Research phase: Claude writes E2E tests via Agent SDK |
+| `qa-record.ts`            | Orchestrator: 3-phase pipeline                        |
+| `qa-deploy-pages.sh`      | Cloudflare Pages deploy + badge generation            |
+| `qa-report-template.html` | Report site template                                  |
+| `qa-video-review.ts`      | Gemini video review                                   |
+| `qa-analyze-pr.ts`        | Deep PR/issue analysis → QA guide                     |

 ## Triggering QA

-### Via GitHub Labels
-
- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after comparison)
- **`qa-full`** — Full QA (3-OS matrix, after-only)
- **`qa-issue`** — Reproduce a bug from an issue
-
-### Via Batch Script
+### Via CLI (`pnpm qa`)

 ```bash
-# Trigger QA for specific issue numbers
-./scripts/qa-batch.sh 10394 10238 9996
-
-# From a triage file (top 5 Tier 1 issues)
-./scripts/qa-batch.sh --from tmp/issues.md --top 5
-
-# Preview without pushing
-./scripts/qa-batch.sh --dry-run 10394
-
-# Clean up old trigger branches
-./scripts/qa-batch.sh --cleanup
+pnpm qa 10253                  # issue (auto-detect)
+pnpm qa 10270                  # PR head
+pnpm qa 10270 -t base          # PR base
+pnpm qa 10270 -t both          # both
+pnpm qa --uncommitted          # local changes
 ```

-### Via Workflow Dispatch
+### Via GitHub Labels

-Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).
+- **`qa-issue`** — Reproduce a bug from an issue
+- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after)
+- **`qa-full`** — Full QA (3-OS matrix)
+
+### Via Push to trigger branches
+
+```bash
+git push origin sno-skills:sno-qa-10253 --force
+```
+
+## Research Phase (`qa-agent.ts`)
+
+Claude receives the issue/PR context + a11y tree snapshot + ComfyPage fixture API docs.
+
+Tools:
+
+- **`inspect(selector?)`** — Read a11y tree
+- **`readFixture(path)`** — Read fixture source code
+- **`readTest(path)`** — Read existing tests for patterns
+- **`writeTest(code)`** — Write a Playwright .spec.ts
+- **`runTest()`** — Execute and get pass/fail + errors
+- **`done(verdict, summary, evidence, testCode, videoScript?)`** — Finish
+
+When `verdict=REPRODUCED`, Claude also provides a `videoScript` — a separate test file using demowright's `createVideoScript()` for professional narrated demo video with title cards, TTS segments, and outro.
+
+## Video Recording (demowright)
+
+Phase 2 uses the video script to record with:
+
+- `showTitleCard()` / `hideTitleCard()` — covers setup/loading screen
+- `createVideoScript().title().segment().outro()` — structured narration
+- `pace()` — narration-then-action timing
+- TTS audio + subtitles + cursor overlay + key badges
+
+## Report Site
+
+Deployed to `https://sno-qa-{number}.comfy-qa.pages.dev/`
+
+Features:
+
+- Video player (1x default, adjustable speed)
+- Research log (verdict, tool calls, timing)
+- E2E test code + video script code
+- Verdict banner for NOT_REPRODUCIBLE/INCONCLUSIVE with failure reason
+- Copy badge button (markdown)
+
+## Prerequisites
+
+- `GEMINI_API_KEY` — video review, TTS
+- `ANTHROPIC_API_KEY` — Claude Agent SDK (research phase)
+- `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — report deployment (CI only)
+- ComfyUI server running (auto-detected, or auto-started)

 ## CI Workflow (`.github/workflows/pr-qa.yaml`)

 ```
 resolve-matrix → analyze-pr ──┐
-                               ├→ qa-before (main branch, worktree build)
+                               ├→ qa-before (main branch)
                               ├→ qa-after  (PR branch)
                               └→ report (video review, deploy, comment)
 ```
-
-Before/after jobs run **in parallel** on separate runners for clean isolation.
-
-### Issue Reproduce Mode
-
-For issues (not PRs), the pipeline:
-
-1. Fetches the issue body and comments
-2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
-3. Runs the research phase (Claude writes E2E test to reproduce)
-4. Records video of the test execution
-5. Posts results as a comment on the issue
-
-## Running Locally
-
-### Step 1: Environment Setup
-
-```bash
-# Ensure ComfyUI server is running
-# Default: http://127.0.0.1:8188
-
-# Install Playwright browsers
-npx playwright install chromium
-```
-
-### Step 2: Analyze the Issue/PR
-
-```bash
-# For a PR
-pnpm exec tsx scripts/qa-analyze-pr.ts \
-  --pr-number 10394 \
-  --repo Comfy-Org/ComfyUI_frontend \
-  --output-dir qa-guides
-
-# For an issue
-pnpm exec tsx scripts/qa-analyze-pr.ts \
-  --pr-number 10394 \
-  --repo Comfy-Org/ComfyUI_frontend \
-  --output-dir qa-guides \
-  --type issue
-```
-
-### Step 3: Record Before/After
-
-```bash
-# Before (main branch)
-pnpm exec tsx scripts/qa-record.ts \
-  --mode before \
-  --diff /tmp/pr-diff.txt \
-  --output-dir /tmp/qa-before \
-  --qa-guide qa-guides/qa-guide-1.json
-
-# After (PR branch)
-pnpm exec tsx scripts/qa-record.ts \
-  --mode after \
-  --diff /tmp/pr-diff.txt \
-  --output-dir /tmp/qa-after \
-  --qa-guide qa-guides/qa-guide-1.json
-```
-
-### Step 4: Review Videos
-
-```bash
-pnpm exec tsx scripts/qa-video-review.ts \
-  --artifacts-dir /tmp/qa-artifacts \
-  --video-file qa-session.mp4 \
-  --before-video qa-before-session.mp4 \
-  --output-dir /tmp/video-reviews \
-  --pr-context /tmp/pr-context.txt
-```
-
-## Research Phase Details (`qa-agent.ts`)
-
-Claude receives:
-
- The issue description and comments
- A QA guide from `qa-analyze-pr.ts`
- An accessibility tree snapshot of the current UI
-
-Claude's tools:
-
- **`inspect(selector?)`** — Read a11y tree to discover element selectors
- **`writeTest(code)`** — Write a Playwright `.spec.ts` file
- **`runTest()`** — Execute the test and get pass/fail + errors
- **`done(verdict, summary, evidence, testCode)`** — Finish with verdict
-
-The test uses the project's Playwright fixtures (`comfyPageFixture`), giving access to `comfyPage.page`, `comfyPage.menu`, `comfyPage.settings`, etc.
-
-### Verdict Logic
-
- **REPRODUCED** — Test passes (asserting the bug exists) → bug is proven
- **NOT_REPRODUCIBLE** — Claude exhausted attempts, test cannot pass
- **INCONCLUSIVE** — Agent timed out or encountered infrastructure issues
-
-Auto-completion: if a test passed but `done()` was never called, the pipeline auto-completes with REPRODUCED.
-
-## Manual QA (Fallback)
-
-When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use **playwright-cli** for manual browser interaction:
-
-```bash
-# Install
-npm install -g @playwright/cli@latest
-
-# Open browser and navigate
-playwright-cli open http://127.0.0.1:8188
-
-# Get element references
-playwright-cli snapshot
-
-# Interact
-playwright-cli click e1
-playwright-cli fill e2 "test text"
-playwright-cli press Escape
-playwright-cli screenshot --filename=f.png
-```
-
-Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation to refresh refs.
-
-## Manual QA Test Plan
-
-When performing manual QA (either via playwright-cli or the automated pipeline), systematically test each area below.
-
-### Application Load & Routes
-
-| Test              | Steps                                                        |
-| ----------------- | ------------------------------------------------------------ |
-| Root route loads  | Navigate to `/` — GraphView should render with canvas        |
-| User select route | Navigate to `/user-select` — user selection UI should appear |
-| 404 handling      | Navigate to `/nonexistent` — should handle gracefully        |
-
-### Canvas & Graph View
-
-| Test                      | Steps                                                          |
-| ------------------------- | -------------------------------------------------------------- |
-| Canvas renders            | The LiteGraph canvas is visible and interactive                |
-| Pan canvas                | Click and drag on empty canvas area                            |
-| Zoom in/out               | Use scroll wheel or Alt+=/Alt+-                                |
-| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
-| Delete node               | Select a node, press Delete key                                |
-| Connect nodes             | Drag from output slot to input slot                            |
-| Copy/Paste                | Select nodes, Ctrl+C then Ctrl+V                               |
-| Undo/Redo                 | Make changes, Ctrl+Z to undo, Ctrl+Y to redo                   |
-| Context menus             | Right-click node vs empty canvas — different menus             |
-
-### Sidebar Tabs
-
-| Test              | Steps                                 |
-| ----------------- | ------------------------------------- |
-| Workflows tab     | Press W — workflows sidebar opens     |
-| Node Library tab  | Press N — node library opens          |
-| Model Library tab | Press M — model library opens         |
-| Tab toggle        | Press same key again — sidebar closes |
-| Search in sidebar | Type in search box — results filter   |
-
-### Settings Dialog
-
-| Test             | Steps                                                |
-| ---------------- | ---------------------------------------------------- |
-| Open settings    | Press Ctrl+, or click settings button                |
-| Change a setting | Toggle a boolean setting — it persists after closing |
-| Search settings  | Type in settings search box — results filter         |
-| Close settings   | Press Escape or click close button                   |
-
-### Execution & Queue
-
-| Test           | Steps                                                 |
-| -------------- | ----------------------------------------------------- |
-| Queue prompt   | Load default workflow, click Queue — execution starts |
-| Queue progress | Progress indicator shows during execution             |
-| Interrupt      | Press Ctrl+Alt+Enter during execution — interrupts    |
-
-## Report Site
-
-Deployed to Cloudflare Pages at `https://comfy-qa.pages.dev/<branch>/`.
-
-Features:
-
- Light/dark theme
- Seekable video player with preload
- Copy badge button (markdown)
- Date-stamped badges (e.g., `QA0327`)
- Vertical box badge for issues and PRs
-
-## Known Issues & Troubleshooting
-
-See `docs/qa/TROUBLESHOOTING.md` for common failures:
-
- `set -euo pipefail` + grep with no match → append `|| true`
- `__name is not defined` in `page.evaluate` → use `addScriptTag`
- Cursor not visible in videos → monkey-patch `page.mouse` methods
- Agent not calling `done()` → auto-complete from passing test
-
-## Backlog
-
-See `docs/qa/backlog.md` for planned improvements:
-
- **Type B comparison**: Different commits for regression detection
- **Type C comparison**: Cross-browser testing
- **Pre-seed assets**: Upload test images before recording
- **Lazy a11y tree**: Reduce token usage with `inspect(selector)` vs full dump