feat: readFixture/readTest tools, ANTHROPIC_API_KEY_QA, fix TS errors

- Add readFixture and readTest tools to qa-agent for fixture API discovery
- Enrich system prompt with comprehensive ComfyPage fixture API reference
- Switch CI to ANTHROPIC_API_KEY_QA secret
- Fix all TS errors in qa-agent.ts, qa-record.ts, qa-reproduce.ts
- Better error handling for API credit exhaustion
- Rewrite SKILL.md to reflect three-phase pipeline

Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800
Co-authored-by: Amp <amp@ampcode.com>
2026-04-20 06:20:11 +00:00 · 2026-04-01 06:44:34 +00:00
parent ed96aaafc6
commit 854f1c7da0
5 changed files with 456 additions and 347 deletions
--- a/.claude/skills/comfy-qa/SKILL.md
+++ b/.claude/skills/comfy-qa/SKILL.md
@@ -5,357 +5,273 @@ description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests

 # ComfyUI Frontend QA Skill

-Perform comprehensive quality assurance of the ComfyUI frontend application by navigating all routes, clicking interactive elements, and testing features. Generate a structured report and submit it as a draft PR.
+Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.
+
+## Architecture Overview
+
+The QA pipeline uses a **three-phase approach**:
+
+1. **RESEARCH** — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, so the verdict does not depend on an AI reviewer's judgment)
+2. **REPRODUCE** — Deterministic replay of the research test with video recording
+3. **REPORT** — Deploy results to Cloudflare Pages with badge, video, and verdict
+
+### Key Design Decision
+
+Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. This approach was abandoned after it became clear that **AI reviewers hallucinate**: Gemini reported "REPRODUCED" when videos showed idle screens. The current approach uses **Playwright assertions** as the source of truth: if the test passes, the bug is proven.

 ## Prerequisites

 - Node.js 22+
 - `pnpm` package manager
 - `gh` CLI (authenticated)
- `playwright-cli` (browser automation): `npm install -g @playwright/cli@latest`
+- Playwright browsers: `npx playwright install chromium`
+- Environment variables:
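+  - `GEMINI_API_KEY` — for the Gemini-backed scripts: PR analysis, recording, narration, video review, and regression test generation
+  - `ANTHROPIC_API_KEY` — for Claude Agent SDK (research phase); in CI this is populated from the `ANTHROPIC_API_KEY_QA` repository secret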
+  - `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — for report deployment
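A quick way to catch a missing key before a long run is a pre-flight check. The sketch below is a hypothetical helper (`missingQaEnv` is not part of the repo) and assumes the four variables listed above are the ones the scripts need:

```typescript
// Hypothetical pre-flight check. The variable names come from the
// prerequisites list; the helper itself is illustrative, not a repo script.
const REQUIRED_QA_ENV = [
  'GEMINI_API_KEY',
  'ANTHROPIC_API_KEY',
  'CLOUDFLARE_API_TOKEN',
  'CLOUDFLARE_ACCOUNT_ID'
] as const

type QaEnvName = (typeof REQUIRED_QA_ENV)[number]

// Returns the required variables that are unset or contain only whitespace.
function missingQaEnv(env: Record<string, string | undefined>): QaEnvName[] {
  return REQUIRED_QA_ENV.filter((name) => !env[name]?.trim())
}
```

From a local wrapper script, pass `process.env` and stop if the returned list is non-empty. When only the research phase is being exercised, the Cloudflare entries could be dropped from the list.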

-## Step 1: Environment Detection & Setup
+## Pipeline Scripts

-Detect the runtime environment and ensure the app is accessible.
+| Script | Role | Model |
+|---|---|---|
+| `scripts/qa-analyze-pr.ts` | Deep PR/issue analysis → QA guide | gemini-3.1-pro-preview |
+| `scripts/qa-agent.ts` | Research phase: Claude writes E2E tests | claude-sonnet-4-6 (Agent SDK) |
+| `scripts/qa-record.ts` | Before/after video recording with Gemini-driven actions | gemini-3.1-pro-preview |
+| `scripts/qa-reproduce.ts` | Deterministic replay with narration | gemini-3-flash-preview |
+| `scripts/qa-video-review.ts` | Video comparison review | gemini-3-flash-preview |
+| `scripts/qa-generate-test.ts` | Regression test generation from QA report | gemini-3-flash-preview |
+| `scripts/qa-deploy-pages.sh` | Deploy to Cloudflare Pages + badge | — |
+| `scripts/qa-batch.sh` | Batch-trigger QA for multiple issues | — |
+| `scripts/qa-report-template.html` | Report site (light/dark, seekbar, copy badge) | — |

-### CI Environment
+## Triggering QA

-If `CI=true` is set:
+### Via GitHub Labels

-1. The ComfyUI backend is pre-configured in the CI container (`ghcr.io/comfy-org/comfyui-ci-container`)
-2. Frontend dist is already built and served by the backend
-3. Server runs at `http://127.0.0.1:8188`
-4. Skip user prompts — run fully automated
+- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after comparison)
+- **`qa-full`** — Full QA (3-OS matrix, after-only)
+- **`qa-issue`** — Reproduce a bug from an issue
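The three labels imply different job shapes. A sketch of that mapping as plain data, for illustration only: the runner names (`ubuntu-latest`, `macos-latest`, `windows-latest`) and the `QaPlan` shape are assumptions, not copied from `pr-qa.yaml`.

```typescript
// Illustrative mapping of the trigger labels. Runner names and the QaPlan
// shape are assumptions; the real resolve-matrix job may differ.
type QaLabel = 'qa-changes' | 'qa-full' | 'qa-issue'

interface QaPlan {
  target: 'pr' | 'issue'
  runners: string[] // one QA job per runner OS
  compareWithMain: boolean // also record a "before" run on main
}

function planForLabel(label: QaLabel): QaPlan {
  switch (label) {
    case 'qa-changes': // focused QA: Linux only, before/after comparison
      return { target: 'pr', runners: ['ubuntu-latest'], compareWithMain: true }
    case 'qa-full': // full QA: 3-OS matrix, after-only
      return {
        target: 'pr',
        runners: ['ubuntu-latest', 'macos-latest', 'windows-latest'],
        compareWithMain: false
      }
    case 'qa-issue': // reproduce an issue; a single Linux runner is assumed here
      return { target: 'issue', runners: ['ubuntu-latest'], compareWithMain: false }
  }
}
```

Keeping the label semantics as data makes them testable independently of the workflow YAML.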

-### Local Environment
-
-If `CI` is not set:
-
-1. **Ask the user**: "Is a ComfyUI server already running? If so, what URL? (default: http://127.0.0.1:8188)"
-   - If yes: use the provided URL
-   - If no: offer to start one:
-
-     ```bash
-     # Option A: Use existing ComfyUI installation
-     # Ask for the path to ComfyUI, then:
-     cd <comfyui_path>
-     python main.py --cpu --multi-user --front-end-root <frontend_dist_path> &
-
-     # Option B: Build frontend and use preview server (no backend features)
-     pnpm build && pnpm preview &
-     ```
-
-2. Wait for server readiness by polling the URL (retry with 2s intervals, 60s timeout)
-
-### Browser Automation Setup
-
-Use **playwright-cli** for browser interaction via Bash commands:
+### Via Batch Script

 ```bash
-playwright-cli open http://127.0.0.1:8188   # open browser and navigate
-playwright-cli snapshot                      # capture snapshot with element refs
-playwright-cli click e1                      # click by element ref from snapshot
-playwright-cli press Tab                     # keyboard shortcuts
-playwright-cli screenshot --filename=f.png  # save screenshot
+# Trigger QA for specific issue numbers
+./scripts/qa-batch.sh 10394 10238 9996
+
+# From a triage file (top 5 Tier 1 issues)
+./scripts/qa-batch.sh --from tmp/issues.md --top 5
+
+# Preview without pushing
+./scripts/qa-batch.sh --dry-run 10394
+
+# Clean up old trigger branches
+./scripts/qa-batch.sh --cleanup
 ```

-playwright-cli is headless by default (CI-friendly). Each command outputs the current page snapshot with element references (`e1`, `e2`, …) that you use for subsequent `click`, `fill`, `hover` commands. Always run `snapshot` before interacting to get fresh refs.
+### Via Workflow Dispatch

-For local dev servers behind proxies, adjust the URL accordingly (e.g., `https://[port].stukivx.xyz` pattern if configured).
+Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).

-## Step 2: QA Test Plan
+## CI Workflow (`.github/workflows/pr-qa.yaml`)

-Navigate to the application URL and systematically test each area below. For each test, record:
+```
+resolve-matrix → analyze-pr ──┐
+                               ├→ qa-before (main branch, worktree build)
+                               ├→ qa-after  (PR branch)
+                               └→ report (video review, deploy, comment)
+```

-- **Status**: pass / fail / skip (with reason)
-- **Notes**: any issues, unexpected behavior, or visual glitches
-- **Screenshots**: take screenshots of failures or notable states
+Before/after jobs run **in parallel** on separate runners for clean isolation.

-### 2.1 Application Load & Routes
+### Issue Reproduce Mode

-| Test              | Steps                                                        |
-| ----------------- | ------------------------------------------------------------ |
-| Root route loads  | Navigate to `/` — GraphView should render with canvas        |
+For issues (not PRs), the pipeline:
+1. Fetches the issue body and comments
+2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
+3. Runs the research phase (Claude writes E2E test to reproduce)
+4. Records video of the test execution
+5. Posts results as a comment on the issue
+
+## Running Locally
+
+### Step 1: Environment Setup
+
+```bash
+# Ensure ComfyUI server is running
+# Default: http://127.0.0.1:8188
+
+# Install Playwright browsers
+npx playwright install chromium
+```
+
+### Step 2: Analyze the Issue/PR
+
+```bash
+# For a PR
+pnpm exec tsx scripts/qa-analyze-pr.ts \
+  --pr-number 10394 \
+  --repo Comfy-Org/ComfyUI_frontend \
+  --output-dir qa-guides
+
+# For an issue (the issue number also goes in --pr-number)
+pnpm exec tsx scripts/qa-analyze-pr.ts \
+  --pr-number 10394 \
+  --repo Comfy-Org/ComfyUI_frontend \
+  --output-dir qa-guides \
+  --type issue
+```
+
+### Step 3: Record Before/After
+
+```bash
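+# The --diff input is the PR diff saved to a file; one way to produce it:
+gh pr diff 10394 --repo Comfy-Org/ComfyUI_frontend > /tmp/pr-diff.txt
+
+# Before (main branch)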
+pnpm exec tsx scripts/qa-record.ts \
+  --mode before \
+  --diff /tmp/pr-diff.txt \
+  --output-dir /tmp/qa-before \
+  --qa-guide qa-guides/qa-guide-1.json
+
+# After (PR branch)
+pnpm exec tsx scripts/qa-record.ts \
+  --mode after \
+  --diff /tmp/pr-diff.txt \
+  --output-dir /tmp/qa-after \
+  --qa-guide qa-guides/qa-guide-1.json
+```
+
+### Step 4: Review Videos
+
+```bash
+pnpm exec tsx scripts/qa-video-review.ts \
+  --artifacts-dir /tmp/qa-artifacts \
+  --video-file qa-session.mp4 \
+  --before-video qa-before-session.mp4 \
+  --output-dir /tmp/video-reviews \
+  --pr-context /tmp/pr-context.txt
+```
+
+## Research Phase Details (`qa-agent.ts`)
+
+Claude receives:
+- The issue description and comments
+- A QA guide from `qa-analyze-pr.ts`
+- An accessibility tree snapshot of the current UI
+
+Claude's tools:
+- **`inspect(selector?)`** — Read a11y tree to discover element selectors
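+- **`readFixture(path)`** reads helper source from `browser_tests/fixtures/` (e.g. `helpers/CanvasHelper.ts`) to discover available methods
+- **`readTest(path)`** reads an existing spec from `browser_tests/tests/` to learn conventions; an unknown name returns a list of available spec files
+- **`writeTest(code)`** — Write a Playwright `.spec.ts` file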
+- **`runTest()`** — Execute the test and get pass/fail + errors
+- **`done(verdict, summary, evidence, testCode)`** — Finish with verdict
+
+The test uses the project's Playwright fixtures (`comfyPageFixture`), giving access to `comfyPage.page`, `comfyPage.menu`, `comfyPage.settings`, etc.
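The `readFixture` and `readTest` tools in `qa-agent.ts` join a model-supplied relative path onto a fixed directory under `browser_tests/` and cap the returned text at 4,000 characters. A self-contained sketch of those two steps follows. The helper names are hypothetical, and the path check is an addition made only in this sketch: the tools in the diff join `args.path` without validating it, so a `..` segment could reach files outside `browser_tests/`.

```typescript
// Hypothetical helpers mirroring readFixture/readTest. The 4000-char cap and
// the truncation notice match qa-agent.ts; the path check is an addition.
const MAX_TOOL_CHARS = 4000

// Accepts only relative paths that stay inside the base directory:
// no leading slash or backslash, no drive letter, no empty or ".." segments.
function isSafeRelativePath(rel: string): boolean {
  if (rel.length === 0 || /^[\\/]/.test(rel) || /^[A-Za-z]:/.test(rel)) {
    return false
  }
  return rel.split(/[\\/]/).every((seg) => seg !== '..' && seg !== '')
}

// Caps tool output the same way the diff does, appending the total length.
function capToolOutput(content: string, max: number = MAX_TOOL_CHARS): string {
  if (content.length <= max) return content
  return `${content.slice(0, max)}\n\n... (truncated, ${content.length} total chars)`
}
```

Factoring both steps into shared helpers would also remove the duplicated truncation block between the two tools.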
+
+### Verdict Logic
+
+- **REPRODUCED** — Test passes (asserting the bug exists) → bug is proven
+- **NOT_REPRODUCIBLE** — Claude exhausted its attempts (up to 5 fix-and-rerun cycles) without getting the test to pass
+- **INCONCLUSIVE** — Agent timed out or encountered infrastructure issues
+
+Auto-completion: if a test passed but `done()` was never called, the pipeline auto-completes with REPRODUCED.
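The verdict rules, including auto-completion, can be read as a small decision function. This is a sketch under assumptions: the `ResearchOutcome` field names are hypothetical, and `qa-agent.ts` tracks the equivalent state in its own variables (`agentDone`, `finalVerdict`, which starts as `'INCONCLUSIVE'`).

```typescript
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

// Hypothetical summary of one research run; field names are illustrative.
interface ResearchOutcome {
  doneVerdict?: Verdict // present only if the agent called done()
  anyTestPassed: boolean // true if any runTest() call passed
}

function resolveVerdict(o: ResearchOutcome): Verdict {
  // An explicit done() verdict is used as-is.
  if (o.doneVerdict) return o.doneVerdict
  // Auto-completion: a passing test proves the bug even without done().
  if (o.anyTestPassed) return 'REPRODUCED'
  // No verdict and no passing test (timeout, infrastructure failure, ...).
  return 'INCONCLUSIVE'
}
```

Letting an explicit `done()` verdict take precedence over an earlier passing run is a choice made in this sketch; the text above only specifies the fallback when `done()` is never called.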
+
+## Manual QA (Fallback)
+
+When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use **playwright-cli** for manual browser interaction:
+
+```bash
+# Install
+npm install -g @playwright/cli@latest
+
+# Open browser and navigate
+playwright-cli open http://127.0.0.1:8188
+
+# Get element references
+playwright-cli snapshot
+
+# Interact
+playwright-cli click e1
+playwright-cli fill e2 "test text"
+playwright-cli press Escape
+playwright-cli screenshot --filename=f.png
+```
+
+Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation to refresh refs.
+
+## Manual QA Test Plan
+
+Whether testing manually with playwright-cli or reviewing what the automated pipeline covered, systematically work through each area below.
+
+### Application Load & Routes
+
+| Test | Steps |
+|---|---|
+| Root route loads | Navigate to `/` — GraphView should render with canvas |
 | User select route | Navigate to `/user-select` — user selection UI should appear |
-| Default redirect  | If multi-user mode, `/` redirects to `/user-select` first    |
-| 404 handling      | Navigate to `/nonexistent` — should handle gracefully        |
-
-### 2.2 Canvas & Graph View
-
-| Test                      | Steps                                                          |
-| ------------------------- | -------------------------------------------------------------- |
-| Canvas renders            | The LiteGraph canvas is visible and interactive                |
-| Pan canvas                | Click and drag on empty canvas area                            |
-| Zoom in/out               | Use scroll wheel or Alt+=/Alt+-                                |
-| Fit view                  | Press `.` key — canvas fits to content                         |
-| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
-| Add node via search       | Open search box, find and add a node                           |
-| Delete node               | Select a node, press Delete key                                |
-| Connect nodes             | Drag from output slot to input slot                            |
-| Disconnect nodes          | Right-click a link and remove, or drag from connected slot     |
-| Multi-select              | Shift+click or drag-select multiple nodes                      |
-| Copy/Paste                | Select nodes, Ctrl+C then Ctrl+V                               |
-| Undo/Redo                 | Make changes, Ctrl+Z to undo, Ctrl+Y to redo                   |
-| Node context menu         | Right-click a node — menu appears with all expected options    |
-| Canvas context menu       | Right-click empty canvas — menu appears                        |
-
-### 2.3 Node Operations
-
-| Test                | Steps                                                      |
-| ------------------- | ---------------------------------------------------------- |
-| Bypass node         | Select node, Ctrl+B — node shows bypass state              |
-| Mute node           | Select node, Ctrl+M — node shows muted state               |
-| Collapse node       | Select node, Alt+C — node collapses                        |
-| Pin node            | Select node, press P — node becomes pinned                 |
-| Rename node         | Double-click node title — edit mode activates              |
-| Node color          | Right-click > Color — color picker works                   |
-| Group nodes         | Select multiple nodes, Ctrl+G — group created              |
-| Ungroup             | Right-click group > Ungroup                                |
-| Widget interactions | Toggle checkboxes, adjust sliders, type in text fields     |
-| Combo widget        | Click dropdown widgets — options appear and are selectable |
-
-### 2.4 Sidebar Tabs
-
-| Test                   | Steps                                                  |
-| ---------------------- | ------------------------------------------------------ |
-| Workflows tab          | Press W — workflows sidebar opens with saved workflows |
-| Node Library tab       | Press N — node library opens with categories           |
-| Model Library tab      | Press M — model library opens                          |
-| Assets tab             | Press A — assets browser opens                         |
-| Tab toggle             | Press same key again — sidebar closes                  |
-| Search in sidebar      | Type in search box — results filter                    |
-| Drag node from library | Drag a node from library onto canvas                   |
-
-### 2.5 Topbar & Workflow Tabs
-
-| Test                 | Steps                                                  |
-| -------------------- | ------------------------------------------------------ |
-| Workflow tab display | Current workflow name shown in tab bar                 |
-| New workflow         | Ctrl+N — new blank workflow created                    |
-| Rename workflow      | Double-click workflow tab                              |
-| Tab context menu     | Right-click workflow tab — menu with Close/Rename/etc. |
-| Multiple tabs        | Open multiple workflows, switch between them           |
-| Queue button         | Click Queue/Run button — prompt queues                 |
-| Batch count          | Click batch count editor, change value                 |
-| Menu hamburger       | Click hamburger menu — options appear                  |
-
-### 2.6 Settings Dialog
-
-| Test             | Steps                                                |
-| ---------------- | ---------------------------------------------------- |
-| Open settings    | Press Ctrl+, or click settings button                |
-| Settings tabs    | Navigate through all setting categories              |
-| Change a setting | Toggle a boolean setting — it persists after closing |
-| Search settings  | Type in settings search box — results filter         |
-| Keybindings tab  | Navigate to keybindings panel                        |
-| About tab        | Navigate to about panel — version info shown         |
-| Close settings   | Press Escape or click close button                   |
-
-### 2.7 Bottom Panel
-
-| Test                | Steps                                  |
-| ------------------- | -------------------------------------- |
-| Toggle panel        | Press Ctrl+` — bottom panel opens      |
-| Logs tab            | Logs/terminal tab shows server output  |
-| Shortcuts tab       | Shortcuts reference is displayed       |
-| Keybindings display | Press Ctrl+Shift+K — keybindings panel |
-
-### 2.8 Execution & Queue
-
-| Test           | Steps                                                 |
-| -------------- | ----------------------------------------------------- |
-| Queue prompt   | Load default workflow, click Queue — execution starts |
-| Queue progress | Progress indicator shows during execution             |
-| Interrupt      | Press Ctrl+Alt+Enter during execution — interrupts    |
-| Job history    | Open job history sidebar — past executions listed     |
-| Clear history  | Clear execution history via menu                      |
-
-### 2.9 Workflow File Operations
-
-| Test            | Steps                                             |
-| --------------- | ------------------------------------------------- |
-| Save workflow   | Ctrl+S — workflow saves (check for prompt if new) |
-| Open workflow   | Ctrl+O — file picker or workflow browser opens    |
-| Export JSON     | Menu > Export — workflow JSON downloads           |
-| Import workflow | Drag a .json workflow file onto canvas            |
-| Load default    | Menu > Load Default — default workflow loads      |
-| Clear workflow  | Menu > Clear — canvas clears (after confirmation) |
-
-### 2.10 Advanced Features
-
-| Test            | Steps                                             |
-| --------------- | ------------------------------------------------- |
-| Minimap         | Alt+M — minimap toggle                            |
-| Focus mode      | Toggle focus mode                                 |
-| Canvas lock     | Press H to lock, V to unlock                      |
-| Link visibility | Ctrl+Shift+L — toggle links                       |
-| Subgraph        | Select nodes > Ctrl+Shift+E — convert to subgraph |
-
-### 2.11 Error Handling
-
-| Test                  | Steps                                        |
-| --------------------- | -------------------------------------------- |
-| Missing nodes dialog  | Load workflow with non-existent node types   |
-| Missing models dialog | Trigger missing model warning                |
-| Network error         | Disconnect backend, verify graceful handling |
-| Invalid workflow      | Try loading malformed JSON                   |
-
-### 2.12 Responsive & Accessibility
-
-| Test                | Steps                                 |
-| ------------------- | ------------------------------------- |
-| Window resize       | Resize browser window — layout adapts |
-| Keyboard navigation | Tab through interactive elements      |
-| Sidebar resize      | Drag sidebar edge to resize           |
-
-## Step 3: Generate Report
-
-After completing all tests, generate a markdown report file.
-
-### Report Location
-
-```
-docs/qa/YYYY-MM-DD-NNN-report.md
-```
-
-Where:
-
-- `YYYY-MM-DD` is today's date
-- `NNN` is a zero-padded increment index (001, 002, etc.)
-
-To determine the increment, check existing files:
-
-```bash
-ls docs/qa/ | grep "$(date +%Y-%m-%d)" | wc -l
-```
-
-### Report Template
-
-```markdown
-# QA Report: ComfyUI Frontend
-
-**Date**: YYYY-MM-DD
-**Environment**: CI / Local (OS, Browser)
-**Frontend Version**: (git sha or version)
-**Agent**: Claude / Codex / Other
-**Server URL**: http://...
-
-## Summary
-
-| Category        | Pass | Fail | Skip | Total |
-| --------------- | ---- | ---- | ---- | ----- |
-| Routes & Load   |      |      |      |       |
-| Canvas          |      |      |      |       |
-| Node Operations |      |      |      |       |
-| Sidebar         |      |      |      |       |
-| Topbar          |      |      |      |       |
-| Settings        |      |      |      |       |
-| Bottom Panel    |      |      |      |       |
-| Execution       |      |      |      |       |
-| File Operations |      |      |      |       |
-| Advanced        |      |      |      |       |
-| Error Handling  |      |      |      |       |
-| Responsive      |      |      |      |       |
-| **Total**       |      |      |      |       |
-
-## Results
-
-### Routes & Load
-
-- [x] Root route loads — pass
-- [ ] ...
+| 404 handling | Navigate to `/nonexistent` — should handle gracefully |

 ### Canvas & Graph View

-- [x] Canvas renders — pass
-- [ ] ...
+| Test | Steps |
+|---|---|
+| Canvas renders | The LiteGraph canvas is visible and interactive |
+| Pan canvas | Click and drag on empty canvas area |
+| Zoom in/out | Use scroll wheel or Alt+=/Alt+- |
+| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
+| Delete node | Select a node, press Delete key |
+| Connect nodes | Drag from output slot to input slot |
+| Copy/Paste | Select nodes, Ctrl+C then Ctrl+V |
+| Undo/Redo | Make changes, Ctrl+Z to undo, Ctrl+Y to redo |
+| Context menus | Right-click node vs empty canvas — different menus |

-(repeat for each category)
+### Sidebar Tabs

-## Issues Found
+| Test | Steps |
+|---|---|
+| Workflows tab | Press W — workflows sidebar opens |
+| Node Library tab | Press N — node library opens |
+| Model Library tab | Press M — model library opens |
+| Tab toggle | Press same key again — sidebar closes |
+| Search in sidebar | Type in search box — results filter |

-### Issue 1: [Title]
+### Settings Dialog

-- **Severity**: critical / major / minor / cosmetic
-- **Steps to reproduce**: ...
-- **Expected**: ...
-- **Actual**: ...
-- **Screenshot**: (if available)
+| Test | Steps |
+|---|---|
+| Open settings | Press Ctrl+, or click settings button |
+| Change a setting | Toggle a boolean setting — it persists after closing |
+| Search settings | Type in settings search box — results filter |
+| Close settings | Press Escape or click close button |

-## Notes
+### Execution & Queue

-Any additional observations, performance notes, or suggestions.
-```
+| Test | Steps |
+|---|---|
+| Queue prompt | Load default workflow, click Queue — execution starts |
+| Queue progress | Progress indicator shows during execution |
+| Interrupt | Press Ctrl+Alt+Enter during execution — interrupts |

-## Step 4: Commit and Push Report
+## Report Site

-### In CI (when `CI=true`)
+Deployed to Cloudflare Pages at `https://comfy-qa.pages.dev/<branch>/`.

-Save the report directly to `$QA_ARTIFACTS` (the CI workflow uploads this as
-an artifact and posts results as a PR comment). Do **not** commit, push, or
-create a new PR.
+Features:
+- Light/dark theme
+- Seekable video player with preload
+- Copy badge button (markdown)
+- Date-stamped badges (e.g., `QA0327`)
+- Vertical box badge for issues and PRs
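The example badge `QA0327` reads as `QA` followed by a two-digit month and day. A sketch assuming that `QA` + `MMDD` format and UTC dates; both are inferences from the single example, not confirmed against `qa-deploy-pages.sh`, and the function name is hypothetical.

```typescript
// Assumed format: "QA" + two-digit month + two-digit day, computed in UTC.
// Inferred from the QA0327 example; the deploy script may differ.
function badgeLabel(date: Date): string {
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0')
  const dd = String(date.getUTCDate()).padStart(2, '0')
  return `QA${mm}${dd}`
}
```

Using UTC keeps the label stable regardless of the CI runner's local timezone.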

-### Local / interactive use
+## Known Issues & Troubleshooting

-When running locally, create a draft PR after committing:
+See `docs/qa/TROUBLESHOOTING.md` for common failures:
+- `set -euo pipefail` + grep with no match → grep exits non-zero and aborts the step; append `|| true`
+- `__name is not defined` in `page.evaluate` → use `addScriptTag`
+- Cursor not visible in videos → monkey-patch `page.mouse` methods
+- Agent not calling `done()` → auto-complete from passing test
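The cursor workaround wraps the mouse methods so every pointer action also moves the overlay. The sketch below targets a minimal structural interface rather than Playwright's `Mouse` class, so it runs without a browser; `onMove` stands in for whatever calls the injected `__moveCursor(x, y)`, and `patchMouseForCursor` is a hypothetical name. Because Playwright's `down()` and `up()` take no coordinates, the sketch replays the last position seen by `move`, `click`, or `dblclick`.

```typescript
// Minimal structural stand-in for Playwright's Mouse (only the patched methods).
interface MouseLike {
  move(x: number, y: number, options?: object): Promise<void>
  click(x: number, y: number, options?: object): Promise<void>
  dblclick(x: number, y: number, options?: object): Promise<void>
  down(options?: object): Promise<void>
  up(options?: object): Promise<void>
}

type MoveHook = (x: number, y: number) => Promise<void>

// Wraps each method so the cursor hook runs before the real mouse action.
function patchMouseForCursor(mouse: MouseLike, onMove: MoveHook): void {
  let last = { x: 0, y: 0 }
  for (const name of ['move', 'click', 'dblclick'] as const) {
    const original = mouse[name].bind(mouse)
    mouse[name] = async (x: number, y: number, options?: object) => {
      last = { x, y }
      await onMove(x, y)
      return original(x, y, options)
    }
  }
  for (const name of ['down', 'up'] as const) {
    const original = mouse[name].bind(mouse)
    mouse[name] = async (options?: object) => {
      await onMove(last.x, last.y) // down/up have no coordinates of their own
      return original(options)
    }
  }
}
```

With a real page, `onMove` would forward the coordinates into the page, for example through `page.evaluate`.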

-```bash
-# Ensure on a feature branch
-BRANCH_NAME="qa/$(date +%Y-%m-%d)-$(git rev-parse --short HEAD)"
-git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
+## Backlog

-git add docs/qa/
-git commit -m "docs: add QA report $(date +%Y-%m-%d)
-
-Automated QA report covering all frontend routes and features."
-git push -u origin "$BRANCH_NAME"
-
-# Create draft PR assigned to comfy-pr-bot
-gh pr create \
-  --draft \
-  --title "QA Report: $(date +%Y-%m-%d)" \
-  --body "## QA Report
-
-Automated frontend QA run covering all routes and interactive features.
-
-See \`docs/qa/\` for the full report.
-
-/cc @comfy-pr-bot" \
-  --assignee comfy-pr-bot
-```
-
-## Execution Notes
-
-### Cross-Platform Considerations
-
-- **Windows**: Use `pwsh` or `cmd` equivalents for shell commands. `gh` CLI works on all platforms.
-- **macOS**: Keyboard shortcuts use Cmd instead of Ctrl in the actual app, but Playwright sends OS-appropriate keys.
-- **Linux**: Primary CI platform. Screenshot baselines are Linux-only.
-
-### Agent Compatibility
-
-This skill uses **playwright-cli** (`@playwright/cli`) — a token-efficient CLI designed for coding agents. Install it once with `npm install -g @playwright/cli@latest`, then use `Bash` to run commands.
-
-The key operations and their playwright-cli equivalents:
-
-| Action           | Command                                  |
-| ---------------- | ---------------------------------------- |
-| Navigate to URL  | `playwright-cli goto <url>`              |
-| Get element refs | `playwright-cli snapshot`                |
-| Click element    | `playwright-cli click <ref>`             |
-| Type text        | `playwright-cli fill <ref> <text>`       |
-| Press shortcut   | `playwright-cli press <key>`             |
-| Take screenshot  | `playwright-cli screenshot --filename=f` |
-| Hover element    | `playwright-cli hover <ref>`             |
-| Select dropdown  | `playwright-cli select <ref> <value>`    |
-
-Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation or major interactions to refresh refs before acting.
-
-### Tips for Reliable QA
-
-1. **Wait for page stability** before interacting — check that elements are visible and enabled
-2. **Take a snapshot after each major navigation** to verify state
-3. **Don't use fixed timeouts** — poll for expected conditions
-4. **Record the full page snapshot** at the start for baseline comparison
-5. **If a test fails**, document it and continue — don't abort the entire QA run
-6. **Group related tests** — complete one category before moving to the next
+See `docs/qa/backlog.md` for planned improvements:
+- **Type B comparison**: Different commits for regression detection
+- **Type C comparison**: Cross-browser testing
+- **Pre-seed assets**: Upload test images before recording
+- **Lazy a11y tree**: Reduce token usage with `inspect(selector)` vs full dump
--- a/.github/workflows/pr-qa.yaml
+++ b/.github/workflows/pr-qa.yaml
@@ -290,7 +290,7 @@ jobs:
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY_QA }}
          TARGET_TYPE: ${{ needs.resolve-matrix.outputs.target_type }}
        run: |
          MODE="before"
--- a/docs/qa/TROUBLESHOOTING.md
+++ b/docs/qa/TROUBLESHOOTING.md
@@ -64,6 +64,11 @@
 **Cause**: Headless Chrome doesn't render system cursor. The CSS cursor overlay relies on DOM `mousemove` events which Playwright CDP doesn't reliably trigger.
 **Fix**: Monkey-patch `page.mouse.move/click/dblclick/down/up` to call `__moveCursor(x,y)` on the injected cursor div. This makes ALL mouse operations update the overlay.

+## Credit Balance Too Low
+**Symptom**: Research phase produces INCONCLUSIVE with 0 tool calls. Log shows "Credit balance is too low".
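+**Cause**: The Anthropic key used by the research phase has exhausted its credits. In CI the workflow passes it as `ANTHROPIC_API_KEY`, sourced from the `ANTHROPIC_API_KEY_QA` repository secret.
+**Fix**: Top up the Anthropic API account linked to that key, or rotate to a new key by updating `ANTHROPIC_API_KEY_QA` in repo Settings → Secrets.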
+
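To make this failure obvious in the report instead of a bare INCONCLUSIVE, a runner can look for the quoted message in the error it receives. A hypothetical classifier follows; the name and return shape are assumptions, and this is not necessarily how `qa-agent.ts` detects the condition.

```typescript
type FailureKind = 'CREDITS_EXHAUSTED' | 'OTHER'

// Hypothetical classifier: matches the log message quoted in the symptom
// (case-insensitively) so the report can explain why the run was INCONCLUSIVE.
function classifyResearchFailure(err: unknown): { kind: FailureKind; message: string } {
  const message = err instanceof Error ? err.message : String(err)
  const kind: FailureKind = /credit balance is too low/i.test(message)
    ? 'CREDITS_EXHAUSTED'
    : 'OTHER'
  return { kind, message }
}
```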
 ## Agent Doesn't Perform Steps
 **Symptom**: Agent opens menus and settings but never interacts with the canvas.
 **Causes**:
--- a/scripts/qa-agent.ts
+++ b/scripts/qa-agent.ts
@@ -18,7 +18,7 @@
 import type { Page } from '@playwright/test'
 import { query, tool, createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk'
 import { z } from 'zod'
-import { mkdirSync, writeFileSync } from 'fs'
+import { mkdirSync, readFileSync, writeFileSync } from 'fs'
 import { execSync } from 'child_process'

 // ── Types ──
@@ -56,7 +56,6 @@ export async function runResearchPhase(
  const { page, issueContext, qaGuide, outputDir, serverUrl, anthropicApiKey } =
    opts
  const maxTurns = opts.maxTurns ?? 50
-  const timeBudgetMs = opts.timeBudgetMs ?? 600_000 // 10 min for write→run→fix loops

  let agentDone = false
  let finalVerdict: ResearchResult['verdict'] = 'INCONCLUSIVE'
@@ -127,6 +126,87 @@ export async function runResearchPhase(
    }
  )

+  // ── Tool: readFixture ──
+  const readFixtureTool = tool(
+    'readFixture',
+    'Read a fixture or helper file from browser_tests/fixtures/ to understand the API. Use this to discover available methods on comfyPage helpers before writing your test.',
+    {
+      path: z
+        .string()
+        .describe(
+          'Relative path within browser_tests/fixtures/, e.g. "helpers/CanvasHelper.ts" or "components/Topbar.ts" or "ComfyPage.ts"'
+        )
+    },
+    async (args) => {
+      let resultText: string
+      try {
+        const fullPath = `${projectRoot}/browser_tests/fixtures/${args.path}`
+        const content = readFileSync(fullPath, 'utf-8')
+        resultText = content.slice(0, 4000)
+        if (content.length > 4000) {
+          resultText += `\n\n... (truncated, ${content.length} total chars)`
+        }
+      } catch (e) {
+        resultText = `Could not read fixture: ${e instanceof Error ? e.message : e}`
+      }
+
+      researchLog.push({
+        turn: turnCount,
+        timestampMs: Date.now() - startTime,
+        toolName: 'readFixture',
+        toolInput: args,
+        toolResult: resultText.slice(0, 500)
+      })
+
+      return { content: [{ type: 'text' as const, text: resultText }] }
+    }
+  )
+
+  // ── Tool: readTest ──
+  const readTestTool = tool(
+    'readTest',
+    'Read an existing E2E test file from browser_tests/tests/ to learn patterns and conventions used in this project.',
+    {
+      path: z
+        .string()
+        .describe(
+          'Relative path within browser_tests/tests/, e.g. "workflow.spec.ts" or "subgraph.spec.ts"'
+        )
+    },
+    async (args) => {
+      let resultText: string
+      try {
+        const fullPath = `${projectRoot}/browser_tests/tests/${args.path}`
+        const content = readFileSync(fullPath, 'utf-8')
+        resultText = content.slice(0, 4000)
+        if (content.length > 4000) {
+          resultText += `\n\n... (truncated, ${content.length} total chars)`
+        }
+      } catch (e) {
+        // List available test files if the path doesn't exist
+        try {
+          const { readdirSync } = await import('fs')
+          const files = readdirSync(`${projectRoot}/browser_tests/tests/`)
+            .filter((f: string) => f.endsWith('.spec.ts'))
+            .slice(0, 30)
+          resultText = `File not found: ${args.path}\n\nAvailable test files:\n${files.join('\n')}`
+        } catch {
+          resultText = `Could not read test: ${e instanceof Error ? e.message : e}`
+        }
+      }
+
+      researchLog.push({
+        turn: turnCount,
+        timestampMs: Date.now() - startTime,
+        toolName: 'readTest',
+        toolInput: args,
+        toolResult: resultText.slice(0, 500)
+      })
+
+      return { content: [{ type: 'text' as const, text: resultText }] }
+    }
+  )
+
  // ── Tool: writeTest ──
  const writeTestTool = tool(
    'writeTest',
@@ -255,7 +335,14 @@ export async function runResearchPhase(
  const server = createSdkMcpServer({
    name: 'qa-research',
    version: '1.0.0',
-    tools: [inspectTool, writeTestTool, runTestTool, doneTool]
+    tools: [
+      inspectTool,
+      readFixtureTool,
+      readTestTool,
+      writeTestTool,
+      runTestTool,
+      doneTool
+    ]
  })

  // ── System prompt ──
@@ -263,6 +350,8 @@ export async function runResearchPhase(

 ## Your tools
 - inspect(selector?) — Read the accessibility tree to understand the current UI. Use to discover selectors, element names, and UI state.
+- readFixture(path) — Read fixture source code from browser_tests/fixtures/. Use to discover available methods. E.g. "helpers/CanvasHelper.ts", "components/Topbar.ts", "ComfyPage.ts"
+- readTest(path) — Read an existing test from browser_tests/tests/ to learn patterns. E.g. "workflow.spec.ts". Pass an unknown name to list available files.
 - writeTest(code) — Write a Playwright test file (.spec.ts)
 - runTest() — Execute the test and get results (pass/fail + errors)
 - done(verdict, summary, evidence, testCode) — Finish with the final test
@@ -270,31 +359,111 @@ export async function runResearchPhase(
 ## Workflow
 1. Read the issue description carefully
 2. Use inspect() to understand the current UI state and discover element selectors
-3. Write a Playwright test that:
-   - Navigates to ${serverUrl}
+3. If unsure about the fixture API, use readFixture() to read the relevant helper source code
+4. If unsure about test patterns, use readTest() to read an existing test for reference
+5. Write a Playwright test that:
   - Performs the exact reproduction steps from the issue
   - Asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
-4. Run the test with runTest()
-5. If it fails: read the error, fix the test, run again (max 5 attempts)
-6. Call done() with the final verdict and test code
+6. Run the test with runTest()
+7. If it fails: read the error, fix the test, run again (max 5 attempts)
+8. Call done() with the final verdict and test code

 ## Test writing guidelines
 - Import the project fixture: \`import { comfyPageFixture as test } from '../fixtures/ComfyPage'\`
 - Import expect: \`import { expect } from '@playwright/test'\`
- The fixture provides \`comfyPage\` which has:
-  - \`comfyPage.page\` — the Playwright Page object
-  - \`comfyPage.menu.topbar\` — topbar actions (saveWorkflowAs, getTabNames, getWorkflowTab)
-  - \`comfyPage.menu.topbar.triggerTopbarCommand(label)\` — click a menu command
-  - \`comfyPage.workflow\` — workflow helpers (isCurrentWorkflowModified, setupWorkflowsDirectory)
-  - \`comfyPage.canvas\` — canvas element for mouse interactions
-  - \`comfyPage.settings.setSetting(id, value)\` — change settings
-  - \`comfyPage.nextFrame()\` — wait for next render frame
-  - \`comfyPage.loadWorkflow(name)\` — load a named workflow
- Use beforeEach to set up settings and workflow directory
- Use afterEach to clean up (setupWorkflowsDirectory({}))
+- The fixture provides \`comfyPage\` which has all the helpers listed below
 - If the bug IS present, the test should PASS. If the bug is fixed, the test would FAIL.
 - Keep tests focused and minimal — test ONLY the reported bug
 - The test file will be placed in browser_tests/tests/qa-reproduce.spec.ts
+- Use \`comfyPage.nextFrame()\` after interactions that trigger UI updates
+
+## ComfyPage Fixture API Reference
+
+### Core properties
+- \`comfyPage.page\` — raw Playwright Page
+- \`comfyPage.canvas\` — Locator for #graph-canvas
+- \`comfyPage.queueButton\` — "Queue Prompt" button
+- \`comfyPage.runButton\` — "Run" button (new UI)
+- \`comfyPage.confirmDialog\` — ConfirmDialog (has .confirm, .delete, .overwrite, .reject locators + .click(name) method)
+- \`comfyPage.nextFrame()\` — wait for next requestAnimationFrame
+- \`comfyPage.setup()\` — navigate + wait for app ready (called automatically by fixture)
+
+### Menu (comfyPage.menu)
+- \`comfyPage.menu.topbar\` — Topbar helper:
+  - \`.triggerTopbarCommand(['File', 'Save As'])\` — navigate menu hierarchy
+  - \`.openTopbarMenu()\` / \`.closeTopbarMenu()\` — open/close hamburger
+  - \`.openSubmenu('File')\` — hover to open submenu, returns submenu Locator
+  - \`.getTabNames()\` — get all open workflow tab names
+  - \`.getActiveTabName()\` — get active tab name
+  - \`.getWorkflowTab(name)\` — get tab Locator
+  - \`.closeWorkflowTab(name)\` — close a tab
+  - \`.saveWorkflow(name)\` / \`.saveWorkflowAs(name)\` / \`.exportWorkflow(name)\`
+  - \`.switchTheme('dark' | 'light')\`
+- \`comfyPage.menu.workflowsTab\` — WorkflowsSidebarTab:
+  - \`.open()\` / \`.close()\` — toggle workflows sidebar
+  - \`.getTopLevelSavedWorkflowNames()\` — list saved workflow names
+- \`comfyPage.menu.nodeLibraryTab\` — NodeLibrarySidebarTab
+- \`comfyPage.menu.assetsTab\` — AssetsSidebarTab
+
+### Canvas (comfyPage.canvasOps)
+- \`.click({x, y})\` — click at position on canvas
+- \`.rightClick(x, y)\` — right-click (opens context menu)
+- \`.doubleClick()\` — double-click canvas (opens node search)
+- \`.clickEmptySpace()\` — click known empty area
+- \`.dragAndDrop(source, target)\` — drag from source to target position
+- \`.pan(offset, safeSpot?)\` — pan canvas by offset
+- \`.zoom(deltaY, steps?)\` — zoom via scroll wheel
+- \`.resetView()\` — reset zoom/pan to default
+- \`.getScale()\` / \`.setScale(n)\` — get/set canvas zoom
+- \`.getNodeCenterByTitle(title)\` — get screen coords of node center
+- \`.disconnectEdge()\` / \`.connectEdge()\` — default graph edge operations
+
+### Node Operations (comfyPage.nodeOps)
+- \`.getGraphNodesCount()\` — count all nodes
+- \`.getSelectedGraphNodesCount()\` — count selected nodes
+- \`.getNodes()\` — get all nodes
+- \`.getFirstNodeRef()\` — get NodeReference for first node
+- \`.getNodeRefById(id)\` — get NodeReference by ID
+- \`.getNodeRefsByType(type)\` — get all nodes of a type
+- \`.waitForGraphNodes(count)\` — wait until node count matches
+
+### Settings (comfyPage.settings)
+- \`.setSetting(id, value)\` — change a ComfyUI setting
+- \`.getSetting(id)\` — read current setting value
+
+### Keyboard (comfyPage.keyboard)
+- \`.undo()\` / \`.redo()\` — Ctrl+Z / Ctrl+Y
+- \`.bypass()\` — Ctrl+B
+- \`.selectAll()\` — Ctrl+A
+- \`.ctrlSend(key)\` — send Ctrl+key
+
+### Workflow (comfyPage.workflow)
+- \`.loadWorkflow(name)\` — load from browser_tests/assets/{name}.json
+- \`.setupWorkflowsDirectory(structure)\` — setup test directory
+- \`.deleteWorkflow(name)\`
+- \`.isCurrentWorkflowModified()\` — check dirty state
+
+### Context Menu (comfyPage.contextMenu)
+- \`.openFor(locator)\` — right-click locator and wait for menu
+- \`.clickMenuItem(name)\` — click a menu item by name
+- \`.isVisible()\` — check if context menu is showing
+- \`.assertHasItems(items)\` — assert menu contains items
+
+### Other helpers
+- \`comfyPage.settingDialog\` — SettingDialog component
+- \`comfyPage.searchBox\` / \`comfyPage.searchBoxV2\` — node search
+- \`comfyPage.toast\` — ToastHelper (\`.visibleToasts\`)
+- \`comfyPage.subgraph\` — SubgraphHelper
+- \`comfyPage.vueNodes\` — VueNodeHelpers
+- \`comfyPage.bottomPanel\` — BottomPanel
+- \`comfyPage.clipboard\` — ClipboardHelper
+- \`comfyPage.dragDrop\` — DragDropHelper
+
+### Available fixture files (use readFixture to explore)
+- ComfyPage.ts — main fixture with all helpers
+- helpers/CanvasHelper.ts, NodeOperationsHelper.ts, WorkflowHelper.ts
+- helpers/KeyboardHelper.ts, SettingsHelper.ts, SubgraphHelper.ts
+- components/Topbar.ts, ContextMenu.ts, SettingDialog.ts, SidebarTab.ts

 ## Current UI state (accessibility tree)
 ${initialA11y}
@@ -309,7 +478,7 @@ ${issueContext}`
  try {
    for await (const message of query({
      prompt:
-        'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
+        'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, readFixture() or readTest() if you need to understand the fixture API or see existing test patterns, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
      options: {
        model: 'claude-sonnet-4-6',
        systemPrompt,
@@ -318,6 +487,8 @@ ${issueContext}`
        mcpServers: { 'qa-research': server },
        allowedTools: [
          'mcp__qa-research__inspect',
+          'mcp__qa-research__readFixture',
+          'mcp__qa-research__readTest',
          'mcp__qa-research__writeTest',
          'mcp__qa-research__runTest',
          'mcp__qa-research__done'
@@ -339,7 +510,21 @@ ${issueContext}`
      if (agentDone) break
    }
  } catch (e) {
-    console.warn(`Research error: ${e instanceof Error ? e.message : e}`)
+    const errMsg = e instanceof Error ? e.message : String(e)
+    console.warn(`Research error: ${errMsg}`)
+
+    // Detect billing/quota and rate-limit errors and surface them clearly
+    if (
+      errMsg.includes('Credit balance is too low') ||
+      errMsg.includes('insufficient_quota') ||
+      errMsg.includes('rate_limit')
+    ) {
+      finalSummary = `API error: ${errMsg.slice(0, 200)}`
+      finalEvidence = 'Research phase aborted due to an API billing or rate-limit issue'
+      console.warn(
+        '::error::Anthropic API credits exhausted or rate limited; cannot run research phase'
+      )
+    }
  }

  // Auto-complete: if a test passed but done() was never called, use the passing test
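The research-phase catch block above handles credit exhaustion and rate limiting in a single branch. The sketch below separates the two cases so the CI annotation names the actual cause. `classifyApiError` and `ciAnnotation` are hypothetical helpers, not part of this commit. The matched substrings are the ones the diff checks for.

```typescript
// Hypothetical helpers (not part of this commit): classify an error message
// from the research phase using the same markers the catch block checks.
type ApiErrorKind = 'credits' | 'rate_limit' | 'other'

function classifyApiError(errMsg: string): ApiErrorKind {
  // Match case-insensitively so "Credit balance is too low" and
  // "credit balance is too low" are treated the same.
  const msg = errMsg.toLowerCase()
  if (
    msg.includes('credit balance is too low') ||
    msg.includes('insufficient_quota')
  ) {
    return 'credits'
  }
  if (msg.includes('rate_limit')) {
    return 'rate_limit'
  }
  return 'other'
}

// Build a GitHub Actions error annotation naming the cause, or null when
// the error is not a billing or rate-limit problem.
function ciAnnotation(kind: ApiErrorKind): string | null {
  switch (kind) {
    case 'credits':
      return '::error::Anthropic API credits exhausted; cannot run research phase'
    case 'rate_limit':
      return '::error::Anthropic API rate limit reached; cannot run research phase'
    case 'other':
      return null
  }
}

const kind = classifyApiError('Credit balance is too low')
console.log(kind) // prints credits
console.log(ciAnnotation(kind))
```

Case-insensitive matching is a hedge, not a documented requirement. The exact wording of the low-credit message may differ between the Agent SDK and the raw API, so matching only the capitalized form could miss it.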
--- a/scripts/qa-record.ts
+++ b/scripts/qa-record.ts
@@ -79,7 +79,7 @@ type TestAction =
  | { action: 'resizeNode'; x: number; y: number; dx: number; dy: number }
  | { action: 'middleClick'; x: number; y: number }

-interface ActionResult {
+export interface ActionResult {
  action: TestAction
  success: boolean
  error?: string
@@ -462,7 +462,7 @@ async function showSubtitle(page: Page, text: string, turn: number) {
 async function generateNarrationAudio(
  segments: NarrationSegment[],
  outputDir: string,
-  apiKey: string
+  _apiKey: string
 ): Promise<string | null> {
  if (segments.length === 0) return null

@@ -523,7 +523,6 @@ async function generateNarrationAudio(

  for (let i = 0; i < audioFiles.length; i++) {
    inputArgs.push('-i', audioFiles[i].path)
-    const delaySec = (audioFiles[i].offsetMs / 1000).toFixed(3)
    filterParts.push(
      `[${i}]adelay=${audioFiles[i].offsetMs}|${audioFiles[i].offsetMs}[a${i}]`
    )
@@ -1429,7 +1428,8 @@ interface AgenticTurnContent {
  >
 }

-async function runAgenticLoop(
+// @ts-expect-error TS6133 — legacy function kept for fallback
+async function _runAgenticLoop(
  page: Page,
  opts: Options,
  outputDir: string,
@@ -1497,19 +1497,21 @@ async function runAgenticLoop(
    preflightNote
  )

-  const anthropicKey = process.env.ANTHROPIC_API_KEY
+  const anthropicKey =
+    process.env.ANTHROPIC_API_KEY_QA || process.env.ANTHROPIC_API_KEY
  const useHybrid = Boolean(anthropicKey)

  const genAI = new GoogleGenerativeAI(opts.apiKey)
-  const geminiVisionModel = genAI.getGenerativeModel({
+  // @ts-expect-error TS6133 — kept for hybrid mode fallback
+  const _geminiVisionModel = genAI.getGenerativeModel({
    model: 'gemini-3-flash-preview'
  })

-  // Gemini-only fallback model (used when no ANTHROPIC_API_KEY)
+  // Gemini-only fallback model (used when neither ANTHROPIC_API_KEY_QA nor ANTHROPIC_API_KEY is set)
  const agenticModel = opts.model.includes('flash')
    ? opts.model
    : 'gemini-3-flash-preview'
-  const geminiOnlyModel = genAI.getGenerativeModel({
+  const _geminiOnlyModel = genAI.getGenerativeModel({
    model: agenticModel,
    systemInstruction
  })
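The same `ANTHROPIC_API_KEY_QA || ANTHROPIC_API_KEY` precedence appears inline in both `_runAgenticLoop` and `main()` in this file. The sketch below expresses that precedence as a single helper. It takes the environment as a plain record so the logic can be exercised without touching `process.env`. `resolveAnthropicKey` and `useHybridMode` are illustrative names, not part of this commit.

```typescript
// Hypothetical helper (not part of this commit): resolve the Anthropic key
// with the same precedence qa-record.ts uses inline.
type Env = Record<string, string | undefined>

function resolveAnthropicKey(env: Env): string | undefined {
  // `||` (not `??`) lets an empty ANTHROPIC_API_KEY_QA fall through to the
  // generic key; an empty result is normalized to undefined.
  return env.ANTHROPIC_API_KEY_QA || env.ANTHROPIC_API_KEY || undefined
}

// Hybrid (Claude) mode is enabled only when some key resolved.
function useHybridMode(env: Env): boolean {
  return Boolean(resolveAnthropicKey(env))
}

console.log(
  useHybridMode({ ANTHROPIC_API_KEY_QA: '', ANTHROPIC_API_KEY: 'sk-test' })
) // prints true
```

In the script, this would be called as `resolveAnthropicKey(process.env)`. The choice of `||` over `??` matters in CI. GitHub Actions expands a secret that is not configured to an empty string, and `??` would keep that empty string instead of falling back.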
@@ -1588,7 +1590,7 @@ async function runAgenticLoop(
    // 3. Call Gemini
    let actionObj: TestAction
    try {
-      const result = await model.generateContent({
+      const result = await _geminiOnlyModel.generateContent({
        contents,
        generationConfig: {
          temperature: 0.2,
@@ -1936,7 +1938,8 @@ async function main() {
        })
        // ═══ Phase 1: RESEARCH — Claude writes E2E test to reproduce ═══
        console.warn('Phase 1: Research — Claude writes E2E test')
-        const anthropicKey = process.env.ANTHROPIC_API_KEY
+        const anthropicKey =
+          process.env.ANTHROPIC_API_KEY_QA || process.env.ANTHROPIC_API_KEY
        const { runResearchPhase } = await import('./qa-agent.js')
        const issueCtx = opts.diffFile
          ? readFileSync(opts.diffFile, 'utf-8').slice(0, 6000)