fix: Opus escalation graceful fallback on credit exhaustion

When the Opus API call fails (credit exhaustion, rate limit) or throws, keep Sonnet's result instead of overwriting it with an INCONCLUSIVE API-error result. Sonnet's result is replaced only when the Opus run completes without an API error.
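The fallback rule can be sketched as a pure function. This is a minimal, hypothetical sketch. `pickEscalationResult` and `ResearchResultLike` are illustrative names, not code from this commit, and the result shape is reduced to the two fields the rule reads. The `'API error'` substring check mirrors the condition in `qa-record.ts`. A thrown exception is modeled as `opus === undefined`.

```typescript
// Hypothetical, reduced shape of a research result; the object returned by
// runResearchPhase also carries `evidence` and other fields.
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

interface ResearchResultLike {
  verdict: Verdict
  summary: string
}

// Decide which result to keep after an Opus escalation attempt.
// `opus` is undefined when runResearchPhase threw (the catch branch).
function pickEscalationResult(
  sonnet: ResearchResultLike,
  opus: ResearchResultLike | undefined
): ResearchResultLike {
  if (opus === undefined) return sonnet
  // Same test as qa-record.ts: an INCONCLUSIVE run whose summary mentions
  // "API error" is treated as an infrastructure failure, not a real attempt.
  const opusApiFailure =
    opus.verdict === 'INCONCLUSIVE' && opus.summary.includes('API error')
  return opusApiFailure ? sonnet : opus
}

const sonnetRun: ResearchResultLike = {
  verdict: 'INCONCLUSIVE',
  summary: 'Agent ran out of turns'
}
const opusRun: ResearchResultLike = {
  verdict: 'INCONCLUSIVE',
  summary: 'API error: credit balance is too low'
}
console.log(pickEscalationResult(sonnetRun, opusRun).summary)
// prints "Agent ran out of turns"
```

Any Opus result that is not an API-error failure wins, including a plain INCONCLUSIVE, so "better" here means "a real attempt" rather than "a stronger verdict".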
feat: Opus escalation for INCONCLUSIVE issues
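The escalation gate added in `qa-record.ts` reduces to a small predicate. This is a hypothetical sketch; `shouldEscalateToOpus` does not exist in the codebase. The environment is passed in as a plain record standing in for `process.env`, so the snippet runs without Node typings.

```typescript
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

// Hypothetical predicate mirroring the guard in qa-record.ts.
function shouldEscalateToOpus(
  verdict: Verdict,
  anthropicKey: string | undefined,
  env: Record<string, string | undefined>
): boolean {
  return (
    verdict === 'INCONCLUSIVE' && // NOT_REPRODUCIBLE is final; no retry
    Boolean(anthropicKey) && // a missing or empty key skips escalation
    env['QA_OPUS_ESCALATION'] !== '0' // opt-out; any other value keeps it on
  )
}

console.log(shouldEscalateToOpus('INCONCLUSIVE', 'sk-ant-example', {})) // prints true
console.log(
  shouldEscalateToOpus('INCONCLUSIVE', 'sk-ant-example', { QA_OPUS_ESCALATION: '0' })
) // prints false
```

When the predicate holds, `qa-record.ts` calls `runResearchPhase` a second time with `model: 'claude-opus-4-6'` and `maxTurns: 30`, reusing the same page, issue context, and QA guide.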
2026-04-19 22:09:37 +00:00 · 2026-04-14 15:37:07 +00:00 · 2026-04-14 13:14:33 +00:00 · 2026-04-14 13:12:49 +00:00 · 2026-04-13 19:41:54 +00:00 · 2026-04-13 18:42:20 +00:00
6 changed files with 180 additions and 286 deletions
--- a/.claude/skills/comfy-qa/SKILL.md
+++ b/.claude/skills/comfy-qa/SKILL.md
@@ -1,283 +1,133 @@
 ---
 name: comfy-qa
-description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
+description: 'Comprehensive QA of ComfyUI frontend. Reproduces bugs via E2E tests, records narrated demo videos, deploys reports. Works in CI and locally via `pnpm qa`.'
 ---

 # ComfyUI Frontend QA Skill

-Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.
+Automated quality assurance pipeline that reproduces reported bugs using Playwright E2E tests, records narrated demo videos with demowright, and deploys reports to Cloudflare Pages.

-## Architecture Overview
+## Quick Start

-The QA pipeline uses a **three-phase approach**:
+```bash
+# Reproduce an issue
+pnpm qa 10253

-1. **RESEARCH** — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, no hallucination)
-2. **REPRODUCE** — Deterministic replay of the research test with video recording
-3. **REPORT** — Deploy results to Cloudflare Pages with badge, video, and verdict
+# Test a PR
+pnpm qa 10270

-### Key Design Decision
+# Test PR base (reproduce bug)
+pnpm qa 10270 -t base

-Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. This was abandoned after discovering **AI reviewers hallucinate** — Gemini reported "REPRODUCED" when videos showed idle screens. The current approach uses **Playwright assertions** as the source of truth: if the test passes, the bug is proven.
+# Test both base + head
+pnpm qa 10270 -t both

-## Prerequisites
+# Test local uncommitted changes
+pnpm qa --uncommitted

-- Node.js 22+
-- `pnpm` package manager
-- `gh` CLI (authenticated)
-- Playwright browsers: `npx playwright install chromium`
-- Environment variables:
-  - `GEMINI_API_KEY` — for PR analysis and video review
-  - `ANTHROPIC_API_KEY` — for Claude Agent SDK (research phase)
-  - `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — for report deployment
+# Help
+pnpm qa -h
+```
+
+Auto-loads `.env.local` / `.env` for `GEMINI_API_KEY` and `ANTHROPIC_API_KEY`.
+Auto-detects whether the number is an issue or a PR via the GitHub API, and auto-starts ComfyUI if it is not running.
+
+## Architecture
+
+Three-phase pipeline:
+
+1. **RESEARCH** — Claude Sonnet 4.6 (Agent SDK) writes Playwright E2E tests to reproduce bugs; an INCONCLUSIVE result is retried once with Claude Opus 4.6 (disable with `QA_OPUS_ESCALATION=0`)
+2. **RECORD** — Re-runs the test with demowright to produce a narrated demo video (title cards, TTS, subtitles)
+3. **REPORT** — Gemini reviews the video; the report is deployed to Cloudflare Pages with a badge and verdict
+
+Key principle: **Playwright assertions are the source of truth**, not AI judgment. If the test passes, the bug is proven.

 ## Pipeline Scripts

-| Script                            | Role                                                    | Model                         |
-| --------------------------------- | ------------------------------------------------------- | ----------------------------- |
-| `scripts/qa-analyze-pr.ts`        | Deep PR/issue analysis → QA guide                       | gemini-3.1-pro-preview        |
-| `scripts/qa-agent.ts`             | Research phase: Claude writes E2E tests                 | claude-sonnet-4-6 (Agent SDK) |
-| `scripts/qa-record.ts`            | Before/after video recording with Gemini-driven actions | gemini-3.1-pro-preview        |
-| `scripts/qa-reproduce.ts`         | Deterministic replay with narration                     | gemini-3-flash-preview        |
-| `scripts/qa-video-review.ts`      | Video comparison review                                 | gemini-3-flash-preview        |
-| `scripts/qa-generate-test.ts`     | Regression test generation from QA report               | gemini-3-flash-preview        |
-| `scripts/qa-deploy-pages.sh`      | Deploy to Cloudflare Pages + badge                      | —                             |
-| `scripts/qa-batch.sh`             | Batch-trigger QA for multiple issues                    | —                             |
-| `scripts/qa-report-template.html` | Report site (light/dark, seekbar, copy badge)           | —                             |
+All scripts live in `.claude/skills/comfy-qa/scripts/`:
+
+| Script                    | Role                                                  |
+| ------------------------- | ----------------------------------------------------- |
+| `qa.ts`                   | CLI entry point (`pnpm qa`)                           |
+| `qa-agent.ts`             | Research phase: Claude writes E2E tests via Agent SDK |
+| `qa-record.ts`            | Orchestrator: 3-phase pipeline                        |
+| `qa-deploy-pages.sh`      | Cloudflare Pages deploy + badge generation            |
+| `qa-report-template.html` | Report site template                                  |
+| `qa-video-review.ts`      | Gemini video review                                   |
+| `qa-analyze-pr.ts`        | Deep PR/issue analysis → QA guide                     |

 ## Triggering QA

-### Via GitHub Labels
-
-- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after comparison)
-- **`qa-full`** — Full QA (3-OS matrix, after-only)
-- **`qa-issue`** — Reproduce a bug from an issue
-
-### Via Batch Script
+### Via CLI (`pnpm qa`)

 ```bash
-# Trigger QA for specific issue numbers
-./scripts/qa-batch.sh 10394 10238 9996
-
-# From a triage file (top 5 Tier 1 issues)
-./scripts/qa-batch.sh --from tmp/issues.md --top 5
-
-# Preview without pushing
-./scripts/qa-batch.sh --dry-run 10394
-
-# Clean up old trigger branches
-./scripts/qa-batch.sh --cleanup
+pnpm qa 10253                  # issue (auto-detect)
+pnpm qa 10270                  # PR head
+pnpm qa 10270 -t base          # PR base
+pnpm qa 10270 -t both          # both
+pnpm qa --uncommitted          # local changes
 ```

-### Via Workflow Dispatch
+### Via GitHub Labels

-Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).
+- **`qa-issue`** — Reproduce a bug from an issue
+- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after)
+- **`qa-full`** — Full QA (3-OS matrix)
+
+### Via Push to Trigger Branches
+
+```bash
+git push origin sno-skills:sno-qa-10253 --force
+```
+
+## Research Phase (`qa-agent.ts`)
+
+Claude receives the issue/PR context, an accessibility-tree snapshot, and the ComfyPage fixture API docs.
+
+Tools:
+
+- **`inspect(selector?)`** — Read a11y tree
+- **`readFixture(path)`** — Read fixture source code
+- **`readTest(path)`** — Read existing tests for patterns
+- **`writeTest(code)`** — Write a Playwright .spec.ts
+- **`runTest()`** — Execute and get pass/fail + errors
+- **`done(verdict, summary, evidence, testCode, videoScript?)`** — Finish
+
+When `verdict=REPRODUCED`, Claude also provides a `videoScript` — a separate test file using demowright's `createVideoScript()` for professional narrated demo video with title cards, TTS segments, and outro.
+
+## Video Recording (demowright)
+
+Phase 2 uses the video script to record with:
+
+- `showTitleCard()` / `hideTitleCard()` — covers setup/loading screen
+- `createVideoScript().title().segment().outro()` — structured narration
+- `pace()` — narration-then-action timing
+- TTS audio + subtitles + cursor overlay + key badges
+
+## Report Site
+
+Deployed to `https://sno-qa-{number}.comfy-qa.pages.dev/`
+
+Features:
+
+- Video player (1x default, adjustable speed)
+- Research log (verdict, tool calls, timing)
+- E2E test code + video script code
+- Verdict banner for NOT_REPRODUCIBLE / INCONCLUSIVE results, showing the summary and evidence
+- Copy badge button (markdown)
+
+## Prerequisites
+
+- `GEMINI_API_KEY` — video review, TTS
+- `ANTHROPIC_API_KEY` — Claude Agent SDK (research phase)
+- `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — report deployment (CI only)
+- ComfyUI server running (auto-detected, or auto-started)

 ## CI Workflow (`.github/workflows/pr-qa.yaml`)

 ```
 resolve-matrix → analyze-pr ──┐
-                               ├→ qa-before (main branch, worktree build)
+                               ├→ qa-before (main branch)
                               ├→ qa-after  (PR branch)
                               └→ report (video review, deploy, comment)
 ```
-
-Before/after jobs run **in parallel** on separate runners for clean isolation.
-
-### Issue Reproduce Mode
-
-For issues (not PRs), the pipeline:
-
-1. Fetches the issue body and comments
-2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
-3. Runs the research phase (Claude writes E2E test to reproduce)
-4. Records video of the test execution
-5. Posts results as a comment on the issue
-
-## Running Locally
-
-### Step 1: Environment Setup
-
-```bash
-# Ensure ComfyUI server is running
-# Default: http://127.0.0.1:8188
-
-# Install Playwright browsers
-npx playwright install chromium
-```
-
-### Step 2: Analyze the Issue/PR
-
-```bash
-# For a PR
-pnpm exec tsx scripts/qa-analyze-pr.ts \
-  --pr-number 10394 \
-  --repo Comfy-Org/ComfyUI_frontend \
-  --output-dir qa-guides
-
-# For an issue
-pnpm exec tsx scripts/qa-analyze-pr.ts \
-  --pr-number 10394 \
-  --repo Comfy-Org/ComfyUI_frontend \
-  --output-dir qa-guides \
-  --type issue
-```
-
-### Step 3: Record Before/After
-
-```bash
-# Before (main branch)
-pnpm exec tsx scripts/qa-record.ts \
-  --mode before \
-  --diff /tmp/pr-diff.txt \
-  --output-dir /tmp/qa-before \
-  --qa-guide qa-guides/qa-guide-1.json
-
-# After (PR branch)
-pnpm exec tsx scripts/qa-record.ts \
-  --mode after \
-  --diff /tmp/pr-diff.txt \
-  --output-dir /tmp/qa-after \
-  --qa-guide qa-guides/qa-guide-1.json
-```
-
-### Step 4: Review Videos
-
-```bash
-pnpm exec tsx scripts/qa-video-review.ts \
-  --artifacts-dir /tmp/qa-artifacts \
-  --video-file qa-session.mp4 \
-  --before-video qa-before-session.mp4 \
-  --output-dir /tmp/video-reviews \
-  --pr-context /tmp/pr-context.txt
-```
-
-## Research Phase Details (`qa-agent.ts`)
-
-Claude receives:
-
-- The issue description and comments
-- A QA guide from `qa-analyze-pr.ts`
-- An accessibility tree snapshot of the current UI
-
-Claude's tools:
-
-- **`inspect(selector?)`** — Read a11y tree to discover element selectors
-- **`writeTest(code)`** — Write a Playwright `.spec.ts` file
-- **`runTest()`** — Execute the test and get pass/fail + errors
-- **`done(verdict, summary, evidence, testCode)`** — Finish with verdict
-
-The test uses the project's Playwright fixtures (`comfyPageFixture`), giving access to `comfyPage.page`, `comfyPage.menu`, `comfyPage.settings`, etc.
-
-### Verdict Logic
-
-- **REPRODUCED** — Test passes (asserting the bug exists) → bug is proven
-- **NOT_REPRODUCIBLE** — Claude exhausted attempts, test cannot pass
-- **INCONCLUSIVE** — Agent timed out or encountered infrastructure issues
-
-Auto-completion: if a test passed but `done()` was never called, the pipeline auto-completes with REPRODUCED.
-
-## Manual QA (Fallback)
-
-When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use **playwright-cli** for manual browser interaction:
-
-```bash
-# Install
-npm install -g @playwright/cli@latest
-
-# Open browser and navigate
-playwright-cli open http://127.0.0.1:8188
-
-# Get element references
-playwright-cli snapshot
-
-# Interact
-playwright-cli click e1
-playwright-cli fill e2 "test text"
-playwright-cli press Escape
-playwright-cli screenshot --filename=f.png
-```
-
-Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation to refresh refs.
-
-## Manual QA Test Plan
-
-When performing manual QA (either via playwright-cli or the automated pipeline), systematically test each area below.
-
-### Application Load & Routes
-
-| Test              | Steps                                                        |
-| ----------------- | ------------------------------------------------------------ |
-| Root route loads  | Navigate to `/` — GraphView should render with canvas        |
-| User select route | Navigate to `/user-select` — user selection UI should appear |
-| 404 handling      | Navigate to `/nonexistent` — should handle gracefully        |
-
-### Canvas & Graph View
-
-| Test                      | Steps                                                          |
-| ------------------------- | -------------------------------------------------------------- |
-| Canvas renders            | The LiteGraph canvas is visible and interactive                |
-| Pan canvas                | Click and drag on empty canvas area                            |
-| Zoom in/out               | Use scroll wheel or Alt+=/Alt+-                                |
-| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
-| Delete node               | Select a node, press Delete key                                |
-| Connect nodes             | Drag from output slot to input slot                            |
-| Copy/Paste                | Select nodes, Ctrl+C then Ctrl+V                               |
-| Undo/Redo                 | Make changes, Ctrl+Z to undo, Ctrl+Y to redo                   |
-| Context menus             | Right-click node vs empty canvas — different menus             |
-
-### Sidebar Tabs
-
-| Test              | Steps                                 |
-| ----------------- | ------------------------------------- |
-| Workflows tab     | Press W — workflows sidebar opens     |
-| Node Library tab  | Press N — node library opens          |
-| Model Library tab | Press M — model library opens         |
-| Tab toggle        | Press same key again — sidebar closes |
-| Search in sidebar | Type in search box — results filter   |
-
-### Settings Dialog
-
-| Test             | Steps                                                |
-| ---------------- | ---------------------------------------------------- |
-| Open settings    | Press Ctrl+, or click settings button                |
-| Change a setting | Toggle a boolean setting — it persists after closing |
-| Search settings  | Type in settings search box — results filter         |
-| Close settings   | Press Escape or click close button                   |
-
-### Execution & Queue
-
-| Test           | Steps                                                 |
-| -------------- | ----------------------------------------------------- |
-| Queue prompt   | Load default workflow, click Queue — execution starts |
-| Queue progress | Progress indicator shows during execution             |
-| Interrupt      | Press Ctrl+Alt+Enter during execution — interrupts    |
-
-## Report Site
-
-Deployed to Cloudflare Pages at `https://comfy-qa.pages.dev/<branch>/`.
-
-Features:
-
-- Light/dark theme
-- Seekable video player with preload
-- Copy badge button (markdown)
-- Date-stamped badges (e.g., `QA0327`)
-- Vertical box badge for issues and PRs
-
-## Known Issues & Troubleshooting
-
-See `docs/qa/TROUBLESHOOTING.md` for common failures:
-
-- `set -euo pipefail` + grep with no match → append `|| true`
-- `__name is not defined` in `page.evaluate` → use `addScriptTag`
-- Cursor not visible in videos → monkey-patch `page.mouse` methods
-- Agent not calling `done()` → auto-complete from passing test
-
-## Backlog
-
-See `docs/qa/backlog.md` for planned improvements:
-
-- **Type B comparison**: Different commits for regression detection
-- **Type C comparison**: Cross-browser testing
-- **Pre-seed assets**: Upload test images before recording
-- **Lazy a11y tree**: Reduce token usage with `inspect(selector)` vs full dump
--- a/.claude/skills/comfy-qa/scripts/qa-agent.ts
+++ b/.claude/skills/comfy-qa/scripts/qa-agent.ts
@@ -35,6 +35,7 @@ interface ResearchOptions {
  anthropicApiKey?: string
  maxTurns?: number
  timeBudgetMs?: number
+  model?: string
 }

 export type ReproMethod = 'e2e_test' | 'video' | 'both' | 'none'
@@ -402,9 +403,13 @@ export async function runResearchPhase(

 ## Workflow
 1. Read the issue description carefully
-2. Use inspect() to understand the current UI state and discover element selectors
-3. If unsure about the fixture API, use readFixture() to read the relevant helper source code
-4. If unsure about test patterns, use readTest() to read an existing test for reference
+2. FIRST: Use readTest() to read 1-2 existing tests similar to the bug you're reproducing:
+   - For menu/workflow bugs: readTest("workflow.spec.ts") or readTest("topbarMenu.spec.ts")
+   - For node/canvas bugs: readTest("nodeInteraction.spec.ts") or readTest("copyPaste.spec.ts")
+   - For settings bugs: readTest("settingDialogSearch.spec.ts")
+   - For subgraph bugs: readTest("subgraph.spec.ts")
+3. Use inspect() to understand the current UI state and discover element selectors
+4. If unsure about the fixture API, use readFixture("ComfyPage.ts") or the relevant helper source
 5. Write a Playwright test that:
   - Performs the exact reproduction steps from the issue
   - Asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
@@ -423,6 +428,8 @@ export async function runResearchPhase(
 - Use \`comfyPage.nextFrame()\` after interactions that trigger UI updates
 - NEVER use \`page.waitForTimeout()\` — use Locator actions and retrying assertions instead
 - ALWAYS call done() when finished, even if the test passed — do not keep iterating after a passing test
+- CRITICAL: If your test FAILS 3 times in a row with the same or similar error, do NOT keep retrying that approach: switch to a completely different strategy, or call done(NOT_REPRODUCIBLE). Spending 20+ tool calls on failing tests is wasteful.
+- Budget your turns: spend at most 3 turns on inspect/readFixture, 2 turns writing the first test, then at most 3 fix attempts. If still failing after ~10 tool calls, call done().
 - Use \`expect.poll()\` for async assertions: \`await expect.poll(() => comfyPage.nodeOps.getGraphNodesCount()).toBe(8)\`
 - CRITICAL: Your assertions must be SPECIFIC TO THE BUG. A test that asserts \`expect(count).toBeGreaterThan(0)\` proves nothing — it would pass even without the bug. Instead assert the exact broken state, e.g. \`expect(clonedWidgets).toHaveLength(0)\` (missing widgets) or \`expect(zIndex).toBeLessThan(parentZIndex)\` (wrong z-order). If a test passes trivially, it's a false positive.
 - NEVER write "debug", "discovery", or "inspect node types" tests. These waste turns and produce false REPRODUCED verdicts. If you need to discover node type names, use inspect() or readFixture() — not a passing test.
@@ -525,14 +532,19 @@ The videoScript is a complete, standalone Playwright test file for Phase 2 demo

 \`\`\`typescript
 import { comfyPageFixture as test } from '../fixtures/ComfyPage'
-import { createVideoScript } from 'demowright/video-script'
+import { createVideoScript, showTitleCard, hideTitleCard } from 'demowright/video-script'

 test('Demo: Bug Title', async ({ comfyPage }) => {
-  // IMPORTANT: ALL setup code MUST go here BEFORE createVideoScript()
-  // so the title card is the FIRST thing viewers see in the video
+  // Show title card IMMEDIATELY — covers the screen while setup runs behind it
+  await showTitleCard(comfyPage.page, 'Bug Title Here', { subtitle: 'Issue #NNNN' })
+
+  // Setup runs while title card is visible
  await comfyPage.settings.setSetting('Comfy.UseNewMenu', 'Top')
  await comfyPage.workflow.setupWorkflowsDirectory({})

+  // Remove early title card before script starts (script will show its own)
+  await hideTitleCard(comfyPage.page)
+
  const script = createVideoScript()
    .title('Bug Title Here', { subtitle: 'Issue #NNNN', durationMs: 4000 })
    .segment('Step 1: description of what we do', async (pace) => {
@@ -570,7 +582,7 @@ to happen before it happens. Pattern:

 IMPORTANT RULES for videoScript:
 1. You MUST provide videoScript when verdict is REPRODUCED — every reproduced bug needs a narrated demo
-2. ALL setup code (setSetting, setupWorkflowsDirectory) goes BEFORE createVideoScript() — title card must be first thing in video
+2. Call showTitleCard() BEFORE setup, run setup behind it, then call hideTitleCard() before createVideoScript() (see the example above)
 3. Call \`await pace()\` FIRST in each segment callback, BEFORE actions
 4. Add \`waitForTimeout(2000)\` after each action so viewers can see the result
 5. Final evidence segment: hold for 5+ seconds
@@ -591,7 +603,7 @@ ${issueContext}`
      prompt:
        'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, readFixture() or readTest() if you need to understand the fixture API or see existing test patterns, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
      options: {
-        model: 'claude-sonnet-4-6',
+        model: opts.model ?? 'claude-sonnet-4-6',
        systemPrompt,
        ...(anthropicApiKey ? { apiKey: anthropicApiKey } : {}),
        maxTurns,
--- a/.claude/skills/comfy-qa/scripts/qa-deploy-pages.sh
+++ b/.claude/skills/comfy-qa/scripts/qa-deploy-pages.sh
@@ -77,15 +77,15 @@ for os in Linux macOS Windows; do
  fi

  if [ "$HAS_BEFORE" = "1" ]; then
-    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls autoplay preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls autoplay preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
+    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
  elif [ -f "$DEPLOY_DIR/qa-${os}.mp4" ]; then
-    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls autoplay preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
+    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
  else
    PASS_VIDEOS=""
    for pass_vid in "$DEPLOY_DIR/qa-${os}-pass"[0-9].mp4; do
      [ -f "$pass_vid" ] || continue
      PASS_NUM=$(basename "$pass_vid" | sed "s/qa-${os}-pass\([0-9]\).mp4/\1/")
-      PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls autoplay preload=auto><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
+      PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls preload=auto><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
    done
    CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison>${PASS_VIDEOS}</div>${REPORT_HTML}</div>"
  fi
--- a/.claude/skills/comfy-qa/scripts/qa-record.ts
+++ b/.claude/skills/comfy-qa/scripts/qa-record.ts
@@ -1952,7 +1952,7 @@ async function main() {
            // QA guide not available
          }
        }
-        const research = await runResearchPhase({
+        let research = await runResearchPhase({
          page,
          issueContext: issueCtx,
          qaGuide: qaGuideText,
@@ -1963,6 +1963,44 @@ async function main() {
        console.warn(
          `Research complete: ${research.verdict} — ${research.summary.slice(0, 100)}`
        )
+
+        // Opus escalation: if Sonnet's research is INCONCLUSIVE, retry once with Opus (QA_OPUS_ESCALATION=0 disables)
+        if (
+          research.verdict === 'INCONCLUSIVE' &&
+          anthropicKey &&
+          process.env.QA_OPUS_ESCALATION !== '0'
+        ) {
+          console.warn('Escalating to claude-opus-4-6 for complex issue...')
+          try {
+            const opusResult = await runResearchPhase({
+              page,
+              issueContext: issueCtx,
+              qaGuide: qaGuideText,
+              outputDir: opts.outputDir,
+              serverUrl: opts.serverUrl,
+              anthropicApiKey: anthropicKey,
+              model: 'claude-opus-4-6',
+              maxTurns: 30
+            })
+            console.warn(
+              `Opus result: ${opusResult.verdict} — ${opusResult.summary.slice(0, 100)}`
+            )
+            // Use the Opus result unless it is an INCONCLUSIVE API-error failure (credit balance, rate limit)
+            if (
+              opusResult.verdict !== 'INCONCLUSIVE' ||
+              !opusResult.summary.includes('API error')
+            ) {
+              research = opusResult
+            } else {
+              console.warn('Opus failed (API error) — keeping Sonnet result')
+            }
+          } catch (opusErr) {
+            console.warn(
+              `Opus escalation failed: ${opusErr instanceof Error ? opusErr.message : opusErr}`
+            )
+            // Keep Sonnet's result
+          }
+        }
        console.warn(`Evidence: ${research.evidence.slice(0, 200)}`)

        // ═══ Phase 2: Record demo video with demowright ═══
@@ -2087,22 +2125,10 @@ export default withDemowright(baseConfig, {
          }

          if (demowrightMp4) {
-            // Trim first 7s (ComfyUI loading screen) from the video
-            try {
-              execSync(
-                `ffmpeg -y -i "${demowrightMp4}" -ss 7 -c copy -avoid_negative_ts 1 "${opts.outputDir}/qa-session.mp4" 2>/dev/null`
-              )
-              console.warn(
-                `Phase 2: Trimmed video → ${opts.outputDir}/qa-session.mp4`
-              )
-            } catch {
-              execSync(
-                `cp "${demowrightMp4}" "${opts.outputDir}/qa-session.mp4"`
-              )
-              console.warn(
-                `Phase 2: Narrated video → ${opts.outputDir}/qa-session.mp4`
-              )
-            }
+            execSync(`cp "${demowrightMp4}" "${opts.outputDir}/qa-session.mp4"`)
+            console.warn(
+              `Phase 2: Narrated video → ${opts.outputDir}/qa-session.mp4`
+            )
          }

          // Also copy raw webm as fallback
--- a/.claude/skills/comfy-qa/scripts/qa-report-template.html
+++ b/.claude/skills/comfy-qa/scripts/qa-report-template.html
@@ -97,7 +97,13 @@ h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em
    let html='';
    if(logRes.status==='fulfilled'&&logRes.value.ok){
      const log=await logRes.value.json();
-      html+=`<details style="margin-bottom:1.5rem"><summary style="cursor:pointer;font-weight:600;font-size:1rem;padding:.75rem 1rem;background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg)">Research Log &mdash; ${log.verdict||'?'} (${log.toolCalls||'?'} tool calls, ${((log.elapsedMs||0)/1000).toFixed(1)}s)</summary><div style="padding:1rem;background:var(--surface);border:1px solid var(--border);border-top:0;border-radius:0 0 var(--r-lg) var(--r-lg);overflow:auto;max-height:600px"><pre style="font-family:var(--font-mono);font-size:.8rem;line-height:1.6;white-space:pre-wrap">${JSON.stringify(log,null,2)}</pre></div></details>`;
+      // Show verdict banner for non-reproduced results
+      if(log.verdict&&log.verdict!=='REPRODUCED'){
+        const colors={NOT_REPRODUCIBLE:{bg:'oklch(25% 0.08 25)',border:'oklch(40% 0.15 25)',icon:'✗'},INCONCLUSIVE:{bg:'oklch(25% 0.06 80)',border:'oklch(40% 0.12 80)',icon:'⚠'}};
+        const c=colors[log.verdict]||colors.INCONCLUSIVE;
+        html+=`<div style="margin-bottom:1.5rem;padding:1.25rem;background:${c.bg};border:1px solid ${c.border};border-radius:var(--r-lg)"><div style="font-size:1.25rem;font-weight:700;margin-bottom:.5rem">${c.icon} ${log.verdict.replace(/_/g,' ')}</div><div style="font-size:.9rem;line-height:1.6;opacity:.9">${(log.summary||'No details available.').replace(/</g,'&lt;')}</div>${log.evidence?`<div style="margin-top:.75rem;padding:.75rem;background:oklch(0% 0 0/.2);border-radius:var(--r);font-family:var(--font-mono);font-size:.8rem;white-space:pre-wrap;max-height:200px;overflow:auto">${log.evidence.replace(/</g,'&lt;')}</div>`:''}</div>`;
+      }
+      html+=`<details style="margin-bottom:1.5rem"><summary style="cursor:pointer;font-weight:600;font-size:1rem;padding:.75rem 1rem;background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg)">Research Log &mdash; ${log.verdict||'?'} (${(log.log||[]).length||'?'} tool calls, ${((log.elapsedMs||0)/1000).toFixed(1)}s)</summary><div style="padding:1rem;background:var(--surface);border:1px solid var(--border);border-top:0;border-radius:0 0 var(--r-lg) var(--r-lg);overflow:auto;max-height:600px"><pre style="font-family:var(--font-mono);font-size:.8rem;line-height:1.6;white-space:pre-wrap">${JSON.stringify(log,null,2)}</pre></div></details>`;
    }
    if(testRes.status==='fulfilled'&&testRes.value.ok){
      const code=await testRes.value.text();
@@ -115,7 +121,7 @@ function copyBadge(){const u=location.href.replace(/\/[^/]*$/,'/');const b=u+'ba
 document.querySelectorAll('[data-md]').forEach(el=>{const t=el.textContent;el.removeAttribute('data-md');el.innerHTML=marked.parse(t)});
 const FPS=30,FT=1/FPS,SPEEDS=[0.1,0.25,0.5,1,1.5,2];
 document.querySelectorAll('.video-wrap video').forEach(v=>{
-  v.playbackRate=0.5;
+  v.playbackRate=1;
  const c=document.createElement('div');c.className='vctrl';
  const btn=(label,fn)=>{const b=document.createElement('button');b.textContent=label;b.onclick=fn;c.appendChild(b);return b};
  const sep=()=>{const s=document.createElement('div');s.className='vsep';c.appendChild(s)};
@@ -127,7 +133,7 @@ document.querySelectorAll('.video-wrap video').forEach(v=>{
  btn('\u25B6\u25B6',()=>{v.pause();v.currentTime+=FT});
  btn('\u25B6\u25B6\u25B6',()=>{v.currentTime+=FT*10});
  sep();
-  const spdBtns=SPEEDS.map(s=>{const b=btn(s+'x',()=>{v.playbackRate=s;spdBtns.forEach(x=>x.classList.remove('active'));b.classList.add('active')});if(s===0.5)b.classList.add('active');return b});
+  const spdBtns=SPEEDS.map(s=>{const b=btn(s+'x',()=>{v.playbackRate=s;spdBtns.forEach(x=>x.classList.remove('active'));b.classList.add('active')});if(s===1)b.classList.add('active');return b});
  sep();c.appendChild(time);
  const hint=document.createElement('span');hint.className='vhint';hint.textContent='\u2190\u2192 frame \u2022 space play';c.appendChild(hint);
  // Custom seekbar — works even without server range request support
--- a/.github/workflows/pr-qa.yaml
+++ b/.github/workflows/pr-qa.yaml
@@ -26,7 +26,7 @@ on:
        default: focused

 concurrency:
-  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.event.issue.number || github.ref }}
+  group: qa-${{ github.event.pull_request.number || github.event.issue.number || github.ref_name }}
  cancel-in-progress: true

 jobs:
@@ -53,7 +53,7 @@ jobs:

          # Only run on label events if it's one of our labels
          if [ "$EVENT_ACTION" = "labeled" ] && \
-             [ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ]; then
+             [ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ] && [ "$LABEL" != "Potential Bug" ] && [ "$LABEL" != "verified bug" ]; then
             echo "skip=true" >> "$GITHUB_OUTPUT"
          fi

@@ -206,7 +206,7 @@ jobs:
      - name: Install QA dependencies
        run: |
          pnpm add -D @google/generative-ai@^0.24.1 @anthropic-ai/claude-agent-sdk@^0.2.85
-          git clone --depth 1 https://github.com/snomiao/demowright.git /tmp/demowright
+          git clone --depth 1 --branch feat/show-title-card-api https://github.com/snomiao/demowright.git /tmp/demowright
          cd /tmp/demowright && npm install && npm install typescript && npm run build
          sed -i 's|"./src/setup.ts"|"./dist/setup.mjs"|' register.cjs
          node --input-type=module -e "import{readFileSync,writeFileSync}from'fs';const p=JSON.parse(readFileSync('package.json','utf8'));p.exports['./video-script']={import:'./dist/video-script.mjs',types:'./dist/video-script.d.mts'};p.exports['./setup']={import:'./dist/setup.mjs',types:'./dist/setup.d.mts'};writeFileSync('package.json',JSON.stringify(p,null,2))"
@@ -392,7 +392,7 @@ jobs:
      - name: Install QA dependencies
        run: |
          pnpm add -D @google/generative-ai@^0.24.1
-          git clone --depth 1 https://github.com/snomiao/demowright.git /tmp/demowright
+          git clone --depth 1 --branch feat/show-title-card-api https://github.com/snomiao/demowright.git /tmp/demowright
          cd /tmp/demowright && npm install && npm install typescript && npm run build
          sed -i 's|"./src/setup.ts"|"./dist/setup.mjs"|' register.cjs
          node --input-type=module -e "import{readFileSync,writeFileSync}from'fs';const p=JSON.parse(readFileSync('package.json','utf8'));p.exports['./video-script']={import:'./dist/video-script.mjs',types:'./dist/video-script.d.mts'};p.exports['./setup']={import:'./dist/setup.mjs',types:'./dist/setup.d.mts'};writeFileSync('package.json',JSON.stringify(p,null,2))"
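The two `pr-qa.yaml` hunks above change when a QA run starts and how runs are grouped. The label check now also accepts `Potential Bug` and `verified bug`. The concurrency key changes from `${{ github.workflow }}-…` to `qa-<number or ref_name>`. The TypeScript sketch below mirrors those two rules. The function names are hypothetical and not part of the repository. The sketch relies on GitHub expressions' `||` returning the first truthy operand, the same as JavaScript's `||`.

```typescript
// Hypothetical mirror of two rules from .github/workflows/pr-qa.yaml.

// Labels accepted on a "labeled" event, copied from the shell check.
const QA_LABELS: ReadonlySet<string> = new Set([
  'qa-changes',
  'qa-full',
  'qa-issue',
  'Potential Bug',
  'verified bug'
])

// Mirrors: if [ "$EVENT_ACTION" = "labeled" ] && <label not in list>; then skip=true
// Only this particular check is modelled; other skip conditions may exist.
function shouldSkip(eventAction: string, label: string): boolean {
  return eventAction === 'labeled' && !QA_LABELS.has(label)
}

// Mirrors: qa-${{ pull_request.number || issue.number || ref_name }}
// The first truthy operand wins, so a PR number takes priority over an
// issue number, and the branch name is the fallback.
function concurrencyGroup(
  prNumber: number | undefined,
  issueNumber: number | undefined,
  refName: string
): string {
  return `qa-${prNumber || issueNumber || refName}`
}
```

With `cancel-in-progress: true`, a new run cancels any in-flight run that computes the same key. Each `sno-qa-*` branch therefore gets its own group when no PR or issue number is present.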
Author	SHA1	Message	Date
snomiao	cdea8bf2c9	fix: Opus escalation graceful fallback on credit exhaustion When Opus API call fails (credit balance, rate limit), keep Sonnet's result instead of overwriting with INCONCLUSIVE API error. Only use Opus result if it's actually better than Sonnet's attempt.	2026-04-14 15:37:07 +00:00
snomiao	a2da58eb0f	feat: Opus escalation for INCONCLUSIVE issues Sonnet tries first. If INCONCLUSIVE, automatically retries with claude-opus-4-6 (30 turns). Disable with QA_OPUS_ESCALATION=0. Also: model param added to ResearchOptions for flexibility.	2026-04-14 13:14:33 +00:00
snomiao	3154865ce2	feat: Phase 1 improvements — concurrency, auto-trigger, better prompts - B1: Fix concurrency group to use ref_name (parallel sno-qa-* branches) - D1: Auto-trigger QA on 'Potential Bug' and 'verified bug' labels - A4: Prompt agent to read existing tests first before writing - Turn budget enforcement from previous commit	2026-04-14 13:12:49 +00:00
snomiao	ff6034e2ee	fix: reduce INCONCLUSIVE rate — enforce turn budget and fail-fast - 3 consecutive test failures → call done(NOT_REPRODUCIBLE) - Turn budget: ~3 inspect, 2 write, 3 fix = ~10 tool calls max - Prevents 20+ tool call retry loops that waste CI time	2026-04-13 19:41:54 +00:00
snomiao	529ac3cea4	trigger: re-run cancelled batch 2	2026-04-13 18:42:20 +00:00
snomiao	f95eebf3db	trigger: re-run cancelled QA batches	2026-04-13 17:49:03 +00:00
snomiao	1dd66315ed	docs: update SKILL.md with current CLI and architecture	2026-04-13 16:21:02 +00:00
snomiao	83de5a222e	feat: use demowright showTitleCard API for early title overlay - Import showTitleCard/hideTitleCard from demowright/video-script - Replace page.evaluate() hack with official demowright API - CI clones demowright feat/show-title-card-api branch - demowright PR: https://github.com/snomiao/demowright/pull/11	2026-04-13 09:24:44 +00:00
snomiao	2faadaeab0	fix: early title card covers setup, remove unstable ffmpeg trim Show title card via page.evaluate() IMMEDIATELY before setup code runs. Setup (setSetting, setupWorkflowsDirectory) executes behind the card. Card is removed before createVideoScript() renders its own title. This ensures the title card is visible from the first frame of the video.	2026-04-13 09:18:43 +00:00
snomiao	cb921ada71	fix: remove autoplay so browser plays video with sound	2026-04-13 07:21:29 +00:00
snomiao	884270c46f	feat: show verdict banner with failure reason for non-reproduced bugs NOT_REPRODUCIBLE and INCONCLUSIVE verdicts now display a prominent banner with the agent's summary and evidence explaining why the bug could not be reproduced. Default video playback speed changed to 1x.	2026-04-13 06:27:05 +00:00
snomiao	51e48b55b3	fix: default video playback speed 1x instead of 0.5x	2026-04-13 06:19:56 +00:00
snomiao	07f6611cc8	trigger: re-run QA for #10766	2026-04-12 13:51:17 +00:00
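Commits a2da58eb0f and cdea8bf2c9 describe an escalation rule with four parts:

- Sonnet runs first.
- Opus (`claude-opus-4-6`) is tried only when Sonnet's verdict is INCONCLUSIVE.
- `QA_OPUS_ESCALATION=0` disables escalation.
- If the Opus call fails, for example on credit exhaustion or a rate limit, Sonnet's result is kept, and Opus's result is used only when it is better.

The TypeScript sketch below shows that rule under a few assumptions. The commits do not define "better", so the sketch treats any definitive verdict (REPRODUCED or NOT_REPRODUCIBLE) as better than INCONCLUSIVE. `Researcher`, `researchWithEscalation`, and `SONNET_MODEL` are hypothetical; the sketch does not use the pipeline's real agent call or Sonnet model id.

```typescript
// Hypothetical sketch of Opus escalation with graceful fallback.

type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

interface ResearchResult {
  verdict: Verdict
  summary: string
}

// Stand-in for the real research agent; resolves with a result or throws
// on an API failure.
type Researcher = (model: string) => Promise<ResearchResult>

const SONNET_MODEL = 'sonnet' // placeholder, not the pipeline's model id
const OPUS_MODEL = 'claude-opus-4-6' // model named in commit a2da58eb0f

async function researchWithEscalation(
  run: Researcher,
  escalate = true // would be false when QA_OPUS_ESCALATION=0
): Promise<ResearchResult> {
  const sonnet = await run(SONNET_MODEL)
  // A definitive first attempt, or escalation disabled: return as-is.
  if (sonnet.verdict !== 'INCONCLUSIVE' || !escalate) return sonnet
  try {
    const opus = await run(OPUS_MODEL)
    // Assumed meaning of "better": any definitive verdict beats INCONCLUSIVE.
    return opus.verdict !== 'INCONCLUSIVE' ? opus : sonnet
  } catch {
    // Credit exhaustion or rate limit: keep Sonnet's attempt instead of
    // replacing it with an API-error result.
    return sonnet
  }
}
```

Escalation is a parameter here so that the sketch needs no Node typings. In the pipeline, its value would come from the `QA_OPUS_ESCALATION` environment variable. The equality check also covers the case where an Opus failure is reported as an INCONCLUSIVE verdict instead of being thrown: that result does not overwrite Sonnet's.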