Compare commits

15 Commits

Author SHA1 Message Date
snomiao
fb5813dddb feat: richer fixture API hints — i18n, queue mocks, subgraph helpers
- Add Comfy.Locale setting ID for i18n language switching tests
- Add queue/assets mock API (mockOutputHistory, runButton)
- Add subgraph workflow asset paths for loadWorkflow
- Add SubgraphHelper API docs (slot ops, navigation, conversion)
- Add VueNodeHelpers enterSubgraph/selectNode
2026-04-15 16:40:27 +00:00
snomiao
e153e58c91 feat: smarter agent — precondition reasoning + issue comments
- Prompt agent to reason about hidden preconditions before writing tests
  (e.g. z-index bugs need crowded canvas, not empty default workflow)
- Fetch issue comments with reproduction hints (repro/step/workaround)
- Better error analysis: different strategy on retry, not same code
- Both CI workflow and pnpm qa CLI fetch comments
2026-04-15 16:31:20 +00:00
snomiao
cdea8bf2c9 fix: Opus escalation graceful fallback on credit exhaustion
When Opus API call fails (credit balance, rate limit), keep Sonnet's
result instead of overwriting with INCONCLUSIVE API error. Only use
Opus result if it's actually better than Sonnet's attempt.
2026-04-14 15:37:07 +00:00
snomiao
a2da58eb0f feat: Opus escalation for INCONCLUSIVE issues
Sonnet tries first. If INCONCLUSIVE, automatically retries with
claude-opus-4-6 (30 turns). Disable with QA_OPUS_ESCALATION=0.
Also: model param added to ResearchOptions for flexibility.
2026-04-14 13:14:33 +00:00
snomiao
3154865ce2 feat: Phase 1 improvements — concurrency, auto-trigger, better prompts
- B1: Fix concurrency group to use ref_name (parallel sno-qa-* branches)
- D1: Auto-trigger QA on 'Potential Bug' and 'verified bug' labels
- A4: Prompt agent to read existing tests first before writing
- Turn budget enforcement from previous commit
2026-04-14 13:12:49 +00:00
snomiao
ff6034e2ee fix: reduce INCONCLUSIVE rate — enforce turn budget and fail-fast
- 3 consecutive test failures → call done(NOT_REPRODUCIBLE)
- Turn budget: ~3 inspect, 2 write, 3 fix = ~10 tool calls max
- Prevents 20+ tool call retry loops that waste CI time
2026-04-13 19:41:54 +00:00
snomiao
529ac3cea4 trigger: re-run cancelled batch 2
2026-04-13 18:42:20 +00:00
snomiao
f95eebf3db trigger: re-run cancelled QA batches
2026-04-13 17:49:03 +00:00
snomiao
1dd66315ed docs: update SKILL.md with current CLI and architecture
2026-04-13 16:21:02 +00:00
snomiao
83de5a222e feat: use demowright showTitleCard API for early title overlay
- Import showTitleCard/hideTitleCard from demowright/video-script
- Replace page.evaluate() hack with official demowright API
- CI clones demowright feat/show-title-card-api branch
- demowright PR: https://github.com/snomiao/demowright/pull/11
2026-04-13 09:24:44 +00:00
snomiao
2faadaeab0 fix: early title card covers setup, remove unstable ffmpeg trim
Show title card via page.evaluate() IMMEDIATELY before setup code runs.
Setup (setSetting, setupWorkflowsDirectory) executes behind the card.
Card is removed before createVideoScript() renders its own title.
This ensures the title card is visible from the first frame of the video.
2026-04-13 09:18:43 +00:00
snomiao
cb921ada71 fix: remove autoplay so browser plays video with sound
2026-04-13 07:21:29 +00:00
snomiao
884270c46f feat: show verdict banner with failure reason for non-reproduced bugs
NOT_REPRODUCIBLE and INCONCLUSIVE verdicts now display a prominent
banner with the agent's summary and evidence explaining why the bug
could not be reproduced. Default video playback speed changed to 1x.
2026-04-13 06:27:05 +00:00
snomiao
51e48b55b3 fix: default video playback speed 1x instead of 0.5x
2026-04-13 06:19:56 +00:00
snomiao
07f6611cc8 trigger: re-run QA for #10766
2026-04-12 13:51:17 +00:00
7 changed files with 231 additions and 293 deletions

@@ -1,283 +1,133 @@
---
name: comfy-qa
description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
description: 'Comprehensive QA of ComfyUI frontend. Reproduces bugs via E2E tests, records narrated demo videos, deploys reports. Works in CI and locally via `pnpm qa`.'
---
# ComfyUI Frontend QA Skill
Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.
Automated quality assurance pipeline that reproduces reported bugs using Playwright E2E tests, records narrated demo videos with demowright, and deploys reports to Cloudflare Pages.
## Architecture Overview
## Quick Start
The QA pipeline uses a **three-phase approach**:
```bash
# Reproduce an issue
pnpm qa 10253
1. **RESEARCH** — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, no hallucination)
2. **REPRODUCE** — Deterministic replay of the research test with video recording
3. **REPORT** — Deploy results to Cloudflare Pages with badge, video, and verdict
# Test a PR
pnpm qa 10270
### Key Design Decision
# Test PR base (reproduce bug)
pnpm qa 10270 -t base
Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. This was abandoned after discovering **AI reviewers hallucinate** — Gemini reported "REPRODUCED" when videos showed idle screens. The current approach uses **Playwright assertions** as the source of truth: if the test passes, the bug is proven.
# Test both base + head
pnpm qa 10270 -t both
## Prerequisites
# Test local uncommitted changes
pnpm qa --uncommitted
- Node.js 22+
- `pnpm` package manager
- `gh` CLI (authenticated)
- Playwright browsers: `npx playwright install chromium`
- Environment variables:
- `GEMINI_API_KEY` — for PR analysis and video review
- `ANTHROPIC_API_KEY` — for Claude Agent SDK (research phase)
- `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — for report deployment
# Help
pnpm qa -h
```
Auto-loads `.env.local` / `.env` for `GEMINI_API_KEY` and `ANTHROPIC_API_KEY`.
Auto-detects issue vs PR via GitHub API. Auto-starts ComfyUI if not running.
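A minimal sketch of how the issue-vs-PR detection could work, assuming the CLI queries GitHub's single-issue REST endpoint. That endpoint also resolves pull request numbers, and its payload carries a `pull_request` field only for PRs. The `classifyTarget` helper and the `IssuePayload` type are hypothetical names for illustration and are not taken from `qa.ts`.

```typescript
// Hypothetical shape: only the fields this sketch needs from
// GET /repos/{owner}/{repo}/issues/{number}.
interface IssuePayload {
  number: number
  pull_request?: unknown
}

type TargetKind = 'issue' | 'pr'

// GitHub includes a `pull_request` key on the payload only when the
// number belongs to a pull request, so its presence is the discriminator.
function classifyTarget(payload: IssuePayload): TargetKind {
  return payload.pull_request !== undefined ? 'pr' : 'issue'
}

console.log(classifyTarget({ number: 10253 }))
console.log(classifyTarget({ number: 10270, pull_request: { url: '' } }))
```

Keeping the classification a pure function over the payload means it can be tested without network access; the actual fetch would live in the caller.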
## Architecture
Three-phase pipeline:
1. **RESEARCH** — Claude Sonnet 4.6 (Agent SDK) writes Playwright E2E tests to reproduce bugs
2. **RECORD** — Re-runs test with demowright for narrated demo video (title cards, TTS, subtitles)
3. **REPORT** — Gemini reviews video, deploys to Cloudflare Pages with badge + verdict
Key principle: **Playwright assertions are truth** — no AI hallucination. If the test passes, the bug is proven.
## Pipeline Scripts
| Script | Role | Model |
| --------------------------------- | ------------------------------------------------------- | ----------------------------- |
| `scripts/qa-analyze-pr.ts` | Deep PR/issue analysis → QA guide | gemini-3.1-pro-preview |
| `scripts/qa-agent.ts` | Research phase: Claude writes E2E tests | claude-sonnet-4-6 (Agent SDK) |
| `scripts/qa-record.ts` | Before/after video recording with Gemini-driven actions | gemini-3.1-pro-preview |
| `scripts/qa-reproduce.ts` | Deterministic replay with narration | gemini-3-flash-preview |
| `scripts/qa-video-review.ts` | Video comparison review | gemini-3-flash-preview |
| `scripts/qa-generate-test.ts` | Regression test generation from QA report | gemini-3-flash-preview |
| `scripts/qa-deploy-pages.sh` | Deploy to Cloudflare Pages + badge | — |
| `scripts/qa-batch.sh` | Batch-trigger QA for multiple issues | — |
| `scripts/qa-report-template.html` | Report site (light/dark, seekbar, copy badge) | — |
All scripts live in `.claude/skills/comfy-qa/scripts/`:
| Script | Role |
| ------------------------- | ----------------------------------------------------- |
| `qa.ts` | CLI entry point (`pnpm qa`) |
| `qa-agent.ts` | Research phase: Claude writes E2E tests via Agent SDK |
| `qa-record.ts` | Orchestrator: 3-phase pipeline |
| `qa-deploy-pages.sh` | Cloudflare Pages deploy + badge generation |
| `qa-report-template.html` | Report site template |
| `qa-video-review.ts` | Gemini video review |
| `qa-analyze-pr.ts` | Deep PR/issue analysis → QA guide |
## Triggering QA
### Via GitHub Labels
- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after comparison)
- **`qa-full`** — Full QA (3-OS matrix, after-only)
- **`qa-issue`** — Reproduce a bug from an issue
### Via Batch Script
### Via CLI (`pnpm qa`)
```bash
# Trigger QA for specific issue numbers
./scripts/qa-batch.sh 10394 10238 9996
# From a triage file (top 5 Tier 1 issues)
./scripts/qa-batch.sh --from tmp/issues.md --top 5
# Preview without pushing
./scripts/qa-batch.sh --dry-run 10394
# Clean up old trigger branches
./scripts/qa-batch.sh --cleanup
pnpm qa 10253 # issue (auto-detect)
pnpm qa 10270 # PR head
pnpm qa 10270 -t base # PR base
pnpm qa 10270 -t both # both
pnpm qa --uncommitted # local changes
```
### Via Workflow Dispatch
### Via GitHub Labels
Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).
- **`qa-issue`** — Reproduce a bug from an issue
- **`qa-changes`** — Focused QA on a PR (Linux-only, before/after)
- **`qa-full`** — Full QA (3-OS matrix)
### Via Push to trigger branches
```bash
git push origin sno-skills:sno-qa-10253 --force
```
## Research Phase (`qa-agent.ts`)
Claude receives the issue/PR context + a11y tree snapshot + ComfyPage fixture API docs.
Tools:
- **`inspect(selector?)`** — Read a11y tree
- **`readFixture(path)`** — Read fixture source code
- **`readTest(path)`** — Read existing tests for patterns
- **`writeTest(code)`** — Write a Playwright .spec.ts
- **`runTest()`** — Execute and get pass/fail + errors
- **`done(verdict, summary, evidence, testCode, videoScript?)`** — Finish
When `verdict=REPRODUCED`, Claude also provides a `videoScript` — a separate test file using demowright's `createVideoScript()` for professional narrated demo video with title cards, TTS segments, and outro.
## Video Recording (demowright)
Phase 2 uses the video script to record with:
- `showTitleCard()` / `hideTitleCard()` — covers setup/loading screen
- `createVideoScript().title().segment().outro()` — structured narration
- `pace()` — narration-then-action timing
- TTS audio + subtitles + cursor overlay + key badges
## Report Site
Deployed to `https://sno-qa-{number}.comfy-qa.pages.dev/`
Features:
- Video player (1x default, adjustable speed)
- Research log (verdict, tool calls, timing)
- E2E test code + video script code
- Verdict banner for NOT_REPRODUCIBLE/INCONCLUSIVE with failure reason
- Copy badge button (markdown)
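The trigger branch name and the report URL share the same `sno-qa-{number}` stem, so one can be derived from the other. The sketch below assumes that mapping holds exactly as written in this document; `reportUrlForBranch` is a hypothetical helper and not part of the pipeline scripts.

```typescript
// Hypothetical helper: maps a trigger branch such as `sno-qa-10253` to the
// report URL pattern quoted above. Returns null for any other branch name.
function reportUrlForBranch(branch: string): string | null {
  const match = /^sno-qa-(\d+)$/.exec(branch)
  if (!match) return null
  return `https://sno-qa-${match[1]}.comfy-qa.pages.dev/`
}

console.log(reportUrlForBranch('sno-qa-10253'))
console.log(reportUrlForBranch('main'))
```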
## Prerequisites
- `GEMINI_API_KEY` — video review, TTS
- `ANTHROPIC_API_KEY` — Claude Agent SDK (research phase)
- `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` — report deployment (CI only)
- ComfyUI server running (auto-detected, or auto-started)
## CI Workflow (`.github/workflows/pr-qa.yaml`)
```
resolve-matrix → analyze-pr ──┐
├→ qa-before (main branch, worktree build)
├→ qa-before (main branch)
├→ qa-after (PR branch)
└→ report (video review, deploy, comment)
```
Before/after jobs run **in parallel** on separate runners for clean isolation.
### Issue Reproduce Mode
For issues (not PRs), the pipeline:
1. Fetches the issue body and comments
2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
3. Runs the research phase (Claude writes E2E test to reproduce)
4. Records video of the test execution
5. Posts results as a comment on the issue
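Commit e153e58c91 describes fetching issue comments that contain reproduction hints (repro/step/workaround). A rough sketch of such a filter follows. The `IssueComment` shape, the `pickReproHints` name, and the exact regular expression are assumptions made for illustration; only the three keywords come from the commit message.

```typescript
// Hypothetical comment shape; the real pipeline's types are not shown here.
interface IssueComment {
  author: string
  body: string
}

// Keywords taken from the commit message ("repro/step/workaround").
// `repro\w*` also matches "reproduce" and "reproduction".
const HINT_PATTERN = /\b(repro\w*|steps?|workaround)\b/i

// Keeps only the comments whose body mentions one of the hint keywords.
function pickReproHints(comments: IssueComment[]): IssueComment[] {
  return comments.filter((comment) => HINT_PATTERN.test(comment.body))
}

const sample: IssueComment[] = [
  { author: 'a', body: 'Steps: open two subgraphs, then drag a node.' },
  { author: 'b', body: '+1, same here' },
  { author: 'c', body: 'Workaround: reload the page.' }
]
console.log(pickReproHints(sample).map((comment) => comment.author))
```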
## Running Locally
### Step 1: Environment Setup
```bash
# Ensure ComfyUI server is running
# Default: http://127.0.0.1:8188
# Install Playwright browsers
npx playwright install chromium
```
### Step 2: Analyze the Issue/PR
```bash
# For a PR
pnpm exec tsx scripts/qa-analyze-pr.ts \
--pr-number 10394 \
--repo Comfy-Org/ComfyUI_frontend \
--output-dir qa-guides
# For an issue
pnpm exec tsx scripts/qa-analyze-pr.ts \
--pr-number 10394 \
--repo Comfy-Org/ComfyUI_frontend \
--output-dir qa-guides \
--type issue
```
### Step 3: Record Before/After
```bash
# Before (main branch)
pnpm exec tsx scripts/qa-record.ts \
--mode before \
--diff /tmp/pr-diff.txt \
--output-dir /tmp/qa-before \
--qa-guide qa-guides/qa-guide-1.json
# After (PR branch)
pnpm exec tsx scripts/qa-record.ts \
--mode after \
--diff /tmp/pr-diff.txt \
--output-dir /tmp/qa-after \
--qa-guide qa-guides/qa-guide-1.json
```
### Step 4: Review Videos
```bash
pnpm exec tsx scripts/qa-video-review.ts \
--artifacts-dir /tmp/qa-artifacts \
--video-file qa-session.mp4 \
--before-video qa-before-session.mp4 \
--output-dir /tmp/video-reviews \
--pr-context /tmp/pr-context.txt
```
## Research Phase Details (`qa-agent.ts`)
Claude receives:
- The issue description and comments
- A QA guide from `qa-analyze-pr.ts`
- An accessibility tree snapshot of the current UI
Claude's tools:
- **`inspect(selector?)`** — Read a11y tree to discover element selectors
- **`writeTest(code)`** — Write a Playwright `.spec.ts` file
- **`runTest()`** — Execute the test and get pass/fail + errors
- **`done(verdict, summary, evidence, testCode)`** — Finish with verdict
The test uses the project's Playwright fixtures (`comfyPageFixture`), giving access to `comfyPage.page`, `comfyPage.menu`, `comfyPage.settings`, etc.
### Verdict Logic
- **REPRODUCED** — Test passes (asserting the bug exists) → bug is proven
- **NOT_REPRODUCIBLE** — Claude exhausted attempts, test cannot pass
- **INCONCLUSIVE** — Agent timed out or encountered infrastructure issues
Auto-completion: if a test passed but `done()` was never called, the pipeline auto-completes with REPRODUCED.
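The verdict rules above, including the auto-completion case, can be sketched as a small pure function. `RunState` and `resolveVerdict` are hypothetical names, and falling back to INCONCLUSIVE when the agent neither called `done()` nor produced a passing test is an assumption of this sketch rather than documented behavior.

```typescript
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

// Hypothetical summary of one research run.
interface RunState {
  declaredVerdict?: Verdict // set only when the agent called done()
  lastTestPassed: boolean // outcome of the most recent runTest()
}

function resolveVerdict(state: RunState): Verdict {
  // An explicit done() call always wins.
  if (state.declaredVerdict) return state.declaredVerdict
  // Auto-completion: a passing test without done() counts as REPRODUCED.
  if (state.lastTestPassed) return 'REPRODUCED'
  // Assumption: no done() and no passing test is treated as INCONCLUSIVE.
  return 'INCONCLUSIVE'
}

console.log(resolveVerdict({ lastTestPassed: true }))
console.log(resolveVerdict({ declaredVerdict: 'NOT_REPRODUCIBLE', lastTestPassed: false }))
```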
## Manual QA (Fallback)
When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use **playwright-cli** for manual browser interaction:
```bash
# Install
npm install -g @playwright/cli@latest
# Open browser and navigate
playwright-cli open http://127.0.0.1:8188
# Get element references
playwright-cli snapshot
# Interact
playwright-cli click e1
playwright-cli fill e2 "test text"
playwright-cli press Escape
playwright-cli screenshot --filename=f.png
```
Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation to refresh refs.
## Manual QA Test Plan
When performing manual QA (either via playwright-cli or the automated pipeline), systematically test each area below.
### Application Load & Routes
| Test | Steps |
| ----------------- | ------------------------------------------------------------ |
| Root route loads | Navigate to `/` — GraphView should render with canvas |
| User select route | Navigate to `/user-select` — user selection UI should appear |
| 404 handling | Navigate to `/nonexistent` — should handle gracefully |
### Canvas & Graph View
| Test | Steps |
| ------------------------- | -------------------------------------------------------------- |
| Canvas renders | The LiteGraph canvas is visible and interactive |
| Pan canvas | Click and drag on empty canvas area |
| Zoom in/out | Use scroll wheel or Alt+=/Alt+- |
| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
| Delete node | Select a node, press Delete key |
| Connect nodes | Drag from output slot to input slot |
| Copy/Paste | Select nodes, Ctrl+C then Ctrl+V |
| Undo/Redo | Make changes, Ctrl+Z to undo, Ctrl+Y to redo |
| Context menus | Right-click node vs empty canvas — different menus |
### Sidebar Tabs
| Test | Steps |
| ----------------- | ------------------------------------- |
| Workflows tab | Press W — workflows sidebar opens |
| Node Library tab | Press N — node library opens |
| Model Library tab | Press M — model library opens |
| Tab toggle | Press same key again — sidebar closes |
| Search in sidebar | Type in search box — results filter |
### Settings Dialog
| Test | Steps |
| ---------------- | ---------------------------------------------------- |
| Open settings | Press Ctrl+, or click settings button |
| Change a setting | Toggle a boolean setting — it persists after closing |
| Search settings | Type in settings search box — results filter |
| Close settings | Press Escape or click close button |
### Execution & Queue
| Test | Steps |
| -------------- | ----------------------------------------------------- |
| Queue prompt | Load default workflow, click Queue — execution starts |
| Queue progress | Progress indicator shows during execution |
| Interrupt | Press Ctrl+Alt+Enter during execution — interrupts |
## Report Site
Deployed to Cloudflare Pages at `https://comfy-qa.pages.dev/<branch>/`.
Features:
- Light/dark theme
- Seekable video player with preload
- Copy badge button (markdown)
- Date-stamped badges (e.g., `QA0327`)
- Vertical box badge for issues and PRs
## Known Issues & Troubleshooting
See `docs/qa/TROUBLESHOOTING.md` for common failures:
- `set -euo pipefail` + grep with no match → append `|| true`
- `__name is not defined` in `page.evaluate` → use `addScriptTag`
- Cursor not visible in videos → monkey-patch `page.mouse` methods
- Agent not calling `done()` → auto-complete from passing test
## Backlog
See `docs/qa/backlog.md` for planned improvements:
- **Type B comparison**: Different commits for regression detection
- **Type C comparison**: Cross-browser testing
- **Pre-seed assets**: Upload test images before recording
- **Lazy a11y tree**: Reduce token usage with `inspect(selector)` vs full dump

@@ -35,6 +35,7 @@ interface ResearchOptions {
anthropicApiKey?: string
maxTurns?: number
timeBudgetMs?: number
model?: string
}
export type ReproMethod = 'e2e_test' | 'video' | 'both' | 'none'
@@ -401,15 +402,28 @@ export async function runResearchPhase(
- done(verdict, summary, evidence, testCode) — Finish with the final test
## Workflow
1. Read the issue description carefully
2. Use inspect() to understand the current UI state and discover element selectors
3. If unsure about the fixture API, use readFixture() to read the relevant helper source code
4. If unsure about test patterns, use readTest() to read an existing test for reference
1. Read the issue description carefully. Think about:
- What PRECONDITIONS are needed? (many nodes on canvas? specific layout? saved workflow? subgraph?)
- What HIDDEN ASSUMPTIONS exist? (e.g. "z-index bug" means nodes must overlap → need a crowded canvas)
- What specific UI STATE triggers the bug? (dirty workflow? collapsed node? specific menu open?)
2. Use readTest() to read 1-2 existing tests similar to the bug:
- For menu/workflow bugs: readTest("workflow.spec.ts") or readTest("topbarMenu.spec.ts")
- For node/canvas bugs: readTest("nodeInteraction.spec.ts") or readTest("copyPaste.spec.ts")
- For settings bugs: readTest("settingDialogSearch.spec.ts")
- For subgraph bugs: readTest("subgraph.spec.ts")
3. Use inspect() to understand the current UI state and discover element selectors
4. If unsure about the fixture API, use readFixture("ComfyPage.ts") or relevant helper
5. Write a Playwright test that:
- Performs the exact reproduction steps from the issue
- Asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
- FIRST sets up the preconditions (add multiple nodes, create specific layout, save workflow, etc.)
- THEN performs the reproduction steps from the issue
- FINALLY asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
- Think like a tester: the bug may only appear under specific conditions that the reporter assumed were obvious
6. Run the test with runTest()
7. If it fails: read the error, fix the test, run again (max 5 attempts)
7. If it fails, ANALYZE the error before retrying:
- Is it a selector issue? Use inspect() to find the right element
- Is it a timing issue? The UI may need time to update — use nextFrame() or expect.poll()
- Is the precondition wrong? Maybe the bug only appears with MORE nodes, AFTER a save, etc.
- Try a DIFFERENT approach, not the same code with minor tweaks
8. Call done() with the final verdict and test code
## Test writing guidelines
@@ -423,6 +437,8 @@ export async function runResearchPhase(
- Use \`comfyPage.nextFrame()\` after interactions that trigger UI updates
- NEVER use \`page.waitForTimeout()\` — use Locator actions and retrying assertions instead
- ALWAYS call done() when finished, even if the test passed — do not keep iterating after a passing test
- CRITICAL: If your test FAILS 3 times in a row with the same or similar error, call done(NOT_REPRODUCIBLE) immediately. Do NOT keep retrying the same approach — try a completely different strategy or give up. Spending 20+ tool calls on failing tests is wasteful.
- Budget your turns: spend at most 3 turns on inspect/readFixture, 2 turns writing the first test, then max 3 fix attempts. If still failing after ~10 tool calls, call done().
- Use \`expect.poll()\` for async assertions: \`await expect.poll(() => comfyPage.nodeOps.getGraphNodesCount()).toBe(8)\`
- CRITICAL: Your assertions must be SPECIFIC TO THE BUG. A test that asserts \`expect(count).toBeGreaterThan(0)\` proves nothing — it would pass even without the bug. Instead assert the exact broken state, e.g. \`expect(clonedWidgets).toHaveLength(0)\` (missing widgets) or \`expect(zIndex).toBeLessThan(parentZIndex)\` (wrong z-order). If a test passes trivially, it's a false positive.
- NEVER write "debug", "discovery", or "inspect node types" tests. These waste turns and produce false REPRODUCED verdicts. If you need to discover node type names, use inspect() or readFixture() — not a passing test.
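The hunk above enforces the turn budget and the three-failure rule through prompt wording only. If the same limits were enforced on the harness side, a guard might look like the sketch below. The `TurnBudget` class is purely illustrative and does not exist in `qa-agent.ts`; the default limits (about 10 tool calls, 3 consecutive failures) mirror the prompt text.

```typescript
// Hypothetical harness-side tracker for the limits stated in the prompt.
class TurnBudget {
  private toolCalls = 0
  private consecutiveFailures = 0
  private readonly maxToolCalls: number
  private readonly maxConsecutiveFailures: number

  constructor(maxToolCalls = 10, maxConsecutiveFailures = 3) {
    this.maxToolCalls = maxToolCalls
    this.maxConsecutiveFailures = maxConsecutiveFailures
  }

  // Call once per tool invocation (inspect, readFixture, writeTest, runTest).
  recordToolCall(): void {
    this.toolCalls += 1
  }

  // A passing test resets the failure streak; a failing one extends it.
  recordTestResult(passed: boolean): void {
    this.consecutiveFailures = passed ? 0 : this.consecutiveFailures + 1
  }

  // True once either limit is reached, signalling that done() is due.
  shouldStop(): boolean {
    return (
      this.toolCalls >= this.maxToolCalls ||
      this.consecutiveFailures >= this.maxConsecutiveFailures
    )
  }
}

const budget = new TurnBudget()
budget.recordTestResult(false)
budget.recordTestResult(false)
console.log(budget.shouldStop())
budget.recordTestResult(false)
console.log(budget.shouldStop())
```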
@@ -481,6 +497,10 @@ export async function runResearchPhase(
### Settings (comfyPage.settings)
- \`.setSetting(id, value)\` — change a ComfyUI setting
- \`.getSetting(id)\` — read current setting value
- Common setting IDs:
- \`'Comfy.UseNewMenu'\` — 'Top' | 'Bottom' | 'Disabled'
- \`'Comfy.Locale'\` — 'en' | 'zh' | 'ja' | 'ko' | 'ru' | 'fr' | 'es' etc. (change UI language)
- \`'Comfy.NodeBadge.NodeSourceBadgeMode'\` — node badge display
### Keyboard (comfyPage.keyboard)
- \`.undo()\` / \`.redo()\` — Ctrl+Z / Ctrl+Y
@@ -493,6 +513,7 @@ export async function runResearchPhase(
- \`.setupWorkflowsDirectory(structure)\` — setup test directory
- \`.deleteWorkflow(name)\`
- \`.isCurrentWorkflowModified()\` — check dirty state
- Available subgraph assets: loadWorkflow('subgraphs/basic-subgraph'), 'subgraphs/nested-subgraph', 'subgraphs/subgraph-with-promoted-text-widget', etc.
### Context Menu (comfyPage.contextMenu)
- \`.openFor(locator)\` — right-click locator and wait for menu
@@ -500,12 +521,26 @@ export async function runResearchPhase(
- \`.isVisible()\` — check if context menu is showing
- \`.assertHasItems(items)\` — assert menu contains items
### Queue & Assets (comfyPage.assets)
- \`comfyPage.runButton.click()\` — execute current workflow (backend runs with --cpu in CI)
- \`comfyPage.assets.mockOutputHistory(jobs)\` — mock queue history with fake job items
- \`comfyPage.assets.mockEmptyState()\` — clear all mocked state
- Queue overlay: \`page.getByTestId('queue-overlay-toggle')\` to open queue panel
### Subgraph (comfyPage.subgraph)
- \`.isInSubgraph()\` — check if currently viewing a subgraph
- \`.getNodeCount()\` — nodes in current graph view
- \`.getSlotCount('input'|'output')\` — I/O slot count
- \`.connectToInput(sourceNode, slotIdx, inputName)\` — connect to subgraph input
- \`.exitViaBreadcrumb()\` — navigate out of subgraph
- \`.convertDefaultKSamplerToSubgraph()\` — helper: convert default workflow node to subgraph
- NodeReference: \`.convertToSubgraph()\`, \`.navigateIntoSubgraph()\`
### Other helpers
- \`comfyPage.settingDialog\` — SettingDialog component
- \`comfyPage.searchBox\` / \`comfyPage.searchBoxV2\` — node search
- \`comfyPage.toast\` — ToastHelper (\`.visibleToasts\`)
- \`comfyPage.subgraph\` — SubgraphHelper
- \`comfyPage.vueNodes\` — VueNodeHelpers
- \`comfyPage.vueNodes\` — VueNodeHelpers (\`.enterSubgraph(nodeId)\`, \`.selectNode(nodeId)\`)
- \`comfyPage.bottomPanel\` — BottomPanel
- \`comfyPage.clipboard\` — ClipboardHelper
- \`comfyPage.dragDrop\` — DragDropHelper
@@ -525,14 +560,19 @@ The videoScript is a complete, standalone Playwright test file for Phase 2 demo
\`\`\`typescript
import { comfyPageFixture as test } from '../fixtures/ComfyPage'
import { createVideoScript } from 'demowright/video-script'
import { createVideoScript, showTitleCard, hideTitleCard } from 'demowright/video-script'
test('Demo: Bug Title', async ({ comfyPage }) => {
// IMPORTANT: ALL setup code MUST go here BEFORE createVideoScript()
// so the title card is the FIRST thing viewers see in the video
// Show title card IMMEDIATELY — covers the screen while setup runs behind it
await showTitleCard(comfyPage.page, 'Bug Title Here', { subtitle: 'Issue #NNNN' })
// Setup runs while title card is visible
await comfyPage.settings.setSetting('Comfy.UseNewMenu', 'Top')
await comfyPage.workflow.setupWorkflowsDirectory({})
// Remove early title card before script starts (script will show its own)
await hideTitleCard(comfyPage.page)
const script = createVideoScript()
.title('Bug Title Here', { subtitle: 'Issue #NNNN', durationMs: 4000 })
.segment('Step 1: description of what we do', async (pace) => {
@@ -570,7 +610,7 @@ to happen before it happens. Pattern:
IMPORTANT RULES for videoScript:
1. You MUST provide videoScript when verdict is REPRODUCED — every reproduced bug needs a narrated demo
2. ALL setup code (setSetting, setupWorkflowsDirectory) goes BEFORE createVideoScript() — title card must be first thing in video
2. Call showTitleCard() BEFORE setup, run setup behind it, call hideTitleCard() before createVideoScript() — see example
3. Call \`await pace()\` FIRST in each segment callback, BEFORE actions
4. Add \`waitForTimeout(2000)\` after each action so viewers can see the result
5. Final evidence segment: hold for 5+ seconds
@@ -591,7 +631,7 @@ ${issueContext}`
prompt:
'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, readFixture() or readTest() if you need to understand the fixture API or see existing test patterns, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
options: {
model: 'claude-sonnet-4-6',
model: opts.model ?? 'claude-sonnet-4-6',
systemPrompt,
...(anthropicApiKey ? { apiKey: anthropicApiKey } : {}),
maxTurns,

@@ -77,15 +77,15 @@ for os in Linux macOS Windows; do
fi
if [ "$HAS_BEFORE" = "1" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls autoplay preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls autoplay preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
elif [ -f "$DEPLOY_DIR/qa-${os}.mp4" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls autoplay preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
else
PASS_VIDEOS=""
for pass_vid in "$DEPLOY_DIR/qa-${os}-pass"[0-9].mp4; do
[ -f "$pass_vid" ] || continue
PASS_NUM=$(basename "$pass_vid" | sed "s/qa-${os}-pass\([0-9]\).mp4/\1/")
PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls autoplay preload=auto><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls preload=auto><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
done
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison>${PASS_VIDEOS}</div>${REPORT_HTML}</div>"
fi

@@ -1952,7 +1952,7 @@ async function main() {
// QA guide not available
}
}
const research = await runResearchPhase({
let research = await runResearchPhase({
page,
issueContext: issueCtx,
qaGuide: qaGuideText,
@@ -1963,6 +1963,44 @@ async function main() {
console.warn(
`Research complete: ${research.verdict} — ${research.summary.slice(0, 100)}`
)
// Opus escalation: if Sonnet couldn't reproduce, try Opus
if (
research.verdict === 'INCONCLUSIVE' &&
anthropicKey &&
process.env.QA_OPUS_ESCALATION !== '0'
) {
console.warn('Escalating to claude-opus-4-6 for complex issue...')
try {
const opusResult = await runResearchPhase({
page,
issueContext: issueCtx,
qaGuide: qaGuideText,
outputDir: opts.outputDir,
serverUrl: opts.serverUrl,
anthropicApiKey: anthropicKey,
model: 'claude-opus-4-6',
maxTurns: 30
})
console.warn(
`Opus result: ${opusResult.verdict} — ${opusResult.summary.slice(0, 100)}`
)
// Only use Opus result if it's better than Sonnet's
if (
opusResult.verdict !== 'INCONCLUSIVE' ||
!opusResult.summary.includes('API error')
) {
research = opusResult
} else {
console.warn('Opus failed (API error) — keeping Sonnet result')
}
} catch (opusErr) {
console.warn(
`Opus escalation failed: ${opusErr instanceof Error ? opusErr.message : opusErr}`
)
// Keep Sonnet's result
}
}
console.warn(`Evidence: ${research.evidence.slice(0, 200)}`)
// ═══ Phase 2: Record demo video with demowright ═══
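The condition that decides whether the Opus result replaces Sonnet's can be read as "keep the escalated result unless it is INCONCLUSIVE and its summary mentions an API error". The sketch below restates that as a pure function for clarity. `pickResult` and the trimmed `ResearchResult` type are hypothetical names; the real result object has more fields than shown here.

```typescript
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

// Trimmed, hypothetical view of a research result.
interface ResearchResult {
  verdict: Verdict
  summary: string
}

// Equivalent to the hunk's condition
//   verdict !== 'INCONCLUSIVE' || !summary.includes('API error')
// by De Morgan: the escalated result is rejected only when both hold.
function pickResult(
  first: ResearchResult,
  escalated: ResearchResult
): ResearchResult {
  const escalatedFailed =
    escalated.verdict === 'INCONCLUSIVE' &&
    escalated.summary.includes('API error')
  return escalatedFailed ? first : escalated
}

const sonnet: ResearchResult = { verdict: 'INCONCLUSIVE', summary: 'Ran out of turns' }
const opusFailed: ResearchResult = { verdict: 'INCONCLUSIVE', summary: 'API error: credit balance' }
const opusOk: ResearchResult = { verdict: 'REPRODUCED', summary: 'Test passes' }
console.log(pickResult(sonnet, opusFailed).summary)
console.log(pickResult(sonnet, opusOk).verdict)
```

Note that under this rule an INCONCLUSIVE escalated result without an API error still replaces the first result, which matches the condition as written in the diff.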
@@ -2087,22 +2125,10 @@ export default withDemowright(baseConfig, {
}
if (demowrightMp4) {
// Trim first 7s (ComfyUI loading screen) from the video
try {
execSync(
`ffmpeg -y -i "${demowrightMp4}" -ss 7 -c copy -avoid_negative_ts 1 "${opts.outputDir}/qa-session.mp4" 2>/dev/null`
)
console.warn(
`Phase 2: Trimmed video → ${opts.outputDir}/qa-session.mp4`
)
} catch {
execSync(
`cp "${demowrightMp4}" "${opts.outputDir}/qa-session.mp4"`
)
console.warn(
`Phase 2: Narrated video → ${opts.outputDir}/qa-session.mp4`
)
}
execSync(`cp "${demowrightMp4}" "${opts.outputDir}/qa-session.mp4"`)
console.warn(
`Phase 2: Narrated video → ${opts.outputDir}/qa-session.mp4`
)
}
// Also copy raw webm as fallback
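
The escalation hunk above keeps Sonnet's result only when the Opus run comes back INCONCLUSIVE with an API error in its summary. A minimal sketch of that decision as a pure function, assuming a result shape with just the `verdict` and `summary` fields seen in the diff; the `ResearchResult` type and the `pickResearchResult` name are hypothetical and not part of the commit:

```typescript
// Hypothetical result shape: only `verdict` and `summary` are taken from the diff.
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

interface ResearchResult {
  verdict: Verdict
  summary: string
}

// Returns the escalated result unless it is an INCONCLUSIVE verdict whose
// summary mentions an API error, in which case the first attempt is kept.
function pickResearchResult(
  first: ResearchResult,
  escalated: ResearchResult
): ResearchResult {
  const escalationFailed =
    escalated.verdict === 'INCONCLUSIVE' &&
    escalated.summary.includes('API error')
  return escalationFailed ? first : escalated
}
```

Note that an Opus INCONCLUSIVE without an API error still replaces Sonnet's result under this rule, which matches the condition in the hunk.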


@@ -97,7 +97,13 @@ h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em
let html='';
if(logRes.status==='fulfilled'&&logRes.value.ok){
const log=await logRes.value.json();
html+=`<details style="margin-bottom:1.5rem"><summary style="cursor:pointer;font-weight:600;font-size:1rem;padding:.75rem 1rem;background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg)">Research Log &mdash; ${log.verdict||'?'} (${log.toolCalls||'?'} tool calls, ${((log.elapsedMs||0)/1000).toFixed(1)}s)</summary><div style="padding:1rem;background:var(--surface);border:1px solid var(--border);border-top:0;border-radius:0 0 var(--r-lg) var(--r-lg);overflow:auto;max-height:600px"><pre style="font-family:var(--font-mono);font-size:.8rem;line-height:1.6;white-space:pre-wrap">${JSON.stringify(log,null,2)}</pre></div></details>`;
// Show verdict banner for non-reproduced results
if(log.verdict&&log.verdict!=='REPRODUCED'){
const colors={NOT_REPRODUCIBLE:{bg:'oklch(25% 0.08 25)',border:'oklch(40% 0.15 25)',icon:'✗'},INCONCLUSIVE:{bg:'oklch(25% 0.06 80)',border:'oklch(40% 0.12 80)',icon:'⚠'}};
const c=colors[log.verdict]||colors.INCONCLUSIVE;
html+=`<div style="margin-bottom:1.5rem;padding:1.25rem;background:${c.bg};border:1px solid ${c.border};border-radius:var(--r-lg)"><div style="font-size:1.25rem;font-weight:700;margin-bottom:.5rem">${c.icon} ${log.verdict.replace(/_/g,' ')}</div><div style="font-size:.9rem;line-height:1.6;opacity:.9">${(log.summary||'No details available.').replace(/</g,'&lt;')}</div>${log.evidence?`<div style="margin-top:.75rem;padding:.75rem;background:oklch(0% 0 0/.2);border-radius:var(--r);font-family:var(--font-mono);font-size:.8rem;white-space:pre-wrap;max-height:200px;overflow:auto">${log.evidence.replace(/</g,'&lt;')}</div>`:''}</div>`;
}
html+=`<details style="margin-bottom:1.5rem"><summary style="cursor:pointer;font-weight:600;font-size:1rem;padding:.75rem 1rem;background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg)">Research Log &mdash; ${log.verdict||'?'} (${(log.log||[]).length||'?'} tool calls, ${((log.elapsedMs||0)/1000).toFixed(1)}s)</summary><div style="padding:1rem;background:var(--surface);border:1px solid var(--border);border-top:0;border-radius:0 0 var(--r-lg) var(--r-lg);overflow:auto;max-height:600px"><pre style="font-family:var(--font-mono);font-size:.8rem;line-height:1.6;white-space:pre-wrap">${JSON.stringify(log,null,2)}</pre></div></details>`;
}
if(testRes.status==='fulfilled'&&testRes.value.ok){
const code=await testRes.value.text();
@@ -115,7 +121,7 @@ function copyBadge(){const u=location.href.replace(/\/[^/]*$/,'/');const b=u+'ba
document.querySelectorAll('[data-md]').forEach(el=>{const t=el.textContent;el.removeAttribute('data-md');el.innerHTML=marked.parse(t)});
const FPS=30,FT=1/FPS,SPEEDS=[0.1,0.25,0.5,1,1.5,2];
document.querySelectorAll('.video-wrap video').forEach(v=>{
v.playbackRate=0.5;
v.playbackRate=1;
const c=document.createElement('div');c.className='vctrl';
const btn=(label,fn)=>{const b=document.createElement('button');b.textContent=label;b.onclick=fn;c.appendChild(b);return b};
const sep=()=>{const s=document.createElement('div');s.className='vsep';c.appendChild(s)};
@@ -127,7 +133,7 @@ document.querySelectorAll('.video-wrap video').forEach(v=>{
btn('\u25B6\u25B6',()=>{v.pause();v.currentTime+=FT});
btn('\u25B6\u25B6\u25B6',()=>{v.currentTime+=FT*10});
sep();
const spdBtns=SPEEDS.map(s=>{const b=btn(s+'x',()=>{v.playbackRate=s;spdBtns.forEach(x=>x.classList.remove('active'));b.classList.add('active')});if(s===0.5)b.classList.add('active');return b});
const spdBtns=SPEEDS.map(s=>{const b=btn(s+'x',()=>{v.playbackRate=s;spdBtns.forEach(x=>x.classList.remove('active'));b.classList.add('active')});if(s===1)b.classList.add('active');return b});
sep();c.appendChild(time);
const hint=document.createElement('span');hint.className='vhint';hint.textContent='\u2190\u2192 frame \u2022 space play';c.appendChild(hint);
// Custom seekbar — works even without server range request support


@@ -193,7 +193,17 @@ function fetchIssue(number: string, repo: string, outputDir: string): string {
const body = shell(
`gh issue view ${number} --repo ${repo} --json title,body,labels --jq '"Title: " + .title + "\\n\\nLabels: " + ([.labels[].name] | join(", ")) + "\\n\\n" + .body'`
)
return writeTmpFile(outputDir, `issue-${number}.txt`, body)
// Append relevant comments for reproduction context
let comments = ''
try {
comments = shell(
`gh issue view ${number} --repo ${repo} --comments --json comments --jq '[.comments[] | select(.body | test("repro|step|how to|workaround"; "i")) | .body] | limit(5; .[]) // empty'`
)
} catch {
// comments fetch failed, not critical
}
const content = comments ? `${body}\n\n--- Comments ---\n\n${comments}` : body
return writeTmpFile(outputDir, `issue-${number}.txt`, content)
}
function fetchPR(number: string, repo: string, outputDir: string): string {
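
The comment filter in `fetchIssue` above keeps comment bodies that mention reproduction hints and caps the list at five. A rough TypeScript equivalent of that selection, assuming comments arrive as objects with a `body` string; the `IssueComment` type and the `selectReproComments` name are hypothetical, introduced only for illustration:

```typescript
// Hypothetical comment shape: only the `body` field is used by the filter.
interface IssueComment {
  body: string
}

// Same keywords as the jq test() call, matched case-insensitively.
// No `g` flag, so RegExp.test() carries no state between calls.
const REPRO_HINT = /repro|step|how to|workaround/i

// Keeps comment bodies that mention a reproduction hint, capped at `max` entries.
function selectReproComments(comments: IssueComment[], max = 5): string[] {
  return comments
    .filter((c) => REPRO_HINT.test(c.body))
    .slice(0, max)
    .map((c) => c.body)
}
```

The cap bounds how much comment text is appended to the issue file that the agent reads.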


@@ -26,7 +26,7 @@ on:
default: focused
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.event.issue.number || github.ref }}
group: qa-${{ github.event.pull_request.number || github.event.issue.number || github.ref_name }}
cancel-in-progress: true
jobs:
@@ -53,7 +53,7 @@ jobs:
# Only run on label events if it's one of our labels
if [ "$EVENT_ACTION" = "labeled" ] && \
[ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ]; then
[ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ] && [ "$LABEL" != "Potential Bug" ] && [ "$LABEL" != "verified bug" ]; then
echo "skip=true" >> "$GITHUB_OUTPUT"
fi
@@ -206,7 +206,7 @@ jobs:
- name: Install QA dependencies
run: |
pnpm add -D @google/generative-ai@^0.24.1 @anthropic-ai/claude-agent-sdk@^0.2.85
git clone --depth 1 https://github.com/snomiao/demowright.git /tmp/demowright
git clone --depth 1 --branch feat/show-title-card-api https://github.com/snomiao/demowright.git /tmp/demowright
cd /tmp/demowright && npm install && npm install typescript && npm run build
sed -i 's|"./src/setup.ts"|"./dist/setup.mjs"|' register.cjs
node --input-type=module -e "import{readFileSync,writeFileSync}from'fs';const p=JSON.parse(readFileSync('package.json','utf8'));p.exports['./video-script']={import:'./dist/video-script.mjs',types:'./dist/video-script.d.mts'};p.exports['./setup']={import:'./dist/setup.mjs',types:'./dist/setup.d.mts'};writeFileSync('package.json',JSON.stringify(p,null,2))"
@@ -272,6 +272,12 @@ jobs:
--repo ${{ github.repository }} \
--json title,body,labels --jq '"Labels: \([.labels[].name] | join(", "))\nTitle: \(.title)\n\n\(.body)"' \
> "${{ runner.temp }}/issue-body.txt"
# Append top comments for reproduction context
gh issue view ${{ needs.resolve-matrix.outputs.number }} \
--repo ${{ github.repository }} \
--comments --json comments \
--jq '[.comments[] | select(.authorAssociation != "NONE" or (.body | test("repro|step|how to|workaround"; "i"))) | .body] | limit(5; .[]) // empty' \
>> "${{ runner.temp }}/issue-body.txt" 2>/dev/null || true
echo "Issue body saved ($(wc -c < "${{ runner.temp }}/issue-body.txt") bytes)"
- name: Download QA guide
@@ -392,7 +398,7 @@ jobs:
- name: Install QA dependencies
run: |
pnpm add -D @google/generative-ai@^0.24.1
git clone --depth 1 https://github.com/snomiao/demowright.git /tmp/demowright
git clone --depth 1 --branch feat/show-title-card-api https://github.com/snomiao/demowright.git /tmp/demowright
cd /tmp/demowright && npm install && npm install typescript && npm run build
sed -i 's|"./src/setup.ts"|"./dist/setup.mjs"|' register.cjs
node --input-type=module -e "import{readFileSync,writeFileSync}from'fs';const p=JSON.parse(readFileSync('package.json','utf8'));p.exports['./video-script']={import:'./dist/video-script.mjs',types:'./dist/video-script.d.mts'};p.exports['./setup']={import:'./dist/setup.mjs',types:'./dist/setup.d.mts'};writeFileSync('package.json',JSON.stringify(p,null,2))"