Compare commits

60 Commits

Author SHA1 Message Date
snomiao
05c6e1c0ff fix: use comfyPage.page.waitForTimeout for delay injection
The test uses comfyPageFixture, not bare page. Also match
firstNode await calls for node interaction pauses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 11:42:41 +00:00
snomiao
15fb037a55 feat: inject 800ms pauses between test actions for readable videos
Regex inserts await page.waitForTimeout(800) before every
comfyPage/topbar/page/canvas/expect await call in the Phase 2
test code. Adds ~5-8s to a 10-step test (negligible vs 10min research).

Default playback changed to 0.5x (was 0.25x) since pauses provide
natural breathing room. A 15s video at 0.5x = 30s viewing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 11:27:02 +00:00
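The pause injection described in 15fb037a55 can be sketched as a line filter. This is an illustration only: the pipeline applies its regex inside the TypeScript QA script, `inject_pauses` is a hypothetical name, and the inserted call uses `comfyPage.page.waitForTimeout` as in the follow-up fix 05c6e1c0ff.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: reads Playwright test source on stdin and writes
# the same source with a pause line before each targeted await call.
inject_pauses() {
  awk '
    # An await on one of the targeted objects gets a pause line before it,
    # reusing the same leading indentation.
    /^[ \t]*await (comfyPage|topbar|page|canvas|expect)[.(]/ {
      indent = $0
      sub(/await.*/, "", indent)
      print indent "await comfyPage.page.waitForTimeout(800)"
    }
    { print }
  '
}
```

Ten matched calls add 10 x 800 ms = 8 s of recording time, which matches the upper end of the estimate in the commit message.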
snomiao
dbc12a9d6a feat: default playback 0.25x + cursor overlay in E2E test videos
- Report player defaults to 0.25x speed (was 0.5x) — 5s test videos
  play in 20s, much more watchable
- Phase 2 injects cursor overlay via addInitScript into the test code
  before running — white SVG arrow follows mousemove events

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 10:35:23 +00:00
snomiao
909b75b9d1 fix: set PLAYWRIGHT_LOCAL=1 for Phase 2 to enable video recording
Playwright config only records video when PLAYWRIGHT_LOCAL is set.
In CI, this env var was missing so Phase 2 produced no video.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 13:09:51 +00:00
snomiao
868d774007 fix: don't overwrite Phase 2 test video with idle research video
After context.close(), renameLatestWebm would overwrite the Phase 2
test execution video with the idle research browser recording.
Now skips the rename if qa-session.webm already exists from Phase 2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 12:18:42 +00:00
snomiao
69cd6a7628 fix: Phase 2 records test execution video, copies to qa-session.webm
The old video showed an idle screen (research browser doing nothing).
Now Phase 2 runs the test with --video=on from browser_tests/tests/,
finds the recorded .webm, and copies it to qa-session.webm where
the deploy script expects it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 07:32:22 +00:00
snomiao
9c39635f16 fix: runTest uses project Playwright config + fixtures
- Copy test to browser_tests/tests/ where Playwright config expects it
- System prompt teaches Claude the project's test fixtures:
  comfyPageFixture, comfyPage.menu.topbar, comfyPage.workflow, etc.
- Increased time budget to 10 min for write→run→fix iterations
- Increased max turns to 50

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 06:40:11 +00:00
snomiao
3e957213c8 fix: broaden research-log.json search paths in deploy script
Also search qa-artifacts/before/*/research/ for the research log
since artifacts are downloaded with that nested structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 05:13:32 +00:00
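A minimal sketch of the broadened lookup from 3e957213c8. The nested `qa-artifacts/before/*/research/` location comes from the commit message; the function name `find_research_log` and the top-level candidate path are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: print the first research-log.json found under a
# root directory, checking a flat location first and the nested artifact
# layout second. Returns 1 when nothing is found.
find_research_log() {
  local root="$1" candidate
  for candidate in \
    "$root"/research-log.json \
    "$root"/qa-artifacts/before/*/research/research-log.json; do
    # An unmatched glob stays literal and simply fails the -f test.
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```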
snomiao
f5d99c9c22 feat: Claude writes E2E tests to reproduce bugs instead of driving browser
Phase 1: Claude reads issue + a11y tree, writes a Playwright .spec.ts
test that asserts the bug exists. Runs the test, reads errors, iterates
until the test passes (proving the bug) or determines NOT_REPRODUCIBLE.

Phase 2: Run the passing test with --video=on for clean recording.

This replaces interactive browser driving with deterministic test code.
Claude Sonnet 4.6 excels at writing Playwright tests — much more
reliable than real-time browser interaction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 21:02:36 +00:00
snomiao
09bbd89172 fix: use ariaSnapshot() instead of removed page.accessibility API
page.accessibility.snapshot() was removed in Playwright 1.49+.
Use page.locator('body').ariaSnapshot() which returns a text
representation of the accessibility tree.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 20:50:25 +00:00
snomiao
0b5613246e feat: deploy research-log.json + use it as primary verdict source
- Copy research-log.json to deploy dir (accessible at /research-log.json)
- Read verdict from research log first (a11y-verified ground truth)
- Fall back to video review verdict only if no research log exists
- Research log is uploaded as part of QA artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 19:38:19 +00:00
snomiao
3a27263ca6 feat: three-phase QA pipeline — Research → Reproduce → Report
Phase 1 (qa-agent.ts): Claude investigates via a11y API only.
  - No video, no Gemini vision — only page.accessibility.snapshot()
  - Every action logged with a11y before/after state
  - done() requires evidence citing inspect() results
  - Outputs reproduction plan for Phase 2

Phase 2 (qa-reproduce.ts): Deterministic replay of research plan.
  - Executes each step with a11y assertions
  - Gemini describes visual changes (narration for humans)
  - Clean focused video with subtitles

Phase 3: Report job reads research-log.json for verdict (ground truth),
  narration-log.json for descriptions, video for visuals.
  Gemini formats logs into report — never determines verdict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 18:17:31 +00:00
snomiao
6044452b8f fix: prevent AI lies — assertion-based verdicts + blind reviewer
Agent: MUST use inspect() after every action, verdict based on DOM
state not opinions. "NEVER claim REPRODUCED unless inspect() confirms."

Reviewer: Two-phase prompt — Phase 1 describes what it SEES (blind,
no context). Phase 2 compares observations against issue/PR context.
Anti-hallucination rules: "describe ONLY what you observe, NEVER infer."

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 17:26:48 +00:00
snomiao
c4a243060b feat: Agent SDK auto-detects Claude Code session — no API key needed locally
ANTHROPIC_API_KEY is optional: Agent SDK uses Claude Code OAuth
session when running locally (detects CLAUDE_CODE_SSE_PORT).
In CI, ANTHROPIC_API_KEY from secrets is used.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 15:07:57 +00:00
snomiao
3690e98c79 refactor: require ANTHROPIC_API_KEY, remove Gemini-only fallback
The Gemini-only agentic loop had ~47% success rate — too low to be
useful as a fallback. Now ANTHROPIC_API_KEY is required for issue
reproduction. Fails clearly if missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 15:01:51 +00:00
snomiao
1c40893cfa fix: inject cursor overlay via addScriptTag after login, not addInitScript
addInitScript runs before page load — Vue's app mount destroys the
cursor div when it takes over the DOM. Using addScriptTag after login
ensures the cursor persists in the stable DOM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 13:30:37 +00:00
snomiao
72a28a1e76 fix: cursor overlay on locator clicks (clickByText, menu items)
Locator.click/hover bypasses our page.mouse monkey-patch. Now
clickByText, hoverMenuItem, clickSubmenuItem get the element
bounding box and update cursor overlay manually.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 11:58:01 +00:00
snomiao
49c95248e5 fix: verdict JSON grep pattern — capture value without closing quote
The grep \{"verdict":\s*"[^"]+ captures up to but not including the
closing quote. The second grep for "[A-Z_]+"$ then fails because
there's no closing quote. Fixed: match "verdict":\s*"[A-Z_]+ then
extract [A-Z_]+$ (no quotes needed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 10:51:35 +00:00
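The two-step extraction fixed in 49c95248e5 can be sketched as follows. The function name `extract_verdict` is hypothetical, and `[[:space:]]` stands in for `\s` so the pattern does not rely on a GNU grep extension.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: read report text on stdin and print the verdict
# value, or nothing when no verdict field is present.
extract_verdict() {
  # First grep keeps the text from "verdict" through the value, without the
  # closing quote. Second grep keeps only the trailing capitals/underscores,
  # so no closing quote is needed.
  grep -oE '"verdict":[[:space:]]*"[A-Z_]+' | grep -oE '[A-Z_]+$' | head -n 1 || true
}
```

Usage: `printf '%s\n' '{"verdict": "REPRODUCED", "risk": "low"}' | extract_verdict`.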
snomiao
6bb9d18ca6 fix: grep pipefail crash + add QA troubleshooting doc
- Add || true to all grep pipelines in deploy script (grep returns 1
  on no match, pipefail kills script)
- Add docs/qa/TROUBLESHOOTING.md covering all failures encountered:
  __name errors, zod/v4 imports, model IDs, badge mismatches, cursor,
  loadDefaultWorkflow, pressKey timing, agent behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 10:17:36 +00:00
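The crash fixed in 6bb9d18ca6 comes from grep exiting with status 1 when nothing matches, which `set -e` together with `pipefail` treats as fatal. A minimal sketch of the `|| true` guard, using a hypothetical helper `first_match` that runs in a subshell so the strict flags stay local:

```bash
#!/usr/bin/env bash
# Sketch under assumptions: print "match=<first hit>" for a pattern in a file,
# or "match=" when the pattern does not occur.
first_match() {
  local pattern="$1" file="$2"
  (
    set -eo pipefail
    # Without "|| true", a no-match grep (exit 1) would fail the pipeline
    # and abort this subshell before the echo below runs.
    hit=$(grep -oE "$pattern" "$file" | head -n 1 || true)
    echo "match=${hit}"
  )
}
```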
snomiao
ca49e9cb1b feat: structured JSON verdict from AI reviewer, light-first theme
- Video review prompt now requests a ## Verdict JSON block:
  {"verdict": "REPRODUCED|NOT_REPRODUCIBLE|INCONCLUSIVE", "risk": "low|medium|high"}
- Deploy script reads JSON verdict first, falls back to grep
- Eliminates all regex-matching false positives permanently
- Theme: light mode is default, dark via prefers-color-scheme:dark
- Cards use solid backgrounds, grain overlay only in dark mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 09:11:09 +00:00
snomiao
db538c9d76 feat: report site follows system light/dark theme
Add prefers-color-scheme:light media query with light palette.
Replace hardcoded dark oklch values with CSS variables.
Light mode: white surfaces, dark text, subtle borders, no grain overlay.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 08:53:08 +00:00
snomiao
c04b31a0f1 fix: loadDefaultWorkflow uses API instead of menu, pressKey uses instant press
- loadDefaultWorkflow now calls app.resetToDefaultWorkflow() via JS API
  instead of navigating File → Load Default menu (menu item name varies)
- pressKey reverted to instant press() — the 400ms hold via down/up
  prevented Escape from propagating to parent dialog (#10397 BEFORE video
  showed wrong behavior because hold intercepted the event)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 08:45:14 +00:00
snomiao
5938fdef8d fix: monkey-patch page.mouse for universal cursor overlay
Instead of manually calling moveCursorOverlay in each action,
patch page.mouse.move/click/dblclick/down/up globally. Now EVERY
mouse operation shows the cursor — text clicks, menu hovers, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:47:56 +00:00
snomiao
27d55f093b fix: use gemini-3-flash-preview in hybrid agent (not 2.5 preview)
Gemini 2.5 preview models return 404. Always use gemini-3+ models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:45:29 +00:00
snomiao
c1ddb8669e fix: add 'could not be confirmed' to negative verdict patterns
"could not be confirmed" contains "confirmed" which matched the
positive reproduc|confirm check. Now caught by the negative check first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:17:02 +00:00
snomiao
57dbf0132d fix: correct Claude model ID — claude-sonnet-4-6 (not dated suffix)
The Agent SDK returned "model not found" for claude-sonnet-4-6-20250514.
Correct ID is claude-sonnet-4-6.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 14:36:22 +00:00
snomiao
b4c49588dc fix: cursor overlay now controlled via __moveCursor, not DOM events
Headless Chrome's Playwright CDP doesn't trigger DOM mousemove events
reliably. Now executeAction calls __moveCursor(x,y) directly after
every mouse.move/click/drag. Cursor is an SVG arrow (white + outline).
Click state shown via scale animation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 13:25:12 +00:00
snomiao
00dc10e9e6 feat: badge label includes QA date — #10397 QA0327
Shows when the QA was run so stale results are obvious at a glance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 08:43:19 +00:00
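A small sketch of the dated label from 00dc10e9e6. The `MMDD` layout is inferred from the example `QA0327` and the commit date, and both the function name `badge_label` and the optional stamp argument are assumptions.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: build a badge label such as "#10397 QA0327".
# The second argument overrides the date stamp; by default it is today's
# month and day in UTC.
badge_label() {
  local num="$1"
  local stamp="${2:-$(date -u +%m%d)}"
  echo "#${num} QA${stamp}"
}
```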
snomiao
48414fa1b5 fix: make key presses visible in video — hold + subtitle
pressKey now uses keyboard.down/up with 400ms hold instead of
instant press(). Shows subtitle "⌨ Escape" and the keyboard HUD
catches the held state for video frame capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 08:33:22 +00:00
snomiao
e744f101b0 fix: use zod instead of zod/v4 — project zod doesn't export /v4 subpath
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:51:37 +00:00
snomiao
9ad8267067 fix: add claude-agent-sdk to workspace catalog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:26:54 +00:00
snomiao
cf54ddb6d3 feat: control/test comparison strategy + QA backlog doc
- Agent system prompt now instructs Claude to demonstrate BOTH working
  (control) and broken (test) states when bug is triggered by a setting
- Added docs/qa/backlog.md with future improvements: Type B/C comparisons,
  TTS, pre-seeding, cost optimization, environment-dependent issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:24:10 +00:00
snomiao
fce31cf0bf feat: all badges use vertical box style
Drop horizontal badges. Universal box badge shows:
  ┌──────────────────┐
  │    #7414 QA      │
  │ ✓ 1 reproduced   │
  │ ⚙ Fix: APPROVED  │  ← only for PRs
  └──────────────────┘

Issues show repro/not-repro/inconclusive rows.
PRs add a fix quality row (APPROVED/MINOR/MAJOR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:20:05 +00:00
snomiao
050091abc6 feat: show QA pipeline commit hash + timing on report site
- Shows "QA @ abc1234" linking to the pipeline code commit
- Shows start time → deploy time in header
- Helps trace which version of QA scripts generated each report

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:16:41 +00:00
snomiao
548e37b9a5 feat: hybrid QA agent — Claude Sonnet 4.6 brain + Gemini vision
Architecture:
- Claude Sonnet 4.6 plans and reasons (via Claude Agent SDK)
- Gemini 2.5 Flash watches video buffer and describes what it sees
- 4 tools: observe(), inspect(), perform(), done()

observe(seconds, focus): builds video clip from screenshot buffer,
  sends to Gemini with Claude's focused question.
inspect(selector): searches a11y tree for specific element state.
perform(action, params): executes Playwright action.
done(verdict, summary): signals completion.

Falls back to Gemini-only loop if ANTHROPIC_API_KEY not set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 06:27:30 +00:00
snomiao
458b2e918c feat: pass OPENAI_API_KEY to recording step for TTS narration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 04:53:13 +00:00
snomiao
63dbe002d1 feat: subtitle overlay + OpenAI TTS narration on reproduce videos
- Agent reasoning shown as subtitle bar at bottom of video during recording
- After recording, generates TTS audio via OpenAI API (tts-1, nova voice)
- Merges audio clips at correct timestamps into the video with ffmpeg
- Requires OPENAI_API_KEY env var; gracefully skips if not set
- No-sandbox + disable-dev-shm-usage for headless Chrome compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 04:42:03 +00:00
snomiao
f3d9a8c2e4 feat: show test requirements from QA guide on report site
- Download QA guide artifact in report job
- Extract prerequisites, test focus, and steps from guide JSON
- Display below the purpose description: focus → prerequisites → steps
- Separated by a subtle divider with smaller font

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 03:28:03 +00:00
snomiao
831718bd50 feat: purpose description on report + multi-pass video link fix
- Report site shows "PR #N aims to..." or "Issue #N reports..." block
  above the video cards, extracted from pr-context.txt
- Multi-pass video links fall back to pass1 when qa-{os}.mp4 is 404
- More negative verdict patterns: "does not demonstrate", "never tested"
- Risk uses first word of Overall Risk (avoids "high confidence" match)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 03:23:50 +00:00
snomiao
7d3ddbf619 fix: verdict detection — more negative patterns, risk uses first word
- Add "does not demonstrate", "steps were not performed", "never tested"
  to NOT_REPRO patterns (fixes #9101 false positive)
- Risk detection uses first word of Overall Risk section instead of
  grepping entire text (fixes "high confidence" matching HIGH)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 02:23:18 +00:00
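The risk detection described in 7d3ddbf619 reads only the first word under the Overall Risk heading, so a phrase like "high confidence" elsewhere in the report cannot match. A sketch under assumptions: the function name `risk_first_word` is hypothetical, and the report is assumed to use a `## Overall Risk` markdown heading.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: print the first word of the "## Overall Risk"
# section of a markdown report, upper-cased, with markdown bold markers and
# punctuation stripped. Prints nothing if the section is missing or empty.
risk_first_word() {
  awk '
    /^## Overall Risk/ { in_section = 1; next }
    in_section && /^## / { exit }
    in_section && NF {
      word = toupper($1)
      gsub(/[^A-Z]/, "", word)
      print word
      exit
    }
  ' "$1"
}
```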
snomiao
a42240cb65 fix: use addScriptTag for keyboard HUD to avoid tsx __name issue
tsx compiles arrow functions with __name helpers that don't exist in
browser context. Using addScriptTag with plain JS string avoids this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 02:03:34 +00:00
snomiao
c957c0833b fix: remove TS type annotation from page.evaluate (browser context)
Set<string>() in page.evaluate causes __name ReferenceError in browser.
Use untyped Set() since browser JS doesn't support TS generics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:55:46 +00:00
snomiao
e01d4aaffc debug: add verdict count logging to deploy script
2026-03-27 01:54:14 +00:00
snomiao
3f226467cd fix: check negative verdicts before positive in per-report classification
"fails to reproduce" contains "reproduce" — must check negatives first
within each report. Across reports, REPRODUCED still wins (multi-pass).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:41:06 +00:00
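The ordering fix in 3f226467cd can be sketched as below. The negative phrases and the positive `reproduc|confirm` pattern are taken from the commit messages on this page; the function name `classify_report` and the INCONCLUSIVE default are assumptions.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: classify one report text. Negative phrases are
# tested first because several of them contain a positive keyword
# ("fails to reproduce" contains "reproduce").
classify_report() {
  local text="$1"
  local negative='fails to reproduce|could not be confirmed|does not demonstrate|steps were not performed|never tested'
  if grep -qiE "$negative" <<<"$text"; then
    echo "NOT_REPRODUCIBLE"
  elif grep -qiE 'reproduc|confirm' <<<"$text"; then
    echo "REPRODUCED"
  else
    echo "INCONCLUSIVE"
  fi
}
```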
snomiao
d4d8772ae7 feat: keyboard HUD overlay shows pressed keys in video
Injects a persistent overlay in bottom-right corner that displays
currently held keys (e.g. "⌨ Space", "⌨ CTRL+C"). Makes keyboard
interactions visible in the recording for both human and AI reviewers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:38:44 +00:00
snomiao
b578f8d7c4 feat: vertical box badge for multi-pass with breakdown
Multi-pass issues show a stacked box badge:
  ┌───────────────────┐
  │     #7806 QA      │
  │ ✓ 1 reproduced    │
  │ ⚠ 1 inconclusive  │
  └───────────────────┘

Single-pass issues keep the standard horizontal badge.
Badge colors: blue=reproduced, gray=not-repro, yellow=inconclusive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:49:19 +00:00
snomiao
d5360ce45c feat: show pass counts in badge for multi-pass reports (X/Y REPRODUCED)
When multiple report files exist, badge shows "2/3 REPRODUCED" instead
of just "REPRODUCED". Single-pass issues still show plain verdict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:45:16 +00:00
snomiao
0e6d9fd926 fix: REPRODUCED wins over INCONCLUSIVE in multi-pass badge
When multiple passes exist and one confirms while another is
inconclusive, the badge should show REPRODUCED. Previously
INCONCLUSIVE was checked first, hiding successful reproductions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:36:59 +00:00
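The multi-pass precedence from 0e6d9fd926 can be sketched as a fold over per-pass verdicts. Only "REPRODUCED wins over INCONCLUSIVE" is stated in the commit; ranking INCONCLUSIVE above NOT_REPRODUCIBLE, the empty-input result, and the name `aggregate_verdict` are assumptions.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: combine per-pass verdicts into one badge verdict.
# Any REPRODUCED pass wins; otherwise any INCONCLUSIVE pass wins; otherwise
# the result is NOT_REPRODUCIBLE.
aggregate_verdict() {
  local v seen_repro=0 seen_inconclusive=0
  for v in "$@"; do
    case "$v" in
      REPRODUCED) seen_repro=1 ;;
      INCONCLUSIVE) seen_inconclusive=1 ;;
    esac
  done
  if [ "$seen_repro" -eq 1 ]; then
    echo "REPRODUCED"
  elif [ "$seen_inconclusive" -eq 1 ]; then
    echo "INCONCLUSIVE"
  else
    echo "NOT_REPRODUCIBLE"
  fi
}
```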
snomiao
de810f88a4 fix: cloneNode uses Ctrl+C/V instead of right-click Clone menu
The "Clone" context menu item doesn't exist in Nodes 2.0 mode.
Using Ctrl+C/Ctrl+V works in both legacy and Nodes 2.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:22:31 +00:00
snomiao
722b01a253 fix: preflight performs actual repro steps, not just setup
- #10307: preflight clones KSampler node, hint says drag to overlap
- #7414: preflight clicks numeric widget, hint says drag to change value
- #7806: preflight takes baseline screenshot, hint gives exact coords
  for holdKeyAndDrag with spacebar
- Hints now reference "Preflight already did X, NOW do Y" pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:07:42 +00:00
snomiao
dfd19a3cf9 fix: tell agent what preflight already did to prevent repeated actions
Agent was wasting turns re-doing loadDefaultWorkflow and setSetting
that preflight already executed. Now the system prompt includes
"Already Done" section listing preflight actions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:40:48 +00:00
snomiao
3531e37ae7 fix: preflight actions + badge false-positive pattern
- Auto-execute prerequisite actions (enable Nodes 2.0, load default
  workflow) BEFORE the agentic loop starts. Agent model ignores prompt
  hints but preflight guarantees nodes are on canvas.
- Add "fails to reproduce" to NOT REPRODUCIBLE badge patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:25:41 +00:00
snomiao
024b231c05 feat: qa-issue label trigger + labels in issue context
- Add issues:[labeled] trigger and qa-issue label support
- Resolve github.event.issue.number for issue-triggered runs
- Include issue labels in context (feeds keyword matcher for hints)
- Remove qa-issue label after run completes (same as qa-changes/qa-full)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:49:00 +00:00
snomiao
0c22369c60 fix: keyword-driven action hints for agent issue reproduction
Scan issue context for keywords (clone, copy-paste, spacebar, resize,
sidebar, scroll, middle-click, node shape, Nodes 2.0, etc.) and inject
specific MUST-follow action steps into the agentic system prompt.

Addresses 9 INCONCLUSIVE issues where agent had actions available
but didn't know when to use them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:43:31 +00:00
snomiao
511fdf1b24 fix: restyle QA annotations to avoid misleading AI reviewer
- Annotations now use cyan dashed border + monospace "QA:" prefix
  instead of red solid labels that look like UI error messages
- Video review prompts explicitly tell reviewer to ignore QA annotations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:34:51 +00:00
snomiao
d756c362e3 fix: badge mismatch, multi-pass report overwrite, agent node creation
P1: Filter out QA bot's own comments from pr-context (INCONCLUSIVE loop)
P2: Grep only ## Summary section for verdict (false REPRODUCED fix)
P3: Strip markdown bold before matching Overall Risk section
P4: Deploy full placeholder page with spinner during CI
P5: Pass #NUM QA label to PREPARING/ANALYZING badges
P6: Add copyPaste, holdKeyAndDrag, resizeNode, middleClick actions
P7: preload=auto + custom seekbar (already deployed)
P8: Deploy FAILED badge on report job failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:02:17 +00:00
snomiao
ba512fd263 fix: video seeking — preload=auto, custom seekbar, _headers
- Change preload=metadata to preload=auto for full video download
- Add _headers file with Accept-Ranges for Cloudflare Pages
- Add custom seekbar (range input + buffer indicator) that works
  even without server HTTP range request support
- Seekbar shows buffered progress and allows dragging to any point

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:20:56 +00:00
snomiao
e1fb782832 fix: strengthen after-mode prompt to test PR-specific behavior
Previous prompt said "test the specific behavior" which was too vague,
leading to generic UI walkthroughs instead of targeted tests.

New prompt: explicitly instructs to read the diff, trigger the exact
scenario the PR fixes, and avoid generic menu screenshots.

Also added reload action to before/after prompt for state persistence tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:58:02 +09:00
snomiao
9c347642ba fix: badge mismatch, multi-pass report overwrite, agent node creation
- Fix quality badge now reads "## Overall Risk" section only
- Prevents false MAJOR ISSUES from severity labels or negated phrases
- "Low" risk → APPROVED, "High" → MAJOR ISSUES, "Medium" → MINOR ISSUES

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 03:47:54 +00:00
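The risk-to-badge mapping listed in 9c347642ba, written out as a lookup. The three mappings come from the commit message; the function name `fix_badge` and the UNKNOWN fallback are assumptions.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: map an Overall Risk word (any letter case) to the
# fix quality text shown on the badge.
fix_badge() {
  local risk
  risk=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$risk" in
    low) echo "APPROVED" ;;
    medium) echo "MINOR ISSUES" ;;
    high) echo "MAJOR ISSUES" ;;
    *) echo "UNKNOWN" ;;
  esac
}
```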
snomiao
5f8f40b559 fix: install pnpm before building PR frontend in sno-qa-* triggers
setup-frontend must run first to install node/pnpm, then rebuild
with PR code. Also re-install sno-skills deps after switching back
so QA scripts' dependencies are available.

Also gitignore .claude/scheduled_tasks.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:17:26 +09:00
14 changed files with 1892 additions and 245 deletions

@@ -16,6 +16,8 @@ on:
pull_request:
types: [labeled]
branches: [main]
issues:
types: [labeled]
workflow_dispatch:
inputs:
mode:
@@ -53,8 +55,8 @@ jobs:
# Only run on label events if it's one of our labels
if [ "$EVENT_ACTION" = "labeled" ] && \
[ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ]; then
echo "skip=true" >> "$GITHUB_OUTPUT"
[ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ]; then
echo "skip=true" >> "$GITHUB_OUTPUT"
fi
# Full QA triggers
@@ -80,10 +82,13 @@ jobs:
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_NUM: ${{ github.event.pull_request.number }}
ISSUE_NUM: ${{ github.event.issue.number }}
BRANCH: ${{ github.ref_name }}
REPO: ${{ github.repository }}
run: |
if [ -n "$PR_NUM" ]; then
if [ -n "$ISSUE_NUM" ]; then
NUM="$ISSUE_NUM"
elif [ -n "$PR_NUM" ]; then
NUM="$PR_NUM"
else
NUM=$(gh pr list --repo "$REPO" \
@@ -244,7 +249,7 @@ jobs:
run: |
gh issue view ${{ needs.resolve-matrix.outputs.number }} \
--repo ${{ github.repository }} \
--json title,body --jq '.title + "\n\n" + .body' \
--json title,body,labels --jq '"Labels: \([.labels[].name] | join(", "))\nTitle: \(.title)\n\n\(.body)"' \
> "${{ runner.temp }}/issue-body.txt"
echo "Issue body saved ($(wc -c < "${{ runner.temp }}/issue-body.txt") bytes)"
@@ -284,6 +289,8 @@ jobs:
shell: bash
env:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
TARGET_TYPE: ${{ needs.resolve-matrix.outputs.target_type }}
run: |
MODE="before"
@@ -356,10 +363,15 @@ jobs:
ref: ${{ github.head_ref || github.ref }}
token: ${{ secrets.GITHUB_TOKEN }}
# Always run setup-frontend first to install node/pnpm
- name: Setup frontend
uses: ./.github/actions/setup-frontend
with:
include_build_step: true
# When triggered via sno-qa-* push, the checkout above gets sno-skills
# (the scripts branch), not the actual PR. Build the PR frontend in a
# worktree so the QA scripts from sno-skills remain available.
- name: Build PR frontend for sno-qa-* triggers
# (the scripts branch), not the actual PR. Rebuild with PR code.
- name: Rebuild with PR frontend for sno-qa-* triggers
if: >-
!github.head_ref &&
needs.resolve-matrix.outputs.target_type == 'pr' &&
@@ -368,31 +380,20 @@ jobs:
env:
PR_NUM: ${{ needs.resolve-matrix.outputs.number }}
run: |
# Save the sno-skills ref for later
SNO_REF=$(git rev-parse HEAD)
# Fetch and checkout the PR to build its frontend
git fetch origin "refs/pull/${PR_NUM}/head"
git checkout FETCH_HEAD
echo "Building PR #${PR_NUM} frontend at $(git rev-parse --short HEAD)"
# Install and build the PR frontend
pnpm install --frozen-lockfile || pnpm install
pnpm build
# Switch back to sno-skills so QA scripts are available
git checkout "$SNO_REF"
pnpm install --frozen-lockfile || pnpm install
echo "Restored sno-skills scripts at $(git rev-parse --short HEAD)"
- name: Setup frontend (PR branch, non-sno-qa triggers)
if: >-
github.head_ref ||
needs.resolve-matrix.outputs.target_type != 'pr' ||
!needs.resolve-matrix.outputs.number
uses: ./.github/actions/setup-frontend
with:
include_build_step: true
- name: Setup ComfyUI server (no launch)
uses: ./.github/actions/setup-comfyui-server
with:
@@ -543,22 +544,39 @@ jobs:
BADGESCRIPT
chmod +x /tmp/gen-badge.sh
# Create badge deploy script
cat > /tmp/deploy-badge.sh <<DEPLOYSCRIPT
# Create badge deploy script — deploys badge + placeholder status page
cat > /tmp/deploy-badge.sh <<'DEPLOYBADGE'
#!/bin/bash
# Usage: deploy-badge.sh <status> [color]
STATUS="\$1"
COLOR="\${2:-#555}"
DIR=\$(mktemp -d)
/tmp/gen-badge.sh "\$STATUS" "\$COLOR" "\$DIR/badge.svg"
# Also create a minimal redirect page
echo '<!DOCTYPE html><html><head><meta http-equiv="refresh" content="0;url=badge.svg"></head></html>' > "\$DIR/index.html"
# Usage: deploy-badge.sh <status> <color> [label] [run_url]
STATUS="$1" COLOR="${2:-#555}" LABEL="${3:-QA}" RUN_URL="$4"
DIR=$(mktemp -d)
/tmp/gen-badge.sh "$STATUS" "$COLOR" "$DIR/badge.svg" "$LABEL"
RUN_LINK=""
[ -n "$RUN_URL" ] && RUN_LINK="<a href=\"${RUN_URL}\" style=\"color:#7c8aff;text-decoration:none;font-size:.8rem\">View CI run &rarr;</a>"
cat > "$DIR/index.html" <<PAGEEOF
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1">
<title>${LABEL} — ${STATUS}</title>
<meta http-equiv="refresh" content="30">
<style>:root{--bg:#0d0f14;--fg:#e8e8ec;--muted:#8b8fa3;--primary:#7c8aff}*{margin:0;padding:0;box-sizing:border-box}body{background:var(--bg);color:var(--fg);font-family:system-ui,sans-serif;display:flex;align-items:center;justify-content:center;min-height:100vh;text-align:center}
.wrap{max-width:420px;padding:2rem}.badge{margin:1.5rem 0}.status{font-size:1.5rem;font-weight:700;letter-spacing:-.02em;margin:.5rem 0}
.hint{color:var(--muted);font-size:.85rem;line-height:1.6;margin-top:1rem}
@keyframes pulse{0%,100%{opacity:1}50%{opacity:.4}}.dot{display:inline-block;width:8px;height:8px;border-radius:50%;background:var(--primary);animation:pulse 1.5s ease-in-out infinite;margin-right:.5rem;vertical-align:middle}
</style></head><body><div class=wrap>
<div class=badge><img src=badge.svg alt="${LABEL}: ${STATUS}"></div>
<p class=status><span class=dot></span>${STATUS}</p>
<p class=hint>QA pipeline is running. This page auto-refreshes every 30 seconds.<br>Results will appear here when analysis is complete.</p>
<p style="margin-top:1rem">${RUN_LINK}</p>
</div></body></html>
PAGEEOF
DEPLOYBADGE
# Append the wrangler deploy (uses outer BRANCH variable)
cat >> /tmp/deploy-badge.sh <<DEPLOYWRANGLER
wrangler pages deploy "\$DIR" \
--project-name="comfy-qa" \
--branch="${BRANCH}" 2>&1 | tail -3
rm -rf "\$DIR"
echo "Badge deployed: ${STATUS}"
DEPLOYSCRIPT
echo "Deployed: \${STATUS}"
DEPLOYWRANGLER
chmod +x /tmp/deploy-badge.sh
- name: Setup dual badge generator
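The hunk above splits the generated script across a quoted heredoc (`<<'DEPLOYBADGE'`) and an unquoted one (`<<DEPLOYWRANGLER`). A minimal sketch of the difference, with hypothetical file names and a hypothetical `write_demo_scripts` helper:

```bash
#!/usr/bin/env bash
# Sketch under assumptions: write two tiny scripts into a directory to show
# how heredoc quoting changes expansion.
write_demo_scripts() {
  local dir="$1" BRANCH="demo-branch"
  # Quoted delimiter: nothing expands at write time, so $1 reaches the
  # generated script untouched and needs no backslash.
  cat > "$dir/quoted.sh" <<'EOF'
echo "status=$1"
EOF
  # Unquoted delimiter: ${BRANCH} expands at write time, while \$1 must be
  # escaped to survive into the generated script.
  cat > "$dir/unquoted.sh" <<EOF
echo "branch=${BRANCH} status=\$1"
EOF
}
```

Keeping the runtime-only part in the quoted heredoc and the part that needs the outer `BRANCH` value in the unquoted one limits backslash escaping to the second part.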
@@ -596,11 +614,45 @@ jobs:
DUALBADGE
chmod +x /tmp/gen-badge-dual.sh
- name: Deploy badge — PREPARING
env:
CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
run: /tmp/deploy-badge.sh "PREPARING" "#2196f3"
# Universal vertical box badge — used for all badges (issues + PRs)
cat > /tmp/gen-badge-box.sh <<'BOXBADGE'
#!/bin/bash
# Usage: gen-badge-box.sh <output-path> <label> <repro> <not_repro> <fail> <total> [fix_result] [fix_color]
OUT="$1" LABEL="$2" REPRO="$3" NOREPRO="$4" FAIL="$5" TOTAL="$6"
FIX_RESULT="${7:-}" FIX_COLOR="${8:-#4c1}"
W=160
ROW=18
HEADER=22
ROWS=0
[ "$REPRO" -gt 0 ] 2>/dev/null && ROWS=$((ROWS+1))
[ "$NOREPRO" -gt 0 ] 2>/dev/null && ROWS=$((ROWS+1))
[ "$FAIL" -gt 0 ] 2>/dev/null && ROWS=$((ROWS+1))
[ "$ROWS" -eq 0 ] && ROWS=1
# Add fix quality row for PRs
[ -n "$FIX_RESULT" ] && ROWS=$((ROWS+1))
H=$((HEADER + ROWS * ROW + 4))
Y=$((HEADER + 2))
cat > "$OUT" <<SVGEOF
<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}" role="img" aria-label="${LABEL}">
<title>${LABEL}: ${REPRO} reproduced, ${NOREPRO} not-repro, ${FAIL} inconclusive / ${TOTAL}${FIX_RESULT:+ | Fix: ${FIX_RESULT}}</title>
<rect width="${W}" height="${H}" rx="4" fill="#2a2d35"/>
<rect width="${W}" height="${HEADER}" rx="4" fill="#555"/>
<rect y="$((HEADER-4))" width="${W}" height="4" fill="#555"/>
<text x="$((W/2))" y="15" fill="#fff" text-anchor="middle" font-family="Verdana,sans-serif" font-size="11" font-weight="bold">${LABEL}</text>
SVGEOF
add_row() {
local icon="$1" text="$2" color="$3"
echo " <rect x='4' y='${Y}' width='$((W-8))' height='$((ROW-2))' rx='3' fill='${color}' opacity='.15'/>" >> "$OUT"
echo " <text x='10' y='$((Y+13))' fill='${color}' font-family='Verdana,sans-serif' font-size='11'>${icon} ${text}</text>" >> "$OUT"
Y=$((Y+ROW))
}
[ "$REPRO" -gt 0 ] 2>/dev/null && add_row "✓" "${REPRO} reproduced" "#58a6ff"
[ "$NOREPRO" -gt 0 ] 2>/dev/null && add_row "✗" "${NOREPRO} not reproducible" "#8b949e"
[ "$FAIL" -gt 0 ] 2>/dev/null && add_row "⚠" "${FAIL} inconclusive" "#d29922"
[ -n "$FIX_RESULT" ] && add_row "⚙" "Fix: ${FIX_RESULT}" "$FIX_COLOR"
echo "</svg>" >> "$OUT"
BOXBADGE
chmod +x /tmp/gen-badge-box.sh
- name: Resolve target number and type
id: pr
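The SVG height computed in `gen-badge-box.sh` above follows `H = HEADER + ROWS * ROW + 4` with `HEADER=22` and `ROW=18`. A worked sketch of the same arithmetic; `badge_height` is a hypothetical name, not part of the workflow.

```bash
#!/usr/bin/env bash
# Sketch under assumptions: reproduce the row counting and height formula of
# the box badge. One row per non-zero count, at least one row, plus one row
# when a fix result is given.
badge_height() {
  local repro="$1" norepro="$2" fail="$3" fix="${4:-}"
  local header=22 row=18 rows=0
  if [ "$repro" -gt 0 ] 2>/dev/null; then rows=$((rows + 1)); fi
  if [ "$norepro" -gt 0 ] 2>/dev/null; then rows=$((rows + 1)); fi
  if [ "$fail" -gt 0 ] 2>/dev/null; then rows=$((rows + 1)); fi
  if [ "$rows" -eq 0 ]; then rows=1; fi
  if [ -n "$fix" ]; then rows=$((rows + 1)); fi
  echo $((header + rows * row + 4))
}
```

For one reproduced pass and no fix row this gives 22 + 1 * 18 + 4 = 44; adding an inconclusive pass and a fix row gives 22 + 3 * 18 + 4 = 80.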
@@ -630,12 +682,32 @@ jobs:
fi
fi
# Badge label with target number
LABEL="QA"
[ -n "$NUM" ] && LABEL="#${NUM} QA"
echo "badge_label=${LABEL}" >> "$GITHUB_OUTPUT"
- name: Deploy placeholder page — PREPARING
env:
CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
BADGE_LABEL: ${{ steps.pr.outputs.badge_label || 'QA' }}
RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: /tmp/deploy-badge.sh "PREPARING" "#2196f3" "$BADGE_LABEL" "$RUN_URL"
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Setup frontend
uses: ./.github/actions/setup-frontend
- name: Download QA guides
continue-on-error: true
uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
with:
name: qa-guides-${{ github.run_id }}
path: qa-guides
- name: Download BEFORE artifacts
if: needs.qa-before.result == 'success'
uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
@@ -732,7 +804,7 @@ jobs:
env:
CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
run: /tmp/deploy-badge.sh "ANALYZING" "#ff9800"
run: /tmp/deploy-badge.sh "ANALYZING" "#ff9800" "${{ steps.pr.outputs.badge_label || 'QA' }}" "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
- name: Build context for video review
env:
@@ -750,11 +822,12 @@ jobs:
{
echo "### Issue #${TARGET_NUM}"
gh issue view "$TARGET_NUM" --repo "$REPO" \
--json title,body --jq '"Title: \(.title)\n\nDescription:\n\(.body)"' 2>/dev/null || true
--json title,body,labels --jq '"Labels: \([.labels[].name] | join(", "))\nTitle: \(.title)\n\nDescription:\n\(.body)"' 2>/dev/null || true
echo ""
echo "### Comments"
# Filter out QA bot comments to prevent INCONCLUSIVE feedback loop
gh api "repos/${REPO}/issues/${TARGET_NUM}/comments" \
--jq '.[].body' 2>/dev/null | head -200 || true
--jq '.[] | select(.user.login != "github-actions[bot]") | .body' 2>/dev/null | head -200 || true
echo ""
echo "This video attempts to reproduce a reported bug on the main branch."
} > pr-context.txt
@@ -883,6 +956,8 @@ jobs:
TARGET_TYPE: ${{ steps.pr.outputs.target_type }}
REPO: ${{ github.repository }}
RUN_ID: ${{ github.run_id }}
PIPELINE_SHA: ${{ github.sha }}
RUN_START_TIME: ${{ github.event.head_commit.timestamp || github.event.pull_request.updated_at || '' }}
run: bash scripts/qa-deploy-pages.sh
- name: Post unified QA comment
@@ -917,6 +992,8 @@ jobs:
for os in Linux macOS Windows; do
GIF_URL="${VIDEO_BASE}/qa-${os}-thumb.gif"
VID_URL="${VIDEO_BASE}/qa-${os}.mp4"
# Fallback to pass1 for multi-pass recordings
curl -sf --head "$VID_URL" >/dev/null 2>&1 || VID_URL="${VIDEO_BASE}/qa-${os}-pass1.mp4"
if curl -sf --head "$VID_URL" >/dev/null 2>&1; then
if curl -sf --head "$GIF_URL" >/dev/null 2>&1; then
VIDEO_SECTION="${VIDEO_SECTION}[![${os} QA](${GIF_URL})](${VID_URL})"$'\n'
@@ -1000,11 +1077,24 @@ jobs:
- name: Remove QA label
if: >-
github.event.label.name == 'qa-changes' ||
github.event.label.name == 'qa-full'
github.event.label.name == 'qa-full' ||
github.event.label.name == 'qa-issue'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
LABEL_NAME: ${{ github.event.label.name }}
PR_NUMBER: ${{ steps.pr.outputs.number }}
TARGET_NUM: ${{ steps.pr.outputs.number }}
TARGET_TYPE: ${{ steps.pr.outputs.target_type }}
REPO: ${{ github.repository }}
run: |
[ -n "$PR_NUMBER" ] && gh pr edit "$PR_NUMBER" --repo "$REPO" --remove-label "$LABEL_NAME"
if [ "$TARGET_TYPE" = "issue" ]; then
[ -n "$TARGET_NUM" ] && gh issue edit "$TARGET_NUM" --repo "$REPO" --remove-label "$LABEL_NAME" || true
else
[ -n "$TARGET_NUM" ] && gh pr edit "$TARGET_NUM" --repo "$REPO" --remove-label "$LABEL_NAME" || true
fi
- name: Deploy FAILED badge on error
if: failure()
env:
CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
run: /tmp/deploy-badge.sh "FAILED" "#e05d44" "${{ steps.pr.outputs.badge_label || 'QA' }}" "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

1
.gitignore vendored

@@ -102,3 +102,4 @@ vitest.config.*.timestamp*
.amp
.playwright-cli/
.playwright/
.claude/scheduled_tasks.lock


@@ -1,54 +0,0 @@
import { expect } from '@playwright/test'
import { comfyPageFixture as test } from '../fixtures/ComfyPage'
test.describe('Subgraph Promoted Widgets', { tag: '@ui' }, () => {
test.beforeEach(async ({ comfyPage }) => {
await comfyPage.workflow.setupWorkflowsDirectory({})
await comfyPage.settings.setSetting('Comfy.UseNewMenu', 'Disabled')
})
test('promoted DOM widgets remain visible on host when inner node is collapsed', async ({
comfyPage
}) => {
await comfyPage.workflow.loadWorkflow('default')
// Convert a node with a DOM widget (CLIPTextEncode) to a subgraph
const clipNode = await comfyPage.nodeOps.getNodeRefById('6')
const subgraphNode = await clipNode.convertToSubgraph()
await comfyPage.nextFrame()
// Identify the DOM widget (textarea)
const widget = comfyPage.page
.locator('textarea.comfy-multiline-input')
.first()
await expect(widget).toBeVisible()
// Navigate into the subgraph and collapse the inner node
await subgraphNode.navigateIntoSubgraph()
await comfyPage.nextFrame()
await comfyPage.page.evaluate(() => {
const graph = window.app!.canvas.graph!
for (const node of graph._nodes) {
node.flags.collapsed = true
}
graph._version++
})
await comfyPage.nextFrame()
// Navigate back to the parent graph
await comfyPage.page.evaluate(() => {
const canvas = window.app!.canvas
if (canvas.graph?.rootGraph) {
canvas.setGraph(canvas.graph.rootGraph)
}
})
await comfyPage.nextFrame()
// Assert the widget is still visible on the host node despite the inner node being collapsed
await expect(widget).toBeVisible()
await expect(widget).toHaveScreenshot(
'subgraph-promoted-widget-collapsed-inner.png'
)
})
})


@@ -0,0 +1,73 @@
# QA Pipeline Troubleshooting
## Common Failures
### `set -euo pipefail` + grep with no match
**Symptom**: Deploy script crashes silently, badge shows FAILED.
**Cause**: `grep -oP` exits with status 1 when nothing matches. With `set -e` and `pipefail` both active, that non-zero status becomes the status of the whole pipeline and aborts the script.
**Fix**: Always append `|| true` to grep pipelines in bash scripts.
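A minimal sketch of the fix, assuming a helper named `extract_issue_number` (hypothetical, not part of the pipeline scripts). It uses `grep -oE` rather than `-oP` only so the example also runs where PCRE support is missing; the `|| true` guard behaves the same way with either flag.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the first "#<number>" reference found in a string.
# grep exits 1 when nothing matches; the trailing `|| true` stops that status
# from aborting the script under `set -e` + `pipefail`.
extract_issue_number() {
  local text="$1"
  printf '%s\n' "$text" | grep -oE '#[0-9]+' | head -n 1 | tr -d '#' || true
}

with_match=$(extract_issue_number "fixes #1234 in the canvas")
no_match=$(extract_issue_number "no reference here")

echo "with match: '${with_match}'"
echo "no match:   '${no_match}'"
```

Without the trailing `|| true`, the second assignment would abort the script before either `echo` runs.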
### `__name is not defined` in page.evaluate
**Symptom**: Recording crashes with `ReferenceError: __name is not defined`.
**Cause**: tsx compiles arrow functions inside `page.evaluate()` with `__name` helpers. The browser context doesn't have these.
**Fix**: Use `page.addScriptTag({ content: '...' })` with plain JS strings instead of `page.evaluate(() => { ... })` with arrow functions.
### `Set<string>()` in page.evaluate
**Symptom**: Same `__name` error.
**Cause**: TypeScript generics like `new Set<string>()` get compiled incorrectly for browser context.
**Fix**: Use `new Set()` without type parameter.
### `zod/v4` import error
**Symptom**: `ERR_PACKAGE_PATH_NOT_EXPORTED: Package subpath './v4' is not defined`.
**Cause**: claude-agent-sdk depends on `zod/v4` internally, but the project's zod doesn't export it.
**Fix**: Import from `zod` (not `zod/v4`) in project code.
### `ERR_PNPM_LOCKFILE_CONFIG_MISMATCH`
**Symptom**: pnpm install fails with frozen lockfile mismatch.
**Cause**: A new dependency was added to the workspace catalog, but the lockfile wasn't regenerated.
**Fix**: Run `pnpm install` to regenerate lockfile, commit `pnpm-workspace.yaml` + `pnpm-lock.yaml`.
### `loadDefaultWorkflow` — "Load Default" not found
**Symptom**: Menu item "Load Default" not found, canvas stays empty.
**Cause**: The menu item name varies by version/locale. Menu navigation is fragile.
**Fix**: Use `app.resetToDefaultWorkflow()` JS API via `page.evaluate` instead of menu navigation.
### Model ID not found (Claude Agent SDK)
**Symptom**: `There's an issue with the selected model (claude-sonnet-4-6-20250514)`.
**Cause**: Dated model IDs like `claude-sonnet-4-6-20250514` don't exist.
**Fix**: Use `claude-sonnet-4-6` (no date suffix).
### Model not found (Gemini)
**Symptom**: 404 from Gemini API.
**Cause**: Preview model names like `gemini-2.5-flash-preview-05-20` expire.
**Fix**: Use `gemini-3-flash-preview` (latest stable).
## Badge Mismatches
### False REPRODUCED
**Symptom**: Badge says REPRODUCED but AI review says "could not reproduce".
**Root cause**: The grep pattern `reproduc|confirm` also matches neutral or negative phrases such as "reproduction steps" or "could not be confirmed".
**Fix**: Use structured JSON verdict from AI (`## Verdict` section with `{"verdict": "..."}`) instead of regex matching the prose.
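One way to read such a verdict in a shell script is sketched below. The helper name `read_verdict` is hypothetical, and the verdict values are assumed to be the uppercase names used by the research agent (`REPRODUCED`, `NOT_REPRODUCIBLE`, `INCONCLUSIVE`); the real review format may differ.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the value of the first {"verdict": "..."} object
# in a review, or INCONCLUSIVE when the review carries no such object.
# Prose around the JSON is ignored, so words like "reproduction" cannot
# influence the result.
read_verdict() {
  local review="$1" verdict
  verdict=$(printf '%s\n' "$review" \
    | sed -n 's/.*"verdict"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' \
    | head -n 1)
  echo "${verdict:-INCONCLUSIVE}"
}

review='The reproduction steps could not be confirmed.

## Verdict
{"verdict": "NOT_REPRODUCIBLE"}'

read_verdict "$review"
read_verdict "prose that mentions reproduction but carries no JSON"
```

Defaulting to `INCONCLUSIVE` is a design choice in this sketch: a missing or malformed verdict never turns into a positive badge.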
### INCONCLUSIVE feedback loop
**Symptom**: Once an issue gets INCONCLUSIVE, all future runs stay INCONCLUSIVE.
**Cause**: QA bot's own previous comments contain "INCONCLUSIVE", which gets fed back into pr-context.txt.
**Fix**: Filter out `github-actions[bot]` comments when building pr-context.
### pressKey with hold prevents event propagation
**Symptom**: BEFORE video doesn't show the bug (e.g., Escape doesn't close dialog).
**Cause**: `keyboard.down()` + 400ms sleep + `keyboard.up()` changes event timing. Some UI frameworks handle held keys differently than instant presses.
**Fix**: Use instant `keyboard.press()` for testing. Show key name via subtitle overlay instead.
## Cursor Not Visible
**Symptom**: No mouse cursor in recorded videos.
**Cause**: Headless Chrome doesn't render system cursor. The CSS cursor overlay relies on DOM `mousemove` events which Playwright CDP doesn't reliably trigger.
**Fix**: Monkey-patch `page.mouse.move/click/dblclick/down/up` to call `__moveCursor(x,y)` on the injected cursor div. This makes ALL mouse operations update the overlay.
## Agent Doesn't Perform Steps
**Symptom**: Agent opens menus and settings but never interacts with the canvas.
**Causes**:
1. `loadDefaultWorkflow` failed (no nodes on canvas)
2. Agent ran out of turn budget (30 turns / 120s)
3. Gemini Flash (old agent) ignores prompt hints
**Fix**: Use the hybrid agent (Claude Sonnet 4.6 + Gemini vision). Claude follows the prompt's step-by-step instructions more reliably than the old Gemini Flash agent.

59
docs/qa/backlog.md Normal file

@@ -0,0 +1,59 @@
# QA Pipeline Backlog
## Comparison Modes
### Type A: Same code, different settings (IMPLEMENTED)
Agent demonstrates both working (control) and broken (test) states in one session by toggling settings. E.g., Nodes 2.0 OFF → drag works, Nodes 2.0 ON → drag broken.
### Type B: Different commits
For regressions reported as "worked in vX.Y, broken in vX.Z":
- `qa-analyze-pr.ts` detects regression markers ("since v1.38", "after PR #1234")
- Pipeline checks out the old commit, records control video
- Records test video on current main
- Side-by-side comparison on report page (reuses PR before/after infra)
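The detection step could be sketched as below. The pattern and the helper name `find_regression_markers` are illustrative assumptions that cover only the two example phrasings; this is not the logic in `qa-analyze-pr.ts`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print each regression marker found in the issue text,
# one per line. Only "since v<version>" and "after PR #<number>" are covered.
# `|| true` keeps a no-match result from failing the caller under `set -e`.
find_regression_markers() {
  printf '%s\n' "$1" \
    | grep -oiE 'since v[0-9]+(\.[0-9]+)*|after PR #[0-9]+' || true
}

find_regression_markers "Dragging broke since v1.38, probably after PR #1234."
```

An empty result would mean the issue carries no recognizable marker, in which case the pipeline could fall back to recording on current main only.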
### Type C: Different browsers
For browser-specific bugs ("works on Chrome, broken on Firefox"):
- Run recording with different Playwright browser contexts
- Compare behavior across browsers in one report
## Agent Improvements
### TTS Narration
- OpenAI TTS (`tts-1`, nova voice) generates audio from agent reasoning
- Merged into video via ffmpeg at correct timestamps
- Currently in qa-record.ts but needs wiring into hybrid agent path
### Image/Screenshot Reading
- `qa-analyze-pr.ts` already downloads and sends images from issue bodies to Gemini
- Could also send them to the Claude agent as context ("the reporter showed this screenshot")
### Placeholder Page
- Deploy a status page immediately when CI starts
- Auto-refreshes every 30s until final report replaces it
- Shows spinner, CI link, badge
### Pre-seed Assets
- Upload test images via ComfyUI API before recording
- Enables reproduction of bugs requiring assets (#10424 zoom button)
### Environment-Dependent Issues
- #7942: needs custom TestNode — could install a test custom node pack in CI
- #9101: needs completed generation — could run with a tiny model checkpoint
## Cost Optimization
### Lazy A11y Tree
- `inspect(selector)` searches tree for specific element (~20 tokens)
- `getUIChanges()` diffs against previous snapshot (~100 tokens)
- vs dumping full tree every turn (~2000 tokens)
### Gemini Video vs Images
- 30s video clip: ~7,700 tokens (258 tok/s)
- 15 screenshots: ~19,500 tokens (1,300 tok/frame)
- Video is 2.5x cheaper and shows temporal changes
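The comparison above works out as follows, using only the approximate per-second and per-frame figures quoted in the list; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Approximate figures quoted in the list above.
VIDEO_TOKENS_PER_SEC=258
FRAME_TOKENS=1300

# $1 = clip length in seconds
video_tokens() { echo $(( $1 * VIDEO_TOKENS_PER_SEC )); }
# $1 = number of screenshots
frame_tokens() { echo $(( $1 * FRAME_TOKENS )); }

v=$(video_tokens 30)
f=$(frame_tokens 15)
echo "30s video:      ${v} tokens"
echo "15 screenshots: ${f} tokens"

# Integer ratio scaled by 10 so one decimal place survives without bc.
ratio_x10=$(( f * 10 / v ))
echo "ratio: $(( ratio_x10 / 10 )).$(( ratio_x10 % 10 ))x"
```

The 30-second clip comes to 7,740 tokens against 19,500 for the screenshots, which is where the rounded "~7,700" and "2.5x" figures come from.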
### Model Selection
- Claude Sonnet 4.6: $3/$15 per 1M in/out — best reasoning
- Gemini 2.5 Flash: $0.10/$0.40 per 1M — best vision-per-dollar
- Hybrid uses each where it's strongest


@@ -121,6 +121,7 @@
"zod-validation-error": "catalog:"
},
"devDependencies": {
"@anthropic-ai/claude-agent-sdk": "catalog:",
"@eslint/js": "catalog:",
"@google/generative-ai": "catalog:",
"@intlify/eslint-plugin-vue-i18n": "catalog:",

26
pnpm-lock.yaml generated

@@ -9,6 +9,9 @@ catalogs:
'@alloc/quick-lru':
specifier: ^5.2.0
version: 5.2.0
'@anthropic-ai/claude-agent-sdk':
specifier: ^0.2.85
version: 0.2.85
'@astrojs/vue':
specifier: ^5.0.0
version: 5.1.4
@@ -597,6 +600,9 @@ importers:
specifier: 'catalog:'
version: 3.3.0(zod@3.24.1)
devDependencies:
'@anthropic-ai/claude-agent-sdk':
specifier: 'catalog:'
version: 0.2.85(zod@3.24.1)
'@eslint/js':
specifier: 'catalog:'
version: 9.39.1
@@ -1064,6 +1070,12 @@ packages:
'@antfu/utils@0.7.10':
resolution: {integrity: sha512-+562v9k4aI80m1+VuMHehNJWLOFjBnXn3tdOitzD0il5b7smkSBal4+a3oKiQTbrwMmN/TBUMDvbdoWDehgOww==}
'@anthropic-ai/claude-agent-sdk@0.2.85':
resolution: {integrity: sha512-/ohKLtP1zy6aWXLW/9KTYBveJPEtAfdO96qiP1Cl5S7LgVq/qRDUl7AUw5YGrBaK6YWHEE/rfMQZGwP/i5zIvQ==}
engines: {node: '>=18.0.0'}
peerDependencies:
zod: ^4.0.0
'@asamuzakjp/css-color@4.1.1':
resolution: {integrity: sha512-B0Hv6G3gWGMn0xKJ0txEi/jM5iFpT3MfDxmhZFb4W047GvytCf1DHQ1D69W3zHI4yWe2aTZAA0JnbMZ7Xc8DuQ==}
@@ -10068,6 +10080,20 @@ snapshots:
'@antfu/utils@0.7.10': {}
'@anthropic-ai/claude-agent-sdk@0.2.85(zod@3.24.1)':
dependencies:
zod: 3.24.1
optionalDependencies:
'@img/sharp-darwin-arm64': 0.34.5
'@img/sharp-darwin-x64': 0.34.5
'@img/sharp-linux-arm': 0.34.5
'@img/sharp-linux-arm64': 0.34.5
'@img/sharp-linux-x64': 0.34.5
'@img/sharp-linuxmusl-arm64': 0.34.5
'@img/sharp-linuxmusl-x64': 0.34.5
'@img/sharp-win32-arm64': 0.34.5
'@img/sharp-win32-x64': 0.34.5
'@asamuzakjp/css-color@4.1.1':
dependencies:
'@csstools/css-calc': 2.1.4(@csstools/css-parser-algorithms@3.0.5(@csstools/css-tokenizer@3.0.4))(@csstools/css-tokenizer@3.0.4)


@@ -4,6 +4,7 @@ packages:
catalog:
'@alloc/quick-lru': ^5.2.0
'@anthropic-ai/claude-agent-sdk': ^0.2.85
'@astrojs/vue': ^5.0.0
'@comfyorg/comfyui-electron-types': 0.6.2
'@eslint/js': ^9.39.1

347
scripts/qa-agent.ts Normal file
View File

@@ -0,0 +1,347 @@
#!/usr/bin/env tsx
/**
* QA Research Phase — Claude writes & debugs E2E tests to reproduce bugs
*
* Instead of driving a browser interactively, Claude:
* 1. Reads the issue + a11y snapshot of the UI
* 2. Writes a Playwright E2E test (.spec.ts) that reproduces the bug
* 3. Runs the test → reads errors → rewrites → repeats until it works
* 4. Outputs the passing test + verdict
*
* Tools:
* - inspect(selector) — read a11y tree to understand UI state
* - writeTest(code) — write a Playwright test file
* - runTest() — execute the test and get results
* - done(verdict, summary, evidence, testCode) — finish with the working test
*/
import type { Page } from '@playwright/test'
import { query, tool, createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk'
import { z } from 'zod'
import { mkdirSync, writeFileSync } from 'fs'
import { execSync } from 'child_process'
// ── Types ──
interface ResearchOptions {
page: Page
issueContext: string
qaGuide: string
outputDir: string
serverUrl: string
anthropicApiKey?: string
maxTurns?: number
timeBudgetMs?: number
}
export interface ResearchResult {
verdict: 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'
summary: string
evidence: string
testCode: string
log: Array<{
turn: number
timestampMs: number
toolName: string
toolInput: unknown
toolResult: string
}>
}
// ── Main research function ──
export async function runResearchPhase(
opts: ResearchOptions
): Promise<ResearchResult> {
const { page, issueContext, qaGuide, outputDir, serverUrl, anthropicApiKey } =
opts
const maxTurns = opts.maxTurns ?? 50
const timeBudgetMs = opts.timeBudgetMs ?? 600_000 // 10 min for write→run→fix loops
let agentDone = false
let finalVerdict: ResearchResult['verdict'] = 'INCONCLUSIVE'
let finalSummary = 'Agent did not complete'
let finalEvidence = ''
let finalTestCode = ''
let turnCount = 0
const startTime = Date.now()
const researchLog: ResearchResult['log'] = []
const testDir = `${outputDir}/research`
mkdirSync(testDir, { recursive: true })
const testPath = `${testDir}/reproduce.spec.ts`
// Get initial a11y snapshot for context
let initialA11y = ''
try {
initialA11y = await page.locator('body').ariaSnapshot({ timeout: 5000 })
initialA11y = initialA11y.slice(0, 3000)
} catch {
initialA11y = '(could not capture initial a11y snapshot)'
}
// ── Tool: inspect ──
const inspectTool = tool(
'inspect',
'Read the current accessibility tree to understand UI state. Use this to discover element names, roles, and selectors for your test.',
{
selector: z
.string()
.optional()
.describe(
'Optional filter — only show elements matching this name/role. Omit for full tree.'
)
},
async (args) => {
let resultText: string
try {
const ariaText = await page
.locator('body')
.ariaSnapshot({ timeout: 5000 })
if (args.selector) {
const lines = ariaText.split('\n')
const matches = lines.filter((l: string) =>
l.toLowerCase().includes(args.selector!.toLowerCase())
)
resultText =
matches.length > 0
? `Found "${args.selector}":\n${matches.slice(0, 15).join('\n')}`
: `"${args.selector}" not found. Full tree:\n${ariaText.slice(0, 2000)}`
} else {
resultText = ariaText.slice(0, 3000)
}
} catch (e) {
resultText = `inspect failed: ${e instanceof Error ? e.message : e}`
}
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'inspect',
toolInput: args,
toolResult: resultText.slice(0, 500)
})
return { content: [{ type: 'text' as const, text: resultText }] }
}
)
// ── Tool: writeTest ──
const writeTestTool = tool(
'writeTest',
'Write a Playwright E2E test file that reproduces the bug. The test should assert the broken behavior exists.',
{
code: z
.string()
.describe('Complete Playwright test file content (.spec.ts)')
},
async (args) => {
writeFileSync(testPath, args.code)
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'writeTest',
toolInput: { path: testPath, codeLength: args.code.length },
toolResult: `Test written to ${testPath} (${args.code.length} chars)`
})
return {
content: [
{
type: 'text' as const,
text: `Test written to ${testPath}. Use runTest() to execute it.`
}
]
}
}
)
// ── Tool: runTest ──
// Place test in browser_tests/ so Playwright config finds fixtures
const projectRoot = process.cwd()
const browserTestPath = `${projectRoot}/browser_tests/tests/qa-reproduce.spec.ts`
const runTestTool = tool(
'runTest',
'Run the Playwright test and get results. Returns stdout/stderr including assertion errors.',
{},
async () => {
turnCount++
// Copy the test to browser_tests/tests/ where Playwright expects it
const { copyFileSync } = await import('fs')
try {
copyFileSync(testPath, browserTestPath)
} catch {
// directory may not exist
mkdirSync(`${projectRoot}/browser_tests/tests`, { recursive: true })
copyFileSync(testPath, browserTestPath)
}
let resultText: string
try {
const output = execSync(
`cd "${projectRoot}" && npx playwright test browser_tests/tests/qa-reproduce.spec.ts --reporter=list --timeout=30000 --retries=0 --workers=1 2>&1`,
{
timeout: 90000,
encoding: 'utf-8',
env: {
...process.env,
COMFYUI_BASE_URL: serverUrl
}
}
)
resultText = `TEST PASSED:\n${output.slice(-1500)}`
} catch (e) {
const err = e as { stdout?: string; stderr?: string; message?: string }
const output = (err.stdout || '') + '\n' + (err.stderr || '')
resultText = `TEST FAILED:\n${output.slice(-2000)}`
}
researchLog.push({
turn: turnCount,
timestampMs: Date.now() - startTime,
toolName: 'runTest',
toolInput: { testPath },
toolResult: resultText.slice(0, 1000)
})
return { content: [{ type: 'text' as const, text: resultText }] }
}
)
// ── Tool: done ──
const doneTool = tool(
'done',
'Finish research with verdict and the final test code.',
{
verdict: z.enum(['REPRODUCED', 'NOT_REPRODUCIBLE', 'INCONCLUSIVE']),
summary: z.string().describe('What you found and why'),
evidence: z.string().describe('Test output that proves the verdict'),
testCode: z
.string()
.describe(
'Final Playwright test code. If REPRODUCED, this test asserts the bug exists and passes.'
)
},
async (args) => {
agentDone = true
finalVerdict = args.verdict
finalSummary = args.summary
finalEvidence = args.evidence
finalTestCode = args.testCode
writeFileSync(testPath, args.testCode)
return {
content: [
{ type: 'text' as const, text: `Research complete: ${args.verdict}` }
]
}
}
)
// ── MCP Server ──
const server = createSdkMcpServer({
name: 'qa-research',
version: '1.0.0',
tools: [inspectTool, writeTestTool, runTestTool, doneTool]
})
// ── System prompt ──
const systemPrompt = `You are a senior QA engineer who writes Playwright E2E tests to reproduce reported bugs.
## Your tools
- inspect(selector?) — Read the accessibility tree to understand the current UI. Use to discover selectors, element names, and UI state.
- writeTest(code) — Write a Playwright test file (.spec.ts)
- runTest() — Execute the test and get results (pass/fail + errors)
- done(verdict, summary, evidence, testCode) — Finish with the final test
## Workflow
1. Read the issue description carefully
2. Use inspect() to understand the current UI state and discover element selectors
3. Write a Playwright test that:
- Navigates to ${serverUrl}
- Performs the exact reproduction steps from the issue
- Asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
4. Run the test with runTest()
5. If it fails: read the error, fix the test, run again (max 5 attempts)
6. Call done() with the final verdict and test code
## Test writing guidelines
- Import the project fixture: \`import { comfyPageFixture as test } from '../fixtures/ComfyPage'\`
- Import expect: \`import { expect } from '@playwright/test'\`
- The fixture provides \`comfyPage\` which has:
- \`comfyPage.page\` — the Playwright Page object
- \`comfyPage.menu.topbar\` — topbar actions (saveWorkflowAs, getTabNames, getWorkflowTab)
- \`comfyPage.menu.topbar.triggerTopbarCommand(label)\` — click a menu command
- \`comfyPage.workflow\` — workflow helpers (isCurrentWorkflowModified, setupWorkflowsDirectory)
- \`comfyPage.canvas\` — canvas element for mouse interactions
- \`comfyPage.settings.setSetting(id, value)\` — change settings
- \`comfyPage.nextFrame()\` — wait for next render frame
- \`comfyPage.loadWorkflow(name)\` — load a named workflow
- Use beforeEach to set up settings and workflow directory
- Use afterEach to clean up (setupWorkflowsDirectory({}))
- If the bug IS present, the test should PASS. If the bug is fixed, the test would FAIL.
- Keep tests focused and minimal — test ONLY the reported bug
- The test file will be placed in browser_tests/tests/qa-reproduce.spec.ts
## Current UI state (accessibility tree)
${initialA11y}
${qaGuide ? `## QA Analysis Guide\n${qaGuide}\n` : ''}
## Issue to Reproduce
${issueContext}`
// ── Run the agent ──
console.warn('Starting research phase (Claude writes E2E tests)...')
try {
for await (const message of query({
prompt:
'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
options: {
model: 'claude-sonnet-4-6',
systemPrompt,
...(anthropicApiKey ? { apiKey: anthropicApiKey } : {}),
maxTurns,
mcpServers: { 'qa-research': server },
allowedTools: [
'mcp__qa-research__inspect',
'mcp__qa-research__writeTest',
'mcp__qa-research__runTest',
'mcp__qa-research__done'
]
}
})) {
if (message.type === 'assistant' && message.message?.content) {
for (const block of message.message.content) {
if ('text' in block && block.text) {
console.warn(` Claude: ${block.text.slice(0, 200)}`)
}
if ('name' in block) {
console.warn(
` Tool: ${block.name}(${JSON.stringify(block.input).slice(0, 100)})`
)
}
}
}
if (agentDone) break
}
} catch (e) {
console.warn(`Research error: ${e instanceof Error ? e.message : e}`)
}
const result: ResearchResult = {
verdict: finalVerdict,
summary: finalSummary,
evidence: finalEvidence,
testCode: finalTestCode,
log: researchLog
}
writeFileSync(`${testDir}/research-log.json`, JSON.stringify(result, null, 2))
console.warn(
`Research complete: ${finalVerdict} (${researchLog.length} tool calls)`
)
return result
}


@@ -77,15 +77,15 @@ for os in Linux macOS Windows; do
fi
if [ "$HAS_BEFORE" = "1" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=metadata><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=metadata><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
elif [ -f "$DEPLOY_DIR/qa-${os}.mp4" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls muted preload=metadata><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
else
PASS_VIDEOS=""
for pass_vid in "$DEPLOY_DIR/qa-${os}-pass"[0-9].mp4; do
[ -f "$pass_vid" ] || continue
PASS_NUM=$(basename "$pass_vid" | sed "s/qa-${os}-pass\([0-9]\).mp4/\1/")
PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls muted preload=metadata><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
done
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison>${PASS_VIDEOS}</div>${REPORT_HTML}</div>"
fi
@@ -112,6 +112,10 @@ if [ -n "${AFTER_SHA:-}" ]; then
[ -n "${TARGET_NUM:-}" ] && AFTER_LABEL="#${TARGET_NUM}"
COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${AFTER_SHA} class=sha title='PR head commit'>${AFTER_LABEL} @ ${SHORT_AFTER}</a>"
fi
if [ -n "${PIPELINE_SHA:-}" ]; then
SHORT_PIPE="${PIPELINE_SHA:0:7}"
COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${PIPELINE_SHA} class=sha title='QA pipeline version'>QA @ ${SHORT_PIPE}</a>"
fi
[ -n "$COMMIT_HTML" ] && COMMIT_HTML=" &middot; ${COMMIT_HTML}"
RUN_LINK=""
@@ -119,16 +123,76 @@ if [ -n "${RUN_URL:-}" ]; then
RUN_LINK=" &middot; <a href=\"${RUN_URL}\" class=sha title=\"GitHub Actions run\">CI Job</a>"
fi
# Timing info
DEPLOY_TIME=$(date -u '+%Y-%m-%d %H:%M UTC')
TIMING_HTML=""
if [ -n "${RUN_START_TIME:-}" ]; then
TIMING_HTML=" &middot; <span class=sha title='Pipeline timing'>${RUN_START_TIME} &rarr; ${DEPLOY_TIME}</span>"
fi
# Generate index.html from template
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
TEMPLATE="$SCRIPT_DIR/qa-report-template.html"
# Write dynamic content to temp files for safe substitution
# Cloudflare Pages _headers file — enable range requests for video seeking
cat > "$DEPLOY_DIR/_headers" <<'HEADERSEOF'
/*.mp4
Accept-Ranges: bytes
Cache-Control: public, max-age=86400
HEADERSEOF
# Build purpose description from pr-context.txt
PURPOSE_HTML=""
if [ -f pr-context.txt ]; then
# Extract title line and first paragraph of description
PR_TITLE=$(grep -m1 '^Title:' pr-context.txt | sed 's/^Title: //')
if [ "$TARGET_TYPE" = "issue" ]; then
PURPOSE_LABEL="Issue #${TARGET_NUM}"
PURPOSE_VERB="reports"
else
PURPOSE_LABEL="PR #${TARGET_NUM}"
PURPOSE_VERB="aims to"
fi
# Get first ~300 chars of description body (after "Description:" line)
PR_DESC=$(sed -n '/^Description:/,/^###/p' pr-context.txt | grep -v '^Description:\|^###' | head -5 | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' | tr '\n' ' ' | head -c 400)
[ -z "$PR_DESC" ] && PR_DESC=$(sed -n '3,8p' pr-context.txt | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' | tr '\n' ' ' | head -c 400)
# Build requirements from QA guide JSON
REQS_HTML=""
QA_GUIDE=$(ls qa-guides/qa-guide-*.json 2>/dev/null | head -1)
if [ -f "$QA_GUIDE" ]; then
PREREQS=$(python3 -c "
import json, sys, html
try:
g = json.load(open(sys.argv[1]))
prereqs = g.get('prerequisites', [])
steps = g.get('steps', [])
focus = g.get('test_focus', '')
parts = []
if focus:
parts.append('<strong>Test focus:</strong> ' + html.escape(focus))
if prereqs:
parts.append('<strong>Prerequisites:</strong> ' + ', '.join(html.escape(p) for p in prereqs))
if steps:
parts.append('<strong>Steps:</strong> ' + ' → '.join(html.escape(s.get('description', str(s))) for s in steps[:6]))
if len(steps) > 6:
parts[-1] += ' → ...'
print('<br>'.join(parts))
except: pass
" "$QA_GUIDE" 2>/dev/null)
[ -n "$PREREQS" ] && REQS_HTML="<div class=purpose-reqs>${PREREQS}</div>"
fi
PURPOSE_HTML="<div class=purpose><div class=purpose-label>${PURPOSE_LABEL} ${PURPOSE_VERB}</div><strong>${PR_TITLE}</strong><br>${PR_DESC}${REQS_HTML}</div>"
fi
echo -n "$COMMIT_HTML" > "$DEPLOY_DIR/.commit_html"
echo -n "$CARDS" > "$DEPLOY_DIR/.cards_html"
echo -n "$RUN_LINK" > "$DEPLOY_DIR/.run_link"
# Badge HTML with copy button (placeholder URL filled after deploy)
echo -n '<div class="badge-bar"><img src="badge.svg" alt="QA Badge" class="badge-img"/><button class="copy-badge" title="Copy badge markdown" onclick="copyBadge()"><svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><rect x=9 y=9 width=13 height=13 rx=2/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg></button></div>' > "$DEPLOY_DIR/.badge_html"
echo -n "${TIMING_HTML:-}" > "$DEPLOY_DIR/.timing_html"
echo -n "$PURPOSE_HTML" > "$DEPLOY_DIR/.purpose_html"
python3 -c "
import sys, pathlib
d = pathlib.Path(sys.argv[1])
@@ -137,9 +201,11 @@ t = t.replace('{{COMMIT_HTML}}', (d / '.commit_html').read_text())
t = t.replace('{{CARDS}}', (d / '.cards_html').read_text())
t = t.replace('{{RUN_LINK}}', (d / '.run_link').read_text())
t = t.replace('{{BADGE_HTML}}', (d / '.badge_html').read_text())
t = t.replace('{{TIMING_HTML}}', (d / '.timing_html').read_text())
t = t.replace('{{PURPOSE_HTML}}', (d / '.purpose_html').read_text())
sys.stdout.write(t)
" "$DEPLOY_DIR" "$TEMPLATE" > "$DEPLOY_DIR/index.html"
rm -f "$DEPLOY_DIR/.commit_html" "$DEPLOY_DIR/.cards_html" "$DEPLOY_DIR/.run_link" "$DEPLOY_DIR/.badge_html"
rm -f "$DEPLOY_DIR/.commit_html" "$DEPLOY_DIR/.cards_html" "$DEPLOY_DIR/.run_link" "$DEPLOY_DIR/.badge_html" "$DEPLOY_DIR/.timing_html" "$DEPLOY_DIR/.purpose_html"
cat > "$DEPLOY_DIR/404.html" <<'ERROREOF'
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><title>404</title>
@@ -148,43 +214,138 @@ cat > "$DEPLOY_DIR/404.html" <<'ERROREOF'
</head><body><div><h1>404</h1><p>File not found. The QA recording may have failed or been cancelled.</p></div></body></html>
ERROREOF
# Generate badge SVGs into deploy dir
# Verdict detection: check AI review reports for reproduction outcome.
# Patterns are ordered from most specific to least specific.
REPRO_RESULT="" REPRO_COLOR="#9f9f9f"
if grep -riq 'INCONCLUSIVE' video-reviews/ 2>/dev/null; then
REPRO_RESULT="INCONCLUSIVE" REPRO_COLOR="#9f9f9f"
elif grep -riq 'not reproduced\|could not reproduce\|unable to reproduce' video-reviews/ 2>/dev/null; then
REPRO_RESULT="NOT REPRODUCIBLE" REPRO_COLOR="#9f9f9f"
elif grep -riq 'partially reproduced' video-reviews/ 2>/dev/null; then
REPRO_RESULT="PARTIAL" REPRO_COLOR="#dfb317"
# Match "reproduced", "confirmed", "confirms", "reproducible" in body text (not headings)
elif grep -ri 'reproduc\|confirm' video-reviews/ 2>/dev/null | grep -vq '^[^:]*:##'; then
REPRO_RESULT="REPRODUCED" REPRO_COLOR="#2196f3"
fi
# Badge label includes the target number for identification
BADGE_LABEL="QA"
[ -n "${TARGET_NUM:-}" ] && BADGE_LABEL="#${TARGET_NUM} QA"
if [ "$TARGET_TYPE" = "issue" ]; then
BADGE_STATUS="${REPRO_RESULT:-FINISHED}"
/tmp/gen-badge.sh "$BADGE_STATUS" "${REPRO_COLOR}" "$DEPLOY_DIR/badge.svg" "$BADGE_LABEL"
else
SOLN_RESULT="" SOLN_COLOR="#4c1"
if grep -riq 'major.*issue\|critical\|breaking\|regression' video-reviews/ 2>/dev/null; then
SOLN_RESULT="MAJOR ISSUES" SOLN_COLOR="#e05d44"
elif grep -riq 'minor.*issue\|cosmetic\|nitpick' video-reviews/ 2>/dev/null; then
SOLN_RESULT="MINOR ISSUES" SOLN_COLOR="#dfb317"
elif grep -riq 'no.*issue\|looks good\|approved\|pass' video-reviews/ 2>/dev/null; then
SOLN_RESULT="APPROVED" SOLN_COLOR="#4c1"
# Copy research log to deploy dir if it exists
for rlog in qa-artifacts/*/research/research-log.json qa-artifacts/*/*/research/research-log.json qa-artifacts/before/*/research/research-log.json; do
if [ -f "$rlog" ]; then
cp "$rlog" "$DEPLOY_DIR/research-log.json"
echo "Found research log: $rlog"
break
fi
done
# Generate badge SVGs into deploy dir
# Priority: research-log.json verdict (a11y-verified) > video review verdict (AI interpretation)
REPRO_COUNT=0 INCONC_COUNT=0 NOT_REPRO_COUNT=0 TOTAL_REPORTS=0
# Try research log first (ground truth from a11y assertions)
RESEARCH_VERDICT=""
if [ -f "$DEPLOY_DIR/research-log.json" ]; then
RESEARCH_VERDICT=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print(d.get('verdict',''))" "$DEPLOY_DIR/research-log.json" 2>/dev/null || true)
echo "Research verdict (a11y-verified): ${RESEARCH_VERDICT:-none}"
if [ -n "$RESEARCH_VERDICT" ]; then
TOTAL_REPORTS=1
case "$RESEARCH_VERDICT" in
REPRODUCED) REPRO_COUNT=1 ;;
NOT_REPRODUCIBLE) NOT_REPRO_COUNT=1 ;;
INCONCLUSIVE) INCONC_COUNT=1 ;;
esac
fi
BADGE_STATUS="${REPRO_RESULT:-UNKNOWN} | Fix: ${SOLN_RESULT:-UNKNOWN}"
/tmp/gen-badge-dual.sh \
"${REPRO_RESULT:-UNKNOWN}" "${REPRO_COLOR}" \
"${SOLN_RESULT:-UNKNOWN}" "${SOLN_COLOR}" \
"$DEPLOY_DIR/badge.svg" "$BADGE_LABEL"
fi
# Fall back to video review verdicts if no research log
if [ -z "$RESEARCH_VERDICT" ] && [ -d video-reviews ]; then
for rpt in video-reviews/*-qa-video-report.md; do
[ -f "$rpt" ] || continue
TOTAL_REPORTS=$((TOTAL_REPORTS + 1))
# Try structured JSON verdict first (from ## Verdict section)
VERDICT_JSON=$(grep -oP '"verdict":\s*"[A-Z_]+' "$rpt" 2>/dev/null | tail -1 | grep -oP '[A-Z_]+$' || true)
RISK_JSON=$(grep -oP '"risk":\s*"[a-z]+' "$rpt" 2>/dev/null | tail -1 | grep -oP '[a-z]+$' || true)
if [ -n "$VERDICT_JSON" ]; then
case "$VERDICT_JSON" in
REPRODUCED) REPRO_COUNT=$((REPRO_COUNT + 1)) ;;
NOT_REPRODUCIBLE) NOT_REPRO_COUNT=$((NOT_REPRO_COUNT + 1)) ;;
INCONCLUSIVE) INCONC_COUNT=$((INCONC_COUNT + 1)) ;;
esac
else
# Fallback: grep Summary section (for older reports without ## Verdict)
SUMM=$(sed -n '/^## Summary/,/^## /p' "$rpt" 2>/dev/null | head -15)
if echo "$SUMM" | grep -iq 'INCONCLUSIVE'; then
INCONC_COUNT=$((INCONC_COUNT + 1))
elif echo "$SUMM" | grep -iq 'not reproduced\|could not reproduce\|could not be confirmed\|unable to reproduce\|fails\? to reproduce\|fails\? to perform\|was NOT\|NOT visible\|not observed\|fail.* to demonstrate\|does not demonstrate\|steps were not performed\|never.*tested\|never.*accessed\|not.* confirmed'; then
NOT_REPRO_COUNT=$((NOT_REPRO_COUNT + 1))
elif echo "$SUMM" | grep -iq 'reproduc\|confirm'; then
REPRO_COUNT=$((REPRO_COUNT + 1))
fi
fi
done
fi
FAIL_COUNT=$((TOTAL_REPORTS - REPRO_COUNT - NOT_REPRO_COUNT))
[ "$FAIL_COUNT" -lt 0 ] && FAIL_COUNT=0
echo "DEBUG verdict: repro=${REPRO_COUNT} not_repro=${NOT_REPRO_COUNT} inconc=${INCONC_COUNT} fail=${FAIL_COUNT} total=${TOTAL_REPORTS}"
echo "Verdict: ${REPRO_COUNT}✓ ${NOT_REPRO_COUNT}✗ ${FAIL_COUNT}⚠ / ${TOTAL_REPORTS}"
# Badge text:
# Single pass: "REPRODUCED" / "NOT REPRODUCIBLE" / "INCONCLUSIVE"
# Multi pass: "2✓ 0✗ 1⚠ / 3" with color based on dominant result
REPRO_RESULT="" REPRO_COLOR="#9f9f9f"
if [ "$TOTAL_REPORTS" -le 1 ]; then
# Single report — simple label
if [ "$REPRO_COUNT" -gt 0 ]; then
REPRO_RESULT="REPRODUCED" REPRO_COLOR="#2196f3"
elif [ "$NOT_REPRO_COUNT" -gt 0 ]; then
REPRO_RESULT="NOT REPRODUCIBLE" REPRO_COLOR="#9f9f9f"
elif [ "$FAIL_COUNT" -gt 0 ]; then
REPRO_RESULT="INCONCLUSIVE" REPRO_COLOR="#9f9f9f"
fi
else
# Multi pass — show breakdown: X✓ Y✗ Z⚠ / N
PARTS=""
[ "$REPRO_COUNT" -gt 0 ] && PARTS="${REPRO_COUNT}✓"
[ "$NOT_REPRO_COUNT" -gt 0 ] && PARTS="${PARTS:+${PARTS} }${NOT_REPRO_COUNT}✗"
[ "$FAIL_COUNT" -gt 0 ] && PARTS="${PARTS:+${PARTS} }${FAIL_COUNT}⚠"
REPRO_RESULT="${PARTS} / ${TOTAL_REPORTS}"
# Color based on best outcome
if [ "$REPRO_COUNT" -gt 0 ]; then
REPRO_COLOR="#2196f3"
elif [ "$NOT_REPRO_COUNT" -gt 0 ]; then
REPRO_COLOR="#9f9f9f"
fi
fi
# Badge label: #NUM QA0327 (with today's date)
QA_DATE=$(date -u '+%m%d')
BADGE_LABEL="QA${QA_DATE}"
[ -n "${TARGET_NUM:-}" ] && BADGE_LABEL="#${TARGET_NUM} QA${QA_DATE}"
# For PRs, also extract fix quality from Overall Risk section
FIX_RESULT="" FIX_COLOR="#4c1"
if [ "$TARGET_TYPE" != "issue" ]; then
# Try structured JSON risk first
ALL_RISKS=$(grep -ohP '"risk":\s*"[a-z]+' video-reviews/*.md 2>/dev/null | grep -oP '[a-z]+$' || true)
if [ -n "$ALL_RISKS" ]; then
# Use worst risk across all reports
if echo "$ALL_RISKS" | grep -q 'high'; then
FIX_RESULT="MAJOR ISSUES" FIX_COLOR="#e05d44"
elif echo "$ALL_RISKS" | grep -q 'medium'; then
FIX_RESULT="MINOR ISSUES" FIX_COLOR="#dfb317"
elif echo "$ALL_RISKS" | grep -q 'low'; then
FIX_RESULT="APPROVED" FIX_COLOR="#4c1"
fi
else
# Fallback: grep Overall Risk section
RISK_TEXT=""
if [ -d video-reviews ]; then
RISK_TEXT=$(sed -n '/^## Overall Risk/,/^## /p' video-reviews/*.md 2>/dev/null | sed 's/\*//g' | head -20)
fi
RISK_FIRST=$(echo "$RISK_TEXT" | grep -oiP '^\s*(high|medium|moderate|low|minimal|critical)' | head -1 | tr '[:upper:]' '[:lower:]')
if [ -n "$RISK_FIRST" ]; then
case "$RISK_FIRST" in
*low*|*minimal*) FIX_RESULT="APPROVED" FIX_COLOR="#4c1" ;;
*medium*|*moderate*) FIX_RESULT="MINOR ISSUES" FIX_COLOR="#dfb317" ;;
*high*|*critical*) FIX_RESULT="MAJOR ISSUES" FIX_COLOR="#e05d44" ;;
esac
elif echo "$RISK_TEXT" | grep -iq 'no.*risk\|approved\|looks good'; then
FIX_RESULT="APPROVED" FIX_COLOR="#4c1"
fi
fi
fi
# Always use vertical box badge
/tmp/gen-badge-box.sh "$DEPLOY_DIR/badge.svg" "$BADGE_LABEL" \
"$REPRO_COUNT" "$NOT_REPRO_COUNT" "$FAIL_COUNT" "$TOTAL_REPORTS" \
"$FIX_RESULT" "$FIX_COLOR"
BADGE_STATUS="${REPRO_RESULT:-UNKNOWN}${FIX_RESULT:+ | Fix: ${FIX_RESULT}}"
echo "badge_status=${BADGE_STATUS:-FINISHED}" >> "$GITHUB_OUTPUT"
BRANCH=$(echo "$RAW_BRANCH" | sed 's/[^a-zA-Z0-9-]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | cut -c1-28)

@@ -19,7 +19,15 @@
import { chromium } from '@playwright/test'
import type { Page } from '@playwright/test'
import { GoogleGenerativeAI } from '@google/generative-ai'
import { readFileSync, mkdirSync, readdirSync, renameSync, statSync } from 'fs'
import {
readFileSync,
writeFileSync,
mkdirSync,
readdirSync,
renameSync,
statSync
} from 'fs'
import { execSync } from 'child_process'
// ── Types ──
@@ -59,6 +67,17 @@ type TestAction =
}
| { action: 'addNode'; nodeName: string; x?: number; y?: number }
| { action: 'cloneNode'; x: number; y: number }
| { action: 'copyPaste'; x?: number; y?: number }
| {
action: 'holdKeyAndDrag'
key: string
fromX: number
fromY: number
toX: number
toY: number
}
| { action: 'resizeNode'; x: number; y: number; dx: number; dy: number }
| { action: 'middleClick'; x: number; y: number }
interface ActionResult {
action: TestAction
@@ -239,20 +258,25 @@ Each step is an object with an "action" field:
- { "action": "setSetting", "id": "Comfy.Setting.Id", "value": true } — changes a ComfyUI setting
### Compound actions (save multiple turns)
- { "action": "addNode", "nodeName": "KSampler", "x": 640, "y": 400 } — double-clicks canvas to open node search, types name, presses Enter
- { "action": "cloneNode", "x": 750, "y": 350 } — right-clicks node at coords and clicks Clone in context menu
- { "action": "addNode", "nodeName": "KSampler", "x": 640, "y": 400 } — double-clicks canvas, types name, presses Enter
- { "action": "cloneNode", "x": 750, "y": 350 } — clicks node, then Ctrl+C and Ctrl+V
- { "action": "copyPaste", "x": 640, "y": 400 } — clicks node at coords, Ctrl+C then Ctrl+V
- { "action": "holdKeyAndDrag", "key": " ", "fromX": 640, "fromY": 400, "toX": 400, "toY": 300 } — hold key + drag (Space=pan)
- { "action": "resizeNode", "x": 200, "y": 380, "dx": 100, "dy": 50 } — drag node edge to resize
- { "action": "middleClick", "x": 640, "y": 400 } — middle mouse button
### Utility actions
- { "action": "wait", "ms": 1000 } — waits (use sparingly, max 3000ms)
- { "action": "screenshot", "name": "step-name" } — takes a screenshot
- { "action": "annotate", "text": "Look here!", "x": 640, "y": 400 } — shows a floating label at coordinates for 2s (use to draw viewer attention to important UI state)
- { "action": "annotate", "text": "Bug: tab still dirty", "x": 100, "y": 20, "durationMs": 3000 } — annotation with custom duration
- { "action": "reload" } — reloads the page (use for testing state persistence across page loads)
${qaGuideSection}${testPlanSection}
${diff ? `## PR Diff\n\`\`\`\n${diff.slice(0, 3000)}\n\`\`\`` : ''}
## Rules
- Output ONLY a valid JSON array of actions, no markdown fences or explanation
- ${mode === 'reproduce' ? 'You MUST follow the reproduction steps from the issue closely. Generate 8-15 steps that actually trigger the bug. Do NOT just open a menu and take a screenshot — perform the FULL reproduction sequence including node interactions, context menus, keyboard shortcuts, and canvas operations' : mode === 'before' ? 'Keep it minimal — just show the old/missing behavior' : 'Test the specific behavior that changed in the PR'}
- ${mode === 'reproduce' ? 'You MUST follow the reproduction steps from the issue closely. Generate 8-15 steps that actually trigger the bug. Do NOT just open a menu and take a screenshot — perform the FULL reproduction sequence including node interactions, context menus, keyboard shortcuts, and canvas operations' : mode === 'before' ? 'Keep it minimal — just show the old/missing behavior' : 'CRITICAL: Test the EXACT behavior changed by the PR. Read the diff carefully to understand what UI feature was modified. Do NOT just open menus and take screenshots — you must TRIGGER the specific scenario the PR fixes. For example: if the PR fixes "tabs lost on restart", actually create tabs AND reload the page. If the PR fixes "widget disappears on collapse", create a subgraph with widgets AND collapse it. Generic UI walkthrough is USELESS — demonstrate the actual fix working.'}
- Always include at least one screenshot
- Do NOT include login steps (handled automatically)
- The default workflow is already loaded when your steps start
@@ -376,6 +400,176 @@ const FALLBACK_STEPS: Record<RecordMode, TestAction[]> = {
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))
async function moveCursorOverlay(page: Page, x: number, y: number) {
await page.evaluate(
([cx, cy]) => {
const fn = (window as unknown as Record<string, unknown>).__moveCursor as
| ((x: number, y: number) => void)
| undefined
if (fn) fn(cx, cy)
},
[x, y]
)
}
async function clickCursorOverlay(page: Page, down: boolean) {
await page.evaluate((d) => {
const fn = (window as unknown as Record<string, unknown>).__clickCursor as
| ((down: boolean) => void)
| undefined
if (fn) fn(d)
}, down)
}
interface NarrationSegment {
turn: number
timestampMs: number
text: string
}
// Collected during recording, used for TTS post-processing
const narrationSegments: NarrationSegment[] = []
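// Reference point for narration offsets. Module load time is used here as an
// approximation of recording start; with a constant 0, timestampMs would be an
// absolute epoch value rather than an offset into the video.
const recordingStartMs = Date.now()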
async function showSubtitle(page: Page, text: string, turn: number) {
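// encodeURIComponent leaves "'" untouched, so percent-encode it explicitly to
// keep it from terminating the single-quoted literal in the injected script.
const safeText = text.slice(0, 120).replace(/\n/g, ' ')
const encoded = encodeURIComponent(safeText).replace(/'/g, '%27')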
// Track for TTS post-processing
narrationSegments.push({
turn,
timestampMs: Date.now() - recordingStartMs,
text: safeText
})
await page.addScriptTag({
content: `(function(){
var id='qa-subtitle';
var el=document.getElementById(id);
if(!el){
el=document.createElement('div');
el.id=id;
Object.assign(el.style,{position:'fixed',bottom:'32px',left:'50%',transform:'translateX(-50%)',zIndex:'2147483646',maxWidth:'90%',padding:'6px 14px',borderRadius:'6px',background:'rgba(0,0,0,0.8)',color:'rgba(255,255,255,0.95)',fontSize:'12px',fontFamily:'system-ui,sans-serif',fontWeight:'400',lineHeight:'1.4',pointerEvents:'none',textAlign:'center',transition:'opacity 0.3s',whiteSpace:'normal'});
document.body.appendChild(el);
}
var msg=decodeURIComponent('${encoded}');
el.textContent='['+${turn}+'] '+msg;
el.style.opacity='1';
})()`
})
}
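Review note, offered as a hedged sketch rather than as part of this change: another way to embed arbitrary text in an injected script is to let `JSON.stringify` produce the string literal, so quotes and backslashes never need hand-escaping. `buildSubtitleScript` is a hypothetical helper name, and for brevity it only updates an existing `#qa-subtitle` element instead of creating one.

```typescript
// Hypothetical helper (not in the diff): returns script source that sets the
// subtitle text. JSON.stringify emits a valid JavaScript string literal, so
// apostrophes, double quotes and backslashes in `text` cannot break the script.
function buildSubtitleScript(text: string, turn: number): string {
  // Same 120-character cap as showSubtitle; collapse whitespace to one space.
  const safe = text.slice(0, 120).replace(/\s+/g, ' ')
  const literal = JSON.stringify(safe)
  return (
    `(function(){` +
    `var el=document.getElementById('qa-subtitle');` +
    `if(el)el.textContent='['+${turn}+'] '+${literal};` +
    `})()`
  )
}
```

The returned string could be passed as `content` to `page.addScriptTag`, in the same way `showSubtitle` passes its inline script.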
async function generateNarrationAudio(
segments: NarrationSegment[],
outputDir: string,
apiKey: string
): Promise<string | null> {
if (segments.length === 0) return null
const narrationDir = `${outputDir}/narration`
mkdirSync(narrationDir, { recursive: true })
// Save narration metadata
writeFileSync(
`${narrationDir}/segments.json`,
JSON.stringify(segments, null, 2)
)
// Generate TTS using OpenAI API (high quality, fast)
const ttsKey = process.env.OPENAI_API_KEY
if (!ttsKey) {
console.warn(' OPENAI_API_KEY not set, skipping TTS narration')
return null
}
const audioFiles: Array<{ path: string; offsetMs: number }> = []
for (const seg of segments) {
const audioPath = `${narrationDir}/turn-${seg.turn}.mp3`
try {
const resp = await fetch('https://api.openai.com/v1/audio/speech', {
method: 'POST',
headers: {
Authorization: `Bearer ${ttsKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'tts-1',
voice: 'nova',
input: seg.text,
speed: 1.15
})
})
if (!resp.ok)
throw new Error(`TTS API ${resp.status}: ${await resp.text()}`)
const audioBuffer = Buffer.from(await resp.arrayBuffer())
writeFileSync(audioPath, audioBuffer)
audioFiles.push({ path: audioPath, offsetMs: seg.timestampMs })
console.warn(
` TTS [${seg.turn}]: ${audioPath} (${audioBuffer.length} bytes)`
)
} catch (e) {
console.warn(
` TTS [${seg.turn}] failed: ${e instanceof Error ? e.message.slice(0, 80) : e}`
)
}
}
if (audioFiles.length === 0) return null
// Build ffmpeg filter to mix all audio clips at correct timestamps
const inputArgs: string[] = []
const filterParts: string[] = []
for (let i = 0; i < audioFiles.length; i++) {
inputArgs.push('-i', audioFiles[i].path)
// adelay takes milliseconds, one value per channel (left|right)
filterParts.push(
`[${i}]adelay=${audioFiles[i].offsetMs}|${audioFiles[i].offsetMs}[a${i}]`
)
}
const mixInputs = audioFiles.map((_, i) => `[a${i}]`).join('')
const filter = `${filterParts.join(';')};${mixInputs}amix=inputs=${audioFiles.length}:normalize=0[aout]`
const mixedAudio = `${narrationDir}/mixed.mp3`
try {
execSync(
`ffmpeg -y ${inputArgs.join(' ')} -filter_complex "${filter}" -map "[aout]" "${mixedAudio}" 2>/dev/null`,
{ timeout: 30000 }
)
console.warn(` TTS mixed: ${mixedAudio}`)
return mixedAudio
} catch (e) {
console.warn(
` TTS mix failed: ${e instanceof Error ? e.message.slice(0, 80) : e}`
)
return null
}
}
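Review note, as a sketch under stated assumptions: the filter-graph construction in `generateNarrationAudio` can be pulled into a pure function so it is testable without running ffmpeg. `DelayedClip` and `buildMixFilter` are hypothetical names introduced here. The filter syntax (`adelay` with one delay per channel, `amix` with `normalize=0`) mirrors what the function above already emits.

```typescript
// Hypothetical types and helper, not part of the diff.
interface DelayedClip {
  path: string
  offsetMs: number
}

// Builds the -filter_complex string: each input i is delayed by its offset
// (milliseconds, repeated for left|right channels), then all are mixed.
function buildMixFilter(clips: DelayedClip[]): string {
  const delayed = clips.map((clip, i) => {
    // Clamp to a non-negative whole number of milliseconds.
    const d = Math.max(0, Math.round(clip.offsetMs))
    return `[${i}]adelay=${d}|${d}[a${i}]`
  })
  const inputs = clips.map((_, i) => `[a${i}]`).join('')
  return `${delayed.join(';')};${inputs}amix=inputs=${clips.length}:normalize=0[aout]`
}
```

Keeping the string construction separate from `execSync` makes it possible to check the delays against `segments.json` before any audio is rendered.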
function mergeAudioIntoVideo(
videoPath: string,
audioPath: string,
outputPath: string
): boolean {
try {
execSync(
`ffmpeg -y -i "${videoPath}" -i "${audioPath}" -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 -shortest "${outputPath}" 2>/dev/null`,
{ timeout: 60000 }
)
// Replace original with narrated version
renameSync(outputPath, videoPath)
console.warn(` Narrated video: ${videoPath}`)
return true
} catch (e) {
console.warn(
` Audio merge failed: ${e instanceof Error ? e.message.slice(0, 80) : e}`
)
return false
}
}
async function openComfyMenu(page: Page) {
const menuTrigger = page.locator('.comfy-menu-button-wrapper')
const menuPopup = page.locator('.comfy-command-menu')
@@ -411,6 +605,13 @@ async function hoverMenuItem(page: Page, label: string) {
const menuItem = page
.locator('.comfy-command-menu .p-tieredmenu-item')
.filter({ has: menuLabel })
const box = await menuItem.boundingBox().catch(() => null)
if (box)
await moveCursorOverlay(
page,
box.x + box.width / 2,
box.y + box.height / 2
)
await menuItem.hover()
// Wait for submenu to appear
try {
@@ -435,9 +636,18 @@ async function clickSubmenuItem(page: Page, label: string) {
.filter({ hasText: label })
.first()
if (await primeItem.isVisible().catch(() => false)) {
const box = await primeItem.boundingBox().catch(() => null)
if (box)
await moveCursorOverlay(
page,
box.x + box.width / 2,
box.y + box.height / 2
)
if (box) await clickCursorOverlay(page, true)
await primeItem.click({ timeout: 5000 }).catch(() => {
console.warn(`Click on PrimeVue menu item "${label}" failed`)
})
if (box) await clickCursorOverlay(page, false)
await sleep(800)
return
}
@@ -448,9 +658,18 @@ async function clickSubmenuItem(page: Page, label: string) {
.filter({ hasText: label })
.first()
if (await liteItem.isVisible().catch(() => false)) {
const box = await liteItem.boundingBox().catch(() => null)
if (box)
await moveCursorOverlay(
page,
box.x + box.width / 2,
box.y + box.height / 2
)
if (box) await clickCursorOverlay(page, true)
await liteItem.click({ timeout: 5000 }).catch(() => {
console.warn(`Click on litegraph menu item "${label}" failed`)
})
if (box) await clickCursorOverlay(page, false)
await sleep(800)
return
}
@@ -527,13 +746,24 @@ async function fillDialogAndConfirm(page: Page, text: string) {
async function clickByText(page: Page, text: string) {
const el = page.locator(`text=${text}`).first()
if (await el.isVisible().catch(() => false)) {
// Get element position for cursor overlay
const box = await el.boundingBox().catch(() => null)
if (box) {
await moveCursorOverlay(
page,
box.x + box.width / 2,
box.y + box.height / 2
)
}
await el.hover({ timeout: 3000 }).catch(() => {})
await sleep(400)
if (box) await clickCursorOverlay(page, true)
await el.click({ timeout: 5000 }).catch((e) => {
console.warn(
`Click on "${text}" failed: ${e instanceof Error ? e.message.split('\n')[0] : e}`
)
})
if (box) await clickCursorOverlay(page, false)
await sleep(500)
} else {
console.warn(`Element with text "${text}" not found`)
@@ -553,7 +783,7 @@ async function waitForEditorReady(page: Page) {
await sleep(1000)
}
async function executeAction(
export async function executeAction(
page: Page,
step: TestAction,
outputDir: string
@@ -589,8 +819,17 @@ async function executeAction(
break
case 'pressKey':
try {
// Show key in subtitle (persists 2s) then instant press
const keyLabel =
step.key === ' '
? 'Space'
: step.key.length === 1
? step.key.toUpperCase()
: step.key
await showSubtitle(page, `${keyLabel}`, 0)
await sleep(200) // Let subtitle render before pressing
await page.keyboard.press(step.key)
await sleep(300)
await sleep(500)
} catch (e) {
console.warn(
`Skipping invalid key "${step.key}": ${e instanceof Error ? e.message : e}`
@@ -706,11 +945,29 @@ async function executeAction(
break
}
case 'loadDefaultWorkflow':
// Convenience: File → Load Default in one action
await openComfyMenu(page)
await hoverMenuItem(page, 'File')
await clickSubmenuItem(page, 'Load Default')
await sleep(1000)
// Load default workflow via app API (most reliable, no menu navigation)
try {
await page.evaluate(() => {
const app = (window as unknown as Record<string, unknown>).app as {
loadGraphData?: (d: unknown) => Promise<void>
resetToDefaultWorkflow?: () => Promise<void>
}
if (app?.resetToDefaultWorkflow) return app.resetToDefaultWorkflow()
return Promise.resolve()
})
await sleep(1000)
} catch {
// Fallback: try menu navigation with multiple possible item names
await openComfyMenu(page)
await hoverMenuItem(page, 'File')
const loaded = await clickSubmenuItem(page, 'Load Default')
.then(() => true)
.catch(() => false)
if (!loaded) {
await clickSubmenuItem(page, 'Default Workflow').catch(() => {})
}
await sleep(1000)
}
break
case 'openSettings':
// Convenience: open Settings dialog in one action
@@ -730,22 +987,22 @@ async function executeAction(
await page.evaluate(
({ text, x, y, ms }) => {
const el = document.createElement('div')
el.textContent = text
el.textContent = 'QA: ' + text
Object.assign(el.style, {
position: 'fixed',
left: x + 'px',
top: y + 'px',
zIndex: '2147483646',
padding: '4px 10px',
borderRadius: '4px',
background: 'rgba(255, 60, 60, 0.9)',
color: '#fff',
fontSize: '13px',
fontWeight: '600',
fontFamily: 'system-ui, sans-serif',
padding: '3px 8px',
borderRadius: '3px',
background: 'rgba(0, 0, 0, 0.6)',
border: '1.5px dashed rgba(120, 200, 255, 0.8)',
color: 'rgba(120, 200, 255, 0.9)',
fontSize: '11px',
fontWeight: '500',
fontFamily: 'monospace',
pointerEvents: 'none',
whiteSpace: 'nowrap',
boxShadow: '0 2px 8px rgba(0,0,0,0.3)',
transform: 'translateY(-100%) translateX(-50%)',
animation: 'qa-ann-in 200ms ease-out'
})
@@ -777,13 +1034,67 @@ async function executeAction(
break
}
case 'cloneNode': {
await page.mouse.move(step.x, step.y)
// Select node then Ctrl+C/Ctrl+V — works in both legacy and Nodes 2.0
await page.mouse.click(step.x, step.y)
await sleep(300)
await page.mouse.click(step.x, step.y, { button: 'right' })
await page.keyboard.press('Control+c')
await sleep(200)
await page.keyboard.press('Control+v')
await sleep(500)
await clickSubmenuItem(page, 'Clone')
console.warn(` Cloned node at (${step.x}, ${step.y}) via Ctrl+C/V`)
break
}
case 'copyPaste': {
const cx = step.x ?? 640
const cy = step.y ?? 400
await page.mouse.click(cx, cy)
await sleep(200)
await page.keyboard.press('Control+c')
await sleep(300)
await page.keyboard.press('Control+v')
await sleep(500)
console.warn(` Cloned node at (${step.x}, ${step.y})`)
console.warn(` Copy-pasted at (${cx}, ${cy})`)
break
}
case 'holdKeyAndDrag': {
await page.keyboard.down(step.key)
await sleep(100)
await page.mouse.move(step.fromX, step.fromY)
await page.mouse.down()
await sleep(100)
const hkSteps = 5
for (let i = 1; i <= hkSteps; i++) {
const hx = step.fromX + ((step.toX - step.fromX) * i) / hkSteps
const hy = step.fromY + ((step.toY - step.fromY) * i) / hkSteps
await page.mouse.move(hx, hy)
await sleep(50)
}
await page.mouse.up()
await page.keyboard.up(step.key)
await sleep(300)
console.warn(
` Hold ${step.key} + drag (${step.fromX},${step.fromY})→(${step.toX},${step.toY})`
)
break
}
case 'resizeNode': {
// Click bottom-right corner of node, then drag
await page.mouse.move(step.x, step.y)
await page.mouse.down()
await sleep(100)
await page.mouse.move(step.x + step.dx, step.y + step.dy)
await sleep(100)
await page.mouse.up()
await sleep(300)
console.warn(
` Resized node at (${step.x},${step.y}) by (${step.dx},${step.dy})`
)
break
}
case 'middleClick': {
await page.mouse.click(step.x, step.y, { button: 'middle' })
await sleep(300)
console.warn(` Middle-clicked at (${step.x}, ${step.y})`)
break
}
default:
@@ -906,10 +1217,121 @@ async function captureScreenshotForGemini(page: Page): Promise<string> {
return buffer.toString('base64')
}
function buildIssueSpecificHints(context: string): string {
const ctx = context.toLowerCase()
const hints: string[] = []
if (/clone|z.?index|overlap|above.*origin|layering/.test(ctx))
hints.push(
'Preflight already cloned a node. NOW: take a screenshot to see the cloned node, then dragCanvas from (~800,350) to (~750,350) to overlap the clone on the original. Take screenshot to capture z-index. The clone should be ABOVE the original.'
)
if (/copy.*paste|paste.*offset|ctrl\+c|ctrl\+v|clipboard/.test(ctx))
hints.push(
'MUST: loadDefaultWorkflow, then clickCanvas on a node (~450,250), then use copyPaste to copy+paste it. Check if pasted nodes are offset or misaligned.'
)
if (/group.*paste|paste.*group/.test(ctx))
hints.push(
'MUST: Select multiple nodes by drag-selecting, then copyPaste. Check if the group frame and nodes align after paste.'
)
if (/numeric.*drag|drag.*numeric|drag.*value|widget.*drag|slider/.test(ctx))
hints.push(
'Preflight already enabled Nodes 2.0 and loaded the workflow. NOW: take a screenshot, find a numeric widget (e.g. KSampler seed/cfg around ~750,300), then use dragCanvas from that widget value to the right (fromX:750,fromY:300,toX:850,toY:300) to attempt changing the value by dragging. Take screenshot after to compare.'
)
if (
/sidebar.*file|file.*extension|workflow.*sidebar|workflow.*tree/.test(ctx)
)
hints.push(
'MUST: Click the "Workflows" button in the left sidebar to open the file tree. Take a screenshot of the file list to check for missing extensions.'
)
if (/spacebar|space.*pan|pan.*space|space.*drag/.test(ctx))
hints.push(
'Preflight already loaded the workflow. NOW: first take a screenshot, then use holdKeyAndDrag with key=" " (Space) fromX:640 fromY:400 toX:400 toY:300 to test spacebar panning. Take screenshot after. Then try: clickCanvas on an output slot (~200,320), then holdKeyAndDrag with key=" " to test panning while connecting.'
)
if (/resize.*node|node.*resize|gap.*widget|widget.*gap/.test(ctx))
hints.push(
'MUST: loadDefaultWorkflow, then use resizeNode on the bottom-right corner of a node (e.g. KSampler at ~830,430 with dx=100,dy=50) to resize it. Screenshot before and after.'
)
if (/new.*tab|open.*tab|tab.*open/.test(ctx))
hints.push(
'MUST: Right-click on a workflow tab in the topbar, then look for "Open in new tab" option in the context menu.'
)
if (/hover.*image|zoom.*button|asset.*column|thumbnail/.test(ctx))
hints.push(
'MUST: Open the sidebar, navigate to assets/models, hover over image thumbnails to trigger the zoom button overlay. Screenshot the hover state.'
)
if (/scroll.*leak|scroll.*text|text.*widget.*scroll|scroll.*canvas/.test(ctx))
hints.push(
'MUST: loadDefaultWorkflow, click on a text widget (e.g. CLIP Text Encode prompt at ~450,250), type some text, then use scrollCanvas inside the widget area to test if scroll leaks to canvas zoom.'
)
if (/middle.*click|mmb|reroute/.test(ctx))
hints.push(
'MUST: loadDefaultWorkflow, then use middleClick on a link/wire between two nodes to test reroute creation.'
)
if (/node.*shape|change.*shape/.test(ctx))
hints.push(
'MUST: loadDefaultWorkflow, then rightClickCanvas on a node (~750,350), look for "Shape" or "Properties" in context menu to change node shape.'
)
if (/nodes.*2\.0|vue.*node|new.*node/.test(ctx))
hints.push(
'MUST: Enable Nodes 2.0 via setSetting("Comfy.UseNewMenu","Top") and setSetting("Comfy.NodeBeta.Enabled",true) FIRST before testing.'
)
if (hints.length === 0) return ''
return `\n## Issue-Specific Action Plan\nBased on keyword analysis of this issue, you MUST follow these steps:\n${hints.map((h, i) => `${i + 1}. ${h}`).join('\n')}\nDo NOT skip these steps. They are the minimum required to attempt reproduction.\n`
}
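Review note, as a hedged sketch: the chain of `if (/.../.test(ctx)) hints.push(...)` statements above could also be expressed as a rule table, which keeps each pattern next to its hint and makes the matcher easy to test. `HintRule`, `HINT_RULES` and `collectHints` are hypothetical names, and the two rules below are shortened illustrations rather than the full hint text.

```typescript
// Hypothetical table-driven variant, not part of the diff.
interface HintRule {
  pattern: RegExp
  hint: string
}

// Patterns copied from two of the checks above; hint wording is abbreviated.
const HINT_RULES: HintRule[] = [
  {
    pattern: /middle.*click|mmb|reroute/,
    hint: 'Use middleClick on a link between two nodes.'
  },
  {
    pattern: /spacebar|space.*pan|pan.*space|space.*drag/,
    hint: 'Use holdKeyAndDrag with key=" " to test spacebar panning.'
  }
]

// Returns the hints whose pattern matches the lower-cased issue text.
// The patterns carry no `g` flag, so RegExp.test stays stateless.
function collectHints(
  context: string,
  rules: HintRule[] = HINT_RULES
): string[] {
  const ctx = context.toLowerCase()
  return rules.filter((r) => r.pattern.test(ctx)).map((r) => r.hint)
}
```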
function buildPreflightActions(context: string): TestAction[] {
const ctx = context.toLowerCase()
const actions: TestAction[] = []
// Enable Nodes 2.0 if issue mentions it — requires reload to take effect
if (/nodes.*2\.0|vue.*node|new.*node|node.*beta/.test(ctx)) {
actions.push({
action: 'setSetting',
id: 'Comfy.NodeBeta.Enabled',
value: true
})
actions.push({ action: 'reload' })
}
// Load default workflow for most reproduction scenarios
if (
/clone|z.?index|overlap|copy.*paste|paste|resize|drag|scroll.*leak|scroll.*text|spacebar|space.*pan|node.*shape|numeric/.test(
ctx
)
) {
actions.push({ action: 'loadDefaultWorkflow' })
actions.push({ action: 'screenshot', name: 'preflight-default-workflow' })
}
// Issue-specific preflight: perform the actual reproduction steps
// mechanically so the agent starts with the right state
if (/clone|z.?index|above.*origin/.test(ctx)) {
// #10307: clone a node and check z-index
actions.push({ action: 'cloneNode', x: 750, y: 350 })
actions.push({ action: 'screenshot', name: 'preflight-after-clone' })
}
if (/numeric.*drag|drag.*numeric|drag.*value|widget.*drag/.test(ctx)) {
// #7414: click on a numeric widget value to prepare for drag test
actions.push({ action: 'clickCanvas', x: 750, y: 300 })
actions.push({ action: 'screenshot', name: 'preflight-numeric-widget' })
}
if (/spacebar.*pan|space.*pan|pan.*space/.test(ctx)) {
// #7806: start a connection drag then try spacebar pan
// First click an output slot to start dragging a wire
actions.push({ action: 'screenshot', name: 'preflight-before-connection' })
}
return actions
}
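Review note, as a hedged sketch: a list like the one `buildPreflightActions` returns can be rendered as a bullet summary for a prompt by narrowing on the `action` discriminant, which avoids `'id' in a` checks. `PreflightAction` is a hypothetical, reduced union introduced for illustration; the real `TestAction` type has more members.

```typescript
// Hypothetical subset of the action union, for illustration only.
type PreflightAction =
  | { action: 'setSetting'; id: string; value: unknown }
  | { action: 'reload' }
  | { action: 'loadDefaultWorkflow' }
  | { action: 'screenshot'; name: string }

// One "- action" bullet per entry; setSetting entries also show id=value.
function describePreflight(actions: PreflightAction[]): string {
  return actions
    .map((a) =>
      a.action === 'setSetting'
        ? `- ${a.action}: ${a.id}=${String(a.value)}`
        : `- ${a.action}`
    )
    .join('\n')
}
```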
function buildAgenticSystemPrompt(
issueContext: string,
subIssueFocus?: string,
qaGuide?: string
qaGuide?: string,
preflightNote?: string
): string {
const focusSection = subIssueFocus
? `\n## Current Focus\nYou are reproducing this specific sub-issue: ${subIssueFocus}\nStay focused on this particular bug. When you have demonstrated it, return done.\n`
@@ -919,6 +1341,8 @@ function buildAgenticSystemPrompt(
? `\n## QA Analysis\nA deep analysis of this issue produced the following guide. Follow it closely:\n${qaGuide}\n`
: ''
const issueHints = buildIssueSpecificHints(issueContext)
return `You are an AI QA agent controlling a ComfyUI browser session to reproduce reported bugs.
You see the ACTUAL screen after each action and decide what to do next.
@@ -959,8 +1383,12 @@ Each action is a JSON object with an "action" field:
- { "action": "reload" } — reloads the page (for bugs that manifest on load)
- { "action": "wait", "ms": 1000 } — waits (max 3000ms)
- { "action": "screenshot", "name": "step-name" } — takes a named screenshot
- { "action": "addNode", "nodeName": "KSampler", "x": 640, "y": 400 } — double-clicks canvas to open search, types node name, presses Enter (compound action)
- { "action": "addNode", "nodeName": "KSampler", "x": 640, "y": 400 } — double-clicks canvas to open search, types node name, presses Enter
- { "action": "cloneNode", "x": 750, "y": 350 } — clicks node at coords, then Ctrl+C and Ctrl+V to duplicate it
- { "action": "copyPaste", "x": 640, "y": 400 } — clicks at coords then Ctrl+C, Ctrl+V
- { "action": "holdKeyAndDrag", "key": " ", "fromX": 640, "fromY": 400, "toX": 400, "toY": 300 } — holds key (e.g. Space for pan) while dragging
- { "action": "resizeNode", "x": 200, "y": 380, "dx": 100, "dy": 50 } — drags from node edge to resize
- { "action": "middleClick", "x": 640, "y": 400 } — middle mouse button click
- { "action": "done", "reason": "..." } — signals you are finished
## Response Format
@@ -984,8 +1412,12 @@ Return { "reasoning": "...", "action": { "action": "done", "reason": "..." } } w
- For visual/rendering bugs (z-index, overlap, z-fighting): ALWAYS start with loadDefaultWorkflow to get nodes on canvas. You cannot reproduce visual bugs on an empty canvas.
- To clone a node: use cloneNode at the node's coordinates (right-clicks → Clone).
- To overlap nodes for z-index testing: use dragCanvas to move one node on top of another.
- For copy-paste bugs: use copyPaste to select+copy+paste a node or group.
- For panning bugs: use holdKeyAndDrag with key=" " (Space) to test spacebar panning.
- For node resize bugs: use resizeNode on the bottom-right corner of a node.
- For reroute/middle-click bugs: use middleClick on a link or slot.
- Do NOT waste turns on generic exploration. Focus on reproducing the specific bug.
${focusSection}${qaSection}
${preflightNote || ''}${issueHints}${focusSection}${qaSection}
## Issue to Reproduce
${issueContext}`
}
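The five actions this hunk adds to the prompt (cloneNode, copyPaste, holdKeyAndDrag, resizeNode, middleClick) all have a fixed set of fields, so a model response could be checked before it is executed. The following is a minimal sketch of such a check, not code from this change: `NewAction`, `REQUIRED_NUMBERS` and `isNewAction` are hypothetical names, and only the field names are taken from the prompt text above.

```typescript
// Hypothetical types: field names mirror the JSON shapes listed in the prompt.
type NewAction =
  | { action: 'cloneNode'; x: number; y: number }
  | { action: 'copyPaste'; x: number; y: number }
  | { action: 'holdKeyAndDrag'; key: string; fromX: number; fromY: number; toX: number; toY: number }
  | { action: 'resizeNode'; x: number; y: number; dx: number; dy: number }
  | { action: 'middleClick'; x: number; y: number }

// Numeric fields each action must carry.
const REQUIRED_NUMBERS: Record<NewAction['action'], string[]> = {
  cloneNode: ['x', 'y'],
  copyPaste: ['x', 'y'],
  holdKeyAndDrag: ['fromX', 'fromY', 'toX', 'toY'],
  resizeNode: ['x', 'y', 'dx', 'dy'],
  middleClick: ['x', 'y']
}

// Type guard: true only when the object names a known action and has every required field.
function isNewAction(value: unknown): value is NewAction {
  if (typeof value !== 'object' || value === null) return false
  const obj = value as Record<string, unknown>
  const name = obj.action
  if (typeof name !== 'string') return false
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_NUMBERS, name)) return false
  if (name === 'holdKeyAndDrag' && typeof obj.key !== 'string') return false
  return REQUIRED_NUMBERS[name as NewAction['action']].every(
    (field) => typeof obj[field] === 'number'
  )
}
```

A guard like this would let the loop reject a malformed action with a clear message instead of failing inside the executor.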
@@ -1040,24 +1472,50 @@ async function runAgenticLoop(
}
}
// Auto-execute prerequisite actions based on issue keywords BEFORE the
// agentic loop starts. This guarantees nodes are on canvas, settings are
// correct, etc. — the agent model often ignores prompt-only hints.
const preflight = buildPreflightActions(issueContext)
if (preflight.length > 0) {
console.warn(`Running ${preflight.length} preflight actions...`)
for (const action of preflight) {
await executeAction(page, action, outputDir)
}
await sleep(500)
}
// Tell the agent what preflight already did so it doesn't repeat
const preflightNote =
preflight.length > 0
? `\n## Already Done (by preflight)\nThe following actions were ALREADY executed before you started. Do NOT repeat them:\n${preflight.map((a) => `- ${a.action}${('id' in a && `: ${a.id}=${a.value}`) || ''}`).join('\n')}\nThe default workflow is loaded and settings are configured. Start with the REPRODUCTION steps immediately.\n`
: ''
const systemInstruction = buildAgenticSystemPrompt(
issueContext,
subIssue?.focus,
qaGuideSummary
qaGuideSummary,
preflightNote
)
const anthropicKey = process.env.ANTHROPIC_API_KEY
const useHybrid = Boolean(anthropicKey)
const genAI = new GoogleGenerativeAI(opts.apiKey)
// Use flash for agentic loop — rapid iteration matters more than reasoning
const geminiVisionModel = genAI.getGenerativeModel({
model: 'gemini-3-flash-preview'
})
// Gemini-only fallback model (used when no ANTHROPIC_API_KEY)
const agenticModel = opts.model.includes('flash')
? opts.model
: 'gemini-3-flash-preview'
const model = genAI.getGenerativeModel({
const geminiOnlyModel = genAI.getGenerativeModel({
model: agenticModel,
systemInstruction
})
console.warn(
`Starting agentic loop with ${agenticModel}` +
`Starting ${useHybrid ? 'hybrid (Claude planner + Gemini vision)' : 'Gemini-only'} agentic loop` +
(subIssue ? ` — focus: ${subIssue.title}` : '')
)
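The preflight note in this hunk is built inline with a nested template literal. As a rough illustration of the same formatting, here is a standalone sketch: `PreflightAction` and `formatPreflightNote` are hypothetical names, the action type is simplified to the fields the note reads, and the setting id in the usage note is invented for the example.

```typescript
// Simplified, assumed shape: only the fields the note reads.
type PreflightAction = { action: string; id?: string; value?: unknown }

// Builds the "Already Done" prompt section; returns '' when nothing ran.
function formatPreflightNote(preflight: PreflightAction[]): string {
  if (preflight.length === 0) return ''
  const lines = preflight.map((a) =>
    a.id !== undefined
      ? `- ${a.action}: ${a.id}=${String(a.value)}`
      : `- ${a.action}`
  )
  return [
    '',
    '## Already Done (by preflight)',
    'The following actions were ALREADY executed before you started. Do NOT repeat them:',
    ...lines,
    ''
  ].join('\n')
}
```

Calling it with `[{ action: 'loadDefaultWorkflow' }, { action: 'setSetting', id: 'Comfy.Example', value: true }]` yields one bullet per action, with the setting rendered as `id=value`.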
@@ -1145,6 +1603,8 @@ async function runAgenticLoop(
const parsed = JSON.parse(responseText)
if (parsed.reasoning) {
console.warn(` Reasoning: ${parsed.reasoning.slice(0, 150)}`)
// Show reasoning as subtitle overlay in the video
await showSubtitle(page, parsed.reasoning, turn)
}
actionObj = parsed.action || parsed.actions?.[0] || parsed
if (!actionObj?.action) {
@@ -1343,54 +1803,66 @@ async function launchSessionAndLogin(
>
page: Page
}> {
const browser = await chromium.launch({ headless: true })
const browser = await chromium.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-gpu',
'--disable-dev-shm-usage'
]
})
const context = await browser.newContext({
viewport: { width: 1280, height: 720 },
recordVideo: { dir: videoDir, size: { width: 1280, height: 720 } }
})
const page = await context.newPage()
// Inject visible cursor overlay (headless Chrome doesn't render system cursor)
await page.addInitScript(() => {
const style = document.createElement('style')
style.textContent = `
#qa-cursor {
position: fixed; z-index: 2147483647; pointer-events: none;
width: 16px; height: 16px; margin: -8px 0 0 -8px;
border-radius: 50%; background: rgba(255, 60, 60, 0.7);
border: 2px solid rgba(255, 255, 255, 0.9);
box-shadow: 0 0 8px rgba(255, 60, 60, 0.5);
transition: transform 80ms ease-out, opacity 80ms;
transform: scale(1); opacity: 0.85;
}
#qa-cursor.clicking {
transform: scale(1.8); opacity: 1;
background: rgba(255, 200, 60, 0.8);
border-color: rgba(255, 255, 255, 1);
box-shadow: 0 0 16px rgba(255, 200, 60, 0.6);
}
`
const cursor = document.createElement('div')
cursor.id = 'qa-cursor'
// Cursor overlay placeholder — injected after login when DOM is stable
const init = () => {
document.head.appendChild(style)
document.body.appendChild(cursor)
document.addEventListener('mousemove', (e) => {
cursor.style.left = e.clientX + 'px'
cursor.style.top = e.clientY + 'px'
})
document.addEventListener('mousedown', () =>
cursor.classList.add('clicking')
)
document.addEventListener('mouseup', () =>
cursor.classList.remove('clicking')
)
}
// Monkey-patch page.mouse to auto-update cursor overlay on ALL mouse ops
const origMove = page.mouse.move.bind(page.mouse)
const origClick = page.mouse.click.bind(page.mouse)
const origDown = page.mouse.down.bind(page.mouse)
const origUp = page.mouse.up.bind(page.mouse)
const origDblclick = page.mouse.dblclick.bind(page.mouse)
if (document.body) init()
else document.addEventListener('DOMContentLoaded', init)
})
page.mouse.move = async (
x: number,
y: number,
options?: Parameters<typeof origMove>[2]
) => {
await origMove(x, y, options)
await moveCursorOverlay(page, x, y)
}
page.mouse.click = async (
x: number,
y: number,
options?: Parameters<typeof origClick>[2]
) => {
await moveCursorOverlay(page, x, y)
await clickCursorOverlay(page, true)
await origClick(x, y, options)
await clickCursorOverlay(page, false)
}
page.mouse.dblclick = async (
x: number,
y: number,
options?: Parameters<typeof origDblclick>[2]
) => {
await moveCursorOverlay(page, x, y)
await clickCursorOverlay(page, true)
await origDblclick(x, y, options)
await clickCursorOverlay(page, false)
}
page.mouse.down = async (options?: Parameters<typeof origDown>[0]) => {
await clickCursorOverlay(page, true)
await origDown(options)
}
page.mouse.up = async (options?: Parameters<typeof origUp>[0]) => {
await origUp(options)
await clickCursorOverlay(page, false)
}
console.warn(`Opening ComfyUI at ${opts.serverUrl}`)
await page.goto(opts.serverUrl, {
@@ -1401,6 +1873,35 @@ async function launchSessionAndLogin(
await loginAsQaCi(page, opts.serverUrl)
await sleep(1000)
// Inject cursor overlay AFTER login (addInitScript gets destroyed by Vue mount)
await page.addScriptTag({
content: `(function(){
var s=document.createElement('style');
s.textContent='#qa-cursor{position:fixed;z-index:2147483647;pointer-events:none;width:20px;height:20px;margin:-2px 0 0 -2px;opacity:0.95;transition:transform 80ms ease-out;transform:scale(1)}#qa-cursor.clicking{transform:scale(1.4)}';
document.head.appendChild(s);
var c=document.createElement('div');c.id='qa-cursor';
c.innerHTML='<svg width="20" height="20" viewBox="0 0 24 24" fill="white" stroke="black" stroke-width="1.5"><path d="M4 2l14 10-6.5 1.5L15 21l-3.5-1.5L8 21l-1.5-7.5L2 16z"/></svg>';
document.body.appendChild(c);
window.__moveCursor=function(x,y){c.style.left=x+'px';c.style.top=y+'px'};
window.__clickCursor=function(d){if(d)c.classList.add('clicking');else c.classList.remove('clicking')};
})()`
})
// Inject keyboard HUD — shows pressed keys in bottom-right corner of video
// Uses addScriptTag to avoid tsx __name compilation artifacts in page.evaluate
await page.addScriptTag({
content: `(function(){
var hud=document.createElement('div');
Object.assign(hud.style,{position:'fixed',bottom:'8px',right:'8px',zIndex:'2147483647',padding:'3px 8px',borderRadius:'4px',background:'rgba(0,0,0,0.7)',border:'1px solid rgba(120,200,255,0.4)',color:'rgba(120,200,255,0.9)',fontSize:'11px',fontFamily:'monospace',fontWeight:'500',pointerEvents:'none',display:'none',whiteSpace:'nowrap'});
document.body.appendChild(hud);
var held=new Set();
function update(){if(held.size===0){hud.style.display='none'}else{hud.style.display='block';hud.textContent=String.fromCharCode(9000)+' '+Array.from(held).map(function(k){return k===' '?'Space':k.length===1?k.toUpperCase():k}).join('+')}}
document.addEventListener('keydown',function(e){held.add(e.key);update()},true);
document.addEventListener('keyup',function(e){held.delete(e.key);update()},true);
window.addEventListener('blur',function(){held.clear();update()});
})()`
})
return { browser, context, page }
}
@@ -1433,8 +1934,115 @@ async function main() {
await page.screenshot({
path: `${opts.outputDir}/debug-after-login-reproduce${sessionLabel}.png`
})
console.warn('Editor ready — starting agentic loop')
await runAgenticLoop(page, opts, opts.outputDir, subIssue)
// ═══ Phase 1: RESEARCH — Claude writes E2E test to reproduce ═══
console.warn('Phase 1: Research — Claude writes E2E test')
const anthropicKey = process.env.ANTHROPIC_API_KEY
const { runResearchPhase } = await import('./qa-agent.js')
const issueCtx = opts.diffFile
? readFileSync(opts.diffFile, 'utf-8').slice(0, 6000)
: 'No issue context provided'
let qaGuideText = ''
if (opts.qaGuideFile) {
try {
qaGuideText = readFileSync(opts.qaGuideFile, 'utf-8')
} catch {
// QA guide not available
}
}
const research = await runResearchPhase({
page,
issueContext: issueCtx,
qaGuide: qaGuideText,
outputDir: opts.outputDir,
serverUrl: opts.serverUrl,
anthropicApiKey: anthropicKey
})
console.warn(
`Research complete: ${research.verdict}: ${research.summary.slice(0, 100)}`
)
console.warn(`Evidence: ${research.evidence.slice(0, 200)}`)
// ═══ Phase 2: Run passing test with video recording ═══
if (research.verdict === 'REPRODUCED' && research.testCode) {
console.warn('Phase 2: Recording test execution with video')
const projectRoot = process.cwd()
const browserTestFile = `${projectRoot}/browser_tests/tests/qa-reproduce.spec.ts`
const testResultsDir = `${opts.outputDir}/test-results`
// Inject cursor overlay into the test: add page.addInitScript to the first callback that receives comfyPage
const cursorScript = `await comfyPage.page.addInitScript(() => {
var c=document.createElement('div');c.id='qa-cursor';
c.innerHTML='<svg width="20" height="20" viewBox="0 0 24 24" fill="white" stroke="black" stroke-width="1.5"><path d="M4 2l14 10-6.5 1.5L15 21l-3.5-1.5L8 21l-1.5-7.5L2 16z"/></svg>';
Object.assign(c.style,{position:'fixed',zIndex:'2147483647',pointerEvents:'none',width:'20px',height:'20px',margin:'-2px 0 0 -2px',opacity:'0.95'});
if(document.body)document.body.appendChild(c);
else document.addEventListener('DOMContentLoaded',function(){document.body.appendChild(c)});
document.addEventListener('mousemove',function(e){c.style.left=e.clientX+'px';c.style.top=e.clientY+'px'});
});`
// Insert cursor injection after the first line of the test body (after async ({ comfyPage }) => {)
let testCode = research.testCode
const testBodyMatch = testCode.match(
/async\s*\(\{\s*comfyPage\s*\}\)\s*=>\s*\{/
)
if (testBodyMatch && testBodyMatch.index !== undefined) {
const insertPos = testBodyMatch.index + testBodyMatch[0].length
testCode =
testCode.slice(0, insertPos) +
'\n ' +
cursorScript +
'\n' +
testCode.slice(insertPos)
}
// Inject 800ms pauses between actions for human-readable video
// Uses comfyPage.page since test code uses comfyPageFixture
testCode = testCode.replace(
/(\n\s*)(await\s+(?:comfyPage|topbar|firstNode|page|canvas|expect))/g,
'$1await comfyPage.page.waitForTimeout(800);\n$1$2'
)
writeFileSync(browserTestFile, testCode)
try {
const output = execSync(
`cd "${projectRoot}" && npx playwright test browser_tests/tests/qa-reproduce.spec.ts --reporter=list --timeout=30000 --retries=0 --workers=1 --output="${testResultsDir}" 2>&1`,
{
timeout: 90000,
encoding: 'utf-8',
env: {
...process.env,
COMFYUI_BASE_URL: opts.serverUrl,
PLAYWRIGHT_LOCAL: '1' // Enables video=on + trace=on in playwright.config.ts
}
}
)
console.warn(`Phase 2: Test passed\n${output.slice(-300)}`)
} catch (e) {
const err = e as { stdout?: string }
console.warn(
`Phase 2: Test failed\n${(err.stdout || '').slice(-300)}`
)
}
// Copy recorded video to outputDir so deploy script finds it
try {
const videos = execSync(
`find "${testResultsDir}" -name '*.webm' -type f 2>/dev/null`,
{ encoding: 'utf-8' }
)
.trim()
.split('\n')
.filter(Boolean)
if (videos.length > 0) {
execSync(`cp "${videos[0]}" "${opts.outputDir}/qa-session.webm"`)
console.warn(`Phase 2: Video → ${opts.outputDir}/qa-session.webm`)
}
} catch {
console.warn('Phase 2: No test video found')
}
// Cleanup
try {
execSync(`rm -f "${browserTestFile}"`)
} catch {
/* ignore */
}
} else {
console.warn(`Skipping Phase 2: verdict=${research.verdict}`)
}
await sleep(2000)
} finally {
await context.close()
@@ -1442,7 +2050,38 @@ async function main() {
}
knownNames.add(videoName)
renameLatestWebm(opts.outputDir, videoName, knownNames)
// If Phase 2 already copied a test video as qa-session.webm, don't overwrite it
// with the idle research browser video
const videoPath = `${opts.outputDir}/${videoName}`
if (statSync(videoPath, { throwIfNoEntry: false })) {
console.warn(
'Phase 2 test video exists — skipping research video rename'
)
} else {
renameLatestWebm(opts.outputDir, videoName, knownNames)
}
// Post-process: add TTS narration audio to the video
if (narrationSegments.length > 0) {
const videoPath = `${opts.outputDir}/${videoName}`
if (statSync(videoPath, { throwIfNoEntry: false })) {
console.warn(
`Generating TTS narration for ${narrationSegments.length} segments...`
)
const audioPath = await generateNarrationAudio(
narrationSegments,
opts.outputDir,
opts.apiKey
)
if (audioPath) {
mergeAudioIntoVideo(
videoPath,
audioPath,
`${opts.outputDir}/${videoName.replace('.webm', '-narrated.webm')}`
)
}
}
}
}
} else {
// Before/after batch mode (unchanged)
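The pause injection in Phase 2 above is a single regex replace over the generated test source. The sketch below isolates it so its effect can be seen on a small input. `injectPauses` is a hypothetical helper name; the pattern and replacement string are copied from the hunk, with the delay turned into a parameter.

```typescript
// Inserts a waitForTimeout call before each matching `await` statement.
// Pattern and replacement mirror the diff; the function name is illustrative.
function injectPauses(testCode: string, ms: number = 800): string {
  return testCode.replace(
    /(\n\s*)(await\s+(?:comfyPage|topbar|firstNode|page|canvas|expect))/g,
    // $1 already starts with a newline, so the output has a blank line
    // between the injected pause and the original statement.
    `$1await comfyPage.page.waitForTimeout(${ms});\n$1$2`
  )
}
```

Because `String.prototype.replace` does not rescan its own output, the injected `await comfyPage.page.waitForTimeout(...)` lines are not matched again. Only statements that start a line with one of the listed identifiers receive a pause.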


@@ -2,10 +2,11 @@
<link rel=preconnect href=https://fonts.googleapis.com><link rel=preconnect href=https://fonts.gstatic.com crossorigin><link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel=stylesheet>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<style>
:root{--bg:oklch(8% 0.02 265);--surface:oklch(12% 0.02 265);--surface-up:oklch(16% 0.02 265);--fg:oklch(96% 0.01 95);--fg-muted:oklch(65% 0.01 265);--fg-dim:oklch(45% 0.01 265);--primary:oklch(62% 0.21 265);--primary-up:oklch(68% 0.21 265);--primary-glow:oklch(62% 0.15 265);--ok:oklch(62% 0.18 155);--err:oklch(62% 0.22 25);--border:oklch(22% 0.02 265);--border-faint:oklch(15% 0.01 265);--r:0.75rem;--r-lg:1rem;--ease-out:cubic-bezier(0.22,1,0.36,1);--dur-base:250ms;--dur-slow:500ms;--font:'Inter',system-ui,sans-serif;--font-mono:'JetBrains Mono',monospace}
:root{--bg:oklch(97% 0.01 265);--surface:oklch(100% 0 0);--surface-up:oklch(94% 0.01 265);--fg:oklch(15% 0.02 265);--fg-muted:oklch(40% 0.01 265);--fg-dim:oklch(55% 0.01 265);--primary:oklch(50% 0.21 265);--primary-up:oklch(45% 0.21 265);--primary-glow:oklch(55% 0.15 265);--ok:oklch(45% 0.18 155);--err:oklch(50% 0.22 25);--border:oklch(85% 0.01 265);--border-faint:oklch(90% 0.01 265);--r:0.75rem;--r-lg:1rem;--ease-out:cubic-bezier(0.22,1,0.36,1);--dur-base:250ms;--dur-slow:500ms;--font:'Inter',system-ui,sans-serif;--font-mono:'JetBrains Mono',monospace}
@media(prefers-color-scheme:dark){:root{--bg:oklch(8% 0.02 265);--surface:oklch(12% 0.02 265);--surface-up:oklch(16% 0.02 265);--fg:oklch(96% 0.01 95);--fg-muted:oklch(65% 0.01 265);--fg-dim:oklch(45% 0.01 265);--primary:oklch(62% 0.21 265);--primary-up:oklch(68% 0.21 265);--primary-glow:oklch(62% 0.15 265);--ok:oklch(62% 0.18 155);--err:oklch(62% 0.22 25);--border:oklch(22% 0.02 265);--border-faint:oklch(15% 0.01 265)}}
*{margin:0;padding:0;box-sizing:border-box}
body{background:var(--bg);color:var(--fg);font-family:var(--font);min-height:100vh;padding:clamp(1.5rem,4vw,3rem) clamp(1rem,3vw,2rem);position:relative}
body::after{content:'';position:fixed;inset:0;pointer-events:none;opacity:.03;background:url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='.85' numOctaves='4' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23n)'/%3E%3C/svg%3E")}
@media(prefers-color-scheme:dark){body::after{content:'';position:fixed;inset:0;pointer-events:none;opacity:.03;background:url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='.85' numOctaves='4' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23n)'/%3E%3C/svg%3E")}}
.container{max-width:1200px;margin:0 auto}
header{display:flex;align-items:center;gap:1rem;margin-bottom:clamp(1.5rem,4vw,3rem);padding-bottom:1.25rem;border-bottom:1px solid var(--border)}
.header-icon{width:36px;height:36px;display:grid;place-items:center;background:linear-gradient(135deg,oklch(100% 0 0/.06),oklch(100% 0 0/.02));backdrop-filter:blur(12px);border:1px solid oklch(100% 0 0/.1);border-radius:var(--r);flex-shrink:0}
@@ -13,9 +14,9 @@ header{display:flex;align-items:center;gap:1rem;margin-bottom:clamp(1.5rem,4vw,3
h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em;background:linear-gradient(135deg,var(--fg),var(--fg-muted));-webkit-background-clip:text;-webkit-text-fill-color:transparent;background-clip:text}
.meta{color:var(--fg-dim);font-size:.8125rem;margin-top:.15rem;letter-spacing:.01em}
.grid{display:grid;grid-template-columns:repeat(auto-fill,minmax(min(480px,100%),1fr));gap:1.5rem}
.card{background:linear-gradient(135deg,oklch(100% 0 0/.05),oklch(100% 0 0/.015));backdrop-filter:blur(16px) saturate(150%);border:1px solid oklch(100% 0 0/.08);border-radius:var(--r-lg);overflow:hidden;transition:border-color var(--dur-base) var(--ease-out),box-shadow var(--dur-base) var(--ease-out),transform var(--dur-base) var(--ease-out)}
.card:hover{border-color:oklch(100% 0 0/.16);box-shadow:0 8px 32px oklch(0% 0 0/.3),inset 0 1px 0 oklch(100% 0 0/.1);transform:translateY(-2px)}
.video-wrap{position:relative;background:oklch(4% 0.01 265);border-bottom:1px solid var(--border-faint)}
.card{background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg);overflow:hidden;transition:border-color var(--dur-base) var(--ease-out),box-shadow var(--dur-base) var(--ease-out),transform var(--dur-base) var(--ease-out)}
.card:hover{border-color:var(--primary);box-shadow:0 4px 16px oklch(0% 0 0/.1);transform:translateY(-2px)}
.video-wrap{position:relative;background:var(--surface);border-bottom:1px solid var(--border-faint)}
.video-wrap video{width:100%;display:block;aspect-ratio:16/9;object-fit:contain}
.card-body{padding:.75rem 1rem;display:flex;align-items:center;justify-content:space-between}
.platform{display:flex;align-items:center;gap:.5rem;font-weight:600;font-size:.9375rem;letter-spacing:-.01em}
@@ -28,7 +29,7 @@ h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em
.comparison{display:grid;grid-template-columns:1fr 1fr;gap:0}
.comp-panel{border-right:1px solid var(--border-faint)}
.comp-panel:last-child{border-right:none}
.comp-label{padding:.4rem .75rem;font-size:.7rem;font-weight:600;text-transform:uppercase;letter-spacing:.05em;color:var(--fg-muted);background:oklch(10% 0.01 265);display:flex;align-items:center;gap:.4rem}
.comp-label{padding:.4rem .75rem;font-size:.7rem;font-weight:600;text-transform:uppercase;letter-spacing:.05em;color:var(--fg-muted);background:var(--surface);display:flex;align-items:center;gap:.4rem}
.comp-tag{font-size:.6rem;padding:.1rem .4rem;border-radius:9999px;font-weight:600}
.comp-panel:first-child .comp-tag{background:oklch(65% 0.01 265/.15);color:var(--fg-muted);border:1px solid var(--border)}
.comp-panel:last-child .comp-tag{background:oklch(62% 0.18 155/.15);color:var(--ok);border:1px solid oklch(62% 0.18 155/.25)}
@@ -44,15 +45,15 @@ h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em
.report-body p{margin:.4rem 0}
.report-body ul,.report-body ol{margin:.4rem 0 .4rem 1.5rem}
.report-body li{margin:.25rem 0}
.report-body code{background:oklch(16% 0.02 265);padding:.125rem .375rem;border-radius:.25rem;font-size:.7rem;font-family:var(--font-mono);border:1px solid var(--border-faint)}
.report-body code{background:var(--surface-up);padding:.125rem .375rem;border-radius:.25rem;font-size:.7rem;font-family:var(--font-mono);border:1px solid var(--border-faint)}
.report-body h3+p>code:first-child{background:oklch(62% 0.22 25/.15);color:var(--err);border-color:oklch(62% 0.22 25/.25)}
.report-body h3+p>code:nth-child(2){background:oklch(62% 0.21 265/.15);color:var(--primary-up);border-color:oklch(62% 0.21 265/.25)}
.report-body h3+p>code:nth-child(3){background:oklch(65% 0.01 265/.15);color:var(--fg-muted);border-color:var(--border)}
.report-body table{width:100%;border-collapse:collapse;margin:.75rem 0;font-size:.75rem;border:1px solid var(--border);border-radius:var(--r);overflow:hidden}
.report-body th,.report-body td{border:1px solid var(--border-faint);padding:.5rem .75rem;text-align:left;vertical-align:top;word-wrap:break-word}
.report-body th{background:oklch(14% 0.02 265);color:var(--fg);font-weight:600;font-size:.6875rem;text-transform:uppercase;letter-spacing:.05em;position:sticky;top:0;white-space:nowrap}
.report-body tr:nth-child(even){background:oklch(10% 0.01 265/.5)}
.report-body tr:hover{background:oklch(16% 0.02 265/.5)}
.report-body th{background:var(--surface-up);color:var(--fg);font-weight:600;font-size:.6875rem;text-transform:uppercase;letter-spacing:.05em;position:sticky;top:0;white-space:nowrap}
.report-body tr:nth-child(even){background:color-mix(in oklch,var(--surface) 50%,transparent)}
.report-body tr:hover{background:color-mix(in oklch,var(--surface-up) 50%,transparent)}
.report-body strong{color:var(--fg)}
.report-body hr{border:none;border-top:1px solid var(--border-faint);margin:1rem 0}
@keyframes fade-up{from{opacity:0;transform:translateY(16px)}to{opacity:1;transform:translateY(0)}}
@@ -66,16 +67,26 @@ h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em
.copy-badge{background:oklch(100% 0 0/.06);border:1px solid var(--border);color:var(--fg-muted);padding:.3rem .4rem;border-radius:var(--r);cursor:pointer;display:inline-flex;align-items:center;transition:all var(--dur-base) var(--ease-out)}
.copy-badge:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.1)}
.copy-badge.copied{color:var(--ok);border-color:var(--ok)}
.vctrl{display:flex;align-items:center;gap:.375rem;padding:.5rem .75rem;background:oklch(6% 0.01 265);border-top:1px solid var(--border-faint);flex-wrap:wrap}
.vseek{width:100%;padding:0 .75rem;background:var(--surface);border-top:1px solid var(--border-faint);position:relative;height:24px;display:flex;align-items:center}
.vseek input[type=range]{-webkit-appearance:none;appearance:none;width:100%;height:4px;background:var(--border);border-radius:2px;outline:none;cursor:pointer;position:relative;z-index:2}
.vseek input[type=range]::-webkit-slider-thumb{-webkit-appearance:none;width:12px;height:12px;border-radius:50%;background:var(--primary);cursor:pointer;border:2px solid var(--bg);box-shadow:0 0 4px oklch(0% 0 0/.3)}
.vseek input[type=range]::-moz-range-thumb{width:12px;height:12px;border-radius:50%;background:var(--primary);cursor:pointer;border:2px solid var(--bg)}
.vseek .vbuf{position:absolute;left:.75rem;right:.75rem;height:4px;border-radius:2px;pointer-events:none;top:50%;transform:translateY(-50%)}
.vseek .vbuf-bar{height:100%;background:oklch(62% 0.21 265/.25);border-radius:2px;transition:width 200ms linear}
.vctrl{display:flex;align-items:center;gap:.375rem;padding:.5rem .75rem;background:var(--surface);border-top:1px solid var(--border-faint);flex-wrap:wrap}
.vctrl button{background:oklch(100% 0 0/.06);border:1px solid var(--border);color:var(--fg-muted);font-size:.6875rem;font-weight:600;font-family:var(--font-mono);padding:.25rem .5rem;border-radius:.25rem;cursor:pointer;transition:all var(--dur-base) var(--ease-out);white-space:nowrap}
.vctrl button:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.1)}
.vctrl button.active{color:var(--primary);border-color:var(--primary);background:oklch(62% 0.21 265/.15)}
.vctrl .vtime{font-family:var(--font-mono);font-size:.6875rem;color:var(--fg-dim);min-width:10ch;text-align:center}
.vctrl .vsep{width:1px;height:1rem;background:var(--border);flex-shrink:0}
.vctrl .vhint{font-size:.6rem;color:var(--fg-dim);margin-left:auto}
.purpose{background:linear-gradient(135deg,oklch(100% 0 0/.04),oklch(100% 0 0/.02));border:1px solid oklch(100% 0 0/.08);border-radius:var(--r-lg);padding:1rem 1.25rem;margin-bottom:1.5rem;font-size:.85rem;line-height:1.7;color:oklch(80% 0.01 265)}
.purpose strong{color:var(--fg);font-weight:600}
.purpose .purpose-label{font-size:.7rem;font-weight:600;text-transform:uppercase;letter-spacing:.05em;color:var(--fg-muted);margin-bottom:.4rem}
.purpose .purpose-reqs{margin-top:.75rem;padding-top:.75rem;border-top:1px solid oklch(100% 0 0/.06);font-size:.8rem;color:oklch(70% 0.01 265);line-height:1.8}
</style></head><body><div class=container>
<header><div class=header-icon><svg width=20 height=20 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2 stroke-linecap=round stroke-linejoin=round><polygon points="23 7 16 12 23 17 23 7"/><rect x=1 y=5 width=15 height=14 rx=2 ry=2/></svg></div><div><h1>QA Session Recordings</h1><div class=meta>ComfyUI Frontend &middot; Automated QA{{COMMIT_HTML}}{{RUN_LINK}}</div>{{BADGE_HTML}}</div></header>
<div class=grid>{{CARDS}}</div>
<header><div class=header-icon><svg width=20 height=20 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2 stroke-linecap=round stroke-linejoin=round><polygon points="23 7 16 12 23 17 23 7"/><rect x=1 y=5 width=15 height=14 rx=2 ry=2/></svg></div><div><h1>QA Session Recordings</h1><div class=meta>ComfyUI Frontend &middot; Automated QA{{COMMIT_HTML}}{{RUN_LINK}}{{TIMING_HTML}}</div>{{BADGE_HTML}}</div></header>
{{PURPOSE_HTML}}<div class=grid>{{CARDS}}</div>
</div><script>
function copyBadge(){const u=location.href.replace(/\/[^/]*$/,'/');const b=u+'badge.svg';const md='[![QA Badge]('+b+')]('+u+')';navigator.clipboard.writeText(md).then(()=>{const btn=document.querySelector('.copy-badge');btn.classList.add('copied');btn.innerHTML='<svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><polyline points="20 6 9 17 4 12"/></svg>';setTimeout(()=>{btn.classList.remove('copied');btn.innerHTML='<svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><rect x=9 y=9 width=13 height=13 rx=2/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>'},2000)})}
document.querySelectorAll('[data-md]').forEach(el=>{const t=el.textContent;el.removeAttribute('data-md');el.innerHTML=marked.parse(t)});
@@ -96,8 +107,23 @@ document.querySelectorAll('.video-wrap video').forEach(v=>{
const spdBtns=SPEEDS.map(s=>{const b=btn(s+'x',()=>{v.playbackRate=s;spdBtns.forEach(x=>x.classList.remove('active'));b.classList.add('active')});if(s===0.5)b.classList.add('active');return b});
sep();c.appendChild(time);
const hint=document.createElement('span');hint.className='vhint';hint.textContent='\u2190\u2192 frame \u2022 space play';c.appendChild(hint);
v.closest('.video-wrap').after(c);
v.ontimeupdate=()=>{const m=Math.floor(v.currentTime/60),s=Math.floor(v.currentTime%60),ms=Math.floor((v.currentTime%1)*1000);time.textContent=m+':'+(s<10?'0':'')+s+'.'+String(ms).padStart(3,'0')};
// Custom seekbar — works even without server range request support
const seekWrap=document.createElement('div');seekWrap.className='vseek';
const seekBar=document.createElement('input');seekBar.type='range';seekBar.min=0;seekBar.max=1000;seekBar.value=0;seekBar.step=1;
const bufWrap=document.createElement('div');bufWrap.className='vbuf';
const bufBar=document.createElement('div');bufBar.className='vbuf-bar';bufBar.style.width='0%';
bufWrap.appendChild(bufBar);seekWrap.appendChild(bufWrap);seekWrap.appendChild(seekBar);
let seeking=false;
seekBar.oninput=()=>{seeking=true;if(v.duration){v.currentTime=v.duration*(seekBar.value/1000)}};
seekBar.onchange=()=>{seeking=false};
v.closest('.video-wrap').after(seekWrap);
seekWrap.after(c);
v.ontimeupdate=()=>{
const m=Math.floor(v.currentTime/60),s=Math.floor(v.currentTime%60),ms=Math.floor((v.currentTime%1)*1000);
time.textContent=m+':'+(s<10?'0':'')+s+'.'+String(ms).padStart(3,'0');
if(!seeking&&v.duration){seekBar.value=Math.round((v.currentTime/v.duration)*1000)}
};
v.onprogress=v.onloadeddata=()=>{if(v.buffered.length&&v.duration){bufBar.style.width=(v.buffered.end(v.buffered.length-1)/v.duration*100)+'%'}};
v.onplay=()=>{playBtn.textContent='\u23F8'};v.onpause=()=>{playBtn.textContent='\u25B6'};
v.parentElement.addEventListener('keydown',e=>{
if(e.key==='ArrowLeft'){e.preventDefault();v.pause();v.currentTime=Math.max(0,v.currentTime-FT)}
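The seekbar added in this hunk maps a 0 to 1000 range input onto the video duration, and the time label is formatted as minutes, seconds and milliseconds. The arithmetic can be sketched as pure functions; `SEEK_MAX`, `formatTime`, `timeToSeekValue` and `seekValueToTime` are hypothetical names introduced here, while the formulas follow the inline handlers above.

```typescript
// Range input maximum used by the seekbar (seekBar.max in the diff).
const SEEK_MAX = 1000

// Formats seconds as m:ss.mmm, matching the ontimeupdate handler.
function formatTime(t: number): string {
  const m = Math.floor(t / 60)
  const s = Math.floor(t % 60)
  const ms = Math.floor((t % 1) * 1000)
  return m + ':' + (s < 10 ? '0' : '') + s + '.' + String(ms).padStart(3, '0')
}

// Playback position -> slider value; 0 when the duration is unknown.
function timeToSeekValue(currentTime: number, duration: number): number {
  return duration ? Math.round((currentTime / duration) * SEEK_MAX) : 0
}

// Slider value -> playback position in seconds.
function seekValueToTime(value: number, duration: number): number {
  return duration * (value / SEEK_MAX)
}
```

Keeping the slider on a fixed 0 to 1000 scale means it can be created before the video metadata has loaded; the duration is only needed at the moment a value is converted.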

scripts/qa-reproduce.ts Normal file

@@ -0,0 +1,253 @@
#!/usr/bin/env tsx
/**
* QA Reproduce Phase — Deterministic replay of research plan with narration
*
* Takes a reproduction plan from the research phase and replays it:
* 1. Execute each action deterministically (no AI decisions)
* 2. Capture a11y snapshot before/after each action
* 3. Gemini describes what visually changed (narration for humans)
* 4. Output: narration-log.json with full evidence chain
*/
import type { Page } from '@playwright/test'
import { GoogleGenerativeAI } from '@google/generative-ai'
import { mkdirSync, writeFileSync } from 'fs'
import type { ActionResult } from './qa-record.js'
// ── Types ──
interface ReproductionStep {
action: Record<string, unknown> & { action: string }
expectedAssertion: string
}
interface NarrationEntry {
step: number
action: string
params: Record<string, unknown>
result: ActionResult
a11yBefore: unknown
a11yAfter: unknown
assertionExpected: string
assertionPassed: boolean
assertionActual: string
geminiNarration: string
timestampMs: number
}
export interface NarrationLog {
entries: NarrationEntry[]
allAssertionsPassed: boolean
}
interface ReproduceOptions {
page: Page
plan: ReproductionStep[]
geminiApiKey: string
outputDir: string
}
// ── A11y helpers ──
interface A11yNode {
role: string
name: string
value?: string
checked?: boolean
disabled?: boolean
expanded?: boolean
children?: A11yNode[]
}
function searchA11y(node: A11yNode | null, selector: string): A11yNode | null {
if (!node) return null
const sel = selector.toLowerCase()
if (
node.name?.toLowerCase().includes(sel) ||
node.role?.toLowerCase().includes(sel)
) {
return node
}
if (node.children) {
for (const child of node.children) {
const found = searchA11y(child, selector)
if (found) return found
}
}
return null
}
function summarizeA11y(node: A11yNode | null): string {
if (!node) return 'null'
const parts = [`role=${node.role}`, `name="${node.name}"`]
if (node.value !== undefined) parts.push(`value="${node.value}"`)
if (node.checked !== undefined) parts.push(`checked=${node.checked}`)
if (node.disabled) parts.push('disabled')
if (node.expanded !== undefined) parts.push(`expanded=${node.expanded}`)
return `{${parts.join(', ')}}`
}
// ── Subtitle overlay ──
async function showSubtitle(page: Page, text: string, step: number) {
// Serialize the label as a JS string literal. encodeURIComponent leaves
// apostrophes unescaped, so a single-quoted literal in the injected script
// would be terminated early by any "'" in the text.
const label = JSON.stringify(
`[${step}] ${text.slice(0, 120).replace(/\n/g, ' ')}`
)
await page.addScriptTag({
content: `(function(){
var id='qa-subtitle';
var el=document.getElementById(id);
if(!el){
el=document.createElement('div');
el.id=id;
Object.assign(el.style,{position:'fixed',bottom:'32px',left:'50%',transform:'translateX(-50%)',zIndex:'2147483646',maxWidth:'90%',padding:'6px 14px',borderRadius:'6px',background:'rgba(0,0,0,0.8)',color:'rgba(255,255,255,0.95)',fontSize:'12px',fontFamily:'system-ui,sans-serif',fontWeight:'400',lineHeight:'1.4',pointerEvents:'none',textAlign:'center',whiteSpace:'normal'});
document.body.appendChild(el);
}
el.textContent=${label};
})()`
})
}
// ── Gemini visual narration ──
async function geminiDescribe(
page: Page,
geminiApiKey: string,
focus: string
): Promise<string> {
try {
const screenshot = await page.screenshot({ type: 'jpeg', quality: 70 })
const genAI = new GoogleGenerativeAI(geminiApiKey)
const model = genAI.getGenerativeModel({ model: 'gemini-3-flash-preview' })
const result = await model.generateContent([
{
text: `Describe in 1-2 sentences what you see on this ComfyUI screen. Focus on: ${focus}. Be factual — only describe what is visible.`
},
{
inlineData: {
mimeType: 'image/jpeg',
data: screenshot.toString('base64')
}
}
])
return result.response.text().trim()
} catch (e) {
return `(Gemini narration failed: ${e instanceof Error ? e.message.slice(0, 50) : e})`
}
}
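// ── Main reproduce function ──
// Replays each plan step: shows a subtitle, captures ARIA snapshots before and
// after the action, evaluates the step's expected assertion, asks Gemini for a
// short visual description, and writes narration/narration-log.json under outputDir.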
export async function runReproducePhase(
opts: ReproduceOptions
): Promise<NarrationLog> {
const { page, plan, geminiApiKey, outputDir } = opts
const { executeAction } = await import('./qa-record.js')
const narrationDir = `${outputDir}/narration`
mkdirSync(narrationDir, { recursive: true })
const entries: NarrationEntry[] = []
const startMs = Date.now()
console.warn(`Reproduce phase: replaying ${plan.length} steps...`)
for (let i = 0; i < plan.length; i++) {
const step = plan[i]
const actionObj = step.action
const elapsed = Date.now() - startMs
// Show subtitle
await showSubtitle(page, `Step ${i + 1}: ${actionObj.action}`, i + 1)
console.warn(` [${i + 1}/${plan.length}] ${actionObj.action}`)
// Capture a11y BEFORE
const a11yBefore = await page
.locator('body')
.ariaSnapshot({ timeout: 3000 })
.catch(() => null)
// Execute action
const result = await executeAction(
page,
actionObj as Parameters<typeof executeAction>[1],
outputDir
)
await new Promise((r) => setTimeout(r, 500))
// Capture a11y AFTER
const a11yAfter = await page
.locator('body')
.ariaSnapshot({ timeout: 3000 })
.catch(() => null)
// Check assertion
let assertionPassed = false
let assertionActual = ''
if (step.expectedAssertion) {
// Parse the expected assertion — e.g. "Settings dialog: visible" or "tab count: 2"
const parts = step.expectedAssertion.split(':').map((s) => s.trim())
const selectorName = parts[0]
const expectedState = parts.slice(1).join(':').trim()
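// locator.ariaSnapshot() resolves to a YAML string, not a node tree, so
// search its lines instead of walking A11yNode children.
const needle = selectorName.toLowerCase()
const found =
typeof a11yAfter === 'string'
? (a11yAfter
.split('\n')
.map((line) => line.trim())
.find((line) => line.toLowerCase().includes(needle)) ?? null)
: null
assertionActual = found ?? 'NOT FOUND'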
if (expectedState === 'visible' || expectedState === 'exists') {
assertionPassed = found !== null
} else if (expectedState === 'hidden' || expectedState === 'gone') {
assertionPassed = found === null
} else {
// Generic: check if the actual state contains the expected text
assertionPassed = assertionActual
.toLowerCase()
.includes(expectedState.toLowerCase())
}
console.warn(
` Assertion: "${step.expectedAssertion}" → ${assertionPassed ? '✓ PASS' : '✗ FAIL'} (actual: ${assertionActual})`
)
}
// Gemini narration (visual description for humans)
const geminiNarration = await geminiDescribe(
page,
geminiApiKey,
`What changed after ${actionObj.action}?`
)
entries.push({
step: i + 1,
action: actionObj.action,
params: actionObj,
result,
a11yBefore,
a11yAfter,
assertionExpected: step.expectedAssertion,
assertionPassed,
assertionActual,
geminiNarration,
timestampMs: elapsed
})
}
// Final screenshot
await page.screenshot({ path: `${outputDir}/reproduce-final.png` })
const log: NarrationLog = {
entries,
// Steps without an expected assertion must not count as failures
allAssertionsPassed: entries.every(
(e) => !e.assertionExpected || e.assertionPassed
)
}
writeFileSync(
`${narrationDir}/narration-log.json`,
JSON.stringify(log, null, 2)
)
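// Only steps that declare an assertion count toward the summary
const asserted = entries.filter((e) => e.assertionExpected)
console.warn(
`Reproduce phase complete: ${asserted.filter((e) => e.assertionPassed).length}/${asserted.length} assertions passed`
)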
return log
}
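
The assertion check in `runReproducePhase` parses strings such as "Settings dialog: visible" and looks the selector up in the post-action accessibility snapshot. Playwright's `locator.ariaSnapshot()` resolves to a YAML string, so one way to make the check testable in isolation is a pure function over that string. The sketch below is illustrative only: `evaluateAssertion` and `AssertionResult` are hypothetical names that do not appear in the PR, and treating an assertion without a colon as `visible` is an assumption.

```typescript
// Hypothetical helper, not part of the PR: evaluates "selector: expectedState"
// against the YAML text returned by locator.ariaSnapshot().
interface AssertionResult {
  passed: boolean
  actual: string
}

function evaluateAssertion(
  snapshot: string | null,
  assertion: string
): AssertionResult {
  // Split on the first colon only, so the expected state may contain colons.
  const idx = assertion.indexOf(':')
  const selector = (idx === -1 ? assertion : assertion.slice(0, idx))
    .trim()
    .toLowerCase()
  // Assumption: an assertion with no colon means "visible".
  const expected =
    idx === -1 ? 'visible' : assertion.slice(idx + 1).trim().toLowerCase()

  // First snapshot line that mentions the selector (case-insensitive).
  const line = (snapshot ?? '')
    .split('\n')
    .map((l) => l.trim())
    .find((l) => l.toLowerCase().includes(selector))
  const actual = line ?? 'NOT FOUND'

  if (expected === 'visible' || expected === 'exists') {
    return { passed: line !== undefined, actual }
  }
  if (expected === 'hidden' || expected === 'gone') {
    return { passed: line === undefined, actual }
  }
  // Generic case: the matched line must contain the expected text.
  return { passed: actual.toLowerCase().includes(expected), actual }
}
```

Because the function takes the snapshot as plain text, it can be exercised with a hand-written snapshot and no browser.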

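`showSubtitle` embeds the subtitle text inside a single-quoted JavaScript string literal in the injected script. `encodeURIComponent` does not escape the apostrophe, so any `'` in the text has to be percent-encoded separately. The following sketch shows the round trip under that approach; `encodeForSingleQuotedJs` and `buildSubtitleExpression` are hypothetical names used for illustration, and the 120-character limit mirrors the value in the source.

```typescript
// Hypothetical helpers, not part of the PR.
function encodeForSingleQuotedJs(text: string, maxChars = 120): string {
  const singleLine = text.replace(/\n/g, ' ')
  // Array.from iterates by code point, so truncation never splits a surrogate pair.
  const truncated = Array.from(singleLine).slice(0, maxChars).join('')
  // encodeURIComponent leaves "'" as-is; encode it so it cannot close the literal.
  return encodeURIComponent(truncated).replace(/'/g, '%27')
}

// Builds the expression that the injected script assigns to el.textContent.
function buildSubtitleExpression(text: string, step: number): string {
  const encoded = encodeForSingleQuotedJs(text)
  return `'[' + ${step} + '] ' + decodeURIComponent('${encoded}')`
}
```

Input that already contains a lone surrogate would still make `encodeURIComponent` throw, so callers that accept arbitrary text may want to guard against that case.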

@@ -353,6 +353,7 @@ function buildComparativePrompt(
' that were NOT present in the BEFORE video?',
'',
'Note: Brief black frames during page transitions are NORMAL.',
'Note: Small cyan/purple dashed labels prefixed with "QA:" are annotations placed by the automated test script — they are NOT part of the application UI. Do not treat them as bugs or evidence.',
'Report only concrete, visible differences. Avoid speculation.',
'',
'Return markdown with these sections exactly:',
@@ -400,7 +401,14 @@ function buildComparativePrompt(
'',
'## Possible Issues (Needs Human Verification)',
'## Overall Risk',
'(Assess whether the PR achieves its goal based on the before/after comparison)'
'(Assess whether the PR achieves its goal based on the before/after comparison)',
'',
'## Verdict',
'End your report with this EXACT JSON block (no markdown fence):',
'{"verdict": "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE", "risk": "low" | "medium" | "high", "confidence": "high" | "medium" | "low"}',
'- REPRODUCED: the before video confirms the old behavior and the after video shows the fix working',
'- NOT_REPRODUCIBLE: the before video does not show the reported bug',
'- INCONCLUSIVE: the videos do not adequately demonstrate the behavior change'
)
return lines.filter(Boolean).join('\n')
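
The prompt asks the model to end its report with an unfenced JSON verdict object. A consumer of the report could recover that object roughly as sketched below. This is a minimal sketch under stated assumptions: `parseVerdict`, `VerdictBlock`, and the choice to take the last brace-delimited object that mentions `"verdict"` are hypothetical and are not shown anywhere in this diff.

```typescript
// Hypothetical parser, not part of the PR.
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'
type Level = 'low' | 'medium' | 'high'

interface VerdictBlock {
  verdict: Verdict
  risk: Level | null
  confidence: Level
}

const VERDICTS: readonly string[] = ['REPRODUCED', 'NOT_REPRODUCIBLE', 'INCONCLUSIVE']
const LEVELS: readonly string[] = ['low', 'medium', 'high']

function parseVerdict(report: string): VerdictBlock | null {
  // Candidate objects: flat {...} spans that mention "verdict".
  const matches = report.match(/\{[^{}]*"verdict"[^{}]*\}/g)
  const last = matches?.pop()
  if (last === undefined) return null
  try {
    const raw: unknown = JSON.parse(last)
    if (typeof raw !== 'object' || raw === null) return null
    const { verdict, risk, confidence } = raw as Record<string, unknown>
    if (typeof verdict !== 'string' || !VERDICTS.includes(verdict)) return null
    if (typeof confidence !== 'string' || !LEVELS.includes(confidence)) return null
    // risk may be null or absent; anything else must be a known level.
    if (risk !== null && risk !== undefined) {
      if (typeof risk !== 'string' || !LEVELS.includes(risk)) return null
    }
    return {
      verdict: verdict as Verdict,
      risk: (risk ?? null) as Level | null,
      confidence: confidence as Level
    }
  } catch {
    // Malformed JSON, for example the model echoing the template with "|" unions.
    return null
  }
}
```

Returning `null` for anything that fails validation lets the caller fall back to treating the report as inconclusive instead of trusting a partially parsed verdict.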
@@ -413,6 +421,12 @@ function buildSingleVideoPrompt(
): string {
const lines = [
'You are a senior QA engineer reviewing a UI test session recording.',
'',
'## ANTI-HALLUCINATION RULES (READ FIRST)',
'- Describe ONLY what you can directly observe in the video frames',
'- NEVER infer or assume what "must have happened" between frames',
'- If a step is not visible in the video, say "NOT SHOWN" — do not guess',
'- Your job is to be a CAMERA — report facts, not interpretations',
''
]
@@ -423,38 +437,40 @@ function buildSingleVideoPrompt(
)
if (prContext) {
lines.push(
'## Phase 1: Blind Observation (describe what you SEE)',
'First, describe every UI interaction chronologically WITHOUT knowing the expected outcome:',
'- What elements does the user click/hover/type?',
'- What dialogs/menus open and close?',
'- What keyboard indicators appear? (look for subtitle overlays)',
'- What is the BEFORE state and AFTER state of each action?',
'',
'## Phase 2: Compare against expected behavior',
'Now compare your observations against the context below.',
'Only claim a match if your Phase 1 observations EXPLICITLY support it.',
''
)
if (isIssueContext) {
lines.push(
'## Issue Context',
'This video attempts to reproduce a reported bug on the main branch.',
'Your review MUST evaluate whether the reported bug is visible and reproducible.',
'',
prContext,
'',
'## Review Instructions',
'1. Does the video demonstrate the reported bug occurring?',
'2. Is the bug clearly visible and reproducible from the steps shown?',
'3. Are there any other issues visible during the reproduction attempt?',
'',
'## CRITICAL: Honesty Requirements',
'- If the video only shows login, idle canvas, or trivial menu interactions WITHOUT actually performing the reproduction steps, say "INCONCLUSIVE — reproduction steps were not performed".',
'- Do NOT claim a bug is "confirmed" unless you can clearly see the bug behavior described in the issue.',
'- Do NOT hallucinate findings. If the video does not show meaningful interaction, say so clearly.',
'- Rate confidence as "Low" if the video does not actually demonstrate the bug scenario.',
'## Comparison Questions',
'1. Did the video perform the reproduction steps described in the issue?',
'2. Did your Phase 1 observations show the reported bug behavior?',
'3. If the steps were not performed or the bug was not visible, say INCONCLUSIVE.',
''
)
} else {
lines.push(
'## PR Context',
'The video is a QA session testing a specific pull request.',
'Your review MUST evaluate whether the PR achieves its stated purpose.',
'',
prContext,
'',
'## Review Instructions',
"1. Does the video demonstrate the PR's intended behavior working correctly?",
'2. Are there regressions or side effects caused by the PR changes?',
'3. Does the observed behavior match what the PR claims to implement/fix?',
'## Comparison Questions',
'1. Did the video test the specific behavior the PR changes?',
'2. Did your Phase 1 observations show the expected before/after difference?',
'3. If the test was incomplete or inconclusive, say so honestly.',
''
)
}
@@ -466,6 +482,7 @@ function buildSingleVideoPrompt(
'The video shows the full test session — analyze it chronologically.',
'Focus on UI regressions, broken states, visual glitches, unreadable text, missing labels/i18n, and clear workflow failures.',
'Note: Brief black frames during page transitions are NORMAL and should NOT be reported as issues.',
'Note: Small cyan/purple dashed labels prefixed with "QA:" are annotations placed by the automated test script — they are NOT part of the application UI. Do not treat them as bugs or evidence.',
'Report only concrete, visible problems and avoid speculation.',
'If confidence is low, mark it explicitly.',
'',
@@ -494,7 +511,14 @@ function buildSingleVideoPrompt(
'`SEVERITY` `TIMESTAMP` `Confidence: LEVEL`',
'Do NOT use a table for issues — use the block format above.',
'## Possible Issues (Needs Human Verification)',
'## Overall Risk'
'## Overall Risk',
'',
'## Verdict',
'End your report with this EXACT JSON block (no markdown fence):',
'{"verdict": "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE", "risk": "low" | "medium" | "high" | null, "confidence": "high" | "medium" | "low"}',
'- REPRODUCED: the bug/behavior is clearly visible in the video',
'- NOT_REPRODUCIBLE: the steps were performed correctly but the bug was not observed',
'- INCONCLUSIVE: the reproduction steps were not performed or the video is insufficient'
)
return lines.filter(Boolean).join('\n')