Compare commits

...

154 Commits

Author SHA1 Message Date
snomiao
c4a243060b feat: Agent SDK auto-detects Claude Code session — no API key needed locally
ANTHROPIC_API_KEY is optional: Agent SDK uses Claude Code OAuth
session when running locally (detects CLAUDE_CODE_SSE_PORT).
In CI, ANTHROPIC_API_KEY from secrets is used.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 15:07:57 +00:00
snomiao
3690e98c79 refactor: require ANTHROPIC_API_KEY, remove Gemini-only fallback
The Gemini-only agentic loop had ~47% success rate — too low to be
useful as a fallback. Now ANTHROPIC_API_KEY is required for issue
reproduction. Fails clearly if missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 15:01:51 +00:00
snomiao
1c40893cfa fix: inject cursor overlay via addScriptTag after login, not addInitScript
addInitScript runs before page load — Vue's app mount destroys the
cursor div when it takes over the DOM. Using addScriptTag after login
ensures the cursor persists in the stable DOM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 13:30:37 +00:00
snomiao
72a28a1e76 fix: cursor overlay on locator clicks (clickByText, menu items)
Locator.click/hover bypasses our page.mouse monkey-patch. Now
clickByText, hoverMenuItem, clickSubmenuItem get the element
bounding box and update cursor overlay manually.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 11:58:01 +00:00
snomiao
49c95248e5 fix: verdict JSON grep pattern — capture value without closing quote
The grep \{"verdict":\s*"[^"]+ captures up to but not including the
closing quote. The second grep for "[A-Z_]+"$ then fails because
there's no closing quote. Fixed: match "verdict":\s*"[A-Z_]+ then
extract [A-Z_]+$ (no quotes needed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 10:51:35 +00:00
snomiao
6bb9d18ca6 fix: grep pipefail crash + add QA troubleshooting doc
- Add || true to all grep pipelines in deploy script (grep returns 1
  on no match, pipefail kills script)
- Add docs/qa/TROUBLESHOOTING.md covering all failures encountered:
  __name errors, zod/v4 imports, model IDs, badge mismatches, cursor,
  loadDefaultWorkflow, pressKey timing, agent behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 10:17:36 +00:00
snomiao
ca49e9cb1b feat: structured JSON verdict from AI reviewer, light-first theme
- Video review prompt now requests a ## Verdict JSON block:
  {"verdict": "REPRODUCED|NOT_REPRODUCIBLE|INCONCLUSIVE", "risk": "low|medium|high"}
- Deploy script reads JSON verdict first, falls back to grep
- Eliminates all regex-matching false positives permanently
- Theme: light mode is default, dark via prefers-color-scheme:dark
- Cards use solid backgrounds, grain overlay only in dark mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 09:11:09 +00:00
snomiao
db538c9d76 feat: report site follows system light/dark theme
Add prefers-color-scheme:light media query with light palette.
Replace hardcoded dark oklch values with CSS variables.
Light mode: white surfaces, dark text, subtle borders, no grain overlay.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 08:53:08 +00:00
snomiao
c04b31a0f1 fix: loadDefaultWorkflow uses API instead of menu, pressKey uses instant press
- loadDefaultWorkflow now calls app.resetToDefaultWorkflow() via JS API
  instead of navigating File → Load Default menu (menu item name varies)
- pressKey reverted to instant press() — the 400ms hold via down/up
  prevented Escape from propagating to parent dialog (#10397 BEFORE video
  showed wrong behavior because hold intercepted the event)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 08:45:14 +00:00
snomiao
5938fdef8d fix: monkey-patch page.mouse for universal cursor overlay
Instead of manually calling moveCursorOverlay in each action,
patch page.mouse.move/click/dblclick/down/up globally. Now EVERY
mouse operation shows the cursor — text clicks, menu hovers, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:47:56 +00:00
snomiao
27d55f093b fix: use gemini-3-flash-preview in hybrid agent (not 2.5 preview)
Gemini 2.5 preview models return 404. Always use gemini-3+ models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:45:29 +00:00
snomiao
c1ddb8669e fix: add 'could not be confirmed' to negative verdict patterns
"could not be confirmed" contains "confirmed" which matched the
positive reproduc|confirm check. Now caught by the negative check first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 15:17:02 +00:00
snomiao
57dbf0132d fix: correct Claude model ID — claude-sonnet-4-6 (not dated suffix)
The Agent SDK returned "model not found" for claude-sonnet-4-6-20250514.
Correct ID is claude-sonnet-4-6.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 14:36:22 +00:00
snomiao
b4c49588dc fix: cursor overlay now controlled via __moveCursor, not DOM events
Headless Chrome's Playwright CDP doesn't trigger DOM mousemove events
reliably. Now executeAction calls __moveCursor(x,y) directly after
every mouse.move/click/drag. Cursor is an SVG arrow (white + outline).
Click state shown via scale animation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 13:25:12 +00:00
snomiao
00dc10e9e6 feat: badge label includes QA date — #10397 QA0327
Shows when the QA was run so stale results are obvious at a glance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 08:43:19 +00:00
snomiao
48414fa1b5 fix: make key presses visible in video — hold + subtitle
pressKey now uses keyboard.down/up with 400ms hold instead of
instant press(). Shows subtitle "⌨ Escape" and the keyboard HUD
catches the held state for video frame capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 08:33:22 +00:00
snomiao
e744f101b0 fix: use zod instead of zod/v4 — project zod doesn't export /v4 subpath
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:51:37 +00:00
snomiao
9ad8267067 fix: add claude-agent-sdk to workspace catalog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:26:54 +00:00
snomiao
cf54ddb6d3 feat: control/test comparison strategy + QA backlog doc
- Agent system prompt now instructs Claude to demonstrate BOTH working
  (control) and broken (test) states when bug is triggered by a setting
- Added docs/qa/backlog.md with future improvements: Type B/C comparisons,
  TTS, pre-seeding, cost optimization, environment-dependent issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:24:10 +00:00
snomiao
fce31cf0bf feat: all badges use vertical box style
Drop horizontal badges. Universal box badge shows:
  ┌──────────────────┐
  │    #7414 QA       │
  │ ✓ 1 reproduced   │
  │ ⚙ Fix: APPROVED  │  ← only for PRs
  └──────────────────┘

Issues show repro/not-repro/inconclusive rows.
PRs add a fix quality row (APPROVED/MINOR/MAJOR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:20:05 +00:00
snomiao
050091abc6 feat: show QA pipeline commit hash + timing on report site
- Shows "QA @ abc1234" linking to the pipeline code commit
- Shows start time → deploy time in header
- Helps trace which version of QA scripts generated each report

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 07:16:41 +00:00
snomiao
548e37b9a5 feat: hybrid QA agent — Claude Sonnet 4.6 brain + Gemini vision
Architecture:
- Claude Sonnet 4.6 plans and reasons (via Claude Agent SDK)
- Gemini 2.5 Flash watches video buffer and describes what it sees
- 4 tools: observe(), inspect(), perform(), done()

observe(seconds, focus): builds video clip from screenshot buffer,
  sends to Gemini with Claude's focused question.
inspect(selector): searches a11y tree for specific element state.
perform(action, params): executes Playwright action.
done(verdict, summary): signals completion.

Falls back to Gemini-only loop if ANTHROPIC_API_KEY not set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 06:27:30 +00:00
snomiao
458b2e918c feat: pass OPENAI_API_KEY to recording step for TTS narration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 04:53:13 +00:00
snomiao
63dbe002d1 feat: subtitle overlay + OpenAI TTS narration on reproduce videos
- Agent reasoning shown as subtitle bar at bottom of video during recording
- After recording, generates TTS audio via OpenAI API (tts-1, nova voice)
- Merges audio clips at correct timestamps into the video with ffmpeg
- Requires OPENAI_API_KEY env var; gracefully skips if not set
- No-sandbox + disable-dev-shm-usage for headless Chrome compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 04:42:03 +00:00
snomiao
f3d9a8c2e4 feat: show test requirements from QA guide on report site
- Download QA guide artifact in report job
- Extract prerequisites, test focus, and steps from guide JSON
- Display below the purpose description: focus → prerequisites → steps
- Separated by a subtle divider with smaller font

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 03:28:03 +00:00
snomiao
831718bd50 feat: purpose description on report + multi-pass video link fix
- Report site shows "PR #N aims to..." or "Issue #N reports..." block
  above the video cards, extracted from pr-context.txt
- Multi-pass video links fall back to pass1 when qa-{os}.mp4 is 404
- More negative verdict patterns: "does not demonstrate", "never tested"
- Risk uses first word of Overall Risk (avoids "high confidence" match)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 03:23:50 +00:00
snomiao
7d3ddbf619 fix: verdict detection — more negative patterns, risk uses first word
- Add "does not demonstrate", "steps were not performed", "never tested"
  to NOT_REPRO patterns (fixes #9101 false positive)
- Risk detection uses first word of Overall Risk section instead of
  grepping entire text (fixes "high confidence" matching HIGH)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 02:23:18 +00:00
snomiao
a42240cb65 fix: use addScriptTag for keyboard HUD to avoid tsx __name issue
tsx compiles arrow functions with __name helpers that don't exist in
browser context. Using addScriptTag with plain JS string avoids this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 02:03:34 +00:00
snomiao
c957c0833b fix: remove TS type annotation from page.evaluate (browser context)
Set<string>() in page.evaluate causes __name ReferenceError in browser.
Use untyped Set() since browser JS doesn't support TS generics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:55:46 +00:00
snomiao
e01d4aaffc debug: add verdict count logging to deploy script 2026-03-27 01:54:14 +00:00
snomiao
3f226467cd fix: check negative verdicts before positive in per-report classification
"fails to reproduce" contains "reproduce" — must check negatives first
within each report. Across reports, REPRODUCED still wins (multi-pass).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:41:06 +00:00
snomiao
d4d8772ae7 feat: keyboard HUD overlay shows pressed keys in video
Injects a persistent overlay in bottom-right corner that displays
currently held keys (e.g. "⌨ Space", "⌨ CTRL+C"). Makes keyboard
interactions visible in the recording for both human and AI reviewers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 01:38:44 +00:00
snomiao
b578f8d7c4 feat: vertical box badge for multi-pass with breakdown
Multi-pass issues show a stacked box badge:
  ┌──────────────┐
  │  #7806 QA    │
  │ ✓ 1 reproduced    │
  │ ⚠ 1 inconclusive  │
  └──────────────┘

Single-pass issues keep the standard horizontal badge.
Badge colors: blue=reproduced, gray=not-repro, yellow=inconclusive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:49:19 +00:00
snomiao
d5360ce45c feat: show pass counts in badge for multi-pass reports (X/Y REPRODUCED)
When multiple report files exist, badge shows "2/3 REPRODUCED" instead
of just "REPRODUCED". Single-pass issues still show plain verdict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:45:16 +00:00
snomiao
0e6d9fd926 fix: REPRODUCED wins over INCONCLUSIVE in multi-pass badge
When multiple passes exist and one confirms while another is
inconclusive, the badge should show REPRODUCED. Previously
INCONCLUSIVE was checked first, hiding successful reproductions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:36:59 +00:00
snomiao
de810f88a4 fix: cloneNode uses Ctrl+C/V instead of right-click Clone menu
The "Clone" context menu item doesn't exist in Nodes 2.0 mode.
Using Ctrl+C/Ctrl+V works in both legacy and Nodes 2.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:22:31 +00:00
snomiao
722b01a253 fix: preflight performs actual repro steps, not just setup
- #10307: preflight clones KSampler node, hint says drag to overlap
- #7414: preflight clicks numeric widget, hint says drag to change value
- #7806: preflight takes baseline screenshot, hint gives exact coords
  for holdKeyAndDrag with spacebar
- Hints now reference "Preflight already did X, NOW do Y" pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 00:07:42 +00:00
snomiao
dfd19a3cf9 fix: tell agent what preflight already did to prevent repeated actions
Agent was wasting turns re-doing loadDefaultWorkflow and setSetting
that preflight already executed. Now the system prompt includes
"Already Done" section listing preflight actions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:40:48 +00:00
snomiao
3531e37ae7 fix: preflight actions + badge false-positive pattern
- Auto-execute prerequisite actions (enable Nodes 2.0, load default
  workflow) BEFORE the agentic loop starts. Agent model ignores prompt
  hints but preflight guarantees nodes are on canvas.
- Add "fails to reproduce" to NOT REPRODUCIBLE badge patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:25:41 +00:00
snomiao
024b231c05 feat: qa-issue label trigger + labels in issue context
- Add issues:[labeled] trigger and qa-issue label support
- Resolve github.event.issue.number for issue-triggered runs
- Include issue labels in context (feeds keyword matcher for hints)
- Remove qa-issue label after run completes (same as qa-changes/qa-full)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:49:00 +00:00
snomiao
0c22369c60 fix: keyword-driven action hints for agent issue reproduction
Scan issue context for keywords (clone, copy-paste, spacebar, resize,
sidebar, scroll, middle-click, node shape, Nodes 2.0, etc.) and inject
specific MUST-follow action steps into the agentic system prompt.

Addresses 9 INCONCLUSIVE issues where agent had actions available
but didn't know when to use them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:43:31 +00:00
snomiao
511fdf1b24 fix: restyle QA annotations to avoid misleading AI reviewer
- Annotations now use cyan dashed border + monospace "QA:" prefix
  instead of red solid labels that look like UI error messages
- Video review prompts explicitly tell reviewer to ignore QA annotations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:34:51 +00:00
snomiao
d756c362e3 fix: badge mismatch, multi-pass report overwrite, agent node creation
P1: Filter out QA bot's own comments from pr-context (INCONCLUSIVE loop)
P2: Grep only ## Summary section for verdict (false REPRODUCED fix)
P3: Strip markdown bold before matching Overall Risk section
P4: Deploy full placeholder page with spinner during CI
P5: Pass #NUM QA label to PREPARING/ANALYZING badges
P6: Add copyPaste, holdKeyAndDrag, resizeNode, middleClick actions
P7: preload=auto + custom seekbar (already deployed)
P8: Deploy FAILED badge on report job failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 22:02:17 +00:00
snomiao
ba512fd263 fix: video seeking — preload=auto, custom seekbar, _headers
- Change preload=metadata to preload=auto for full video download
- Add _headers file with Accept-Ranges for Cloudflare Pages
- Add custom seekbar (range input + buffer indicator) that works
  even without server HTTP range request support
- Seekbar shows buffered progress and allows dragging to any point

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 21:20:56 +00:00
snomiao
e1fb782832 fix: strengthen after-mode prompt to test PR-specific behavior
Previous prompt said "test the specific behavior" which was too vague,
leading to generic UI walkthroughs instead of targeted tests.

New prompt: explicitly instructs to read the diff, trigger the exact
scenario the PR fixes, and avoid generic menu screenshots.

Also added reload action to before/after prompt for state persistence tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:58:02 +09:00
snomiao
9c347642ba fix: badge mismatch, multi-pass report overwrite, agent node creation
- Fix quality badge now reads "## Overall Risk" section only
- Prevents false MAJOR ISSUES from severity labels or negated phrases
- "Low" risk → APPROVED, "High" → MAJOR ISSUES, "Medium" → MINOR ISSUES

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 03:47:54 +00:00
snomiao
5f8f40b559 fix: install pnpm before building PR frontend in sno-qa-* triggers
setup-frontend must run first to install node/pnpm, then rebuild
with PR code. Also re-install sno-skills deps after switching back
so QA scripts' dependencies are available.

Also gitignore .claude/scheduled_tasks.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:17:26 +09:00
snomiao
264e71a9de fix: restore sno-skills scripts after building PR frontend
When triggered via sno-qa-* push, the workflow checks out the PR code
to build its frontend, but this replaces qa-record.ts which only
exists on sno-skills. Fix: build PR frontend, then checkout back to
sno-skills so QA scripts remain available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:56:12 +09:00
snomiao
49fa1b3caa fix: use array subshell instead of mapfile for macOS compat
mapfile is not available on macOS default bash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 01:22:05 +09:00
snomiao
3142c8ead6 fix: resolve remaining shellcheck warnings in qa-deploy-pages.sh
- SC2231: quote glob expansions in for loop
- SC2002: use sed directly instead of cat | sed
- SC2086: quote variable in echo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:52:12 +09:00
snomiao
4310c5238a feat: clickable badges with #NUM label and copy button
- Badge generators accept optional label param (#NUM QA)
- Badge in PR/issue comments links to report site
- Report site shows badge with copy-to-clipboard button
- Copy button produces markdown: [![QA Badge](url/badge.svg)](url/)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 15:34:23 +00:00
snomiao
a7d7d39712 fix: resolve CI failures — test, shellcheck, format
- Update DefaultThumbnail test to match size-full class change
- Fix shellcheck warnings in qa-batch.sh (SC2001, SC2207)
- Fix shellcheck warnings in qa-deploy-pages.sh (SC2034, SC2235, SC2231, SC2002)
- Add qa-report-template.html to oxfmt ignore (minified, not formattable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:33:29 +09:00
snomiao
24ac6f1566 fix: badge pattern too narrow/broad, multi-pass video discovery
- "confirmed" didn't match "confirms"/"reproducible" — use "reproduc|confirm" stem
- "partial" matched unrelated text — require "partially reproduced" specifically
- collectVideoCandidates now finds qa-session-*.mp4 for multi-pass reviews

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 12:18:53 +00:00
snomiao
f9a5baba1a fix: badge mismatch, multi-pass report overwrite, agent node creation
- Check INCONCLUSIVE before reproduced/confirmed in badge detection
- Exclude markdown headings from reproduced grep match
- Add --pass-label to qa-video-review.ts for unique multi-pass filenames
- Pass pass label from workflow YAML when reviewing numbered sessions
- Collect all pass-specific reports in deploy script HTML
- Add addNode/cloneNode convenience actions to qa-record agent
- Improve strategy hints for visual/rendering bug reproduction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 10:48:38 +00:00
snomiao
628f64631b chore: tidy PR for merge — resolve TODOs, fix misplaced import
- Remove push trigger (was for dev testing only)
- Restore concurrency group (was commented out for dev)
- Move misplaced import in qa-analyze-pr.ts to top of file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 19:40:55 +09:00
snomiao
bc38b2ce13 fix: extract deploy step into script to fix expression length limit
The Cloudflare Pages deploy step exceeded GitHub Actions' 21000 char
expression limit due to inline HTML/CSS/JS. Extract to
scripts/qa-deploy-pages.sh + scripts/qa-report-template.html.
2026-03-25 09:24:17 +00:00
snomiao
ca8e4d2a29 fix: enforce menu navigation pattern + add CI job link to report
- Strengthen prompt: MUST use openMenu → hoverMenuItem → clickMenuItem
  in that order. Previous runs skipped openMenu causing silent failures.
- Add CI Job link to the QA report site header for quick navigation
  to the GitHub Actions run that generated the report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 17:46:46 +09:00
snomiao
55ce174c5b fix: split dual badge generator into separate step to fix expression length
GitHub Actions has a 21000 char limit per expression. The combined
badge setup step exceeded this after adding the dual badge generator.
Split into its own step.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 08:37:23 +00:00
snomiao
49176271f8 chore: retrigger QA pipeline 2026-03-25 06:23:00 +00:00
snomiao
50518449fc fix: remove Bug: prefix from PR badge, just show REPRODUCED directly
Badge now reads: QA Bot | REPRODUCED | Fix: APPROVED
Not all issues are bugs — could be feature requests too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:21:02 +09:00
snomiao
65aa03b20d fix: use blue for REPRODUCED badge (success, not failure)
Reproducing a bug is a successful outcome for the QA bot.
Blue (#2196f3) = bot succeeded. Red = bot found problems with the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:19:28 +09:00
snomiao
ce59b6a431 feat: combine PR bug+fix into single dual-segment badge
PRs now show one badge with three segments:
  QA Bot | Bug: REPRODUCED | Fix: APPROVED

Instead of two separate badges. Uses gen-badge-dual.sh which
renders label + bug status + fix status in one SVG.

Issues still use single two-segment badge:
  QA Bot | FINISHED: REPRODUCED

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:14:35 +09:00
snomiao
203e8c8a60 feat: add visible cursor overlay and annotation action to QA recorder
- Inject fake cursor (red dot with click animation) via addInitScript
  since headless Chrome doesn't render the system cursor in video
- Add hover-before-click delay to clickByText and canvas clicks
  so viewers can see where the cursor moves before clicking
- Add 'annotate' action: shows a floating label at (x,y) for N ms
  so AI can draw viewer attention to important UI state changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:50:25 +09:00
snomiao
b485d22760 fix: feed QA guide + issue context to agentic reproduce loop
Root cause: runAgenticLoop never read the QA guide — agent saw
"No issue context provided" for issues. Now reads qaGuideFile,
parses structured fields, and injects into system prompt.

Also: fetch issue body via gh issue view in workflow, increase
budget to 120s/30 turns, add focus reminders, smarter stuck
detection (50px grid normalization + action-type frequency),
reject invalid click targets, add loadDefaultWorkflow and
openSettings convenience actions, strategy hints in prompt.

Fix pre-existing typecheck error in eslint.config.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 05:47:04 +00:00
snomiao
2d1088f79e feat: dual badges for PRs — bug reproduction + fix quality
PRs now get two separate badges:
- Bug: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (before branch)
- Fix: APPROVED / MAJOR ISSUES / MINOR ISSUES (after branch)

Issues keep a single badge: FINISHED: REPRODUCED / etc.

Both badge-bug.svg and badge-fix.svg served from the deploy site.
PR comment shows all three: ![badge] ![bug] ![fix]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:39:15 +09:00
snomiao
1fe0f97aa5 fix: badge FINISHED state includes result sub-state
FINISHED is not standalone — always shows result:
- FINISHED: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (issues)
- FINISHED: APPROVED / MAJOR ISSUES / MINOR ISSUES (PRs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:25:53 +09:00
snomiao
178ecc6746 feat: add SVG status badge to QA report site
Badge shows QA pipeline status, deployed at each stage:
- PREPARING (blue) — setting up artifacts
- ANALYZING (orange) — running video review
- Final status with color:
  - Issues: REPRODUCED (red) / NOT REPRODUCIBLE (gray) / PARTIAL (yellow)
  - PRs: APPROVED (green) / MAJOR ISSUES (red) / MINOR ISSUES (yellow)

Badge served as /badge.svg from the same Cloudflare Pages site.
Included in PR comment as ![QA Badge](url/badge.svg).

Also restore @ts-expect-error for import-x plugin type incompatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:04:57 +09:00
GitHub Action
20f878f929 [automated] Apply ESLint and Oxfmt fixes 2026-03-25 04:31:51 +00:00
snomiao
712c386a69 fix: regenerate pnpm-lock.yaml after rebase
The rebase introduced a duplicated mapping key in the lockfile,
causing ERR_PNPM_BROKEN_LOCKFILE in CI.
2026-03-25 04:28:26 +00:00
snomiao
8f6fa738a5 fix: use correct flash model name for agentic loop 2026-03-25 03:19:28 +00:00
snomiao
387862d8b9 feat: agentic screenshot feedback loop + multi-pass recording
Replace single-shot step generation in reproduce mode with an agentic
loop where Gemini sees the screen after each action and decides what
to do next. For multi-bug issues, decompose into sub-issues and run
separate recording passes.

- Extract executeAction() from executeSteps() for reuse
- Add reload and done action types
- Add captureScreenshotForGemini() (JPEG q50, ~50KB)
- Add runAgenticLoop() with sliding window history (3 screenshots)
- Add decomposeIssue() for multi-pass recording (1-3 sub-issues)
- Update workflow to handle numbered session videos (qa-session-1, etc.)
2026-03-25 03:18:01 +00:00
snomiao
ff98eb13e4 feat: add frame-by-frame video controls to QA report player
- Add custom video controls below each video with frame stepping
- Frame back/forward buttons (1 frame at 30fps, 10 frames skip)
- Speed selector: 0.1x, 0.25x, 0.5x (default), 1x, 1.5x, 2x
- Keyboard shortcuts: arrow keys for frame step, space for play/pause
- SMPTE-style timecode display (m:ss.ms)
- Default 0.5x speed since AI operates UI faster than humans
- Videos no longer autoplay (pause on load for inspection)
- Zero external dependencies (pure HTML5 video API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
f4f80d179f fix: cap reproduce video at 5min, skip env setup in Phase 4
- Reproduce video must be max 5 minutes (short, focused demo)
- Phase 4 reuses the environment from Phase 3 (no re-setup)
- Use video-start/video-stop commands (not --save-video flag)
- Start recording right before steps, stop immediately after

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
f758e16b72 feat: add reproduce-issue skill with two-video architecture
New Claude agent-driven issue reproduction skill that:
- Phase 1-2: Research issue and set up environment (custom nodes, workflows, settings)
- Phase 3: Record research video while exploring interactively via playwright-cli
- Phase 4: Record clean reproduce video with only the minimal repro steps
- Phase 5: Generate structured reproduction report

Key difference from the old approach: Claude agent explores and adapts
instead of blindly executing a Gemini-generated static plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
45e309c5f8 fix: add default workflow node positions to QA prompts
Gemini was right-clicking empty canvas instead of nodes because it
didn't know where the default workflow nodes are positioned. Now the
prompt includes approximate coordinates for all 7 default nodes and
clarifies the difference between node context menu vs canvas menu.

Also fixes TS2352 in page.evaluate by using double-cast through unknown.
2026-03-25 03:18:01 +00:00
snomiao
79df405733 feat: upgrade QA pipeline to Gemini 3.x models
- qa-record.ts, qa-analyze-pr.ts: gemini-2.5-flash/pro → gemini-3.1-pro-preview
- qa-video-review.ts, qa-generate-test.ts: gemini-2.5-flash → gemini-3-flash-preview
- pr-qa.yaml: update hardcoded model reference
- Add docs/qa/models.md with model comparison and rationale
2026-03-25 03:18:01 +00:00
snomiao
27c64e1092 feat: add loadWorkflow and setSetting actions to QA recorder
- Add loadWorkflow action to load workflow JSON from URL
- Add setSetting action to configure ComfyUI settings
- Improve reproduce mode prompt to emphasize establishing prerequisites
  (save workflow first, make it dirty, add needed nodes, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:18:01 +00:00
snomiao
a1307ef35c feat: add clickable issue/PR URLs to QA reports
- Add --target-url CLI option to qa-video-review.ts
- Include target URL in generated markdown reports
- Add clickable issue/PR link in deployed HTML report header
- Workflow passes the target URL automatically

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:17:56 +00:00
snomiao
11f02a2645 fix: resolve pre-existing typecheck errors
- Remove unused @ts-expect-error directives in eslint.config.ts
- Simplify LazyImage prop types from ClassValue to string
- Fix DialogInstance to avoid infinitely deep type instantiation
- Use cn() in DefaultThumbnail for class merging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 03:17:30 +00:00
snomiao
b3bcc3ff4c fix: improve fillDialog and clickSubmenuItem for litegraph UI components
- fillDialog now tries: PrimeVue dialog → node search box → focused input → keyboard fallback
- clickSubmenuItem now tries: PrimeVue tiered menu → litegraph context menu → role menuitem
- Fixes double-click-to-add-node flow and right-click context menu clicks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:30 +00:00
snomiao
49e904918e fix: add step-level error resilience and click timeout in QA recording
- Wrap each step in try/catch so failed steps don't abort the recording
- Add 5s timeout to clickByText to prevent 30s hangs on disabled elements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
78e5b1e1b3 feat: expand QA action set and improve issue reproduction depth
- Add new canvas actions: rightClick, doubleClick, clickCanvas,
  rightClickCanvas, dragCanvas, scrollCanvas for node graph interactions
- Increase reproduce mode step limit from 3-6 to 8-15 steps
- Add ComfyUI UI context to prompts (canvas layout, node interactions)
- Add anti-hallucination instructions to video review for issue mode
- Improve issue analysis prompt with detailed action descriptions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
81e3dc72f8 fix: use pulls API instead of gh pr view for PR/issue detection
gh pr view can't distinguish PRs from issues — it succeeds for both.
Use the REST API endpoint repos/{owner}/{repo}/pulls/{number} which
returns 404 for issues.
2026-03-25 03:17:29 +00:00
snomiao
936cf83337 fix: address Copilot review feedback on QA scripts
- Enforce requestTimeoutMs via Gemini SDK requestOptions
- Add 100MB video size check before base64 encoding
- Sanitize screenshot filenames to prevent path traversal
- Sort video files by mtime for reliable rename
- Validate --mode arg against allowed values
- Add Content-Length pre-check in downloadMedia
- Add GitHub domain allowlist for media downloads (SSRF mitigation)
- Add contents:write permission and git config for report job
- Update Node.js requirement in SKILL.md from 18+ to 22+
2026-03-25 03:17:29 +00:00
snomiao
1d880bf493 fix: gracefully skip invalid pressKey values in QA recording
Instead of crashing the entire recording session when Gemini generates
an invalid key name (e.g. "mouseWheelDown"), catch the error and
continue with remaining steps.
2026-03-25 03:17:29 +00:00
snomiao
da3e6cb4cf feat: add behavior changes summary table to QA video review
Add a "Behavior Changes" table (Behavior, Before, After, Verdict)
alongside the existing timeline comparison. This gives reviewers a
quick high-level view of all behavioral differences before diving
into the frame-by-frame timeline.
2026-03-25 03:17:29 +00:00
snomiao
a39e3054cf fix: format before/after comparison as table in QA video review
Instruct Gemini to output the Before vs After section as a markdown
table with Time, Type, Severity, Before, After columns for easier
comparison. Update HTML template table styles with fixed layout and
column widths optimized for the 5-column comparison format.
2026-03-25 03:17:29 +00:00
snomiao
5a6178e924 fix: handle array response from Gemini in analyze-pr
Gemini Pro with responseMimeType: 'application/json' returns a JSON
array [before, after] instead of {before, after}. Handle both shapes.
2026-03-25 03:17:29 +00:00
snomiao
a11e8a67f8 fix: make analyze-pr non-blocking and log Gemini response
- Log raw Gemini response for debugging when parsing fails
- Handle possible wrapper keys in response
- Make qa-before/qa-after run even if analyze-pr fails (only gate
  on resolve-matrix success)
2026-03-25 03:17:29 +00:00
snomiao
6a836b7c25 fix: extract PR number from sno-qa-<number> branch name
When running on push events for sno-qa-* branches without an open PR,
extract the PR number from the branch name so analyze-pr can fetch
the full PR thread for analysis.
2026-03-25 03:17:29 +00:00
snomiao
f91f94f71a feat: add analyze-pr job to QA pipeline
Add Gemini Pro-powered PR analysis that generates targeted QA guides
from the full PR thread (description, comments, screenshots, diff).
The analyze-pr job runs on lightweight ubuntu before recordings start,
producing qa-guide-before.json and qa-guide-after.json that are
downloaded by recording jobs to produce more focused test steps.

Graceful fallback: if analysis fails, recordings proceed without guides.
2026-03-25 03:17:29 +00:00
snomiao
7502528733 fix: download before/after artifacts into separate directories
download-artifact@v7 merges all files flat regardless of
merge-multiple setting. Use separate path dirs (before/after)
and copy all files into the report directory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
0656091959 fix: set merge-multiple false for download-artifact v7
download-artifact@v7 defaults merge-multiple to true, which puts all
files flat in qa-artifacts/ instead of per-artifact subdirectories.
The merge step expects qa-artifacts/qa-before-{os}-{run}/ subdirs,
so the report directory never gets created and video review finds
no files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
6515170d08 refactor: split QA into parallel before/after jobs
Instead of running before/after sequentially in a single job with
fragile git stash/checkout gymnastics, split into two independent
parallel jobs on separate runners:

  resolve-matrix → qa-before (main) ─┐
                 → qa-after  (PR)   ─┴→ report

- qa-before: uses git worktree for clean main branch build
- qa-after: normal PR build via setup-frontend
- report: downloads both artifact sets, merges, runs Gemini review

Benefits:
- Clean workspace isolation (no git checkout origin/main -- .)
- ~2x faster (parallel execution)
- Each job gets its own ComfyUI server (no shared state)
- Eliminates entire class of workspace contamination bugs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
25cbe56a34 fix: select existing user from dropdown instead of re-creating
The Pre-seed step creates qa-ci via API, so the "New user" form
shows "already exists" error. Fix by selecting the existing user
from the dropdown first, falling back to a unique username.
2026-03-25 03:17:29 +00:00
snomiao
120a531ef9 fix: use login page directly instead of localStorage bypass
The localStorage userId bypass doesn't work because the server
validates user IDs and rejects the simple 'qa-ci' string. Instead,
detect the login page by its input fields and create a user via the
"New user" text input, which is how real users would log in.
2026-03-25 03:17:29 +00:00
snomiao
389f6ca6b8 fix: switch from Firefox to Chromium for WebGL canvas support
Firefox headless doesn't support WebGL, causing "getCanvas: canvas is
null" errors. Switch to Chromium which has full headless WebGL support.
Also fix login flow to wait for async router guard to settle and
create user via text input as fallback.
2026-03-25 03:17:29 +00:00
snomiao
6298fc3a58 fix: add debug screenshot and fallback for menu button click
Add coordinate fallback when .comfy-menu-button-wrapper selector isn't
found, and capture a debug screenshot after login to diagnose what the
page looks like when the editor UI fails to render.
2026-03-25 03:17:29 +00:00
snomiao
83702c2e87 fix: use proper CSS selectors for menu interactions in QA recording
The openComfyMenu was clicking at hardcoded coordinates (20, 67) which
missed the menu button. Now uses .comfy-menu-button-wrapper selector
matching the browser tests. Also fixes menu item hover/click selectors
to use .p-menubar-item-label and .p-tieredmenu-item classes, and adds
a wait for the editor UI to fully load before executing test steps.
2026-03-25 03:17:29 +00:00
snomiao
c5d207fa9a fix: bypass login in QA recordings with localStorage pre-seeding
The QA recordings were stuck on the user selection screen because CI
has no existing users. Fix by pre-seeding localStorage with userId,
userName, and TutorialCompleted before navigation, plus creating a
qa-ci user via API as a fallback.
2026-03-25 03:17:29 +00:00
snomiao
ab6dff02c9 feat: autoplay and loop videos on QA dashboard 2026-03-25 03:17:29 +00:00
snomiao
7396d39a6a fix: handle flat artifact layout when no report.md exists
The normalize step couldn't create the qa-report-* subdir because it
only looked for *-report.md files. Add fallback to detect webm files.
2026-03-25 03:17:29 +00:00
snomiao
99e6681237 fix: exclude known video names when renaming Playwright recordings
The AFTER step was renaming qa-before-session.webm instead of the new
recording. Filter out already-named files before picking the latest.
2026-03-25 03:17:29 +00:00
snomiao
ae29224874 fix: dismiss dropdown overlay before clicking Next in QA login flow
The user dropdown shows "No available options" in CI, and the overlay
blocks the Next button. Dismiss with Escape before attempting click.
2026-03-25 03:17:29 +00:00
snomiao
47e5c39ac9 fix: install main branch deps before building, then reinstall PR deps
Main branch imports @vueuse/router which isn't in PR's node_modules.
Need both: main deps for building main, PR deps for running QA scripts.
2026-03-25 03:17:29 +00:00
snomiao
28f530d53a fix: reinstall PR deps after main branch build to restore @google/generative-ai
The main branch build step was running pnpm install with main's lockfile,
which removed @google/generative-ai from node_modules. Move the reinstall
to after restoring PR files so the QA recording script can find its deps.
2026-03-25 03:17:28 +00:00
snomiao
282754743d fix: temporarily disable concurrency group to unstick QA runs 2026-03-25 03:17:28 +00:00
snomiao
0a91ec09e8 fix: restore PR files with git checkout HEAD instead of git checkout -
git checkout - uses @{-1} which requires a previous branch switch.
Since we use 'git checkout origin/main -- .' (file checkout, not branch
switch), there is no @{-1} ref. Use HEAD to restore from current branch.

Also restore proper concurrency group.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
25fd1b2700 fix: use unique concurrency group to unstick QA runs 2026-03-25 03:17:28 +00:00
snomiao
b9d5ff0f8d ci: trigger QA run 2026-03-25 03:17:28 +00:00
snomiao
746d465912 feat: integrate comfy-qa skill test plan into QA recording pipeline
Pass the comprehensive test plan from .claude/skills/comfy-qa/SKILL.md
to Gemini when generating test steps. This gives Gemini knowledge of all
12 QA categories (canvas, menus, sidebar, settings, etc.) so it picks
the most relevant tests for each PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
e314a18b90 fix: use vite build directly to skip nx typecheck dependency
nx build runs typecheck as a prerequisite (via @nx/vite/plugin config).
Use vite build directly for the main branch comparison build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
78fb9ef27f fix: skip typecheck when building main branch for QA comparison
Main branch may have transient TS errors when built with the PR
branch's lockfile. Since we only need the dist for visual comparison,
run nx build directly instead of pnpm build (which includes typecheck).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
41999c2e0f refactor: replace Codex with direct Playwright recording in QA pipeline
Replace the unreliable codex exec approach with a Playwright script
(qa-record.ts) that uses Gemini to generate targeted test steps from
the PR diff, then executes them deterministically via Playwright's API.

Key changes:
- New scripts/qa-record.ts: Gemini generates JSON test actions, Playwright
  executes them with reliable helper functions (menu nav, dialog fill, etc.)
- Remove codex CLI and playwright-cli dependencies
- Remove 150+ lines of prompt templates from pr-qa.yaml
- Firefox headless with video recording (same approach proven locally)
- Fallback steps if Gemini fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
5c243d16be feat: auto-generate regression tests from QA reports
- Tighten BEFORE prompt to 15s snapshot (show old state only)
- Add qa-generate-test.ts: Gemini-powered Playwright test generator
- New workflow step: generate .spec.ts and push to {branch}-add-qa-test
- Tests assert UIUX behavior (tab names, dirty state, visibility)
2026-03-25 03:17:28 +00:00
snomiao
774dcd823e feat: before/after video comparison for QA pipeline
- Build both main (dist-before/) and PR (dist/) frontends in focused mode
- Run QA twice: BEFORE on main branch frontend, AFTER on PR branch
- Send both videos to Gemini in one request for comparative analysis
- Side-by-side dashboard layout with Before (main) / After (PR) panels
- Comparative prompt evaluates whether before confirms old behavior
  and after proves the fix works
- Falls back to single-video mode when no before video available
2026-03-25 03:17:28 +00:00
snomiao
b500f826fc fix: make QA videos seekable with faststart and frequent keyframes
moov atom was at end of file (8.6MB offset) — browser had to download
the entire video before seeking. Keyframes were only every 10 seconds.

Add -movflags +faststart (moov before mdat) and -g 60 (keyframe every
2.4s at 25fps) to ffmpeg conversion.
2026-03-25 03:17:28 +00:00
snomiao
1cd9e171c6 fix: issue cards instead of dense table, rename to comfy-qa.pages.dev
- Replace 6-column confirmed issues table with vertical card blocks
  using colored severity/timestamp/confidence badges
- Rename Cloudflare Pages project from comfyui-qa-videos to comfy-qa
2026-03-25 03:17:28 +00:00
snomiao
5c0bef9b72 fix: seekable video, hide empty cards, PR-aware video review
- Remove autoplay/loop so video timeline is seekable
- Skip card generation for platforms without recordings
- Add --pr-context flag to qa-video-review.ts so Gemini evaluates
  against PR purpose instead of just describing what happened
- Workflow now builds pr-context.txt from PR title/body/diff
2026-03-25 03:17:28 +00:00
snomiao
94e1388495 feat: redesign QA dashboard with modern frontend design
OKLCH color tokens, liquid glass card surfaces, Inter + JetBrains Mono
typography, grain texture overlay, staggered fade-up animations, pill
action buttons with SVG icons, and improved report table styling.
2026-03-25 03:17:28 +00:00
snomiao
0268e8f977 fix: make settings pre-seed non-fatal and try both API endpoints
The /api/settings endpoint returned 4xx in CI. Try both /api/settings
and /settings endpoints, and don't fail the job if neither works.
2026-03-25 03:17:28 +00:00
snomiao
4f1df7c7ce fix: pre-seed Comfy.TutorialCompleted to skip template gallery in QA
The Codex agent was spending 35s browsing the "Getting Started" template
gallery instead of testing the PR's changes. Pre-seeding this setting
via the ComfyUI API ensures the agent lands directly in the graph editor.
2026-03-25 03:17:28 +00:00
snomiao
3b2fdc786a fix: tighten focused QA prompt to only test PR-specific behavior
The Codex agent was spending time on login flow, template browsing,
and general smoke testing instead of testing the PR's actual changes.

Changes:
- Add 30-second time budget for video recording
- Move video-start AFTER login and editor verification
- Explicitly prohibit template browsing and sidebar exploration
- Reduce test steps to 3-6 targeted actions
- Restructure prompt with clear Instructions/Rules sections
2026-03-25 03:17:28 +00:00
snomiao
4d1ad4dcf0 fix: render markdown in QA reports with marked.js
Replace crude sed-based markdown conversion with client-side
rendering via marked.js CDN. Adds proper table, list, and
code styling for the report section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
7ba1aaed53 fix: run report job on workflow_dispatch events
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
09a3c10d50 refactor: replace GPT frame extraction with Gemini native video analysis
Replace the OpenAI GPT-based frame extraction approach (ffmpeg + screenshots)
with Gemini 2.5 Flash's native video understanding. This eliminates false
positives from frame-based analysis (e.g. "black screen = critical bug" during
page transitions) and produces dramatically better QA reviews.

Changes:
- Remove ffmpeg frame extraction, ffprobe duration detection, and all related
  logic (~365 lines removed)
- Add @google/generative-ai SDK for native video/mp4 upload to Gemini
- Update CLI: remove --max-frames, --min-interval-seconds, --keep-frames flags
- Update env: OPENAI_API_KEY → GEMINI_API_KEY
- Update workflow: swap API key secret and model in pr-qa.yaml
- Update report: replace "Frames analyzed" with "Video size"
- Add note in prompt that brief black frames during transitions are normal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
2f30fbe060 fix: use fill+click for quick login before video recording
playwright-cli doesn't support 'evaluate' command. Instead, instruct
Codex to quickly fill the username input and click Next on user-select
page BEFORE starting video recording, so the video only shows actual
QA testing.
2026-03-25 03:17:27 +00:00
snomiao
85adbe4878 fix: use evaluate to set localStorage before video recording
storageState config doesn't work with playwright-cli. Instead, use
evaluate to set Comfy.userId/userName after opening the page, then
navigate back. This skips user-select before video-start so the
recording only shows actual QA testing.
2026-03-25 03:17:27 +00:00
snomiao
0c707b5deb fix: pre-seed localStorage to skip user-select in QA runs
Write a Playwright storageState JSON with Comfy.userId/userName pre-set
so the app loads directly to the graph editor. Saves ~40s per QA run
that was wasted on navigating the user-select page.
2026-03-25 03:17:27 +00:00
snomiao
4f98518c22 fix: prefer explicit qa-session.webm over corrupt auto-recorded videos
The convert step was using find which picked up a 0-byte file from
playwright's videos/ directory instead of the valid qa-session.webm.
Now prefers qa-session.webm explicitly and skips empty files.
2026-03-25 03:17:27 +00:00
snomiao
ca394b9ff7 fix: improve focused QA prompt to test PR-specific behavior, not random walk 2026-03-25 03:17:27 +00:00
snomiao
51878db99d fix: re-add push trigger for sno-skills and sno-qa-* branches 2026-03-25 03:17:27 +00:00
snomiao
d0534003e7 fix: also install ffprobe for GPT video review frame extraction 2026-03-25 03:17:27 +00:00
snomiao
3d0cd72465 fix: use sudo for ffmpeg static binary extraction to /usr/local/bin 2026-03-25 03:17:27 +00:00
snomiao
82849df891 fix: use static ffmpeg binary instead of apt-get (avoids dpkg lock hang) 2026-03-25 03:17:27 +00:00
snomiao
1c62c0edc3 fix: add ffmpeg install back (not pre-installed on GH runners) 2026-03-25 03:17:27 +00:00
snomiao
6ea2ce755d fix: normalize flat artifact download into expected subdirectory 2026-03-25 03:17:27 +00:00
snomiao
8fc4480ee2 fix: pre-install chromium and clarify prompt for codex
Codex was using pnpm dlx instead of the global playwright-cli.
Pre-install chromium in setup step and make prompt explicit about
using the global command directly without pnpm/npx.
2026-03-25 03:17:27 +00:00
snomiao
3515a478fd fix: add debug output to video convert step 2026-03-25 03:17:27 +00:00
snomiao
11432992d3 fix: use danger-full-access sandbox for codex on GH Actions 2026-03-25 03:17:27 +00:00
snomiao
e619b0143a fix: use correct codex model name gpt-5.4-mini 2026-03-25 03:17:27 +00:00
snomiao
7d4a008f29 feat: switch QA from Claude Code to OpenAI Codex CLI
Replace claude --print with codex exec for cheaper QA runs.
Uses codex-mini-latest model ($1.50/$6 vs Sonnet $3/$15).
Uses existing OPENAI_API_KEY secret (no new secrets needed).
2026-03-25 03:17:27 +00:00
snomiao
a3e65140a9 fix: default to linux-only QA, full 3-OS only via qa-full label
Reduces per-run cost from ~$10-16 to ~$2.50 by defaulting to
Linux-only. Use qa-full label or workflow_dispatch for 3-OS runs.
2026-03-25 03:17:27 +00:00
snomiao
f55ab36dd7 fix: use explicit video-start/stop, remove ffmpeg install, use gpt-4.1-mini
- Replace saveVideo config (didn't produce video) with explicit
  playwright-cli video-start/video-stop commands in QA prompt
- Remove apt-get install ffmpeg step (pre-installed on GH runners)
- Switch video review model from gpt-4o to gpt-4.1-mini
2026-03-25 03:17:27 +00:00
snomiao
698a894b42 fix: use auto video recording and show GPT reports on QA site
- Enable saveVideo in playwright-cli config for real video recording
- Replace screenshot stitching with webm→mp4 conversion
- Move video review step before deploy so reports are included
- Add GPT video review reports inline on the Cloudflare Pages site
- Each video card now has expandable "GPT Video Review" section
2026-03-25 03:17:27 +00:00
snomiao
fbd7f404ef fix: configure playwright-cli outputDir and improve artifact collection
- Set .playwright/cli.config.json with outputDir pointing to screenshots/
- This way bare 'playwright-cli screenshot' auto-saves to the right place
- Create screenshot directory before Claude runs (don't rely on Claude)
- Collect step now searches working directory for stray PNGs
- Simplified prompt: no --filename needed, just 'playwright-cli screenshot'
2026-03-25 03:17:27 +00:00
snomiao
d633ce19a7 fix: stitch screenshots from correct directory and simplify prompt
Screenshots were saved to artifact root but stitch looked in frames/.
Now: prompt tells Claude to save to screenshots/ dir with numbered names,
collect step consolidates PNGs there, stitch step globs from screenshots/.
Removed video-start/video-stop (Claude doesn't use them).
2026-03-25 03:17:27 +00:00
snomiao
9a7b5f88a0 fix: use playwright-cli video recording and collect default output
- Add playwright-cli config with outputDir and saveVideo
- Use video-start/video-stop instead of relying on screenshot frames
- Add fallback artifact collection from .playwright-cli/ default dir
- Simplify prompts to focus on video recording workflow
2026-03-25 03:17:27 +00:00
snomiao
be261a8a86 fix: resolve QA_ARTIFACTS path in prompt so Claude gets the literal path
The escaped \$QA_ARTIFACTS in the heredoc produced literal text
'$QA_ARTIFACTS' in the prompt. Claude's Bash tool didn't reliably
expand this env var, so no screenshots or reports were saved.
Remove the escapes so the heredoc expands the variable to the actual
path (e.g. /home/runner/work/_temp/qa-artifacts).
2026-03-25 03:17:27 +00:00
snomiao
b2c31e785f fix: escape backticks in QA prompt heredoc to prevent command substitution
Backtick-wrapped playwright-cli examples in the unquoted heredoc were
being interpreted as bash command substitution, producing empty prompts.
Replace backtick syntax with plain "Run:" prefixed commands.
2026-03-25 03:17:27 +00:00
snomiao
475f9ae5f0 fix: reorganize QA CI — remove screen recording, merge video-review into report
- Remove all Xvfb/ffmpeg screen recording infrastructure from qa job
  (captured blank display since playwright-cli runs headless)
- Add screenshot instructions to QA prompts: Claude saves sequential
  frames to $QA_ARTIFACTS/frames/ after every interaction
- Stitch screenshots into video via ffmpeg in report job (2fps)
- Merge video-review job into report job (4 jobs → 3 jobs)
- Unified PR comment with video links + video review in <details> collapse
- Clean up stale QA_VIDEO_REVIEW_COMMENT markers from prior runs
2026-03-25 03:17:27 +00:00
GitHub Action
c329022e15 [automated] Apply ESLint and Oxfmt fixes 2026-03-25 03:17:27 +00:00
snomiao
63d0df9ff0 fix: harden setup-comfyui-server against shell injection
Move extra_server_params input to env var to prevent shell injection
from untrusted input. Replace wait-for-it pip dependency with a
cross-platform curl polling loop.
2026-03-25 03:17:27 +00:00
snomiao
c1593054fb feat: add comfy-qa skill and automated QA CI pipeline
Add Claude Code skills and a label-triggered QA workflow:

- .claude/skills/comfy-qa/SKILL.md: 12-category QA test plan using
  playwright-cli for browser automation
- .github/workflows/pr-qa.yaml: CI workflow triggered by qa-changes
  (focused, Linux) or qa-full (3-OS matrix) labels. Records screen via
  ffmpeg, runs Claude CLI with playwright-cli, deploys video gallery to
  Cloudflare Pages, posts PR comment with GIF thumbnails, and runs
  OpenAI vision-based video review
- scripts/qa-video-review.ts: frame extraction + GPT-4o analysis
- scripts/qa-video-review.test.ts: unit tests for video review
- knip.config.ts: resolve knip errors for ingest-types package
2026-03-25 03:17:26 +00:00
27 changed files with 7530 additions and 253 deletions

View File

@@ -0,0 +1,278 @@
---
name: reproduce-issue
description: 'Reproduce a GitHub issue by researching prerequisites, setting up the environment (custom nodes, workflows, settings), and interactively exploring ComfyUI via playwright-cli until the bug is confirmed. Then records a clean demo video.'
---
# Issue Reproduction Skill
Reproduce a reported GitHub issue against a running ComfyUI instance. This skill uses an interactive, agent-driven approach — not a static script. You will research, explore, retry, and adapt until the bug is reproduced, then record a clean demo.
## Architecture
Two videos are produced:
1. **Research video** — the full exploration session: installing deps, trying things, failing, retrying, figuring out the bug. Valuable for debugging context.
2. **Reproduce video** — a clean, minimal recording of just the reproduction steps. This is the demo you'd attach to the issue.
```
Phase 1: Research → Read issue, understand prerequisites
Phase 2: Environment → Install custom nodes, load workflows, configure settings
Phase 3: Explore → [VIDEO 1: research] Interactively try to reproduce (retries OK)
Phase 4: Record → [VIDEO 2: reproduce] Clean recording of just the minimal repro steps
Phase 5: Report → Generate a structured reproduction report
```
## Prerequisites
- ComfyUI server running (ask user for URL, default: `http://127.0.0.1:8188`)
- `playwright-cli` installed: `npm install -g @playwright/cli@latest`
- `gh` CLI (authenticated, for reading issues)
- ComfyUI backend with Python environment (for installing custom nodes)
## Phase 1: Research the Issue
1. Fetch the issue details:
```bash
gh issue view <number> --repo Comfy-Org/ComfyUI_frontend --json title,body,comments
```
2. Extract from the issue body:
- **Reproduction steps** (the exact sequence)
- **Prerequisites**: specific workflows, custom nodes, settings, models
- **Environment**: OS, browser, ComfyUI version
- **Media**: screenshots or videos showing the bug
3. Search the codebase for related code:
- Find the feature/component mentioned in the issue
- Understand how it works currently
- Identify what state the UI needs to be in
## Phase 2: Environment Setup
Set up everything the issue requires BEFORE attempting reproduction.
### Custom Nodes
If the issue mentions custom nodes:
```bash
# Find the custom node repo
# Clone into ComfyUI's custom_nodes directory
cd <comfyui_path>/custom_nodes
git clone <custom_node_repo_url>
# Install dependencies if needed
cd <custom_node_name>
pip install -r requirements.txt 2>/dev/null || true
# Restart ComfyUI server to load the new nodes
```
### Workflows
If the issue references a specific workflow:
```bash
# Download workflow JSON if a URL is provided
curl -L "<workflow_url>" -o /tmp/test-workflow.json
# Load it via the API
curl -X POST http://127.0.0.1:8188/api/workflow \
-H "Content-Type: application/json" \
-d @/tmp/test-workflow.json
```
Or load via playwright-cli:
```bash
playwright-cli goto "http://127.0.0.1:8188"
# Drag-and-drop or use File > Open to load the workflow
```
### Settings
If the issue requires specific settings:
```bash
# Use playwright-cli to open settings and change them
playwright-cli press "Control+,"
playwright-cli snapshot
# Find and modify the relevant setting
```
## Phase 3: Interactive Exploration — Research Video
Start recording the **research video** (Video 1). This captures the full exploration — mistakes, retries, dead ends — all valuable context.
```bash
# Open browser and start video recording
playwright-cli open "http://127.0.0.1:8188"
playwright-cli video-start
# Take a snapshot to see current state
playwright-cli snapshot
# Interact based on what you see
playwright-cli click <ref>
playwright-cli fill <ref> "text"
playwright-cli press "Control+s"
# Check results
playwright-cli snapshot
playwright-cli screenshot --filename=/tmp/qa/research-step-1.png
```
### Key Principles
- **Observe before acting**: Always `snapshot` before interacting
- **Retry and adapt**: If a step fails, try a different approach
- **Document what works**: Keep notes on which steps trigger the bug
- **Don't give up**: Try multiple approaches if the first doesn't work
- **Establish prerequisites**: Many bugs require specific UI state:
- Save a workflow first (File > Save)
- Make changes to dirty the workflow
- Open multiple tabs
- Add specific node types
- Change settings
- Resize the window
### Common ComfyUI Interactions via playwright-cli
| Action | Command |
| ------------------- | -------------------------------------------------------------- |
| Open hamburger menu | `playwright-cli click` on the C logo button |
| Navigate menu | `playwright-cli hover <ref>` then `playwright-cli click <ref>` |
| Add node | Double-click canvas → type node name → select from results |
| Connect nodes | Drag from output slot to input slot |
| Save workflow | `playwright-cli press "Control+s"` |
| Save As | Menu > File > Save As |
| Select node | Click on the node |
| Delete node | Select → `playwright-cli press "Delete"` |
| Right-click menu | `playwright-cli click <ref> --button right` |
| Keyboard shortcut | `playwright-cli press "Control+z"` |
## Phase 4: Record Clean Demo — Reproduce Video (max 5 minutes)
Once the bug is confirmed, **stop the research video** and **close the research browser**:
```bash
playwright-cli video-stop
playwright-cli close
```
Now start a **fresh browser session** for the clean reproduce video (Video 2).
**IMPORTANT constraints:**
- **Max 5 minutes** — the reproduce video must be short and focused
- **No environment setup** — server, user, custom nodes are already set up from Phase 3. Just log in and go.
- **No exploration** — you already know the exact steps. Execute them quickly and precisely.
- **Start video recording immediately**, execute steps, stop. Don't leave the recording running while thinking.
1. **Open browser and start recording**:
```bash
playwright-cli open "http://127.0.0.1:8188"
playwright-cli video-start
```
2. **Execute only the minimal reproduction steps** — no exploration, no mistakes. Just the clean sequence that demonstrates the bug. You already know exactly what works from Phase 3.
3. **Take key screenshots** at critical moments:
```bash
playwright-cli screenshot --filename=/tmp/qa/before-bug.png
# ... trigger the bug ...
playwright-cli screenshot --filename=/tmp/qa/bug-visible.png
```
4. **Stop recording and close** immediately after the bug is demonstrated:
```bash
playwright-cli video-stop
playwright-cli close
```
## Phase 5: Generate Report
Create a reproduction report at `tmp/qa/reproduce-report.md`:
```markdown
# Issue Reproduction Report
- **Issue**: <issue_url>
- **Title**: <issue_title>
- **Date**: <today>
- **Status**: Reproduced / Not Reproduced / Partially Reproduced
## Environment
- ComfyUI Server: <url>
- OS: <os>
- Custom Nodes Installed: <list or "none">
- Settings Changed: <list or "none">
## Prerequisites
List everything that had to be set up before the bug could be triggered:
1. ...
2. ...
## Reproduction Steps
Minimal steps to reproduce (the clean sequence):
1. ...
2. ...
3. ...
## Expected Behavior
<from the issue>
## Actual Behavior
<what actually happened>
## Evidence
- Research video: `research-video/video.webm` (full exploration session)
- Reproduce video: `reproduce-video/video.webm` (clean minimal repro)
- Screenshots: `before-bug.png`, `bug-visible.png`
## Root Cause Analysis (if identified)
<code pointers, hypothesis about what's going wrong>
## Notes
<any additional observations, workarounds discovered, related issues>
```
## Handling Failures
If the bug **cannot be reproduced**:
1. Document what you tried and why it didn't work
2. Check if the issue was already fixed (search git log for related commits)
3. Check if it's environment-specific (OS, browser, specific version)
4. Set report status to "Not Reproduced" with detailed notes
5. The report is still valuable — it saves others from repeating the same investigation
## CI Integration
In CI, this skill runs as a Claude Code agent with:
- `ANTHROPIC_API_KEY` for Claude
- `GEMINI_API_KEY` for initial issue analysis (optional)
- ComfyUI server pre-started in the container
- `playwright-cli` pre-installed
The CI workflow:
1. Gemini generates a reproduce guide (markdown) from the issue
2. Claude agent receives the guide and runs this skill
3. Claude explores interactively, installs dependencies, retries
4. Claude records a clean demo once reproduced
5. Video and report are uploaded as artifacts

View File

@@ -0,0 +1,361 @@
---
name: comfy-qa
description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
---
# ComfyUI Frontend QA Skill
Perform comprehensive quality assurance of the ComfyUI frontend application by navigating all routes, clicking interactive elements, and testing features. Generate a structured report and submit it as a draft PR.
## Prerequisites
- Node.js 22+
- `pnpm` package manager
- `gh` CLI (authenticated)
- `playwright-cli` (browser automation): `npm install -g @playwright/cli@latest`
## Step 1: Environment Detection & Setup
Detect the runtime environment and ensure the app is accessible.
### CI Environment
If `CI=true` is set:
1. The ComfyUI backend is pre-configured in the CI container (`ghcr.io/comfy-org/comfyui-ci-container`)
2. Frontend dist is already built and served by the backend
3. Server runs at `http://127.0.0.1:8188`
4. Skip user prompts — run fully automated
### Local Environment
If `CI` is not set:
1. **Ask the user**: "Is a ComfyUI server already running? If so, what URL? (default: http://127.0.0.1:8188)"
- If yes: use the provided URL
- If no: offer to start one:
```bash
# Option A: Use existing ComfyUI installation
# Ask for the path to ComfyUI, then:
cd <comfyui_path>
python main.py --cpu --multi-user --front-end-root <frontend_dist_path> &
# Option B: Build frontend and use preview server (no backend features)
pnpm build && pnpm preview &
```
2. Wait for server readiness by polling the URL (retry with 2s intervals, 60s timeout)
### Browser Automation Setup
Use **playwright-cli** for browser interaction via Bash commands:
```bash
playwright-cli open http://127.0.0.1:8188 # open browser and navigate
playwright-cli snapshot # capture snapshot with element refs
playwright-cli click e1 # click by element ref from snapshot
playwright-cli press Tab # keyboard shortcuts
playwright-cli screenshot --filename=f.png # save screenshot
```
playwright-cli is headless by default (CI-friendly). Each command outputs the current page snapshot with element references (`e1`, `e2`, …) that you use for subsequent `click`, `fill`, `hover` commands. Always run `snapshot` before interacting to get fresh refs.
For local dev servers behind proxies, adjust the URL accordingly (e.g., `https://[port].stukivx.xyz` pattern if configured).
## Step 2: QA Test Plan
Navigate to the application URL and systematically test each area below. For each test, record:
- **Status**: pass / fail / skip (with reason)
- **Notes**: any issues, unexpected behavior, or visual glitches
- **Screenshots**: take screenshots of failures or notable states
### 2.1 Application Load & Routes
| Test | Steps |
| ----------------- | ------------------------------------------------------------ |
| Root route loads | Navigate to `/` — GraphView should render with canvas |
| User select route | Navigate to `/user-select` — user selection UI should appear |
| Default redirect | If multi-user mode, `/` redirects to `/user-select` first |
| 404 handling | Navigate to `/nonexistent` — should handle gracefully |
### 2.2 Canvas & Graph View
| Test | Steps |
| ------------------------- | -------------------------------------------------------------- |
| Canvas renders | The LiteGraph canvas is visible and interactive |
| Pan canvas | Click and drag on empty canvas area |
| Zoom in/out | Use scroll wheel or Alt+=/Alt+- |
| Fit view | Press `.` key — canvas fits to content |
| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
| Add node via search | Open search box, find and add a node |
| Delete node | Select a node, press Delete key |
| Connect nodes | Drag from output slot to input slot |
| Disconnect nodes | Right-click a link and remove, or drag from connected slot |
| Multi-select | Shift+click or drag-select multiple nodes |
| Copy/Paste | Select nodes, Ctrl+C then Ctrl+V |
| Undo/Redo | Make changes, Ctrl+Z to undo, Ctrl+Y to redo |
| Node context menu | Right-click a node — menu appears with all expected options |
| Canvas context menu | Right-click empty canvas — menu appears |
### 2.3 Node Operations
| Test | Steps |
| ------------------- | ---------------------------------------------------------- |
| Bypass node | Select node, Ctrl+B — node shows bypass state |
| Mute node | Select node, Ctrl+M — node shows muted state |
| Collapse node | Select node, Alt+C — node collapses |
| Pin node | Select node, press P — node becomes pinned |
| Rename node | Double-click node title — edit mode activates |
| Node color | Right-click > Color — color picker works |
| Group nodes | Select multiple nodes, Ctrl+G — group created |
| Ungroup | Right-click group > Ungroup |
| Widget interactions | Toggle checkboxes, adjust sliders, type in text fields |
| Combo widget | Click dropdown widgets — options appear and are selectable |
### 2.4 Sidebar Tabs
| Test | Steps |
| ---------------------- | ------------------------------------------------------ |
| Workflows tab | Press W — workflows sidebar opens with saved workflows |
| Node Library tab | Press N — node library opens with categories |
| Model Library tab | Press M — model library opens |
| Assets tab | Press A — assets browser opens |
| Tab toggle | Press same key again — sidebar closes |
| Search in sidebar | Type in search box — results filter |
| Drag node from library | Drag a node from library onto canvas |
### 2.5 Topbar & Workflow Tabs
| Test | Steps |
| -------------------- | ------------------------------------------------------ |
| Workflow tab display | Current workflow name shown in tab bar |
| New workflow | Ctrl+N — new blank workflow created |
| Rename workflow | Double-click workflow tab |
| Tab context menu | Right-click workflow tab — menu with Close/Rename/etc. |
| Multiple tabs | Open multiple workflows, switch between them |
| Queue button | Click Queue/Run button — prompt queues |
| Batch count | Click batch count editor, change value |
| Menu hamburger | Click hamburger menu — options appear |
### 2.6 Settings Dialog
| Test | Steps |
| ---------------- | ---------------------------------------------------- |
| Open settings | Press Ctrl+, or click settings button |
| Settings tabs | Navigate through all setting categories |
| Change a setting | Toggle a boolean setting — it persists after closing |
| Search settings | Type in settings search box — results filter |
| Keybindings tab | Navigate to keybindings panel |
| About tab | Navigate to about panel — version info shown |
| Close settings | Press Escape or click close button |
### 2.7 Bottom Panel
| Test | Steps |
| ------------------- | -------------------------------------- |
| Toggle panel | Press Ctrl+` — bottom panel opens |
| Logs tab | Logs/terminal tab shows server output |
| Shortcuts tab | Shortcuts reference is displayed |
| Keybindings display | Press Ctrl+Shift+K — keybindings panel |
### 2.8 Execution & Queue
| Test | Steps |
| -------------- | ----------------------------------------------------- |
| Queue prompt | Load default workflow, click Queue — execution starts |
| Queue progress | Progress indicator shows during execution |
| Interrupt | Press Ctrl+Alt+Enter during execution — interrupts |
| Job history | Open job history sidebar — past executions listed |
| Clear history | Clear execution history via menu |
### 2.9 Workflow File Operations
| Test | Steps |
| --------------- | ------------------------------------------------- |
| Save workflow | Ctrl+S — workflow saves (check for prompt if new) |
| Open workflow | Ctrl+O — file picker or workflow browser opens |
| Export JSON | Menu > Export — workflow JSON downloads |
| Import workflow | Drag a .json workflow file onto canvas |
| Load default | Menu > Load Default — default workflow loads |
| Clear workflow | Menu > Clear — canvas clears (after confirmation) |
### 2.10 Advanced Features
| Test | Steps |
| --------------- | ------------------------------------------------- |
| Minimap | Alt+M — minimap toggle |
| Focus mode | Toggle focus mode |
| Canvas lock | Press H to lock, V to unlock |
| Link visibility | Ctrl+Shift+L — toggle links |
| Subgraph | Select nodes > Ctrl+Shift+E — convert to subgraph |
### 2.11 Error Handling
| Test | Steps |
| --------------------- | -------------------------------------------- |
| Missing nodes dialog | Load workflow with non-existent node types |
| Missing models dialog | Trigger missing model warning |
| Network error | Disconnect backend, verify graceful handling |
| Invalid workflow | Try loading malformed JSON |
### 2.12 Responsive & Accessibility
| Test | Steps |
| ------------------- | ------------------------------------- |
| Window resize | Resize browser window — layout adapts |
| Keyboard navigation | Tab through interactive elements |
| Sidebar resize | Drag sidebar edge to resize |
## Step 3: Generate Report
After completing all tests, generate a markdown report file.
### Report Location
```
docs/qa/YYYY-MM-DD-NNN-report.md
```
Where:
- `YYYY-MM-DD` is today's date
- `NNN` is a zero-padded increment index (001, 002, etc.)
To determine the increment, check existing files:
```bash
ls docs/qa/ | grep "$(date +%Y-%m-%d)" | wc -l
```
### Report Template
```markdown
# QA Report: ComfyUI Frontend
**Date**: YYYY-MM-DD
**Environment**: CI / Local (OS, Browser)
**Frontend Version**: (git sha or version)
**Agent**: Claude / Codex / Other
**Server URL**: http://...
## Summary
| Category | Pass | Fail | Skip | Total |
| --------------- | ---- | ---- | ---- | ----- |
| Routes & Load | | | | |
| Canvas | | | | |
| Node Operations | | | | |
| Sidebar | | | | |
| Topbar | | | | |
| Settings | | | | |
| Bottom Panel | | | | |
| Execution | | | | |
| File Operations | | | | |
| Advanced | | | | |
| Error Handling | | | | |
| Responsive | | | | |
| **Total** | | | | |
## Results
### Routes & Load
- [x] Root route loads — pass
- [ ] ...
### Canvas & Graph View
- [x] Canvas renders — pass
- [ ] ...
(repeat for each category)
## Issues Found
### Issue 1: [Title]
- **Severity**: critical / major / minor / cosmetic
- **Steps to reproduce**: ...
- **Expected**: ...
- **Actual**: ...
- **Screenshot**: (if available)
## Notes
Any additional observations, performance notes, or suggestions.
```
## Step 4: Commit and Push Report
### In CI (when `CI=true`)
Save the report directly to `$QA_ARTIFACTS` (the CI workflow uploads this as
an artifact and posts results as a PR comment). Do **not** commit, push, or
create a new PR.
### Local / interactive use
When running locally, create a draft PR after committing:
```bash
# Ensure on a feature branch
BRANCH_NAME="qa/$(date +%Y-%m-%d)-$(git rev-parse --short HEAD)"
git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
git add docs/qa/
git commit -m "docs: add QA report $(date +%Y-%m-%d)
Automated QA report covering all frontend routes and features."
git push -u origin "$BRANCH_NAME"
# Create draft PR assigned to comfy-pr-bot
gh pr create \
--draft \
--title "QA Report: $(date +%Y-%m-%d)" \
--body "## QA Report
Automated frontend QA run covering all routes and interactive features.
See \`docs/qa/\` for the full report.
/cc @comfy-pr-bot" \
--assignee comfy-pr-bot
```
## Execution Notes
### Cross-Platform Considerations
- **Windows**: Use `pwsh` or `cmd` equivalents for shell commands. `gh` CLI works on all platforms.
- **macOS**: Keyboard shortcuts use Cmd instead of Ctrl in the actual app, but Playwright sends OS-appropriate keys.
- **Linux**: Primary CI platform. Screenshot baselines are Linux-only.
### Agent Compatibility
This skill uses **playwright-cli** (`@playwright/cli`) — a token-efficient CLI designed for coding agents. Install it once with `npm install -g @playwright/cli@latest`, then use `Bash` to run commands.
The key operations and their playwright-cli equivalents:
| Action | Command |
| ---------------- | ---------------------------------------- |
| Navigate to URL | `playwright-cli goto <url>` |
| Get element refs | `playwright-cli snapshot` |
| Click element | `playwright-cli click <ref>` |
| Type text | `playwright-cli fill <ref> <text>` |
| Press shortcut | `playwright-cli press <key>` |
| Take screenshot | `playwright-cli screenshot --filename=f` |
| Hover element | `playwright-cli hover <ref>` |
| Select dropdown | `playwright-cli select <ref> <value>` |
Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation or major interactions to refresh refs before acting.
### Tips for Reliable QA
1. **Wait for page stability** before interacting — check that elements are visible and enabled
2. **Take a snapshot after each major navigation** to verify state
3. **Don't use fixed timeouts** — poll for expected conditions
4. **Record the full page snapshot** at the start for baseline comparison
5. **If a test fails**, document it and continue — don't abort the entire QA run
6. **Group related tests** — complete one category before moving to the next

View File

@@ -44,12 +44,17 @@ runs:
python -m pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
pip install wait-for-it
- name: Start ComfyUI server
if: ${{ inputs.launch_server == 'true' }}
shell: bash
working-directory: ComfyUI
env:
EXTRA_SERVER_PARAMS: ${{ inputs.extra_server_params }}
run: |
python main.py --cpu --multi-user --front-end-root ../dist ${{ inputs.extra_server_params }} &
wait-for-it --service 127.0.0.1:8188 -t 600
python main.py --cpu --multi-user --front-end-root ../dist $EXTRA_SERVER_PARAMS &
for i in $(seq 1 300); do
curl -sf http://127.0.0.1:8188/api/system_stats >/dev/null 2>&1 && echo "Server ready" && exit 0
sleep 2
done
echo "::error::ComfyUI server did not start within 600s" && exit 1

1100
.github/workflows/pr-qa.yaml vendored Normal file

File diff suppressed because it is too large Load Diff

5
.gitignore vendored
View File

@@ -99,4 +99,7 @@ vitest.config.*.timestamp*
# Weekly docs check output
/output.txt
.amp
.amp
.playwright-cli/
.playwright/
.claude/scheduled_tasks.lock

View File

@@ -9,6 +9,7 @@
"packages/registry-types/src/comfyRegistryTypes.ts",
"public/materialdesignicons.min.css",
"src/types/generatedManagerTypes.ts",
"**/__fixtures__/**/*.json"
"**/__fixtures__/**/*.json",
"scripts/qa-report-template.html"
]
}

View File

@@ -0,0 +1,73 @@
# QA Pipeline Troubleshooting
## Common Failures
### `set -euo pipefail` + grep with no match
**Symptom**: Deploy script crashes silently, badge shows FAILED.
**Cause**: `grep -oP` returns exit code 1 when no match. Under `pipefail`, this kills the entire script.
**Fix**: Always append `|| true` to grep pipelines in bash scripts.
### `__name is not defined` in page.evaluate
**Symptom**: Recording crashes with `ReferenceError: __name is not defined`.
**Cause**: tsx compiles arrow functions inside `page.evaluate()` with `__name` helpers. The browser context doesn't have these.
**Fix**: Use `page.addScriptTag({ content: '...' })` with plain JS strings instead of `page.evaluate(() => { ... })` with arrow functions.
### `Set<string>()` in page.evaluate
**Symptom**: Same `__name` error.
**Cause**: TypeScript generics like `new Set<string>()` get compiled incorrectly for browser context.
**Fix**: Use `new Set()` without type parameter.
### `zod/v4` import error
**Symptom**: `ERR_PACKAGE_PATH_NOT_EXPORTED: Package subpath './v4' is not defined`.
**Cause**: claude-agent-sdk depends on `zod/v4` internally, but the project's zod doesn't export it.
**Fix**: Import from `zod` (not `zod/v4`) in project code.
### `ERR_PNPM_LOCKFILE_CONFIG_MISMATCH`
**Symptom**: pnpm install fails with frozen lockfile mismatch.
**Cause**: Adding a new dependency changes the workspace catalog but lockfile wasn't regenerated.
**Fix**: Run `pnpm install` to regenerate lockfile, commit `pnpm-workspace.yaml` + `pnpm-lock.yaml`.
### `loadDefaultWorkflow` — "Load Default" not found
**Symptom**: Menu item "Load Default" not found, canvas stays empty.
**Cause**: The menu item name varies by version/locale. Menu navigation is fragile.
**Fix**: Use `app.resetToDefaultWorkflow()` JS API via `page.evaluate` instead of menu navigation.
### Model ID not found (Claude Agent SDK)
**Symptom**: `There's an issue with the selected model (claude-sonnet-4-6-20250514)`.
**Cause**: Dated model IDs like `claude-sonnet-4-6-20250514` don't exist.
**Fix**: Use `claude-sonnet-4-6` (no date suffix).
### Model not found (Gemini)
**Symptom**: 404 from Gemini API.
**Cause**: Preview model names like `gemini-2.5-flash-preview-05-20` expire.
**Fix**: Use `gemini-3-flash-preview` (latest stable).
## Badge Mismatches
### False REPRODUCED
**Symptom**: Badge says REPRODUCED but AI review says "could not reproduce".
**Root cause**: Grep pattern `reproduc|confirm` matches neutral words like "reproduction steps" or "could not be confirmed".
**Fix**: Use structured JSON verdict from AI (`## Verdict` section with `{"verdict": "..."}`) instead of regex matching the prose.
### INCONCLUSIVE feedback loop
**Symptom**: Once an issue gets INCONCLUSIVE, all future runs stay INCONCLUSIVE.
**Cause**: QA bot's own previous comments contain "INCONCLUSIVE", which gets fed back into pr-context.txt.
**Fix**: Filter out `github-actions[bot]` comments when building pr-context.
### pressKey with hold prevents event propagation
**Symptom**: BEFORE video doesn't show the bug (e.g., Escape doesn't close dialog).
**Cause**: `keyboard.down()` + 400ms sleep + `keyboard.up()` changes event timing. Some UI frameworks handle held keys differently than instant presses.
**Fix**: Use instant `keyboard.press()` for testing. Show key name via subtitle overlay instead.
## Cursor Not Visible
**Symptom**: No mouse cursor in recorded videos.
**Cause**: Headless Chrome doesn't render system cursor. The CSS cursor overlay relies on DOM `mousemove` events which Playwright CDP doesn't reliably trigger.
**Fix**: Monkey-patch `page.mouse.move/click/dblclick/down/up` to call `__moveCursor(x,y)` on the injected cursor div. This makes ALL mouse operations update the overlay.
## Agent Doesn't Perform Steps
**Symptom**: Agent opens menus and settings but never interacts with the canvas.
**Causes**:
1. `loadDefaultWorkflow` failed (no nodes on canvas)
2. Agent ran out of turn budget (30 turns / 120s)
3. Gemini Flash (old agent) ignores prompt hints
**Fix**: Use hybrid agent (Claude Sonnet 4.6 + Gemini vision). Claude's superior reasoning follows instructions precisely.

59
docs/qa/backlog.md Normal file
View File

@@ -0,0 +1,59 @@
# QA Pipeline Backlog
## Comparison Modes
### Type A: Same code, different settings (IMPLEMENTED)
Agent demonstrates both working (control) and broken (test) states in one session by toggling settings. E.g., Nodes 2.0 OFF → drag works, Nodes 2.0 ON → drag broken.
### Type B: Different commits
For regressions reported as "worked in vX.Y, broken in vX.Z":
- `qa-analyze-pr.ts` detects regression markers ("since v1.38", "after PR #1234")
- Pipeline checks out the old commit, records control video
- Records test video on current main
- Side-by-side comparison on report page (reuses PR before/after infra)
### Type C: Different browsers
For browser-specific bugs ("works on Chrome, broken on Firefox"):
- Run recording with different Playwright browser contexts
- Compare behavior across browsers in one report
## Agent Improvements
### TTS Narration
- OpenAI TTS (`tts-1`, nova voice) generates audio from agent reasoning
- Merged into video via ffmpeg at correct timestamps
- Currently in qa-record.ts but needs wiring into hybrid agent path
### Image/Screenshot Reading
- `qa-analyze-pr.ts` already downloads and sends images from issue bodies to Gemini
- Could also send them to the Claude agent as context ("the reporter showed this screenshot")
### Placeholder Page
- Deploy a status page immediately when CI starts
- Auto-refreshes every 30s until final report replaces it
- Shows spinner, CI link, badge
### Pre-seed Assets
- Upload test images via ComfyUI API before recording
- Enables reproduction of bugs requiring assets (#10424 zoom button)
### Environment-Dependent Issues
- #7942: needs custom TestNode — could install a test custom node pack in CI
- #9101: needs completed generation — could run with a tiny model checkpoint
## Cost Optimization
### Lazy A11y Tree
- `inspect(selector)` searches tree for specific element (~20 tokens)
- `getUIChanges()` diffs against previous snapshot (~100 tokens)
- vs dumping full tree every turn (~2000 tokens)
### Gemini Video vs Images
- 30s video clip: ~7,700 tokens (258 tok/s)
- 15 screenshots: ~19,500 tokens (1,300 tok/frame)
- Video is 2.5x cheaper and shows temporal changes
### Model Selection
- Claude Sonnet 4.6: $3/$15 per 1M in/out — best reasoning
- Gemini 2.5 Flash: $0.10/$0.40 per 1M — best vision-per-dollar
- Hybrid uses each where it's strongest

60
docs/qa/models.md Normal file
View File

@@ -0,0 +1,60 @@
# QA Pipeline Model Selection
## Current Configuration
| Script | Role | Model | Why |
| --------------------- | -------------------------------------- | ------------------------ | --------------------------------------------------------------------------------------------------- |
| `qa-analyze-pr.ts` | PR/issue analysis, QA guide generation | `gemini-3.1-pro-preview` | Needs deep reasoning over PR diffs, screenshots, and issue threads |
| `qa-record.ts` | Playwright step generation | `gemini-3.1-pro-preview` | Step quality is critical — must understand ComfyUI's canvas UI and produce precise action sequences |
| `qa-video-review.ts` | Video comparison review | `gemini-3-flash-preview` | Video analysis with structured output; flash is sufficient and faster |
| `qa-generate-test.ts` | Regression test generation | `gemini-3-flash-preview` | Code generation from QA reports; flash handles this well |
## Model Comparison
### Gemini 3.1 Pro vs GPT-5.4
| | Gemini 3.1 Pro Preview | GPT-5.4 |
| ----------------- | ---------------------- | ----------------- |
| Context window | 1M tokens | 1M tokens |
| Max output | 65K tokens | 128K tokens |
| Video input | Yes | No |
| Image input | Yes | Yes |
| Audio input | Yes | No |
| Pricing (input) | $2/1M tokens | $2.50/1M tokens |
| Pricing (output) | $12/1M tokens | $15/1M tokens |
| Function calling | Yes | Yes |
| Code execution | Yes | Yes (interpreter) |
| Structured output | Yes | Yes |
**Why Gemini over GPT for QA:**
- Native video understanding (can review recordings directly)
- Lower cost at comparable quality
- Native multimodal input (screenshots, videos, audio from issue threads)
- Better price/performance for high-volume CI usage
### Gemini 3 Flash vs GPT-5.4 Mini
| | Gemini 3 Flash Preview | GPT-5.4 Mini |
| ---------------- | ---------------------- | --------------- |
| Context window | 1M tokens | 1M tokens |
| Pricing (input) | $0.50/1M tokens | $0.40/1M tokens |
| Pricing (output) | $3/1M tokens | $1.60/1M tokens |
| Video input | Yes | No |
**Why Gemini Flash for video review:**
- Video input support is required — GPT models cannot process video files
- Good enough quality for structured comparison reports
## Upgrade History
| Date | Change | Reason |
| ---------- | ---------------------------------------------------------------- | -------------------------------------------------------------------------- |
| 2026-03-24 | `gemini-2.5-flash``gemini-3.1-pro-preview` (record) | Shallow step generation; pro model needed for complex ComfyUI interactions |
| 2026-03-24 | `gemini-2.5-pro``gemini-3.1-pro-preview` (analyze) | Keep analysis on latest pro |
| 2026-03-24 | `gemini-2.5-flash``gemini-3-flash-preview` (review, test-gen) | Latest flash for cost-efficient tasks |
## Override
All scripts accept `--model <name>` to override the default. Pass any Gemini model ID.

View File

@@ -40,7 +40,7 @@ const config: KnipConfig = {
ignoreDependencies: ['@comfyorg/design-system', '@vercel/analytics']
}
},
ignoreBinaries: ['python3'],
ignoreBinaries: ['python3', 'wrangler'],
ignoreDependencies: [
// Weird importmap things
'@iconify-json/lucide',

View File

@@ -39,6 +39,7 @@
"preinstall": "pnpm dlx only-allow pnpm",
"prepare": "husky || true && git config blame.ignoreRevsFile .git-blame-ignore-revs || true",
"preview": "nx preview",
"qa:video-review": "tsx scripts/qa-video-review.ts",
"storybook": "nx storybook",
"storybook:desktop": "nx run @comfyorg/desktop-ui:storybook",
"stylelint:fix": "stylelint --cache --fix '{apps,packages,src}/**/*.{css,vue}'",
@@ -120,7 +121,9 @@
"zod-validation-error": "catalog:"
},
"devDependencies": {
"@anthropic-ai/claude-agent-sdk": "catalog:",
"@eslint/js": "catalog:",
"@google/generative-ai": "catalog:",
"@intlify/eslint-plugin-vue-i18n": "catalog:",
"@lobehub/i18n-cli": "catalog:",
"@nx/eslint": "catalog:",

547
pnpm-lock.yaml generated

File diff suppressed because it is too large Load Diff

View File

@@ -4,10 +4,12 @@ packages:
catalog:
'@alloc/quick-lru': ^5.2.0
'@anthropic-ai/claude-agent-sdk': ^0.2.85
'@astrojs/vue': ^5.0.0
'@comfyorg/comfyui-electron-types': 0.6.2
'@eslint/js': ^9.39.1
'@formkit/auto-animate': ^0.9.0
'@google/generative-ai': ^0.24.1
'@iconify-json/lucide': ^1.1.178
'@iconify/json': ^2.2.380
'@iconify/tailwind4': ^1.2.0

553
scripts/qa-agent.ts Normal file
View File

@@ -0,0 +1,553 @@
#!/usr/bin/env tsx
/**
* Hybrid QA Agent — Claude Sonnet 4.6 brain + Gemini 3.1 Pro eyes
*
* Claude plans and reasons. Gemini watches the video buffer and describes
* what it sees. The agent uses 4 tools:
* - observe(seconds, focus) — Gemini reviews last N seconds of video
* - inspect(selector) — search accessibility tree for element state
* - perform(action, params) — execute Playwright action
* - done(verdict, summary) — finish with result
*/
import type { Page } from '@playwright/test'
import { query, tool, createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk'
import { GoogleGenerativeAI } from '@google/generative-ai'
import { z } from 'zod'
import { execSync } from 'child_process'
import { mkdirSync, writeFileSync, readFileSync } from 'fs'
// ── Types ──
interface AgentOptions {
page: Page
issueContext: string
qaGuide: string
outputDir: string
geminiApiKey: string
anthropicApiKey?: string // Optional — Agent SDK auto-detects Claude Code session
maxTurns?: number
timeBudgetMs?: number
}
interface ScreenshotFrame {
timestampMs: number
base64: string
}
// ── Video buffer ──
const FRAME_INTERVAL_MS = 2000
const MAX_BUFFER_FRAMES = 30 // 60 seconds at 2fps
class VideoBuffer {
private frames: ScreenshotFrame[] = []
private startMs = Date.now()
private intervalId: ReturnType<typeof setInterval> | null = null
private page: Page
constructor(page: Page) {
this.page = page
}
start() {
this.startMs = Date.now()
this.intervalId = setInterval(async () => {
try {
const buf = await this.page.screenshot({
type: 'jpeg',
quality: 60
})
this.frames.push({
timestampMs: Date.now() - this.startMs,
base64: buf.toString('base64')
})
if (this.frames.length > MAX_BUFFER_FRAMES) {
this.frames.shift()
}
} catch {
// page may be navigating
}
}, FRAME_INTERVAL_MS)
}
stop() {
if (this.intervalId) clearInterval(this.intervalId)
}
getLastFrames(seconds: number): ScreenshotFrame[] {
const cutoffMs = Date.now() - this.startMs - seconds * 1000
return this.frames.filter((f) => f.timestampMs >= cutoffMs)
}
async buildVideoClip(
seconds: number,
outputDir: string
): Promise<Buffer | null> {
const frames = this.getLastFrames(seconds)
if (frames.length < 2) return null
const clipDir = `${outputDir}/.clip-frames`
mkdirSync(clipDir, { recursive: true })
// Write frames as numbered JPEGs
for (let i = 0; i < frames.length; i++) {
writeFileSync(
`${clipDir}/frame-${String(i).padStart(4, '0')}.jpg`,
Buffer.from(frames[i].base64, 'base64')
)
}
// Compose into video with ffmpeg
const clipPath = `${outputDir}/.observe-clip.mp4`
try {
const fps = Math.max(1, Math.round(frames.length / seconds))
execSync(
`ffmpeg -y -framerate ${fps} -i "${clipDir}/frame-%04d.jpg" ` +
`-c:v libx264 -preset ultrafast -pix_fmt yuv420p "${clipPath}" 2>/dev/null`,
{ timeout: 10000 }
)
return readFileSync(clipPath)
} catch {
return null
}
}
}
// ── Gemini Vision ──
async function geminiObserve(
videoBuffer: VideoBuffer,
seconds: number,
focus: string,
outputDir: string,
geminiApiKey: string
): Promise<string> {
const genAI = new GoogleGenerativeAI(geminiApiKey)
const model = genAI.getGenerativeModel({
model: 'gemini-3-flash-preview'
})
// Try video clip first, fall back to last frame
const clip = await videoBuffer.buildVideoClip(seconds, outputDir)
const parts: Array<
{ text: string } | { inlineData: { mimeType: string; data: string } }
> = [
{
text: `You are observing a ComfyUI frontend session. Focus on: ${focus}\n\nDescribe what happened in the last ${seconds} seconds. Be specific about UI state, actions taken, and results.`
}
]
if (clip) {
parts.push({
inlineData: { mimeType: 'video/mp4', data: clip.toString('base64') }
})
} else {
// Fall back to last frame
const frames = videoBuffer.getLastFrames(seconds)
if (frames.length > 0) {
parts.push({
inlineData: {
mimeType: 'image/jpeg',
data: frames[frames.length - 1].base64
}
})
}
}
const result = await model.generateContent(parts)
return result.response.text().trim()
}
// ── Accessibility tree helpers ──
interface A11yNode {
role: string
name: string
value?: string
checked?: boolean
disabled?: boolean
children?: A11yNode[]
}
function searchA11y(node: A11yNode | null, selector: string): A11yNode | null {
if (!node) return null
const sel = selector.toLowerCase()
// Match by name or role
if (
node.name?.toLowerCase().includes(sel) ||
node.role?.toLowerCase().includes(sel)
) {
return node
}
// Recurse into children
if (node.children) {
for (const child of node.children) {
const found = searchA11y(child, selector)
if (found) return found
}
}
return null
}
function flattenA11y(node: A11yNode | null, depth = 0): string {
if (!node || depth > 3) return ''
const parts: string[] = []
const indent = ' '.repeat(depth)
const attrs: string[] = []
if (node.value !== undefined) attrs.push(`value="${node.value}"`)
if (node.checked !== undefined) attrs.push(`checked=${node.checked}`)
if (node.disabled) attrs.push('disabled')
const attrStr = attrs.length ? ` [${attrs.join(', ')}]` : ''
if (node.name || attrs.length) {
parts.push(`${indent}${node.role}: ${node.name || '(unnamed)'}${attrStr}`)
}
if (node.children) {
for (const child of node.children) {
parts.push(flattenA11y(child, depth + 1))
}
}
return parts.filter(Boolean).join('\n')
}
// ── Subtitle overlay ──
async function showSubtitle(page: Page, text: string, turn: number) {
const encoded = encodeURIComponent(
text.slice(0, 120).replace(/'/g, "\\'").replace(/\n/g, ' ')
)
await page.addScriptTag({
content: `(function(){
var id='qa-subtitle';
var el=document.getElementById(id);
if(!el){
el=document.createElement('div');
el.id=id;
Object.assign(el.style,{position:'fixed',bottom:'32px',left:'50%',transform:'translateX(-50%)',zIndex:'2147483646',maxWidth:'90%',padding:'6px 14px',borderRadius:'6px',background:'rgba(0,0,0,0.8)',color:'rgba(255,255,255,0.95)',fontSize:'12px',fontFamily:'system-ui,sans-serif',fontWeight:'400',lineHeight:'1.4',pointerEvents:'none',textAlign:'center',transition:'opacity 0.3s',whiteSpace:'normal'});
document.body.appendChild(el);
}
var msg=decodeURIComponent('${encoded}');
el.textContent='['+${turn}+'] '+msg;
el.style.opacity='1';
})()`
})
}
// ── Main agent ──
export async function runHybridAgent(opts: AgentOptions): Promise<{
verdict: string
summary: string
}> {
const {
page,
issueContext,
qaGuide,
outputDir,
geminiApiKey,
anthropicApiKey
} = opts
const maxTurns = opts.maxTurns ?? 30
const timeBudgetMs = opts.timeBudgetMs ?? 120_000
// Start video buffer
const videoBuffer = new VideoBuffer(page)
videoBuffer.start()
let lastA11ySnapshot: A11yNode | null = null
let agentDone = false
let finalVerdict = 'INCONCLUSIVE'
let finalSummary = 'Agent did not complete'
let turnCount = 0
const startTime = Date.now()
// Import executeAction from qa-record.ts (shared Playwright helpers)
// For now, inline the action execution
const { executeAction } = await import('./qa-record.js')
// Define tools
const observeTool = tool(
'observe',
'Watch the last N seconds of screen recording through Gemini vision. Use this to verify visual state, check if actions had visible effect, or inspect visual bugs. Pass a focused question so Gemini knows what to look for.',
{
seconds: z
.number()
.min(3)
.max(60)
.default(10)
.describe('How many seconds to look back'),
focus: z
.string()
.describe(
'What to look for — be specific, e.g. "Did the Nodes 2.0 toggle switch to ON?"'
)
},
async (args) => {
const description = await geminiObserve(
videoBuffer,
args.seconds,
args.focus,
outputDir,
geminiApiKey
)
return { content: [{ type: 'text' as const, text: description }] }
}
)
const inspectTool = tool(
'inspect',
'Search the accessibility tree for a specific UI element. Returns its role, name, value, checked state. Fast and precise — use this to verify element state without vision.',
{
selector: z
.string()
.describe(
'Element name or role to search for, e.g. "Nodes 2.0", "KSampler seed", "Run button"'
)
},
async (args) => {
try {
const snapshot =
(await page.accessibility.snapshot()) as A11yNode | null
lastA11ySnapshot = snapshot
const found = searchA11y(snapshot, args.selector)
if (found) {
return {
content: [
{
type: 'text' as const,
text: JSON.stringify({
role: found.role,
name: found.name,
value: found.value,
checked: found.checked,
disabled: found.disabled,
hasChildren: Boolean(found.children?.length)
})
}
]
}
}
// Return nearby elements if exact match not found
const tree = flattenA11y(snapshot, 0).slice(0, 2000)
return {
content: [
{
type: 'text' as const,
text: `Element "${args.selector}" not found. Available elements:\n${tree}`
}
]
}
} catch (e) {
return {
content: [
{
type: 'text' as const,
text: `inspect failed: ${e instanceof Error ? e.message : e}`
}
]
}
}
}
)
const performTool = tool(
'perform',
`Execute a Playwright action on the ComfyUI page. Available actions:
- click(text): click element by visible text
- clickCanvas(x, y): click at coordinates
- rightClickCanvas(x, y): right-click at coordinates
- doubleClick(x, y): double-click at coordinates
- dragCanvas(fromX, fromY, toX, toY): drag between points
- scrollCanvas(x, y, deltaY): scroll wheel (negative=zoom in)
- pressKey(key): press keyboard key (Escape, Enter, Delete, Control+c, etc.)
- fillDialog(text): fill input and press Enter
- openMenu(): open hamburger menu
- hoverMenuItem(label): hover menu item
- clickMenuItem(label): click submenu item
- setSetting(id, value): change a ComfyUI setting
- loadDefaultWorkflow(): load the 7-node default workflow
- openSettings(): open Settings dialog
- reload(): reload the page
- addNode(nodeName, x, y): add a node via search
- copyPaste(x, y): Ctrl+C then Ctrl+V at coords
- holdKeyAndDrag(key, fromX, fromY, toX, toY): hold key while dragging
- screenshot(name): take a named screenshot`,
{
action: z.string().describe('Action name'),
params: z
.record(z.unknown())
.optional()
.describe('Action parameters as key-value pairs')
},
async (args) => {
turnCount++
if (turnCount > maxTurns || Date.now() - startTime > timeBudgetMs) {
return {
content: [
{
type: 'text' as const,
text: `Budget exceeded (${turnCount}/${maxTurns} turns, ${Math.round((Date.now() - startTime) / 1000)}s). Use done() now.`
}
]
}
}
// Build TestAction object from args
const actionObj = { action: args.action, ...args.params } as Parameters<
typeof executeAction
>[1]
try {
const result = await executeAction(page, actionObj, outputDir)
// Show subtitle
await showSubtitle(
page,
`${args.action}: ${result.success ? 'OK' : result.error}`,
turnCount
)
return {
content: [
{
type: 'text' as const,
text: result.success
? `Action "${args.action}" succeeded.`
: `Action "${args.action}" FAILED: ${result.error}`
}
]
}
} catch (e) {
return {
content: [
{
type: 'text' as const,
text: `Action "${args.action}" threw: ${e instanceof Error ? e.message : e}`
}
]
}
}
}
)
const doneTool = tool(
'done',
'Signal that reproduction is complete. Call this when you have either confirmed the bug or determined it cannot be reproduced.',
{
verdict: z
.enum(['REPRODUCED', 'NOT_REPRODUCIBLE', 'INCONCLUSIVE'])
.describe('Final verdict'),
summary: z
.string()
.describe(
'One paragraph: what you did, what you observed, and why you reached this verdict'
)
},
async (args) => {
agentDone = true
finalVerdict = args.verdict
finalSummary = args.summary
await showSubtitle(page, `DONE: ${args.verdict}`, turnCount)
return {
content: [
{
type: 'text' as const,
text: `Agent finished: ${args.verdict}`
}
]
}
}
)
// Create MCP server with our tools
const server = createSdkMcpServer({
name: 'qa-agent',
version: '1.0.0',
tools: [observeTool, inspectTool, performTool, doneTool]
})
// Build system prompt
const systemPrompt = `You are a senior QA engineer reproducing a reported bug in ComfyUI, a node-based AI image generation tool.
## Your tools
- observe(seconds, focus) — Gemini AI watches the last N seconds of screen recording and answers your focused question. Use for visual verification.
- inspect(selector) — Search the accessibility tree for a specific element's state. Use for precise state checks (toggle on/off, value, disabled).
- perform(action, params) — Execute a Playwright action on the browser.
- done(verdict, summary) — Finish with your conclusion.
## Strategy
1. Start by understanding the issue, then plan your reproduction steps.
2. Use perform() to take actions. After each action, use inspect() to verify state or observe() for visual confirmation.
3. If a setting change doesn't seem to take effect, try reload() then verify again.
4. Focus on the specific bug — don't explore randomly.
5. Take screenshots at key moments for the video evidence.
6. When you've confirmed or ruled out the bug, call done().
## Control/Test Comparison (IMPORTANT)
When a bug is triggered by a specific setting, mode, or configuration:
1. **CONTROL phase**: First demonstrate the WORKING state. Disable the trigger (e.g., Nodes 2.0 OFF), perform the action, take a screenshot labeled "control-*", verify it works.
2. **TEST phase**: Then enable the trigger (e.g., Nodes 2.0 ON), reload if needed, perform the SAME action, take a screenshot labeled "test-*", verify it's broken.
3. In your done() summary, explicitly compare: "With X OFF, behavior was Y. With X ON, behavior was Z."
This contrast is critical evidence — it proves the bug is caused by the specific setting, not a general issue. Always try to show both states when possible.
Examples of control/test pairs:
- Nodes 2.0 OFF → ON (for node rendering, widget, drag bugs)
- Default theme → specific theme (for visual bugs)
- Single node → multiple overlapping nodes (for z-index bugs)
- Empty workflow → loaded workflow (for state bugs)
## ComfyUI Layout (1280×720 viewport)
- Canvas with node graph centered at ~(640, 400)
- Hamburger menu top-left (C logo)
- Sidebar: Workflows, Node Library, Models
- Default workflow nodes: Load Checkpoint (~150,300), CLIP Text Encode (~450,250/450), Empty Latent (~450,600), KSampler (~750,350), VAE Decode (~1000,350), Save Image (~1200,350)
${qaGuide ? `## QA Guide\n${qaGuide}\n` : ''}
## Issue to Reproduce
${issueContext}`
// Run the agent
console.warn('Starting hybrid agent (Claude Sonnet 4.6 + Gemini vision)...')
try {
for await (const message of query({
prompt:
'Reproduce the reported bug. Start by reading the issue context in your system prompt, then use your tools to interact with the ComfyUI browser session.',
options: {
model: 'claude-sonnet-4-6',
systemPrompt,
...(anthropicApiKey ? { apiKey: anthropicApiKey } : {}),
maxTurns,
mcpServers: { 'qa-agent': server },
allowedTools: [
'mcp__qa-agent__observe',
'mcp__qa-agent__inspect',
'mcp__qa-agent__perform',
'mcp__qa-agent__done'
]
}
})) {
if (message.type === 'assistant' && message.message?.content) {
for (const block of message.message.content) {
if ('text' in block && block.text) {
console.warn(` Claude: ${block.text.slice(0, 200)}`)
}
if ('name' in block) {
console.warn(
` Tool: ${block.name}(${JSON.stringify(block.input).slice(0, 100)})`
)
}
}
}
if (agentDone) break
}
} catch (e) {
console.warn(`Agent error: ${e instanceof Error ? e.message : e}`)
}
videoBuffer.stop()
return { verdict: finalVerdict, summary: finalSummary }
}

View File

@@ -0,0 +1,84 @@
import { describe, expect, it } from 'vitest'
import { extractMediaUrls } from './qa-analyze-pr'
describe('extractMediaUrls', () => {
it('extracts markdown image URLs', () => {
const text = '![screenshot](https://example.com/image.png)'
expect(extractMediaUrls(text)).toEqual(['https://example.com/image.png'])
})
it('extracts multiple markdown images', () => {
const text = [
'![before](https://example.com/before.png)',
'Some text',
'![after](https://example.com/after.jpg)'
].join('\n')
expect(extractMediaUrls(text)).toEqual([
'https://example.com/before.png',
'https://example.com/after.jpg'
])
})
it('extracts raw URLs with media extensions', () => {
const text = 'Check this: https://cdn.example.com/demo.mp4 for details'
expect(extractMediaUrls(text)).toEqual(['https://cdn.example.com/demo.mp4'])
})
it('extracts GitHub user-attachments URLs', () => {
const text =
'https://github.com/user-attachments/assets/abc12345-6789-0def-1234-567890abcdef'
expect(extractMediaUrls(text)).toEqual([
'https://github.com/user-attachments/assets/abc12345-6789-0def-1234-567890abcdef'
])
})
it('extracts private-user-images URLs', () => {
const text =
'https://private-user-images.githubusercontent.com/12345/abcdef-1234?jwt=token123'
expect(extractMediaUrls(text)).toEqual([
'https://private-user-images.githubusercontent.com/12345/abcdef-1234?jwt=token123'
])
})
it('extracts URLs with query parameters', () => {
const text = 'https://example.com/image.png?w=800&h=600'
expect(extractMediaUrls(text)).toEqual([
'https://example.com/image.png?w=800&h=600'
])
})
it('deduplicates URLs', () => {
const text = [
'![img](https://example.com/same.png)',
'![img2](https://example.com/same.png)',
'Also https://example.com/same.png'
].join('\n')
expect(extractMediaUrls(text)).toEqual(['https://example.com/same.png'])
})
it('returns empty array for empty input', () => {
expect(extractMediaUrls('')).toEqual([])
})
it('returns empty array for text with no media URLs', () => {
expect(extractMediaUrls('Just some text without any URLs')).toEqual([])
})
it('handles mixed media types', () => {
const text = [
'![screen](https://example.com/screenshot.png)',
'Video: https://example.com/demo.webm',
'![gif](https://example.com/animation.gif)'
].join('\n')
const urls = extractMediaUrls(text)
expect(urls).toContain('https://example.com/screenshot.png')
expect(urls).toContain('https://example.com/demo.webm')
expect(urls).toContain('https://example.com/animation.gif')
})
it('ignores non-http URLs in markdown', () => {
const text = '![local](./local-image.png)'
expect(extractMediaUrls(text)).toEqual([])
})
})

799
scripts/qa-analyze-pr.ts Normal file
View File

@@ -0,0 +1,799 @@
#!/usr/bin/env tsx
/**
* QA PR Analysis Script
*
* Deeply analyzes a PR using Gemini Pro to generate targeted QA guides
* for before/after recording sessions. Fetches PR thread, extracts media,
* and produces structured test plans.
*
* Usage:
* pnpm exec tsx scripts/qa-analyze-pr.ts \
* --pr-number 10270 \
* --repo owner/repo \
* --output-dir qa-guides/ \
* [--model gemini-3.1-pro-preview]
*
* Env: GEMINI_API_KEY (required)
*/
import { execSync } from 'node:child_process'
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs'
import { resolve } from 'node:path'
import { fileURLToPath } from 'node:url'
import { GoogleGenerativeAI } from '@google/generative-ai'
// ── Types ──
interface QaGuideStep {
action: string
description: string
expected_before?: string
expected_after?: string
}
interface QaGuide {
summary: string
test_focus: string
prerequisites: string[]
steps: QaGuideStep[]
visual_checks: string[]
}
interface PrThread {
title: string
body: string
labels: string[]
issueComments: string[]
reviewComments: string[]
reviews: string[]
diff: string
}
type TargetType = 'pr' | 'issue'
interface Options {
prNumber: string
repo: string
outputDir: string
model: string
apiKey: string
mediaBudgetBytes: number
maxVideoBytes: number
type: TargetType
}
// ── CLI parsing ──
function parseArgs(): Options {
const args = process.argv.slice(2)
const opts: Partial<Options> = {
model: 'gemini-3.1-pro-preview',
apiKey: process.env.GEMINI_API_KEY || '',
mediaBudgetBytes: 20 * 1024 * 1024,
maxVideoBytes: 10 * 1024 * 1024,
type: 'pr'
}
for (let i = 0; i < args.length; i++) {
switch (args[i]) {
case '--pr-number':
opts.prNumber = args[++i]
break
case '--repo':
opts.repo = args[++i]
break
case '--output-dir':
opts.outputDir = args[++i]
break
case '--model':
opts.model = args[++i]
break
case '--type':
opts.type = args[++i] as TargetType
break
case '--help':
console.warn(
'Usage: qa-analyze-pr.ts --pr-number <num> --repo <owner/repo> --output-dir <path> [--model <model>] [--type pr|issue]'
)
process.exit(0)
}
}
if (!opts.prNumber || !opts.repo || !opts.outputDir) {
console.error(
'Required: --pr-number <num> --repo <owner/repo> --output-dir <path>'
)
process.exit(1)
}
if (!opts.apiKey) {
console.error('GEMINI_API_KEY environment variable is required')
process.exit(1)
}
return opts as Options
}
// ── PR thread fetching ──
function ghExec(cmd: string): string {
try {
return execSync(cmd, {
encoding: 'utf-8',
timeout: 30_000,
stdio: ['pipe', 'pipe', 'pipe']
}).trim()
} catch (err) {
console.warn(`gh command failed: ${cmd}`)
console.warn((err as Error).message)
return ''
}
}
function fetchPrThread(prNumber: string, repo: string): PrThread {
console.warn('Fetching PR thread...')
const prView = ghExec(
`gh pr view ${prNumber} --repo ${repo} --json title,body,labels`
)
const prData = prView
? JSON.parse(prView)
: { title: '', body: '', labels: [] }
const issueCommentsRaw = ghExec(
`gh api repos/${repo}/issues/${prNumber}/comments --paginate`
)
const issueComments: string[] = issueCommentsRaw
? JSON.parse(issueCommentsRaw).map((c: { body: string }) => c.body)
: []
const reviewCommentsRaw = ghExec(
`gh api repos/${repo}/pulls/${prNumber}/comments --paginate`
)
const reviewComments: string[] = reviewCommentsRaw
? JSON.parse(reviewCommentsRaw).map((c: { body: string }) => c.body)
: []
const reviewsRaw = ghExec(
`gh api repos/${repo}/pulls/${prNumber}/reviews --paginate`
)
const reviews: string[] = reviewsRaw
? JSON.parse(reviewsRaw)
.filter((r: { body: string }) => r.body)
.map((r: { body: string }) => r.body)
: []
const diff = ghExec(`gh pr diff ${prNumber} --repo ${repo}`)
console.warn(
`PR #${prNumber}: "${prData.title}" | ` +
`${issueComments.length} issue comments, ` +
`${reviewComments.length} review comments, ` +
`${reviews.length} reviews, ` +
`diff: ${diff.length} chars`
)
return {
title: prData.title || '',
body: prData.body || '',
labels: (prData.labels || []).map((l: { name: string }) => l.name),
issueComments,
reviewComments,
reviews,
diff
}
}
interface IssueThread {
title: string
body: string
labels: string[]
comments: string[]
}
function fetchIssueThread(issueNumber: string, repo: string): IssueThread {
console.warn('Fetching issue thread...')
const issueView = ghExec(
`gh issue view ${issueNumber} --repo ${repo} --json title,body,labels`
)
const issueData = issueView
? JSON.parse(issueView)
: { title: '', body: '', labels: [] }
const commentsRaw = ghExec(
`gh api repos/${repo}/issues/${issueNumber}/comments --paginate`
)
const comments: string[] = commentsRaw
? JSON.parse(commentsRaw).map((c: { body: string }) => c.body)
: []
console.warn(
`Issue #${issueNumber}: "${issueData.title}" | ` +
`${comments.length} comments`
)
return {
title: issueData.title || '',
body: issueData.body || '',
labels: (issueData.labels || []).map((l: { name: string }) => l.name),
comments
}
}
// ── Media extraction ──
const MEDIA_EXTENSIONS = /\.(png|jpg|jpeg|gif|webp|mp4|webm|mov)$/i
const MEDIA_URL_PATTERNS = [
// Markdown images: ![alt](url)
/!\[[^\]]*\]\(([^)]+)\)/g,
// GitHub user-attachments
/https:\/\/github\.com\/user-attachments\/assets\/[a-f0-9-]+/g,
// Private user images
/https:\/\/private-user-images\.githubusercontent\.com\/[^\s)"]+/g,
// Raw URLs with media extensions (standalone or in text)
/(?<!="|=')https?:\/\/[^\s)<>"]+\.(?:png|jpg|jpeg|gif|webp|mp4|webm|mov)(?:\?[^\s)<>"]*)?/gi
]
export function extractMediaUrls(text: string): string[] {
if (!text) return []
const urls = new Set<string>()
for (const pattern of MEDIA_URL_PATTERNS) {
// Reset lastIndex for global patterns
pattern.lastIndex = 0
let match: RegExpExecArray | null
while ((match = pattern.exec(text)) !== null) {
// For markdown images, the URL is in capture group 1
const url = match[1] || match[0]
// Clean trailing markdown/html artifacts
const cleaned = url.replace(/[)>"'\s]+$/, '')
if (cleaned.startsWith('http')) {
urls.add(cleaned)
}
}
}
return [...urls]
}
// ── Media downloading ──
const ALLOWED_MEDIA_DOMAINS = [
'github.com',
'raw.githubusercontent.com',
'user-images.githubusercontent.com',
'private-user-images.githubusercontent.com',
'objects.githubusercontent.com',
'github.githubassets.com'
]
function isAllowedMediaDomain(url: string): boolean {
try {
const hostname = new URL(url).hostname
return ALLOWED_MEDIA_DOMAINS.some(
(domain) => hostname === domain || hostname.endsWith(`.${domain}`)
)
} catch {
return false
}
}
async function downloadMedia(
urls: string[],
outputDir: string,
budgetBytes: number,
maxVideoBytes: number
): Promise<Array<{ path: string; mimeType: string }>> {
const downloaded: Array<{ path: string; mimeType: string }> = []
let totalBytes = 0
const mediaDir = resolve(outputDir, 'media')
mkdirSync(mediaDir, { recursive: true })
for (const url of urls) {
if (totalBytes >= budgetBytes) {
console.warn(
`Media budget exhausted (${totalBytes} bytes), skipping rest`
)
break
}
if (!isAllowedMediaDomain(url)) {
console.warn(`Skipping non-GitHub URL: ${url.slice(0, 80)}`)
continue
}
try {
const response = await fetch(url, {
signal: AbortSignal.timeout(15_000),
headers: { Accept: 'image/*,video/*' },
redirect: 'follow'
})
if (!response.ok) {
console.warn(`Failed to download ${url}: ${response.status}`)
continue
}
const contentLength = response.headers.get('content-length')
if (contentLength) {
const declaredSize = Number.parseInt(contentLength, 10)
if (declaredSize > budgetBytes - totalBytes) {
console.warn(
`Content-Length ${declaredSize} would exceed budget, skipping ${url}`
)
continue
}
}
const contentType = response.headers.get('content-type') || ''
const buffer = Buffer.from(await response.arrayBuffer())
// Skip oversized videos
const isVideo =
contentType.startsWith('video/') || /\.(mp4|webm|mov)$/i.test(url)
if (isVideo && buffer.length > maxVideoBytes) {
console.warn(
`Skipping large video ${url} (${(buffer.length / 1024 / 1024).toFixed(1)}MB > ${(maxVideoBytes / 1024 / 1024).toFixed(0)}MB cap)`
)
continue
}
if (totalBytes + buffer.length > budgetBytes) {
console.warn(`Would exceed budget, skipping ${url}`)
continue
}
const ext = guessExtension(url, contentType)
const filename = `media-${downloaded.length}${ext}`
const filepath = resolve(mediaDir, filename)
writeFileSync(filepath, buffer)
totalBytes += buffer.length
const mimeType = contentType.split(';')[0].trim() || guessMimeType(ext)
downloaded.push({ path: filepath, mimeType })
console.warn(
`Downloaded: ${url.slice(0, 80)}... (${(buffer.length / 1024).toFixed(0)}KB)`
)
} catch (err) {
console.warn(`Failed to download ${url}: ${(err as Error).message}`)
}
}
console.warn(
`Downloaded ${downloaded.length}/${urls.length} media files ` +
`(${(totalBytes / 1024 / 1024).toFixed(1)}MB)`
)
return downloaded
}
function guessExtension(url: string, contentType: string): string {
const urlMatch = url.match(MEDIA_EXTENSIONS)
if (urlMatch) return urlMatch[0].toLowerCase()
const typeMap: Record<string, string> = {
'image/png': '.png',
'image/jpeg': '.jpg',
'image/gif': '.gif',
'image/webp': '.webp',
'video/mp4': '.mp4',
'video/webm': '.webm'
}
return typeMap[contentType.split(';')[0]] || '.bin'
}
function guessMimeType(ext: string): string {
const map: Record<string, string> = {
'.png': 'image/png',
'.jpg': 'image/jpeg',
'.jpeg': 'image/jpeg',
'.gif': 'image/gif',
'.webp': 'image/webp',
'.mp4': 'video/mp4',
'.webm': 'video/webm',
'.mov': 'video/quicktime'
}
return map[ext] || 'application/octet-stream'
}
// ── Gemini analysis ──
function buildIssueAnalysisPrompt(issue: IssueThread): string {
const allText = [
`# Issue: ${issue.title}`,
'',
'## Description',
issue.body,
'',
issue.comments.length > 0
? `## Comments\n${issue.comments.join('\n\n---\n\n')}`
: ''
]
.filter(Boolean)
.join('\n')
return `You are a senior QA engineer analyzing a bug report for ComfyUI frontend — a node-based visual workflow editor for AI image generation (Vue 3 + TypeScript).
The UI has:
- A large canvas (1280x720 viewport) showing a node graph centered at ~(640, 400)
- Nodes are boxes with input/output slots connected by wires
- A hamburger menu (top-left C logo) with File, Edit, Help submenus
- Sidebars (Workflows, Node Library, Models)
- A topbar with workflow tabs and Queue button
- The default workflow loads with these nodes (approximate center coordinates):
- Load Checkpoint (~150, 300), CLIP Text Encode x2 (~450, 250 and ~450, 450)
- Empty Latent Image (~450, 600), KSampler (~750, 350), VAE Decode (~1000, 350), Save Image (~1200, 350)
- Right-clicking ON a node shows node actions (Clone, Bypass, Convert, etc.)
- Right-clicking on EMPTY canvas shows Add Node menu — different from node context menu
Your task: Generate a DETAILED reproduction guide (8-15 steps) to trigger this bug on main.
${allText}
## Available test actions
Each step must use one of these actions:
### Menu actions
- "openMenu" — clicks the Comfy hamburger menu (top-left C logo)
- "hoverMenuItem" — hovers a top-level menu item to open submenu (label required)
- "clickMenuItem" — clicks an item in the visible submenu (label required)
### Element actions (by visible text)
- "click" — clicks an element by visible text (text required)
- "rightClick" — right-clicks an element to open context menu (text required)
- "doubleClick" — double-clicks an element or coordinates (text or x,y)
- "fillDialog" — fills dialog input and presses Enter (text required)
- "pressKey" — presses a keyboard key (key required: Escape, Tab, Delete, Enter, etc.)
### Canvas actions (by coordinates — viewport is 1280x720)
- "clickCanvas" — click at coordinates (x, y required)
- "rightClickCanvas" — right-click at coordinates (x, y required)
- "doubleClick" — double-click at coordinates to open node search (x, y)
- "dragCanvas" — drag from one point to another (fromX, fromY, toX, toY)
- "scrollCanvas" — scroll wheel for zoom (x, y, deltaY: negative=zoom in, positive=zoom out)
### Utility
- "wait" — waits briefly (ms required, max 3000)
- "screenshot" — takes a screenshot (name required)
## Common ComfyUI interactions
- Right-click a node → context menu with Clone, Bypass, Remove, Colors, etc.
- Double-click empty canvas → opens node search dialog
- Ctrl+C / Ctrl+V → copy/paste selected nodes
- Delete key → remove selected node
- Ctrl+G → group selected nodes
- Drag from output slot to input slot → create connection
- Click a node to select it, Shift+click for multi-select
## Output format
Return a JSON object with exactly one key: "reproduce", containing:
{
"summary": "One sentence: what bug this issue reports",
"test_focus": "Specific behavior to reproduce",
"prerequisites": ["e.g. Load default workflow"],
"steps": [
{
"action": "clickCanvas",
"description": "Click on first node to select it",
"expected_before": "What should happen if the bug is present"
}
],
"visual_checks": ["Specific visual evidence of the bug to look for"]
}
## Rules
- Generate 8-15 DETAILED steps that actually trigger the reported bug.
- Follow the issue's reproduction steps PRECISELY — translate them into available actions.
- Use canvas coordinates for node interactions (nodes are typically in the center area 300-900 x 200-500).
- Take screenshots BEFORE and AFTER critical actions to capture the bug state.
- Do NOT just open a menu and screenshot — actually perform the full reproduction sequence.
- Do NOT include login steps.
- Output ONLY valid JSON, no markdown fences or explanation.`
}
function buildAnalysisPrompt(thread: PrThread): string {
const allText = [
`# PR: ${thread.title}`,
'',
'## Description',
thread.body,
'',
thread.issueComments.length > 0
? `## Issue Comments\n${thread.issueComments.join('\n\n---\n\n')}`
: '',
thread.reviewComments.length > 0
? `## Review Comments\n${thread.reviewComments.join('\n\n---\n\n')}`
: '',
thread.reviews.length > 0
? `## Reviews\n${thread.reviews.join('\n\n---\n\n')}`
: '',
'',
'## Diff (truncated)',
'```',
thread.diff.slice(0, 8000),
'```'
]
.filter(Boolean)
.join('\n')
return `You are a senior QA engineer analyzing a pull request for ComfyUI frontend (a Vue 3 + TypeScript web application for AI image generation workflows).
Your task: Generate TWO targeted QA test guides — one for BEFORE the PR (main branch) and one for AFTER (PR branch).
${allText}
## Available test actions
Each step must use one of these actions:
- "openMenu" — clicks the Comfy hamburger menu (top-left C logo)
- "hoverMenuItem" — hovers a top-level menu item to open submenu (label required)
- "clickMenuItem" — clicks an item in the visible submenu (label required)
- "fillDialog" — fills dialog input and presses Enter (text required)
- "pressKey" — presses a keyboard key (key required)
- "click" — clicks an element by visible text (text required)
- "wait" — waits briefly (ms required, max 3000)
- "screenshot" — takes a screenshot (name required)
## Output format
Return a JSON object with exactly two keys: "before" and "after", each containing:
{
"summary": "One sentence: what this PR changes",
"test_focus": "Specific behaviors to verify in this recording",
"prerequisites": ["e.g. Load default workflow"],
"steps": [
{
"action": "openMenu",
"description": "Open the main menu to check file options",
"expected_before": "Old behavior description (before key only)",
"expected_after": "New behavior description (after key only)"
}
],
"visual_checks": ["Specific visual elements to look for"]
}
## Rules
- BEFORE guide: 2-4 steps, under 15 seconds. Show OLD/missing behavior.
- AFTER guide: 3-6 steps, under 30 seconds. Prove the fix/feature works.
- Focus on the SPECIFIC behavior changed by this PR, not generic testing.
- Use information from PR description, screenshots, and comments to understand intended behavior.
- Include at least one screenshot step in each guide.
- Do NOT include login steps.
- Menu pattern: openMenu -> hoverMenuItem -> clickMenuItem or screenshot.
- Output ONLY valid JSON, no markdown fences or explanation.`
}
async function analyzeWithGemini(
thread: PrThread,
media: Array<{ path: string; mimeType: string }>,
model: string,
apiKey: string
): Promise<{ before: QaGuide; after: QaGuide }> {
const genAI = new GoogleGenerativeAI(apiKey)
const geminiModel = genAI.getGenerativeModel({ model })
const prompt = buildAnalysisPrompt(thread)
const parts: Array<
{ text: string } | { inlineData: { mimeType: string; data: string } }
> = [{ text: prompt }]
// Add media as inline data
for (const item of media) {
try {
const buffer = readFileSync(item.path)
parts.push({
inlineData: {
mimeType: item.mimeType,
data: buffer.toString('base64')
}
})
} catch (err) {
console.warn(
`Failed to read media ${item.path}: ${(err as Error).message}`
)
}
}
console.warn(
`Sending to ${model}: ${prompt.length} chars text, ${media.length} media files`
)
const result = await geminiModel.generateContent({
contents: [{ role: 'user', parts }],
generationConfig: {
temperature: 0.2,
maxOutputTokens: 8192,
responseMimeType: 'application/json'
}
})
let text = result.response.text()
// Strip markdown fences if present
text = text
.replace(/^```(?:json)?\n?/gm, '')
.replace(/```$/gm, '')
.trim()
console.warn('Gemini response received')
console.warn('Raw response (first 500 chars):', text.slice(0, 500))
const parsed = JSON.parse(text)
// Handle different response shapes from Gemini
let before: QaGuide
let after: QaGuide
if (Array.isArray(parsed) && parsed.length >= 2) {
// Array format: [before, after]
before = parsed[0]
after = parsed[1]
} else if (parsed.before && parsed.after) {
// Object format: { before, after }
before = parsed.before
after = parsed.after
} else {
// Try nested wrapper keys
const inner = parsed.qa_guide ?? parsed.guides ?? parsed
if (inner.before && inner.after) {
before = inner.before
after = inner.after
} else {
console.warn(
'Full response:',
JSON.stringify(parsed, null, 2).slice(0, 2000)
)
throw new Error(
`Unexpected response shape. Got keys: ${Object.keys(parsed).join(', ')}`
)
}
}
return { before, after }
}
async function analyzeIssueWithGemini(
issue: IssueThread,
media: Array<{ path: string; mimeType: string }>,
model: string,
apiKey: string
): Promise<QaGuide> {
const genAI = new GoogleGenerativeAI(apiKey)
const geminiModel = genAI.getGenerativeModel({ model })
const prompt = buildIssueAnalysisPrompt(issue)
const parts: Array<
{ text: string } | { inlineData: { mimeType: string; data: string } }
> = [{ text: prompt }]
for (const item of media) {
try {
const buffer = readFileSync(item.path)
parts.push({
inlineData: {
mimeType: item.mimeType,
data: buffer.toString('base64')
}
})
} catch (err) {
console.warn(
`Failed to read media ${item.path}: ${(err as Error).message}`
)
}
}
console.warn(
`Sending to ${model}: ${prompt.length} chars text, ${media.length} media files`
)
const result = await geminiModel.generateContent({
contents: [{ role: 'user', parts }],
generationConfig: {
temperature: 0.2,
maxOutputTokens: 8192,
responseMimeType: 'application/json'
}
})
let text = result.response.text()
text = text
.replace(/^```(?:json)?\n?/gm, '')
.replace(/```$/gm, '')
.trim()
console.warn('Gemini response received')
console.warn('Raw response (first 500 chars):', text.slice(0, 500))
const parsed = JSON.parse(text)
const guide: QaGuide =
parsed.reproduce ?? parsed.qa_guide?.reproduce ?? parsed
return guide
}
// ── Main ──
async function main() {
const opts = parseArgs()
mkdirSync(opts.outputDir, { recursive: true })
if (opts.type === 'issue') {
await analyzeIssue(opts)
} else {
await analyzePr(opts)
}
}
async function analyzeIssue(opts: Options) {
const issue = fetchIssueThread(opts.prNumber, opts.repo)
const allText = [issue.body, ...issue.comments].join('\n')
const mediaUrls = extractMediaUrls(allText)
console.warn(`Found ${mediaUrls.length} media URLs`)
const media = await downloadMedia(
mediaUrls,
opts.outputDir,
opts.mediaBudgetBytes,
opts.maxVideoBytes
)
const guide = await analyzeIssueWithGemini(
issue,
media,
opts.model,
opts.apiKey
)
const beforePath = resolve(opts.outputDir, 'qa-guide-before.json')
writeFileSync(beforePath, JSON.stringify(guide, null, 2))
console.warn(`Wrote QA guide:`)
console.warn(` Reproduce: ${beforePath}`)
}
async function analyzePr(opts: Options) {
const thread = fetchPrThread(opts.prNumber, opts.repo)
const allText = [
thread.body,
...thread.issueComments,
...thread.reviewComments,
...thread.reviews
].join('\n')
const mediaUrls = extractMediaUrls(allText)
console.warn(`Found ${mediaUrls.length} media URLs`)
const media = await downloadMedia(
mediaUrls,
opts.outputDir,
opts.mediaBudgetBytes,
opts.maxVideoBytes
)
const guides = await analyzeWithGemini(thread, media, opts.model, opts.apiKey)
const beforePath = resolve(opts.outputDir, 'qa-guide-before.json')
const afterPath = resolve(opts.outputDir, 'qa-guide-after.json')
writeFileSync(beforePath, JSON.stringify(guides.before, null, 2))
writeFileSync(afterPath, JSON.stringify(guides.after, null, 2))
console.warn(`Wrote QA guides:`)
console.warn(` Before: ${beforePath}`)
console.warn(` After: ${afterPath}`)
}
function isExecutedAsScript(metaUrl: string): boolean {
const modulePath = fileURLToPath(metaUrl)
const scriptPath = process.argv[1] ? resolve(process.argv[1]) : ''
return modulePath === scriptPath
}
if (isExecutedAsScript(import.meta.url)) {
main().catch((err) => {
console.error('PR analysis failed:', err)
process.exit(1)
})
}

176
scripts/qa-batch.sh Executable file
View File

@@ -0,0 +1,176 @@
#!/usr/bin/env bash
# Batch-trigger QA runs by creating and pushing sno-qa-* branches.
#
# Usage:
# ./scripts/qa-batch.sh 10394 10238 9996 # Trigger specific numbers
# ./scripts/qa-batch.sh --from tmp/issues.md --top 5 # From triage file
# ./scripts/qa-batch.sh --dry-run 10394 10238 # Preview only
# ./scripts/qa-batch.sh --cleanup # Delete old sno-qa-* branches
set -euo pipefail
DELAY=5
DRY_RUN=false
CLEANUP=false
FROM_FILE=""
TOP_N=0
NUMBERS=()
die() { echo "error: $*" >&2; exit 1; }
usage() {
cat <<'EOF'
Usage: qa-batch.sh [options] [numbers...]
Options:
--from <file> Extract numbers from a triage markdown file
--top <N> Take first N entries from Tier 1 (requires --from)
--dry-run Print what would happen without pushing
--cleanup Delete all sno-qa-* remote branches
--delay <secs> Seconds between pushes (default: 5)
-h, --help Show this help
EOF
exit 0
}
# --- Parse args ---
while [[ $# -gt 0 ]]; do
case "$1" in
--from) FROM_FILE="$2"; shift 2 ;;
--top) TOP_N="$2"; shift 2 ;;
--dry-run) DRY_RUN=true; shift ;;
--cleanup) CLEANUP=true; shift ;;
--delay) DELAY="$2"; shift 2 ;;
-h|--help) usage ;;
-*) die "unknown option: $1" ;;
*) NUMBERS+=("$1"); shift ;;
esac
done
# --- Cleanup mode ---
if $CLEANUP; then
echo "Fetching remote sno-qa-* branches..."
branches=$(git ls-remote --heads origin 'refs/heads/sno-qa-*' | awk '{print $2}' | sed 's|refs/heads/||')
if [[ -z "$branches" ]]; then
echo "No sno-qa-* branches found on remote."
exit 0
fi
echo "Found branches:"
while IFS= read -r b; do echo " $b"; done <<< "$branches"
echo
if $DRY_RUN; then
echo "[dry-run] Would delete the above branches."
exit 0
fi
read -rp "Delete all of the above? [y/N] " confirm
if [[ "$confirm" != "y" && "$confirm" != "Y" ]]; then
echo "Aborted."
exit 0
fi
for branch in $branches; do
echo "Deleting origin/$branch..."
git push origin --delete "$branch"
done
echo "Done. Cleaned up $(echo "$branches" | wc -l | tr -d ' ') branches."
exit 0
fi
# --- Extract numbers from markdown ---
if [[ -n "$FROM_FILE" ]]; then
[[ -f "$FROM_FILE" ]] || die "file not found: $FROM_FILE"
[[ "$TOP_N" -gt 0 ]] || die "--top N required with --from"
# Extract Tier 1 table rows: | N | [#NNNNN](...) | ...
# Stop at the next ## heading after Tier 1
extracted=$(awk '/^## Tier 1/,/^## Tier [^1]/' "$FROM_FILE" \
| grep -oP '\[#\K\d+' \
| head -n "$TOP_N")
if [[ -z "$extracted" ]]; then
die "no numbers found in $FROM_FILE"
fi
while IFS= read -r num; do
NUMBERS+=("$num")
done <<< "$extracted"
fi
[[ ${#NUMBERS[@]} -gt 0 ]] || die "no numbers specified. Use positional args or --from/--top."
# --- Validate ---
for num in "${NUMBERS[@]}"; do
[[ "$num" =~ ^[0-9]+$ ]] || die "invalid number: $num"
done
# Deduplicate
# shellcheck disable=SC2207 # mapfile not available on macOS default bash
NUMBERS=($(printf '%s\n' "${NUMBERS[@]}" | sort -un))
# --- Push branches ---
echo "Triggering QA for: ${NUMBERS[*]}"
if $DRY_RUN; then
echo "[dry-run]"
fi
echo
pushed=()
skipped=()
# Fetch remote refs once
remote_refs=$(git ls-remote --heads origin 'refs/heads/sno-qa-*' 2>/dev/null | awk '{print $2}' | sed 's|refs/heads/||')
for num in "${NUMBERS[@]}"; do
branch="sno-qa-$num"
# Check if already exists on remote
if echo "$remote_refs" | grep -qx "$branch"; then
echo " skip: $branch (already exists on remote)"
skipped+=("$num")
continue
fi
if $DRY_RUN; then
echo " would push: $branch"
pushed+=("$num")
continue
fi
# Create branch at current HEAD and push
git branch -f "$branch" HEAD
git push origin "$branch"
pushed+=("$num")
echo " pushed: $branch"
# Clean up local branch
git branch -D "$branch" 2>/dev/null || true
# Delay between pushes to avoid CI concurrency storm
if [[ "$num" != "${NUMBERS[-1]}" ]]; then
echo " waiting ${DELAY}s..."
sleep "$DELAY"
fi
done
# --- Summary ---
echo
echo "=== Summary ==="
echo "Triggered: ${#pushed[@]}"
echo "Skipped: ${#skipped[@]}"
if [[ ${#pushed[@]} -gt 0 ]]; then
echo
echo "Triggered numbers: ${pushed[*]}"
repo_url=$(git remote get-url origin | sed 's/\.git$//' | sed 's|git@github.com:|https://github.com/|')
echo "Actions: ${repo_url}/actions"
fi
if [[ ${#skipped[@]} -gt 0 ]]; then
echo
echo "Skipped (already exist): ${skipped[*]}"
echo "Use --cleanup first to remove old branches."
fi

333
scripts/qa-deploy-pages.sh Executable file
View File

@@ -0,0 +1,333 @@
#!/usr/bin/env bash
# Deploy QA report to Cloudflare Pages.
# Expected env vars: CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID, RAW_BRANCH,
# BEFORE_SHA, AFTER_SHA, TARGET_NUM, TARGET_TYPE, REPO, RUN_ID
# Writes outputs to GITHUB_OUTPUT: badge_status, url
set -euo pipefail
npm install -g wrangler@4.74.0 >/dev/null 2>&1
DEPLOY_DIR=$(mktemp -d)
mkdir -p "$DEPLOY_DIR"
for os in Linux macOS Windows; do
DIR="qa-artifacts/qa-report-${os}-${RUN_ID}"
for prefix in qa qa-before; do
VID="${DIR}/${prefix}-session.mp4"
if [ -f "$VID" ]; then
DEST="$DEPLOY_DIR/${prefix}-${os}.mp4"
cp "$VID" "$DEST"
echo "Found ${prefix} ${os} video ($(du -h "$VID" | cut -f1))"
fi
done
# Copy multi-pass session videos (qa-session-1, qa-session-2, etc.)
for numbered in "$DIR"/qa-session-[0-9].mp4; do
[ -f "$numbered" ] || continue
NUM=$(basename "$numbered" | sed 's/qa-session-\([0-9]\).mp4/\1/')
DEST="$DEPLOY_DIR/qa-${os}-pass${NUM}.mp4"
cp "$numbered" "$DEST"
echo "Found pass ${NUM} ${os} video ($(du -h "$numbered" | cut -f1))"
done
# Generate GIF thumbnail from after video (or first pass)
THUMB_SRC="$DEPLOY_DIR/qa-${os}.mp4"
[ ! -f "$THUMB_SRC" ] && THUMB_SRC="$DEPLOY_DIR/qa-${os}-pass1.mp4"
if [ -f "$THUMB_SRC" ]; then
ffmpeg -y -ss 10 -i "$THUMB_SRC" -t 8 \
-vf "fps=8,scale=480:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=64[p];[s1][p]paletteuse=dither=bayer" \
-loop 0 "$DEPLOY_DIR/qa-${os}-thumb.gif" 2>/dev/null \
|| echo "GIF generation failed for ${os} (non-fatal)"
fi
done
# Build video cards and report sections
CARDS=""
# shellcheck disable=SC2034 # accessed via eval
ICONS_Linux="&#x1F427;" ICONS_macOS="&#x1F34E;" ICONS_Windows="&#x1FA9F;"
CARD_COUNT=0
DL_ICON="<svg width=14 height=14 viewBox='0 0 24 24' fill=none stroke=currentColor stroke-width=2><path d='M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4'/><polyline points='7 10 12 15 17 10'/><line x1=12 y1=15 x2=12 y2=3'/></svg>"
for os in Linux macOS Windows; do
eval "ICON=\$ICONS_${os}"
OS_LOWER=$(echo "$os" | tr '[:upper:]' '[:lower:]')
HAS_BEFORE=$([ -f "$DEPLOY_DIR/qa-before-${os}.mp4" ] && echo 1 || echo 0)
HAS_AFTER=$( { [ -f "$DEPLOY_DIR/qa-${os}.mp4" ] || [ -f "$DEPLOY_DIR/qa-${os}-pass1.mp4" ]; } && echo 1 || echo 0)
[ "$HAS_AFTER" = "0" ] && continue
# Collect all reports for this platform (single + multi-pass)
REPORT_FILES=""
REPORT_LINK=""
REPORT_HTML=""
for rpt in "video-reviews/${OS_LOWER}-qa-video-report.md" "video-reviews/${OS_LOWER}-pass"*-qa-video-report.md; do
[ -f "$rpt" ] && REPORT_FILES="${REPORT_FILES} ${rpt}"
done
if [ -n "$REPORT_FILES" ]; then
# Concatenate all reports into one combined report file
COMBINED_MD=""
for rpt in $REPORT_FILES; do
cp "$rpt" "$DEPLOY_DIR/$(basename "$rpt")"
RPT_MD=$(sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' "$rpt")
[ -n "$COMBINED_MD" ] && COMBINED_MD="${COMBINED_MD}&#10;&#10;---&#10;&#10;"
COMBINED_MD="${COMBINED_MD}${RPT_MD}"
done
FIRST_REPORT=$(echo "$REPORT_FILES" | awk '{print $1}')
FIRST_BASENAME=$(basename "$FIRST_REPORT")
REPORT_LINK="<a class=dl href=${FIRST_BASENAME}><svg width=14 height=14 viewBox='0 0 24 24' fill=none stroke=currentColor stroke-width=2><path d='M14 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V8z'/><polyline points='14 2 14 8 20 8'/><line x1=16 y1=13 x2=8 y2=13/><line x1=16 y1=17 x2=8 y2=17'/></svg>Report</a>"
REPORT_HTML="<details class=report open><summary><svg width=14 height=14 viewBox='0 0 24 24' fill=none stroke=currentColor stroke-width=2><circle cx=12 cy=12 r=10/><line x1=12 y1=16 x2=12 y2=12/><line x1=12 y1=8 x2=12.01 y2=8'/></svg> AI Comparative Review</summary><div class=report-body data-md>${COMBINED_MD}</div></details>"
fi
if [ "$HAS_BEFORE" = "1" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
elif [ -f "$DEPLOY_DIR/qa-${os}.mp4" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=card-body><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links><a class=dl href=qa-${os}.mp4 download>${DL_ICON}Download</a>${REPORT_LINK}</span></div>${REPORT_HTML}</div>"
else
PASS_VIDEOS=""
for pass_vid in "$DEPLOY_DIR/qa-${os}-pass"[0-9].mp4; do
[ -f "$pass_vid" ] || continue
PASS_NUM=$(basename "$pass_vid" | sed "s/qa-${os}-pass\([0-9]\).mp4/\1/")
PASS_VIDEOS="${PASS_VIDEOS}<div class=comp-panel><div class=comp-label>Pass ${PASS_NUM}</div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}-pass${PASS_NUM}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}-pass${PASS_NUM}.mp4 download>${DL_ICON}Pass ${PASS_NUM}</a></div></div>"
done
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison>${PASS_VIDEOS}</div>${REPORT_HTML}</div>"
fi
CARD_COUNT=$((CARD_COUNT + 1))
done
# Build commit info and target link for the report header
COMMIT_HTML=""
REPO_URL="https://github.com/${REPO}"
if [ -n "${TARGET_NUM:-}" ]; then
if [ "$TARGET_TYPE" = "issue" ]; then
COMMIT_HTML="<a href=${REPO_URL}/issues/${TARGET_NUM} class=sha title='Issue'>Issue #${TARGET_NUM}</a>"
else
COMMIT_HTML="<a href=${REPO_URL}/pull/${TARGET_NUM} class=sha title='Pull Request'>PR #${TARGET_NUM}</a>"
fi
fi
if [ -n "${BEFORE_SHA:-}" ]; then
SHORT_BEFORE="${BEFORE_SHA:0:7}"
COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${BEFORE_SHA} class=sha title='main branch'>main @ ${SHORT_BEFORE}</a>"
fi
if [ -n "${AFTER_SHA:-}" ]; then
SHORT_AFTER="${AFTER_SHA:0:7}"
AFTER_LABEL="PR"
[ -n "${TARGET_NUM:-}" ] && AFTER_LABEL="#${TARGET_NUM}"
COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${AFTER_SHA} class=sha title='PR head commit'>${AFTER_LABEL} @ ${SHORT_AFTER}</a>"
fi
if [ -n "${PIPELINE_SHA:-}" ]; then
SHORT_PIPE="${PIPELINE_SHA:0:7}"
COMMIT_HTML="${COMMIT_HTML:+${COMMIT_HTML} &middot; }<a href=${REPO_URL}/commit/${PIPELINE_SHA} class=sha title='QA pipeline version'>QA @ ${SHORT_PIPE}</a>"
fi
[ -n "$COMMIT_HTML" ] && COMMIT_HTML=" &middot; ${COMMIT_HTML}"
RUN_LINK=""
if [ -n "${RUN_URL:-}" ]; then
RUN_LINK=" &middot; <a href=\"${RUN_URL}\" class=sha title=\"GitHub Actions run\">CI Job</a>"
fi
# Timing info
DEPLOY_TIME=$(date -u '+%Y-%m-%d %H:%M UTC')
TIMING_HTML=""
if [ -n "${RUN_START_TIME:-}" ]; then
TIMING_HTML=" &middot; <span class=sha title='Pipeline timing'>${RUN_START_TIME} &rarr; ${DEPLOY_TIME}</span>"
fi
# Generate index.html from template
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
TEMPLATE="$SCRIPT_DIR/qa-report-template.html"
# Write dynamic content to temp files for safe substitution
# Cloudflare Pages _headers file — enable range requests for video seeking
cat > "$DEPLOY_DIR/_headers" <<'HEADERSEOF'
/*.mp4
Accept-Ranges: bytes
Cache-Control: public, max-age=86400
HEADERSEOF
# Build purpose description from pr-context.txt
PURPOSE_HTML=""
if [ -f pr-context.txt ]; then
# Extract title line and first paragraph of description
PR_TITLE=$(grep -m1 '^Title:' pr-context.txt | sed 's/^Title: //')
if [ "$TARGET_TYPE" = "issue" ]; then
PURPOSE_LABEL="Issue #${TARGET_NUM}"
PURPOSE_VERB="reports"
else
PURPOSE_LABEL="PR #${TARGET_NUM}"
PURPOSE_VERB="aims to"
fi
# Get first ~300 chars of description body (after "Description:" line)
PR_DESC=$(sed -n '/^Description:/,/^###/p' pr-context.txt | grep -v '^Description:\|^###' | head -5 | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' | tr '\n' ' ' | head -c 400)
[ -z "$PR_DESC" ] && PR_DESC=$(sed -n '3,8p' pr-context.txt | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' | tr '\n' ' ' | head -c 400)
# Build requirements from QA guide JSON
REQS_HTML=""
QA_GUIDE=$(ls qa-guides/qa-guide-*.json 2>/dev/null | head -1)
if [ -f "$QA_GUIDE" ]; then
PREREQS=$(python3 -c "
import json, sys, html
try:
g = json.load(open(sys.argv[1]))
prereqs = g.get('prerequisites', [])
steps = g.get('steps', [])
focus = g.get('test_focus', '')
parts = []
if focus:
parts.append('<strong>Test focus:</strong> ' + html.escape(focus))
if prereqs:
parts.append('<strong>Prerequisites:</strong> ' + ', '.join(html.escape(p) for p in prereqs))
if steps:
parts.append('<strong>Steps:</strong> ' + ' → '.join(html.escape(s.get('description', str(s))) for s in steps[:6]))
if len(steps) > 6:
parts[-1] += ' → ...'
print('<br>'.join(parts))
except: pass
" "$QA_GUIDE" 2>/dev/null)
[ -n "$PREREQS" ] && REQS_HTML="<div class=purpose-reqs>${PREREQS}</div>"
fi
PURPOSE_HTML="<div class=purpose><div class=purpose-label>${PURPOSE_LABEL} ${PURPOSE_VERB}</div><strong>${PR_TITLE}</strong><br>${PR_DESC}${REQS_HTML}</div>"
fi
echo -n "$COMMIT_HTML" > "$DEPLOY_DIR/.commit_html"
echo -n "$CARDS" > "$DEPLOY_DIR/.cards_html"
echo -n "$RUN_LINK" > "$DEPLOY_DIR/.run_link"
# Badge HTML with copy button (placeholder URL filled after deploy)
echo -n '<div class="badge-bar"><img src="badge.svg" alt="QA Badge" class="badge-img"/><button class="copy-badge" title="Copy badge markdown" onclick="copyBadge()"><svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><rect x=9 y=9 width=13 height=13 rx=2/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg></button></div>' > "$DEPLOY_DIR/.badge_html"
echo -n "${TIMING_HTML:-}" > "$DEPLOY_DIR/.timing_html"
echo -n "$PURPOSE_HTML" > "$DEPLOY_DIR/.purpose_html"
python3 -c "
import sys, pathlib
d = pathlib.Path(sys.argv[1])
t = pathlib.Path(sys.argv[2]).read_text()
t = t.replace('{{COMMIT_HTML}}', (d / '.commit_html').read_text())
t = t.replace('{{CARDS}}', (d / '.cards_html').read_text())
t = t.replace('{{RUN_LINK}}', (d / '.run_link').read_text())
t = t.replace('{{BADGE_HTML}}', (d / '.badge_html').read_text())
t = t.replace('{{TIMING_HTML}}', (d / '.timing_html').read_text())
t = t.replace('{{PURPOSE_HTML}}', (d / '.purpose_html').read_text())
sys.stdout.write(t)
" "$DEPLOY_DIR" "$TEMPLATE" > "$DEPLOY_DIR/index.html"
rm -f "$DEPLOY_DIR/.commit_html" "$DEPLOY_DIR/.cards_html" "$DEPLOY_DIR/.run_link" "$DEPLOY_DIR/.badge_html" "$DEPLOY_DIR/.timing_html" "$DEPLOY_DIR/.purpose_html"
cat > "$DEPLOY_DIR/404.html" <<'ERROREOF'
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><title>404</title>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600&display=swap" rel=stylesheet>
<style>:root{--bg:oklch(8% 0.02 265);--fg:oklch(45% 0.01 265);--err:oklch(62% 0.22 25)}*{margin:0;padding:0;box-sizing:border-box}body{background:var(--bg);color:var(--fg);font-family:'Inter',system-ui,sans-serif;display:flex;align-items:center;justify-content:center;min-height:100vh}div{text-align:center}h1{color:var(--err);font-size:clamp(3rem,8vw,5rem);font-weight:700;letter-spacing:-.04em;margin-bottom:.5rem}p{font-size:1rem;max-width:32ch;line-height:1.5}</style>
</head><body><div><h1>404</h1><p>File not found. The QA recording may have failed or been cancelled.</p></div></body></html>
ERROREOF
# Generate badge SVGs into deploy dir
# Verdict detection: check each report file's ## Summary section individually,
# then aggregate across passes for a "X/Y REPRODUCED" badge.
REPRO_COUNT=0 INCONC_COUNT=0 NOT_REPRO_COUNT=0 TOTAL_REPORTS=0
if [ -d video-reviews ]; then
for rpt in video-reviews/*-qa-video-report.md; do
[ -f "$rpt" ] || continue
TOTAL_REPORTS=$((TOTAL_REPORTS + 1))
# Try structured JSON verdict first (from ## Verdict section)
VERDICT_JSON=$(grep -oP '"verdict":\s*"[A-Z_]+' "$rpt" 2>/dev/null | tail -1 | grep -oP '[A-Z_]+$' || true)
RISK_JSON=$(grep -oP '"risk":\s*"[a-z]+' "$rpt" 2>/dev/null | tail -1 | grep -oP '[a-z]+$' || true)
if [ -n "$VERDICT_JSON" ]; then
case "$VERDICT_JSON" in
REPRODUCED) REPRO_COUNT=$((REPRO_COUNT + 1)) ;;
NOT_REPRODUCIBLE) NOT_REPRO_COUNT=$((NOT_REPRO_COUNT + 1)) ;;
INCONCLUSIVE) INCONC_COUNT=$((INCONC_COUNT + 1)) ;;
esac
else
# Fallback: grep Summary section (for older reports without ## Verdict)
SUMM=$(sed -n '/^## Summary/,/^## /p' "$rpt" 2>/dev/null | head -15)
if echo "$SUMM" | grep -iq 'INCONCLUSIVE'; then
INCONC_COUNT=$((INCONC_COUNT + 1))
elif echo "$SUMM" | grep -iq 'not reproduced\|could not reproduce\|could not be confirmed\|unable to reproduce\|fails\? to reproduce\|fails\? to perform\|was NOT\|NOT visible\|not observed\|fail.* to demonstrate\|does not demonstrate\|steps were not performed\|never.*tested\|never.*accessed\|not.* confirmed'; then
NOT_REPRO_COUNT=$((NOT_REPRO_COUNT + 1))
elif echo "$SUMM" | grep -iq 'reproduc\|confirm'; then
REPRO_COUNT=$((REPRO_COUNT + 1))
fi
fi
done
fi
FAIL_COUNT=$((TOTAL_REPORTS - REPRO_COUNT - NOT_REPRO_COUNT))
[ "$FAIL_COUNT" -lt 0 ] && FAIL_COUNT=0
echo "DEBUG verdict: repro=${REPRO_COUNT} not_repro=${NOT_REPRO_COUNT} inconc=${INCONC_COUNT} fail=${FAIL_COUNT} total=${TOTAL_REPORTS}"
echo "Verdict: ${REPRO_COUNT}${NOT_REPRO_COUNT}${FAIL_COUNT}⚠ / ${TOTAL_REPORTS}"
# Badge text:
# Single pass: "REPRODUCED" / "NOT REPRODUCIBLE" / "INCONCLUSIVE"
# Multi pass: "2✓ 0✗ 1⚠ / 3" with color based on dominant result
REPRO_RESULT="" REPRO_COLOR="#9f9f9f"
if [ "$TOTAL_REPORTS" -le 1 ]; then
# Single report — simple label
if [ "$REPRO_COUNT" -gt 0 ]; then
REPRO_RESULT="REPRODUCED" REPRO_COLOR="#2196f3"
elif [ "$NOT_REPRO_COUNT" -gt 0 ]; then
REPRO_RESULT="NOT REPRODUCIBLE" REPRO_COLOR="#9f9f9f"
elif [ "$FAIL_COUNT" -gt 0 ]; then
REPRO_RESULT="INCONCLUSIVE" REPRO_COLOR="#9f9f9f"
fi
else
# Multi pass — show breakdown: X✓ Y✗ Z⚠ / N
PARTS=""
[ "$REPRO_COUNT" -gt 0 ] && PARTS="${REPRO_COUNT}"
[ "$NOT_REPRO_COUNT" -gt 0 ] && PARTS="${PARTS:+${PARTS} }${NOT_REPRO_COUNT}"
[ "$FAIL_COUNT" -gt 0 ] && PARTS="${PARTS:+${PARTS} }${FAIL_COUNT}"
REPRO_RESULT="${PARTS} / ${TOTAL_REPORTS}"
# Color based on best outcome
if [ "$REPRO_COUNT" -gt 0 ]; then
REPRO_COLOR="#2196f3"
elif [ "$NOT_REPRO_COUNT" -gt 0 ]; then
REPRO_COLOR="#9f9f9f"
fi
fi
# Badge label: #NUM QA0327 (with today's date)
QA_DATE=$(date -u '+%m%d')
BADGE_LABEL="QA${QA_DATE}"
[ -n "${TARGET_NUM:-}" ] && BADGE_LABEL="#${TARGET_NUM} QA${QA_DATE}"
# For PRs, also extract fix quality from Overall Risk section
FIX_RESULT="" FIX_COLOR="#4c1"
if [ "$TARGET_TYPE" != "issue" ]; then
# Try structured JSON risk first
ALL_RISKS=$(grep -ohP '"risk":\s*"[a-z]+' video-reviews/*.md 2>/dev/null | grep -oP '[a-z]+$' || true)
if [ -n "$ALL_RISKS" ]; then
# Use worst risk across all reports
if echo "$ALL_RISKS" | grep -q 'high'; then
FIX_RESULT="MAJOR ISSUES" FIX_COLOR="#e05d44"
elif echo "$ALL_RISKS" | grep -q 'medium'; then
FIX_RESULT="MINOR ISSUES" FIX_COLOR="#dfb317"
elif echo "$ALL_RISKS" | grep -q 'low'; then
FIX_RESULT="APPROVED" FIX_COLOR="#4c1"
fi
else
# Fallback: grep Overall Risk section
RISK_TEXT=""
if [ -d video-reviews ]; then
RISK_TEXT=$(sed -n '/^## Overall Risk/,/^## /p' video-reviews/*.md 2>/dev/null | sed 's/\*//g' | head -20)
fi
RISK_FIRST=$(echo "$RISK_TEXT" | grep -oiP '^\s*(high|medium|moderate|low|minimal|critical)' | head -1 | tr '[:upper:]' '[:lower:]')
if [ -n "$RISK_FIRST" ]; then
case "$RISK_FIRST" in
*low*|*minimal*) FIX_RESULT="APPROVED" FIX_COLOR="#4c1" ;;
*medium*|*moderate*) FIX_RESULT="MINOR ISSUES" FIX_COLOR="#dfb317" ;;
*high*|*critical*) FIX_RESULT="MAJOR ISSUES" FIX_COLOR="#e05d44" ;;
esac
elif echo "$RISK_TEXT" | grep -iq 'no.*risk\|approved\|looks good'; then
FIX_RESULT="APPROVED" FIX_COLOR="#4c1"
fi
fi
fi
# Always use vertical box badge
/tmp/gen-badge-box.sh "$DEPLOY_DIR/badge.svg" "$BADGE_LABEL" \
"$REPRO_COUNT" "$NOT_REPRO_COUNT" "$FAIL_COUNT" "$TOTAL_REPORTS" \
"$FIX_RESULT" "$FIX_COLOR"
BADGE_STATUS="${REPRO_RESULT:-UNKNOWN}${FIX_RESULT:+ | Fix: ${FIX_RESULT}}"
echo "badge_status=${BADGE_STATUS:-FINISHED}" >> "$GITHUB_OUTPUT"
BRANCH=$(echo "$RAW_BRANCH" | sed 's/[^a-zA-Z0-9-]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | cut -c1-28)
URL=$(wrangler pages deploy "$DEPLOY_DIR" \
--project-name="comfy-qa" \
--branch="$BRANCH" 2>&1 \
| grep -oE 'https://[a-zA-Z0-9.-]+\.pages\.dev\S*' | head -1)
echo "url=${URL:-https://${BRANCH}.comfy-qa.pages.dev}" >> "$GITHUB_OUTPUT"
echo "Deployed to: ${URL}"

208
scripts/qa-generate-test.ts Normal file
View File

@@ -0,0 +1,208 @@
#!/usr/bin/env tsx
/**
* Generates a Playwright regression test (.spec.ts) from a QA report + PR diff.
* Uses Gemini to produce a test that asserts UIUX behavior verified during QA.
*
* Usage:
* pnpm exec tsx scripts/qa-generate-test.ts \
* --qa-report <path> QA video review report (markdown)
* --pr-diff <path> PR diff file
* --output <path> Output .spec.ts file path
* --model <name> Gemini model (default: gemini-3-flash-preview)
*/
import { readFile, writeFile } from 'node:fs/promises'
import { basename, resolve } from 'node:path'
import { GoogleGenerativeAI } from '@google/generative-ai'
interface CliOptions {
qaReport: string
prDiff: string
output: string
model: string
}
const DEFAULTS: CliOptions = {
qaReport: '',
prDiff: '',
output: '',
model: 'gemini-3-flash-preview'
}
// ── Fixture API reference for the prompt ────────────────────────────
const FIXTURE_API = `
## ComfyUI Playwright Test Fixture API
Import pattern:
\`\`\`typescript
import { expect } from '@playwright/test'
import { comfyPageFixture as test } from '../fixtures/ComfyPage'
\`\`\`
### Available helpers on \`comfyPage\`:
- \`comfyPage.page\` — raw Playwright Page
- \`comfyPage.menu.topbar\` — Topbar helper:
- \`.getTabNames(): Promise<string[]>\` — get all open tab names
- \`.getActiveTabName(): Promise<string>\` — get active tab name
- \`.saveWorkflow(name)\` — Save via File > Save dialog
- \`.saveWorkflowAs(name)\` — Save via File > Save As dialog
- \`.exportWorkflow(name)\` — Export via File > Export dialog
- \`.triggerTopbarCommand(path: string[])\` — e.g. ['File', 'Save As']
- \`.getWorkflowTab(name)\` — get a tab locator by name
- \`.closeWorkflowTab(name)\` — close a tab
- \`.openTopbarMenu()\` — open the hamburger menu
- \`.openSubmenu(label)\` — hover to open a submenu
- \`comfyPage.menu.workflowsTab\` — Workflows sidebar:
- \`.open()\` / \`.close()\` — toggle sidebar
- \`.getTopLevelSavedWorkflowNames()\` — list saved workflows
- \`.getPersistedItem(name)\` — get a workflow item locator
- \`comfyPage.workflow\` — WorkflowHelper:
- \`.loadWorkflow(name)\` — load from browser_tests/assets/{name}.json
- \`.setupWorkflowsDirectory(structure)\` — setup test directory
- \`.deleteWorkflow(name)\` — delete a workflow
- \`.isCurrentWorkflowModified(): Promise<boolean>\` — check dirty state
- \`.getUndoQueueSize()\` / \`.getRedoQueueSize()\`
- \`comfyPage.settings.setSetting(key, value)\` — change settings
- \`comfyPage.keyboard\` — KeyboardHelper:
- \`.undo()\` / \`.redo()\` / \`.bypass()\`
- \`comfyPage.nodeOps\` — NodeOperationsHelper
- \`comfyPage.canvas\` — CanvasHelper
- \`comfyPage.contextMenu\` — ContextMenu
- \`comfyPage.toast\` — ToastHelper
- \`comfyPage.confirmDialog\` — confirmation dialog
- \`comfyPage.nextFrame()\` — wait for Vue re-render
### Test patterns:
- Use \`test.describe('Name', { tag: '@ui' }, () => { ... })\` for UI tests
- Use \`test.beforeEach\` to set up common state (settings, workflow dir)
- Use \`expect(locator).toHaveScreenshot('name.png')\` for visual assertions
- Use \`expect(locator).toBeVisible()\` / \`.toHaveText()\` for behavioral assertions
- Use \`comfyPage.workflow.setupWorkflowsDirectory({})\` to ensure clean state
`
// ── Prompt builder ──────────────────────────────────────────────────
function buildPrompt(qaReport: string, prDiff: string): string {
return `You are a Playwright test generator for the ComfyUI frontend.
Your task: Generate a single .spec.ts regression test file that asserts the UIUX behavior
described in the QA report below. The test must:
1. Use the ComfyUI Playwright fixture API (documented below)
2. Test UIUX behavior ONLY — element visibility, tab names, dialog states, workflow states
3. NOT test code implementation details
4. Be concise — only test the behavior that the PR changed
5. Follow existing test conventions (see API reference)
${FIXTURE_API}
## QA Video Review Report
${qaReport}
## PR Diff (for context on what changed)
${prDiff.slice(0, 8000)}
## Output Requirements
- Output ONLY the .spec.ts file content — no markdown fences, no explanations
- Start with imports, end with closing brace
- Use descriptive test names that explain the expected behavior
- Add screenshot assertions where visual verification matters
- Keep it focused: 2-5 test cases covering the core behavioral change
- Use \`test.beforeEach\` for common setup (settings, workflow directory)
- Tag the describe block with \`{ tag: '@ui' }\` or \`{ tag: '@workflow' }\` as appropriate
`
}
// ── Gemini call ─────────────────────────────────────────────────────
async function generateTest(
qaReport: string,
prDiff: string,
model: string
): Promise<string> {
const apiKey = process.env.GEMINI_API_KEY
if (!apiKey) throw new Error('GEMINI_API_KEY env var required')
const genAI = new GoogleGenerativeAI(apiKey)
const genModel = genAI.getGenerativeModel({ model })
const prompt = buildPrompt(qaReport, prDiff)
console.warn(`Sending prompt to ${model} (${prompt.length} chars)...`)
const result = await genModel.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: {
temperature: 0.2,
maxOutputTokens: 8192
}
})
const text = result.response.text()
// Strip markdown fences if model wraps output
return text
.replace(/^```(?:typescript|ts)?\n?/, '')
.replace(/\n?```$/, '')
.trim()
}
// ── CLI ─────────────────────────────────────────────────────────────
function parseArgs(): CliOptions {
const args = process.argv.slice(2)
const opts = { ...DEFAULTS }
for (let i = 0; i < args.length; i++) {
switch (args[i]) {
case '--qa-report':
opts.qaReport = args[++i]
break
case '--pr-diff':
opts.prDiff = args[++i]
break
case '--output':
opts.output = args[++i]
break
case '--model':
opts.model = args[++i]
break
case '--help':
console.warn(`Usage:
pnpm exec tsx scripts/qa-generate-test.ts [options]
Options:
--qa-report <path> QA video review report (markdown) [required]
--pr-diff <path> PR diff file [required]
--output <path> Output .spec.ts path [required]
--model <name> Gemini model (default: gemini-3-flash-preview)`)
process.exit(0)
}
}
if (!opts.qaReport || !opts.prDiff || !opts.output) {
console.error('Missing required args. Run with --help for usage.')
process.exit(1)
}
return opts
}
async function main() {
const opts = parseArgs()
const qaReport = await readFile(resolve(opts.qaReport), 'utf-8')
const prDiff = await readFile(resolve(opts.prDiff), 'utf-8')
console.warn(
`QA report: ${basename(opts.qaReport)} (${qaReport.length} chars)`
)
console.warn(`PR diff: ${basename(opts.prDiff)} (${prDiff.length} chars)`)
const testCode = await generateTest(qaReport, prDiff, opts.model)
const outputPath = resolve(opts.output)
await writeFile(outputPath, testCode + '\n')
console.warn(`Generated test: ${outputPath} (${testCode.length} chars)`)
}
main().catch((err) => {
console.error(err)
process.exit(1)
})

2038
scripts/qa-record.ts Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,135 @@
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><title>QA Session Recordings</title>
<link rel=preconnect href=https://fonts.googleapis.com><link rel=preconnect href=https://fonts.gstatic.com crossorigin><link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel=stylesheet>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<style>
:root{--bg:oklch(97% 0.01 265);--surface:oklch(100% 0 0);--surface-up:oklch(94% 0.01 265);--fg:oklch(15% 0.02 265);--fg-muted:oklch(40% 0.01 265);--fg-dim:oklch(55% 0.01 265);--primary:oklch(50% 0.21 265);--primary-up:oklch(45% 0.21 265);--primary-glow:oklch(55% 0.15 265);--ok:oklch(45% 0.18 155);--err:oklch(50% 0.22 25);--border:oklch(85% 0.01 265);--border-faint:oklch(90% 0.01 265);--r:0.75rem;--r-lg:1rem;--ease-out:cubic-bezier(0.22,1,0.36,1);--dur-base:250ms;--dur-slow:500ms;--font:'Inter',system-ui,sans-serif;--font-mono:'JetBrains Mono',monospace}
@media(prefers-color-scheme:dark){:root{--bg:oklch(8% 0.02 265);--surface:oklch(12% 0.02 265);--surface-up:oklch(16% 0.02 265);--fg:oklch(96% 0.01 95);--fg-muted:oklch(65% 0.01 265);--fg-dim:oklch(45% 0.01 265);--primary:oklch(62% 0.21 265);--primary-up:oklch(68% 0.21 265);--primary-glow:oklch(62% 0.15 265);--ok:oklch(62% 0.18 155);--err:oklch(62% 0.22 25);--border:oklch(22% 0.02 265);--border-faint:oklch(15% 0.01 265)}}
*{margin:0;padding:0;box-sizing:border-box}
body{background:var(--bg);color:var(--fg);font-family:var(--font);min-height:100vh;padding:clamp(1.5rem,4vw,3rem) clamp(1rem,3vw,2rem);position:relative}
@media(prefers-color-scheme:dark){body::after{content:'';position:fixed;inset:0;pointer-events:none;opacity:.03;background:url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='.85' numOctaves='4' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23n)'/%3E%3C/svg%3E")}}
.container{max-width:1200px;margin:0 auto}
header{display:flex;align-items:center;gap:1rem;margin-bottom:clamp(1.5rem,4vw,3rem);padding-bottom:1.25rem;border-bottom:1px solid var(--border)}
.header-icon{width:36px;height:36px;display:grid;place-items:center;background:linear-gradient(135deg,oklch(100% 0 0/.06),oklch(100% 0 0/.02));backdrop-filter:blur(12px);border:1px solid oklch(100% 0 0/.1);border-radius:var(--r);flex-shrink:0}
.header-icon svg{color:var(--primary)}
h1{font-size:clamp(1.25rem,2.5vw,1.625rem);font-weight:700;letter-spacing:-.03em;background:linear-gradient(135deg,var(--fg),var(--fg-muted));-webkit-background-clip:text;-webkit-text-fill-color:transparent;background-clip:text}
.meta{color:var(--fg-dim);font-size:.8125rem;margin-top:.15rem;letter-spacing:.01em}
.grid{display:grid;grid-template-columns:repeat(auto-fill,minmax(min(480px,100%),1fr));gap:1.5rem}
.card{background:var(--surface);border:1px solid var(--border);border-radius:var(--r-lg);overflow:hidden;transition:border-color var(--dur-base) var(--ease-out),box-shadow var(--dur-base) var(--ease-out),transform var(--dur-base) var(--ease-out)}
.card:hover{border-color:var(--primary);box-shadow:0 4px 16px oklch(0% 0 0/.1);transform:translateY(-2px)}
.video-wrap{position:relative;background:var(--surface);border-bottom:1px solid var(--border-faint)}
.video-wrap video{width:100%;display:block;aspect-ratio:16/9;object-fit:contain}
.card-body{padding:.75rem 1rem;display:flex;align-items:center;justify-content:space-between}
.platform{display:flex;align-items:center;gap:.5rem;font-weight:600;font-size:.9375rem;letter-spacing:-.01em}
.icon{font-size:1.125rem}
.links{display:flex;gap:.5rem}
.dl{color:var(--fg-muted);text-decoration:none;font-size:.75rem;font-weight:500;display:inline-flex;align-items:center;gap:.3rem;padding:.25rem .6rem;border-radius:9999px;border:1px solid var(--border);background:oklch(100% 0 0/.03);transition:all var(--dur-base) var(--ease-out)}
.dl:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.08)}
.badge{font-size:.6875rem;font-weight:600;padding:.2rem .625rem;border-radius:9999px;text-transform:uppercase;letter-spacing:.05em}
.card-header{padding:.75rem 1rem;display:flex;align-items:center;justify-content:space-between;border-bottom:1px solid var(--border-faint)}
.comparison{display:grid;grid-template-columns:1fr 1fr;gap:0}
.comp-panel{border-right:1px solid var(--border-faint)}
.comp-panel:last-child{border-right:none}
.comp-label{padding:.4rem .75rem;font-size:.7rem;font-weight:600;text-transform:uppercase;letter-spacing:.05em;color:var(--fg-muted);background:var(--surface);display:flex;align-items:center;gap:.4rem}
.comp-tag{font-size:.6rem;padding:.1rem .4rem;border-radius:9999px;font-weight:600}
.comp-panel:first-child .comp-tag{background:oklch(65% 0.01 265/.15);color:var(--fg-muted);border:1px solid var(--border)}
.comp-panel:last-child .comp-tag{background:oklch(62% 0.18 155/.15);color:var(--ok);border:1px solid oklch(62% 0.18 155/.25)}
.comp-dl{padding:.4rem .75rem;display:flex;justify-content:center}
.report{border-top:1px solid var(--border-faint);padding:.75rem 1rem;font-size:.8125rem}
.report summary{cursor:pointer;color:var(--fg-muted);font-weight:500;display:flex;align-items:center;gap:.4rem;user-select:none;transition:color var(--dur-base) var(--ease-out)}
.report summary:hover{color:var(--fg)}
.report summary svg{flex-shrink:0;opacity:.5}
.report[open] summary{margin-bottom:.75rem;padding-bottom:.5rem;border-bottom:1px solid var(--border-faint)}
.report-body{line-height:1.7;color:oklch(80% 0.01 265);overflow-x:auto}
.report-body h1,.report-body h2{margin:1.25rem 0 .5rem;color:var(--fg);font-size:1rem;font-weight:600;letter-spacing:-.02em;border-bottom:1px solid var(--border-faint);padding-bottom:.4rem}
.report-body h3{margin:.75rem 0 .4rem;color:var(--fg);font-size:.875rem;font-weight:600}
.report-body p{margin:.4rem 0}
.report-body ul,.report-body ol{margin:.4rem 0 .4rem 1.5rem}
.report-body li{margin:.25rem 0}
.report-body code{background:var(--surface-up);padding:.125rem .375rem;border-radius:.25rem;font-size:.7rem;font-family:var(--font-mono);border:1px solid var(--border-faint)}
.report-body h3+p>code:first-child{background:oklch(62% 0.22 25/.15);color:var(--err);border-color:oklch(62% 0.22 25/.25)}
.report-body h3+p>code:nth-child(2){background:oklch(62% 0.21 265/.15);color:var(--primary-up);border-color:oklch(62% 0.21 265/.25)}
.report-body h3+p>code:nth-child(3){background:oklch(65% 0.01 265/.15);color:var(--fg-muted);border-color:var(--border)}
.report-body table{width:100%;border-collapse:collapse;margin:.75rem 0;font-size:.75rem;border:1px solid var(--border);border-radius:var(--r);overflow:hidden}
.report-body th,.report-body td{border:1px solid var(--border-faint);padding:.5rem .75rem;text-align:left;vertical-align:top;word-wrap:break-word}
.report-body th{background:var(--surface-up);color:var(--fg);font-weight:600;font-size:.6875rem;text-transform:uppercase;letter-spacing:.05em;position:sticky;top:0;white-space:nowrap}
.report-body tr:nth-child(even){background:color-mix(in oklch,var(--surface) 50%,transparent)}
.report-body tr:hover{background:color-mix(in oklch,var(--surface-up) 50%,transparent)}
.report-body strong{color:var(--fg)}
.report-body hr{border:none;border-top:1px solid var(--border-faint);margin:1rem 0}
@keyframes fade-up{from{opacity:0;transform:translateY(16px)}to{opacity:1;transform:translateY(0)}}
.reveal{animation:fade-up var(--dur-slow) var(--ease-out) both;animation-delay:calc(var(--i,0) * 120ms)}
@media(prefers-reduced-motion:reduce){.reveal{animation:none}}
@media(max-width:480px){.grid{grid-template-columns:1fr}.card-body{flex-wrap:wrap;gap:.5rem}}
.sha{color:var(--primary);text-decoration:none;font-family:var(--font-mono);font-size:.75rem;font-weight:500;padding:.1rem .4rem;border-radius:.25rem;background:oklch(62% 0.21 265/.08);border:1px solid oklch(62% 0.21 265/.15);transition:all var(--dur-base) var(--ease-out)}
.sha:hover{background:oklch(62% 0.21 265/.15);border-color:var(--primary)}
.badge-bar{display:flex;align-items:center;gap:.5rem;margin-bottom:1rem}
.badge-img{height:20px;display:block}
.copy-badge{background:oklch(100% 0 0/.06);border:1px solid var(--border);color:var(--fg-muted);padding:.3rem .4rem;border-radius:var(--r);cursor:pointer;display:inline-flex;align-items:center;transition:all var(--dur-base) var(--ease-out)}
.copy-badge:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.1)}
.copy-badge.copied{color:var(--ok);border-color:var(--ok)}
.vseek{width:100%;padding:0 .75rem;background:var(--surface);border-top:1px solid var(--border-faint);position:relative;height:24px;display:flex;align-items:center}
.vseek input[type=range]{-webkit-appearance:none;appearance:none;width:100%;height:4px;background:var(--border);border-radius:2px;outline:none;cursor:pointer;position:relative;z-index:2}
.vseek input[type=range]::-webkit-slider-thumb{-webkit-appearance:none;width:12px;height:12px;border-radius:50%;background:var(--primary);cursor:pointer;border:2px solid var(--bg);box-shadow:0 0 4px oklch(0% 0 0/.3)}
.vseek input[type=range]::-moz-range-thumb{width:12px;height:12px;border-radius:50%;background:var(--primary);cursor:pointer;border:2px solid var(--bg)}
.vseek .vbuf{position:absolute;left:.75rem;right:.75rem;height:4px;border-radius:2px;pointer-events:none;top:50%;transform:translateY(-50%)}
.vseek .vbuf-bar{height:100%;background:oklch(62% 0.21 265/.25);border-radius:2px;transition:width 200ms linear}
.vctrl{display:flex;align-items:center;gap:.375rem;padding:.5rem .75rem;background:var(--surface);border-top:1px solid var(--border-faint);flex-wrap:wrap}
.vctrl button{background:oklch(100% 0 0/.06);border:1px solid var(--border);color:var(--fg-muted);font-size:.6875rem;font-weight:600;font-family:var(--font-mono);padding:.25rem .5rem;border-radius:.25rem;cursor:pointer;transition:all var(--dur-base) var(--ease-out);white-space:nowrap}
.vctrl button:hover{color:var(--primary-up);border-color:var(--primary);background:oklch(62% 0.21 265/.1)}
.vctrl button.active{color:var(--primary);border-color:var(--primary);background:oklch(62% 0.21 265/.15)}
.vctrl .vtime{font-family:var(--font-mono);font-size:.6875rem;color:var(--fg-dim);min-width:10ch;text-align:center}
.vctrl .vsep{width:1px;height:1rem;background:var(--border);flex-shrink:0}
.vctrl .vhint{font-size:.6rem;color:var(--fg-dim);margin-left:auto}
.purpose{background:linear-gradient(135deg,oklch(100% 0 0/.04),oklch(100% 0 0/.02));border:1px solid oklch(100% 0 0/.08);border-radius:var(--r-lg);padding:1rem 1.25rem;margin-bottom:1.5rem;font-size:.85rem;line-height:1.7;color:oklch(80% 0.01 265)}
.purpose strong{color:var(--fg);font-weight:600}
.purpose .purpose-label{font-size:.7rem;font-weight:600;text-transform:uppercase;letter-spacing:.05em;color:var(--fg-muted);margin-bottom:.4rem}
.purpose .purpose-reqs{margin-top:.75rem;padding-top:.75rem;border-top:1px solid oklch(100% 0 0/.06);font-size:.8rem;color:oklch(70% 0.01 265);line-height:1.8}
</style></head><body><div class=container>
<header><div class=header-icon><svg width=20 height=20 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2 stroke-linecap=round stroke-linejoin=round><polygon points="23 7 16 12 23 17 23 7"/><rect x=1 y=5 width=15 height=14 rx=2 ry=2/></svg></div><div><h1>QA Session Recordings</h1><div class=meta>ComfyUI Frontend &middot; Automated QA{{COMMIT_HTML}}{{RUN_LINK}}{{TIMING_HTML}}</div>{{BADGE_HTML}}</div></header>
{{PURPOSE_HTML}}<div class=grid>{{CARDS}}</div>
</div><script>
function copyBadge(){const u=location.href.replace(/\/[^/]*$/,'/');const b=u+'badge.svg';const md='[![QA Badge]('+b+')]('+u+')';navigator.clipboard.writeText(md).then(()=>{const btn=document.querySelector('.copy-badge');btn.classList.add('copied');btn.innerHTML='<svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><polyline points="20 6 9 17 4 12"/></svg>';setTimeout(()=>{btn.classList.remove('copied');btn.innerHTML='<svg width=14 height=14 viewBox="0 0 24 24" fill=none stroke=currentColor stroke-width=2><rect x=9 y=9 width=13 height=13 rx=2/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/></svg>'},2000)})}
document.querySelectorAll('[data-md]').forEach(el=>{const t=el.textContent;el.removeAttribute('data-md');el.innerHTML=marked.parse(t)});
const FPS=30,FT=1/FPS,SPEEDS=[0.1,0.25,0.5,1,1.5,2];
document.querySelectorAll('.video-wrap video').forEach(v=>{
v.playbackRate=0.5;v.removeAttribute('autoplay');v.pause();
const c=document.createElement('div');c.className='vctrl';
const btn=(label,fn)=>{const b=document.createElement('button');b.textContent=label;b.onclick=fn;c.appendChild(b);return b};
const sep=()=>{const s=document.createElement('div');s.className='vsep';c.appendChild(s)};
const time=document.createElement('span');time.className='vtime';time.textContent='0:00.000';
btn('\u23EE',()=>{v.currentTime=0});
btn('\u25C0\u25C0',()=>{v.currentTime=Math.max(0,v.currentTime-FT*10)});
btn('\u25C0',()=>{v.pause();v.currentTime=Math.max(0,v.currentTime-FT)});
const playBtn=btn('\u25B6',()=>{v.paused?v.play():v.pause()});
btn('\u25B6\u25B6',()=>{v.pause();v.currentTime+=FT});
btn('\u25B6\u25B6\u25B6',()=>{v.currentTime+=FT*10});
sep();
const spdBtns=SPEEDS.map(s=>{const b=btn(s+'x',()=>{v.playbackRate=s;spdBtns.forEach(x=>x.classList.remove('active'));b.classList.add('active')});if(s===0.5)b.classList.add('active');return b});
sep();c.appendChild(time);
const hint=document.createElement('span');hint.className='vhint';hint.textContent='\u2190\u2192 frame \u2022 space play';c.appendChild(hint);
// Custom seekbar — works even without server range request support
const seekWrap=document.createElement('div');seekWrap.className='vseek';
const seekBar=document.createElement('input');seekBar.type='range';seekBar.min=0;seekBar.max=1000;seekBar.value=0;seekBar.step=1;
const bufWrap=document.createElement('div');bufWrap.className='vbuf';
const bufBar=document.createElement('div');bufBar.className='vbuf-bar';bufBar.style.width='0%';
bufWrap.appendChild(bufBar);seekWrap.appendChild(bufWrap);seekWrap.appendChild(seekBar);
let seeking=false;
seekBar.oninput=()=>{seeking=true;if(v.duration){v.currentTime=v.duration*(seekBar.value/1000)}};
seekBar.onchange=()=>{seeking=false};
v.closest('.video-wrap').after(seekWrap);
seekWrap.after(c);
v.ontimeupdate=()=>{
const m=Math.floor(v.currentTime/60),s=Math.floor(v.currentTime%60),ms=Math.floor((v.currentTime%1)*1000);
time.textContent=m+':'+(s<10?'0':'')+s+'.'+String(ms).padStart(3,'0');
if(!seeking&&v.duration){seekBar.value=Math.round((v.currentTime/v.duration)*1000)}
};
v.onprogress=v.onloadeddata=()=>{if(v.buffered.length&&v.duration){bufBar.style.width=(v.buffered.end(v.buffered.length-1)/v.duration*100)+'%'}};
v.onplay=()=>{playBtn.textContent='\u23F8'};v.onpause=()=>{playBtn.textContent='\u25B6'};
v.parentElement.addEventListener('keydown',e=>{
if(e.key==='ArrowLeft'){e.preventDefault();v.pause();v.currentTime=Math.max(0,v.currentTime-FT)}
if(e.key==='ArrowRight'){e.preventDefault();v.pause();v.currentTime+=FT}
if(e.key===' '){e.preventDefault();v.paused?v.play():v.pause()}
});
v.parentElement.setAttribute('tabindex','0');
});
</script></body></html>

View File

@@ -0,0 +1,150 @@
import { describe, expect, it } from 'vitest'
import {
extractPlatformFromArtifactDirName,
pickLatestVideosByPlatform,
selectVideoCandidateByFile
} from './qa-video-review'
describe('extractPlatformFromArtifactDirName', () => {
it('extracts and normalizes known qa artifact directory names', () => {
expect(
extractPlatformFromArtifactDirName('qa-report-Windows-22818315023')
).toBe('windows')
expect(
extractPlatformFromArtifactDirName('qa-report-macOS-22818315023')
).toBe('macos')
expect(
extractPlatformFromArtifactDirName('qa-report-Linux-22818315023')
).toBe('linux')
})
it('falls back to slugifying unknown directory names', () => {
expect(extractPlatformFromArtifactDirName('custom platform run')).toBe(
'custom-platform-run'
)
})
})
describe('pickLatestVideosByPlatform', () => {
it('keeps only the latest candidate per platform', () => {
const selected = pickLatestVideosByPlatform([
{
platformName: 'windows',
videoPath: '/tmp/windows-old.mp4',
mtimeMs: 100
},
{
platformName: 'windows',
videoPath: '/tmp/windows-new.mp4',
mtimeMs: 200
},
{
platformName: 'linux',
videoPath: '/tmp/linux.mp4',
mtimeMs: 150
}
])
expect(selected).toEqual([
{
platformName: 'linux',
videoPath: '/tmp/linux.mp4',
mtimeMs: 150
},
{
platformName: 'windows',
videoPath: '/tmp/windows-new.mp4',
mtimeMs: 200
}
])
})
})
describe('selectVideoCandidateByFile', () => {
it('selects a single candidate by artifacts-relative path', () => {
const selected = selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
},
{
platformName: 'linux',
videoPath: '/tmp/qa-artifacts/qa-report-Linux-1/qa-session.mp4',
mtimeMs: 200
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: 'qa-report-Linux-1/qa-session.mp4'
}
)
expect(selected).toEqual({
platformName: 'linux',
videoPath: '/tmp/qa-artifacts/qa-report-Linux-1/qa-session.mp4',
mtimeMs: 200
})
})
it('throws when basename matches multiple videos', () => {
expect(() =>
selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
},
{
platformName: 'linux',
videoPath: '/tmp/qa-artifacts/qa-report-Linux-1/qa-session.mp4',
mtimeMs: 200
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: 'qa-session.mp4'
}
)
).toThrow('matched 2 videos')
})
it('throws when there is no matching video', () => {
expect(() =>
selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: 'qa-report-macOS-1/qa-session.mp4'
}
)
).toThrow('No video matched')
})
it('throws when video file is missing', () => {
expect(() =>
selectVideoCandidateByFile(
[
{
platformName: 'windows',
videoPath: '/tmp/qa-artifacts/qa-report-Windows-1/qa-session.mp4',
mtimeMs: 100
}
],
{
artifactsDir: '/tmp/qa-artifacts',
videoFile: ' '
}
)
).toThrow('--video-file is required')
})
})

757
scripts/qa-video-review.ts Normal file
View File

@@ -0,0 +1,757 @@
#!/usr/bin/env tsx
import { mkdir, readFile, stat, writeFile } from 'node:fs/promises'
import { basename, dirname, extname, relative, resolve } from 'node:path'
import { fileURLToPath } from 'node:url'
import { GoogleGenerativeAI } from '@google/generative-ai'
import { globSync } from 'glob'
interface CliOptions {
artifactsDir: string
videoFile: string
beforeVideo: string
outputDir: string
model: string
requestTimeoutMs: number
dryRun: boolean
prContext: string
targetUrl: string
passLabel: string
}
interface VideoCandidate {
platformName: string
videoPath: string
mtimeMs: number
}
const DEFAULT_OPTIONS: CliOptions = {
artifactsDir: './tmp/qa-artifacts',
videoFile: '',
beforeVideo: '',
outputDir: './tmp',
model: 'gemini-3-flash-preview',
requestTimeoutMs: 300_000,
dryRun: false,
prContext: '',
targetUrl: '',
passLabel: ''
}
const USAGE = `Usage:
pnpm exec tsx scripts/qa-video-review.ts [options]
Options:
--artifacts-dir <path> Artifacts root directory
(default: ./tmp/qa-artifacts)
--video-file <name-or-path> Video file to analyze (required)
(supports basename or relative/absolute path)
--before-video <path> Before video (main branch) for comparison
When provided, sends both videos to Gemini
for comparative before/after analysis
--output-dir <path> Output directory for markdown reports
(default: ./tmp)
--model <name> Gemini model
(default: gemini-3-flash-preview)
--request-timeout-ms <n> Request timeout in milliseconds
(default: 300000)
--pr-context <file> File with PR context (title, body, diff)
for PR-aware review
--target-url <url> Issue or PR URL to include in the report
--pass-label <label> Label for multi-pass reports (e.g. pass1)
Output becomes {platform}-{label}-qa-video-report.md
--dry-run Discover videos and output targets only
--help Show this help text
Environment:
GEMINI_API_KEY Required unless --dry-run
`
function parsePositiveInteger(rawValue: string, flagName: string): number {
const parsedValue = Number.parseInt(rawValue, 10)
if (!Number.isInteger(parsedValue) || parsedValue <= 0) {
throw new Error(`Invalid value for ${flagName}: "${rawValue}"`)
}
return parsedValue
}
function parseCliOptions(args: string[]): CliOptions {
const options: CliOptions = { ...DEFAULT_OPTIONS }
for (let index = 0; index < args.length; index += 1) {
const argument = args[index]
const nextValue = args[index + 1]
const requireValue = (flagName: string): string => {
if (!nextValue || nextValue.startsWith('--')) {
throw new Error(`Missing value for ${flagName}`)
}
index += 1
return nextValue
}
if (argument === '--help') {
process.stdout.write(USAGE)
process.exit(0)
}
if (argument === '--artifacts-dir') {
options.artifactsDir = requireValue(argument)
continue
}
if (argument === '--video-file') {
options.videoFile = requireValue(argument)
continue
}
if (argument === '--output-dir') {
options.outputDir = requireValue(argument)
continue
}
if (argument === '--model') {
options.model = requireValue(argument)
continue
}
if (argument === '--request-timeout-ms') {
options.requestTimeoutMs = parsePositiveInteger(
requireValue(argument),
argument
)
continue
}
if (argument === '--before-video') {
options.beforeVideo = requireValue(argument)
continue
}
if (argument === '--pr-context') {
options.prContext = requireValue(argument)
continue
}
if (argument === '--target-url') {
options.targetUrl = requireValue(argument)
continue
}
if (argument === '--pass-label') {
options.passLabel = requireValue(argument)
continue
}
if (argument === '--dry-run') {
options.dryRun = true
continue
}
throw new Error(`Unknown argument: ${argument}`)
}
return options
}
function normalizePlatformName(value: string): string {
const slug = value
.trim()
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-+|-+$/g, '')
return slug.length > 0 ? slug : 'unknown-platform'
}
export function extractPlatformFromArtifactDirName(dirName: string): string {
const matchedValue = dirName.match(/^qa-report-(.+?)(?:-\d+)?$/i)?.[1]
return normalizePlatformName(matchedValue ?? dirName)
}
function extractPlatformFromVideoPath(videoPath: string): string {
const artifactDirName = basename(dirname(videoPath))
return extractPlatformFromArtifactDirName(artifactDirName)
}
export function pickLatestVideosByPlatform(
candidates: VideoCandidate[]
): VideoCandidate[] {
const latestByPlatform = new Map<string, VideoCandidate>()
for (const candidate of candidates) {
const current = latestByPlatform.get(candidate.platformName)
if (!current || candidate.mtimeMs > current.mtimeMs) {
latestByPlatform.set(candidate.platformName, candidate)
}
}
return [...latestByPlatform.values()].sort((a, b) =>
a.platformName.localeCompare(b.platformName)
)
}
function toProjectRelativePath(targetPath: string): string {
const relativePath = relative(process.cwd(), targetPath)
if (relativePath.startsWith('.')) {
return relativePath
}
return `./${relativePath}`
}
function errorToString(error: unknown): string {
return error instanceof Error ? error.message : String(error)
}
function normalizePathForMatch(value: string): string {
return value.replaceAll('\\', '/').replace(/^\.\/+/, '')
}
export function selectVideoCandidateByFile(
candidates: VideoCandidate[],
options: { artifactsDir: string; videoFile: string }
): VideoCandidate {
const requestedValue = options.videoFile.trim()
if (requestedValue.length === 0) {
throw new Error('--video-file is required')
}
const artifactsRoot = resolve(options.artifactsDir)
const requestedAbsolutePath = resolve(requestedValue)
const requestedPathKey = normalizePathForMatch(requestedValue)
const matches = candidates.filter((candidate) => {
const candidateAbsolutePath = resolve(candidate.videoPath)
if (candidateAbsolutePath === requestedAbsolutePath) {
return true
}
const candidateBaseName = basename(candidate.videoPath)
if (candidateBaseName === requestedValue) {
return true
}
const relativeToCwd = normalizePathForMatch(
relative(process.cwd(), candidateAbsolutePath)
)
if (relativeToCwd === requestedPathKey) {
return true
}
const relativeToArtifacts = normalizePathForMatch(
relative(artifactsRoot, candidateAbsolutePath)
)
return relativeToArtifacts === requestedPathKey
})
if (matches.length === 1) {
return matches[0]
}
if (matches.length === 0) {
const availableVideos = candidates.map((candidate) =>
toProjectRelativePath(candidate.videoPath)
)
throw new Error(
[
`No video matched --video-file "${options.videoFile}".`,
'Available videos:',
...availableVideos.map((videoPath) => `- ${videoPath}`)
].join('\n')
)
}
throw new Error(
[
`--video-file "${options.videoFile}" matched ${matches.length} videos.`,
'Please pass a more specific path.',
...matches.map((match) => `- ${toProjectRelativePath(match.videoPath)}`)
].join('\n')
)
}
async function collectVideoCandidates(
artifactsDir: string
): Promise<VideoCandidate[]> {
const absoluteArtifactsDir = resolve(artifactsDir)
const videoPaths = globSync('**/qa-session{,-[0-9]}.mp4', {
cwd: absoluteArtifactsDir,
absolute: true,
nodir: true
}).sort()
const candidates = await Promise.all(
videoPaths.map(async (videoPath) => {
const videoStat = await stat(videoPath)
return {
platformName: extractPlatformFromVideoPath(videoPath),
videoPath,
mtimeMs: videoStat.mtimeMs
}
})
)
return candidates
}
function getMimeType(filePath: string): string {
const ext = extname(filePath).toLowerCase()
const mimeMap: Record<string, string> = {
'.mp4': 'video/mp4',
'.webm': 'video/webm',
'.mov': 'video/quicktime',
'.avi': 'video/x-msvideo',
'.mkv': 'video/x-matroska',
'.m4v': 'video/mp4'
}
return mimeMap[ext] || 'video/mp4'
}
function buildReviewPrompt(options: {
platformName: string
videoPath: string
prContext: string
isComparative: boolean
}): string {
const { platformName, videoPath, prContext, isComparative } = options
if (isComparative) {
return buildComparativePrompt(platformName, videoPath, prContext)
}
return buildSingleVideoPrompt(platformName, videoPath, prContext)
}
function buildComparativePrompt(
platformName: string,
videoPath: string,
prContext: string
): string {
const lines = [
'You are a senior QA engineer performing a BEFORE/AFTER comparison review.',
'',
'You are given TWO videos:',
'- **Video 1 (BEFORE)**: The main branch BEFORE the PR. This shows the OLD behavior.',
'- **Video 2 (AFTER)**: The PR branch AFTER the changes. This shows the NEW behavior.',
'',
'Both videos show the same test steps executed on different code versions.',
''
]
if (prContext) {
lines.push('## PR Context', prContext, '')
}
lines.push(
'## Your Task',
`Platform: "${platformName}". After video: ${toProjectRelativePath(videoPath)}.`,
'',
'1. **BEFORE video**: Does it demonstrate the old behavior or bug that the PR aims to fix?',
' Describe what you observe — this establishes the baseline.',
'2. **AFTER video**: Does it prove the PR fix works? Is the intended new behavior visible?',
'3. **Comparison**: What specifically changed between before and after?',
'4. **Regressions**: Did the PR introduce any new problems visible in the AFTER video',
' that were NOT present in the BEFORE video?',
'',
'Note: Brief black frames during page transitions are NORMAL.',
'Note: Small cyan/purple dashed labels prefixed with "QA:" are annotations placed by the automated test script — they are NOT part of the application UI. Do not treat them as bugs or evidence.',
'Report only concrete, visible differences. Avoid speculation.',
'',
'Return markdown with these sections exactly:',
'## Summary',
'(What the PR changes, whether BEFORE confirms the old behavior, whether AFTER proves the fix)',
'',
'## Behavior Changes',
'Summarize ALL behavioral differences as a markdown TABLE:',
'| Behavior | Before (main) | After (PR) | Verdict |',
'',
'- **Behavior**: short name for the behavior (e.g. "Save shortcut label", "Menu hover style")',
'- **Before (main)**: how it works/looks in the BEFORE video',
'- **After (PR)**: how it works/looks in the AFTER video',
'- **Verdict**: `Fixed`, `Improved`, `Changed`, `Regression`, or `No Change`',
'',
'One row per distinct behavior. Include both changed AND unchanged key behaviors',
'that were tested, so reviewers can confirm nothing was missed.',
'',
'## Timeline Comparison',
'Present a chronological frame-by-frame comparison as a markdown TABLE:',
'| Time | Type | Severity | Before (main) | After (PR) |',
'',
'- **Time**: timestamp or range from the videos (e.g. `0:05-0:08`)',
'- **Type**: category such as `Visual`, `Behavior`, `Layout`, `Text`, `Animation`, `Menu`, `State`',
'- **Severity**: `None` (neutral change), `Fixed` (bug resolved), `Regression`, `Minor`, `Major`',
'- **Before (main)**: what is observed in the BEFORE video at that time',
'- **After (PR)**: what is observed in the AFTER video at that time',
'',
'Include one row per distinct observable difference. If behavior is identical at a timestamp,',
'omit that row. Focus on meaningful differences, not narrating every frame.',
'',
'## Confirmed Issues',
'For each issue, use this exact format:',
'',
'### [Short issue title]',
'`SEVERITY` `TIMESTAMP` `Confidence: LEVEL`',
'',
'[Description — specify whether it appears in BEFORE, AFTER, or both]',
'',
'**Evidence:** [What you observed at the given timestamp in which video]',
'',
'**Suggested Fix:** [Actionable recommendation]',
'',
'---',
'',
'## Possible Issues (Needs Human Verification)',
'## Overall Risk',
'(Assess whether the PR achieves its goal based on the before/after comparison)',
'',
'## Verdict',
'End your report with this EXACT JSON block (no markdown fence):',
'{"verdict": "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE", "risk": "low" | "medium" | "high", "confidence": "high" | "medium" | "low"}',
'- REPRODUCED: the before video confirms the old behavior and the after video shows the fix working',
'- NOT_REPRODUCIBLE: the before video does not show the reported bug',
'- INCONCLUSIVE: the videos do not adequately demonstrate the behavior change'
)
return lines.filter(Boolean).join('\n')
}
function buildSingleVideoPrompt(
platformName: string,
videoPath: string,
prContext: string
): string {
const lines = [
'You are a senior QA engineer reviewing a UI test session recording.',
''
]
const isIssueContext =
prContext &&
/^### Issue #|^Title:.*\bbug\b|^This video attempts to reproduce/im.test(
prContext
)
if (prContext) {
if (isIssueContext) {
lines.push(
'## Issue Context',
'This video attempts to reproduce a reported bug on the main branch.',
'Your review MUST evaluate whether the reported bug is visible and reproducible.',
'',
prContext,
'',
'## Review Instructions',
'1. Does the video demonstrate the reported bug occurring?',
'2. Is the bug clearly visible and reproducible from the steps shown?',
'3. Are there any other issues visible during the reproduction attempt?',
'',
'## CRITICAL: Honesty Requirements',
'- If the video only shows login, idle canvas, or trivial menu interactions WITHOUT actually performing the reproduction steps, say "INCONCLUSIVE — reproduction steps were not performed".',
'- Do NOT claim a bug is "confirmed" unless you can clearly see the bug behavior described in the issue.',
'- Do NOT hallucinate findings. If the video does not show meaningful interaction, say so clearly.',
'- Rate confidence as "Low" if the video does not actually demonstrate the bug scenario.',
''
)
} else {
lines.push(
'## PR Context',
'The video is a QA session testing a specific pull request.',
'Your review MUST evaluate whether the PR achieves its stated purpose.',
'',
prContext,
'',
'## Review Instructions',
"1. Does the video demonstrate the PR's intended behavior working correctly?",
'2. Are there regressions or side effects caused by the PR changes?',
'3. Does the observed behavior match what the PR claims to implement/fix?',
''
)
}
}
lines.push(
`Review this QA session video for platform "${platformName}".`,
`Source video: ${toProjectRelativePath(videoPath)}.`,
'The video shows the full test session — analyze it chronologically.',
'Focus on UI regressions, broken states, visual glitches, unreadable text, missing labels/i18n, and clear workflow failures.',
'Note: Brief black frames during page transitions are NORMAL and should NOT be reported as issues.',
'Note: Small cyan/purple dashed labels prefixed with "QA:" are annotations placed by the automated test script — they are NOT part of the application UI. Do not treat them as bugs or evidence.',
'Report only concrete, visible problems and avoid speculation.',
'If confidence is low, mark it explicitly.',
'',
'Return markdown with these sections exactly:',
'## Summary',
isIssueContext
? '(Explain what bug was reported and whether the video confirms it is reproducible)'
: prContext
? '(Explain what the PR intended and whether the video confirms it works)'
: '',
'## Confirmed Issues',
'For each confirmed issue, use this exact format (one block per issue):',
'',
'### [Short issue title]',
'`HIGH` `01:03` `Confidence: High`',
'',
'[Description of the issue — what went wrong and what was expected]',
'',
'**Evidence:** [What you observed in the video at the given timestamp]',
'',
'**Suggested Fix:** [Actionable recommendation]',
'',
'---',
'',
'The first line after the heading MUST be exactly three backtick-wrapped labels:',
'`SEVERITY` `TIMESTAMP` `Confidence: LEVEL`',
'Do NOT use a table for issues — use the block format above.',
'## Possible Issues (Needs Human Verification)',
'## Overall Risk',
'',
'## Verdict',
'End your report with this EXACT JSON block (no markdown fence):',
'{"verdict": "REPRODUCED" | "NOT_REPRODUCIBLE" | "INCONCLUSIVE", "risk": "low" | "medium" | "high" | null, "confidence": "high" | "medium" | "low"}',
'- REPRODUCED: the bug/behavior is clearly visible in the video',
'- NOT_REPRODUCIBLE: the steps were performed correctly but the bug was not observed',
'- INCONCLUSIVE: the reproduction steps were not performed or the video is insufficient'
)
return lines.filter(Boolean).join('\n')
}
const MAX_VIDEO_BYTES = 100 * 1024 * 1024
async function readVideoFile(videoPath: string): Promise<Buffer> {
const fileStat = await stat(videoPath)
if (fileStat.size > MAX_VIDEO_BYTES) {
throw new Error(
`Video ${basename(videoPath)} is ${formatBytes(fileStat.size)}, exceeds ${formatBytes(MAX_VIDEO_BYTES)} limit`
)
}
return readFile(videoPath)
}
async function requestGeminiReview(options: {
apiKey: string
model: string
platformName: string
videoPath: string
beforeVideoPath: string
timeoutMs: number
prContext: string
}): Promise<string> {
const genAI = new GoogleGenerativeAI(options.apiKey)
const model = genAI.getGenerativeModel({ model: options.model })
const isComparative = options.beforeVideoPath.length > 0
const prompt = buildReviewPrompt({
platformName: options.platformName,
videoPath: options.videoPath,
prContext: options.prContext,
isComparative
})
const parts: Array<
{ text: string } | { inlineData: { mimeType: string; data: string } }
> = [{ text: prompt }]
if (isComparative) {
const beforeBuffer = await readVideoFile(options.beforeVideoPath)
parts.push(
{ text: 'Video 1 — BEFORE (main branch):' },
{
inlineData: {
mimeType: getMimeType(options.beforeVideoPath),
data: beforeBuffer.toString('base64')
}
}
)
}
const afterBuffer = await readVideoFile(options.videoPath)
if (isComparative) {
parts.push({ text: 'Video 2 — AFTER (PR branch):' })
}
parts.push({
inlineData: {
mimeType: getMimeType(options.videoPath),
data: afterBuffer.toString('base64')
}
})
const result = await model.generateContent(parts, {
timeout: options.timeoutMs
})
const response = result.response
const text = response.text()
if (!text || text.trim().length === 0) {
throw new Error('Gemini API returned no output text')
}
return text.trim()
}
function formatBytes(bytes: number): string {
if (bytes < 1024) return `${bytes} B`
if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`
return `${(bytes / (1024 * 1024)).toFixed(1)} MB`
}
function buildReportMarkdown(input: {
platformName: string
model: string
videoPath: string
videoSizeBytes: number
beforeVideoPath?: string
beforeVideoSizeBytes?: number
reviewText: string
targetUrl?: string
}): string {
const headerLines = [
`# ${input.platformName} QA Video Report`,
'',
`- Generated at: ${new Date().toISOString()}`,
`- Model: \`${input.model}\``
]
if (input.targetUrl) {
headerLines.push(`- Target: ${input.targetUrl}`)
}
if (input.beforeVideoPath) {
headerLines.push(
`- Before video: \`${toProjectRelativePath(input.beforeVideoPath)}\` (${formatBytes(input.beforeVideoSizeBytes ?? 0)})`,
`- After video: \`${toProjectRelativePath(input.videoPath)}\` (${formatBytes(input.videoSizeBytes)})`,
'- Mode: **Comparative (before/after)**'
)
} else {
headerLines.push(
`- Source video: \`${toProjectRelativePath(input.videoPath)}\``,
`- Video size: ${formatBytes(input.videoSizeBytes)}`
)
}
headerLines.push('', '## AI Review', '')
return `${headerLines.join('\n')}${input.reviewText.trim()}\n`
}
async function reviewVideo(
video: VideoCandidate,
options: CliOptions,
apiKey: string
): Promise<void> {
let prContext = ''
if (options.prContext) {
try {
prContext = await readFile(options.prContext, 'utf-8')
process.stdout.write(
`[${video.platformName}] Loaded PR context from ${options.prContext}\n`
)
} catch {
process.stdout.write(
`[${video.platformName}] Warning: Could not read PR context file ${options.prContext}\n`
)
}
}
const beforeVideoPath = options.beforeVideo
? resolve(options.beforeVideo)
: ''
if (beforeVideoPath) {
const beforeStat = await stat(beforeVideoPath)
process.stdout.write(
`[${video.platformName}] Before video: ${toProjectRelativePath(beforeVideoPath)} (${formatBytes(beforeStat.size)})\n`
)
}
process.stdout.write(
`[${video.platformName}] Sending ${beforeVideoPath ? '2 videos (comparative)' : 'video'} to ${options.model}\n`
)
const reviewText = await requestGeminiReview({
apiKey,
model: options.model,
platformName: video.platformName,
videoPath: video.videoPath,
beforeVideoPath,
timeoutMs: options.requestTimeoutMs,
prContext
})
const videoStat = await stat(video.videoPath)
const passSegment = options.passLabel ? `-${options.passLabel}` : ''
const outputPath = resolve(
options.outputDir,
`${video.platformName}${passSegment}-qa-video-report.md`
)
const reportInput: Parameters<typeof buildReportMarkdown>[0] = {
platformName: video.platformName,
model: options.model,
videoPath: video.videoPath,
videoSizeBytes: videoStat.size,
reviewText,
targetUrl: options.targetUrl || undefined
}
if (beforeVideoPath) {
const beforeStat = await stat(beforeVideoPath)
reportInput.beforeVideoPath = beforeVideoPath
reportInput.beforeVideoSizeBytes = beforeStat.size
}
const reportMarkdown = buildReportMarkdown(reportInput)
await mkdir(dirname(outputPath), { recursive: true })
await writeFile(outputPath, reportMarkdown, 'utf-8')
process.stdout.write(
`[${video.platformName}] Wrote ${toProjectRelativePath(outputPath)}\n`
)
}
function isExecutedAsScript(metaUrl: string): boolean {
const modulePath = fileURLToPath(metaUrl)
const scriptPath = process.argv[1] ? resolve(process.argv[1]) : ''
return modulePath === scriptPath
}
async function main(): Promise<void> {
const options = parseCliOptions(process.argv.slice(2))
const candidates = await collectVideoCandidates(options.artifactsDir)
if (candidates.length === 0) {
process.stdout.write(
`No qa-session.mp4 files found under ${toProjectRelativePath(resolve(options.artifactsDir))}\n`
)
return
}
const selectedVideo = selectVideoCandidateByFile(candidates, {
artifactsDir: options.artifactsDir,
videoFile: options.videoFile
})
process.stdout.write(
`Selected ${selectedVideo.platformName}: ${toProjectRelativePath(selectedVideo.videoPath)}\n`
)
if (options.dryRun) {
process.stdout.write('\nDry run mode enabled, no API calls were made.\n')
return
}
const apiKey = process.env.GEMINI_API_KEY
if (!apiKey) {
throw new Error('GEMINI_API_KEY is required unless --dry-run is set')
}
await reviewVideo(selectedVideo, options, apiKey)
}
if (isExecutedAsScript(import.meta.url)) {
void main().catch((error: unknown) => {
const message = errorToString(error)
process.stderr.write(`qa-video-review failed: ${message}\n`)
process.exit(1)
})
}

View File

@@ -42,7 +42,6 @@ import type { StyleValue } from 'vue'
import { useIntersectionObserver } from '@/composables/useIntersectionObserver'
import { useMediaCache } from '@/services/mediaCacheService'
import type { ClassValue } from '@/utils/tailwindUtil'
const {
src,
@@ -54,8 +53,8 @@ const {
} = defineProps<{
src: string
alt?: string
containerClass?: ClassValue
imageClass?: ClassValue
containerClass?: string
imageClass?: string
imageStyle?: StyleValue
rootMargin?: string
}>()

View File

@@ -65,8 +65,7 @@ describe('DefaultThumbnail', () => {
const classString = Array.isArray(imageClass)
? imageClass.join(' ')
: imageClass
expect(classString).toContain('w-full')
expect(classString).toContain('h-full')
expect(classString).toContain('size-full')
expect(classString).toContain('object-cover')
})

View File

@@ -3,12 +3,14 @@
<LazyImage
:src="src"
:alt="alt"
:image-class="[
'transform-gpu transition-transform duration-300 ease-out',
isVideoType
? 'w-full h-full object-cover'
: 'max-w-full max-h-64 object-contain'
]"
:image-class="
cn(
'transform-gpu transition-transform duration-300 ease-out',
isVideoType
? 'size-full object-cover'
: 'max-h-64 max-w-full object-contain'
)
"
:image-style="
isHovered ? { transform: `scale(${1 + hoverZoom / 100})` } : undefined
"
@@ -19,6 +21,7 @@
<script setup lang="ts">
import LazyImage from '@/components/common/LazyImage.vue'
import BaseThumbnail from '@/components/templates/thumbnails/BaseThumbnail.vue'
import { cn } from '@/utils/tailwindUtil'
const { src, isVideo } = defineProps<{
src: string

View File

@@ -6,7 +6,6 @@ import type { DialogPassThroughOptions } from 'primevue/dialog'
import { markRaw, ref } from 'vue'
import type { Component } from 'vue'
import type GlobalDialog from '@/components/dialog/GlobalDialog.vue'
import type { ComponentAttrs } from 'vue-component-type-helpers'
type DialogPosition =
@@ -34,23 +33,19 @@ interface CustomDialogComponentProps {
headless?: boolean
}
export type DialogComponentProps = ComponentAttrs<typeof GlobalDialog> &
CustomDialogComponentProps
export type DialogComponentProps = CustomDialogComponentProps &
Record<string, unknown>
export interface DialogInstance<
H extends Component = Component,
B extends Component = Component,
F extends Component = Component
> {
export interface DialogInstance {
key: string
visible: boolean
title?: string
headerComponent?: H
headerProps?: ComponentAttrs<H>
component: B
contentProps: ComponentAttrs<B>
footerComponent?: F
footerProps?: ComponentAttrs<F>
headerComponent?: Component
headerProps?: Record<string, unknown>
component: Component
contentProps: Record<string, unknown>
footerComponent?: Component
footerProps?: Record<string, unknown>
dialogComponentProps: DialogComponentProps
priority: number
}