Commit Graph

74 Commits

Author SHA1 Message Date
snomiao
b11d054e8a fix: enforce menu navigation pattern + add CI job link to report
- Strengthen prompt: MUST use openMenu → hoverMenuItem → clickMenuItem
  in that order. Previous runs skipped openMenu causing silent failures.
- Add CI Job link to the QA report site header for quick navigation
  to the GitHub Actions run that generated the report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:31 +00:00
snomiao
8db85802df fix: split dual badge generator into separate step to fix expression length
GitHub Actions has a 21000 char limit per expression. The combined
badge setup step exceeded this after adding the dual badge generator.
Split into its own step.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:31 +00:00
snomiao
c03d0436bc fix: remove Bug: prefix from PR badge, just show REPRODUCED directly
Badge now reads: QA Bot | REPRODUCED | Fix: APPROVED
Not all issues are bugs — could be feature requests too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:31 +00:00
snomiao
28798a658b fix: use blue for REPRODUCED badge (success, not failure)
Reproducing a bug is a successful outcome for the QA bot.
Blue (#2196f3) = bot succeeded. Red = bot found problems with the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
b1895fe068 feat: combine PR bug+fix into single dual-segment badge
PRs now show one badge with three segments:
  QA Bot | Bug: REPRODUCED | Fix: APPROVED

Instead of two separate badges. Uses gen-badge-dual.sh which
renders label + bug status + fix status in one SVG.

Issues still use single two-segment badge:
  QA Bot | FINISHED: REPRODUCED

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
d353cba489 fix: feed QA guide + issue context to agentic reproduce loop
Root cause: runAgenticLoop never read the QA guide — agent saw
"No issue context provided" for issues. Now reads qaGuideFile,
parses structured fields, and injects into system prompt.

Also: fetch issue body via gh issue view in workflow, increase
budget to 120s/30 turns, add focus reminders, smarter stuck
detection (50px grid normalization + action-type frequency),
reject invalid click targets, add loadDefaultWorkflow and
openSettings convenience actions, strategy hints in prompt.

Fix pre-existing typecheck error in eslint.config.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
fe9d145ead feat: dual badges for PRs — bug reproduction + fix quality
PRs now get two separate badges:
- Bug: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (before branch)
- Fix: APPROVED / MAJOR ISSUES / MINOR ISSUES (after branch)

Issues keep a single badge: FINISHED: REPRODUCED / etc.

Both badge-bug.svg and badge-fix.svg served from the deploy site.
PR comment shows all three: ![badge] ![bug] ![fix]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
928db261f4 fix: badge FINISHED state includes result sub-state
FINISHED is not standalone — always shows result:
- FINISHED: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (issues)
- FINISHED: APPROVED / MAJOR ISSUES / MINOR ISSUES (PRs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
861cb7a782 feat: add SVG status badge to QA report site
Badge shows QA pipeline status, deployed at each stage:
- PREPARING (blue) — setting up artifacts
- ANALYZING (orange) — running video review
- Final status with color:
  - Issues: REPRODUCED (red) / NOT REPRODUCIBLE (gray) / PARTIAL (yellow)
  - PRs: APPROVED (green) / MAJOR ISSUES (red) / MINOR ISSUES (yellow)

Badge served as /badge.svg from the same Cloudflare Pages site.
Included in PR comment as ![QA Badge](url/badge.svg).

Also restore @ts-expect-error for import-x plugin type incompatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
aa0c020e89 feat: agentic screenshot feedback loop + multi-pass recording
Replace single-shot step generation in reproduce mode with an agentic
loop where Gemini sees the screen after each action and decides what
to do next. For multi-bug issues, decompose into sub-issues and run
separate recording passes.

- Extract executeAction() from executeSteps() for reuse
- Add reload and done action types
- Add captureScreenshotForGemini() (JPEG q50, ~50KB)
- Add runAgenticLoop() with sliding window history (3 screenshots)
- Add decomposeIssue() for multi-pass recording (1-3 sub-issues)
- Update workflow to handle numbered session videos (qa-session-1, etc.)
2026-03-31 07:26:30 +00:00
snomiao
7170918fd7 feat: add frame-by-frame video controls to QA report player
- Add custom video controls below each video with frame stepping
- Frame back/forward buttons (1 frame at 30fps, 10 frames skip)
- Speed selector: 0.1x, 0.25x, 0.5x (default), 1x, 1.5x, 2x
- Keyboard shortcuts: arrow keys for frame step, space for play/pause
- SMPTE-style timecode display (m:ss.ms)
- Default 0.5x speed since AI operates UI faster than humans
- Videos no longer autoplay (pause on load for inspection)
- Zero external dependencies (pure HTML5 video API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
cc7ec6fd32 feat: upgrade QA pipeline to Gemini 3.x models
- qa-record.ts, qa-analyze-pr.ts: gemini-2.5-flash/pro → gemini-3.1-pro-preview
- qa-video-review.ts, qa-generate-test.ts: gemini-2.5-flash → gemini-3-flash-preview
- pr-qa.yaml: update hardcoded model reference
- Add docs/qa/models.md with model comparison and rationale
2026-03-31 07:26:30 +00:00
snomiao
fe107d1f20 feat: add clickable issue/PR URLs to QA reports
- Add --target-url CLI option to qa-video-review.ts
- Include target URL in generated markdown reports
- Add clickable issue/PR link in deployed HTML report header
- Workflow passes the target URL automatically

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:26:30 +00:00
snomiao
2c08a98117 fix: use pulls API instead of gh pr view for PR/issue detection
gh pr view can't distinguish PRs from issues — it succeeds for both.
Use the REST API endpoint repos/{owner}/{repo}/pulls/{number} which
returns 404 for issues.
2026-03-31 07:26:29 +00:00
snomiao
836492e3dc fix: address Copilot review feedback on QA scripts
- Enforce requestTimeoutMs via Gemini SDK requestOptions
- Add 100MB video size check before base64 encoding
- Sanitize screenshot filenames to prevent path traversal
- Sort video files by mtime for reliable rename
- Validate --mode arg against allowed values
- Add Content-Length pre-check in downloadMedia
- Add GitHub domain allowlist for media downloads (SSRF mitigation)
- Add contents:write permission and git config for report job
- Update Node.js requirement in SKILL.md from 18+ to 22+
2026-03-31 07:26:29 +00:00
snomiao
0332f706fa feat: add behavior changes summary table to QA video review
Add a "Behavior Changes" table (Behavior, Before, After, Verdict)
alongside the existing timeline comparison. This gives reviewers a
quick high-level view of all behavioral differences before diving
into the frame-by-frame timeline.
2026-03-31 07:26:29 +00:00
snomiao
84a7b8ed62 fix: format before/after comparison as table in QA video review
Instruct Gemini to output the Before vs After section as a markdown
table with Time, Type, Severity, Before, After columns for easier
comparison. Update HTML template table styles with fixed layout and
column widths optimized for the 5-column comparison format.
2026-03-31 07:26:29 +00:00
snomiao
1adcd2aaa2 fix: make analyze-pr non-blocking and log Gemini response
- Log raw Gemini response for debugging when parsing fails
- Handle possible wrapper keys in response
- Make qa-before/qa-after run even if analyze-pr fails (only gate
  on resolve-matrix success)
2026-03-31 07:26:29 +00:00
snomiao
d28949b2b9 fix: extract PR number from sno-qa-<number> branch name
When running on push events for sno-qa-* branches without an open PR,
extract the PR number from the branch name so analyze-pr can fetch
the full PR thread for analysis.
2026-03-31 07:26:29 +00:00
snomiao
7375f16afe feat: add analyze-pr job to QA pipeline
Add Gemini Pro-powered PR analysis that generates targeted QA guides
from the full PR thread (description, comments, screenshots, diff).
The analyze-pr job runs on lightweight ubuntu before recordings start,
producing qa-guide-before.json and qa-guide-after.json that are
downloaded by recording jobs to produce more focused test steps.

Graceful fallback: if analysis fails, recordings proceed without guides.
2026-03-31 07:26:29 +00:00
snomiao
280ff17c15 fix: download before/after artifacts into separate directories
download-artifact@v7 merges all files flat regardless of
merge-multiple setting. Use separate path dirs (before/after)
and copy all files into the report directory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:29 +00:00
snomiao
cd9bbdae4c fix: set merge-multiple false for download-artifact v7
download-artifact@v7 defaults merge-multiple to true, which puts all
files flat in qa-artifacts/ instead of per-artifact subdirectories.
The merge step expects qa-artifacts/qa-before-{os}-{run}/ subdirs,
so the report directory never gets created and video review finds
no files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:29 +00:00
snomiao
36ce1fa406 refactor: split QA into parallel before/after jobs
Instead of running before/after sequentially in a single job with
fragile git stash/checkout gymnastics, split into two independent
parallel jobs on separate runners:

  resolve-matrix → qa-before (main) ─┐
                 → qa-after  (PR)   ─┴→ report

- qa-before: uses git worktree for clean main branch build
- qa-after: normal PR build via setup-frontend
- report: downloads both artifact sets, merges, runs Gemini review

Benefits:
- Clean workspace isolation (no git checkout origin/main -- .)
- ~2x faster (parallel execution)
- Each job gets its own ComfyUI server (no shared state)
- Eliminates entire class of workspace contamination bugs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:29 +00:00
snomiao
0bf8dbb130 fix: switch from Firefox to Chromium for WebGL canvas support
Firefox headless doesn't support WebGL, causing "getCanvas: canvas is
null" errors. Switch to Chromium which has full headless WebGL support.
Also fix login flow to wait for async router guard to settle and
create user via text input as fallback.
2026-03-31 07:26:29 +00:00
snomiao
441263dc64 fix: bypass login in QA recordings with localStorage pre-seeding
The QA recordings were stuck on the user selection screen because CI
has no existing users. Fix by pre-seeding localStorage with userId,
userName, and TutorialCompleted before navigation, plus creating a
qa-ci user via API as a fallback.
2026-03-31 07:26:29 +00:00
snomiao
641aee03c0 feat: autoplay and loop videos on QA dashboard 2026-03-31 07:26:29 +00:00
snomiao
d63a9058fb fix: handle flat artifact layout when no report.md exists
The normalize step couldn't create the qa-report-* subdir because it
only looked for *-report.md files. Add fallback to detect webm files.
2026-03-31 07:26:29 +00:00
snomiao
8402c63f99 fix: install main branch deps before building, then reinstall PR deps
Main branch imports @vueuse/router which isn't in PR's node_modules.
Need both: main deps for building main, PR deps for running QA scripts.
2026-03-31 07:26:29 +00:00
snomiao
588f19a774 fix: reinstall PR deps after main branch build to restore @google/generative-ai
The main branch build step was running pnpm install with main's lockfile,
which removed @google/generative-ai from node_modules. Move the reinstall
to after restoring PR files so the QA recording script can find its deps.
2026-03-31 07:26:29 +00:00
snomiao
e5f358057c fix: temporarily disable concurrency group to unstick QA runs 2026-03-31 07:26:28 +00:00
snomiao
e8e1e0eb32 fix: restore PR files with git checkout HEAD instead of git checkout -
git checkout - uses @{-1} which requires a previous branch switch.
Since we use 'git checkout origin/main -- .' (file checkout, not branch
switch), there is no @{-1} ref. Use HEAD to restore from current branch.

Also restore proper concurrency group.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
2f59c85469 fix: use unique concurrency group to unstick QA runs 2026-03-31 07:26:28 +00:00
snomiao
9f4b14ab9b feat: integrate comfy-qa skill test plan into QA recording pipeline
Pass the comprehensive test plan from .claude/skills/comfy-qa/SKILL.md
to Gemini when generating test steps. This gives Gemini knowledge of all
12 QA categories (canvas, menus, sidebar, settings, etc.) so it picks
the most relevant tests for each PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
f560eb8fb4 fix: use vite build directly to skip nx typecheck dependency
nx build runs typecheck as a prerequisite (via @nx/vite/plugin config).
Use vite build directly for the main branch comparison build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
3311d12546 fix: skip typecheck when building main branch for QA comparison
Main branch may have transient TS errors when built with the PR
branch's lockfile. Since we only need the dist for visual comparison,
run nx build directly instead of pnpm build (which includes typecheck).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
4e5f683185 refactor: replace Codex with direct Playwright recording in QA pipeline
Replace the unreliable codex exec approach with a Playwright script
(qa-record.ts) that uses Gemini to generate targeted test steps from
the PR diff, then executes them deterministically via Playwright's API.

Key changes:
- New scripts/qa-record.ts: Gemini generates JSON test actions, Playwright
  executes them with reliable helper functions (menu nav, dialog fill, etc.)
- Remove codex CLI and playwright-cli dependencies
- Remove 150+ lines of prompt templates from pr-qa.yaml
- Firefox headless with video recording (same approach proven locally)
- Fallback steps if Gemini fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
6993a7ad5f feat: auto-generate regression tests from QA reports
- Tighten BEFORE prompt to 15s snapshot (show old state only)
- Add qa-generate-test.ts: Gemini-powered Playwright test generator
- New workflow step: generate .spec.ts and push to {branch}-add-qa-test
- Tests assert UIUX behavior (tab names, dirty state, visibility)
2026-03-31 07:26:28 +00:00
snomiao
eb0ce5ed4e feat: before/after video comparison for QA pipeline
- Build both main (dist-before/) and PR (dist/) frontends in focused mode
- Run QA twice: BEFORE on main branch frontend, AFTER on PR branch
- Send both videos to Gemini in one request for comparative analysis
- Side-by-side dashboard layout with Before (main) / After (PR) panels
- Comparative prompt evaluates whether before confirms old behavior
  and after proves the fix works
- Falls back to single-video mode when no before video available
2026-03-31 07:26:28 +00:00
snomiao
06e4a512f0 fix: make QA videos seekable with faststart and frequent keyframes
moov atom was at end of file (8.6MB offset) — browser had to download
the entire video before seeking. Keyframes were only every 10 seconds.

Add -movflags +faststart (moov before mdat) and -g 60 (keyframe every
2.4s at 25fps) to ffmpeg conversion.
2026-03-31 07:26:28 +00:00
snomiao
5ad94cfaef fix: issue cards instead of dense table, rename to comfy-qa.pages.dev
- Replace 6-column confirmed issues table with vertical card blocks
  using colored severity/timestamp/confidence badges
- Rename Cloudflare Pages project from comfyui-qa-videos to comfy-qa
2026-03-31 07:26:28 +00:00
snomiao
7d34ffe6d4 fix: seekable video, hide empty cards, PR-aware video review
- Remove autoplay/loop so video timeline is seekable
- Skip card generation for platforms without recordings
- Add --pr-context flag to qa-video-review.ts so Gemini evaluates
  against PR purpose instead of just describing what happened
- Workflow now builds pr-context.txt from PR title/body/diff
2026-03-31 07:26:28 +00:00
snomiao
d0dc720698 feat: redesign QA dashboard with modern frontend design
OKLCH color tokens, liquid glass card surfaces, Inter + JetBrains Mono
typography, grain texture overlay, staggered fade-up animations, pill
action buttons with SVG icons, and improved report table styling.
2026-03-31 07:26:28 +00:00
snomiao
1a7ea13660 fix: make settings pre-seed non-fatal and try both API endpoints
The /api/settings endpoint returned 4xx in CI. Try both /api/settings
and /settings endpoints, and don't fail the job if neither works.
2026-03-31 07:26:28 +00:00
snomiao
cac0660de4 fix: pre-seed Comfy.TutorialCompleted to skip template gallery in QA
The Codex agent was spending 35s browsing the "Getting Started" template
gallery instead of testing the PR's changes. Pre-seeding this setting
via the ComfyUI API ensures the agent lands directly in the graph editor.
2026-03-31 07:26:28 +00:00
snomiao
421bf2ac9e fix: tighten focused QA prompt to only test PR-specific behavior
The Codex agent was spending time on login flow, template browsing,
and general smoke testing instead of testing the PR's actual changes.

Changes:
- Add 30-second time budget for video recording
- Move video-start AFTER login and editor verification
- Explicitly prohibit template browsing and sidebar exploration
- Reduce test steps to 3-6 targeted actions
- Restructure prompt with clear Instructions/Rules sections
2026-03-31 07:26:28 +00:00
snomiao
6bb2f0346f fix: render markdown in QA reports with marked.js
Replace crude sed-based markdown conversion with client-side
rendering via marked.js CDN. Adds proper table, list, and
code styling for the report section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
74dbf84e4b fix: run report job on workflow_dispatch events
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
76c6efcc50 refactor: replace GPT frame extraction with Gemini native video analysis
Replace the OpenAI GPT-based frame extraction approach (ffmpeg + screenshots)
with Gemini 2.5 Flash's native video understanding. This eliminates false
positives from frame-based analysis (e.g. "black screen = critical bug" during
page transitions) and produces dramatically better QA reviews.

Changes:
- Remove ffmpeg frame extraction, ffprobe duration detection, and all related
  logic (~365 lines removed)
- Add @google/generative-ai SDK for native video/mp4 upload to Gemini
- Update CLI: remove --max-frames, --min-interval-seconds, --keep-frames flags
- Update env: OPENAI_API_KEY → GEMINI_API_KEY
- Update workflow: swap API key secret and model in pr-qa.yaml
- Update report: replace "Frames analyzed" with "Video size"
- Add note in prompt that brief black frames during transitions are normal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 07:26:28 +00:00
snomiao
e787340c16 fix: use fill+click for quick login before video recording
playwright-cli doesn't support 'evaluate' command. Instead, instruct
Codex to quickly fill the username input and click Next on user-select
page BEFORE starting video recording, so the video only shows actual
QA testing.
2026-03-31 07:26:28 +00:00
snomiao
ae7c423a7e fix: use evaluate to set localStorage before video recording
storageState config doesn't work with playwright-cli. Instead, use
evaluate to set Comfy.userId/userName after opening the page, then
navigate back. This skips user-select before video-start so the
recording only shows actual QA testing.
2026-03-31 07:26:28 +00:00