Pass the comprehensive test plan from .claude/skills/comfy-qa/SKILL.md
to Gemini when generating test steps. This gives Gemini knowledge of all
12 QA categories (canvas, menus, sidebar, settings, etc.) so it picks
the most relevant tests for each PR.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nx build runs typecheck as a prerequisite (via @nx/vite/plugin config).
Use vite build directly for the main branch comparison build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main branch may have transient TS errors when built with the PR
branch's lockfile. Since we only need the dist for visual comparison,
run nx build directly instead of pnpm build (which includes typecheck).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the unreliable codex exec approach with a Playwright script
(qa-record.ts) that uses Gemini to generate targeted test steps from
the PR diff, then executes them deterministically via Playwright's API.
Key changes:
- New scripts/qa-record.ts: Gemini generates JSON test actions, Playwright
executes them with reliable helper functions (menu nav, dialog fill, etc.)
- Remove codex CLI and playwright-cli dependencies
- Remove 150+ lines of prompt templates from pr-qa.yaml
- Firefox headless with video recording (same approach proven locally)
- Fallback steps if Gemini fails
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Tighten BEFORE prompt to 15s snapshot (show old state only)
- Add qa-generate-test.ts: Gemini-powered Playwright test generator
- New workflow step: generate .spec.ts and push to {branch}-add-qa-test
- Tests assert UIUX behavior (tab names, dirty state, visibility)
- Build both main (dist-before/) and PR (dist/) frontends in focused mode
- Run QA twice: BEFORE on main branch frontend, AFTER on PR branch
- Send both videos to Gemini in one request for comparative analysis
- Side-by-side dashboard layout with Before (main) / After (PR) panels
- Comparative prompt evaluates whether before confirms old behavior
and after proves the fix works
- Falls back to single-video mode when no before video available
moov atom was at end of file (8.6MB offset) — browser had to download
the entire video before seeking. Keyframes were only every 10 seconds.
Add -movflags +faststart (moov before mdat) and -g 60 (keyframe every
2.4s at 25fps) to ffmpeg conversion.
- Remove autoplay/loop so video timeline is seekable
- Skip card generation for platforms without recordings
- Add --pr-context flag to qa-video-review.ts so Gemini evaluates
against PR purpose instead of just describing what happened
- Workflow now builds pr-context.txt from PR title/body/diff
The Codex agent was spending 35s browsing the "Getting Started" template
gallery instead of testing the PR's changes. Pre-seeding this setting
via the ComfyUI API ensures the agent lands directly in the graph editor.
The Codex agent was spending time on login flow, template browsing,
and general smoke testing instead of testing the PR's actual changes.
Changes:
- Add 30-second time budget for video recording
- Move video-start AFTER login and editor verification
- Explicitly prohibit template browsing and sidebar exploration
- Reduce test steps to 3-6 targeted actions
- Restructure prompt with clear Instructions/Rules sections
Replace crude sed-based markdown conversion with client-side
rendering via marked.js CDN. Adds proper table, list, and
code styling for the report section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the OpenAI GPT-based frame extraction approach (ffmpeg + screenshots)
with Gemini 2.5 Flash's native video understanding. This eliminates false
positives from frame-based analysis (e.g. "black screen = critical bug" during
page transitions) and produces dramatically better QA reviews.
Changes:
- Remove ffmpeg frame extraction, ffprobe duration detection, and all related
logic (~365 lines removed)
- Add @google/generative-ai SDK for native video/mp4 upload to Gemini
- Update CLI: remove --max-frames, --min-interval-seconds, --keep-frames flags
- Update env: OPENAI_API_KEY → GEMINI_API_KEY
- Update workflow: swap API key secret and model in pr-qa.yaml
- Update report: replace "Frames analyzed" with "Video size"
- Add note in prompt that brief black frames during transitions are normal
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
playwright-cli doesn't support 'evaluate' command. Instead, instruct
Codex to quickly fill the username input and click Next on user-select
page BEFORE starting video recording, so the video only shows actual
QA testing.
storageState config doesn't work with playwright-cli. Instead, use
evaluate to set Comfy.userId/userName after opening the page, then
navigate back. This skips user-select before video-start so the
recording only shows actual QA testing.
Write a Playwright storageState JSON with Comfy.userId/userName pre-set
so the app loads directly to the graph editor. Saves ~40s per QA run
that was wasted on navigating the user-select page.
The convert step was using find which picked up a 0-byte file from
playwright's videos/ directory instead of the valid qa-session.webm.
Now prefers qa-session.webm explicitly and skips empty files.
Codex was using pnpm dlx instead of the global playwright-cli.
Pre-install chromium in setup step and make prompt explicit about
using the global command directly without pnpm/npx.
Replace claude --print with codex exec for cheaper QA runs.
Uses codex-mini-latest model ($1.50/$6 vs Sonnet $3/$15).
Uses existing OPENAI_API_KEY secret (no new secrets needed).
- Replace saveVideo config (didn't produce video) with explicit
playwright-cli video-start/video-stop commands in QA prompt
- Remove apt-get install ffmpeg step (pre-installed on GH runners)
- Switch video review model from gpt-4o to gpt-4.1-mini
- Enable saveVideo in playwright-cli config for real video recording
- Replace screenshot stitching with webm→mp4 conversion
- Move video review step before deploy so reports are included
- Add GPT video review reports inline on the Cloudflare Pages site
- Each video card now has expandable "GPT Video Review" section
- Set .playwright/cli.config.json with outputDir pointing to screenshots/
- This way bare 'playwright-cli screenshot' auto-saves to the right place
- Create screenshot directory before Claude runs (don't rely on Claude)
- Collect step now searches working directory for stray PNGs
- Simplified prompt: no --filename needed, just 'playwright-cli screenshot'
Screenshots were saved to artifact root but stitch looked in frames/.
Now: prompt tells Claude to save to screenshots/ dir with numbered names,
collect step consolidates PNGs there, stitch step globs from screenshots/.
Removed video-start/video-stop (Claude doesn't use them).
- Add playwright-cli config with outputDir and saveVideo
- Use video-start/video-stop instead of relying on screenshot frames
- Add fallback artifact collection from .playwright-cli/ default dir
- Simplify prompts to focus on video recording workflow
The escaped \$QA_ARTIFACTS in the heredoc produced literal text
'$QA_ARTIFACTS' in the prompt. Claude's Bash tool didn't reliably
expand this env var, so no screenshots or reports were saved.
Remove the escapes so the heredoc expands the variable to the actual
path (e.g. /home/runner/work/_temp/qa-artifacts).
Backtick-wrapped playwright-cli examples in the unquoted heredoc were
being interpreted as bash command substitution, producing empty prompts.
Replace backtick syntax with plain "Run:" prefixed commands.
- Remove all Xvfb/ffmpeg screen recording infrastructure from qa job
(captured blank display since playwright-cli runs headless)
- Add screenshot instructions to QA prompts: Claude saves sequential
frames to $QA_ARTIFACTS/frames/ after every interaction
- Stitch screenshots into video via ffmpeg in report job (2fps)
- Merge video-review job into report job (4 jobs → 3 jobs)
- Unified PR comment with video links + video review in <details> collapse
- Clean up stale QA_VIDEO_REVIEW_COMMENT markers from prior runs
Add Claude Code skills and a label-triggered QA workflow:
- .claude/skills/comfy-qa/SKILL.md: 12-category QA test plan using
playwright-cli for browser automation
- .github/workflows/pr-qa.yaml: CI workflow triggered by qa-changes
(focused, Linux) or qa-full (3-OS matrix) labels. Records screen via
ffmpeg, runs Claude CLI with playwright-cli, deploys video gallery to
Cloudflare Pages, posts PR comment with GIF thumbnails, and runs
OpenAI vision-based video review
- scripts/qa-video-review.ts: frame extraction + GPT-4o analysis
- scripts/qa-video-review.test.ts: unit tests for video review
- knip.config.ts: resolve knip errors for ingest-types package