Commit Graph

54 Commits

Author SHA1 Message Date
snomiao
7502528733 fix: download before/after artifacts into separate directories
download-artifact@v7 merges all files flat regardless of
merge-multiple setting. Use separate path dirs (before/after)
and copy all files into the report directory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
0656091959 fix: set merge-multiple false for download-artifact v7
download-artifact@v7 defaults merge-multiple to true, which puts all
files flat in qa-artifacts/ instead of per-artifact subdirectories.
The merge step expects qa-artifacts/qa-before-{os}-{run}/ subdirs,
so the report directory never gets created and video review finds
no files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
6515170d08 refactor: split QA into parallel before/after jobs
Instead of running before/after sequentially in a single job with
fragile git stash/checkout gymnastics, split into two independent
parallel jobs on separate runners:

  resolve-matrix → qa-before (main) ─┐
                 → qa-after  (PR)   ─┴→ report

- qa-before: uses git worktree for clean main branch build
- qa-after: normal PR build via setup-frontend
- report: downloads both artifact sets, merges, runs Gemini review

Benefits:
- Clean workspace isolation (no git checkout origin/main -- .)
- ~2x faster (parallel execution)
- Each job gets its own ComfyUI server (no shared state)
- Eliminates entire class of workspace contamination bugs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:29 +00:00
snomiao
389f6ca6b8 fix: switch from Firefox to Chromium for WebGL canvas support
Firefox headless doesn't support WebGL, causing "getCanvas: canvas is
null" errors. Switch to Chromium which has full headless WebGL support.
Also fix login flow to wait for async router guard to settle and
create user via text input as fallback.
2026-03-25 03:17:29 +00:00
snomiao
c5d207fa9a fix: bypass login in QA recordings with localStorage pre-seeding
The QA recordings were stuck on the user selection screen because CI
has no existing users. Fix by pre-seeding localStorage with userId,
userName, and TutorialCompleted before navigation, plus creating a
qa-ci user via API as a fallback.
2026-03-25 03:17:29 +00:00
snomiao
ab6dff02c9 feat: autoplay and loop videos on QA dashboard 2026-03-25 03:17:29 +00:00
snomiao
7396d39a6a fix: handle flat artifact layout when no report.md exists
The normalize step couldn't create the qa-report-* subdir because it
only looked for *-report.md files. Add fallback to detect webm files.
2026-03-25 03:17:29 +00:00
snomiao
47e5c39ac9 fix: install main branch deps before building, then reinstall PR deps
Main branch imports @vueuse/router which isn't in PR's node_modules.
Need both: main deps for building main, PR deps for running QA scripts.
2026-03-25 03:17:29 +00:00
snomiao
28f530d53a fix: reinstall PR deps after main branch build to restore @google/generative-ai
The main branch build step was running pnpm install with main's lockfile,
which removed @google/generative-ai from node_modules. Move the reinstall
to after restoring PR files so the QA recording script can find its deps.
2026-03-25 03:17:28 +00:00
snomiao
282754743d fix: temporarily disable concurrency group to unstick QA runs 2026-03-25 03:17:28 +00:00
snomiao
0a91ec09e8 fix: restore PR files with git checkout HEAD instead of git checkout -
git checkout - uses @{-1} which requires a previous branch switch.
Since we use 'git checkout origin/main -- .' (file checkout, not branch
switch), there is no @{-1} ref. Use HEAD to restore from current branch.

Also restore proper concurrency group.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
25fd1b2700 fix: use unique concurrency group to unstick QA runs 2026-03-25 03:17:28 +00:00
snomiao
746d465912 feat: integrate comfy-qa skill test plan into QA recording pipeline
Pass the comprehensive test plan from .claude/skills/comfy-qa/SKILL.md
to Gemini when generating test steps. This gives Gemini knowledge of all
12 QA categories (canvas, menus, sidebar, settings, etc.) so it picks
the most relevant tests for each PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
e314a18b90 fix: use vite build directly to skip nx typecheck dependency
nx build runs typecheck as a prerequisite (via @nx/vite/plugin config).
Use vite build directly for the main branch comparison build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
78fb9ef27f fix: skip typecheck when building main branch for QA comparison
Main branch may have transient TS errors when built with the PR
branch's lockfile. Since we only need the dist for visual comparison,
run nx build directly instead of pnpm build (which includes typecheck).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
41999c2e0f refactor: replace Codex with direct Playwright recording in QA pipeline
Replace the unreliable codex exec approach with a Playwright script
(qa-record.ts) that uses Gemini to generate targeted test steps from
the PR diff, then executes them deterministically via Playwright's API.

Key changes:
- New scripts/qa-record.ts: Gemini generates JSON test actions, Playwright
  executes them with reliable helper functions (menu nav, dialog fill, etc.)
- Remove codex CLI and playwright-cli dependencies
- Remove 150+ lines of prompt templates from pr-qa.yaml
- Firefox headless with video recording (same approach proven locally)
- Fallback steps if Gemini fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
5c243d16be feat: auto-generate regression tests from QA reports
- Tighten BEFORE prompt to 15s snapshot (show old state only)
- Add qa-generate-test.ts: Gemini-powered Playwright test generator
- New workflow step: generate .spec.ts and push to {branch}-add-qa-test
- Tests assert UIUX behavior (tab names, dirty state, visibility)
2026-03-25 03:17:28 +00:00
snomiao
774dcd823e feat: before/after video comparison for QA pipeline
- Build both main (dist-before/) and PR (dist/) frontends in focused mode
- Run QA twice: BEFORE on main branch frontend, AFTER on PR branch
- Send both videos to Gemini in one request for comparative analysis
- Side-by-side dashboard layout with Before (main) / After (PR) panels
- Comparative prompt evaluates whether before confirms old behavior
  and after proves the fix works
- Falls back to single-video mode when no before video available
2026-03-25 03:17:28 +00:00
snomiao
b500f826fc fix: make QA videos seekable with faststart and frequent keyframes
moov atom was at end of file (8.6MB offset) — browser had to download
the entire video before seeking. Keyframes were only every 10 seconds.

Add -movflags +faststart (moov before mdat) and -g 60 (keyframe every
2.4s at 25fps) to ffmpeg conversion.
2026-03-25 03:17:28 +00:00
snomiao
1cd9e171c6 fix: issue cards instead of dense table, rename to comfy-qa.pages.dev
- Replace 6-column confirmed issues table with vertical card blocks
  using colored severity/timestamp/confidence badges
- Rename Cloudflare Pages project from comfyui-qa-videos to comfy-qa
2026-03-25 03:17:28 +00:00
snomiao
5c0bef9b72 fix: seekable video, hide empty cards, PR-aware video review
- Remove autoplay/loop so video timeline is seekable
- Skip card generation for platforms without recordings
- Add --pr-context flag to qa-video-review.ts so Gemini evaluates
  against PR purpose instead of just describing what happened
- Workflow now builds pr-context.txt from PR title/body/diff
2026-03-25 03:17:28 +00:00
snomiao
94e1388495 feat: redesign QA dashboard with modern frontend design
OKLCH color tokens, liquid glass card surfaces, Inter + JetBrains Mono
typography, grain texture overlay, staggered fade-up animations, pill
action buttons with SVG icons, and improved report table styling.
2026-03-25 03:17:28 +00:00
snomiao
0268e8f977 fix: make settings pre-seed non-fatal and try both API endpoints
The /api/settings endpoint returned 4xx in CI. Try both /api/settings
and /settings endpoints, and don't fail the job if neither works.
2026-03-25 03:17:28 +00:00
snomiao
4f1df7c7ce fix: pre-seed Comfy.TutorialCompleted to skip template gallery in QA
The Codex agent was spending 35s browsing the "Getting Started" template
gallery instead of testing the PR's changes. Pre-seeding this setting
via the ComfyUI API ensures the agent lands directly in the graph editor.
2026-03-25 03:17:28 +00:00
snomiao
3b2fdc786a fix: tighten focused QA prompt to only test PR-specific behavior
The Codex agent was spending time on login flow, template browsing,
and general smoke testing instead of testing the PR's actual changes.

Changes:
- Add 30-second time budget for video recording
- Move video-start AFTER login and editor verification
- Explicitly prohibit template browsing and sidebar exploration
- Reduce test steps to 3-6 targeted actions
- Restructure prompt with clear Instructions/Rules sections
2026-03-25 03:17:28 +00:00
snomiao
4d1ad4dcf0 fix: render markdown in QA reports with marked.js
Replace crude sed-based markdown conversion with client-side
rendering via marked.js CDN. Adds proper table, list, and
code styling for the report section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
7ba1aaed53 fix: run report job on workflow_dispatch events
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
09a3c10d50 refactor: replace GPT frame extraction with Gemini native video analysis
Replace the OpenAI GPT-based frame extraction approach (ffmpeg + screenshots)
with Gemini 2.5 Flash's native video understanding. This eliminates false
positives from frame-based analysis (e.g. "black screen = critical bug" during
page transitions) and produces dramatically better QA reviews.

Changes:
- Remove ffmpeg frame extraction, ffprobe duration detection, and all related
  logic (~365 lines removed)
- Add @google/generative-ai SDK for native video/mp4 upload to Gemini
- Update CLI: remove --max-frames, --min-interval-seconds, --keep-frames flags
- Update env: OPENAI_API_KEY → GEMINI_API_KEY
- Update workflow: swap API key secret and model in pr-qa.yaml
- Update report: replace "Frames analyzed" with "Video size"
- Add note in prompt that brief black frames during transitions are normal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:17:28 +00:00
snomiao
2f30fbe060 fix: use fill+click for quick login before video recording
playwright-cli doesn't support 'evaluate' command. Instead, instruct
Codex to quickly fill the username input and click Next on user-select
page BEFORE starting video recording, so the video only shows actual
QA testing.
2026-03-25 03:17:27 +00:00
snomiao
85adbe4878 fix: use evaluate to set localStorage before video recording
storageState config doesn't work with playwright-cli. Instead, use
evaluate to set Comfy.userId/userName after opening the page, then
navigate back. This skips user-select before video-start so the
recording only shows actual QA testing.
2026-03-25 03:17:27 +00:00
snomiao
0c707b5deb fix: pre-seed localStorage to skip user-select in QA runs
Write a Playwright storageState JSON with Comfy.userId/userName pre-set
so the app loads directly to the graph editor. Saves ~40s per QA run
that was wasted on navigating the user-select page.
2026-03-25 03:17:27 +00:00
snomiao
4f98518c22 fix: prefer explicit qa-session.webm over corrupt auto-recorded videos
The convert step was using find which picked up a 0-byte file from
playwright's videos/ directory instead of the valid qa-session.webm.
Now prefers qa-session.webm explicitly and skips empty files.
2026-03-25 03:17:27 +00:00
snomiao
ca394b9ff7 fix: improve focused QA prompt to test PR-specific behavior, not random walk 2026-03-25 03:17:27 +00:00
snomiao
51878db99d fix: re-add push trigger for sno-skills and sno-qa-* branches 2026-03-25 03:17:27 +00:00
snomiao
d0534003e7 fix: also install ffprobe for GPT video review frame extraction 2026-03-25 03:17:27 +00:00
snomiao
3d0cd72465 fix: use sudo for ffmpeg static binary extraction to /usr/local/bin 2026-03-25 03:17:27 +00:00
snomiao
82849df891 fix: use static ffmpeg binary instead of apt-get (avoids dpkg lock hang) 2026-03-25 03:17:27 +00:00
snomiao
1c62c0edc3 fix: add ffmpeg install back (not pre-installed on GH runners) 2026-03-25 03:17:27 +00:00
snomiao
6ea2ce755d fix: normalize flat artifact download into expected subdirectory 2026-03-25 03:17:27 +00:00
snomiao
8fc4480ee2 fix: pre-install chromium and clarify prompt for codex
Codex was using pnpm dlx instead of the global playwright-cli.
Pre-install chromium in setup step and make prompt explicit about
using the global command directly without pnpm/npx.
2026-03-25 03:17:27 +00:00
snomiao
3515a478fd fix: add debug output to video convert step 2026-03-25 03:17:27 +00:00
snomiao
11432992d3 fix: use danger-full-access sandbox for codex on GH Actions 2026-03-25 03:17:27 +00:00
snomiao
e619b0143a fix: use correct codex model name gpt-5.4-mini 2026-03-25 03:17:27 +00:00
snomiao
7d4a008f29 feat: switch QA from Claude Code to OpenAI Codex CLI
Replace claude --print with codex exec for cheaper QA runs.
Uses codex-mini-latest model ($1.50/$6 vs Sonnet $3/$15).
Uses existing OPENAI_API_KEY secret (no new secrets needed).
2026-03-25 03:17:27 +00:00
snomiao
a3e65140a9 fix: default to linux-only QA, full 3-OS only via qa-full label
Reduces per-run cost from ~$10-16 to ~$2.50 by defaulting to
Linux-only. Use qa-full label or workflow_dispatch for 3-OS runs.
2026-03-25 03:17:27 +00:00
snomiao
f55ab36dd7 fix: use explicit video-start/stop, remove ffmpeg install, use gpt-4.1-mini
- Replace saveVideo config (didn't produce video) with explicit
  playwright-cli video-start/video-stop commands in QA prompt
- Remove apt-get install ffmpeg step (pre-installed on GH runners)
- Switch video review model from gpt-4o to gpt-4.1-mini
2026-03-25 03:17:27 +00:00
snomiao
698a894b42 fix: use auto video recording and show GPT reports on QA site
- Enable saveVideo in playwright-cli config for real video recording
- Replace screenshot stitching with webm→mp4 conversion
- Move video review step before deploy so reports are included
- Add GPT video review reports inline on the Cloudflare Pages site
- Each video card now has expandable "GPT Video Review" section
2026-03-25 03:17:27 +00:00
snomiao
fbd7f404ef fix: configure playwright-cli outputDir and improve artifact collection
- Set .playwright/cli.config.json with outputDir pointing to screenshots/
- This way bare 'playwright-cli screenshot' auto-saves to the right place
- Create screenshot directory before Claude runs (don't rely on Claude)
- Collect step now searches working directory for stray PNGs
- Simplified prompt: no --filename needed, just 'playwright-cli screenshot'
2026-03-25 03:17:27 +00:00
snomiao
d633ce19a7 fix: stitch screenshots from correct directory and simplify prompt
Screenshots were saved to artifact root but stitch looked in frames/.
Now: prompt tells Claude to save to screenshots/ dir with numbered names,
collect step consolidates PNGs there, stitch step globs from screenshots/.
Removed video-start/video-stop (Claude doesn't use them).
2026-03-25 03:17:27 +00:00
snomiao
9a7b5f88a0 fix: use playwright-cli video recording and collect default output
- Add playwright-cli config with outputDir and saveVideo
- Use video-start/video-stop instead of relying on screenshot frames
- Add fallback artifact collection from .playwright-cli/ default dir
- Simplify prompts to focus on video recording workflow
2026-03-25 03:17:27 +00:00