- Add issues:[labeled] trigger and qa-issue label support
- Resolve github.event.issue.number for issue-triggered runs
- Include issue labels in context (feeds keyword matcher for hints)
- Remove qa-issue label after run completes (same as qa-changes/qa-full)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
setup-frontend must run first to install node/pnpm, then rebuild
with PR code. Also re-install sno-skills deps after switching back
so QA scripts' dependencies are available.
Also gitignore .claude/scheduled_tasks.lock.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When triggered via sno-qa-* push, the workflow checks out the PR code
to build its frontend, but this replaces qa-record.ts which only
exists on sno-skills. Fix: build PR frontend, then checkout back to
sno-skills so QA scripts remain available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Check INCONCLUSIVE before reproduced/confirmed in badge detection
- Exclude markdown headings from reproduced grep match
- Add --pass-label to qa-video-review.ts for unique multi-pass filenames
- Pass pass label from workflow YAML when reviewing numbered sessions
- Collect all pass-specific reports in deploy script HTML
- Add addNode/cloneNode convenience actions to qa-record agent
- Improve strategy hints for visual/rendering bug reproduction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove push trigger (was for dev testing only)
- Restore concurrency group (was commented out for dev)
- Move misplaced import in qa-analyze-pr.ts to top of file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Strengthen prompt: MUST use openMenu → hoverMenuItem → clickMenuItem
in that order. Previous runs skipped openMenu causing silent failures.
- Add CI Job link to the QA report site header for quick navigation
to the GitHub Actions run that generated the report.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitHub Actions has a 21000 char limit per expression. The combined
badge setup step exceeded this after adding the dual badge generator.
Split into its own step.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Badge now reads: QA Bot | REPRODUCED | Fix: APPROVED
Not all issues are bugs — could be feature requests too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reproducing a bug is a successful outcome for the QA bot.
Blue (#2196f3) = bot succeeded. Red = bot found problems with the fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PRs now show one badge with three segments:
QA Bot | Bug: REPRODUCED | Fix: APPROVED
Instead of two separate badges. Uses gen-badge-dual.sh which
renders label + bug status + fix status in one SVG.
Issues still use single two-segment badge:
QA Bot | FINISHED: REPRODUCED
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: runAgenticLoop never read the QA guide — agent saw
"No issue context provided" for issues. Now reads qaGuideFile,
parses structured fields, and injects into system prompt.
Also: fetch issue body via gh issue view in workflow, increase
budget to 120s/30 turns, add focus reminders, smarter stuck
detection (50px grid normalization + action-type frequency),
reject invalid click targets, add loadDefaultWorkflow and
openSettings convenience actions, strategy hints in prompt.
Fix pre-existing typecheck error in eslint.config.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PRs now get two separate badges:
- Bug: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (before branch)
- Fix: APPROVED / MAJOR ISSUES / MINOR ISSUES (after branch)
Issues keep a single badge: FINISHED: REPRODUCED / etc.
Both badge-bug.svg and badge-fix.svg served from the deploy site.
PR comment shows all three: ![badge] ![bug] ![fix]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FINISHED is not standalone — always shows result:
- FINISHED: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (issues)
- FINISHED: APPROVED / MAJOR ISSUES / MINOR ISSUES (PRs)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Badge shows QA pipeline status, deployed at each stage:
- PREPARING (blue) — setting up artifacts
- ANALYZING (orange) — running video review
- Final status with color:
- Issues: REPRODUCED (red) / NOT REPRODUCIBLE (gray) / PARTIAL (yellow)
- PRs: APPROVED (green) / MAJOR ISSUES (red) / MINOR ISSUES (yellow)
Badge served as /badge.svg from the same Cloudflare Pages site.
Included in PR comment as .
Also restore @ts-expect-error for import-x plugin type incompatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace single-shot step generation in reproduce mode with an agentic
loop where Gemini sees the screen after each action and decides what
to do next. For multi-bug issues, decompose into sub-issues and run
separate recording passes.
- Extract executeAction() from executeSteps() for reuse
- Add reload and done action types
- Add captureScreenshotForGemini() (JPEG q50, ~50KB)
- Add runAgenticLoop() with sliding window history (3 screenshots)
- Add decomposeIssue() for multi-pass recording (1-3 sub-issues)
- Update workflow to handle numbered session videos (qa-session-1, etc.)
- Add custom video controls below each video with frame stepping
- Frame back/forward buttons (1 frame at 30fps, 10 frames skip)
- Speed selector: 0.1x, 0.25x, 0.5x (default), 1x, 1.5x, 2x
- Keyboard shortcuts: arrow keys for frame step, space for play/pause
- SMPTE-style timecode display (m:ss.ms)
- Default 0.5x speed since AI operates UI faster than humans
- Videos no longer autoplay (pause on load for inspection)
- Zero external dependencies (pure HTML5 video API)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add --target-url CLI option to qa-video-review.ts
- Include target URL in generated markdown reports
- Add clickable issue/PR link in deployed HTML report header
- Workflow passes the target URL automatically
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gh pr view can't distinguish PRs from issues — it succeeds for both.
Use the REST API endpoint repos/{owner}/{repo}/pulls/{number} which
returns 404 for issues.
- Enforce requestTimeoutMs via Gemini SDK requestOptions
- Add 100MB video size check before base64 encoding
- Sanitize screenshot filenames to prevent path traversal
- Sort video files by mtime for reliable rename
- Validate --mode arg against allowed values
- Add Content-Length pre-check in downloadMedia
- Add GitHub domain allowlist for media downloads (SSRF mitigation)
- Add contents:write permission and git config for report job
- Update Node.js requirement in SKILL.md from 18+ to 22+
Add a "Behavior Changes" table (Behavior, Before, After, Verdict)
alongside the existing timeline comparison. This gives reviewers a
quick high-level view of all behavioral differences before diving
into the frame-by-frame timeline.
Instruct Gemini to output the Before vs After section as a markdown
table with Time, Type, Severity, Before, After columns for easier
comparison. Update HTML template table styles with fixed layout and
column widths optimized for the 5-column comparison format.
- Log raw Gemini response for debugging when parsing fails
- Handle possible wrapper keys in response
- Make qa-before/qa-after run even if analyze-pr fails (only gate
on resolve-matrix success)
When running on push events for sno-qa-* branches without an open PR,
extract the PR number from the branch name so analyze-pr can fetch
the full PR thread for analysis.
Add Gemini Pro-powered PR analysis that generates targeted QA guides
from the full PR thread (description, comments, screenshots, diff).
The analyze-pr job runs on lightweight ubuntu before recordings start,
producing qa-guide-before.json and qa-guide-after.json that are
downloaded by recording jobs to produce more focused test steps.
Graceful fallback: if analysis fails, recordings proceed without guides.
download-artifact@v7 merges all files flat regardless of
merge-multiple setting. Use separate path dirs (before/after)
and copy all files into the report directory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
download-artifact@v7 defaults merge-multiple to true, which puts all
files flat in qa-artifacts/ instead of per-artifact subdirectories.
The merge step expects qa-artifacts/qa-before-{os}-{run}/ subdirs,
so the report directory never gets created and video review finds
no files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of running before/after sequentially in a single job with
fragile git stash/checkout gymnastics, split into two independent
parallel jobs on separate runners:
resolve-matrix → qa-before (main) ─┐
→ qa-after (PR) ─┴→ report
- qa-before: uses git worktree for clean main branch build
- qa-after: normal PR build via setup-frontend
- report: downloads both artifact sets, merges, runs Gemini review
Benefits:
- Clean workspace isolation (no git checkout origin/main -- .)
- ~2x faster (parallel execution)
- Each job gets its own ComfyUI server (no shared state)
- Eliminates entire class of workspace contamination bugs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Firefox headless doesn't support WebGL, causing "getCanvas: canvas is
null" errors. Switch to Chromium which has full headless WebGL support.
Also fix login flow to wait for async router guard to settle and
create user via text input as fallback.
The QA recordings were stuck on the user selection screen because CI
has no existing users. Fix by pre-seeding localStorage with userId,
userName, and TutorialCompleted before navigation, plus creating a
qa-ci user via API as a fallback.
The main branch build step was running pnpm install with main's lockfile,
which removed @google/generative-ai from node_modules. Move the reinstall
to after restoring PR files so the QA recording script can find its deps.
git checkout - uses @{-1} which requires a previous branch switch.
Since we use 'git checkout origin/main -- .' (file checkout, not branch
switch), there is no @{-1} ref. Use HEAD to restore from current branch.
Also restore proper concurrency group.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass the comprehensive test plan from .claude/skills/comfy-qa/SKILL.md
to Gemini when generating test steps. This gives Gemini knowledge of all
12 QA categories (canvas, menus, sidebar, settings, etc.) so it picks
the most relevant tests for each PR.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nx build runs typecheck as a prerequisite (via @nx/vite/plugin config).
Use vite build directly for the main branch comparison build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main branch may have transient TS errors when built with the PR
branch's lockfile. Since we only need the dist for visual comparison,
run nx build directly instead of pnpm build (which includes typecheck).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the unreliable codex exec approach with a Playwright script
(qa-record.ts) that uses Gemini to generate targeted test steps from
the PR diff, then executes them deterministically via Playwright's API.
Key changes:
- New scripts/qa-record.ts: Gemini generates JSON test actions, Playwright
executes them with reliable helper functions (menu nav, dialog fill, etc.)
- Remove codex CLI and playwright-cli dependencies
- Remove 150+ lines of prompt templates from pr-qa.yaml
- Firefox headless with video recording (same approach proven locally)
- Fallback steps if Gemini fails
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Tighten BEFORE prompt to 15s snapshot (show old state only)
- Add qa-generate-test.ts: Gemini-powered Playwright test generator
- New workflow step: generate .spec.ts and push to {branch}-add-qa-test
- Tests assert UIUX behavior (tab names, dirty state, visibility)
- Build both main (dist-before/) and PR (dist/) frontends in focused mode
- Run QA twice: BEFORE on main branch frontend, AFTER on PR branch
- Send both videos to Gemini in one request for comparative analysis
- Side-by-side dashboard layout with Before (main) / After (PR) panels
- Comparative prompt evaluates whether before confirms old behavior
and after proves the fix works
- Falls back to single-video mode when no before video available
moov atom was at end of file (8.6MB offset) — browser had to download
the entire video before seeking. Keyframes were only every 10 seconds.
Add -movflags +faststart (moov before mdat) and -g 60 (keyframe every
2.4s at 25fps) to ffmpeg conversion.
- Remove autoplay/loop so video timeline is seekable
- Skip card generation for platforms without recordings
- Add --pr-context flag to qa-video-review.ts so Gemini evaluates
against PR purpose instead of just describing what happened
- Workflow now builds pr-context.txt from PR title/body/diff