- Add reproducedBy field to ResearchResult and done() tool
- Agent reports how bug was proven: e2e_test, video, both, or none
- Badge shows '1 via E2E test' instead of generic '1 reproduced'
- Deploy script reads reproducedBy from research-log.json
- Test code (reproduce.spec.ts) now deployed to report page
Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800
Co-authored-by: Amp <amp@ampcode.com>
- Add rule: assertions must be specific to the bug (not just > 0)
- If no bug-specific assertion possible, verdict must be NOT_REPRODUCIBLE
- Copy reproduce.spec.ts to deployed report for transparency
- Addresses #10307 false REPRODUCED (test only asserted node count > 0)
Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800
Co-authored-by: Amp <amp@ampcode.com>
Root cause: Cloudflare Pages serves stale deployments when multiple
deploys race to the same branch. The ANALYZING placeholder deployed
seconds before the final report would sometimes 'win' the race.
Fix: Only deploy once — the final report with the real badge. No more
intermediate PREPARING or ANALYZING placeholders.
Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800
Co-authored-by: Amp <amp@ampcode.com>
- Add explicit instruction to call done() immediately after test passes
- Inject warning message in runTest response when test passes
- Ban page.waitForTimeout() in system prompt (use retrying assertions)
- Instruct agent to write ONE focused test, not multiple
Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800
Co-authored-by: Amp <amp@ampcode.com>
- Add readFixture and readTest tools to qa-agent for fixture API discovery
- Enrich system prompt with comprehensive ComfyPage fixture API reference
- Switch CI to ANTHROPIC_API_KEY_QA secret
- Fix all TS errors in qa-agent.ts, qa-record.ts, qa-reproduce.ts
- Better error handling for API credit exhaustion
- Rewrite SKILL.md to reflect three-phase pipeline
Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800
Co-authored-by: Amp <amp@ampcode.com>
Claude sometimes keeps iterating after a test passes, exhausting
the time budget without calling done(). Now: when runTest() returns
TEST PASSED, the test code is saved. If the agent loop ends without
done(), auto-sets verdict=REPRODUCED with the passing test.
Fixes#8532 (17 calls, test passed twice, but INCONCLUSIVE verdict).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added || true to all grep/sed pipe chains that could exit non-zero:
- grep on pr-context.txt (line 149)
- sed/grep on pr-context description (line 158-159)
- grep -oiP on RISK_FIRST (line 331)
- wrangler deploy | grep URL (line 355)
All tested under set -euo pipefail with empty inputs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sed on video-reviews/*.md returns exit code 2 when no files match,
killing the script under set -euo pipefail. Added || true to all
potentially empty glob pipelines.
Affects 13/20 QA runs that had successful research but failed deploy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test uses comfyPageFixture, not bare page. Also match
firstNode await calls for node interaction pauses.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Regex inserts await page.waitForTimeout(800) before every
comfyPage/topbar/page/canvas/expect await call in the Phase 2
test code. Adds ~5-8s to a 10-step test (negligible vs 10min research).
Default playback changed to 0.5x (was 0.25x) since pauses provide
natural breathing room. A 15s video at 0.5x = 30s viewing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Report player defaults to 0.25x speed (was 0.5x) — 5s test videos
play in 20s, much more watchable
- Phase 2 injects cursor overlay via addInitScript into the test code
before running — white SVG arrow follows mousemove events
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Playwright config only records video when PLAYWRIGHT_LOCAL is set.
In CI, this env var was missing so Phase 2 produced no video.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After context.close(), renameLatestWebm would overwrite the Phase 2
test execution video with the idle research browser recording.
Now skips the rename if qa-session.webm already exists from Phase 2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The old video showed an idle screen (research browser doing nothing).
Now Phase 2 runs the test with --video=on from browser_tests/tests/,
finds the recorded .webm, and copies it to qa-session.webm where
the deploy script expects it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Copy test to browser_tests/tests/ where Playwright config expects it
- System prompt teaches Claude the project's test fixtures:
comfyPageFixture, comfyPage.menu.topbar, comfyPage.workflow, etc.
- Increased time budget to 10 min for write→run→fix iterations
- Increased max turns to 50
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Also search qa-artifacts/before/*/research/ for the research log
since artifacts are downloaded with that nested structure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1: Claude reads issue + a11y tree, writes a Playwright .spec.ts
test that asserts the bug exists. Runs the test, reads errors, iterates
until the test passes (proving the bug) or determines NOT_REPRODUCIBLE.
Phase 2: Run the passing test with --video=on for clean recording.
This replaces interactive browser driving with deterministic test code.
Claude Sonnet 4.6 excels at writing Playwright tests — much more
reliable than real-time browser interaction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
page.accessibility.snapshot() was removed in Playwright 1.49+.
Use page.locator('body').ariaSnapshot() which returns a text
representation of the accessibility tree.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Copy research-log.json to deploy dir (accessible at /research-log.json)
- Read verdict from research log first (a11y-verified ground truth)
- Fall back to video review verdict only if no research log exists
- Research log is uploaded as part of QA artifacts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 (qa-agent.ts): Claude investigates via a11y API only.
- No video, no Gemini vision — only page.accessibility.snapshot()
- Every action logged with a11y before/after state
- done() requires evidence citing inspect() results
- Outputs reproduction plan for Phase 2
Phase 2 (qa-reproduce.ts): Deterministic replay of research plan.
- Executes each step with a11y assertions
- Gemini describes visual changes (narration for humans)
- Clean focused video with subtitles
Phase 3: Report job reads research-log.json for verdict (ground truth),
narration-log.json for descriptions, video for visuals.
Gemini formats logs into report — never determines verdict.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent: MUST use inspect() after every action, verdict based on DOM
state not opinions. "NEVER claim REPRODUCED unless inspect() confirms."
Reviewer: Two-phase prompt — Phase 1 describes what it SEES (blind,
no context). Phase 2 compares observations against issue/PR context.
Anti-hallucination rules: "describe ONLY what you observe, NEVER infer."
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ANTHROPIC_API_KEY is optional: Agent SDK uses Claude Code OAuth
session when running locally (detects CLAUDE_CODE_SSE_PORT).
In CI, ANTHROPIC_API_KEY from secrets is used.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Gemini-only agentic loop had ~47% success rate — too low to be
useful as a fallback. Now ANTHROPIC_API_KEY is required for issue
reproduction. Fails clearly if missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
addInitScript runs before page load — Vue's app mount destroys the
cursor div when it takes over the DOM. Using addScriptTag after login
ensures the cursor persists in the stable DOM.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Locator.click/hover bypasses our page.mouse monkey-patch. Now
clickByText, hoverMenuItem, clickSubmenuItem get the element
bounding box and update cursor overlay manually.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The grep \{"verdict":\s*"[^"]+ captures up to but not including the
closing quote. The second grep for "[A-Z_]+"$ then fails because
there's no closing quote. Fixed: match "verdict":\s*"[A-Z_]+ then
extract [A-Z_]+$ (no quotes needed).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Video review prompt now requests a ## Verdict JSON block:
{"verdict": "REPRODUCED|NOT_REPRODUCIBLE|INCONCLUSIVE", "risk": "low|medium|high"}
- Deploy script reads JSON verdict first, falls back to grep
- Eliminates all regex-matching false positives permanently
- Theme: light mode is default, dark via prefers-color-scheme:dark
- Cards use solid backgrounds, grain overlay only in dark mode
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add prefers-color-scheme:light media query with light palette.
Replace hardcoded dark oklch values with CSS variables.
Light mode: white surfaces, dark text, subtle borders, no grain overlay.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- loadDefaultWorkflow now calls app.resetToDefaultWorkflow() via JS API
instead of navigating File → Load Default menu (menu item name varies)
- pressKey reverted to instant press() — the 400ms hold via down/up
prevented Escape from propagating to parent dialog (#10397 BEFORE video
showed wrong behavior because hold intercepted the event)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of manually calling moveCursorOverlay in each action,
patch page.mouse.move/click/dblclick/down/up globally. Now EVERY
mouse operation shows the cursor — text clicks, menu hovers, etc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
"could not be confirmed" contains "confirmed" which matched the
positive reproduc|confirm check. Now caught by the negative check first.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Agent SDK returned "model not found" for claude-sonnet-4-6-20250514.
Correct ID is claude-sonnet-4-6.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Headless Chrome's Playwright CDP doesn't trigger DOM mousemove events
reliably. Now executeAction calls __moveCursor(x,y) directly after
every mouse.move/click/drag. Cursor is an SVG arrow (white + outline).
Click state shown via scale animation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pressKey now uses keyboard.down/up with 400ms hold instead of
instant press(). Shows subtitle "⌨ Escape" and the keyboard HUD
catches the held state for video frame capture.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Agent system prompt now instructs Claude to demonstrate BOTH working
(control) and broken (test) states when bug is triggered by a setting
- Added docs/qa/backlog.md with future improvements: Type B/C comparisons,
TTS, pre-seeding, cost optimization, environment-dependent issues
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Shows "QA @ abc1234" linking to the pipeline code commit
- Shows start time → deploy time in header
- Helps trace which version of QA scripts generated each report
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Architecture:
- Claude Sonnet 4.6 plans and reasons (via Claude Agent SDK)
- Gemini 2.5 Flash watches video buffer and describes what it sees
- 4 tools: observe(), inspect(), perform(), done()
observe(seconds, focus): builds video clip from screenshot buffer,
sends to Gemini with Claude's focused question.
inspect(selector): searches a11y tree for specific element state.
perform(action, params): executes Playwright action.
done(verdict, summary): signals completion.
Falls back to Gemini-only loop if ANTHROPIC_API_KEY not set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Agent reasoning shown as subtitle bar at bottom of video during recording
- After recording, generates TTS audio via OpenAI API (tts-1, nova voice)
- Merges audio clips at correct timestamps into the video with ffmpeg
- Requires OPENAI_API_KEY env var; gracefully skips if not set
- No-sandbox + disable-dev-shm-usage for headless Chrome compatibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Download QA guide artifact in report job
- Extract prerequisites, test focus, and steps from guide JSON
- Display below the purpose description: focus → prerequisites → steps
- Separated by a subtle divider with smaller font
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Report site shows "PR #N aims to..." or "Issue #N reports..." block
above the video cards, extracted from pr-context.txt
- Multi-pass video links fall back to pass1 when qa-{os}.mp4 is 404
- More negative verdict patterns: "does not demonstrate", "never tested"
- Risk uses first word of Overall Risk (avoids "high confidence" match)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "does not demonstrate", "steps were not performed", "never tested"
to NOT_REPRO patterns (fixes#9101 false positive)
- Risk detection uses first word of Overall Risk section instead of
grepping entire text (fixes "high confidence" matching HIGH)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tsx compiles arrow functions with __name helpers that don't exist in
browser context. Using addScriptTag with plain JS string avoids this.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set<string>() in page.evaluate causes __name ReferenceError in browser.
Use untyped Set() since browser JS doesn't support TS generics.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
"fails to reproduce" contains "reproduce" — must check negatives first
within each report. Across reports, REPRODUCED still wins (multi-pass).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Injects a persistent overlay in bottom-right corner that displays
currently held keys (e.g. "⌨ Space", "⌨ CTRL+C"). Makes keyboard
interactions visible in the recording for both human and AI reviewers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When multiple report files exist, badge shows "2/3 REPRODUCED" instead
of just "REPRODUCED". Single-pass issues still show plain verdict.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When multiple passes exist and one confirms while another is
inconclusive, the badge should show REPRODUCED. Previously
INCONCLUSIVE was checked first, hiding successful reproductions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The "Clone" context menu item doesn't exist in Nodes 2.0 mode.
Using Ctrl+C/Ctrl+V works in both legacy and Nodes 2.0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- #10307: preflight clones KSampler node, hint says drag to overlap
- #7414: preflight clicks numeric widget, hint says drag to change value
- #7806: preflight takes baseline screenshot, hint gives exact coords
for holdKeyAndDrag with spacebar
- Hints now reference "Preflight already did X, NOW do Y" pattern
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent was wasting turns re-doing loadDefaultWorkflow and setSetting
that preflight already executed. Now the system prompt includes
"Already Done" section listing preflight actions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-execute prerequisite actions (enable Nodes 2.0, load default
workflow) BEFORE the agentic loop starts. Agent model ignores prompt
hints but preflight guarantees nodes are on canvas.
- Add "fails to reproduce" to NOT REPRODUCIBLE badge patterns
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add issues:[labeled] trigger and qa-issue label support
- Resolve github.event.issue.number for issue-triggered runs
- Include issue labels in context (feeds keyword matcher for hints)
- Remove qa-issue label after run completes (same as qa-changes/qa-full)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scan issue context for keywords (clone, copy-paste, spacebar, resize,
sidebar, scroll, middle-click, node shape, Nodes 2.0, etc.) and inject
specific MUST-follow action steps into the agentic system prompt.
Addresses 9 INCONCLUSIVE issues where agent had actions available
but didn't know when to use them.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Annotations now use cyan dashed border + monospace "QA:" prefix
instead of red solid labels that look like UI error messages
- Video review prompts explicitly tell reviewer to ignore QA annotations
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change preload=metadata to preload=auto for full video download
- Add _headers file with Accept-Ranges for Cloudflare Pages
- Add custom seekbar (range input + buffer indicator) that works
even without server HTTP range request support
- Seekbar shows buffered progress and allows dragging to any point
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previous prompt said "test the specific behavior" which was too vague,
leading to generic UI walkthroughs instead of targeted tests.
New prompt: explicitly instructs to read the diff, trigger the exact
scenario the PR fixes, and avoid generic menu screenshots.
Also added reload action to before/after prompt for state persistence tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix quality badge now reads "## Overall Risk" section only
- Prevents false MAJOR ISSUES from severity labels or negated phrases
- "Low" risk → APPROVED, "High" → MAJOR ISSUES, "Medium" → MINOR ISSUES
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
setup-frontend must run first to install node/pnpm, then rebuild
with PR code. Also re-install sno-skills deps after switching back
so QA scripts' dependencies are available.
Also gitignore .claude/scheduled_tasks.lock.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When triggered via sno-qa-* push, the workflow checks out the PR code
to build its frontend, but this replaces qa-record.ts which only
exists on sno-skills. Fix: build PR frontend, then checkout back to
sno-skills so QA scripts remain available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SC2231: quote glob expansions in for loop
- SC2002: use sed directly instead of cat | sed
- SC2086: quote variable in echo
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update DefaultThumbnail test to match size-full class change
- Fix shellcheck warnings in qa-batch.sh (SC2001, SC2207)
- Fix shellcheck warnings in qa-deploy-pages.sh (SC2034, SC2235, SC2231, SC2002)
- Add qa-report-template.html to oxfmt ignore (minified, not formattable)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- "confirmed" didn't match "confirms"/"reproducible" — use "reproduc|confirm" stem
- "partial" matched unrelated text — require "partially reproduced" specifically
- collectVideoCandidates now finds qa-session-*.mp4 for multi-pass reviews
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Check INCONCLUSIVE before reproduced/confirmed in badge detection
- Exclude markdown headings from reproduced grep match
- Add --pass-label to qa-video-review.ts for unique multi-pass filenames
- Pass pass label from workflow YAML when reviewing numbered sessions
- Collect all pass-specific reports in deploy script HTML
- Add addNode/cloneNode convenience actions to qa-record agent
- Improve strategy hints for visual/rendering bug reproduction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove push trigger (was for dev testing only)
- Restore concurrency group (was commented out for dev)
- Move misplaced import in qa-analyze-pr.ts to top of file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Strengthen prompt: MUST use openMenu → hoverMenuItem → clickMenuItem
in that order. Previous runs skipped openMenu causing silent failures.
- Add CI Job link to the QA report site header for quick navigation
to the GitHub Actions run that generated the report.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitHub Actions has a 21000 char limit per expression. The combined
badge setup step exceeded this after adding the dual badge generator.
Split into its own step.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Badge now reads: QA Bot | REPRODUCED | Fix: APPROVED
Not all issues are bugs — could be feature requests too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reproducing a bug is a successful outcome for the QA bot.
Blue (#2196f3) = bot succeeded. Red = bot found problems with the fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PRs now show one badge with three segments:
QA Bot | Bug: REPRODUCED | Fix: APPROVED
Instead of two separate badges. Uses gen-badge-dual.sh which
renders label + bug status + fix status in one SVG.
Issues still use single two-segment badge:
QA Bot | FINISHED: REPRODUCED
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Inject fake cursor (red dot with click animation) via addInitScript
since headless Chrome doesn't render the system cursor in video
- Add hover-before-click delay to clickByText and canvas clicks
so viewers can see where the cursor moves before clicking
- Add 'annotate' action: shows a floating label at (x,y) for N ms
so AI can draw viewer attention to important UI state changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: runAgenticLoop never read the QA guide — agent saw
"No issue context provided" for issues. Now reads qaGuideFile,
parses structured fields, and injects into system prompt.
Also: fetch issue body via gh issue view in workflow, increase
budget to 120s/30 turns, add focus reminders, smarter stuck
detection (50px grid normalization + action-type frequency),
reject invalid click targets, add loadDefaultWorkflow and
openSettings convenience actions, strategy hints in prompt.
Fix pre-existing typecheck error in eslint.config.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PRs now get two separate badges:
- Bug: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (before branch)
- Fix: APPROVED / MAJOR ISSUES / MINOR ISSUES (after branch)
Issues keep a single badge: FINISHED: REPRODUCED / etc.
Both badge-bug.svg and badge-fix.svg served from the deploy site.
PR comment shows all three: ![badge] ![bug] ![fix]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FINISHED is not standalone — always shows result:
- FINISHED: REPRODUCED / NOT REPRODUCIBLE / PARTIAL (issues)
- FINISHED: APPROVED / MAJOR ISSUES / MINOR ISSUES (PRs)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Badge shows QA pipeline status, deployed at each stage:
- PREPARING (blue) — setting up artifacts
- ANALYZING (orange) — running video review
- Final status with color:
- Issues: REPRODUCED (red) / NOT REPRODUCIBLE (gray) / PARTIAL (yellow)
- PRs: APPROVED (green) / MAJOR ISSUES (red) / MINOR ISSUES (yellow)
Badge served as /badge.svg from the same Cloudflare Pages site.
Included in PR comment as .
Also restore @ts-expect-error for import-x plugin type incompatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace single-shot step generation in reproduce mode with an agentic
loop where Gemini sees the screen after each action and decides what
to do next. For multi-bug issues, decompose into sub-issues and run
separate recording passes.
- Extract executeAction() from executeSteps() for reuse
- Add reload and done action types
- Add captureScreenshotForGemini() (JPEG q50, ~50KB)
- Add runAgenticLoop() with sliding window history (3 screenshots)
- Add decomposeIssue() for multi-pass recording (1-3 sub-issues)
- Update workflow to handle numbered session videos (qa-session-1, etc.)
- Add custom video controls below each video with frame stepping
- Frame back/forward buttons (1 frame at 30fps, 10 frames skip)
- Speed selector: 0.1x, 0.25x, 0.5x (default), 1x, 1.5x, 2x
- Keyboard shortcuts: arrow keys for frame step, space for play/pause
- SMPTE-style timecode display (m:ss.ms)
- Default 0.5x speed since AI operates UI faster than humans
- Videos no longer autoplay (pause on load for inspection)
- Zero external dependencies (pure HTML5 video API)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reproduce video must be max 5 minutes (short, focused demo)
- Phase 4 reuses the environment from Phase 3 (no re-setup)
- Use video-start/video-stop commands (not --save-video flag)
- Start recording right before steps, stop immediately after
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New Claude agent-driven issue reproduction skill that:
- Phase 1-2: Research issue and set up environment (custom nodes, workflows, settings)
- Phase 3: Record research video while exploring interactively via playwright-cli
- Phase 4: Record clean reproduce video with only the minimal repro steps
- Phase 5: Generate structured reproduction report
Key difference from the old approach: Claude agent explores and adapts
instead of blindly executing a Gemini-generated static plan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gemini was right-clicking empty canvas instead of nodes because it
didn't know where the default workflow nodes are positioned. Now the
prompt includes approximate coordinates for all 7 default nodes and
clarifies the difference between node context menu vs canvas menu.
Also fixes TS2352 in page.evaluate by using double-cast through unknown.
- Add --target-url CLI option to qa-video-review.ts
- Include target URL in generated markdown reports
- Add clickable issue/PR link in deployed HTML report header
- Workflow passes the target URL automatically
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused @ts-expect-error directives in eslint.config.ts
- Simplify LazyImage prop types from ClassValue to string
- Fix DialogInstance to avoid infinitely deep type instantiation
- Use cn() in DefaultThumbnail for class merging
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- fillDialog now tries: PrimeVue dialog → node search box → focused input → keyboard fallback
- clickSubmenuItem now tries: PrimeVue tiered menu → litegraph context menu → role menuitem
- Fixes double-click-to-add-node flow and right-click context menu clicks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap each step in try/catch so failed steps don't abort the recording
- Add 5s timeout to clickByText to prevent 30s hangs on disabled elements
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gh pr view can't distinguish PRs from issues — it succeeds for both.
Use the REST API endpoint repos/{owner}/{repo}/pulls/{number} which
returns 404 for issues.
- Enforce requestTimeoutMs via Gemini SDK requestOptions
- Add 100MB video size check before base64 encoding
- Sanitize screenshot filenames to prevent path traversal
- Sort video files by mtime for reliable rename
- Validate --mode arg against allowed values
- Add Content-Length pre-check in downloadMedia
- Add GitHub domain allowlist for media downloads (SSRF mitigation)
- Add contents:write permission and git config for report job
- Update Node.js requirement in SKILL.md from 18+ to 22+
Instead of crashing the entire recording session when Gemini generates
an invalid key name (e.g. "mouseWheelDown"), catch the error and
continue with remaining steps.
Add a "Behavior Changes" table (Behavior, Before, After, Verdict)
alongside the existing timeline comparison. This gives reviewers a
quick high-level view of all behavioral differences before diving
into the frame-by-frame timeline.
Instruct Gemini to output the Before vs After section as a markdown
table with Time, Type, Severity, Before, After columns for easier
comparison. Update HTML template table styles with fixed layout and
column widths optimized for the 5-column comparison format.
- Log raw Gemini response for debugging when parsing fails
- Handle possible wrapper keys in response
- Make qa-before/qa-after run even if analyze-pr fails (only gate
on resolve-matrix success)
When running on push events for sno-qa-* branches without an open PR,
extract the PR number from the branch name so analyze-pr can fetch
the full PR thread for analysis.
Add Gemini Pro-powered PR analysis that generates targeted QA guides
from the full PR thread (description, comments, screenshots, diff).
The analyze-pr job runs on lightweight ubuntu before recordings start,
producing qa-guide-before.json and qa-guide-after.json that are
downloaded by recording jobs to produce more focused test steps.
Graceful fallback: if analysis fails, recordings proceed without guides.
download-artifact@v7 merges all files flat regardless of
merge-multiple setting. Use separate path dirs (before/after)
and copy all files into the report directory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
download-artifact@v7 defaults merge-multiple to true, which puts all
files flat in qa-artifacts/ instead of per-artifact subdirectories.
The merge step expects qa-artifacts/qa-before-{os}-{run}/ subdirs,
so the report directory never gets created and video review finds
no files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of running before/after sequentially in a single job with
fragile git stash/checkout gymnastics, split into two independent
parallel jobs on separate runners:
resolve-matrix → qa-before (main) ─┐
→ qa-after (PR) ─┴→ report
- qa-before: uses git worktree for clean main branch build
- qa-after: normal PR build via setup-frontend
- report: downloads both artifact sets, merges, runs Gemini review
Benefits:
- Clean workspace isolation (no git checkout origin/main -- .)
- ~2x faster (parallel execution)
- Each job gets its own ComfyUI server (no shared state)
- Eliminates entire class of workspace contamination bugs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Pre-seed step creates qa-ci via API, so the "New user" form
shows "already exists" error. Fix by selecting the existing user
from the dropdown first, falling back to a unique username.
The localStorage userId bypass doesn't work because the server
validates user IDs and rejects the simple 'qa-ci' string. Instead,
detect the login page by its input fields and create a user via the
"New user" text input, which is how real users would log in.
Firefox headless doesn't support WebGL, causing "getCanvas: canvas is
null" errors. Switch to Chromium which has full headless WebGL support.
Also fix login flow to wait for async router guard to settle and
create user via text input as fallback.
Add coordinate fallback when .comfy-menu-button-wrapper selector isn't
found, and capture a debug screenshot after login to diagnose what the
page looks like when the editor UI fails to render.
The openComfyMenu was clicking at hardcoded coordinates (20, 67) which
missed the menu button. Now uses .comfy-menu-button-wrapper selector
matching the browser tests. Also fixes menu item hover/click selectors
to use .p-menubar-item-label and .p-tieredmenu-item classes, and adds
a wait for the editor UI to fully load before executing test steps.
The QA recordings were stuck on the user selection screen because CI
has no existing users. Fix by pre-seeding localStorage with userId,
userName, and TutorialCompleted before navigation, plus creating a
qa-ci user via API as a fallback.
The main branch build step was running pnpm install with main's lockfile,
which removed @google/generative-ai from node_modules. Move the reinstall
to after restoring PR files so the QA recording script can find its deps.
git checkout - uses @{-1} which requires a previous branch switch.
Since we use 'git checkout origin/main -- .' (file checkout, not branch
switch), there is no @{-1} ref. Use HEAD to restore from current branch.
Also restore proper concurrency group.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass the comprehensive test plan from .claude/skills/comfy-qa/SKILL.md
to Gemini when generating test steps. This gives Gemini knowledge of all
12 QA categories (canvas, menus, sidebar, settings, etc.) so it picks
the most relevant tests for each PR.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nx build runs typecheck as a prerequisite (via @nx/vite/plugin config).
Use vite build directly for the main branch comparison build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main branch may have transient TS errors when built with the PR
branch's lockfile. Since we only need the dist for visual comparison,
run nx build directly instead of pnpm build (which includes typecheck).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the unreliable codex exec approach with a Playwright script
(qa-record.ts) that uses Gemini to generate targeted test steps from
the PR diff, then executes them deterministically via Playwright's API.
Key changes:
- New scripts/qa-record.ts: Gemini generates JSON test actions, Playwright
executes them with reliable helper functions (menu nav, dialog fill, etc.)
- Remove codex CLI and playwright-cli dependencies
- Remove 150+ lines of prompt templates from pr-qa.yaml
- Firefox headless with video recording (same approach proven locally)
- Fallback steps if Gemini fails
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Tighten BEFORE prompt to 15s snapshot (show old state only)
- Add qa-generate-test.ts: Gemini-powered Playwright test generator
- New workflow step: generate .spec.ts and push to {branch}-add-qa-test
- Tests assert UIUX behavior (tab names, dirty state, visibility)
- Build both main (dist-before/) and PR (dist/) frontends in focused mode
- Run QA twice: BEFORE on main branch frontend, AFTER on PR branch
- Send both videos to Gemini in one request for comparative analysis
- Side-by-side dashboard layout with Before (main) / After (PR) panels
- Comparative prompt evaluates whether before confirms old behavior
and after proves the fix works
- Falls back to single-video mode when no before video available
moov atom was at end of file (8.6MB offset) — browser had to download
the entire video before seeking. Keyframes were only every 10 seconds.
Add -movflags +faststart (moov before mdat) and -g 60 (keyframe every
2.4s at 25fps) to ffmpeg conversion.
- Remove autoplay/loop so video timeline is seekable
- Skip card generation for platforms without recordings
- Add --pr-context flag to qa-video-review.ts so Gemini evaluates
against PR purpose instead of just describing what happened
- Workflow now builds pr-context.txt from PR title/body/diff
The Codex agent was spending 35s browsing the "Getting Started" template
gallery instead of testing the PR's changes. Pre-seeding this setting
via the ComfyUI API ensures the agent lands directly in the graph editor.
The Codex agent was spending time on login flow, template browsing,
and general smoke testing instead of testing the PR's actual changes.
Changes:
- Add 30-second time budget for video recording
- Move video-start AFTER login and editor verification
- Explicitly prohibit template browsing and sidebar exploration
- Reduce test steps to 3-6 targeted actions
- Restructure prompt with clear Instructions/Rules sections
Replace crude sed-based markdown conversion with client-side
rendering via marked.js CDN. Adds proper table, list, and
code styling for the report section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the OpenAI GPT-based frame extraction approach (ffmpeg + screenshots)
with Gemini 2.5 Flash's native video understanding. This eliminates false
positives from frame-based analysis (e.g. "black screen = critical bug" during
page transitions) and produces dramatically better QA reviews.
Changes:
- Remove ffmpeg frame extraction, ffprobe duration detection, and all related
logic (~365 lines removed)
- Add @google/generative-ai SDK for native video/mp4 upload to Gemini
- Update CLI: remove --max-frames, --min-interval-seconds, --keep-frames flags
- Update env: OPENAI_API_KEY → GEMINI_API_KEY
- Update workflow: swap API key secret and model in pr-qa.yaml
- Update report: replace "Frames analyzed" with "Video size"
- Add note in prompt that brief black frames during transitions are normal
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
playwright-cli doesn't support 'evaluate' command. Instead, instruct
Codex to quickly fill the username input and click Next on user-select
page BEFORE starting video recording, so the video only shows actual
QA testing.
storageState config doesn't work with playwright-cli. Instead, use
evaluate to set Comfy.userId/userName after opening the page, then
navigate back. This skips user-select before video-start so the
recording only shows actual QA testing.
Write a Playwright storageState JSON with Comfy.userId/userName pre-set
so the app loads directly to the graph editor. Saves ~40s per QA run
that was wasted on navigating the user-select page.
The convert step was using find which picked up a 0-byte file from
playwright's videos/ directory instead of the valid qa-session.webm.
Now prefers qa-session.webm explicitly and skips empty files.
Codex was using pnpm dlx instead of the global playwright-cli.
Pre-install chromium in setup step and make prompt explicit about
using the global command directly without pnpm/npx.
Replace claude --print with codex exec for cheaper QA runs.
Uses codex-mini-latest model ($1.50/$6 vs Sonnet $3/$15).
Uses existing OPENAI_API_KEY secret (no new secrets needed).
- Replace saveVideo config (didn't produce video) with explicit
playwright-cli video-start/video-stop commands in QA prompt
- Remove apt-get install ffmpeg step (pre-installed on GH runners)
- Switch video review model from gpt-4o to gpt-4.1-mini
- Enable saveVideo in playwright-cli config for real video recording
- Replace screenshot stitching with webm→mp4 conversion
- Move video review step before deploy so reports are included
- Add GPT video review reports inline on the Cloudflare Pages site
- Each video card now has expandable "GPT Video Review" section
- Set .playwright/cli.config.json with outputDir pointing to screenshots/
- This way bare 'playwright-cli screenshot' auto-saves to the right place
- Create screenshot directory before Claude runs (don't rely on Claude)
- Collect step now searches working directory for stray PNGs
- Simplified prompt: no --filename needed, just 'playwright-cli screenshot'
Screenshots were saved to artifact root but stitch looked in frames/.
Now: prompt tells Claude to save to screenshots/ dir with numbered names,
collect step consolidates PNGs there, stitch step globs from screenshots/.
Removed video-start/video-stop (Claude doesn't use them).
- Add playwright-cli config with outputDir and saveVideo
- Use video-start/video-stop instead of relying on screenshot frames
- Add fallback artifact collection from .playwright-cli/ default dir
- Simplify prompts to focus on video recording workflow
The escaped \$QA_ARTIFACTS in the heredoc produced literal text
'$QA_ARTIFACTS' in the prompt. Claude's Bash tool didn't reliably
expand this env var, so no screenshots or reports were saved.
Remove the escapes so the heredoc expands the variable to the actual
path (e.g. /home/runner/work/_temp/qa-artifacts).
Backtick-wrapped playwright-cli examples in the unquoted heredoc were
being interpreted as bash command substitution, producing empty prompts.
Replace backtick syntax with plain "Run:" prefixed commands.
- Remove all Xvfb/ffmpeg screen recording infrastructure from qa job
(captured blank display since playwright-cli runs headless)
- Add screenshot instructions to QA prompts: Claude saves sequential
frames to $QA_ARTIFACTS/frames/ after every interaction
- Stitch screenshots into video via ffmpeg in report job (2fps)
- Merge video-review job into report job (4 jobs → 3 jobs)
- Unified PR comment with video links + video review in <details> collapse
- Clean up stale QA_VIDEO_REVIEW_COMMENT markers from prior runs
Move extra_server_params input to env var to prevent shell injection
from untrusted input. Replace wait-for-it pip dependency with a
cross-platform curl polling loop.
Add Claude Code skills and a label-triggered QA workflow:
- .claude/skills/comfy-qa/SKILL.md: 12-category QA test plan using
playwright-cli for browser automation
- .github/workflows/pr-qa.yaml: CI workflow triggered by qa-changes
(focused, Linux) or qa-full (3-OS matrix) labels. Records screen via
ffmpeg, runs Claude CLI with playwright-cli, deploys video gallery to
Cloudflare Pages, posts PR comment with GIF thumbnails, and runs
OpenAI vision-based video review
- scripts/qa-video-review.ts: frame extraction + GPT-4o analysis
- scripts/qa-video-review.test.ts: unit tests for video review
- knip.config.ts: resolve knip errors for ingest-types package
*PR Created by the Glary-Bot Agent*
---
## Summary
- Replace all `as unknown as Type` assertions in 59 unit test files with
type-safe `@total-typescript/shoehorn` functions
- Use `fromPartial<Type>()` for partial mock objects where deep-partial
type-checks (21 files)
- Use `fromAny<Type>()` for fundamentally incompatible types: null,
undefined, primitives, variables, class expressions, and mocks with
test-specific extra properties that `PartialDeepObject` rejects
(remaining files)
- All explicit type parameters preserved so TypeScript return types are
correct
- Browser test `.spec.ts` files excluded (shoehorn unavailable in
`page.evaluate` browser context)
## Verification
- `pnpm typecheck` ✅
- `pnpm lint` ✅
- `pnpm format` ✅
- Pre-commit hooks passed (format + oxlint + eslint + typecheck)
- Migrated test files verified passing (ran representative subset)
- No test behavior changes — only type assertion syntax changed
- No UI changes — screenshots not applicable
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10761-test-migrate-as-unknown-as-to-total-typescript-shoehorn-3336d73d365081f6b8adc44db5dcc380)
by [Unito](https://www.unito.io)
---------
Co-authored-by: Glary-Bot <glary-bot@users.noreply.github.com>
Co-authored-by: Amp <amp@ampcode.com>
*PR Created by the Glary-Bot Agent*
---
## Summary
Fixes the `Bulk context menu shows when multiple assets selected` test
that is failing on main.
**Root cause — two issues:**
1. `click({ modifiers: ['ControlOrMeta'] })` does not fire `keydown`
events that VueUse's `useKeyModifier('Control')` tracks (used in
`useAssetSelection.ts`). Multi-select silently fails because the
composable never sees the Control key pressed. Fix: use
`keyboard.down('Control')` / `keyboard.up('Control')` around the click.
2. `click({ button: 'right' })` can be intercepted by canvas overlays
(documented gotcha in `browser_tests/AGENTS.md`). Fix: use
`dispatchEvent('contextmenu', { bubbles: true, cancelable: true })`
which bypasses overlay interception.
Also removed the `toPass()` retry wrapper since the root causes are now
addressed directly.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10762-fix-test-fix-bulk-context-menu-test-using-correct-Playwright-patterns-3346d73d3650811c843ee4a39d3ab305)
by [Unito](https://www.unito.io)
---------
Co-authored-by: Glary-Bot <glary-bot@users.noreply.github.com>
## What changed
Added a runtime-safe `#e2e/*` alias for `browser_tests`, updated the
browser test docs, and migrated a representative fixture/spec import
path to the new convention.
## Why
`@/*` only covers `src/`, so browser test imports were falling back to
deep relative paths. `#e2e/*` resolves in both Node/Playwright runtime
and TypeScript.
## Validation
- `pnpm format`
- `pnpm typecheck:browser`
- `pnpm exec playwright test browser_tests/tests/actionbar.spec.ts
--list`
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10735-test-add-runtime-safe-browser_tests-alias-3336d73d36508122b253cb36a4ead1c1)
by [Unito](https://www.unito.io)
---------
Co-authored-by: Alexander Brown <drjkl@comfy.org>
## Summary
- Closing an inactive workflow tab and clicking "Save" overwrites that
workflow with the **active** tab's content, causing permanent data loss
- `saveWorkflow()` and `saveWorkflowAs()` call `checkState()` which
serializes `app.rootGraph` (the active canvas) into the inactive
workflow's `changeTracker.activeState`
- Guard `checkState()` to only run when the workflow being saved is the
active one — in both `saveWorkflow` and `saveWorkflowAs`
## Linked Issues
- Fixes https://github.com/Comfy-Org/ComfyUI/issues/13230
## Root Cause
PR #9137 (commit `9fb93a5b0`, v1.41.7) added
`workflow.changeTracker?.checkState()` inside `saveWorkflow()` and
`saveWorkflowAs()`. `checkState()` always serializes `app.rootGraph` —
the graph on the canvas. When called on an inactive tab's change
tracker, it captures the active tab's data instead.
## Test plan
- [x] E2E: "Closing an inactive tab with save preserves its own content"
— persisted workflow B with added node, close while A is active, re-open
and verify
- [x] E2E: "Closing an inactive unsaved tab with save preserves its own
content" — temporary workflow B with added node, close while A is
active, save-as with filename, re-open and verify
- [x] Manual: open A and B, edit B, switch to A, close B tab, click
Save, re-open B — content should be B's not A's
## What changed
Removed stale `tests-ui` configuration and documentation references from
the repo.
## Why
`tests-ui/` no longer exists, but the repo still carried:
- a dead `@tests-ui/*` tsconfig path
- stale `tests-ui/**/*` include
- a Vite watch ignore for a missing directory
- documentation examples that still referenced the old path
## Validation
- `pnpm format:check`
- `pnpm typecheck`
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10736-chore-remove-stale-tests-ui-config-3336d73d3650814a98bedfc113b6eb9b)
by [Unito](https://www.unito.io)
## Summary
replacement for https://github.com/Comfy-Org/ComfyUI_frontend/pull/9201
the first commit squashed
https://github.com/Comfy-Org/ComfyUI_frontend/pull/9201 and fixed
conflict.
the second commit change needed by:
- Enable GLSL live preview on SubgraphNodes by detecting the inner
GLSLShader and rendering its preview directly on the parent SubgraphNode
- Previously, SubgraphNodes containing a GLSLShader showed no live
preview at all To achieve this:
- Read shader source, uniform values, and renderer config from the inner
GLSLShader's widgets
- Trace IMAGE inputs through the subgraph boundary so the inner shader
can use images connected to the SubgraphNode's outer inputs
- Set preview output using the inner node's locator ID so the promoted
preview system picks it up on the SubgraphNode
- Extract setNodePreviewsByLocatorId from nodeOutputStore to support
setting previews by locator ID directly
- Fix graphId to use rootGraph.id for widget store lookups (was using
graph.id, which broke lookups for nodes inside subgraphs)
- Read uniform values from connected upstream nodes, not just local
widgets
- Fix blob URL lifecycle: use the store's
createSharedObjectUrl/releaseSharedObjectUrl reference-counting system
instead of manual revoke, preventing leaks on composable re-creation
## Screenshot
https://github.com/user-attachments/assets/9623fa32-de39-4a3a-b8b3-28688851390b
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10349-Feat-glsl-live-preview-3296d73d3650814b83aef52ab1962a77)
by [Unito](https://www.unito.io)
## Summary
Extract auth-routing logic (`getAuthHeaderOrThrow`,
`getFirebaseAuthHeaderOrThrow`) from `workspaceApi.ts` into
`authStore.ts`, eliminating a layering violation where the workspace API
re-implemented auth header resolution.
## Changes
- **What**: Moved `getAuthHeaderOrThrow` and
`getFirebaseAuthHeaderOrThrow` from `workspaceApi.ts` to `authStore.ts`.
`workspaceApi.ts` now calls through `useAuthStore()` instead of
re-implementing token resolution. Added tests for the new methods in
`authStore.test.ts`. Updated `authStoreMock.ts` with the new methods.
- **Files**: 4 files changed
## Review Focus
- The `getAuthHeaderOrThrow` / `getFirebaseAuthHeaderOrThrow` methods
throw `AuthStoreError` (auth domain error) — callers in workspace can
catch and re-wrap if needed
- `workspaceApi.ts` is simplified by ~19 lines
## Stack
PR 2/5: #10483 → **→ This PR** → #10485 → #10486 → #10487
## What
- Add `include: ['src/**/*.{ts,vue}']` to vitest coverage config so ALL
source files appear in reports (previously only imported files showed
up)
- Add `lcov` reporter for CI integration and VS Code coverage gutter
- Add `exclude` patterns for test files, locales, litegraph, assets,
declarations, stories
- Add `test:coverage` npm script
## Why
Coverage reports currently only show files that are imported during test
runs. Adding the `include` pattern reveals the true gap — files with
zero coverage that were previously invisible. The lcov reporter enables
IDE integration and future CI coverage comments (Codecov/Coveralls).
## Testing
`npx tsc --noEmit` passes. No behavioral changes — this only affects
coverage reporting configuration.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10575-config-add-vitest-coverage-include-pattern-lcov-reporter-32f6d73d365081c8b59ad2316dd2b198)
by [Unito](https://www.unito.io)
## Summary
Extract `makeMatcher` and `comfyExpect` from `ComfyPage.ts` into the
standalone `browser_tests/fixtures/utils/customMatchers.ts` module,
reducing the page-object file by ~50 lines.
## Changes
- **What**: Removed duplicate `makeMatcher`/`comfyExpect` definitions
from `ComfyPage.ts`; the canonical implementation now lives in
`customMatchers.ts`. A backward-compatible re-export keeps all existing
imports working.
## Review Focus
- The re-export ensures `import { comfyExpect } from
'../fixtures/ComfyPage'` continues to resolve for all ~25 spec files.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10652-refactor-extract-comfyExpect-and-makeMatcher-from-ComfyPage-3316d73d365081bf8e7cd7fa324bf9a6)
by [Unito](https://www.unito.io)
---------
Co-authored-by: Alexander Brown <drjkl@comfy.org>
Co-authored-by: GitHub Action <action@github.com>
## Summary
Adds Playwright E2E tests for the QueueClearHistoryDialog component.
## Tests added
- Dialog opens from queue panel history actions menu
- Dialog shows confirmation message with title, description, and assets
note
- Cancel button closes dialog without clearing history
- Close (X) button closes dialog without clearing history
- Confirm clear action triggers queue history clear API call
- Dialog state resets properly after close/reopen
## Task
Part of Test Coverage Q2 Overhaul (DLG-02).
## Conventions
- Uses Vue nodes with new menu enabled (`Comfy.UseNewMenu: 'Top'`)
- Tests read as user stories
- No full-page screenshots
- Proper waits, no sleeps
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10586-test-add-QueueClearHistoryDialog-E2E-tests-DLG-02-3306d73d36508174a07bd9782340a0f7)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
## Summary
Comprehensive Playwright E2E tests for the properties panel (right
sidebar).
Part of the **Test Coverage Q2 Overhaul** initiative (Phase 2: PNL-01).
## What's included
- **PropertiesPanelHelper** page object in `browser_tests/helpers/` —
locators + action methods for all panel elements
- **35 test cases** covering:
- Open/close via actionbar toggle
- Workflow Overview (no selection): tabs, title, nodes list, global
settings
- Single node selection: title, parameters, info tab, widgets display
- Multi-node selection: item count, node listing, hidden Info tab
- Title editing: pencil icon, edit mode, rename, visibility rules
- Search filtering: query, clear, empty state
- Settings tab: Normal/Bypass/Mute state, color swatches, pinned toggle
- Selection transitions: no-selection ↔ single ↔ multi
- Nodes tab: list all, search filter
- Tab label changes based on selection count
- **Errors tab scaffold** (for @jaeone94 ADD-03)
## Testing
- All tests use Vue nodes with new menu enabled
- Zero flaky tests (proper waits, no sleeps)
- Screenshots scoped to panel elements
## Unblocks
- **ADD-03** (error systems by @jaeone94) — errors tab scaffold ready to
extend
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10548-test-comprehensive-properties-panel-E2E-tests-PNL-01-32f6d73d36508199a216fd8d953d8e18)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
## What
12 regression tests covering 10 workflow persistence bug gaps, including
the **critical data corruption fix in PR #9531** (pythongosssss) which
previously had ZERO test coverage.
## Why
Deep scan of 37 workflow persistence bugs found 12 E2E-testable gaps
with no regression tests. Workflow persistence is a core reliability
concern — data corruption bugs are the highest risk category.
## Tests
### 🔴 Critical
| Bug | PR | Tests | Description |
|-----|----|-------|-------------|
| Data corruption | #9531 | 2 | checkState during graph loading corrupts
workflow data |
| State desync | #9533 | 2 | Rapid tab switching desyncs workflow/graph
state |
### 🟡 Medium
| Bug | PR/Commit | Tests | Description |
|-----|-----------|-------|-------------|
| Lost previews | #9380 | 1 | Node output previews lost on tab switch |
| Stale canvas | 44bb6f13 | 1 | Canvas not cleared before loading new
workflow |
| Widget loss | #7648 | 1 | Widget values lost on graph change |
| API format | #9694 | 1 | API format workflows fail with missing nodes
|
| Paste duplication | #8259 | 1 | Middle-click paste duplicates workflow
|
| Blob URLs | #8715 | 1 | Transient blob: URLs in serialization |
### 🟢 Low
| Bug | PR/Commit | Tests | Description |
|-----|-----------|-------|-------------|
| Locale break | #8963 | 1 | Locale change breaks workflows |
| Panel drift | — | 1 | Splitter panel size drift |
## Conventions
- All tests use Vue nodes + new menu enabled
- Each test documents which PR/commit it regresses
- Proper waits (no sleeps)
- Screenshots scoped to relevant elements
- Tests read like user stories
## 🎉 Shoutout
PR #9531 by @pythongosssss was a critical data corruption fix that now
has regression test coverage for the first time.
Part of: Test Coverage Q2 Overhaul (REG-01)
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10547-test-12-workflow-persistence-regression-tests-incl-critical-PR-9531-32f6d73d3650818796c6c5950c77f6d1)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
## Summary
Add deterministic mock data fixtures for browser tests so they can use
`page.route()` to intercept API calls without depending on a live
backend.
## Changes
- **`browser_tests/fixtures/data/nodeDefinitions.ts`** — Mock
`ComfyNodeDef` objects for KSampler, CheckpointLoaderSimple, and
CLIPTextEncode
- **`browser_tests/fixtures/data/systemStats.ts`** — Mock `SystemStats`
with realistic RTX 4090 GPU info
- **`browser_tests/fixtures/data/README.md`** — Usage guide for
`page.route()` interception
All fixtures are typed against the Zod schemas in `src/schemas/` and
pass `pnpm typecheck:browser`.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10662-test-add-mock-data-fixtures-for-backend-API-responses-3316d73d3650813ea5c8c1faa215db63)
by [Unito](https://www.unito.io)
---------
Co-authored-by: dante01yoon <bunggl@naver.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: GitHub Action <action@github.com>
## Summary
Adds Playwright E2E tests for the SignIn dialog component and its
sub-forms.
## Tests added
- Dialog opens from login button in topbar
- Sign In form is the default view with email/password fields
- Toggle between Sign In and Sign Up forms
- API Key form navigation (forward and back)
- Terms of Service and Privacy Policy links present
- Form field presence verification
- Dialog close behavior (close button and Escape key)
- Forgot password link presence
- 'Or continue with' divider and API key button
## Notes
- Tests focus on UI navigation and element presence (no real Firebase
auth in test env)
- Dialog opened via `extensionManager.dialog.showSignInDialog()` API
- All selectors use stable IDs from the component source
(`#comfy-org-sign-in-email`, etc.)
## Task
Part of Test Coverage Q2 Overhaul (DLG-04).
## Conventions
- Uses Vue nodes with new menu enabled (`Comfy.UseNewMenu: 'Top'`)
- Tests read as user stories
- No full-page screenshots
- Proper waits, no sleeps
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10587-test-add-SignIn-dialog-E2E-tests-DLG-04-3306d73d3650815db171f8c5228e2cf3)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
## Summary
Expose `renderMarkdownToHtml()` on the `ExtensionManager` interface so
custom node extensions can render markdown to sanitized HTML without
bundling their own copies of `marked`/`DOMPurify`.
## Motivation
Multiple custom node packs (KJNodes, comfy_mtb, rgthree-comfy) bundle
their own markdown rendering libraries to implement help popups on
nodes. This causes:
- **Cloud breakage**: KJNodes uses a `kjweb_async` pattern (custom
aiohttp static route) to lazily load `marked.min.js` and
`purify.min.js`. This 404s on Cloud because the custom route is not
registered.
- **Redundant bundling**: Both `marked` (^15.0.11) and `dompurify`
(^3.2.5) are already direct dependencies of the frontend, used
internally by `markdownRendererUtil.ts`, `NodePreview.vue`,
`WhatsNewPopup.vue`, etc.
- **XSS risk**: Custom nodes using raw `marked` without `DOMPurify`
could introduce XSS vulnerabilities.
By exposing the existing `renderMarkdownToHtml()` through the official
`ExtensionManager` API, custom nodes can:
```js
const html = app.extensionManager.renderMarkdownToHtml(nodeData.description)
```
...instead of bundling and loading their own copies.
## Changes
- **`src/types/extensionTypes.ts`**: Add `renderMarkdownToHtml(markdown:
string, baseUrl?: string): string` to the `ExtensionManager` interface
with JSDoc.
- **`src/stores/workspaceStore.ts`**: Import and re-export
`renderMarkdownToHtml` from `@/utils/markdownRendererUtil`.
## Impact
- **Zero bundle size increase** — the function and its dependencies are
already bundled in the `vendor-markdown` chunk.
- **No breaking changes** — purely additive to the `ExtensionManager`
interface.
- **Follows existing pattern** — same approach as `toast`, `dialog`,
`command`, `setting` on `ExtensionManager`.
Related: #TBD (long-term plan for custom node extension library
dependencies)
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10700-feat-expose-renderMarkdownToHtml-on-ExtensionManager-3326d73d36508149bc1dc6bb45e7c077)
by [Unito](https://www.unito.io)
## Summary
Extract `assetPath` from a `ComfyPage` method to a standalone pure
function, removing unnecessary coupling to the page object.
## Changes
- **What**: Moved `assetPath` to
`browser_tests/fixtures/utils/paths.ts`. `DragDropHelper` and
`WorkflowHelper` import it directly instead of receiving it via
`ComfyPage`. `ComfyPage.assetPath` kept as thin delegate for backward
compat.
## Review Focus
Structural-only refactor — no behavioral changes. The function was
already pure (no `this`/`page` usage).
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10651-refactor-extract-assetPath-as-standalone-pure-function-3316d73d365081c0b0e0ce6dde57ef8e)
by [Unito](https://www.unito.io)
## Summary
Add guidance to `docs/guidance/playwright.md` that new node-specific
assertions should be methods on page objects/helpers rather than new
`comfyExpect` custom matchers.
## Changes
- **What**: New "Custom Assertions" section in Playwright guidance
documenting that existing `comfyExpect` matchers are fine to use, but
new assertions should go on the page object for IntelliSense
discoverability.
## Review Focus
Documentation-only change. No code refactoring — this is a convention
for new code only.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10660-docs-add-convention-for-new-assertions-prefer-page-objects-over-custom-matchers-3316d73d3650816d97a8fbbdc33f6b75)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
## Summary
Remove the exclusion filter that prevented backend-mirrored endpoint
types from being generated in `@comfyorg/ingest-types`.
## Changes
- **What**: The `openapi-ts.config.ts` excluded all endpoints shared
with the ComfyUI Python backend (system_stats, object_info, prompt,
queue, history, settings, userdata, etc.). Since the cloud ingest API
mirrors the backend, these types should be generated from the OpenAPI
spec as the canonical source. This adds ~250 new types and Zod schemas
covering previously excluded endpoints.
- **Breaking**: None. This only adds new exported types — no existing
types or imports are changed.
## Review Focus
- The cloud ingest API is designed to mirror the ComfyUI Python backend.
The original exclusion filter was added to avoid duplication with
`src/schemas/apiSchema.ts`, but the generated types should be the
canonical source since they are auto-generated from the OpenAPI spec.
- A follow-up PR will migrate imports in `src/` from `apiSchema.ts` to
`@comfyorg/ingest-types` where applicable.
- Webhooks and internal analytics endpoints remain excluded
(server-to-server, not frontend-relevant).
Related: #10662
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10697-refactor-include-backend-mirrored-endpoints-in-ingest-types-codegen-3326d73d365081569614f743ab6f074d)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
## Motivation
Browser tests mock API responses with `route.fulfill()` using untyped
inline JSON. When the OpenAPI spec changes, these mocks silently drift —
mismatches aren't caught at compile time and only surface as test
failures at runtime.
We already have auto-generated types from OpenAPI and manual Zod
schemas. This PR makes those types the source of truth for test mock
data.
From Mar 27 PR review session action item: "instruct agents to use
schemas and types when writing browser tests."
## Type packages and their API coverage
The frontend has two OpenAPI-generated type packages, each targeting a
different backend API with a different code generation tool:
| Package | Target API | Generator | TS types | Zod schemas |
|---------|-----------|-----------|----------|-------------|
| `@comfyorg/registry-types` | Registry API (node packages, releases,
subscriptions, customers) | `openapi-typescript` | Yes | **No** |
| `@comfyorg/ingest-types` | Ingest API (hub workflows, asset uploads,
workspaces) | `@hey-api/openapi-ts` | Yes | Yes |
Additionally, Python backend endpoints (`/api/queue`, `/api/features`,
`/api/settings`, etc.) are typed via manual Zod schemas in
`src/schemas/apiSchema.ts`.
This PR applies **compile-time type checking** using these existing
types. Runtime validation via Zod `.parse()` is not yet possible for all
endpoints because `registry-types` does not generate Zod schemas — this
requires a separate migration of `registry-types` to
`@hey-api/openapi-ts` (#10674).
## Summary
- Add "Typed API Mocks" guideline to `docs/guidance/playwright.md` with
a sources-of-truth table mapping endpoint categories to their type
packages
- Add rule to `AGENTS.md` Playwright section requiring typed mock data
- Refactor `releaseNotifications.spec.ts` to use `ReleaseNote` type
(from `registry-types`) via `createMockRelease()` factory
- Annotate template mock in `templates.spec.ts` with
`WorkflowTemplates[]` type
Refs #10656
## Example workflow: writing a new typed E2E test mock
When adding a new `route.fulfill()` mock, follow these steps:
### 1. Identify the type source
Check which API the endpoint belongs to:
| Endpoint category | Type source | Zod available |
|---|---|---|
| Ingest API (hub, billing, workflows) | `@comfyorg/ingest-types` | Yes
— use `.parse()` |
| Registry API (releases, nodes, publishers) |
`@comfyorg/registry-types` | Not yet (#10674) — TS type only |
| Python backend (queue, history, settings) | `src/schemas/apiSchema.ts`
| Yes — use `z.infer` |
| Templates | `src/platform/workflow/templates/types/template.ts` | No —
TS type only |
### 2. Create a typed factory (with Zod when available)
**Ingest API endpoints** — Zod schemas exist, use `.parse()` for runtime
validation:
```typescript
import { zBillingStatusResponse } from '@comfyorg/ingest-types/zod'
import type { BillingStatusResponse } from '@comfyorg/ingest-types'
function createMockBillingStatus(
overrides?: Partial<BillingStatusResponse>
): BillingStatusResponse {
return zBillingStatusResponse.parse({
plan: 'free',
credits_remaining: 100,
renewal_date: '2026-04-28T00:00:00Z',
...overrides
})
}
```
**Registry API endpoints** — TS type only (Zod not yet generated):
```typescript
import type { ReleaseNote } from '../../src/platform/updates/common/releaseService'
function createMockRelease(
overrides?: Partial<ReleaseNote>
): ReleaseNote {
return {
id: 1,
project: 'comfyui',
version: 'v0.3.44',
attention: 'medium',
content: '## New Features',
published_at: new Date().toISOString(),
...overrides
}
}
```
### 3. Use in test
```typescript
test('should show upgrade banner for free plan', async ({ comfyPage }) => {
await comfyPage.page.route('**/billing/status', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify(createMockBillingStatus({ plan: 'free' }))
})
})
await comfyPage.setup()
await expect(comfyPage.page.getByText('Upgrade')).toBeVisible()
})
```
The factory pattern keeps test bodies focused on **what varies** (the
override) rather than the full response shape.
## Scope decisions
| File | Decision | Reason |
|------|----------|--------|
| `releaseNotifications.spec.ts` | Typed | `ReleaseNote` type available
from `registry-types` |
| `templates.spec.ts` | Typed | `WorkflowTemplates` type available in
`src/platform/workflow/templates/types/` |
| `QueueHelper.ts` | Skipped | Dead code — instantiated but never called
in any test |
| `FeatureFlagHelper.ts` | Skipped | Response type is inherently
`Record<string, unknown>`, no stronger type exists |
| Fixture factories | Deferred | Coordinate with Ben's fixture
restructuring work to avoid duplication |
## Follow-up work
Sub-issues of #10656:
- #10670 — Clean up dead `QueueHelper` or rewrite against `/api/jobs`
endpoint
- #10671 — Expand typed factory pattern to more endpoints
- #10672 — Evaluate OpenAPI generation for excluded Python backend
endpoints
- #10674 — Migrate `registry-types` from `openapi-typescript` to
`@hey-api/openapi-ts` to enable Zod schema generation
## Test plan
- [x] `pnpm typecheck:browser` passes
- [x] `pnpm lint` passes
- [ ] Existing `releaseNotifications` and `templates` tests pass in CI
## Summary
Document the agreed-upon architectural separation for browser test
fixtures:
- `fixtures/data/` — Static test data (mock API responses, workflow
JSONs, node definitions)
- `fixtures/components/` — Page object components (locators, user
interactions)
- `fixtures/helpers/` — Focused helper classes (domain-specific actions)
- `fixtures/utils/` — Pure utility functions (no page dependency)
## Changes
- **`browser_tests/AGENTS.md`** — Added architectural separation section
with clear rules for each directory
- **`browser_tests/fixtures/data/README.md`** (new) — Explains the data
directory purpose and what belongs here vs `assets/`
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10645-docs-document-fixture-page-object-separation-in-browser-tests-3316d73d365081febf52d165282c68f6)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
## What
Adds a `cloud` Playwright project so E2E tests can run against
`DISTRIBUTION=cloud` builds, with `@cloud` / `@oss` test tagging.
## Why
100+ usages of `isCloud` / `DISTRIBUTION` across 9 categories (API
routing, UI visibility, settings, auth). Zero cloud test infrastructure
existed — cloud-specific UI components (LoginButton, SubscribeButton,
etc.) had no E2E coverage path.
## Investigation: Runtime Toggle
Investigated whether `isCloud` could be made runtime-toggleable in
dev/test mode (via `window.__FORCE_CLOUD__`). **Not feasible** —
`__DISTRIBUTION__` is a Vite `define` compile-time constant used for
dead-code elimination. Runtime override would break tree-shaking in
production.
Full investigation:
`research/architecture/cloud-runtime-toggle-investigation.md`
## What's included
### Playwright Config
- New `cloud` project alongside existing `chromium`
- Cloud project: `grep: /@cloud/` — only runs `@cloud` tagged tests
- Chromium project: `grepInvert: /@cloud/` — excludes cloud tests
### Build Script
- `npm run build:cloud` → `DISTRIBUTION=cloud vite build`
### Test Tagging Convention
```typescript
test('works in both', async () => { ... });
test('subscription button visible @cloud', async () => { ... });
test('install manager prompt @oss', async () => { ... });
```
### Example Tests
- 2 cloud-only tests validating cloud UI visibility
## NOT included (future work)
- CI workflow job for cloud tests (separate PR)
- Cloud project is opt-in — not run by default locally
## Unblocks
- Cloud-specific E2E tests for entire team
- TB-03 LoginButton, TB-04 SubscribeButton (@Kaili Yang)
- DLG-04 SignIn, DLG-06 CancelSubscription
Part of: Test Coverage Q2 Overhaul
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10546-test-infra-cloud-Playwright-project-with-cloud-oss-tagging-32f6d73d3650810ebb59dea8ce4891e9)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Alexander Brown <drjkl@comfy.org>
## Summary
Document the recommended pattern for adding new domain-specific test
helpers as Playwright fixtures via `base.extend()` instead of attaching
them to `ComfyPage`.
## Changes
- **What**: Added "Creating New Test Helpers" section to
`docs/guidance/playwright.md` with fixture extension example and rules
## Review Focus
Documentation-only change. Verify the example code matches the existing
pattern in `browser_tests/fixtures/ComfyPage.ts`.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10653-docs-document-Playwright-fixture-injection-pattern-for-new-helpers-3316d73d36508145b402cf02a5c2c696)
by [Unito](https://www.unito.io)
---------
Co-authored-by: Alexander Brown <drjkl@comfy.org>
## Summary
Document the arrange/act/assert pattern for Playwright browser tests to
keep mock setup out of test bodies.
## Changes
- **What**: Added "Test Structure: Arrange/Act/Assert" section to
`docs/guidance/playwright.md` documenting that mock setup belongs in
`beforeEach`/fixtures, test bodies should only act and assert, and
`clearAllMocks` should never be called mid-test. Includes good/bad
examples.
## Review Focus
Docs-only change — no code impact.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10657-docs-add-arrange-act-assert-pattern-guidance-for-browser-tests-3316d73d365081aa92c0fb6442084484)
by [Unito](https://www.unito.io)
---------
Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Alexander Brown <drjkl@comfy.org>
## Summary
From the primordial entropy of 17 scattered spec files — a formless
sprawl of mixed concerns and inconsistent naming — emerges a clean,
domain-organized hierarchy. Order triumphs over chaos.
## Changes
- **What**: Reorganize all subgraph E2E tests from 17 flat files in
`browser_tests/tests/` into 10 domain-grouped files under
`browser_tests/tests/subgraph/`.
| File | Tests | Domain |
|------|-------|--------|
| `subgraphSlots` | 16 | I/O slot CRUD, rename, alignment, promoted slot
position |
| `subgraphPromotion` | 22 | Auto-promote, visibility, reactivity,
context menu, cleanup |
| `subgraphSerialization` | 16 | Hydration, round-trip, legacy formats,
ID remapping |
| `subgraphNavigation` | 10 | Breadcrumb, viewport, hotkeys, progress
state |
| `subgraphNested` | 9 | Configure order, duplicate names, pack values,
stale proxies |
| `subgraphLifecycle` | 7 | Source removal cleanup, pseudo-preview
lifecycle |
| `subgraphPromotionDom` | 6 | DOM widget persistence, cleanup,
positioning |
| `subgraphCrud` | 5 | Create, delete, copy, unpack |
| `subgraphSearch` | 3 | Search aliases, description, persistence |
| `subgraphOperations` | 2 | Copy/paste inside, undo/redo inside |
Where once the monolith `subgraph.spec.ts` (856 lines) mixed slot CRUD
with hotkeys, DOM widgets with navigation, and copy/paste with undo/redo
— now each behavioral domain has its sovereign territory.
Where once `subgraph-rename-dialog.spec.ts`,
`subgraphInputSlotRename.spec.ts`, and
`subgraph-promoted-slot-position.spec.ts` scattered rename concerns
across three kingdoms — now they answer to one crown:
`subgraphSlots.spec.ts`.
Where once `kebab-case` and `camelCase` warred for dominion — now a
single convention reigns.
All 96 test cases preserved. Zero test logic changes. Purely structural.
## Review Focus
- Verify no tests were lost in the consolidation
- Confirm import paths all resolve correctly at the new depth
(`../../fixtures/`)
- The `import.meta.dirname` asset path in `subgraphSlots.spec.ts` (slot
alignment test) updated for new directory depth
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10695-test-reorganize-subgraph-E2E-tests-into-domain-organized-directory-3326d73d36508197939be8825b69ea88)
by [Unito](https://www.unito.io)
Co-authored-by: Amp <amp@ampcode.com>
## Summary
Add a "Fixture Data & Schemas" section to `docs/guidance/playwright.md`
so agents reference existing Zod schemas and TypeScript types when
creating test fixture data.
## Changes
- **What**: New section listing key schema/type locations (`apiSchema`,
`nodeDefSchema`, `jobTypes`, `workflowSchema`, etc.) to keep test
fixtures in sync with production types.
## Review Focus
Documentation-only change; no runtime impact.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10642-docs-add-Fixture-Data-Schemas-section-to-Playwright-test-guidance-3316d73d365081f5a234e4672b3dc4b9)
by [Unito](https://www.unito.io)
## Summary
Add composite assertion and scoped opening methods to the `ContextMenu`
Playwright page object.
## Changes
- **What**: Added `assertHasItems(items: string[])` using
`expect.soft()` per item, and `openFor(locator: Locator)` which
right-clicks and waits for menu visibility. Fully backward-compatible.
## Review Focus
Both methods reuse existing locators (`primeVueMenu`, `litegraphMenu`,
`getByRole("menuitem")`). `openFor` uses `.or()` to handle both menu
types.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10659-feat-add-assertHasItems-and-openFor-to-ContextMenu-page-object-3316d73d36508193af45da7d3af4f50c)
by [Unito](https://www.unito.io)
## Summary
Add helpers for safely interacting with nodes that share the same title
without hitting Playwright strict mode.
## Changes
- **What**: Added `getNodesByTitle(title)` and `getNodeByTitleNth(title,
index)` to `VueNodeHelpers`. Updated `docs/guidance/playwright.md` with
a gotcha note about duplicate node names.
## Review Focus
These are purely additive helpers — no existing behavior changes.
`getNodesByTitle` returns all matching nodes (callers use `.nth()` to
pick), and `getNodeByTitleNth` is a convenience wrapper. The existing
`selectNodes(nodeIds)` by-ID method is unchanged.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10666-feat-add-getNodesByTitle-and-getNodeByTitleNth-helpers-to-VueNodeHelpers-3316d73d3650812eabe6e56a768a34d2)
by [Unito](https://www.unito.io)
## Summary
Add ESLint `no-restricted-imports` rule to prevent usage of
`useVirtualList` from `@vueuse/core`.
## Changes
- **What**: New ESLint config block banning `useVirtualList` in
`**/*.{ts,vue}` files. The team standardized on TanStack Virtual (via
Reka UI virtualizer or `@tanstack/vue-virtual`) for all virtualization.
`useVirtualList` requires uniform item heights and is no longer desired.
This is a preventive ban — no existing usage exists.
## Review Focus
Straightforward lint rule addition following the existing
`no-restricted-imports` pattern in `eslint.config.ts`.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10643-feat-ban-useVirtualList-from-vueuse-core-via-ESLint-3316d73d365081d5adf0ec926aab6e28)
by [Unito](https://www.unito.io)
---------
Co-authored-by: Benjamin Lu <benjaminlu1107@gmail.com>
## Summary
Extract repeated patterns from 12 subgraph Playwright spec files into
shared test utilities, reducing duplication by ~142 lines.
## Changes
- **What**: New shared helpers for common subgraph test operations:
- `SubgraphHelper`: `getSlotCount()`, `getSlotLabel()`, `removeSlot()`,
`findSubgraphNodeId()`
- `NodeReference`: `delete()`
- `subgraphTestUtils`: `serializeAndReload()`,
`convertDefaultKSamplerToSubgraph()`, `expectWidgetBelowHeader()`,
`collectConsoleWarnings()`, `packAllInteriorNodes()`
- Replaced ~72 inline `page.evaluate` blocks and multi-line sequences
with single helper calls across 12 spec files
## Review Focus
- Behavioral equivalence: every replacement is a mechanical extraction
with no test logic changes
- API surface of new helpers: naming, parameter types, placement in
existing utility classes
- Whether any remaining inline patterns in the spec files would benefit
from further extraction
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10629-test-extract-shared-subgraph-E2E-test-utilities-3306d73d365081b0b6b5db52ed0a4552)
by [Unito](https://www.unito.io)
---------
Co-authored-by: Amp <amp@ampcode.com>
## Summary
Adds a `@perf` test to establish a baseline for viewport panning GC
churn on large graphs.
## Changes
- **What**: New `large graph viewport pan sweep` perf test that pans
aggressively back and forth across a 245-node graph, forcing many nodes
to cross the viewport boundary. Measures style recalcs, forced layouts,
task duration, heap delta, and DOM node count.
## Review Focus
This is **PR 1 of 2** (perf-fix-with-proof pattern). The fix (viewport
culling) will follow in a separate PR once this baseline is established
on main. CI will then show the delta proving the improvement.
The test uses 120 steps out + 120 steps back at 8px/step = ~960px total
displacement, enough to sweep across a significant portion of the large
graph layout.
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10479-test-add-perf-test-for-viewport-pan-sweep-GC-churn-32d6d73d365081cc9f15fe3d5890675d)
by [Unito](https://www.unito.io)
## Summary
Adds the layout shell for the marketing site: SEO head, analytics, nav,
and footer.
## Changes (incremental from #10140)
- BaseLayout.astro: SEO meta (OG/Twitter), GTM (GTM-NP9JM6K7), Vercel
Analytics, ClientRouter, i18n
- SiteNav.vue: Fixed nav with logo, Enterprise/Gallery/About/Careers
links, COMFY CLOUD + COMFY HUB CTAs, mobile hamburger with ARIA
- SiteFooter.vue: Product/Resources/Company/Legal columns, social icons
## Stack (via Graphite)
- #10140 [1/3] Scaffold ← merge first
- **[2/3] Layout Shell** ← this PR
- #10142 [3/3] Homepage Sections
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10141-feat-add-layout-shell-BaseLayout-SiteNav-SiteFooter-2-3-3266d73d365081aeb2d7e598943a8e17)
by [Unito](https://www.unito.io)
## Summary
Add a deprecation warning when custom nodes access `widget.inputEl` on
STRING multiline widgets, directing them to use `widget.element`
instead.
## Changes
- **What**: Add a reusable `defineDeprecatedProperty` helper in
`feedback.ts` that creates an ODP getter/setter proxy from a deprecated
property to its replacement, logging via the existing `warnDeprecated`
utility (deduplicates: warns once per unique message per session). Use
it to deprecate `widget.inputEl` → `widget.element`.
## Review Focus
- `defineDeprecatedProperty` is generic and can be reused for future
property deprecations across the codebase.
- `warnDeprecated` already handles deduplication via a `Set`, so heavy
access patterns (e.g. custom nodes reading `widget.inputEl` in tight
loops) won't spam.
- `enumerable: false` keeps the deprecated alias out of `Object.keys()`
/ `for...in` / `JSON.stringify`.
FixesComfy-Org/ComfyUI#12893
<!-- Pipeline-Ticket: 6b291ba2-694c-42d6-ac0c-fcbdcba9373a -->
---------
Co-authored-by: Dante <bunggl@naver.com>
## What
Replace `v-if` with `v-show` on SelectionRectangle and NodeTooltip
components.
## Why
Firefox profiler shows 687 Vue `insert` markers from mount/unmount
cycling during canvas interaction. These components toggle frequently
during drag and mouse move events.
## How
- **SelectionRectangle**: `v-if` → `v-show` (single element, safe to
keep in DOM)
- **NodeTooltip**: `v-if` → `v-show` + no-op guard on `hideTooltip()` to
skip redundant reactivity triggers
## Perf Impact
Expected reduction: ~687 Vue insert/remove operations per profiling
session
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9401-fix-use-v-show-for-frequently-toggled-canvas-overlay-components-31a6d73d365081aba2d7fce079bde7e9)
by [Unito](https://www.unito.io)
## Summary
With a previously saved workflow, selecting "Save as" in app mode would
not correctly change the file extension to the chosen mode, and would
require an additional save after to persist the actual mode change.
Recreation:
- Build app
- Save as worklow X, app mode
- Select Save as from builder footer [Save | v] chevron button
- Select node graph
- Save
- Check workflow on disk - it's still called X.app.json and doesn't have
linearMode: false <-- bug
## Changes
- **What**:
- pass isApp to save workflow
- ensure active graph & initialMode are correctly set when calling
saveAs BEFORE the actual saveWorkflow call
- add linearMode to workflowShema to prevent casts
- tests
## Review Focus
e2e tests coming in a follow up PR along with some refactoring of the
browser tests (left this PR focused to the actual fix)
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10679-fix-App-mode-Save-as-not-using-correct-extension-or-persisting-mode-on-change-3316d73d365081ef985cf57c91c34299)
by [Unito](https://www.unito.io)
## Summary
- Rename `text-xxxs`/`text-xxs` to `text-3xs`/`text-2xs` in design
system CSS — fixes `tailwind-merge` incorrectly classifying custom
font-size utilities as color classes, which clobbered text color
- Add `Badge` component with updated severity colors matching Figma
design (white text on colored backgrounds)
- Add Badge stories under `Components/Badges/Badge`
- Add unit tests including twMerge regression coverage
Split from #10438 per review feedback — this PR contains the
foundational Badge component; migration of consumers follows in a
separate PR.
## Test plan
- [x] Unit tests pass (`Badge.test.ts` — 12 tests)
- [x] Typecheck passes
- [x] Lint passes
- [ ] Verify Badge stories render correctly in Storybook
- [ ] Verify existing components using `text-2xs`/`text-3xs` render
unchanged
Fixes#10438 (partial)
┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10580-refactor-add-Badge-component-and-fix-twMerge-font-size-detection-32f6d73d3650810dae7cd0d4af67fd1c)
by [Unito](https://www.unito.io)
2026-03-27 19:23:59 -07:00
338 changed files with 32169 additions and 5289 deletions
description: Reviews Playwright E2E test code for ComfyUI-specific patterns, flakiness risks, and fixture misuse
severity-default: medium
tools: [Read, Grep]
---
You are reviewing Playwright E2E test code in `browser_tests/`. Focus on issues a **reviewer** would catch that an author might miss — flakiness risks, fixture misuse, test isolation problems, and convention violations.
1.**`waitForTimeout` usage** — Always wrong. Must use retrying assertions (`toBeVisible`, `toHaveText`), `expect.poll()`, or `expect().toPass()`. See retry patterns in `.claude/skills/writing-playwright-tests/SKILL.md`.
2.**Missing `nextFrame()` after canvas ops** — Any `drag`, `click` on canvas, `resizeNode`, `pan`, `zoom`, or programmatic graph mutation via `page.evaluate` that changes visual state needs `await comfyPage.nextFrame()` before assertions. `loadWorkflow()` does NOT need it. Prefer encapsulating `nextFrame()` calls inside Page Object methods so tests don't manage frame timing directly.
3.**Keyboard actions without prior focus** — `page.keyboard.press()` without a preceding `comfyPage.canvas.click()` or element `.focus()` will silently send keys to nothing.
4.**Coordinate-based interactions where node refs exist** — Raw `{ x, y }` clicks on canvas are fragile. If the test targets a node, use `comfyPage.nodeOps.getNodeRefById()` / `getNodeRefsByTitle()` / `getNodeRefsByType()` instead.
5.**Shared mutable state between tests** — Variables declared outside `test()` blocks, `let` state mutated across tests, or tests depending on execution order. Each test must be independently runnable.
6.**Missing cleanup of server-persisted state** — Settings changed via `comfyPage.settings.setSetting()` persist across tests. Must be reset in `afterEach` or at test start. Same for uploaded files or saved workflows. Prefer moving cleanup into [fixture options](https://playwright.dev/docs/test-fixtures#fixtures-options) so individual tests don't manage reset logic.
7.**Double-click without `{ delay }` option** — `dblclick()` without `{ delay: 5 }` or similar can be too fast for the canvas event handler.
### Fixture & API Misuse (Medium)
8.**Reimplementing existing fixture helpers** — Before flagging, grep `browser_tests/fixtures/` for the functionality. Common missed helpers:
-`comfyPage.command.executeCommand()` for menu/command actions
-`comfyPage.workflow.loadWorkflow()` for loading test workflows
-`comfyPage.canvasOps.resetView()` for view reset
-`comfyPage.settings.setSetting()` for settings
- Component page objects in `browser_tests/fixtures/components/`
9.**Building workflows programmatically when a JSON asset would work** — Complex `page.evaluate` chains to construct a graph should use a premade JSON workflow in `browser_tests/assets/` loaded via `comfyPage.workflow.loadWorkflow()`.
10.**Selectors not using `TestIds`** — Hard-coded `data-testid` strings should reference `browser_tests/fixtures/selectors.ts` when a matching entry exists. Check `selectors.ts` before flagging.
### Convention Violations (Minor)
11.**Missing test tags** — Every `test.describe` should have `tag` with at least one of: `@smoke`, `@slow`, `@screenshot`, `@canvas`, `@node`, `@widget`, `@mobile`, `@2x`. See `.claude/skills/writing-playwright-tests/SKILL.md` for when to use each.
12.**`as any` type assertions** — Forbidden in E2E tests. Use specific type assertions or test-local type helpers. See `docs/guidance/playwright.md` for acceptable patterns.
13.**Screenshot tests without masking dynamic content** — Timestamps, version numbers, or other non-deterministic content in screenshots will cause flakes. Use `mask` option.
14.**`test.describe` without `afterEach` cleanup when canvas state changes** — Tests that manipulate canvas view (drag, zoom, pan) should include `afterEach` with `comfyPage.canvasOps.resetView()`. Prefer moving canvas reset into the fixture so individual tests don't manage cleanup.
15.**Debug helpers left in committed code** — `debugAddMarker`, `debugAttachScreenshot`, `debugShowCanvasOverlay`, `debugGetCanvasDataURL` are for local debugging only.
### Test Design (Nitpick)
16.**Screenshot-only assertions where functional assertions are possible** — Prefer `expect(await node.isPinned()).toBe(true)` over screenshot comparison when testing non-visual behavior.
17.**Overly large test workflows** — Test should load the minimal workflow needed. If a test only needs one node, don't load the full default graph.
18.**Vue Nodes / LiteGraph mismatch** — If testing Vue-rendered node UI (DOM widgets, CSS states), should use `comfyPage.vueNodes.*`. If testing canvas interactions/connections, should use `comfyPage.nodeOps.*`. Mixing both in one test is a smell.
## Rules
- Only review `.spec.ts` files and supporting code in `browser_tests/`
- Do NOT flag patterns in fixture/helper code (`browser_tests/fixtures/`) — those are shared infrastructure with different rules
- "Major" for flakiness risks (items 1-7), "medium" for fixture misuse (8-10), "minor" for convention violations (11-15), "nitpick" for test design (16-18)
- When flagging missing fixture usage (item 8), confirm the helper exists by checking the fixture code — don't assume
- Existing tests that predate conventions are acceptable to modify but not required to fix
description: 'Reproduce a GitHub issue by researching prerequisites, setting up the environment (custom nodes, workflows, settings), and interactively exploring ComfyUI via playwright-cli until the bug is confirmed. Then records a clean demo video.'
---
# Issue Reproduction Skill
Reproduce a reported GitHub issue against a running ComfyUI instance. This skill uses an interactive, agent-driven approach — not a static script. You will research, explore, retry, and adapt until the bug is reproduced, then record a clean demo.
## Architecture
Two videos are produced:
1.**Research video** — the full exploration session: installing deps, trying things, failing, retrying, figuring out the bug. Valuable for debugging context.
2.**Reproduce video** — a clean, minimal recording of just the reproduction steps. This is the demo you'd attach to the issue.
```
Phase 1: Research → Read issue, understand prerequisites
## Phase 4: Record Clean Demo — Reproduce Video (max 5 minutes)
Once the bug is confirmed, **stop the research video** and **close the research browser**:
```bash
playwright-cli video-stop
playwright-cli close
```
Now start a **fresh browser session** for the clean reproduce video (Video 2).
**IMPORTANT constraints:**
- **Max 5 minutes** — the reproduce video must be short and focused
- **No environment setup** — server, user, custom nodes are already set up from Phase 3. Just log in and go.
- **No exploration** — you already know the exact steps. Execute them quickly and precisely.
- **Start video recording immediately**, execute steps, stop. Don't leave the recording running while thinking.
1. **Open browser and start recording**:
```bash
playwright-cli open "http://127.0.0.1:8188"
playwright-cli video-start
```
2. **Execute only the minimal reproduction steps** — no exploration, no mistakes. Just the clean sequence that demonstrates the bug. You already know exactly what works from Phase 3.
description: 'Comprehensive QA of ComfyUI frontend. Navigates all routes, tests all interactive features using playwright-cli, generates a report, and submits a draft PR. Works in CI and local environments, cross-platform.'
---
# ComfyUI Frontend QA Skill
Automated quality assurance for the ComfyUI frontend. The pipeline reproduces reported bugs using Playwright E2E tests, records video evidence, and deploys reports to Cloudflare Pages.
## Architecture Overview
The QA pipeline uses a **three-phase approach**:
1.**RESEARCH** — Claude writes Playwright E2E tests to reproduce bugs (assertion-backed, no hallucination)
2.**REPRODUCE** — Deterministic replay of the research test with video recording
3.**REPORT** — Deploy results to Cloudflare Pages with badge, video, and verdict
### Key Design Decision
Earlier iterations used AI vision (Gemini) to drive a browser and judge results from video. This was abandoned after discovering **AI reviewers hallucinate** — Gemini reported "REPRODUCED" when videos showed idle screens. The current approach uses **Playwright assertions** as the source of truth: if the test passes, the bug is proven.
Go to Actions → "PR: QA" → Run workflow → choose mode (focused/full).
## CI Workflow (`.github/workflows/pr-qa.yaml`)
```
resolve-matrix → analyze-pr ──┐
├→ qa-before (main branch, worktree build)
├→ qa-after (PR branch)
└→ report (video review, deploy, comment)
```
Before/after jobs run **in parallel** on separate runners for clean isolation.
### Issue Reproduce Mode
For issues (not PRs), the pipeline:
1. Fetches the issue body and comments
2. Runs `qa-analyze-pr.ts --type issue` to generate a QA guide
3. Runs the research phase (Claude writes E2E test to reproduce)
4. Records video of the test execution
5. Posts results as a comment on the issue
## Running Locally
### Step 1: Environment Setup
```bash
# Ensure ComfyUI server is running
# Default: http://127.0.0.1:8188
# Install Playwright browsers
npx playwright install chromium
```
### Step 2: Analyze the Issue/PR
```bash
# For a PR
pnpm exec tsx scripts/qa-analyze-pr.ts \
--pr-number 10394\
--repo Comfy-Org/ComfyUI_frontend \
--output-dir qa-guides
# For an issue
pnpm exec tsx scripts/qa-analyze-pr.ts \
--pr-number 10394\
--repo Comfy-Org/ComfyUI_frontend \
--output-dir qa-guides \
--type issue
```
### Step 3: Record Before/After
```bash
# Before (main branch)
pnpm exec tsx scripts/qa-record.ts \
--mode before \
--diff /tmp/pr-diff.txt \
--output-dir /tmp/qa-before \
--qa-guide qa-guides/qa-guide-1.json
# After (PR branch)
pnpm exec tsx scripts/qa-record.ts \
--mode after \
--diff /tmp/pr-diff.txt \
--output-dir /tmp/qa-after \
--qa-guide qa-guides/qa-guide-1.json
```
### Step 4: Review Videos
```bash
pnpm exec tsx scripts/qa-video-review.ts \
--artifacts-dir /tmp/qa-artifacts \
--video-file qa-session.mp4 \
--before-video qa-before-session.mp4 \
--output-dir /tmp/video-reviews \
--pr-context /tmp/pr-context.txt
```
## Research Phase Details (`qa-agent.ts`)
Claude receives:
- The issue description and comments
- A QA guide from `qa-analyze-pr.ts`
- An accessibility tree snapshot of the current UI
Claude's tools:
- **`inspect(selector?)`** — Read a11y tree to discover element selectors
- **`writeTest(code)`** — Write a Playwright `.spec.ts` file
- **`runTest()`** — Execute the test and get pass/fail + errors
- **`done(verdict, summary, evidence, testCode)`** — Finish with verdict
The test uses the project's Playwright fixtures (`comfyPageFixture`), giving access to `comfyPage.page`, `comfyPage.menu`, `comfyPage.settings`, etc.
### Verdict Logic
- **REPRODUCED** — Test passes (asserting the bug exists) → bug is proven
- **NOT_REPRODUCIBLE** — Claude exhausted attempts, test cannot pass
- **INCONCLUSIVE** — Agent timed out or encountered infrastructure issues
Auto-completion: if a test passed but `done()` was never called, the pipeline auto-completes with REPRODUCED.
## Manual QA (Fallback)
When the automated pipeline isn't suitable (e.g., visual-only bugs, complex multi-step interactions), use **playwright-cli** for manual browser interaction:
```bash
# Install
npm install -g @playwright/cli@latest
# Open browser and navigate
playwright-cli open http://127.0.0.1:8188
# Get element references
playwright-cli snapshot
# Interact
playwright-cli click e1
playwright-cli fill e2 "test text"
playwright-cli press Escape
playwright-cli screenshot --filename=f.png
```
Snapshots return element references (`e1`, `e2`, …). Always run `snapshot` after navigation to refresh refs.
## Manual QA Test Plan
When performing manual QA (either via playwright-cli or the automated pipeline), systematically test each area below.
### Application Load & Routes
| Test | Steps |
|---|---|
| Root route loads | Navigate to `/` — GraphView should render with canvas |
| User select route | Navigate to `/user-select` — user selection UI should appear |
| 404 handling | Navigate to `/nonexistent` — should handle gracefully |
### Canvas & Graph View
| Test | Steps |
|---|---|
| Canvas renders | The LiteGraph canvas is visible and interactive |
| Pan canvas | Click and drag on empty canvas area |
| Zoom in/out | Use scroll wheel or Alt+=/Alt+- |
| Add node via double-click | Double-click canvas to open search, type "KSampler", select it |
| Delete node | Select a node, press Delete key |
| Connect nodes | Drag from output slot to input slot |
| Copy/Paste | Select nodes, Ctrl+C then Ctrl+V |
| Undo/Redo | Make changes, Ctrl+Z to undo, Ctrl+Y to redo |
| Context menus | Right-click node vs empty canvas — different menus |
--onlyAllow 'MIT;MIT*;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC;0BSD;BlueOak-1.0.0;Python-2.0;CC0-1.0;Unlicense;(MIT OR Apache-2.0);(MIT OR GPL-3.0);(Apache-2.0 OR MIT);(MPL-2.0 OR Apache-2.0);CC-BY-4.0;CC-BY-3.0;GPL-3.0-only'; then
@@ -216,6 +216,7 @@ See @docs/testing/\*.md for detailed patterns.
1. Follow the Best Practices described [in the Playwright documentation](https://playwright.dev/docs/best-practices)
2. Do not use waitForTimeout, use Locator actions and [retrying assertions](https://playwright.dev/docs/test-assertions#auto-retrying-assertions)
3. Tags like `@mobile`, `@2x` are respected by config and should be used for relevant tests
4. Type all API mock responses in `route.fulfill()` using generated types or schemas from `packages/ingest-types`, `packages/registry-types`, `src/workbench/extensions/manager/types/generatedManagerTypes.ts`, or `src/schemas/` — see `docs/guidance/playwright.md` for the full source-of-truth table
- **`fixtures/utils/`** — Pure utility functions. No `Page` dependency; stateless helpers that can be used anywhere.
## Polling Assertions
Prefer `expect.poll()` over `expect(async () => { ... }).toPass()` when the block contains a single async call with a single assertion. `expect.poll()` is more readable and gives better error messages (shows actual vs expected on failure).
```typescript
// ✅ Correct — single async call + single assertion
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.