ComfyUI_frontend

mirror of https://github.com/Comfy-Org/ComfyUI_frontend.git synced 2026-05-11 08:20:53 +00:00

Author	SHA1	Message	Date
dante01yoon	3bc88f5dea	fix: show file extensions in workflow sidebar Use node.key to derive leaf labels instead of node.label, which loses the file extension after PrimeVue Tree processing. Remove unused getFilenameDetails import. Fixes #10409	2026-04-03 10:03:43 +09:00
snomiao	04bf4cbe95	feat: badge shows reproduction method (E2E test / video / both) - Add reproducedBy field to ResearchResult and done() tool - Agent reports how bug was proven: e2e_test, video, both, or none - Badge shows '1 via E2E test' instead of generic '1 reproduced' - Deploy script reads reproducedBy from research-log.json - Test code (reproduce.spec.ts) now deployed to report page Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800 Co-authored-by: Amp <amp@ampcode.com>	2026-04-01 13:45:21 +00:00
snomiao	705d10bc8d	fix: prevent trivial assertions from false REPRODUCED + deploy test code - Add rule: assertions must be specific to the bug (not just > 0) - If no bug-specific assertion possible, verdict must be NOT_REPRODUCIBLE - Copy reproduce.spec.ts to deployed report for transparency - Addresses #10307 false REPRODUCED (test only asserted node count > 0) Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800 Co-authored-by: Amp <amp@ampcode.com>	2026-04-01 13:35:38 +00:00
snomiao	765b5dbf1d	fix: remove oversized files before CF Pages deploy (25MB limit) Large mp4 videos (37-76MB) caused wrangler to silently fail the deploy, leaving stale ANALYZING badges. Now strips files >25MB before deploying. Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800 Co-authored-by: Amp <amp@ampcode.com>	2026-04-01 11:12:57 +00:00
snomiao	8ac9a9843b	fix: remove PREPARING/ANALYZING intermediate badge deploys Root cause: Cloudflare Pages serves stale deployments when multiple deploys race to the same branch. The ANALYZING placeholder deployed seconds before the final report would sometimes 'win' the race. Fix: Only deploy once — the final report with the real badge. No more intermediate PREPARING or ANALYZING placeholders. Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800 Co-authored-by: Amp <amp@ampcode.com>	2026-04-01 10:37:19 +00:00
snomiao	3351f79c27	fix: log wrangler output and always use fallback deploy URL Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800 Co-authored-by: Amp <amp@ampcode.com>	2026-04-01 10:01:02 +00:00
snomiao	64c4bedc0d	fix: stop iterating after test passes, ban waitForTimeout in QA tests - Add explicit instruction to call done() immediately after test passes - Inject warning message in runTest response when test passes - Ban page.waitForTimeout() in system prompt (use retrying assertions) - Instruct agent to write ONE focused test, not multiple Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800 Co-authored-by: Amp <amp@ampcode.com>	2026-04-01 08:19:53 +00:00
snomiao	854f1c7da0	feat: readFixture/readTest tools, ANTHROPIC_API_KEY_QA, fix TS errors - Add readFixture and readTest tools to qa-agent for fixture API discovery - Enrich system prompt with comprehensive ComfyPage fixture API reference - Switch CI to ANTHROPIC_API_KEY_QA secret - Fix all TS errors in qa-agent.ts, qa-record.ts, qa-reproduce.ts - Better error handling for API credit exhaustion - Rewrite SKILL.md to reflect three-phase pipeline Amp-Thread-ID: https://ampcode.com/threads/T-019d4786-eb5f-7115-a10e-5b086c921800 Co-authored-by: Amp <amp@ampcode.com>	2026-04-01 06:44:34 +00:00
snomiao	ed96aaafc6	fix: auto-complete when test passes but agent doesn't call done() Claude sometimes keeps iterating after a test passes, exhausting the time budget without calling done(). Now: when runTest() returns TEST PASSED, the test code is saved. If the agent loop ends without done(), auto-sets verdict=REPRODUCED with the passing test. Fixes #8532 (17 calls, test passed twice, but INCONCLUSIVE verdict). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 13:25:07 +00:00
snomiao	4ba526b859	fix: remaining pipefail crashes in deploy script Added \|\| true to all grep/sed pipe chains that could exit non-zero: - grep on pr-context.txt (line 149) - sed/grep on pr-context description (line 158-159) - grep -oiP on RISK_FIRST (line 331) - wrangler deploy \| grep URL (line 355) All tested under set -euo pipefail with empty inputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 12:23:32 +00:00
snomiao	a3dd897823	fix: pipefail crash in deploy script with empty video-reviews sed on video-reviews/*.md returns exit code 2 when no files match, killing the script under set -euo pipefail. Added \|\| true to all potentially empty glob pipelines. Affects 13/20 QA runs that had successful research but failed deploy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:08 +00:00
snomiao	979124109a	fix: use comfyPage.page.waitForTimeout for delay injection The test uses comfyPageFixture, not bare page. Also match firstNode await calls for node interaction pauses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:08 +00:00
snomiao	b0f8e69dc2	feat: inject 800ms pauses between test actions for readable videos Regex inserts await page.waitForTimeout(800) before every comfyPage/topbar/page/canvas/expect await call in the Phase 2 test code. Adds ~5-8s to a 10-step test (negligible vs 10min research). Default playback changed to 0.5x (was 0.25x) since pauses provide natural breathing room. A 15s video at 0.5x = 30s viewing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:08 +00:00
snomiao	377cbf775f	feat: default playback 0.25x + cursor overlay in E2E test videos - Report player defaults to 0.25x speed (was 0.5x) — 5s test videos play in 20s, much more watchable - Phase 2 injects cursor overlay via addInitScript into the test code before running — white SVG arrow follows mousemove events Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:08 +00:00
snomiao	4e261fe5c6	fix: set PLAYWRIGHT_LOCAL=1 for Phase 2 to enable video recording Playwright config only records video when PLAYWRIGHT_LOCAL is set. In CI, this env var was missing so Phase 2 produced no video. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:08 +00:00
snomiao	629dfc9d37	fix: don't overwrite Phase 2 test video with idle research video After context.close(), renameLatestWebm would overwrite the Phase 2 test execution video with the idle research browser recording. Now skips the rename if qa-session.webm already exists from Phase 2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	8752276945	fix: Phase 2 records test execution video, copies to qa-session.webm The old video showed an idle screen (research browser doing nothing). Now Phase 2 runs the test with --video=on from browser_tests/tests/, finds the recorded .webm, and copies it to qa-session.webm where the deploy script expects it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	4d6fe47e01	fix: runTest uses project Playwright config + fixtures - Copy test to browser_tests/tests/ where Playwright config expects it - System prompt teaches Claude the project's test fixtures: comfyPageFixture, comfyPage.menu.topbar, comfyPage.workflow, etc. - Increased time budget to 10 min for write→run→fix iterations - Increased max turns to 50 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	4de56e2696	fix: broaden research-log.json search paths in deploy script Also search qa-artifacts/before/*/research/ for the research log since artifacts are downloaded with that nested structure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	e037244d49	feat: Claude writes E2E tests to reproduce bugs instead of driving browser Phase 1: Claude reads issue + a11y tree, writes a Playwright .spec.ts test that asserts the bug exists. Runs the test, reads errors, iterates until the test passes (proving the bug) or determines NOT_REPRODUCIBLE. Phase 2: Run the passing test with --video=on for clean recording. This replaces interactive browser driving with deterministic test code. Claude Sonnet 4.6 excels at writing Playwright tests — much more reliable than real-time browser interaction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	e8ce53c92b	fix: use ariaSnapshot() instead of removed page.accessibility API page.accessibility.snapshot() was removed in Playwright 1.49+. Use page.locator('body').ariaSnapshot() which returns a text representation of the accessibility tree. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	30f4976621	feat: deploy research-log.json + use it as primary verdict source - Copy research-log.json to deploy dir (accessible at /research-log.json) - Read verdict from research log first (a11y-verified ground truth) - Fall back to video review verdict only if no research log exists - Research log is uploaded as part of QA artifacts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	b156ecd493	feat: three-phase QA pipeline — Research → Reproduce → Report Phase 1 (qa-agent.ts): Claude investigates via a11y API only. - No video, no Gemini vision — only page.accessibility.snapshot() - Every action logged with a11y before/after state - done() requires evidence citing inspect() results - Outputs reproduction plan for Phase 2 Phase 2 (qa-reproduce.ts): Deterministic replay of research plan. - Executes each step with a11y assertions - Gemini describes visual changes (narration for humans) - Clean focused video with subtitles Phase 3: Report job reads research-log.json for verdict (ground truth), narration-log.json for descriptions, video for visuals. Gemini formats logs into report — never determines verdict. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	5541c9aaea	fix: prevent AI lies — assertion-based verdicts + blind reviewer Agent: MUST use inspect() after every action, verdict based on DOM state not opinions. "NEVER claim REPRODUCED unless inspect() confirms." Reviewer: Two-phase prompt — Phase 1 describes what it SEES (blind, no context). Phase 2 compares observations against issue/PR context. Anti-hallucination rules: "describe ONLY what you observe, NEVER infer." Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	ec29d5a828	feat: Agent SDK auto-detects Claude Code session — no API key needed locally ANTHROPIC_API_KEY is optional: Agent SDK uses Claude Code OAuth session when running locally (detects CLAUDE_CODE_SSE_PORT). In CI, ANTHROPIC_API_KEY from secrets is used. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	a96f836a00	refactor: require ANTHROPIC_API_KEY, remove Gemini-only fallback The Gemini-only agentic loop had ~47% success rate — too low to be useful as a fallback. Now ANTHROPIC_API_KEY is required for issue reproduction. Fails clearly if missing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	80db4a1ac6	fix: inject cursor overlay via addScriptTag after login, not addInitScript addInitScript runs before page load — Vue's app mount destroys the cursor div when it takes over the DOM. Using addScriptTag after login ensures the cursor persists in the stable DOM. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	9db6463b96	fix: cursor overlay on locator clicks (clickByText, menu items) Locator.click/hover bypasses our page.mouse monkey-patch. Now clickByText, hoverMenuItem, clickSubmenuItem get the element bounding box and update cursor overlay manually. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	6761c391b7	fix: verdict JSON grep pattern — capture value without closing quote The grep \{"verdict":\s"[^"]+ captures up to but not including the closing quote. The second grep for "[A-Z_]+"$ then fails because there's no closing quote. Fixed: match "verdict":\s"[A-Z_]+ then extract [A-Z_]+$ (no quotes needed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	bfc0931260	fix: grep pipefail crash + add QA troubleshooting doc - Add \|\| true to all grep pipelines in deploy script (grep returns 1 on no match, pipefail kills script) - Add docs/qa/TROUBLESHOOTING.md covering all failures encountered: __name errors, zod/v4 imports, model IDs, badge mismatches, cursor, loadDefaultWorkflow, pressKey timing, agent behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	be92dee046	feat: structured JSON verdict from AI reviewer, light-first theme - Video review prompt now requests a ## Verdict JSON block: {"verdict": "REPRODUCED\|NOT_REPRODUCIBLE\|INCONCLUSIVE", "risk": "low\|medium\|high"} - Deploy script reads JSON verdict first, falls back to grep - Eliminates all regex-matching false positives permanently - Theme: light mode is default, dark via prefers-color-scheme:dark - Cards use solid backgrounds, grain overlay only in dark mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	6e60706656	feat: report site follows system light/dark theme Add prefers-color-scheme:light media query with light palette. Replace hardcoded dark oklch values with CSS variables. Light mode: white surfaces, dark text, subtle borders, no grain overlay. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	40b5b0ca4a	fix: loadDefaultWorkflow uses API instead of menu, pressKey uses instant press - loadDefaultWorkflow now calls app.resetToDefaultWorkflow() via JS API instead of navigating File → Load Default menu (menu item name varies) - pressKey reverted to instant press() — the 400ms hold via down/up prevented Escape from propagating to parent dialog (#10397 BEFORE video showed wrong behavior because hold intercepted the event) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	0a9ca0c0ba	fix: monkey-patch page.mouse for universal cursor overlay Instead of manually calling moveCursorOverlay in each action, patch page.mouse.move/click/dblclick/down/up globally. Now EVERY mouse operation shows the cursor — text clicks, menu hovers, etc. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	159fdee471	fix: use gemini-3-flash-preview in hybrid agent (not 2.5 preview) Gemini 2.5 preview models return 404. Always use gemini-3+ models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	eac491c6b1	fix: add 'could not be confirmed' to negative verdict patterns "could not be confirmed" contains "confirmed" which matched the positive reproduc\|confirm check. Now caught by the negative check first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	4bfb059696	fix: correct Claude model ID — claude-sonnet-4-6 (not dated suffix) The Agent SDK returned "model not found" for claude-sonnet-4-6-20250514. Correct ID is claude-sonnet-4-6. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	99343cdef8	fix: cursor overlay now controlled via __moveCursor, not DOM events Headless Chrome's Playwright CDP doesn't trigger DOM mousemove events reliably. Now executeAction calls __moveCursor(x,y) directly after every mouse.move/click/drag. Cursor is an SVG arrow (white + outline). Click state shown via scale animation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	17776a850f	feat: badge label includes QA date — #10397 QA0327 Shows when the QA was run so stale results are obvious at a glance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	00dc89dad7	fix: make key presses visible in video — hold + subtitle pressKey now uses keyboard.down/up with 400ms hold instead of instant press(). Shows subtitle "⌨ Escape" and the keyboard HUD catches the held state for video frame capture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	0984ec2706	fix: use zod instead of zod/v4 — project zod doesn't export /v4 subpath Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	742216ad18	fix: add claude-agent-sdk to workspace catalog Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:07 +00:00
snomiao	6db13752a0	feat: control/test comparison strategy + QA backlog doc - Agent system prompt now instructs Claude to demonstrate BOTH working (control) and broken (test) states when bug is triggered by a setting - Added docs/qa/backlog.md with future improvements: Type B/C comparisons, TTS, pre-seeding, cost optimization, environment-dependent issues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00
snomiao	9809d52ac6	feat: all badges use vertical box style Drop horizontal badges. Universal box badge shows: ┌──────────────────┐ │ #7414 QA │ │ ✓ 1 reproduced │ │ ⚙ Fix: APPROVED │ ← only for PRs └──────────────────┘ Issues show repro/not-repro/inconclusive rows. PRs add a fix quality row (APPROVED/MINOR/MAJOR). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00
snomiao	1a25756290	feat: show QA pipeline commit hash + timing on report site - Shows "QA @ abc1234" linking to the pipeline code commit - Shows start time → deploy time in header - Helps trace which version of QA scripts generated each report Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00
snomiao	83204b9a67	feat: hybrid QA agent — Claude Sonnet 4.6 brain + Gemini vision Architecture: - Claude Sonnet 4.6 plans and reasons (via Claude Agent SDK) - Gemini 2.5 Flash watches video buffer and describes what it sees - 4 tools: observe(), inspect(), perform(), done() observe(seconds, focus): builds video clip from screenshot buffer, sends to Gemini with Claude's focused question. inspect(selector): searches a11y tree for specific element state. perform(action, params): executes Playwright action. done(verdict, summary): signals completion. Falls back to Gemini-only loop if ANTHROPIC_API_KEY not set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00
snomiao	d78388c893	feat: pass OPENAI_API_KEY to recording step for TTS narration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00
snomiao	eeaedf9854	feat: subtitle overlay + OpenAI TTS narration on reproduce videos - Agent reasoning shown as subtitle bar at bottom of video during recording - After recording, generates TTS audio via OpenAI API (tts-1, nova voice) - Merges audio clips at correct timestamps into the video with ffmpeg - Requires OPENAI_API_KEY env var; gracefully skips if not set - No-sandbox + disable-dev-shm-usage for headless Chrome compatibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00
snomiao	3f1a580d35	feat: show test requirements from QA guide on report site - Download QA guide artifact in report job - Extract prerequisites, test focus, and steps from guide JSON - Display below the purpose description: focus → prerequisites → steps - Separated by a subtle divider with smaller font Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00
snomiao	6bd6e08195	feat: purpose description on report + multi-pass video link fix - Report site shows "PR #N aims to..." or "Issue #N reports..." block above the video cards, extracted from pr-context.txt - Multi-pass video links fall back to pass1 when qa-{os}.mp4 is 404 - More negative verdict patterns: "does not demonstrate", "never tested" - Risk uses first word of Overall Risk (avoids "high confidence" match) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 07:54:06 +00:00


7604 Commits