feat: richer fixture API hints — i18n, queue mocks, subgraph helpers

- Add Comfy.Locale setting ID for i18n language switching tests
- Add queue/assets mock API (mockOutputHistory, runButton)
- Add subgraph workflow asset paths for loadWorkflow
- Add SubgraphHelper API docs (slot ops, navigation, conversion)
- Add VueNodeHelpers enterSubgraph/selectNode
feat: smarter agent — precondition reasoning + issue comments
2026-04-19 22:09:37 +00:00 · 2026-04-15 16:40:27 +00:00 · 2026-04-15 16:31:20 +00:00 · 2026-04-14 15:37:07 +00:00 · 2026-04-14 13:14:33 +00:00 · 2026-04-14 13:12:49 +00:00
4 changed files with 103 additions and 14 deletions
--- a/.claude/skills/comfy-qa/scripts/qa-agent.ts
+++ b/.claude/skills/comfy-qa/scripts/qa-agent.ts
@@ -35,6 +35,7 @@ interface ResearchOptions {
  anthropicApiKey?: string
  maxTurns?: number
  timeBudgetMs?: number
+  model?: string
 }

 export type ReproMethod = 'e2e_test' | 'video' | 'both' | 'none'
@@ -401,15 +402,28 @@ export async function runResearchPhase(
 - done(verdict, summary, evidence, testCode) — Finish with the final test

 ## Workflow
-1. Read the issue description carefully
-2. Use inspect() to understand the current UI state and discover element selectors
-3. If unsure about the fixture API, use readFixture() to read the relevant helper source code
-4. If unsure about test patterns, use readTest() to read an existing test for reference
+1. Read the issue description carefully. Think about:
+   - What PRECONDITIONS are needed? (many nodes on canvas? specific layout? saved workflow? subgraph?)
+   - What HIDDEN ASSUMPTIONS exist? (e.g. "z-index bug" means nodes must overlap → need a crowded canvas)
+   - What specific UI STATE triggers the bug? (dirty workflow? collapsed node? specific menu open?)
+2. Use readTest() to read 1-2 existing tests similar to the bug:
+   - For menu/workflow bugs: readTest("workflow.spec.ts") or readTest("topbarMenu.spec.ts")
+   - For node/canvas bugs: readTest("nodeInteraction.spec.ts") or readTest("copyPaste.spec.ts")
+   - For settings bugs: readTest("settingDialogSearch.spec.ts")
+   - For subgraph bugs: readTest("subgraph.spec.ts")
+3. Use inspect() to understand the current UI state and discover element selectors
+4. If unsure about the fixture API, use readFixture("ComfyPage.ts") or relevant helper
 5. Write a Playwright test that:
-   - Performs the exact reproduction steps from the issue
-   - Asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
+   - FIRST sets up the preconditions (add multiple nodes, create specific layout, save workflow, etc.)
+   - THEN performs the reproduction steps from the issue
+   - FINALLY asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
+   - Think like a tester: the bug may only appear under specific conditions that the reporter assumed were obvious
 6. Run the test with runTest()
-7. If it fails: read the error, fix the test, run again (max 5 attempts)
+7. If it fails, ANALYZE the error before retrying:
+   - Is it a selector issue? Use inspect() to find the right element
+   - Is it a timing issue? The UI may need time to update — use nextFrame() or expect.poll()
+   - Is the precondition wrong? Maybe the bug only appears with MORE nodes, AFTER a save, etc.
+   - Try a DIFFERENT approach, not the same code with minor tweaks
 8. Call done() with the final verdict and test code

 ## Test writing guidelines
@@ -423,6 +437,8 @@ export async function runResearchPhase(
 - Use \`comfyPage.nextFrame()\` after interactions that trigger UI updates
 - NEVER use \`page.waitForTimeout()\` — use Locator actions and retrying assertions instead
 - ALWAYS call done() when finished, even if the test passed — do not keep iterating after a passing test
+- CRITICAL: If your test FAILS 3 times in a row with the same or similar error, call done(NOT_REPRODUCIBLE) immediately. Do NOT keep retrying the same approach — try a completely different strategy or give up. Spending 20+ tool calls on failing tests is wasteful.
+- Budget your turns: spend at most 3 turns on inspect/readFixture, 2 turns writing the first test, then max 3 fix attempts. If still failing after ~10 tool calls, call done().
 - Use \`expect.poll()\` for async assertions: \`await expect.poll(() => comfyPage.nodeOps.getGraphNodesCount()).toBe(8)\`
 - CRITICAL: Your assertions must be SPECIFIC TO THE BUG. A test that asserts \`expect(count).toBeGreaterThan(0)\` proves nothing — it would pass even without the bug. Instead assert the exact broken state, e.g. \`expect(clonedWidgets).toHaveLength(0)\` (missing widgets) or \`expect(zIndex).toBeLessThan(parentZIndex)\` (wrong z-order). If a test passes trivially, it's a false positive.
 - NEVER write "debug", "discovery", or "inspect node types" tests. These waste turns and produce false REPRODUCED verdicts. If you need to discover node type names, use inspect() or readFixture() — not a passing test.
@@ -481,6 +497,10 @@ export async function runResearchPhase(
 ### Settings (comfyPage.settings)
 - \`.setSetting(id, value)\` — change a ComfyUI setting
 - \`.getSetting(id)\` — read current setting value
+- Common setting IDs:
+  - \`'Comfy.UseNewMenu'\` — 'Top' | 'Bottom' | 'Disabled'
+  - \`'Comfy.Locale'\` — 'en' | 'zh' | 'ja' | 'ko' | 'ru' | 'fr' | 'es' etc. (change UI language)
+  - \`'Comfy.NodeBadge.NodeSourceBadgeMode'\` — node badge display

 ### Keyboard (comfyPage.keyboard)
 - \`.undo()\` / \`.redo()\` — Ctrl+Z / Ctrl+Y
@@ -493,6 +513,7 @@ export async function runResearchPhase(
 - \`.setupWorkflowsDirectory(structure)\` — setup test directory
 - \`.deleteWorkflow(name)\`
 - \`.isCurrentWorkflowModified()\` — check dirty state
+- Available subgraph assets: loadWorkflow('subgraphs/basic-subgraph'), 'subgraphs/nested-subgraph', 'subgraphs/subgraph-with-promoted-text-widget', etc.

 ### Context Menu (comfyPage.contextMenu)
 - \`.openFor(locator)\` — right-click locator and wait for menu
@@ -500,12 +521,26 @@ export async function runResearchPhase(
 - \`.isVisible()\` — check if context menu is showing
 - \`.assertHasItems(items)\` — assert menu contains items

+### Queue & Assets (comfyPage.assets)
+- \`comfyPage.runButton.click()\` — execute current workflow (backend runs with --cpu in CI)
+- \`comfyPage.assets.mockOutputHistory(jobs)\` — mock queue history with fake job items
+- \`comfyPage.assets.mockEmptyState()\` — clear all mocked state
+- Queue overlay: \`page.getByTestId('queue-overlay-toggle')\` to open queue panel
+
+### Subgraph (comfyPage.subgraph)
+- \`.isInSubgraph()\` — check if currently viewing a subgraph
+- \`.getNodeCount()\` — nodes in current graph view
+- \`.getSlotCount('input'|'output')\` — I/O slot count
+- \`.connectToInput(sourceNode, slotIdx, inputName)\` — connect to subgraph input
+- \`.exitViaBreadcrumb()\` — navigate out of subgraph
+- \`.convertDefaultKSamplerToSubgraph()\` — helper: convert default workflow node to subgraph
+- NodeReference: \`.convertToSubgraph()\`, \`.navigateIntoSubgraph()\`
+
 ### Other helpers
 - \`comfyPage.settingDialog\` — SettingDialog component
 - \`comfyPage.searchBox\` / \`comfyPage.searchBoxV2\` — node search
 - \`comfyPage.toast\` — ToastHelper (\`.visibleToasts\`)
-- \`comfyPage.subgraph\` — SubgraphHelper
-- \`comfyPage.vueNodes\` — VueNodeHelpers
+- \`comfyPage.vueNodes\` — VueNodeHelpers (\`.enterSubgraph(nodeId)\`, \`.selectNode(nodeId)\`)
 - \`comfyPage.bottomPanel\` — BottomPanel
 - \`comfyPage.clipboard\` — ClipboardHelper
 - \`comfyPage.dragDrop\` — DragDropHelper
@@ -596,7 +631,7 @@ ${issueContext}`
      prompt:
        'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, readFixture() or readTest() if you need to understand the fixture API or see existing test patterns, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
      options: {
-        model: 'claude-sonnet-4-6',
+        model: opts.model ?? 'claude-sonnet-4-6',
        systemPrompt,
        ...(anthropicApiKey ? { apiKey: anthropicApiKey } : {}),
        maxTurns,
--- a/.claude/skills/comfy-qa/scripts/qa-record.ts
+++ b/.claude/skills/comfy-qa/scripts/qa-record.ts
@@ -1952,7 +1952,7 @@ async function main() {
            // QA guide not available
          }
        }
-        const research = await runResearchPhase({
+        let research = await runResearchPhase({
          page,
          issueContext: issueCtx,
          qaGuide: qaGuideText,
@@ -1963,6 +1963,44 @@ async function main() {
        console.warn(
          `Research complete: ${research.verdict} — ${research.summary.slice(0, 100)}`
        )
+
+        // Opus escalation: if Sonnet's verdict is INCONCLUSIVE, retry with Opus
+        if (
+          research.verdict === 'INCONCLUSIVE' &&
+          anthropicKey &&
+          process.env.QA_OPUS_ESCALATION !== '0'
+        ) {
+          console.warn('Escalating to claude-opus-4-6 for complex issue...')
+          try {
+            const opusResult = await runResearchPhase({
+              page,
+              issueContext: issueCtx,
+              qaGuide: qaGuideText,
+              outputDir: opts.outputDir,
+              serverUrl: opts.serverUrl,
+              anthropicApiKey: anthropicKey,
+              model: 'claude-opus-4-6',
+              maxTurns: 30
+            })
+            console.warn(
+              `Opus result: ${opusResult.verdict} — ${opusResult.summary.slice(0, 100)}`
+            )
+            // Use the Opus result unless it is INCONCLUSIVE because of an API error
+            if (
+              opusResult.verdict !== 'INCONCLUSIVE' ||
+              !opusResult.summary.includes('API error')
+            ) {
+              research = opusResult
+            } else {
+              console.warn('Opus failed (API error) — keeping Sonnet result')
+            }
+          } catch (opusErr) {
+            console.warn(
+              `Opus escalation failed: ${opusErr instanceof Error ? opusErr.message : opusErr}`
+            )
+            // Keep Sonnet's result
+          }
+        }
        console.warn(`Evidence: ${research.evidence.slice(0, 200)}`)

        // ═══ Phase 2: Record demo video with demowright ═══
--- a/.claude/skills/comfy-qa/scripts/qa.ts
+++ b/.claude/skills/comfy-qa/scripts/qa.ts
@@ -193,7 +193,17 @@ function fetchIssue(number: string, repo: string, outputDir: string): string {
  const body = shell(
    `gh issue view ${number} --repo ${repo} --json title,body,labels --jq '"Title: " + .title + "\\n\\nLabels: " + ([.labels[].name] | join(", ")) + "\\n\\n" + .body'`
  )
-  return writeTmpFile(outputDir, `issue-${number}.txt`, body)
+  // Append relevant comments for reproduction context
+  let comments = ''
+  try {
+    comments = shell(
+      `gh issue view ${number} --repo ${repo} --comments --json comments --jq '[.comments[] | select(.body | test("repro|step|how to|workaround"; "i")) | .body] | limit(5; .[]) // empty'`
+    )
+  } catch {
+    // comments fetch failed, not critical
+  }
+  const content = comments ? `${body}\n\n--- Comments ---\n\n${comments}` : body
+  return writeTmpFile(outputDir, `issue-${number}.txt`, content)
 }

 function fetchPR(number: string, repo: string, outputDir: string): string {
--- a/.github/workflows/pr-qa.yaml
+++ b/.github/workflows/pr-qa.yaml
@@ -26,7 +26,7 @@ on:
        default: focused

 concurrency:
-  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.event.issue.number || github.ref }}
+  group: qa-${{ github.event.pull_request.number || github.event.issue.number || github.ref_name }}
  cancel-in-progress: true

 jobs:
@@ -53,7 +53,7 @@ jobs:

          # Only run on label events if it's one of our labels
          if [ "$EVENT_ACTION" = "labeled" ] && \
-             [ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ]; then
+             [ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ] && [ "$LABEL" != "Potential Bug" ] && [ "$LABEL" != "verified bug" ]; then
             echo "skip=true" >> "$GITHUB_OUTPUT"
          fi

@@ -272,6 +272,12 @@ jobs:
            --repo ${{ github.repository }} \
            --json title,body,labels --jq '"Labels: \([.labels[].name] | join(", "))\nTitle: \(.title)\n\n\(.body)"' \
            > "${{ runner.temp }}/issue-body.txt"
+          # Append top comments for reproduction context
+          gh issue view ${{ needs.resolve-matrix.outputs.number }} \
+            --repo ${{ github.repository }} \
+            --comments --json comments \
+            --jq '[.comments[] | select(.authorAssociation != "NONE" or (.body | test("repro|step|how to|workaround"; "i"))) | .body] | limit(5; .[]) // empty' \
+            >> "${{ runner.temp }}/issue-body.txt" 2>/dev/null || true
          echo "Issue body saved ($(wc -c < "${{ runner.temp }}/issue-body.txt") bytes)"

      - name: Download QA guide
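
The comment selection done by the jq filters in qa.ts and pr-qa.yaml can be sketched in TypeScript as below. This is only an illustration under stated assumptions: `IssueComment`, `selectReproComments`, and the `includeAssociated` flag are hypothetical names, not part of the changed files. The flag stands in for the extra `authorAssociation != "NONE"` clause that only the workflow filter has.

```typescript
// Hypothetical shape: only the fields this sketch reads.
interface IssueComment {
  body: string
  authorAssociation?: string
}

// Same keywords the jq filters test for, case-insensitive.
const REPRO_HINT = /repro|step|how to|workaround/i

// Illustrative helper, not part of qa.ts: keep at most `max` comment bodies
// that look like reproduction hints. With includeAssociated = true, comments
// whose authorAssociation is anything other than "NONE" are kept as well,
// mirroring the workflow variant of the filter.
function selectReproComments(
  comments: IssueComment[],
  max = 5,
  includeAssociated = false
): string[] {
  return comments
    .filter(
      (c) =>
        REPRO_HINT.test(c.body) ||
        (includeAssociated && c.authorAssociation !== 'NONE')
    )
    .slice(0, max)
    .map((c) => c.body)
}
```

Filtering before truncating matters here: the cap of five applies to matching comments, not to the first five comments on the issue.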
Author	SHA1	Message	Date
snomiao	fb5813dddb	feat: richer fixture API hints — i18n, queue mocks, subgraph helpers - Add Comfy.Locale setting ID for i18n language switching tests - Add queue/assets mock API (mockOutputHistory, runButton) - Add subgraph workflow asset paths for loadWorkflow - Add SubgraphHelper API docs (slot ops, navigation, conversion) - Add VueNodeHelpers enterSubgraph/selectNode	2026-04-15 16:40:27 +00:00
snomiao	e153e58c91	feat: smarter agent — precondition reasoning + issue comments - Prompt agent to reason about hidden preconditions before writing tests (e.g. z-index bugs need crowded canvas, not empty default workflow) - Fetch issue comments with reproduction hints (repro/step/workaround) - Better error analysis: different strategy on retry, not same code - Both CI workflow and pnpm qa CLI fetch comments	2026-04-15 16:31:20 +00:00
snomiao	cdea8bf2c9	fix: Opus escalation graceful fallback on credit exhaustion When Opus API call fails (credit balance, rate limit), keep Sonnet's result instead of overwriting with INCONCLUSIVE API error. Only use Opus result if it's actually better than Sonnet's attempt.	2026-04-14 15:37:07 +00:00
snomiao	a2da58eb0f	feat: Opus escalation for INCONCLUSIVE issues Sonnet tries first. If INCONCLUSIVE, automatically retries with claude-opus-4-6 (30 turns). Disable with QA_OPUS_ESCALATION=0. Also: model param added to ResearchOptions for flexibility.	2026-04-14 13:14:33 +00:00
snomiao	3154865ce2	feat: Phase 1 improvements — concurrency, auto-trigger, better prompts - B1: Fix concurrency group to use ref_name (parallel sno-qa-* branches) - D1: Auto-trigger QA on 'Potential Bug' and 'verified bug' labels - A4: Prompt agent to read existing tests first before writing - Turn budget enforcement from previous commit	2026-04-14 13:12:49 +00:00
snomiao	ff6034e2ee	fix: reduce INCONCLUSIVE rate — enforce turn budget and fail-fast - 3 consecutive test failures → call done(NOT_REPRODUCIBLE) - Turn budget: ~3 inspect, 2 write, 3 fix = ~10 tool calls max - Prevents 20+ tool call retry loops that waste CI time	2026-04-13 19:41:54 +00:00
snomiao	529ac3cea4	trigger: re-run cancelled batch 2	2026-04-13 18:42:20 +00:00
snomiao	f95eebf3db	trigger: re-run cancelled QA batches	2026-04-13 17:49:03 +00:00
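
The escalation behaviour described in commits a2da58eb0f and cdea8bf2c9 comes down to two small decisions, sketched below as pure functions. This is a minimal sketch, assuming a result object with only `verdict` and `summary`. `ResearchLike`, `shouldEscalate`, and `pickResult` are hypothetical names for illustration; qa-record.ts inlines this logic instead.

```typescript
// Hypothetical minimal shape of a research result (assumption: the real
// result type carries more fields, such as evidence).
interface ResearchLike {
  verdict: string
  summary: string
}

// Escalate only when the first pass was INCONCLUSIVE, an API key is
// available, and QA_OPUS_ESCALATION is not set to '0'.
function shouldEscalate(
  first: ResearchLike,
  hasApiKey: boolean,
  env: Record<string, string | undefined>
): boolean {
  return (
    first.verdict === 'INCONCLUSIVE' &&
    hasApiKey &&
    env.QA_OPUS_ESCALATION !== '0'
  )
}

// Keep the first result only when the escalated run is INCONCLUSIVE and its
// summary mentions an API error; otherwise adopt the escalated result.
function pickResult(
  first: ResearchLike,
  escalated: ResearchLike
): ResearchLike {
  const failedOnApi =
    escalated.verdict === 'INCONCLUSIVE' &&
    escalated.summary.includes('API error')
  return failedOnApi ? first : escalated
}
```

Note that an escalated INCONCLUSIVE result without an API error still replaces the first result, which matches the condition in the diff.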