Compare commits

8 Commits

Author SHA1 Message Date
snomiao
fb5813dddb feat: richer fixture API hints — i18n, queue mocks, subgraph helpers
- Add Comfy.Locale setting ID for i18n language switching tests
- Add queue/assets mock API (mockOutputHistory, runButton)
- Add subgraph workflow asset paths for loadWorkflow
- Add SubgraphHelper API docs (slot ops, navigation, conversion)
- Add VueNodeHelpers enterSubgraph/selectNode
2026-04-15 16:40:27 +00:00
snomiao
e153e58c91 feat: smarter agent — precondition reasoning + issue comments
- Prompt agent to reason about hidden preconditions before writing tests
  (e.g. z-index bugs need crowded canvas, not empty default workflow)
- Fetch issue comments with reproduction hints (repro/step/workaround)
- Better error analysis: different strategy on retry, not same code
- Both CI workflow and pnpm qa CLI fetch comments
2026-04-15 16:31:20 +00:00
snomiao
cdea8bf2c9 fix: Opus escalation graceful fallback on credit exhaustion
When Opus API call fails (credit balance, rate limit), keep Sonnet's
result instead of overwriting with INCONCLUSIVE API error. Only use
Opus result if it's actually better than Sonnet's attempt.
2026-04-14 15:37:07 +00:00
snomiao
a2da58eb0f feat: Opus escalation for INCONCLUSIVE issues
Sonnet tries first. If INCONCLUSIVE, automatically retries with
claude-opus-4-6 (30 turns). Disable with QA_OPUS_ESCALATION=0.
Also: model param added to ResearchOptions for flexibility.
2026-04-14 13:14:33 +00:00
snomiao
3154865ce2 feat: Phase 1 improvements — concurrency, auto-trigger, better prompts
- B1: Fix concurrency group to use ref_name (parallel sno-qa-* branches)
- D1: Auto-trigger QA on 'Potential Bug' and 'verified bug' labels
- A4: Prompt agent to read existing tests first before writing
- Turn budget enforcement from previous commit
2026-04-14 13:12:49 +00:00
snomiao
ff6034e2ee fix: reduce INCONCLUSIVE rate — enforce turn budget and fail-fast
- 3 consecutive test failures → call done(NOT_REPRODUCIBLE)
- Turn budget: ~3 inspect, 2 write, 3 fix = ~10 tool calls max
- Prevents 20+ tool call retry loops that waste CI time
2026-04-13 19:41:54 +00:00
snomiao
529ac3cea4 trigger: re-run cancelled batch 2
2026-04-13 18:42:20 +00:00
snomiao
f95eebf3db trigger: re-run cancelled QA batches
2026-04-13 17:49:03 +00:00
4 changed files with 103 additions and 14 deletions

@@ -35,6 +35,7 @@ interface ResearchOptions {
anthropicApiKey?: string
maxTurns?: number
timeBudgetMs?: number
model?: string
}
export type ReproMethod = 'e2e_test' | 'video' | 'both' | 'none'
@@ -401,15 +402,28 @@ export async function runResearchPhase(
- done(verdict, summary, evidence, testCode) — Finish with the final test
## Workflow
1. Read the issue description carefully
2. Use inspect() to understand the current UI state and discover element selectors
3. If unsure about the fixture API, use readFixture() to read the relevant helper source code
4. If unsure about test patterns, use readTest() to read an existing test for reference
1. Read the issue description carefully. Think about:
- What PRECONDITIONS are needed? (many nodes on canvas? specific layout? saved workflow? subgraph?)
- What HIDDEN ASSUMPTIONS exist? (e.g. "z-index bug" means nodes must overlap → need a crowded canvas)
- What specific UI STATE triggers the bug? (dirty workflow? collapsed node? specific menu open?)
2. Use readTest() to read 1-2 existing tests similar to the bug:
- For menu/workflow bugs: readTest("workflow.spec.ts") or readTest("topbarMenu.spec.ts")
- For node/canvas bugs: readTest("nodeInteraction.spec.ts") or readTest("copyPaste.spec.ts")
- For settings bugs: readTest("settingDialogSearch.spec.ts")
- For subgraph bugs: readTest("subgraph.spec.ts")
3. Use inspect() to understand the current UI state and discover element selectors
4. If unsure about the fixture API, use readFixture("ComfyPage.ts") or relevant helper
5. Write a Playwright test that:
- Performs the exact reproduction steps from the issue
- Asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
- FIRST sets up the preconditions (add multiple nodes, create specific layout, save workflow, etc.)
- THEN performs the reproduction steps from the issue
- FINALLY asserts the BROKEN behavior (the bug) — so the test PASSES when the bug exists
- Think like a tester: the bug may only appear under specific conditions that the reporter assumed were obvious
6. Run the test with runTest()
7. If it fails: read the error, fix the test, run again (max 5 attempts)
7. If it fails, ANALYZE the error before retrying:
- Is it a selector issue? Use inspect() to find the right element
- Is it a timing issue? The UI may need time to update — use nextFrame() or expect.poll()
- Is the precondition wrong? Maybe the bug only appears with MORE nodes, AFTER a save, etc.
- Try a DIFFERENT approach, not the same code with minor tweaks
8. Call done() with the final verdict and test code
## Test writing guidelines
@@ -423,6 +437,8 @@ export async function runResearchPhase(
- Use \`comfyPage.nextFrame()\` after interactions that trigger UI updates
- NEVER use \`page.waitForTimeout()\` — use Locator actions and retrying assertions instead
- ALWAYS call done() when finished, even if the test passed — do not keep iterating after a passing test
- CRITICAL: If your test FAILS 3 times in a row with the same or similar error, call done(NOT_REPRODUCIBLE) immediately. Do NOT keep retrying the same approach — try a completely different strategy or give up. Spending 20+ tool calls on failing tests is wasteful.
- Budget your turns: spend at most 3 turns on inspect/readFixture, 2 turns writing the first test, then max 3 fix attempts. If still failing after ~10 tool calls, call done().
- Use \`expect.poll()\` for async assertions: \`await expect.poll(() => comfyPage.nodeOps.getGraphNodesCount()).toBe(8)\`
- CRITICAL: Your assertions must be SPECIFIC TO THE BUG. A test that asserts \`expect(count).toBeGreaterThan(0)\` proves nothing — it would pass even without the bug. Instead assert the exact broken state, e.g. \`expect(clonedWidgets).toHaveLength(0)\` (missing widgets) or \`expect(zIndex).toBeLessThan(parentZIndex)\` (wrong z-order). If a test passes trivially, it's a false positive.
- NEVER write "debug", "discovery", or "inspect node types" tests. These waste turns and produce false REPRODUCED verdicts. If you need to discover node type names, use inspect() or readFixture() — not a passing test.
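The fail-fast and turn-budget rules above are enforced only through prompt wording. To illustrate what the rule asks the agent to track, here is a minimal sketch of the same bookkeeping in code. `FailureTracker`, `RunOutcome`, and the error-similarity heuristic (first line of the message with digits masked) are hypothetical and not part of the source.

```typescript
// Hypothetical mirror of the prompt rule: 3 consecutive failing runTest()
// calls with the same or similar error => call done(NOT_REPRODUCIBLE).
interface RunOutcome {
  passed: boolean
  error?: string
}

class FailureTracker {
  private readonly limit: number
  private streak = 0
  private lastSignature: string | null = null

  constructor(limit = 3) {
    this.limit = limit
  }

  // Crude similarity: compare only the first line with digits masked, so
  // "Timeout 5000ms" and "Timeout 7000ms" count as the same error.
  private signature(error: string): string {
    return error.split('\n')[0].replace(/\d+/g, '#').trim()
  }

  // Records one runTest() outcome; returns true when the agent should stop.
  record(outcome: RunOutcome): boolean {
    if (outcome.passed) {
      this.streak = 0
      this.lastSignature = null
      return false
    }
    const sig = this.signature(outcome.error ?? '')
    this.streak = sig === this.lastSignature ? this.streak + 1 : 1
    this.lastSignature = sig
    return this.streak >= this.limit
  }
}

// Usage: three near-identical timeouts trip the limit.
const tracker = new FailureTracker()
tracker.record({ passed: false, error: 'Timeout 5000ms exceeded' })
tracker.record({ passed: false, error: 'Timeout 7000ms exceeded' })
console.log(tracker.record({ passed: false, error: 'Timeout 9000ms exceeded' })) // true
```

A different error restarts the streak at 1. This matches the prompt's "same or similar error" wording, while still letting the agent switch strategies before the budget runs out.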
@@ -481,6 +497,10 @@ export async function runResearchPhase(
### Settings (comfyPage.settings)
- \`.setSetting(id, value)\` — change a ComfyUI setting
- \`.getSetting(id)\` — read current setting value
- Common setting IDs:
- \`'Comfy.UseNewMenu'\` — 'Top' | 'Bottom' | 'Disabled'
- \`'Comfy.Locale'\` — 'en' | 'zh' | 'ja' | 'ko' | 'ru' | 'fr' | 'es' etc. (change UI language)
- \`'Comfy.NodeBadge.NodeSourceBadgeMode'\` — node badge display
### Keyboard (comfyPage.keyboard)
- \`.undo()\` / \`.redo()\` — Ctrl+Z / Ctrl+Y
@@ -493,6 +513,7 @@ export async function runResearchPhase(
- \`.setupWorkflowsDirectory(structure)\` — setup test directory
- \`.deleteWorkflow(name)\`
- \`.isCurrentWorkflowModified()\` — check dirty state
- Available subgraph assets: loadWorkflow('subgraphs/basic-subgraph'), 'subgraphs/nested-subgraph', 'subgraphs/subgraph-with-promoted-text-widget', etc.
### Context Menu (comfyPage.contextMenu)
- \`.openFor(locator)\` — right-click locator and wait for menu
@@ -500,12 +521,26 @@ export async function runResearchPhase(
- \`.isVisible()\` — check if context menu is showing
- \`.assertHasItems(items)\` — assert menu contains items
### Queue & Assets (comfyPage.assets)
- \`comfyPage.runButton.click()\` — execute current workflow (backend runs with --cpu in CI)
- \`comfyPage.assets.mockOutputHistory(jobs)\` — mock queue history with fake job items
- \`comfyPage.assets.mockEmptyState()\` — clear all mocked state
- Queue overlay: \`page.getByTestId('queue-overlay-toggle')\` to open queue panel
### Subgraph (comfyPage.subgraph)
- \`.isInSubgraph()\` — check if currently viewing a subgraph
- \`.getNodeCount()\` — nodes in current graph view
- \`.getSlotCount('input'|'output')\` — I/O slot count
- \`.connectToInput(sourceNode, slotIdx, inputName)\` — connect to subgraph input
- \`.exitViaBreadcrumb()\` — navigate out of subgraph
- \`.convertDefaultKSamplerToSubgraph()\` — helper: convert default workflow node to subgraph
- NodeReference: \`.convertToSubgraph()\`, \`.navigateIntoSubgraph()\`
### Other helpers
- \`comfyPage.settingDialog\` — SettingDialog component
- \`comfyPage.searchBox\` / \`comfyPage.searchBoxV2\` — node search
- \`comfyPage.toast\` — ToastHelper (\`.visibleToasts\`)
- \`comfyPage.subgraph\` — SubgraphHelper
- \`comfyPage.vueNodes\` — VueNodeHelpers
- \`comfyPage.vueNodes\` — VueNodeHelpers (\`.enterSubgraph(nodeId)\`, \`.selectNode(nodeId)\`)
- \`comfyPage.bottomPanel\` — BottomPanel
- \`comfyPage.clipboard\` — ClipboardHelper
- \`comfyPage.dragDrop\` — DragDropHelper
@@ -596,7 +631,7 @@ ${issueContext}`
prompt:
'Write a Playwright E2E test that reproduces the reported bug. Use inspect() to discover selectors, readFixture() or readTest() if you need to understand the fixture API or see existing test patterns, writeTest() to write the test, runTest() to execute it. Iterate until it works or you determine the bug cannot be reproduced.',
options: {
model: 'claude-sonnet-4-6',
model: opts.model ?? 'claude-sonnet-4-6',
systemPrompt,
...(anthropicApiKey ? { apiKey: anthropicApiKey } : {}),
maxTurns,
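`runResearchPhase` now reads the model from its options with a nullish-coalescing default (`opts.model ?? 'claude-sonnet-4-6'`). The following is a standalone sketch of that rule. `ModelOptions`, `DEFAULT_MODEL`, and `resolveModel` are illustrative names, not the real `ResearchOptions` plumbing.

```typescript
// Illustrative subset of ResearchOptions: only the fields this sketch needs.
interface ModelOptions {
  model?: string
  maxTurns?: number
}

const DEFAULT_MODEL = 'claude-sonnet-4-6'

// `??` falls back only on null/undefined, so any explicitly passed model,
// such as 'claude-opus-4-6' for escalation, is used unchanged.
function resolveModel(opts: ModelOptions): string {
  return opts.model ?? DEFAULT_MODEL
}

console.log(resolveModel({})) // claude-sonnet-4-6
console.log(resolveModel({ model: 'claude-opus-4-6', maxTurns: 30 })) // claude-opus-4-6
```

Because the sketch uses `??` rather than `||`, an empty-string model is passed through as-is instead of being silently replaced by the default.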

@@ -1952,7 +1952,7 @@ async function main() {
// QA guide not available
}
}
const research = await runResearchPhase({
let research = await runResearchPhase({
page,
issueContext: issueCtx,
qaGuide: qaGuideText,
@@ -1963,6 +1963,44 @@ async function main() {
console.warn(
`Research complete: ${research.verdict} — ${research.summary.slice(0, 100)}`
)
// Opus escalation: if Sonnet couldn't reproduce, try Opus
if (
research.verdict === 'INCONCLUSIVE' &&
anthropicKey &&
process.env.QA_OPUS_ESCALATION !== '0'
) {
console.warn('Escalating to claude-opus-4-6 for complex issue...')
try {
const opusResult = await runResearchPhase({
page,
issueContext: issueCtx,
qaGuide: qaGuideText,
outputDir: opts.outputDir,
serverUrl: opts.serverUrl,
anthropicApiKey: anthropicKey,
model: 'claude-opus-4-6',
maxTurns: 30
})
console.warn(
`Opus result: ${opusResult.verdict} — ${opusResult.summary.slice(0, 100)}`
)
// Only use Opus result if it's better than Sonnet's
if (
opusResult.verdict !== 'INCONCLUSIVE' ||
!opusResult.summary.includes('API error')
) {
research = opusResult
} else {
console.warn('Opus failed (API error) — keeping Sonnet result')
}
} catch (opusErr) {
console.warn(
`Opus escalation failed: ${opusErr instanceof Error ? opusErr.message : opusErr}`
)
// Keep Sonnet's result
}
}
console.warn(`Evidence: ${research.evidence.slice(0, 200)}`)
// ═══ Phase 2: Record demo video with demowright ═══
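The escalation block in `main()` makes two decisions: whether to call Opus at all, and whether Opus's result replaces Sonnet's. The sketch below expresses both as pure functions. It assumes a simplified `Research` shape with only `verdict` and `summary`, while the real result also carries evidence and test code. `shouldEscalate` and `pickResult` are hypothetical names.

```typescript
// Simplified result shape; verdict names are the ones used in the prompt.
type Verdict = 'REPRODUCED' | 'NOT_REPRODUCIBLE' | 'INCONCLUSIVE'

interface Research {
  verdict: Verdict
  summary: string
}

// Escalate only on INCONCLUSIVE, only with an API key, unless opted out.
// In Node, `env` would be process.env.
function shouldEscalate(
  sonnet: Research,
  apiKey: string | undefined,
  env: Record<string, string | undefined>
): boolean {
  return (
    sonnet.verdict === 'INCONCLUSIVE' &&
    Boolean(apiKey) &&
    env.QA_OPUS_ESCALATION !== '0'
  )
}

// Mirrors the diff's condition: Opus replaces Sonnet unless Opus is itself
// INCONCLUSIVE *and* its summary reports an API error. A thrown call
// (passed here as an Error) also keeps Sonnet's result.
function pickResult(sonnet: Research, opus: Research | Error): Research {
  if (opus instanceof Error) return sonnet
  const opusApiFailure =
    opus.verdict === 'INCONCLUSIVE' && opus.summary.includes('API error')
  return opusApiFailure ? sonnet : opus
}

const sonnet: Research = { verdict: 'INCONCLUSIVE', summary: 'ran out of turns' }
if (shouldEscalate(sonnet, 'sk-test', {})) {
  const opus: Research = { verdict: 'INCONCLUSIVE', summary: 'API error: credit balance' }
  console.log(pickResult(sonnet, opus).summary) // ran out of turns
}
```

Pulling the rule out of `main()` this way would allow the "keep Sonnet on API failure" behaviour to be unit-tested without a browser or an API key.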

@@ -193,7 +193,17 @@ function fetchIssue(number: string, repo: string, outputDir: string): string {
const body = shell(
`gh issue view ${number} --repo ${repo} --json title,body,labels --jq '"Title: " + .title + "\\n\\nLabels: " + ([.labels[].name] | join(", ")) + "\\n\\n" + .body'`
)
return writeTmpFile(outputDir, `issue-${number}.txt`, body)
// Append relevant comments for reproduction context
let comments = ''
try {
comments = shell(
`gh issue view ${number} --repo ${repo} --comments --json comments --jq '[.comments[] | select(.body | test("repro|step|how to|workaround"; "i")) | .body] | limit(5; .[]) // empty'`
)
} catch {
// comments fetch failed, not critical
}
const content = comments ? `${body}\n\n--- Comments ---\n\n${comments}` : body
return writeTmpFile(outputDir, `issue-${number}.txt`, content)
}
function fetchPR(number: string, repo: string, outputDir: string): string {
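In `fetchIssue`, the comment fetch is best-effort. Any failure, or an empty result, leaves the issue text unchanged. Otherwise, the comments are appended under a `--- Comments ---` separator. The following is a minimal sketch of that assembly step. The `gh` call is abstracted as a hypothetical `fetchComments` callback so that the logic can run without the CLI. Trimming the fetched text is an addition of this sketch.

```typescript
// Best-effort enrichment: comments are optional context for the agent.
function assembleIssueText(
  body: string,
  fetchComments: () => string
): string {
  let comments = ''
  try {
    // Trimming is an addition here: whitespace-only output counts as empty.
    comments = fetchComments().trim()
  } catch {
    // Fetch failed (e.g. a gh error): not critical, keep the bare body.
  }
  return comments ? `${body}\n\n--- Comments ---\n\n${comments}` : body
}

console.log(assembleIssueText('Title: Node overlap', () => 'Steps: add 20 nodes'))
```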

@@ -26,7 +26,7 @@ on:
default: focused
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.event.issue.number || github.ref }}
group: qa-${{ github.event.pull_request.number || github.event.issue.number || github.ref_name }}
cancel-in-progress: true
jobs:
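In GitHub Actions expressions, `a || b || c` evaluates to the first truthy operand. As a result, the new group key resolves to `qa-<PR number>`, else `qa-<issue number>`, else `qa-<branch name>`. `github.ref_name` is the short name, for example `sno-qa-1234` rather than `refs/heads/sno-qa-1234`. The following is a TypeScript sketch of that resolution. The flattened `QaEvent` shape and the branch name are hypothetical.

```typescript
// Hypothetical flattened view of the github context fields the key reads.
interface QaEvent {
  prNumber?: number // github.event.pull_request.number
  issueNumber?: number // github.event.issue.number
  refName: string // github.ref_name, e.g. 'sno-qa-1234'
}

// Mirrors: qa-${{ pr.number || issue.number || github.ref_name }}
function concurrencyGroup(e: QaEvent): string {
  return `qa-${e.prNumber || e.issueNumber || e.refName}`
}

console.log(concurrencyGroup({ prNumber: 42, refName: 'main' })) // qa-42
console.log(concurrencyGroup({ refName: 'sno-qa-1234' })) // qa-sno-qa-1234
```

With `cancel-in-progress: true`, runs that share a key cancel one another. For events without a PR or issue number, a newer run on the same branch therefore replaces the older one, while runs on different `sno-qa-*` branches proceed in parallel.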
@@ -53,7 +53,7 @@ jobs:
# Only run on label events if it's one of our labels
if [ "$EVENT_ACTION" = "labeled" ] && \
[ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ]; then
[ "$LABEL" != "qa-changes" ] && [ "$LABEL" != "qa-full" ] && [ "$LABEL" != "qa-issue" ] && [ "$LABEL" != "Potential Bug" ] && [ "$LABEL" != "verified bug" ]; then
echo "skip=true" >> "$GITHUB_OUTPUT"
fi
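The gate skips a `labeled` event unless the label is one of the QA labels. That list now includes the two bug-triage labels `Potential Bug` and `verified bug`, and this check does not affect other event actions. The following TypeScript sketch expresses the same predicate as a set lookup, using hypothetical names.

```typescript
// Labels that may start a QA run on a 'labeled' event (from the diff).
const QA_LABELS = new Set([
  'qa-changes',
  'qa-full',
  'qa-issue',
  'Potential Bug',
  'verified bug',
])

// true => write skip=true; only 'labeled' events are gated by this check.
function shouldSkip(eventAction: string, label: string): boolean {
  return eventAction === 'labeled' && !QA_LABELS.has(label)
}

console.log(shouldSkip('labeled', 'Potential Bug')) // false
console.log(shouldSkip('labeled', 'documentation')) // true
```

As with the shell `!=` tests, label names are compared exactly, so case and spaces must match the repository's labels. In bash itself, a `case` statement is the usual way to keep a growing allow-list like this readable.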
@@ -272,6 +272,12 @@ jobs:
--repo ${{ github.repository }} \
--json title,body,labels --jq '"Labels: \([.labels[].name] | join(", "))\nTitle: \(.title)\n\n\(.body)"' \
> "${{ runner.temp }}/issue-body.txt"
# Append top comments for reproduction context
gh issue view ${{ needs.resolve-matrix.outputs.number }} \
--repo ${{ github.repository }} \
--comments --json comments \
--jq '[.comments[] | select(.authorAssociation != "NONE" or (.body | test("repro|step|how to|workaround"; "i"))) | .body] | limit(5; .[]) // empty' \
>> "${{ runner.temp }}/issue-body.txt" 2>/dev/null || true
echo "Issue body saved ($(wc -c < "${{ runner.temp }}/issue-body.txt") bytes)"
- name: Download QA guide
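The CI step uses a broader comment filter than `fetchIssue`. Besides comments that match the reproduction keywords, it keeps any comment whose `authorAssociation` is not `NONE`, meaning the author has some association with the repository. It then keeps the first five bodies. The following is a TypeScript sketch of that selection. `IssueComment` and `selectReproComments` are hypothetical names.

```typescript
// Only the two comment fields the jq filter reads.
interface IssueComment {
  authorAssociation: string // e.g. 'MEMBER', 'CONTRIBUTOR', 'NONE'
  body: string
}

// Same pattern and case-insensitive flag as test(...; "i") in the jq filter.
const REPRO_HINT = /repro|step|how to|workaround/i

function selectReproComments(comments: IssueComment[], max = 5): string[] {
  return comments
    .filter((c) => c.authorAssociation !== 'NONE' || REPRO_HINT.test(c.body))
    .slice(0, max)
    .map((c) => c.body)
}

// The '+1' comment is dropped: no association and no reproduction keyword.
const picked = selectReproComments([
  { authorAssociation: 'NONE', body: 'Same here, +1' },
  { authorAssociation: 'NONE', body: 'Steps to reproduce: open two tabs' },
  { authorAssociation: 'MEMBER', body: 'Confirmed on main' },
])
console.log(picked)
```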