mirror of
https://github.com/Comfy-Org/ComfyUI_frontend.git
synced 2026-04-20 06:20:11 +00:00
feat: control/test comparison strategy + QA backlog doc
- Agent system prompt now instructs Claude to demonstrate BOTH working (control) and broken (test) states when bug is triggered by a setting - Added docs/qa/backlog.md with future improvements: Type B/C comparisons, TTS, pre-seeding, cost optimization, environment-dependent issues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
59
docs/qa/backlog.md
Normal file
59
docs/qa/backlog.md
Normal file
@@ -0,0 +1,59 @@
|
||||
# QA Pipeline Backlog
|
||||
|
||||
## Comparison Modes
|
||||
|
||||
### Type A: Same code, different settings (IMPLEMENTED)
|
||||
Agent demonstrates both working (control) and broken (test) states in one session by toggling settings. E.g., Nodes 2.0 OFF → drag works, Nodes 2.0 ON → drag broken.
|
||||
|
||||
### Type B: Different commits
|
||||
For regressions reported as "worked in vX.Y, broken in vX.Z":
|
||||
- `qa-analyze-pr.ts` detects regression markers ("since v1.38", "after PR #1234")
|
||||
- Pipeline checks out the old commit, records control video
|
||||
- Records test video on current main
|
||||
- Side-by-side comparison on report page (reuses PR before/after infra)
|
||||
|
||||
### Type C: Different browsers
|
||||
For browser-specific bugs ("works on Chrome, broken on Firefox"):
|
||||
- Run recording with different Playwright browser contexts
|
||||
- Compare behavior across browsers in one report
|
||||
|
||||
## Agent Improvements
|
||||
|
||||
### TTS Narration
|
||||
- OpenAI TTS (`tts-1`, nova voice) generates audio from agent reasoning
|
||||
- Merged into video via ffmpeg at correct timestamps
|
||||
- Currently in qa-record.ts but needs wiring into hybrid agent path
|
||||
|
||||
### Image/Screenshot Reading
|
||||
- `qa-analyze-pr.ts` already downloads and sends images from issue bodies to Gemini
|
||||
- Could also send them to the Claude agent as context ("the reporter showed this screenshot")
|
||||
|
||||
### Placeholder Page
|
||||
- Deploy a status page immediately when CI starts
|
||||
- Auto-refreshes every 30s until final report replaces it
|
||||
- Shows spinner, CI link, badge
|
||||
|
||||
### Pre-seed Assets
|
||||
- Upload test images via ComfyUI API before recording
|
||||
- Enables reproduction of bugs requiring assets (#10424 zoom button)
|
||||
|
||||
### Environment-Dependent Issues
|
||||
- #7942: needs custom TestNode — could install a test custom node pack in CI
|
||||
- #9101: needs completed generation — could run with a tiny model checkpoint
|
||||
|
||||
## Cost Optimization
|
||||
|
||||
### Lazy A11y Tree
|
||||
- `inspect(selector)` searches tree for specific element (~20 tokens)
|
||||
- `getUIChanges()` diffs against previous snapshot (~100 tokens)
|
||||
- vs dumping full tree every turn (~2000 tokens)
|
||||
|
||||
### Gemini Video vs Images
|
||||
- 30s video clip: ~7,700 tokens (258 tok/s)
|
||||
- 15 screenshots: ~19,500 tokens (1,300 tok/frame)
|
||||
- Video is 2.5x cheaper and shows temporal changes
|
||||
|
||||
### Model Selection
|
||||
- Claude Sonnet 4.6: $3/$15 per 1M in/out — best reasoning
|
||||
- Gemini 2.5 Flash: $0.10/$0.40 per 1M — best vision-per-dollar
|
||||
- Hybrid uses each where it's strongest
|
||||
@@ -485,6 +485,20 @@ export async function runHybridAgent(opts: AgentOptions): Promise<{
|
||||
5. Take screenshots at key moments for the video evidence.
|
||||
6. When you've confirmed or ruled out the bug, call done().
|
||||
|
||||
## Control/Test Comparison (IMPORTANT)
|
||||
When a bug is triggered by a specific setting, mode, or configuration:
|
||||
1. **CONTROL phase**: First demonstrate the WORKING state. Disable the trigger (e.g., Nodes 2.0 OFF), perform the action, take a screenshot labeled "control-*", verify it works.
|
||||
2. **TEST phase**: Then enable the trigger (e.g., Nodes 2.0 ON), reload if needed, perform the SAME action, take a screenshot labeled "test-*", verify it's broken.
|
||||
3. In your done() summary, explicitly compare: "With X OFF, behavior was Y. With X ON, behavior was Z."
|
||||
|
||||
This contrast is critical evidence — it proves the bug is caused by the specific setting, not a general issue. Always try to show both states when possible.
|
||||
|
||||
Examples of control/test pairs:
|
||||
- Nodes 2.0 OFF → ON (for node rendering, widget, drag bugs)
|
||||
- Default theme → specific theme (for visual bugs)
|
||||
- Single node → multiple overlapping nodes (for z-index bugs)
|
||||
- Empty workflow → loaded workflow (for state bugs)
|
||||
|
||||
## ComfyUI Layout (1280×720 viewport)
|
||||
- Canvas with node graph centered at ~(640, 400)
|
||||
- Hamburger menu top-left (C logo)
|
||||
|
||||
Reference in New Issue
Block a user