mirror of https://github.com/Comfy-Org/ComfyUI_frontend.git synced 2026-04-20 06:20:11 +00:00

snomiao cf54ddb6d3 feat: control/test comparison strategy + QA backlog doc

- Agent system prompt now instructs Claude to demonstrate BOTH working
  (control) and broken (test) states when bug is triggered by a setting
- Added docs/qa/backlog.md with future improvements: Type B/C comparisons,
  TTS, pre-seeding, cost optimization, environment-dependent issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-27 07:24:10 +00:00

2.3 KiB

QA Pipeline Backlog

Comparison Modes

Type A: Same code, different settings (IMPLEMENTED)

Agent demonstrates both working (control) and broken (test) states in one session by toggling settings. E.g., Nodes 2.0 OFF → drag works, Nodes 2.0 ON → drag broken.

Type B: Different commits

For regressions reported as "worked in vX.Y, broken in vX.Z":

qa-analyze-pr.ts detects regression markers ("since v1.38", "after PR #1234")
Pipeline checks out the old commit and records the control video
Pipeline records the test video on current main
Side-by-side comparison on report page (reuses PR before/after infra)
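
The marker detection in the first step could be sketched as follows. This is a minimal illustration, not the code in qa-analyze-pr.ts: the `RegressionMarker` type, the `findRegressionMarkers` function, and the exact phrases the patterns accept are all assumptions.

```typescript
// Hypothetical result shape; not taken from qa-analyze-pr.ts.
interface RegressionMarker {
  kind: "version" | "pr";
  ref: string; // e.g. "v1.38" or "#1234"
}

// Phrases like "since v1.38" or "worked in v1.37.2" (assumed wording).
const VERSION_PATTERN = /\b(?:since|after|worked in|broken in)\s+(v\d+(?:\.\d+)+)/;
// Phrases like "after PR #1234" or "since #1234" (assumed wording).
const PR_PATTERN = /\b(?:since|after)\s+(?:PR\s+)?(#\d+)/;

// Runs one pattern over the text and appends every captured reference.
function collectMarkers(
  pattern: RegExp,
  text: string,
  kind: RegressionMarker["kind"],
  out: RegressionMarker[],
): void {
  // A fresh global, case-insensitive copy keeps lastIndex state local.
  const re = new RegExp(pattern.source, "gi");
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    const ref = match[1];
    if (ref) out.push({ kind, ref });
  }
}

// Returns version markers first, then PR markers, each in order of appearance.
function findRegressionMarkers(issueText: string): RegressionMarker[] {
  const markers: RegressionMarker[] = [];
  collectMarkers(VERSION_PATTERN, issueText, "version", markers);
  collectMarkers(PR_PATTERN, issueText, "pr", markers);
  return markers;
}
```

A version marker would still need to be resolved to a commit (for example via a release tag) before the pipeline can check it out; that step is left out here.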

Type C: Different browsers

For browser-specific bugs ("works on Chrome, broken on Firefox"):

Run recording with different Playwright browser contexts
Compare behavior across browsers in one report

Agent Improvements

TTS Narration

OpenAI TTS (tts-1, nova voice) generates audio from agent reasoning
Merged into video via ffmpeg at correct timestamps
Currently implemented in qa-record.ts; still needs wiring into the hybrid agent path
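
One way to place narration clips at their timestamps is ffmpeg's `adelay` filter followed by `amix`. The sketch below only builds the argument list; it is not the code in qa-record.ts. The `NarrationClip` type and `buildNarrationArgs` function are hypothetical, and it assumes the recorded video has no audio track of its own.

```typescript
// Hypothetical description of one generated TTS clip.
interface NarrationClip {
  audioPath: string;
  startMs: number; // offset into the video where the clip should begin
}

// Builds ffmpeg arguments that delay each clip to its start time and
// mix all clips into a single narration track alongside the video.
function buildNarrationArgs(
  videoPath: string,
  clips: NarrationClip[],
  outPath: string,
): string[] {
  if (clips.length === 0) {
    throw new Error("at least one narration clip is required");
  }
  const args: string[] = ["-i", videoPath];
  const filterParts: string[] = [];
  let mixInputs = "";
  clips.forEach((clip, i) => {
    args.push("-i", clip.audioPath);
    const delayMs = Math.max(0, Math.round(clip.startMs));
    // Input 0 is the video, so audio clip i is ffmpeg input i + 1.
    // all=1 applies the delay to every channel of the clip.
    filterParts.push(`[${i + 1}:a]adelay=${delayMs}:all=1[a${i}]`);
    mixInputs += `[a${i}]`;
  });
  filterParts.push(`${mixInputs}amix=inputs=${clips.length}[narr]`);
  args.push(
    "-filter_complex", filterParts.join(";"),
    "-map", "0:v",     // keep the original video stream
    "-map", "[narr]",  // use the mixed narration as the audio stream
    "-c:v", "copy",    // no video re-encode; assumes the output container accepts the source codec
    outPath,
  );
  return args;
}
```

The returned array can be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`.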

Image/Screenshot Reading

qa-analyze-pr.ts already downloads and sends images from issue bodies to Gemini
Could also send them to the Claude agent as context ("the reporter showed this screenshot")

Placeholder Page

Deploy a status page immediately when CI starts
Auto-refreshes every 30s until final report replaces it
Shows spinner, CI link, badge
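
A placeholder along these lines could be a single static HTML file with a meta refresh. The sketch below is an assumption about how it might look; `renderPlaceholderPage` and its parameters are hypothetical names, and the markup is illustrative only.

```typescript
// Escapes characters that are unsafe inside HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders a static page that reloads itself until the final report
// is deployed to the same URL.
function renderPlaceholderPage(
  ciRunUrl: string,
  badgeUrl: string,
  refreshSeconds: number = 30,
): string {
  const ci = escapeHtml(ciRunUrl);
  const badge = escapeHtml(badgeUrl);
  return [
    "<!doctype html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    `<meta http-equiv="refresh" content="${refreshSeconds}">`,
    "<title>QA run in progress</title>",
    "<style>",
    ".spinner { width: 32px; height: 32px; border: 4px solid #ccc;",
    "  border-top-color: #333; border-radius: 50%;",
    "  animation: spin 1s linear infinite; }",
    "@keyframes spin { to { transform: rotate(360deg); } }",
    "</style>",
    "</head>",
    "<body>",
    '<div class="spinner" role="status" aria-label="Running"></div>',
    `<p>QA run in progress. This page reloads every ${refreshSeconds} seconds.</p>`,
    `<p><a href="${ci}">View CI run</a></p>`,
    `<img src="${badge}" alt="QA status badge">`,
    "</body>",
    "</html>",
  ].join("\n");
}
```

Because the refresh reloads the same URL, deploying the final report to that path replaces the placeholder on the next reload without any client-side polling logic.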

Pre-seed Assets

Upload test images via ComfyUI API before recording
Enables reproduction of bugs that require assets (e.g. #10424, zoom button)
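
A minimal sketch of the upload step, assuming the ComfyUI server's `POST /upload/image` route, which to my understanding accepts a multipart form with an `image` file field and an optional `overwrite` field. The helper names are hypothetical, and the code relies on the global `fetch`, `FormData`, and `Blob` available in Node 18+ and browsers.

```typescript
// Hypothetical container for the pieces of the upload request.
interface UploadRequest {
  url: string;
  method: "POST";
  body: FormData;
}

// Builds the multipart request without sending it, so it can be inspected or tested.
function buildUploadRequest(
  baseUrl: string,
  fileName: string,
  image: Blob,
): UploadRequest {
  const body = new FormData();
  body.append("image", image, fileName);
  // Assumed behavior: replace an existing file of the same name so reruns stay deterministic.
  body.append("overwrite", "true");
  return {
    url: new URL("/upload/image", baseUrl).toString(),
    method: "POST",
    body,
  };
}

// Sends the upload and returns the server's JSON response as-is.
async function uploadSeedImage(
  baseUrl: string,
  fileName: string,
  image: Blob,
): Promise<unknown> {
  const req = buildUploadRequest(baseUrl, fileName, image);
  const res = await fetch(req.url, { method: req.method, body: req.body });
  if (!res.ok) {
    throw new Error(`upload failed with status ${res.status}`);
  }
  return res.json();
}
```

The pipeline would call `uploadSeedImage` once per test asset after the server is up and before recording starts.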

Environment-Dependent Issues

#7942: needs custom TestNode — could install a test custom node pack in CI
#9101: needs completed generation — could run with a tiny model checkpoint

Cost Optimization

inspect(selector) searches tree for specific element (~20 tokens)
getUIChanges() diffs against previous snapshot (~100 tokens)
Both replace dumping the full tree every turn (~2000 tokens)
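
The saving from diffing comes from sending only what changed between turns. The sketch below illustrates the idea on a flattened snapshot; it is not the pipeline's `getUIChanges()` implementation. The `UISnapshot` representation (element path mapped to a one-line description) and `diffSnapshots` are assumptions.

```typescript
// Assumed representation: stable element path -> one-line description.
type UISnapshot = Map<string, string>;

// Returns one line per added (+), changed (~), or removed (-) element.
function diffSnapshots(prev: UISnapshot, next: UISnapshot): string[] {
  const changes: string[] = [];
  next.forEach((desc, path) => {
    const before = prev.get(path);
    if (before === undefined) {
      changes.push(`+ ${path}: ${desc}`);
    } else if (before !== desc) {
      changes.push(`~ ${path}: ${before} -> ${desc}`);
    }
  });
  prev.forEach((_desc, path) => {
    if (!next.has(path)) {
      changes.push(`- ${path}`);
    }
  });
  return changes;
}
```

When nothing changed, the result is an empty array, so a turn with no UI change costs almost nothing compared with resending the whole tree.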

Gemini Video vs Images

30s video clip: ~7,700 tokens (258 tok/s)
15 screenshots: ~19,500 tokens (1,300 tok/frame)
Video is about 2.5x cheaper (~7,700 vs ~19,500 tokens) and shows temporal changes
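
The comparison is plain multiplication. A sketch using the per-second and per-frame rates quoted in this section (the constant and function names are illustrative):

```typescript
// Rates as quoted in this section.
const VIDEO_TOKENS_PER_SECOND = 258;
const TOKENS_PER_SCREENSHOT = 1300;

// Estimated input tokens for a video clip of the given length.
function videoTokens(seconds: number): number {
  return seconds * VIDEO_TOKENS_PER_SECOND;
}

// Estimated input tokens for a set of screenshots.
function screenshotTokens(frames: number): number {
  return frames * TOKENS_PER_SCREENSHOT;
}

// How many times more tokens the screenshots cost than the video.
function screenshotToVideoRatio(frames: number, seconds: number): number {
  return screenshotTokens(frames) / videoTokens(seconds);
}
```

With these rates, `videoTokens(30)` gives 7,740 and `screenshotTokens(15)` gives 19,500, a ratio of roughly 2.5.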

Model Selection

Claude Sonnet 4.6: $3/$15 per 1M in/out — best reasoning
Gemini 2.5 Flash: $0.10/$0.40 per 1M — best vision-per-dollar
Hybrid uses each where it's strongest
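
To compare models on a given workload, the per-million prices above can be turned into a per-run estimate. This is a sketch: the prices are copied from this section, while the `ModelPrice` type, the `costUsd` function, and any token counts passed in are illustrative assumptions.

```typescript
// Price in USD per one million tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Prices as listed in this section.
const CLAUDE_SONNET_PRICE: ModelPrice = { inputPerMTok: 3, outputPerMTok: 15 };
const GEMINI_FLASH_PRICE: ModelPrice = { inputPerMTok: 0.1, outputPerMTok: 0.4 };

// Estimated cost in USD for one call with the given token counts.
function costUsd(
  price: ModelPrice,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1e6
  );
}
```

For example, a hypothetical agent session of 100,000 input and 10,000 output tokens at the Claude prices comes to about $0.45, while the same token counts at the Gemini prices come to about $0.014.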
