mirror of https://github.com/Comfy-Org/ComfyUI_frontend.git synced 2026-04-19 22:09:37 +00:00

snomiao 7520a47b5a feat: add automated QA pipeline with E2E test-driven bug reproduction

Three-phase pipeline triggered by labels (qa-changes, qa-full, qa-issue):
1. Research: Claude writes Playwright E2E tests to reproduce reported bugs
2. Reproduce: Deterministic replay with video recording
3. Report: Deploy results to Cloudflare Pages with badges

Key design decisions:
- Playwright assertions are source of truth (not AI vision)
- Agent has readFixture/readTest tools to discover project patterns
- Bug-specific assertions required (trivial assertions banned)
- Main branch dist cached by SHA to speed up before/after comparisons
- QA deps installed inline in CI (no package.json changes needed)

Verified across 48 runs (22 PRs + 26 issues) with 0 false positives.

Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>

2026-04-10 08:51:10 +00:00

QA Pipeline Model Selection

Current Configuration

| Script | Role | Model | Why |
| --- | --- | --- | --- |
| `qa-analyze-pr.ts` | PR/issue analysis, QA guide generation | `gemini-3.1-pro-preview` | Needs deep reasoning over PR diffs, screenshots, and issue threads |
| `qa-record.ts` | Playwright step generation | `gemini-3.1-pro-preview` | Step quality is critical: the model must understand ComfyUI's canvas UI and produce precise action sequences |
| `qa-video-review.ts` | Video comparison review | `gemini-3-flash-preview` | Video analysis with structured output; flash is sufficient and faster |
| `qa-generate-test.ts` | Regression test generation | `gemini-3-flash-preview` | Code generation from QA reports; flash handles this well |
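
The defaults in the table above could be captured as a typed lookup, roughly as follows. This is only a sketch: `QaScript`, `DEFAULT_MODELS`, and `defaultModelFor` are hypothetical names chosen for illustration, and the real scripts may store their defaults differently.

```typescript
// Hypothetical lookup mirroring the configuration table; not taken from the scripts themselves.
type QaScript =
  | "qa-analyze-pr.ts"
  | "qa-record.ts"
  | "qa-video-review.ts"
  | "qa-generate-test.ts";

const DEFAULT_MODELS: Record<QaScript, string> = {
  "qa-analyze-pr.ts": "gemini-3.1-pro-preview", // deep reasoning over diffs and threads
  "qa-record.ts": "gemini-3.1-pro-preview", // precise Playwright step generation
  "qa-video-review.ts": "gemini-3-flash-preview", // video comparison, structured output
  "qa-generate-test.ts": "gemini-3-flash-preview", // regression test generation
};

// Returns the default model ID for one of the four QA scripts.
function defaultModelFor(script: QaScript): string {
  return DEFAULT_MODELS[script];
}

console.log(defaultModelFor("qa-record.ts")); // gemini-3.1-pro-preview
```

Typing the keys as a union means a fifth script cannot be added without also choosing its default model.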

Model Comparison

Gemini 3.1 Pro vs GPT-5.4

| Feature | Gemini 3.1 Pro Preview | GPT-5.4 |
| --- | --- | --- |
| Context window | 1M tokens | 1M tokens |
| Max output | 65K tokens | 128K tokens |
| Video input | Yes | No |
| Image input | Yes | Yes |
| Audio input | Yes | No |
| Pricing (input) | $2/1M tokens | $2.50/1M tokens |
| Pricing (output) | $12/1M tokens | $15/1M tokens |
| Function calling | Yes | Yes |
| Code execution | Yes | Yes (interpreter) |
| Structured output | Yes | Yes |

Why Gemini over GPT for QA:

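- Native video understanding, so recordings can be reviewed directly
- Native multimodal input covering the screenshots, videos, and audio attached to issue threads
- Lower per-token cost at comparable quality ($2 vs $2.50 input and $12 vs $15 output per 1M tokens), which gives better price/performance for high-volume CI usage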
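
To make the price difference concrete, the per-token rates from the table above can be applied to a single run. The sketch below is illustrative only: the workload of 200,000 input tokens and 8,000 output tokens is an assumed figure, not a measurement from the pipeline, and `Pricing` and `estimateCostUsd` are hypothetical names.

```typescript
// Prices in USD per 1M tokens, copied from the comparison table.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

const GEMINI_31_PRO: Pricing = { inputPerMillion: 2.0, outputPerMillion: 12.0 };
const GPT_54: Pricing = { inputPerMillion: 2.5, outputPerMillion: 15.0 };

// Cost of one request: tokens are scaled to millions, then multiplied by the rate.
function estimateCostUsd(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (
    (inputTokens / 1e6) * p.inputPerMillion +
    (outputTokens / 1e6) * p.outputPerMillion
  );
}

// Assumed workload for illustration: 200k input tokens, 8k output tokens per run.
const geminiCost = estimateCostUsd(200000, 8000, GEMINI_31_PRO);
const gptCost = estimateCostUsd(200000, 8000, GPT_54);

console.log(geminiCost.toFixed(3)); // 0.496
console.log(gptCost.toFixed(3)); // 0.620
```

Under these assumptions, the Gemini run costs 20% less than the GPT-5.4 run. Both rates differ by the same ratio, so the saving is the same for any mix of input and output tokens.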

Gemini 3 Flash vs GPT-5.4 Mini

| Feature | Gemini 3 Flash Preview | GPT-5.4 Mini |
| --- | --- | --- |
| Context window | 1M tokens | 1M tokens |
| Pricing (input) | $0.50/1M tokens | $0.40/1M tokens |
| Pricing (output) | $3/1M tokens | $1.60/1M tokens |
| Video input | Yes | No |

Why Gemini Flash for video review:

- Video input support is required, and the GPT models compared here cannot process video files; this outweighs GPT-5.4 Mini's lower per-token pricing
- Good enough quality for structured comparison reports
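
The selection logic described here, a hard capability requirement first and price second, can be sketched as a filter followed by a sort. The rows below are taken from the two comparison tables; `Candidate` and `cheapestWithVideo` are hypothetical names, and the pipeline itself is not known to select models this way at runtime.

```typescript
interface Candidate {
  name: string; // display name as used in the comparison tables
  videoInput: boolean;
  inputPerMillion: number; // USD per 1M input tokens
}

const CANDIDATES: Candidate[] = [
  { name: "Gemini 3.1 Pro Preview", videoInput: true, inputPerMillion: 2.0 },
  { name: "GPT-5.4", videoInput: false, inputPerMillion: 2.5 },
  { name: "Gemini 3 Flash Preview", videoInput: true, inputPerMillion: 0.5 },
  { name: "GPT-5.4 Mini", videoInput: false, inputPerMillion: 0.4 },
];

// Drop models without video input, then pick the lowest input price among the rest.
// filter() returns a new array, so the sort does not reorder CANDIDATES.
function cheapestWithVideo(candidates: Candidate[]): Candidate | undefined {
  return candidates
    .filter((c) => c.videoInput)
    .sort((a, b) => a.inputPerMillion - b.inputPerMillion)[0];
}

console.log(cheapestWithVideo(CANDIDATES)?.name); // Gemini 3 Flash Preview
```

GPT-5.4 Mini has the lowest input price overall but is excluded by the video requirement, which matches the reasoning in the list above.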

Upgrade History

| Date | Change | Reason |
| --- | --- | --- |
| 2026-03-24 | `gemini-2.5-flash` → `gemini-3.1-pro-preview` (record) | Shallow step generation; pro model needed for complex ComfyUI interactions |
| 2026-03-24 | `gemini-2.5-pro` → `gemini-3.1-pro-preview` (analyze) | Keep analysis on latest pro |
| 2026-03-24 | `gemini-2.5-flash` → `gemini-3-flash-preview` (review, test-gen) | Latest flash for cost-efficient tasks |

Override

All scripts accept `--model <name>` to override the default. Pass any Gemini model ID.
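
One way such an override could be resolved is shown below. This is a minimal sketch, not the scripts' actual argument handling: `resolveModel` is a hypothetical helper, and it assumes the space-separated `--model <name>` form only.

```typescript
// Hypothetical helper: returns the value following --model, or the default if the flag is absent.
function resolveModel(argv: string[], defaultModel: string): string {
  const flagIndex = argv.indexOf("--model");
  if (flagIndex === -1) return defaultModel;

  const value = argv[flagIndex + 1];
  // Reject a missing value or another flag in the value position.
  if (value === undefined || value.startsWith("--")) {
    throw new Error("--model requires a model name");
  }
  return value;
}

// In a script, argv would typically come from process.argv.slice(2).
console.log(resolveModel([], "gemini-3-flash-preview")); // gemini-3-flash-preview
console.log(resolveModel(["--model", "gemini-3.1-pro-preview"], "gemini-3-flash-preview")); // gemini-3.1-pro-preview
```

Failing loudly on a missing value avoids silently falling back to the default when the flag was clearly intended.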
