Three-phase pipeline triggered by labels (qa-changes, qa-full, qa-issue):
1. Research: Claude writes Playwright E2E tests to reproduce reported bugs
2. Reproduce: Deterministic replay with video recording
3. Report: Deploy results to Cloudflare Pages with badges
Key design decisions:
- Playwright assertions are source of truth (not AI vision)
- Agent has readFixture/readTest tools to discover project patterns
- Bug-specific assertions required (trivial assertions banned)
- Main branch dist cached by SHA to speed up before/after comparisons
- QA deps installed inline in CI (no package.json changes needed)
Verified across 48 runs (22 PRs + 26 issues) with 0 false positives.
Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>