Playwright Selective Test Rerun Alternatives
This document analyzes alternatives for selectively re-running only failed Playwright tests for snapshot updates, comparing native Playwright features with the current custom manifest approach used in this project.
Table of Contents
- Current Approach
- Native Playwright Features
- Playwright Reporter Options
- GitHub Actions Integration Patterns
- Third-Party Solutions
- Comparison and Recommendations
Current Approach
Implementation
The project currently uses a custom manifest-based approach that:
- Generates a manifest of failed screenshot tests after CI runs
  - Script: scripts/cicd/build-failed-screenshot-manifest.ts
  - Parses JSON report to find tests with failed screenshot assertions
  - Creates per-project text files: ci-rerun/{project}.txt
  - Format: file_path:line_number (e.g., browser_tests/menu.test.ts:42)
- Stores manifest as GitHub artifact
  - Artifact name: failed-screenshot-tests
  - Retention: 7 days
  - Only uploaded when chromium sharded tests fail
- Downloads manifest in update workflow
  - Workflow: .github/workflows/update-playwright-expectations.yaml
  - Triggered by: PR label "New Browser Test Expectations" or /update-playwright comment
  - Falls back to full test suite if manifest not found
- Re-runs only failed tests
  for f in ci-rerun/*.txt; do
    project="$(basename "$f" .txt)"
    mapfile -t lines < "$f"
    # Filter empty lines from lines[] into filtered[] (full loop shown under Pattern 2)
    pnpm exec playwright test --project="$project" --update-snapshots "${filtered[@]}"
  done
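The manifest format described above is simple enough to validate before handing entries to Playwright. The following is a minimal sketch of such a check, not the project's actual code: the function name `parseManifest` and the entry pattern are assumptions made for illustration.

```typescript
// Hypothetical helper, not taken from the project's scripts.
// Takes the text of one ci-rerun/{project}.txt file and returns the entries
// that look like `file_path:line_number`, dropping blank lines.
const MANIFEST_ENTRY_PATTERN = /^.+:\d+$/

function parseManifest(text: string): string[] {
  const entries: string[] = []
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim()
    if (line === '') continue // skip blank lines, as the shell loop does
    if (!MANIFEST_ENTRY_PATTERN.test(line)) {
      // Fail loudly rather than pass an unexpected argument to the CLI
      throw new Error(`Unexpected manifest entry: ${line}`)
    }
    entries.push(line)
  }
  return entries
}
```

Rejecting malformed lines up front keeps a corrupted artifact from silently widening or narrowing the set of tests that get new snapshots.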
Advantages
- ✅ Works across workflow runs and different trigger mechanisms
- ✅ Survives beyond single workflow execution
- ✅ Precise control over which tests to re-run
- ✅ Supports multiple projects with separate manifests
- ✅ Works with sharded test runs (merged report)
- ✅ Platform-agnostic approach (works on any CI/CD platform)
Disadvantages
- ❌ Custom implementation requires maintenance
- ❌ Requires parsing JSON report format (could break with Playwright updates)
- ❌ Additional artifact storage needed
- ❌ More complex than native solutions
Native Playwright Features
1. --last-failed CLI Flag
Availability: Playwright v1.44.0+ (May 2024)
How It Works
# First run - execute all tests
npx playwright test
# Second run - only re-run failed tests
npx playwright test --last-failed
Playwright maintains a .last-run.json file in the test-results/ directory that tracks failed tests.
CLI Examples
# Run only failed tests from last run
npx playwright test --last-failed
# Update snapshots for only failed tests
npx playwright test --last-failed --update-snapshots
# Combine with project filtering
npx playwright test --last-failed --project=chromium
# Debug failed tests
npx playwright test --last-failed --debug
File Location and Format
- Location: test-results/.last-run.json
- Format: JSON object containing failed test information
- Structure: Contains a failedTests: [] array with test identifiers
- Persistence: Cleared when all tests pass on subsequent run
Advantages
- ✅ Built into Playwright (no custom code)
- ✅ Simple CLI flag
- ✅ Automatically maintained by Playwright
- ✅ Works with all Playwright features (debug, UI mode, etc.)
Limitations
- ❌ Not designed for CI/CD distributed testing (per Playwright maintainers)
- ❌ Intended for local development only ("inner loop scenario")
- ❌ Cleared on new test runs (doesn't persist across clean environments)
- ❌ GitHub Actions starts with clean environment - .last-run.json not available on retry
- ❌ Doesn't work with sharded tests - each shard creates its own .last-run.json
- ❌ No native way to merge .last-run.json across shards
- ❌ Not designed for cross-workflow persistence
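To make the sharding limitation concrete, here is a sketch of what a hand-rolled merge of per-shard files would involve. It is illustrative only: `mergeLastRuns` is a hypothetical helper, the `status` field is an assumption about the file's shape beyond the `failedTests` array mentioned above, and nothing here establishes that Playwright would accept a hand-merged file.

```typescript
// Assumed shape of .last-run.json: the failedTests array described above,
// plus an overall status field (an assumption for this sketch).
interface LastRunInfo {
  status: string
  failedTests: string[]
}

// Hypothetical helper: union the failed test identifiers from several shards.
function mergeLastRuns(shards: LastRunInfo[]): LastRunInfo {
  const failed = new Set<string>()
  for (const shard of shards) {
    for (const id of shard.failedTests) failed.add(id)
  }
  const failedTests = Array.from(failed).sort()
  return {
    status: failedTests.length > 0 ? 'failed' : 'passed',
    failedTests
  }
}
```

Even in this small form the merge is custom code that has to be maintained, which is the same cost the manifest approach already carries.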
CI/CD Workaround (Not Recommended)
To use --last-failed in GitHub Actions, you would need to:
- name: Run Playwright tests
  id: playwright-test
  run: npx playwright test
- name: Upload last run state
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: last-run-state
    path: test-results/.last-run.json
    # .last-run.json is a dotfile; recent upload-artifact v4 releases skip hidden files by default
    include-hidden-files: true
# In retry workflow:
- name: Download last run state
  uses: actions/download-artifact@v4
  with:
    name: last-run-state
    path: test-results/
- name: Rerun failed tests
  run: npx playwright test --last-failed --update-snapshots
Why This Isn't Ideal:
- Playwright maintainers explicitly state this is not the intended use case
- Doesn't work well with sharded tests (multiple .last-run.json files)
- Requires manual artifact management
- More complex than the current custom approach for this use case
2. File:Line Syntax for Specific Tests
Playwright supports running tests at specific line numbers:
# Run a specific test at line 42
npx playwright test tests/example.spec.ts:42
# Multiple tests
npx playwright test tests/file1.spec.ts:10 tests/file2.spec.ts:25
# With snapshot updates
npx playwright test tests/example.spec.ts:42 --update-snapshots
# With project selection
npx playwright test --project=chromium tests/example.spec.ts:42
This is exactly the format the current custom manifest uses, making it compatible with Playwright's native CLI.
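Because manifest entries are already valid positional arguments, turning them into a CLI invocation is mostly concatenation. A minimal sketch, assuming a hypothetical helper named `buildRerunArgs` and the `pnpm exec` wrapper used elsewhere in this document:

```typescript
// Hypothetical helper: assemble the pnpm argument list for one project's rerun.
function buildRerunArgs(project: string, entries: string[]): string[] {
  if (entries.length === 0) {
    // With no positional filters Playwright would run the whole project,
    // which is the opposite of a selective rerun.
    throw new Error(`No manifest entries for project ${project}`)
  }
  return [
    'exec',
    'playwright',
    'test',
    `--project=${project}`,
    '--update-snapshots',
    ...entries
  ]
}
```

Passing the result as an argument array (for example to a process-spawning call) avoids shell quoting issues with unusual file paths.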
3. Test Filtering Options
# Filter by grep pattern
npx playwright test -g "screenshot"
# Inverse grep
npx playwright test --grep-invert "mobile"
# By project
npx playwright test --project=chromium
# Multiple projects
npx playwright test --project=chromium --project=firefox
# Specific directory
npx playwright test tests/screenshots/
Playwright Reporter Options
1. JSON Reporter
Purpose: Machine-readable test results
Configuration
// playwright.config.ts
import { defineConfig } from '@playwright/test'

export default defineConfig({
  reporter: [
    ['json', { outputFile: 'results.json' }]
  ]
})
Or via environment variable:
PLAYWRIGHT_JSON_OUTPUT_NAME=results.json npx playwright test --reporter=json
Output Structure
{
  "stats": {
    "expected": 100,
    "unexpected": 5,
    "flaky": 2,
    "skipped": 3
  },
  "suites": [
    {
      "title": "Test Suite",
      "specs": [
        {
          "file": "browser_tests/example.test.ts",
          "line": 42,
          "tests": [
            {
              "projectId": "chromium",
              "results": [
                {
                  "status": "failed",
                  "attachments": [
                    { "contentType": "image/png" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
This is the format the current build-failed-screenshot-manifest.ts script parses.
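The extraction step can be sketched as a recursive walk over that structure. This is not the project's script: the local interfaces mirror only the fields shown in the example above, suites are assumed to nest via an optional `suites` array, and treating "a failed result with an image/png attachment" as a failed screenshot assertion is a heuristic chosen for illustration.

```typescript
// Minimal local types covering only the fields used below (assumed shape).
interface JsonAttachment { contentType: string }
interface JsonResult { status: string; attachments: JsonAttachment[] }
interface JsonTest { projectId: string; results: JsonResult[] }
interface JsonSpec { file: string; line: number; tests: JsonTest[] }
interface JsonSuite { title: string; specs?: JsonSpec[]; suites?: JsonSuite[] }
interface PlaywrightJsonReport { suites: JsonSuite[] }

// Hypothetical helper: group `file:line` entries by project for every test
// that has a failed result carrying an image attachment.
function collectFailedScreenshotTests(
  report: PlaywrightJsonReport
): Record<string, string[]> {
  const byProject: Record<string, string[]> = {}

  const visit = (suite: JsonSuite): void => {
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests) {
        const failedWithImage = test.results.some(
          (result) =>
            result.status === 'failed' &&
            result.attachments.some((a) => a.contentType === 'image/png')
        )
        if (!failedWithImage) continue
        const entry = `${spec.file}:${spec.line}`
        const list = byProject[test.projectId] ?? []
        if (list.indexOf(entry) === -1) list.push(entry) // avoid duplicates
        byProject[test.projectId] = list
      }
    }
    for (const child of suite.suites ?? []) visit(child)
  }

  for (const suite of report.suites) visit(suite)
  return byProject
}
```

Each key of the returned object corresponds to one `ci-rerun/{project}.txt` file, with one entry per line.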
Advantages
- ✅ Stable, documented JSON schema (@playwright/test/reporter)
- ✅ Includes all test metadata (file, line, project, status, attachments)
- ✅ Can be used programmatically
- ✅ Supports multiple reporters simultaneously
Current Project Usage
# In tests-ci.yaml
PLAYWRIGHT_JSON_OUTPUT_NAME=playwright-report/report.json \
pnpm exec playwright test --project=${{ matrix.browser }} \
--reporter=list \
--reporter=html \
--reporter=json
2. Blob Reporter
Purpose: Merging sharded test reports
Configuration
// playwright.config.ts
import { defineConfig } from '@playwright/test'

export default defineConfig({
  reporter: process.env.CI ? 'blob' : 'html'
})
Usage with Sharding
# Run sharded test with blob output
npx playwright test --shard=1/4 --reporter=blob
# Merge blob reports
npx playwright merge-reports --reporter=html ./all-blob-reports
npx playwright merge-reports --reporter=json ./all-blob-reports
Current Project Usage
# Sharded chromium tests
- run: pnpm exec playwright test --project=chromium --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }} --reporter=blob
env:
PLAYWRIGHT_BLOB_OUTPUT_DIR: ../blob-report
# Merge reports job
- run: |
pnpm exec playwright merge-reports --reporter=html ./all-blob-reports
PLAYWRIGHT_JSON_OUTPUT_NAME=playwright-report/report.json \
pnpm exec playwright merge-reports --reporter=json ./all-blob-reports
Advantages
- ✅ Designed for distributed testing
- ✅ Can merge into any reporter format (HTML, JSON, etc.)
- ✅ Preserves all test information across shards
Blob Reporter and --last-failed
- ❌ Blob reports do not contain a merged .last-run.json
- ❌ Each shard creates its own .last-run.json that isn't included in blob
- ❌ GitHub issue #30924 requests this feature (currently unsupported)
3. Multiple Reporters
You can use multiple reporters simultaneously:
import { defineConfig } from '@playwright/test'

export default defineConfig({
  reporter: [
    ['list'], // Terminal output
    ['html'], // Browse results
    ['json', { outputFile: 'results.json' }], // Programmatic parsing
    ['junit', { outputFile: 'results.xml' }] // CI integration
  ]
})
Or via CLI:
npx playwright test --reporter=list --reporter=html --reporter=json
GitHub Actions Integration Patterns
Pattern 1: Comment-Triggered Workflow (JupyterLab Approach)
Example: jupyterlab/jupyterlab-git
name: Update Playwright Snapshots
on:
  issue_comment:
    types: [created, edited]
permissions:
  contents: write
  pull-requests: write
jobs:
  update-snapshots:
    # Only run for authorized users on PRs with specific comment
    if: >
      (github.event.issue.author_association == 'OWNER' ||
      github.event.issue.author_association == 'COLLABORATOR' ||
      github.event.issue.author_association == 'MEMBER'
      ) && github.event.issue.pull_request &&
      contains(github.event.comment.body, 'please update snapshots')
    runs-on: ubuntu-latest
    steps:
      - name: React to the triggering comment
        run: gh api repos/${{ github.repository }}/issues/comments/${{ github.event.comment.id }}/reactions --raw-field 'content=+1'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Checkout
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Checkout PR branch
        run: gh pr checkout ${{ github.event.issue.number }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Setup and run tests
        run: |
          npm ci
          npx playwright install --with-deps
          npx playwright test --update-snapshots
      - name: Commit and push
        run: |
          git config user.name 'github-actions'
          git config user.email 'github-actions@github.com'
          git add .
          git diff --cached --quiet || git commit -m "Update snapshots"
          git push
Advantages
- ✅ Simple comment-based trigger
- ✅ Visual feedback (reaction on comment)
- ✅ Authorization checks built-in
- ✅ Auto-commits to PR branch
Limitations
- ❌ Runs all tests with --update-snapshots (not selective)
- ❌ No integration with failed test information from CI
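The job-level `if:` expression in Pattern 1 can be read as a plain predicate over the event payload. The sketch below restates it in TypeScript for clarity. The type and function names are hypothetical, and only the payload fields that the expression touches are modeled. GitHub Actions string comparison and `contains()` ignore case, so the sketch normalizes case before comparing.

```typescript
// Minimal slice of the issue_comment payload used by the condition.
interface IssueCommentPayload {
  issue: { author_association: string; pull_request?: object }
  comment: { body: string }
}

const TRUSTED_ASSOCIATIONS = ['OWNER', 'COLLABORATOR', 'MEMBER']

// Hypothetical restatement of the workflow's `if:` expression.
function shouldUpdateSnapshots(payload: IssueCommentPayload): boolean {
  const association = payload.issue.author_association.toUpperCase()
  const isTrusted = TRUSTED_ASSOCIATIONS.indexOf(association) !== -1
  // issue.pull_request is only present when the comment is on a pull request
  const isPullRequest = payload.issue.pull_request !== undefined
  const asksForUpdate =
    payload.comment.body.toLowerCase().indexOf('please update snapshots') !== -1
  return isTrusted && isPullRequest && asksForUpdate
}
```

Written this way, it is easier to see that the check reads `issue.author_association`, which describes the author of the pull request rather than the person who posted the comment.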
Pattern 2: Label-Based Trigger + Manifest (Current Approach)
name: Update Playwright Expectations
on:
  pull_request:
    types: [labeled]
  issue_comment:
    types: [created]
jobs:
  test:
    if: >
      ( github.event_name == 'pull_request' &&
      github.event.label.name == 'New Browser Test Expectations' ) ||
      ( github.event.issue.pull_request &&
      startsWith(github.event.comment.body, '/update-playwright') )
    steps:
      # ... setup steps ...
      - name: Locate failed screenshot manifest artifact
        id: locate-manifest
        uses: actions/github-script@v8
        with:
          script: |
            const { owner, repo } = context.repo
            let headSha = ''
            if (context.eventName === 'pull_request') {
              headSha = context.payload.pull_request.head.sha
            } else if (context.eventName === 'issue_comment') {
              const prNumber = context.payload.issue.number
              const pr = await github.rest.pulls.get({ owner, repo, pull_number: prNumber })
              headSha = pr.data.head.sha
            }
            const { data } = await github.rest.actions.listWorkflowRuns({
              owner, repo,
              workflow_id: 'tests-ci.yaml',
              head_sha: headSha,
              per_page: 1,
            })
            const run = data.workflow_runs?.[0]
            let has = 'false'
            let runId = ''
            if (run) {
              const { data: { artifacts = [] } } = await github.rest.actions.listWorkflowRunArtifacts({
                owner, repo, run_id: run.id
              })
              if (artifacts.some(a => a.name === 'failed-screenshot-tests' && !a.expired)) {
                has = 'true'
                runId = String(run.id)
              }
            }
            core.setOutput('has_manifest', has)
            // The download step reads run_id, so it has to be set here as well
            core.setOutput('run_id', runId)
      - name: Download failed screenshot manifest
        if: steps.locate-manifest.outputs.has_manifest == 'true'
        uses: actions/download-artifact@v4
        with:
          # A token is required when downloading from a different workflow run
          github-token: ${{ secrets.GITHUB_TOKEN }}
          run-id: ${{ steps.locate-manifest.outputs.run_id }}
          name: failed-screenshot-tests
          path: ComfyUI_frontend/ci-rerun
      - name: Re-run failed screenshot tests
        run: |
          if [ ! -d ci-rerun ]; then
            echo "No manifest found; running full suite"
            pnpm exec playwright test --update-snapshots
            exit 0
          fi
          for f in ci-rerun/*.txt; do
            project="$(basename "$f" .txt)"
            mapfile -t lines < "$f"
            filtered=()
            for l in "${lines[@]}"; do
              [ -n "$l" ] && filtered+=("$l")
            done
            if [ ${#filtered[@]} -gt 0 ]; then
              echo "Re-running ${#filtered[@]} tests for project $project"
              pnpm exec playwright test --project="$project" --update-snapshots "${filtered[@]}"
            fi
          done
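The artifact lookup in the workflow comes down to a small decision: given the most recent CI run for the head commit and its artifacts, is there an unexpired manifest, and which run holds it? A sketch of that decision as a pure function follows. The names `ArtifactInfo`, `ManifestLookup`, and `locateManifest` are hypothetical, and the artifact fields are limited to the two the workflow script inspects.

```typescript
// Only the artifact fields the lookup inspects.
interface ArtifactInfo { name: string; expired: boolean }

interface ManifestLookup {
  hasManifest: boolean
  runId: number | null // run to download from, when a manifest exists
}

const MANIFEST_ARTIFACT_NAME = 'failed-screenshot-tests'

// Hypothetical helper: decide whether a usable manifest artifact exists.
// `runId` is null when no CI run was found for the commit.
function locateManifest(
  runId: number | null,
  artifacts: ArtifactInfo[]
): ManifestLookup {
  if (runId === null) return { hasManifest: false, runId: null }
  const found = artifacts.some(
    (artifact) => artifact.name === MANIFEST_ARTIFACT_NAME && !artifact.expired
  )
  return { hasManifest: found, runId: found ? runId : null }
}
```

Returning the run id together with the flag keeps the two values in step, since a later download needs both.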
Advantages
- ✅ Selective - only re-runs failed screenshot tests
- ✅ Works across different trigger mechanisms (label or comment)
- ✅ Fallback to full suite if manifest not found
- ✅ Per-project manifests support multiple browser configurations
- ✅ Handles sharded tests via merged report
Pattern 3: WordPress/Openverse Approach (Always Update)
Proposed pattern (not fully implemented):
- CI always runs with --update-snapshots flag
- If snapshots change, create/update a secondary branch
- Open PR targeting the original PR branch
- Developer reviews snapshot changes before merging
Advantages
- ✅ Always generates correct snapshots
- ✅ Snapshot changes are visible in separate PR
- ✅ No test failures due to mismatched snapshots
Limitations
- ❌ Creates multiple PRs
- ❌ More complex merge workflow
- ❌ Potential for snapshot changes to mask real issues
Pattern 4: Manual Workflow Dispatch
name: Update Snapshots
on:
  workflow_dispatch:
    inputs:
      update-snapshots:
        description: 'Update snapshots'
        type: boolean
        default: false
      test-pattern:
        description: 'Test pattern (optional)'
        type: string
        required: false
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup
        run: |
          npm ci
          npx playwright install --with-deps
      - name: Run tests
        run: |
          if [ "${{ inputs.update-snapshots }}" = "true" ]; then
            FLAGS="--update-snapshots"
          fi
          PATTERN="${{ inputs.test-pattern }}"
          npx playwright test ${PATTERN} ${FLAGS}
Advantages
- ✅ Full manual control
- ✅ Can specify test patterns
- ✅ Simple to understand
Limitations
- ❌ Requires manual triggering
- ❌ Not integrated with CI failures
Third-Party Solutions
Currents.dev - Last Failed GitHub Action
Repository: currents-dev/playwright-last-failed
Purpose
Helps run last failed Playwright tests using Currents' cloud-based caching service.
Usage
- name: Playwright Last Failed action
  id: last-failed-action
  uses: currents-dev/playwright-last-failed@v1
  with:
    pw-output-dir: test-results
    matrix-index: ${{ matrix.shard }}
    matrix-total: ${{ strategy.job-total }}
How It Works
- Uses Currents' cloud service to persist failed test information
- Supports sharded tests via matrix parameters
- Enables selective rerun of failed tests across workflow retries
Advantages
- ✅ Works with sharded tests
- ✅ Persists across workflow runs
- ✅ Supports GitHub Actions retry mechanism
- ✅ Handles distributed testing
Limitations
- ❌ Requires Currents subscription (third-party paid service)
- ❌ Dependency on external service
- ❌ Data sent to third-party cloud
- ❌ Additional cost
- ❌ Vendor lock-in
Recommendation
Not suitable for this project due to:
- External service dependency
- Cost implications
- The current custom solution is already working well
Comparison and Recommendations
Feature Matrix
| Feature | Current Approach | --last-failed | Currents | Comment Trigger Only |
|---|---|---|---|---|
| Works with sharded tests | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Persists across workflows | ✅ Yes | ❌ No | ✅ Yes | N/A |
| Selective reruns | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No (runs all) |
| No external dependencies | ✅ Yes | ✅ Yes | ❌ No | ✅ Yes |
| Simple implementation | ⚠️ Medium | ✅ Simple | ✅ Simple | ✅ Simple |
| Maintenance overhead | ⚠️ Medium | ✅ Low | ✅ Low | ✅ Low |
| Works in CI/CD | ✅ Yes | ⚠️ Workaround | ✅ Yes | ✅ Yes |
| Cost | ✅ Free | ✅ Free | ❌ Paid | ✅ Free |
| Supports multiple projects | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
Why --last-failed Isn't Suitable (Currently)
- Not designed for CI/CD: Playwright maintainers explicitly state it's for "inner loop scenario (local development)"
- Doesn't work with sharded tests: Each shard creates its own .last-run.json with no native merge
- Clean environment issue: GitHub Actions starts fresh, losing .last-run.json
- Feature request pending: GitHub issue #30924 requests blob report integration (not yet implemented)
Recommendations
Short Term: Keep Current Approach
Verdict: The current custom manifest approach is the best solution for this project's needs.
Reasons:
- ✅ Works perfectly with sharded tests - merges results across 8 shards
- ✅ Persists across workflows - artifact storage for 7 days
- ✅ Selective reruns - only failed screenshot tests
- ✅ No external dependencies - fully self-contained
- ✅ Uses stable Playwright JSON format - typed via @playwright/test/reporter
- ✅ Already working well - proven in production
Minor Improvements:
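// Add version check to warn if JSON schema changes
// (importing package.json requires resolveJsonModule in tsconfig)
import type { JSONReport } from '@playwright/test/reporter'
import { version } from '@playwright/test/package.json'

const majorVersion = Number(version.split('.')[0])
if (majorVersion !== 1) {
  console.warn('Playwright major version changed - verify JSON schema compatibility')
}

// Add more robust error handling
// (`raw` is the report file content the script has already read)
let report: JSONReport
try {
  report = JSON.parse(raw)
} catch (error) {
  const message = error instanceof Error ? error.message : String(error)
  throw new Error(`Failed to parse Playwright JSON report: ${message}`)
}

// Consider adding tests for the manifest builder
// e.g., tests/cicd/build-failed-screenshot-manifest.test.ts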
Long Term: Monitor Playwright Development
Watch for these features:
- Blob report + .last-run.json merge - GitHub issue #30924
- Native CI/CD support for --last-failed - may never happen (by design)
- Report merging improvements - GitHub issue #33094
Migration path if native support improves:
# Future potential approach (if Playwright adds this feature)
- name: Merge reports with last-run
  run: |
    npx playwright merge-reports --reporter=html ./all-blob-reports
    npx playwright merge-reports --reporter=last-failed ./all-blob-reports
- name: Upload merged last-run
  uses: actions/upload-artifact@v4
  with:
    name: last-run-state
    path: test-results/.last-run.json
# In update workflow
- name: Download last-run state
  uses: actions/download-artifact@v4
  with:
    name: last-run-state
    path: test-results/
- name: Update snapshots for failed tests
  run: npx playwright test --last-failed --update-snapshots
However, this is speculative - Playwright maintainers have indicated --last-failed is not intended for CI/CD.
Alternative: Simplify to Full Suite Reruns
If the custom manifest becomes too complex to maintain, consider:
- name: Re-run ALL screenshot tests
  run: |
    # Simple grep-based filtering for screenshot tests
    npx playwright test -g "screenshot" --update-snapshots
Trade-offs:
- ✅ Much simpler
- ✅ No custom scripts
- ❌ Slower (runs all screenshot tests, not just failed ones)
- ❌ Potentially updates snapshots that weren't actually failing
Conclusion
The current custom manifest approach is well-designed and appropriate for this project's requirements:
- Handles sharded tests - critical for CI performance
- Selective reruns - saves time and resources
- Stable implementation - uses documented Playwright JSON schema
- No external dependencies - fully controlled
While --last-failed is a nice feature for local development, Playwright's own documentation and maintainer comments confirm it's not suitable for distributed CI/CD testing, which is exactly what this project needs.
The only potentially better solution (Currents) requires a paid external service, which adds cost and complexity without significant benefits over the current approach.
Recommendation: Keep the current implementation, with minor improvements to error handling and documentation. Monitor Playwright development for native improvements, but don't expect --last-failed to become a viable alternative for this use case.
References
Official Playwright Documentation
Community Resources
- Playwright Solutions: How to Run Failures Only
- Medium: How to Run Only Last Failed Tests
- Medium: Streamlining Visual Regression Testing
GitHub Issues
- #30924 - Last-failed with blob reports
- #33094 - Merging main run with --last-failed
- #28254 - Feature request for --last-failed