mirror of
https://github.com/Comfy-Org/ComfyUI_frontend.git
synced 2026-04-20 14:30:41 +00:00
Stabilize flaky Playwright tests by improving test reliability. This PR aims to identify and fix flaky e2e tests. Issue is synchronized with this [Notion page](https://www.notion.so/PR-10817-test-stabilize-flaky-Playwright-tests-3366d73d365081ada40de73ce11af625) by [Unito](https://www.unito.io). Co-authored-by: Amp <amp@ampcode.com>
Browser Test Flake Prevention Rules
Reference this file as `@browser_tests/FLAKE_PREVENTION_RULES.md` when debugging or updating flaky Playwright tests.
These rules are distilled from the PR 10817 stabilization thread chain. They exist to make flaky-test triage faster and more repeatable.
Quick Checklist
Before merging a flaky-test fix, confirm all of these are true:
- the latest CI artifact was inspected directly
- the root cause is stated as a race or readiness mismatch
- the fix waits on the real readiness boundary
- the assertion primitive matches the job
- the fix stays local unless a shared helper truly owns the race
- local verification uses a targeted rerun
1. Start With CI Evidence
- Do not trust the top-level GitHub check result alone.
- Inspect the latest Playwright `report.json` directly, even on a green run.
- Treat tests marked `flaky` in `report.json` as real work.
- Use `error-context.md`, traces, and page snapshots before editing code.
- Pull the newest run after each push instead of assuming the flaky set is unchanged.
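Listing flaky tests from the report can be sketched as below. This is a minimal sketch, not the project's tooling: the interfaces cover only an assumed subset of the Playwright JSON reporter output (`suites`, `specs`, `tests`, `status`), and `collectFlaky` and `sampleReport` are hypothetical names. Check the field names against a real `report.json` before relying on it.

```typescript
// Assumed subset of the Playwright JSON report shape; verify against a real report.json.
interface ReportTest {
  projectName?: string
  status: 'expected' | 'unexpected' | 'flaky' | 'skipped'
}

interface ReportSpec {
  title: string
  file?: string
  line?: number
  tests: ReportTest[]
}

interface ReportSuite {
  title: string
  specs?: ReportSpec[]
  suites?: ReportSuite[]
}

interface Report {
  suites: ReportSuite[]
}

interface FlakyEntry {
  file: string
  line: number | undefined
  title: string
  project: string
}

// Walk nested suites and collect every test whose final status is "flaky",
// i.e. it failed at least once and then passed on retry.
function collectFlaky(report: Report): FlakyEntry[] {
  const found: FlakyEntry[] = []
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests) {
        if (test.status === 'flaky') {
          found.push({
            file: spec.file ?? suite.title,
            line: spec.line,
            title: spec.title,
            project: test.projectName ?? 'default'
          })
        }
      }
    }
    for (const child of suite.suites ?? []) visit(child)
  }
  for (const suite of report.suites) visit(suite)
  return found
}

// Hypothetical sample: a green run that still contains one flaky test.
const sampleReport: Report = {
  suites: [
    {
      title: 'example.spec.ts',
      specs: [
        { title: 'stable test', file: 'example.spec.ts', line: 10, tests: [{ status: 'expected' }] }
      ],
      suites: [
        {
          title: 'nested group',
          specs: [
            {
              title: 'retry-passing test',
              file: 'example.spec.ts',
              line: 42,
              tests: [{ projectName: 'chromium', status: 'flaky' }]
            }
          ]
        }
      ]
    }
  ]
}

const flakyInSample = collectFlaky(sampleReport)
```

A run like the sample passes the top-level check, which is why the rules ask for the report to be inspected even when the check is green.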
2. Wait For The Real Readiness Boundary
- Visible is not always ready.
- If the behavior depends on internal state, wait on that state.
- After canvas interactions, call `await comfyPage.nextFrame()` unless the helper already guarantees a settled frame.
- After workflow reloads or node-definition refreshes, wait for the reload to finish before continuing.
Common readiness boundaries:
- `node.imgs` populated before opening image context menus
- settings cleanup finished before asserting persisted state
- locale-triggered workflow reload finished before selecting nodes
- real builder UI ready, not transient helper metadata
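Waiting on internal state rather than visibility can be sketched for the `node.imgs` case as follows. This is a sketch under stated assumptions: it assumes the page exposes a global `app` whose `graph.getNodeById()` returns the node, and `nodeImagesReady`, `waitForNodeImages`, and `PageLike` are hypothetical names. `PageLike` is a minimal structural stand-in for the Playwright `Page` method `waitForFunction(pageFunction, arg)`.

```typescript
// Runs inside the browser page, so it must not close over outer variables.
// Assumption: the page exposes `app.graph.getNodeById(id)` and nodes carry `imgs`.
function nodeImagesReady(nodeId: number): boolean {
  const app = (globalThis as any).app
  const node = app?.graph?.getNodeById(nodeId)
  return Array.isArray(node?.imgs) && node.imgs.length > 0
}

// Minimal structural type for the one Page method this sketch uses.
interface PageLike {
  waitForFunction<Arg>(
    pageFunction: (arg: Arg) => unknown,
    arg: Arg
  ): Promise<unknown>
}

// Resolve only once the node's images are populated, which is the real
// readiness boundary for opening an image context menu.
async function waitForNodeImages(page: PageLike, nodeId: number): Promise<void> {
  await page.waitForFunction(nodeImagesReady, nodeId)
}
```

In a spec, a call such as `await waitForNodeImages(comfyPage.page, nodeId)` would sit right before the context-menu interaction, assuming the fixture exposes the underlying page as `comfyPage.page`.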
3. Choose The Smallest Correct Assertion
- Use built-in retrying locator assertions when locator state is the behavior.
- Use `expect.poll()` for a single async value.
- Use `expect(async () => { ... }).toPass()` only when multiple assertions must settle together.
- Do not make immediate assertions after async UI mutations, settings writes, clipboard writes, or graph updates.
- Never use `waitForTimeout()` to hide a race.
    await expect
      .poll(() => comfyPage.settings.getSetting('Comfy.NodeLibrary.Bookmarks.V2'))
      .toEqual([])
4. Prefer Behavioral Assertions
- Use screenshots only when appearance is the behavior under test.
- If a screenshot only indirectly proves behavior, replace it with a direct assertion.
- Prefer assertions on link counts, positions, visible menu items, persisted settings, and node state.
5. Keep Helper Changes Narrow
- Shared helpers should drive setup to a stable boundary.
- Do not encode one-spec timing assumptions into generic helpers.
- If a race only matters to one spec, prefer a local wait in that spec.
- If a helper fails before the real test begins, remove or relax the brittle precondition and let downstream UI interaction prove readiness.
6. Verify Narrowly
- Prefer targeted reruns through `pnpm test:browser:local`.
- On Windows, prefer `file:line` or whole-spec arguments over `--grep` when the wrapper has quoting issues.
- Use `--repeat-each 5` for targeted flake verification unless the failure needs a different reproduction pattern.
- Verify with the smallest command that exercises the flaky path.
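The shape of a targeted rerun command can be sketched as a small argument builder. This is illustrative only: `buildRerunArgs` and the spec path are hypothetical, and it assumes the `test:browser:local` script forwards extra arguments to Playwright unchanged.

```typescript
interface RerunTarget {
  specPath: string // hypothetical path to the spec under investigation
  line?: number // optional line, producing the file:line form
  repeatEach?: number // defaults to 5, as suggested for flake verification
}

// Build the argument list for `pnpm`, targeting one spec (or one test by line)
// instead of relying on --grep quoting.
function buildRerunArgs(target: RerunTarget): string[] {
  const location =
    target.line === undefined
      ? target.specPath
      : `${target.specPath}:${target.line}`
  return [
    'test:browser:local',
    location,
    '--repeat-each',
    String(target.repeatEach ?? 5)
  ]
}

const exampleArgs = buildRerunArgs({
  specPath: 'browser_tests/tests/example.spec.ts',
  line: 42
})
// exampleArgs.join(' ') is
// "test:browser:local browser_tests/tests/example.spec.ts:42 --repeat-each 5"
```

Passing the arguments as an array to a process spawner avoids shell quoting, which is the concern the Windows rule raises.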
Current Local Noise
These are local distractions, not automatic CI root causes:
- missing local input fixture files required by the test path
- missing local models directory
- teardown `EPERM` while restoring the local browser-test user data directory
- local screenshot baseline differences on Windows
Rules for handling local noise:
- first confirm whether it blocks the exact flaky path under investigation
- do not commit temporary local assets used only for verification
- do not commit local screenshot baselines