Commit Graph

5 Commits

Author SHA1 Message Date
Christian Byrne
e7c2cd04f4 perf: add FPS, p95 frame time, and target thresholds to CI perf report (#10516)
## Summary

Enhances the CI performance report with explicit FPS metrics, percentile
frame times, and milestone target thresholds.

### Changes

**PerformanceHelper** (data collection):
- `measureFrameDurations()` now returns individual frame durations
instead of just the average, enabling percentile computation
- Computes `p95FrameDurationMs` from sorted frame durations
- Strips `allFrameDurationsMs` from serialized JSON to avoid bloating
artifacts

**perf-report.ts** (report rendering):
- **Headline summary** at top of report with key metrics per test
scenario
- **FPS display**: derives avg FPS and P5 FPS from frame duration
metrics
- **Target thresholds**: shows P5 FPS ≥ 52 target with / pass/fail
indicator
- **p95 frame time**: added as a tracked metric in the comparison table
- Metrics reordered to show frame time/FPS first (what people look for)

### Target

From the Nodes 2.0 Perf milestone: **P5 ≥ 52 FPS** on 245-node workflow
(equivalent to P95 frame time ≤ 19.2ms).

### Example headline output

```
> **vue-large-graph-pan**: 60 avg FPS · 58 P5 FPS  (target: ≥52) · 12ms TBT · 45.2 MB heap
> **canvas-zoom-sweep**: 45 avg FPS · 38 P5 FPS  (target: ≥52) · 85ms TBT · 52.1 MB heap
```

Follow-up to #10477 (merged).

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10516-perf-add-FPS-p95-frame-time-and-target-thresholds-to-CI-perf-report-32e6d73d365081a2a2a6ceae7d6e9be5)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
2026-03-28 23:29:19 -07:00
Christian Byrne
437f41c553 perf: add layout/GC metrics + reduce false positives in regression detection (#10477)
## Summary

Add layout duration, style recalc duration, and heap usage metrics to CI
perf reports, while improving statistical reliability to reduce false
positive regressions.

## Changes

- **What**:
- Collect `layoutDurationMs`, `styleRecalcDurationMs`, `heapUsedBytes`
(absolute snapshot) alongside existing metrics
- Add effect size gate (`minAbsDelta`) for integer-quantized count
metrics (style recalcs, layouts, DOM nodes, event listeners) — prevents
z=7.2 false positives from e.g. 11→12 style recalcs
- Switch from mean to **median** for PR metric aggregation — robust to
outlier CI runs that dominate n=3 mean
- Increase historical baseline window from **5 to 15 runs** for more
stable σ estimates
- Reorder reported metrics: layout/style duration first (actionable),
counts and heap after (informational)

## Review Focus

The effect size gate in `classifyChange()` — it now requires both z > 2
AND absolute delta ≥ `minAbsDelta` (when configured) to flag a
regression. This addresses the core false positive issue where integer
metrics with near-zero historical variance produce extreme z-scores for
trivial changes.

Median vs mean tradeoff: median is more robust to outliers but less
sensitive to real shifts — acceptable given n=3 and CI noise levels.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10477-perf-add-layout-GC-metrics-reduce-false-positives-in-regression-detection-32d6d73d365081daa72cec96d8a07b90)
by [Unito](https://www.unito.io)
2026-03-25 10:16:56 -07:00
Christian Byrne
c420cff4f0 feat: add TBT/frameDuration metrics and new perf test scenarios (#9910)
## Summary

Adds Total Blocking Time (TBT) and frame duration metrics to the
performance testing infrastructure, plus three new test scenarios
covering zoom, pan, and many-nodes-idle.

## Changes

### New Metrics
- **`totalBlockingTimeMs`** — Computed from PerformanceObserver
`longtask` entries: `sum(duration - 50ms)` for tasks >50ms. Measures
main thread blocking.
- **`frameDurationMs`** — Average frame duration via rAF timing (16.67ms
= 60fps target). Measures rendering smoothness.

### New Test Scenarios
| Scenario | Description |
|---|---|
| `canvas-zoom-sweep` | 10 zoom-in + 10 zoom-out cycles on default
workflow |
| `canvas-pan-many-nodes` | 10 pan sweeps over 100-node workflow |
| `canvas-many-nodes-idle` | 2-second idle measurement with 100 nodes
rendered |

### Infrastructure
- `PerformanceHelper.ts`: Installs PerformanceObserver for longtask,
collects TBT, measures frame duration via rAF
- `perf-report.ts`: Reports TBT and frame duration in PR comment tables
- `browser_tests/assets/perf/many_nodes_100.json`: 100-node (10×10 grid)
test fixture

## Review Focus
- TBT collection clears entries at `startMeasuring()` and reads at
`stopMeasuring()` — ensure no race with observer buffering
- Frame duration sampling uses 10 frames — enough for signal without
slowing tests

Depends on: #9886, #9887

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9910-feat-add-TBT-frameDuration-metrics-and-new-perf-test-scenarios-3236d73d365081488ae3c594a8bf7cff)
by [Unito](https://www.unito.io)
2026-03-15 08:54:00 -07:00
Christian Byrne
21069d54e7 feat: expand CDP perf metrics — add DOM nodes, script duration, event listeners (#9887)
## Summary

Expands the performance testing infrastructure to collect 4 additional
CDP metrics that are already returned by `Performance.getMetrics` but
were not being read. This is a zero-cost expansion — no additional CDP
calls, just reading more fields from the existing response.

## New Metrics

| Metric | CDP Source | What It Detects |
|---|---|---|
| `domNodes` | `Nodes` | DOM node count delta — widget DOM leaks during
node create/destroy |
| `jsHeapTotalBytes` | `JSHeapTotalSize` | Total heap delta — combined
with `heapDeltaBytes` shows GC pressure |
| `scriptDurationMs` | `ScriptDuration` | JS execution time vs total
task time — script vs rendering balance |
| `eventListeners` | `JSEventListeners` | Listener count delta — detects
listener accumulation across lifecycle |

## Changes

### `browser_tests/fixtures/helpers/PerformanceHelper.ts`
- Added 4 fields to `PerfSnapshot` interface
- Added 4 fields to `PerfMeasurement` interface
- Wired through `getSnapshot()` and `stopMeasuring()`

### `scripts/perf-report.ts`
- Added 4 fields to `PerfMeasurement` interface
- Expanded `MetricKey` type and `REPORTED_METRICS` array with 3 new
reported metrics (`domNodes`, `scriptDurationMs`, `eventListeners`)
- `jsHeapTotalBytes` is collected but not in `REPORTED_METRICS` — it's
used alongside `heapDeltaBytes` for GC pressure ratio analysis

## Why These 4

From a gap analysis of all ~30 CDP metrics, these were identified as
highest priority for ComfyUI:
- **`Nodes`** (P0): ComfyUI dynamically creates/destroys widget DOM. DOM
bloat from leaked widgets is a key performance risk, especially for Vue
Nodes 2.0.
- **`ScriptDuration`** (P1): Separates JS execution from layout/paint.
Reveals whether perf issues are script-heavy or rendering-heavy.
- **`JSEventListeners`** (P1): Widget lifecycle can leak listeners
across node add/remove cycles.
- **`JSHeapTotalSize`** (P1): With `JSHeapUsedSize`, the ratio shows GC
fragmentation pressure.

## Backward Compatibility

The `PerfMeasurement` interface is extended (not changed). Old baseline
`perf-metrics.json` files without these fields will have `undefined`
values, which the report script handles gracefully (shows `—` for
missing data).

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9887-feat-expand-CDP-perf-metrics-add-DOM-nodes-script-duration-event-listeners-3226d73d3650818abea1d4a441667c38)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Alexander Brown <drjkl@comfy.org>
2026-03-14 23:24:02 -07:00
Christian Byrne
8e215b3174 feat: add performance testing infrastructure with CDP metrics (#9170)
## Summary

Add a permanent, non-failing performance regression detection system
using Chrome DevTools Protocol metrics, with automatic PR commenting.

## Changes

- **What**: Performance testing infrastructure — `PerformanceHelper`
fixture class using CDP `Performance.getMetrics` to collect
`RecalcStyleCount`, `LayoutCount`, `LayoutDuration`, `TaskDuration`,
`JSHeapUsedSize`. Adds `@perf` Playwright project (Chromium-only,
single-threaded, 60s timeout), 4 baseline perf tests, CI workflow with
sticky PR comment reporting, and `perf-report.js` script for generating
markdown comparison tables.

## Review Focus

- `PerformanceHelper` uses `page.context().newCDPSession(page)` — CDP is
Chromium-only, so perf metrics are not collected on Firefox. This is
intentional since CDP gives us browser-level style recalc/layout counts
that `performance.mark/measure` cannot capture.
- The CI workflow uses `continue-on-error: true` so perf tests never
block merging.
- Baseline comparison uses `dawidd6/action-download-artifact` to
download metrics from the target branch, following the same pattern as
`pr-size-report.yaml`.

## Stack

This is the foundation PR for the Firefox performance fix stack:
1. **→ This PR: perf testing infrastructure**
2. `perf/fix-cursor-cache` — cursor style caching (depends on this)
3. `perf/fix-subgraph-svg` — SVG pre-rasterization (depends on this)
4. `perf/fix-clippath-raf` — RAF batching for clip-path (depends on
this)

PRs 2-4 are independent of each other.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9170-feat-add-performance-testing-infrastructure-with-CDP-metrics-3116d73d3650817cb43def6f8e9917f8)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Alexander Brown <drjkl@comfy.org>
2026-02-25 20:09:57 -08:00