ComfyUI_frontend

Comfy-Org/ComfyUI_frontend

Fork 0

mirror of https://github.com/Comfy-Org/ComfyUI_frontend.git synced 2026-05-03 04:31:58 +00:00

Commit Graph

Author	SHA1	Message	Date
Christian Byrne	437f41c553	perf: add layout/GC metrics + reduce false positives in regression detection (#10477 ) ## Summary Add layout duration, style recalc duration, and heap usage metrics to CI perf reports, while improving statistical reliability to reduce false positive regressions. ## Changes - What: - Collect `layoutDurationMs`, `styleRecalcDurationMs`, `heapUsedBytes` (absolute snapshot) alongside existing metrics - Add effect size gate (`minAbsDelta`) for integer-quantized count metrics (style recalcs, layouts, DOM nodes, event listeners) — prevents z=7.2 false positives from e.g. 11→12 style recalcs - Switch from mean to median for PR metric aggregation — robust to outlier CI runs that dominate n=3 mean - Increase historical baseline window from 5 to 15 runs for more stable σ estimates - Reorder reported metrics: layout/style duration first (actionable), counts and heap after (informational) ## Review Focus The effect size gate in `classifyChange()` — it now requires both z > 2 AND absolute delta ≥ `minAbsDelta` (when configured) to flag a regression. This addresses the core false positive issue where integer metrics with near-zero historical variance produce extreme z-scores for trivial changes. Median vs mean tradeoff: median is more robust to outliers but less sensitive to real shifts — acceptable given n=3 and CI noise levels. ┆Issue is synchronized with this [Notion page](https://www.notion.so/PR-10477-perf-add-layout-GC-metrics-reduce-false-positives-in-regression-detection-32d6d73d365081daa72cec96d8a07b90) by [Unito](https://www.unito.io)	2026-03-25 10:16:56 -07:00
Christian Byrne	2af3940867	feat: add trend visualization with sparklines to perf report (#9939 ) ## Summary Add historical trend visualization (ASCII sparklines + directional arrows) to the performance PR report, showing how each metric has moved over recent commits on main. ## Changes - What: New `sparkline()`, `trendDirection()`, `trendArrow()` functions in `perf-stats.ts`. New collapsible "Trend" section in the perf report showing per-metric sparklines, direction indicators, and latest values. CI workflow updated to download historical data from the `perf-data` orphan branch and switched to `setup-frontend` action with `pnpm exec tsx`. ## Review Focus - The trend section only renders when ≥3 historical data points exist (gracefully absent otherwise) - `trendDirection()` uses a split-half mean comparison with ±10% threshold — review whether this sensitivity is appropriate - The `git archive` step in `pr-perf-report.yaml` is idempotent and fails silently if no perf-history data exists yet on the perf-data branch ┆Issue is synchronized with this [Notion page](https://www.notion.so/PR-9939-feat-add-trend-visualization-with-sparklines-to-perf-report-3246d73d36508125a6fcc39612f850fe) by [Unito](https://www.unito.io) --------- Co-authored-by: GitHub Action <action@github.com>	2026-03-17 07:10:30 -07:00
Christian Byrne	7b316eb9a2	feat: add statistical significance to perf report with z-score thresholds (#9305 ) ## Summary Replace fixed 10%/20% perf delta thresholds with dynamic σ-based classification using z-scores, eliminating false alarms from naturally noisy duration metrics (10-17% CV). ## Changes - What: - Run each perf test 3× (`--repeat-each=3`) and report the mean, reducing single-run noise - Download last 5 successful main branch perf artifacts to compute historical μ/σ per metric - Replace fixed threshold flags with z-score significance: `⚠️ regression` (z>2), `✅ neutral/improvement`, `🔇 noisy` (CV>50%) - Add collapsible historical variance table (μ, σ, CV) to PR comment - Graceful cold start: falls back to simple delta table until ≥2 historical runs exist - New `scripts/perf-stats.ts` module with `computeStats`, `zScore`, `classifyChange` - 18 unit tests for stats functions - CI time impact: ~3 min → ~5-6 min (repeat-each adds ~2 min, historical download <10s) ## Review Focus - The `gh api` call in the new "Download historical perf baselines" step: it queries the last 5 successful push runs on the base branch. The `gh` CLI is available natively on `ubuntu-latest` runners and auto-authenticates with `GITHUB_TOKEN`. - `getHistoricalStats` averages per-run measurements before computing cross-run σ — this is intentional since historical artifacts may also contain repeated measurements after this change lands. - The `noisy` classification (CV>50%) suppresses metrics like `layouts` that hover near 0 and have meaningless percentage swings. ┆Issue is synchronized with this [Notion page](https://www.notion.so/PR-9305-feat-add-statistical-significance-to-perf-report-with-z-score-thresholds-3156d73d3650818d9360eeafd9ae7dc1) by [Unito](https://www.unito.io)	2026-03-04 16:16:53 -08:00

Author

SHA1

Message

Date

Christian Byrne

437f41c553

perf: add layout/GC metrics + reduce false positives in regression detection (#10477 )

## Summary

Add layout duration, style recalc duration, and heap usage metrics to CI
perf reports, while improving statistical reliability to reduce false
positive regressions.

## Changes

- **What**:
- Collect `layoutDurationMs`, `styleRecalcDurationMs`, `heapUsedBytes`
(absolute snapshot) alongside existing metrics
- Add effect size gate (`minAbsDelta`) for integer-quantized count
metrics (style recalcs, layouts, DOM nodes, event listeners) — prevents
z=7.2 false positives from e.g. 11→12 style recalcs
- Switch from mean to **median** for PR metric aggregation — robust to
outlier CI runs that dominate n=3 mean
- Increase historical baseline window from **5 to 15 runs** for more
stable σ estimates
- Reorder reported metrics: layout/style duration first (actionable),
counts and heap after (informational)

## Review Focus

The effect size gate in `classifyChange()` — it now requires both z > 2
AND absolute delta ≥ `minAbsDelta` (when configured) to flag a
regression. This addresses the core false positive issue where integer
metrics with near-zero historical variance produce extreme z-scores for
trivial changes.

Median vs mean tradeoff: median is more robust to outliers but less
sensitive to real shifts — acceptable given n=3 and CI noise levels.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-10477-perf-add-layout-GC-metrics-reduce-false-positives-in-regression-detection-32d6d73d365081daa72cec96d8a07b90)
by [Unito](https://www.unito.io)

2026-03-25 10:16:56 -07:00

Christian Byrne

2af3940867

feat: add trend visualization with sparklines to perf report (#9939 )

## Summary

Add historical trend visualization (ASCII sparklines + directional
arrows) to the performance PR report, showing how each metric has moved
over recent commits on main.

## Changes

- **What**: New `sparkline()`, `trendDirection()`, `trendArrow()`
functions in `perf-stats.ts`. New collapsible "Trend" section in the
perf report showing per-metric sparklines, direction indicators, and
latest values. CI workflow updated to download historical data from the
`perf-data` orphan branch and switched to `setup-frontend` action with
`pnpm exec tsx`.

## Review Focus

- The trend section only renders when ≥3 historical data points exist
(gracefully absent otherwise)
- `trendDirection()` uses a split-half mean comparison with ±10%
threshold — review whether this sensitivity is appropriate
- The `git archive` step in `pr-perf-report.yaml` is idempotent and
fails silently if no perf-history data exists yet on the perf-data
branch

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9939-feat-add-trend-visualization-with-sparklines-to-perf-report-3246d73d36508125a6fcc39612f850fe)
by [Unito](https://www.unito.io)

---------

Co-authored-by: GitHub Action <action@github.com>

2026-03-17 07:10:30 -07:00

Christian Byrne

7b316eb9a2

feat: add statistical significance to perf report with z-score thresholds (#9305 )

## Summary

Replace fixed 10%/20% perf delta thresholds with dynamic σ-based
classification using z-scores, eliminating false alarms from naturally
noisy duration metrics (10-17% CV).

## Changes

- **What**:
- Run each perf test 3× (`--repeat-each=3`) and report the mean,
reducing single-run noise
- Download last 5 successful main branch perf artifacts to compute
historical μ/σ per metric
- Replace fixed threshold flags with z-score significance: `⚠️
regression` (z>2), `✅ neutral/improvement`, `🔇 noisy` (CV>50%)
  - Add collapsible historical variance table (μ, σ, CV) to PR comment
- Graceful cold start: falls back to simple delta table until ≥2
historical runs exist
- New `scripts/perf-stats.ts` module with `computeStats`, `zScore`,
`classifyChange`
  - 18 unit tests for stats functions

- **CI time impact**: ~3 min → ~5-6 min (repeat-each adds ~2 min,
historical download <10s)

## Review Focus

- The `gh api` call in the new "Download historical perf baselines"
step: it queries the last 5 successful push runs on the base branch. The
`gh` CLI is available natively on `ubuntu-latest` runners and
auto-authenticates with `GITHUB_TOKEN`.
- `getHistoricalStats` averages per-run measurements before computing
cross-run σ — this is intentional since historical artifacts may also
contain repeated measurements after this change lands.
- The `noisy` classification (CV>50%) suppresses metrics like `layouts`
that hover near 0 and have meaningless percentage swings.

┆Issue is synchronized with this [Notion
page](https://www.notion.so/PR-9305-feat-add-statistical-significance-to-perf-report-with-z-score-thresholds-3156d73d3650818d9360eeafd9ae7dc1)
by [Unito](https://www.unito.io)

2026-03-04 16:16:53 -08:00

3 Commits