mirror of
https://github.com/Comfy-Org/ComfyUI_frontend.git
synced 2026-03-08 06:30:04 +00:00
feat: add statistical significance to perf report with z-score thresholds (#9305)
## Summary

Replace fixed 10%/20% perf delta thresholds with dynamic σ-based classification using z-scores, eliminating false alarms from naturally noisy duration metrics (10-17% CV).

## Changes

- **What**:
  - Run each perf test 3× (`--repeat-each=3`) and report the mean, reducing single-run noise
  - Download the last 5 successful main branch perf artifacts to compute historical μ/σ per metric
  - Replace fixed threshold flags with z-score significance: `⚠️ regression` (z>2), `✅ neutral/improvement`, `🔇 noisy` (CV>50%)
  - Add collapsible historical variance table (μ, σ, CV) to PR comment
  - Graceful cold start: falls back to simple delta table until ≥2 historical runs exist
  - New `scripts/perf-stats.ts` module with `computeStats`, `zScore`, `classifyChange`
  - 18 unit tests for stats functions
- **CI time impact**: ~3 min → ~5-6 min (repeat-each adds ~2 min, historical download <10s)

## Review Focus

- The `gh api` call in the new "Download historical perf baselines" step: it queries the last 5 successful push runs on the base branch. The `gh` CLI is available natively on `ubuntu-latest` runners and auto-authenticates with `GITHUB_TOKEN`.
- `getHistoricalStats` averages per-run measurements before computing cross-run σ. This is intentional, since historical artifacts may also contain repeated measurements after this change lands.
- The `noisy` classification (CV>50%) suppresses metrics like `layouts` that hover near 0 and have meaningless percentage swings.

┆Issue is synchronized with this [Notion page](https://www.notion.so/PR-9305-feat-add-statistical-significance-to-perf-report-with-z-score-thresholds-3156d73d3650818d9360eeafd9ae7dc1) by [Unito](https://www.unito.io)
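The classification rules described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/perf-stats.ts`: the function names come from the commit message, but the signatures, the `Stats` shape, and the use of the sample standard deviation (n - 1) are assumptions. Only the thresholds (z > 2, CV > 50%) are taken from the description.

```typescript
// Hypothetical sketch; the real scripts/perf-stats.ts may differ.
interface Stats {
  mean: number
  stddev: number
  cv: number // coefficient of variation, in percent
  n: number
}

type Classification = 'regression' | 'neutral' | 'noisy'

function computeStats(values: number[]): Stats {
  const n = values.length
  if (n === 0) return { mean: 0, stddev: 0, cv: 0, n }
  const mean = values.reduce((a, b) => a + b, 0) / n
  // Assumption: sample variance (divide by n - 1); zero when only one value exists.
  const variance =
    n > 1 ? values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (n - 1) : 0
  const stddev = Math.sqrt(variance)
  // CV is undefined for a zero mean; report 0 if there is no spread, else Infinity.
  const cv =
    mean !== 0 ? (stddev / Math.abs(mean)) * 100 : stddev === 0 ? 0 : Infinity
  return { mean, stddev, cv, n }
}

// Returns null when history is too short or has no spread (cold start).
function zScore(value: number, stats: Stats): number | null {
  if (stats.n < 2 || stats.stddev === 0) return null
  return (value - stats.mean) / stats.stddev
}

function classifyChange(z: number | null, cv: number): Classification {
  if (cv > 50) return 'noisy' // metric too unstable to judge
  if (z !== null && z > 2) return 'regression'
  return 'neutral' // covers both neutral results and improvements
}

// Usage: five historical means for one metric, then a new PR measurement.
const history = computeStats([100, 110, 90, 105, 95])
const verdict = classifyChange(zScore(130, history), history.cv) // 'regression'
```

Checking the noise filter before the z-score means an unstable metric is never flagged as a regression, which matches the stated goal of suppressing metrics that hover near 0.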
28
.github/workflows/ci-perf-report.yaml
vendored
@@ -45,7 +45,7 @@ jobs:
       - name: Run performance tests
         id: perf
         continue-on-error: true
-        run: pnpm exec playwright test --project=performance --workers=1
+        run: pnpm exec playwright test --project=performance --workers=1 --repeat-each=3
 
       - name: Upload perf metrics
         if: always()
@@ -61,6 +61,7 @@ jobs:
     if: github.event_name == 'pull_request'
     runs-on: ubuntu-latest
     permissions:
+      actions: read
       contents: read
       pull-requests: write
 
@@ -90,6 +91,31 @@ jobs:
           path: temp/perf-baseline/
           if_no_artifact_found: warn
 
+      - name: Download historical perf baselines
+        continue-on-error: true
+        run: |
+          RUNS=$(gh api \
+            "/repos/${{ github.repository }}/actions/workflows/ci-perf-report.yaml/runs?branch=${{ github.event.pull_request.base.ref }}&event=push&status=success&per_page=5" \
+            --jq '.workflow_runs[].id' || true)
+
+          if [ -z "$RUNS" ]; then
+            echo "No historical runs available"
+            exit 0
+          fi
+
+          mkdir -p temp/perf-history
+          INDEX=0
+          for RUN_ID in $RUNS; do
+            DIR="temp/perf-history/$INDEX"
+            mkdir -p "$DIR"
+            gh run download "$RUN_ID" -n perf-metrics -D "$DIR/" 2>/dev/null || true
+            INDEX=$((INDEX + 1))
+          done
+
+          echo "Downloaded $(ls temp/perf-history/*/perf-metrics.json 2>/dev/null | wc -l) historical baselines"
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
       - name: Generate perf report
         run: npx --yes tsx scripts/perf-report.ts > perf-report.md
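The commit message notes that `getHistoricalStats` averages per-run measurements before computing cross-run σ, and that the report falls back to a plain delta table until at least 2 historical runs exist. A rough sketch of that aggregation is shown below. The `Measurement` and `PerfRun` shapes, the `durationMs` field, and the in-memory input are all hypothetical; the actual layout of `perf-metrics.json` is not shown in this commit, and the sample standard deviation is an assumption.

```typescript
// Hypothetical data shapes; the real perf-metrics.json schema may differ.
interface Measurement {
  name: string
  durationMs: number
}
type PerfRun = Measurement[] // one downloaded artifact, possibly with repeats

interface MetricStats {
  mean: number
  stddev: number
  runs: number
}

function average(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length
}

function getHistoricalStats(runs: PerfRun[]): Map<string, MetricStats> {
  // Step 1: collapse repeated measurements inside each run to one mean per metric.
  const runMeansByMetric = new Map<string, number[]>()
  for (const run of runs) {
    const grouped = new Map<string, number[]>()
    for (const m of run) {
      const bucket = grouped.get(m.name) ?? []
      bucket.push(m.durationMs)
      grouped.set(m.name, bucket)
    }
    grouped.forEach((values, name) => {
      const runMeans = runMeansByMetric.get(name) ?? []
      runMeans.push(average(values))
      runMeansByMetric.set(name, runMeans)
    })
  }

  // Step 2: compute mean and sample stddev across runs, skipping cold-start metrics.
  const result = new Map<string, MetricStats>()
  runMeansByMetric.forEach((runMeans, name) => {
    if (runMeans.length < 2) return // fewer than 2 runs: caller uses a simple delta
    const mu = average(runMeans)
    const variance =
      runMeans.reduce((acc, v) => acc + (v - mu) ** 2, 0) / (runMeans.length - 1)
    result.set(name, { mean: mu, stddev: Math.sqrt(variance), runs: runMeans.length })
  })
  return result
}
```

Averaging within a run first keeps the three `--repeat-each` samples from one CI run from being counted as three independent historical observations, which would otherwise understate the cross-run variance.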