Restrict stopping criterion parameter usage in command line (#174)

* Restrict stopping criterion parameter usage in command line.
* Update docs for stopping criterion.
* Add convenience benchmark_base API for criterion params.
* Add more test cases for stopping criterion parsing.

---------

Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
Sergey Pavlov
2025-04-30 23:53:45 +04:00
committed by GitHub
parent ca0e795b46
commit 433376fd83
9 changed files with 482 additions and 88 deletions


@@ -83,36 +83,6 @@
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--min-samples <count>`
* Gather at least `<count>` samples per measurement.
* Default is 10 samples.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--stopping-criterion <criterion>`
* After `--min-samples` is satisfied, use `<criterion>` to detect if enough
samples were collected.
* Only applies to Cold measurements.
* Default is stdrel (`--stopping-criterion stdrel`)
* `--min-time <seconds>`
* Accumulate at least `<seconds>` of execution time per measurement.
* Only applies to `stdrel` stopping criterion.
* Default is 0.5 seconds.
* If both GPU and CPU times are gathered, this applies to GPU time only.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--max-noise <value>`
* Gather samples until the error in the measurement drops below `<value>`.
* Noise is specified as the percent relative standard deviation.
* Default is 0.5% (`--max-noise 0.5`)
* Only applies to `stdrel` stopping criterion.
* Only applies to Cold measurements.
* If both GPU and CPU times are gathered, this applies to GPU noise only.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--skip-time <seconds>`
* Skip a measurement when a warmup run executes in less than `<seconds>`.
* Default is -1 seconds (disabled).
@@ -123,16 +93,6 @@
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--timeout <seconds>`
* Measurements will time out after `<seconds>` have elapsed.
* Default is 15 seconds.
* `<seconds>` is walltime, not accumulated sample time.
* If a measurement times out, the default markdown log will print a warning to
report any outstanding termination criteria (min samples, min time, max
noise).
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--throttle-threshold <value>`
* Set the GPU throttle threshold as percentage of the device's default clock rate.
* Default is 75.
@@ -166,3 +126,68 @@
* Intended for use with external profiling tools.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
## Stopping Criteria
* `--timeout <seconds>`
* Measurements will time out after `<seconds>` have elapsed.
* Default is 15 seconds.
* `<seconds>` is walltime, not accumulated sample time.
* If a measurement times out, the default markdown log will print a warning to
report any outstanding termination criteria (min samples, min time, max
noise).
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--min-samples <count>`
* Gather at least `<count>` samples per measurement before checking any
other stopping criterion besides the timeout.
* Default is 10 samples.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--stopping-criterion <criterion>`
* After `--min-samples` is satisfied, use `<criterion>` to decide whether enough
samples have been collected.
* Only applies to Cold and CPU-only measurements.
* If both GPU and CPU times are gathered, GPU time is used for stopping
analysis.
* Stopping criteria provided by NVBench are:
* "stdrel": (default) Converges to a minimal relative standard deviation
(stdev / mean).
* "entropy": Converges based on the cumulative entropy of all samples.
* Each stopping criterion may provide additional parameters to customize its
behavior. These are described in the per-criterion sections below.
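
The ordering of the checks described above (timeout first, then `--min-samples`, then the selected criterion) can be sketched as follows. This is an illustrative Python sketch, not NVBench's implementation: the function name `stop_reason` and its arguments are hypothetical, and only the defaults (10 samples, 15 seconds) come from this document.

```python
import time


def stop_reason(num_samples, start_walltime, criterion_satisfied,
                min_samples=10, timeout=15.0, now=None):
    """Return why a measurement should stop, or None to keep sampling.

    Hypothetical helper: `criterion_satisfied` stands in for the result of
    whichever stopping criterion (e.g. stdrel or entropy) is selected.
    """
    now = time.monotonic() if now is None else now

    # The timeout is walltime since the measurement started and is checked
    # regardless of how many samples exist.
    if now - start_walltime >= timeout:
        return "timeout"

    # No other criterion is consulted until the minimum sample count is met.
    if num_samples < min_samples:
        return None

    return "criterion" if criterion_satisfied else None


# Too few samples: keep going even though the criterion reports success.
print(stop_reason(3, 0.0, True, now=1.0))
# Enough samples and the criterion is satisfied.
print(stop_reason(10, 0.0, True, now=1.0))
```
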
### "stdrel" Stopping Criterion Parameters
* `--min-time <seconds>`
* Accumulate at least `<seconds>` of execution time per measurement.
* Only applies to the `stdrel` stopping criterion.
* Default is 0.5 seconds.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--max-noise <value>`
* Gather samples until the error in the measurement drops below `<value>`.
* Noise is specified as the percent relative standard deviation (stdev/mean).
* Default is 0.5% (`--max-noise 0.5`)
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
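
To make the interaction of `--min-time` and `--max-noise` concrete, here is a minimal Python sketch of a stdrel-style check. It assumes the noise is the sample standard deviation divided by the mean, expressed in percent, and that both conditions must hold. The function name `stdrel_satisfied` is hypothetical, and NVBench's exact estimator may differ.

```python
import statistics


def stdrel_satisfied(sample_times, min_time=0.5, max_noise=0.5):
    """Hypothetical check: enough accumulated time and low enough noise.

    sample_times: per-sample execution times in seconds.
    min_time:     minimum accumulated execution time in seconds.
    max_noise:    maximum relative standard deviation, in percent.
    """
    if len(sample_times) < 2:
        return False  # a standard deviation needs at least two samples

    # --min-time: accumulated sample time, not walltime.
    if sum(sample_times) < min_time:
        return False

    # --max-noise: relative standard deviation (stdev / mean) as a percentage.
    mean = statistics.fmean(sample_times)
    noise_percent = 100.0 * statistics.stdev(sample_times) / mean
    return noise_percent < max_noise


steady = [0.2] * 5            # 1.0 s accumulated, no variation
noisy = [0.1, 0.3, 0.1, 0.3]  # 0.8 s accumulated, large variation
print(stdrel_satisfied(steady), stdrel_satisfied(noisy))
```
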
### "entropy" Stopping Criterion Parameters
* `--max-angle <value>`
* Maximum angle (slope) of the linear regression fitted to the cumulative
entropy.
* Smaller values give more accurate results.
* Default is 0.048.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--min-r2 <value>`
* Minimum coefficient of determination for linear regression of cumulative
entropy.
* Larger values give more accurate results.
* Default is 0.36.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
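
The role of `--max-angle` and `--min-r2` can be illustrated with a small Python sketch. Several details here are assumptions and not taken from this document: the entropy is computed as the Shannon entropy of the distinct sample values seen so far, the angle is the regression slope converted to degrees, and both thresholds must be met. All function names are hypothetical, and the quadratic-time recomputation is for clarity only.

```python
import math
from collections import Counter


def shannon_entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of sample values."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, r2)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # A perfectly constant series is treated as a perfect fit.
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, r2


def entropy_satisfied(samples, max_angle=0.048, min_r2=0.36):
    """Hypothetical check: the entropy curve is flat enough and well fitted."""
    if len(samples) < 3:
        return False
    # Entropy after each new sample (the "cumulative entropy" series).
    entropies = [shannon_entropy(samples[:i]) for i in range(1, len(samples) + 1)]
    slope, r2 = linear_fit(entropies)
    angle = math.degrees(math.atan(slope))  # assumption: angle in degrees
    return angle <= max_angle and r2 >= min_r2


stable = [5.0] * 20                       # entropy stays at zero
growing = [float(i) for i in range(20)]   # every sample is new, entropy keeps rising
print(entropy_satisfied(stable), entropy_satisfied(growing))
```

In this sketch, a run whose samples keep introducing new values has a rising entropy curve and therefore a large angle, so sampling continues. Once the curve flattens, the angle drops below the threshold.
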