mirror of
https://github.com/NVIDIA/nvbench.git
synced 2026-06-29 02:37:36 +00:00
* Reduce stdrel criterion complexity and ensure termination
Replace the stdrel criterion's growing sample history with an online
mean/variance accumulator. This keeps the stopping criterion based on
relative standard deviation, preserves the unbiased standard-deviation
estimate used for convergence, and reduces per-sample update work from
recomputing over the full history to constant time.
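
As a rough illustration of the constant-time update described above, the following sketch uses Welford's algorithm. The type name `online_mean_variance_sketch` and its members are hypothetical and are not taken from the NVBench sources; the real accumulator may differ in interface and numerical details.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical illustration only; not NVBench's actual accumulator.
struct online_mean_variance_sketch
{
  std::size_t count = 0;
  double mean       = 0.0;
  double m2         = 0.0; // running sum of squared deviations from the mean

  // Welford update: O(1) work per sample, no sample history is stored.
  void add(double x)
  {
    ++count;
    const double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean);
  }

  // Unbiased (n - 1) sample standard deviation; NaN with fewer than 2 samples.
  double sample_stdev() const
  {
    if (count < 2)
    {
      return std::numeric_limits<double>::quiet_NaN();
    }
    return std::sqrt(m2 / static_cast<double>(count - 1));
  }

  // stdev / mean; non-finite when the mean is zero (e.g. all-zero timings).
  double relative_stdev() const { return sample_stdev() / mean; }
};
```

Each call to `add` touches only three scalars, so the cost per sample does not grow with the number of samples collected.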
Add a bounded invalid-noise path so measurements that persistently produce
non-finite relative noise, such as all-zero timings, can terminate without
waiting for the wall-time timeout. Keep the normal min-time gate for ordinary
stdrel convergence.
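
One way such a bounded invalid-noise path could look is sketched below. The struct name, the choice to count consecutive (rather than total) invalid estimates, and the bound of 8 are all assumptions made for illustration; the commit message does not state the actual rule or constant.

```cpp
#include <cmath>

// Hypothetical sketch; names and the default bound are assumptions.
struct invalid_noise_guard_sketch
{
  int max_consecutive_invalid = 8; // assumed bound, for illustration only
  int consecutive_invalid     = 0;

  // Feed one relative-noise estimate. Returns true once the estimate has been
  // non-finite (NaN or infinity) for max_consecutive_invalid updates in a row.
  bool update(double relative_noise)
  {
    if (std::isfinite(relative_noise))
    {
      consecutive_invalid = 0; // a valid estimate resets the counter
      return false;
    }
    ++consecutive_invalid;
    return consecutive_invalid >= max_consecutive_invalid;
  }
};
```

With a guard like this, a measurement whose timings are all zero stops after a fixed number of updates instead of running until the wall-time timeout.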
Add focused tests for the online accumulator, stdrel sample-count threshold,
sample-standard-deviation behavior, deterministic convergence inputs, and
persistent invalid-noise termination. Update the CLI help for the stdrel
termination behavior.
* change max-noise to for consistency
* Use online_mean_variance on m_noise_tracker in is_finished()
Previously, the standard deviation was computed around the current
noise level instead of the mean noise level. Because of the identity
  E[ (N - C)^2 ] = E[ (N - E[N])^2 ] + (E[N] - C)^2 >= E[ (N - E[N])^2 ]
this caused the criterion to terminate later than it could have, because
the estimate taken around the current level is always greater than or
equal to the estimate taken around the mean.
The code used the current noise level instead of the mean to avoid
making two passes through the m_noise_tracker container.
Using online_mean_variance improves the accuracy of the estimated
dispersion of the noise signal while keeping a single pass through the
container.
* Address review feedback
Fixed misleading commit. Introduce private methods to refactor
computation of repeated expressions.
Renamed m_cuda_times_summary to m_measurements_summary, since
criterion can be applied for CPU-only measurements too.
Introduced is_close utility for checking whether two floating
point numbers are close to one another.
Introduced descriptive constexpr variables for hard-wired
constants.
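
A floating-point closeness check of this kind is commonly written with a relative and an absolute tolerance. The sketch below is an assumption about what such a utility might look like; the name `is_close_sketch`, its signature, and its default tolerances are hypothetical and are not taken from the NVBench sources.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch; the real is_close utility may use a different
// signature and different tolerances.
bool is_close_sketch(double a, double b, double rel_tol = 1e-9, double abs_tol = 0.0)
{
  if (a == b)
  {
    return true; // exact match, including equal infinities
  }
  if (!std::isfinite(a) || !std::isfinite(b))
  {
    return false; // NaN or mismatched infinities are never close
  }
  const double diff  = std::fabs(a - b);
  const double scale = std::max(std::fabs(a), std::fabs(b));
  return diff <= std::max(rel_tol * scale, abs_tol);
}
```

The absolute tolerance matters when comparing against zero, where a purely relative test can never succeed for a nonzero value.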
9.1 KiB
Queries
- `--list`, `-l`
  - List all devices and benchmarks without running them.
- `--help`, `-h`
  - Print usage information and exit.
- `--help-axes`, `--help-axis`
  - Print axis specification documentation and exit.
- `--version`
  - Print information about the version of NVBench used to build the executable.
Device Modification
- `--persistence-mode <state>`, `--pm <state>`
  - Sets persistence mode for one or more GPU devices.
  - Applies to the devices described by the most recent `--devices` option, or all devices if `--devices` is not specified.
  - This option requires root / admin permissions.
  - This option is only supported on Linux.
  - This call must precede all other device modification options, if any.
  - Note that persistence mode is deprecated and will be removed at some point in favor of the new persistence daemon. See the following link for more details: https://docs.nvidia.com/deploy/driver-persistence/index.html
  - Valid values for `state` are:
    - `0`: Disable persistence mode.
    - `1`: Enable persistence mode.
- `--lock-gpu-clocks <rate>`, `--lgc <rate>`
  - Lock GPU clocks for one or more devices to a particular rate.
  - Applies to the devices described by the most recent `--devices` option, or all devices if `--devices` is not specified.
  - This option requires root / admin permissions.
  - This option is only supported in Volta+ (sm_70+) devices.
  - Valid values for `rate` are:
    - `reset`, `unlock`, `none`: Unlock the GPU clocks.
    - `base`, `tdp`: Lock clocks to base frequency (best for stable results).
    - `max`, `maximum`: Lock clocks to max frequency (best for fastest results).
Output
- `--csv <filename/stream>`
  - Write CSV output to a file, or "stdout" / "stderr".
- `--json <filename/stream>`
  - Write JSON output to a file, or "stdout" / "stderr".
- `--markdown <filename/stream>`, `--md <filename/stream>`
  - Write markdown output to a file, or "stdout" / "stderr".
  - Markdown is written to "stdout" by default.
- `--quiet`, `-q`
  - Suppress output.
- `--color`
  - Use color in output (markdown + stdout only).
Benchmark / Axis Specification
- `--benchmark <benchmark name/index>`, `-b <benchmark name/index>`
  - Execute a specific benchmark.
  - Argument is a benchmark name or index, taken from `--list`.
  - If not specified, all benchmarks will run.
  - `--benchmark` may be specified multiple times to run several benchmarks.
  - The same benchmark may be specified multiple times with different configurations.
- `--axis <axis specification>`, `-a <axis specification>`
  - Override an axis specification.
  - See `--help-axis` for details on axis specifications.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
Benchmark Properties
- `--devices <device ids>`, `--device <device ids>`, `-d <device ids>`
  - Limit execution to one or more devices.
  - `<device ids>` is a single id, a comma separated list, or the string "all".
  - Device ids can be obtained from `--list`.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--skip-time <seconds>`
  - Skip a measurement when a warmup run executes in less than `<seconds>`.
  - Default is -1 seconds (disabled).
  - Intended for testing / debugging only.
  - Very fast kernels (<5us) often require an extremely large number of samples to converge below `max-noise`. This option allows them to be skipped to save time during testing.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--cold-warmup-runs <count>`
  - Execute up to `<count>` warmup runs before collecting cold measurement samples.
  - The minimum is 1 warmup run.
  - Default is 1 warmup run.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--cold-max-warmup-walltime <seconds>`
  - Stop cold warmup after the total warmup walltime exceeds `<seconds>`.
  - The limit is checked after each warmup run, so actual warmup time may exceed this value by one warmup run.
  - Default is -1 seconds (disabled).
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--throttle-threshold <value>`
  - Set the GPU throttle threshold as a percentage of the device's default clock rate.
  - Default is 75.
  - Set to 0 to disable throttle detection entirely.
  - Note that throttling is disabled when `nvbench::exec_tag::sync` is used.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
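
To make the percentage semantics of `--throttle-threshold` concrete, here is a minimal sketch. It assumes throttling is flagged when the observed clock drops below the given percentage of the default clock; the function name and that exact comparison are assumptions, not NVBench's documented implementation.

```cpp
// Hypothetical helper; the comparison rule is an assumption for illustration.
bool is_throttled_sketch(double current_clock_mhz,
                         double default_clock_mhz,
                         double threshold_percent)
{
  if (threshold_percent <= 0.0)
  {
    return false; // a threshold of 0 disables throttle detection
  }
  const double minimum_clock_mhz = default_clock_mhz * (threshold_percent / 100.0);
  return current_clock_mhz < minimum_clock_mhz;
}
```

With the default of 75 and a 1500 MHz default clock, this sketch would treat anything below 1125 MHz as throttled.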
- `--throttle-recovery-delay <value>`
  - Set the GPU throttle recovery delay in seconds.
  - Default is 0.05 seconds.
  - Note that throttling is disabled when `nvbench::exec_tag::sync` is used.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--profile`
  - Only run each benchmark once.
  - Disable any instrumentation that may interfere with profilers.
  - Intended for use with external profiling tools.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--no-batch`
  - Do not run batched measurements even if enabled.
  - Intended to shorten run-time when batched measurements are not of interest.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
Measurement Collection
- `--timeout <seconds>`
  - Measurements will time out after `<seconds>` have elapsed.
  - Default is 15 seconds.
  - `<seconds>` is walltime, not accumulated sample time.
  - If a measurement times out, the default markdown log will print a warning to report any outstanding termination criteria (min samples, min time, max noise).
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--min-samples <count>`
  - Gather at least `<count>` samples per measurement before checking any other stopping criterion besides the timeout.
  - Default is 10 samples.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
Stopping Criteria
- `--stopping-criterion <criterion>`
  - After `--min-samples` is satisfied, use `<criterion>` to detect if enough samples were collected.
  - Only applies to Cold and CPU-only measurements.
  - If both GPU and CPU times are gathered, GPU time is used for stopping analysis.
  - Stopping criteria provided by NVBench are:
    - "stdrel": (default) Stops when relative standard deviation falls below `--max-noise`, or when the noise estimate stabilizes without reaching that threshold.
    - "entropy": Stops when the entropy estimate of all collected samples converges.
    - "sample-count": Stops after a target number of samples.
  - Each stopping criterion may provide additional parameters to customize behavior, as detailed below:
"stdrel" Stopping Criterion Parameters
- `--min-time <seconds>`
  - Require at least `<seconds>` of accumulated execution time before `stdrel` can stop based on the relative standard deviation estimate, either because it falls below `--max-noise` or because it stabilizes above that threshold. To avoid running indefinitely when relative standard deviation cannot be computed reliably, NVBench may also stop earlier after repeated non-finite noise estimates.
  - Only applies to `stdrel` stopping criterion.
  - Default is 0.5 seconds.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--max-noise <value>`
  - Target relative standard deviation (stdev/mean). `stdrel` stops once the estimate falls below this value, or once the estimate has stabilized above it and additional samples are unlikely to improve convergence.
  - Noise is specified as the percent relative standard deviation.
  - Default is 0.5% (`--max-noise 0.5`).
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
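
The interaction between `--min-time` and `--max-noise` can be sketched as follows. This is a simplified illustration with a hypothetical function name: it covers only the "falls below `--max-noise`" path and deliberately omits the stabilization and invalid-noise paths described above.

```cpp
#include <cmath>

// Hypothetical sketch of the basic stdrel check; not NVBench's actual code.
// max_noise_percent uses the same units as --max-noise (percent).
bool stdrel_below_target_sketch(double stdev,
                                double mean,
                                double accumulated_time_s,
                                double min_time_s,
                                double max_noise_percent)
{
  if (accumulated_time_s < min_time_s)
  {
    return false; // the min-time gate has not been satisfied yet
  }
  const double noise_percent = 100.0 * stdev / mean;
  // A non-finite estimate (e.g. zero mean) never counts as converged here.
  return std::isfinite(noise_percent) && noise_percent < max_noise_percent;
}
```

For example, a standard deviation of 0.004 on a mean of 1.0 is 0.4% noise, which is below the default 0.5% target once 0.5 seconds of execution time have accumulated.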
"entropy" Stopping Criterion Parameters
- `--max-angle <value>`
  - Maximum linear regression angle of cumulative entropy.
  - Smaller values give more accurate results.
  - Default is 0.048.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
- `--min-r2 <value>`
  - Minimum coefficient of determination for linear regression of cumulative entropy.
  - Larger values give more accurate results.
  - Default is 0.36.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
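
The two entropy parameters both refer to a linear regression over the cumulative entropy series. The sketch below shows an ordinary least-squares fit and one plausible way to apply the two thresholds. The names are hypothetical, and treating the angle as `atan(slope)` in radians is an assumption; NVBench may define or compare the angle differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch; not NVBench's actual entropy criterion.
struct linear_fit_sketch
{
  double slope;
  double r2; // coefficient of determination
};

// Least-squares fit of ys against the indices 0..n-1. Requires n >= 2.
linear_fit_sketch fit_line_sketch(const std::vector<double> &ys)
{
  const double n = static_cast<double>(ys.size());
  double sum_x = 0.0, sum_y = 0.0;
  for (std::size_t i = 0; i < ys.size(); ++i)
  {
    sum_x += static_cast<double>(i);
    sum_y += ys[i];
  }
  const double mean_x = sum_x / n;
  const double mean_y = sum_y / n;

  double sxx = 0.0, sxy = 0.0, syy = 0.0;
  for (std::size_t i = 0; i < ys.size(); ++i)
  {
    const double dx = static_cast<double>(i) - mean_x;
    const double dy = ys[i] - mean_y;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }
  // For a single predictor, R^2 equals the squared correlation coefficient.
  // A perfectly flat series is treated as a perfect fit.
  const double r2 = (syy == 0.0) ? 1.0 : (sxy * sxy) / (sxx * syy);
  return {sxy / sxx, r2};
}

// Assumed rule: converged when the fitted line is nearly flat and fits well.
bool entropy_converged_sketch(const std::vector<double> &cumulative_entropy,
                              double max_angle,
                              double min_r2)
{
  const linear_fit_sketch fit = fit_line_sketch(cumulative_entropy);
  return std::fabs(std::atan(fit.slope)) <= max_angle && fit.r2 >= min_r2;
}
```

Under these assumptions, a smaller `--max-angle` demands a flatter entropy trend and a larger `--min-r2` demands a tighter fit, which matches the accuracy notes in the parameter descriptions.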
"sample-count" Stopping Criterion Parameters
- `--target-samples <count>`
  - Stop after at least `<count>` samples are collected.
  - Default is 100 samples.
  - The total number of collected samples is `max(--min-samples, --target-samples)`.
  - Applies to the most recent `--benchmark`, or all benchmarks if specified before any `--benchmark` arguments.
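
The sample total stated above is a plain maximum of the two options. A minimal sketch, using a hypothetical function name:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper illustrating max(--min-samples, --target-samples).
std::int64_t total_samples_sketch(std::int64_t min_samples, std::int64_t target_samples)
{
  return std::max(min_samples, target_samples);
}
```

With the defaults of 10 and 100 the result is 100 samples; raising `--min-samples` above `--target-samples` makes the minimum the effective total.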