Files
nvbench/docs/cli_help.md
Oleksandr Pavlyk 9ea77bccaa Implement CLI option to control warmups for cold measurements (#339)
* Implement warmup-runs count, supported as CLI

CLI option --warmup-runs implemented and documented.

The warm-up counts is enforced to always be positive.
This is necessary to ensure that JIT-ting has occurred,
and use of blocking kernel would not result in time-outs.

Test is option parser is added.

* Ensure that measure_cold::run_warmup instantiates blocking kernel

Because warm-up runs are executed without use of blocking kernel,
the blocking kernel was not jitted until actual measurements were
collected. The module loading cost incurred during the first run
shows as elevated CPU time noise value for the first measurement
as noted in https://github.com/NVIDIA/nvbench/pull/339

This PR adds `this->block_stream(); this->unblock_stream();` prior
to executing warm-up loop with use of blocking kernel disabled.

This ensures that blocking kernel is instantiated during the warm-up,
but it no other kernel is launched between its launch and stream sync
thus avoiding deadlocking.

* Rename --warmup-runs to --cold-warmup-runs, add --cold-max-warmup-walltime

Since configurable number of warmups only applies to measure_cold.cuh
rename the CLI option to reflect that.

Also add --cold-max-warmup-walltime (defaults to -1, i.e. disabled).
If enabled, exits warmup loop before request count is reached if
the wall-time expanded executign warmups exceeds this max-warmup-walltime
value.
2026-05-12 14:30:08 -05:00

203 lines
8.1 KiB
Markdown

# Queries
* `--list`, `-l`
* List all devices and benchmarks without running them.
* `--help`, `-h`
* Print usage information and exit.
* `--help-axes`, `--help-axis`
* Print axis specification documentation and exit.
* `--version`
* Print information about the version of NVBench used to build the executable.
# Device Modification
* `--persistence-mode <state>`, `--pm <state>`
* Sets persistence mode for one or more GPU devices.
* Applies to the devices described by the most recent `--devices` option,
or all devices if `--devices` is not specified.
* This option requires root / admin permissions.
* This option is only supported on Linux.
* This call must precede all other device modification options, if any.
* Note that persistence mode is deprecated and will be removed at some point
in favor of the new persistence daemon. See the following link for more
details: https://docs.nvidia.com/deploy/driver-persistence/index.html
* Valid values for `state` are:
* `0`: Disable persistence mode.
* `1`: Enable persistence mode.
* `--lock-gpu-clocks <rate>`, `--lgc <rate>`
* Lock GPU clocks for one or more devices to a particular rate.
* Applies to the devices described by the most recent `--devices` option,
or all devices if `--devices` is not specified.
* This option requires root / admin permissions.
* This option is only supported in Volta+ (sm_70+) devices.
* Valid values for `rate` are:
* `reset`, `unlock`, `none`: Unlock the GPU clocks.
* `base`, `tdp`: Lock clocks to base frequency (best for stable results).
* `max`, `maximum`: Lock clocks to max frequency (best for fastest results).
# Output
* `--csv <filename/stream>`
* Write CSV output to a file, or "stdout" / "stderr".
* `--json <filename/stream>`
* Write JSON output to a file, or "stdout" / "stderr".
* `--markdown <filename/stream>`, `--md <filename/stream>`
* Write markdown output to a file, or "stdout" / "stderr".
* Markdown is written to "stdout" by default.
* `--quiet`, `-q`
* Suppress output.
* `--color`
* Use color in output (markdown + stdout only).
# Benchmark / Axis Specification
* `--benchmark <benchmark name/index>`, `-b <benchmark name/index>`
* Execute a specific benchmark.
* Argument is a benchmark name or index, taken from `--list`.
* If not specified, all benchmarks will run.
* `--benchmark` may be specified multiple times to run several benchmarks.
* The same benchmark may be specified multiple times with different
configurations.
* `--axis <axis specification>`, `-a <axis specification>`
* Override an axis specification.
* See `--help-axis`
for [details on axis specifications](./cli_help_axis.md).
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
# Benchmark Properties
* `--devices <device ids>`, `--device <device ids>`, `-d <device ids>`
* Limit execution to one or more devices.
* `<device ids>` is a single id, a comma separated list, or the string "all".
* Device ids can be obtained from `--list`.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--skip-time <seconds>`
* Skip a measurement when a warmup run executes in less than `<seconds>`.
* Default is -1 seconds (disabled).
* Intended for testing / debugging only.
* Very fast kernels (<5us) often require an extremely large number of samples
to converge `max-noise`. This option allows them to be skipped to save time
during testing.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--cold-warmup-runs <count>`
* Execute up to `<count>` warmup runs before collecting cold measurement samples.
* The minimum is 1 warmup run.
* Default is 1 warmup run.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--cold-max-warmup-walltime <seconds>`
* Stop cold warmup after the total warmup walltime exceeds `<seconds>`.
* The limit is checked after each warmup run, so actual warmup time may exceed
this value by one warmup run.
* Default is -1 seconds (disabled).
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--throttle-threshold <value>`
* Set the GPU throttle threshold as percentage of the device's default clock rate.
* Default is 75.
* Set to 0 to disable throttle detection entirely.
* Note that throttling is disabled when `nvbench::exec_tag::sync` is used.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--throttle-recovery-delay <value>`
* Set the GPU throttle recovery delay in seconds.
* Default is 0.05 seconds.
* Note that throttling is disabled when `nvbench::exec_tag::sync` is used.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--profile`
* Only run each benchmark once.
* Disable any instrumentation that may interfere with profilers.
* Intended for use with external profiling tools.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--no-batch`
* Do not run batched measurements even if enabled.
* Intended to shorten run-time when batched measurements are not of interest.
* Applied to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
## Stopping Criteria
* `--timeout <seconds>`
* Measurements will timeout after `<seconds>` have elapsed.
* Default is 15 seconds.
* `<seconds>` is walltime, not accumulated sample time.
* If a measurement times out, the default markdown log will print a warning to
report any outstanding termination criteria (min samples, min time, max
noise).
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--min-samples <count>`
* Gather at least `<count>` samples per measurement before checking any
other stopping criterion besides the timeout.
* Default is 10 samples.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--stopping-criterion <criterion>`
* After `--min-samples` is satisfied, use `<criterion>` to detect if enough
samples were collected.
* Only applies to Cold and CPU-only measurements.
* If both GPU and CPU times are gathered, GPU time is used for stopping
analysis.
* Stopping criteria provided by NVBench are:
* "stdrel": (default) Converges to a minimal relative standard deviation,
stdev / mean
* "entropy": Converges based on the cumulative entropy of all samples.
* Each stopping criterion may provide additional parameters to customize
behavior, as detailed below:
### "stdrel" Stopping Criterion Parameters
* `--min-time <seconds>`
* Accumulate at least `<seconds>` of execution time per measurement.
* Only applies to `stdrel` stopping criterion.
* Default is 0.5 seconds.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--max-noise <value>`
* Gather samples until the error in the measurement drops below `<value>`.
* Noise is specified as the percent relative standard deviation (stdev/mean).
* Default is 0.5% (`--max-noise 0.5`)
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
### "entropy" Stopping Criterion Parameters
* `--max-angle <value>`
* Maximum linear regression angle of cumulative entropy.
* Smaller values give more accurate results.
* Default is 0.048.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.
* `--min-r2 <value>`
* Minimum coefficient of determination for linear regression of cumulative
entropy.
* Larger values give more accurate results.
* Default is 0.36.
* Applies to the most recent `--benchmark`, or all benchmarks if specified
before any `--benchmark` arguments.