nvbench/docs/cli_help.md

Queries

  • --list, -l

    • List all devices and benchmarks without running them.
  • --help, -h

    • Print usage information and exit.
  • --help-axes, --help-axis

    • Print axis specification documentation and exit.
  • --version

    • Print information about the version of NVBench used to build the executable.

Device Modification

  • --persistence-mode <state>, --pm <state>

    • Sets persistence mode for one or more GPU devices.
    • Applies to the devices described by the most recent --devices option, or all devices if --devices is not specified.
    • This option requires root / admin permissions.
    • This option is only supported on Linux.
    • This option must precede all other device modification options, if any.
    • Note that persistence mode is deprecated and will be removed at some point in favor of the new persistence daemon. See the following link for more details: https://docs.nvidia.com/deploy/driver-persistence/index.html
    • Valid values for state are:
      • 0: Disable persistence mode.
      • 1: Enable persistence mode.
  • --lock-gpu-clocks <rate>, --lgc <rate>

    • Lock GPU clocks for one or more devices to a particular rate.
    • Applies to the devices described by the most recent --devices option, or all devices if --devices is not specified.
    • This option requires root / admin permissions.
    • This option is only supported on Volta+ (sm_70+) devices.
    • Valid values for rate are:
      • reset, unlock, none: Unlock the GPU clocks.
      • base, tdp: Lock clocks to base frequency (best for stable results).
      • max, maximum: Lock clocks to max frequency (best for fastest results).
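
The alias table for <rate> above can be summarized in a few lines of Python. This is a minimal sketch, not NVBench code: `RATE_ALIASES` and `normalize_rate` are hypothetical names, and the canonical names "unlock", "base", and "max" are chosen here only for illustration.

```python
# Hypothetical helper, not part of NVBench: it only mirrors the documented
# spellings accepted by --lock-gpu-clocks / --lgc.
RATE_ALIASES = {
    "reset": "unlock",
    "unlock": "unlock",
    "none": "unlock",
    "base": "base",
    "tdp": "base",
    "max": "max",
    "maximum": "max",
}


def normalize_rate(rate: str) -> str:
    """Map a documented <rate> spelling to one canonical name."""
    try:
        return RATE_ALIASES[rate]
    except KeyError:
        raise ValueError(f"unknown clock rate: {rate!r}") from None
```

For example, `normalize_rate("tdp")` and `normalize_rate("base")` both yield "base", the setting the list above recommends for stable results.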

Output

  • --csv <filename/stream>

    • Write CSV output to a file, or "stdout" / "stderr".
  • --json <filename/stream>

    • Write JSON output to a file, or "stdout" / "stderr".
  • --markdown <filename/stream>, --md <filename/stream>

    • Write markdown output to a file, or "stdout" / "stderr".
    • Markdown is written to "stdout" by default.
  • --quiet, -q

    • Suppress output.
  • --color

    • Use color in output (markdown + stdout only).
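
The <filename/stream> argument shared by --csv, --json, and --markdown is either one of the two stream names or a file path. A minimal Python sketch of that rule follows; `open_output` is a hypothetical helper written for illustration and says nothing about how NVBench itself opens its outputs.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_output(target: str):
    """Yield a writable text stream for a <filename/stream> argument.

    Assumption: "stdout" and "stderr" name the process streams, and any
    other value is treated as a file path to create or overwrite.
    """
    if target == "stdout":
        yield sys.stdout
    elif target == "stderr":
        yield sys.stderr
    else:
        with open(target, "w", encoding="utf-8") as stream:
            yield stream
```

A caller would write `with open_output("results.csv") as out: out.write(...)`, where `results.csv` is only a placeholder filename.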

Benchmark / Axis Specification

  • --benchmark <benchmark name/index>, -b <benchmark name/index>

    • Execute a specific benchmark.
    • Argument is a benchmark name or index, taken from --list.
    • If not specified, all benchmarks will run.
    • --benchmark may be specified multiple times to run several benchmarks.
    • The same benchmark may be specified multiple times with different configurations.
  • --axis <axis specification>, -a <axis specification>

    • Override an axis specification.
    • See --help-axis for details on axis specifications.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
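
The scoping rule used by --axis (and by most options in the sections that follow) can be modeled as a single left-to-right pass over the arguments. The sketch below is an illustrative model only, assuming the arguments have already been split into (flag, value) pairs; `scope_options` and the benchmark name "copy" are hypothetical.

```python
def scope_options(args):
    """Group (flag, value) pairs by the benchmark they apply to.

    Returns (global_options, per_benchmark), where per_benchmark is a list
    of (benchmark, options) entries in command-line order. A benchmark may
    appear more than once, each time with its own configuration.
    """
    global_options = []
    per_benchmark = []
    for flag, value in args:
        if flag in ("--benchmark", "-b"):
            # Start a new entry; later options attach to it.
            per_benchmark.append((value, []))
        elif per_benchmark:
            # Applies to the most recent --benchmark.
            per_benchmark[-1][1].append((flag, value))
        else:
            # Seen before any --benchmark: applies to all benchmarks.
            global_options.append((flag, value))
    return global_options, per_benchmark
```

With the pairs for `--timeout 30 --benchmark copy --min-samples 20 --benchmark copy --min-samples 50`, the timeout lands in the global list and the two "copy" entries each keep their own --min-samples value.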

Benchmark Properties

  • --devices <device ids>, --device <device ids>, -d <device ids>

    • Limit execution to one or more devices.
    • <device ids> is a single id, a comma-separated list, or the string "all".
    • Device ids can be obtained from --list.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --skip-time <seconds>

    • Skip a measurement when a warmup run executes in less than <seconds>.
    • Default is -1 seconds (disabled).
    • Intended for testing / debugging only.
    • Very fast kernels (<5us) often require an extremely large number of samples to satisfy the --max-noise criterion. This option allows them to be skipped to save time during testing.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --throttle-threshold <value>

    • Set the GPU throttle threshold as a percentage of the device's default clock rate.
    • Default is 75.
    • Set to 0 to disable throttle detection entirely.
    • Note that throttling is disabled when nvbench::exec_tag::sync is used.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --throttle-recovery-delay <value>

    • Set the GPU throttle recovery delay in seconds.
    • Default is 0.05 seconds.
    • Note that throttling is disabled when nvbench::exec_tag::sync is used.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --profile

    • Only run each benchmark once.
    • Disable any instrumentation that may interfere with profilers.
    • Intended for use with external profiling tools.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --no-batch

    • Do not run batched measurements even if enabled.
    • Intended to shorten run-time when batched measurements are not of interest.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
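
The three forms of <device ids> accepted by --devices at the top of this list (a single id, a comma-separated list, or "all") can be sketched as a small parser. This is a hypothetical illustration rather than NVBench's parser: `parse_device_ids` and its `available` parameter, which stands in for the ids reported by --list, are assumptions.

```python
def parse_device_ids(spec: str, available: list[int]) -> list[int]:
    """Resolve a <device ids> string against the known device ids.

    `available` stands in for the ids that --list would report.
    """
    if spec == "all":
        return list(available)
    # A single id is just a one-element comma-separated list.
    ids = [int(token) for token in spec.split(",")]
    unknown = [i for i in ids if i not in available]
    if unknown:
        raise ValueError(f"unknown device ids: {unknown}")
    return ids
```

For instance, with devices 0, 1, and 2 available, the spec "0,2" selects the first and third device, while "all" selects every one.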

Stopping Criteria

  • --timeout <seconds>

    • Measurements will time out after <seconds> have elapsed.
    • Default is 15 seconds.
    • <seconds> is walltime, not accumulated sample time.
    • If a measurement times out, the default markdown log will print a warning to report any outstanding termination criteria (min samples, min time, max noise).
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --min-samples <count>

    • Gather at least <count> samples per measurement before checking any other stopping criterion besides the timeout.
    • Default is 10 samples.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --stopping-criterion <criterion>

    • After --min-samples is satisfied, use <criterion> to detect if enough samples were collected.
    • Only applies to Cold and CPU-only measurements.
    • If both GPU and CPU times are gathered, GPU time is used for stopping analysis.
    • Stopping criteria provided by NVBench are:
      • "stdrel": (default) Converges to a minimal relative standard deviation (stdev/mean).
      • "entropy": Converges based on the cumulative entropy of all samples.
    • Each stopping criterion may provide additional parameters to customize behavior, as detailed below:

"stdrel" Stopping Criterion Parameters

  • --min-time <seconds>

    • Accumulate at least <seconds> of execution time per measurement.
    • Only applies to stdrel stopping criterion.
    • Default is 0.5 seconds.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --max-noise <value>

    • Gather samples until the error in the measurement drops below <value>.
    • Noise is specified as the percent relative standard deviation (stdev/mean).
    • Default is 0.5% (--max-noise 0.5).
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
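
How --min-samples, --min-time, and --max-noise combine under "stdrel" can be illustrated with a short check over a list of sample times. This is a sketch under stated assumptions, not NVBench's implementation: `stdrel_satisfied` is a hypothetical name, the sample standard deviation is used (the text above does not say which estimator NVBench applies), and the wall-clock --timeout is left out.

```python
import statistics


def stdrel_satisfied(samples, min_samples=10, min_time=0.5, max_noise=0.5):
    """Return True once the documented "stdrel" conditions all hold.

    samples:   per-sample execution times in seconds
    min_time:  minimum accumulated execution time in seconds
    max_noise: maximum relative standard deviation, in percent
    """
    # stdev needs at least two points, so never test with fewer.
    if len(samples) < max(min_samples, 2):
        return False
    if sum(samples) < min_time:
        return False
    mean = statistics.fmean(samples)
    noise_percent = 100.0 * statistics.stdev(samples) / mean
    return noise_percent < max_noise
```

With the defaults, ten identical 0.1 s samples pass all three conditions, while ten identical 0.01 s samples fail because only 0.1 s of execution time has accumulated.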

"entropy" Stopping Criterion Parameters

  • --max-angle <value>

    • Maximum linear regression angle of cumulative entropy.
    • Smaller values give more accurate results.
    • Default is 0.048.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
  • --min-r2 <value>

    • Minimum coefficient of determination for linear regression of cumulative entropy.
    • Larger values give more accurate results.
    • Default is 0.36.
    • Applies to the most recent --benchmark, or all benchmarks if specified before any --benchmark arguments.
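
The two "entropy" parameters both describe a straight-line fit through the cumulative entropy values. The sketch below shows one way such a check could look. Treat every detail as an assumption: `fit_line` and `entropy_satisfied` are hypothetical names, the angle is taken in degrees as atan(slope) with the sample index as the x-axis, a perfectly flat series is given an R² of 1, and how NVBench computes the entropy values themselves is not covered here.

```python
import math


def fit_line(ys):
    """Least-squares fit of ys against their indices; returns (slope, r2)."""
    n = len(ys)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # Assumption: a perfectly flat series counts as a perfect fit.
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, r2


def entropy_satisfied(entropies, max_angle=0.048, min_r2=0.36):
    """Check the fitted line against --max-angle and --min-r2."""
    if len(entropies) < 2:
        return False
    slope, r2 = fit_line(entropies)
    angle = math.degrees(math.atan(slope))  # assumed unit: degrees
    return angle <= max_angle and r2 >= min_r2
```

In this model, a series that has flattened out passes with the defaults, while a series that still climbs by one unit per sample has a 45 degree angle and fails.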