Commit Graph

21 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
5f15bd69d7 Implement sample-count stopping criterion with parameter target-samples
--stopping-criterion sample-count --target-samples 100 would stop once
max(--min-samples, --target-samples) samples are collected
2026-05-12 14:31:46 -05:00
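
The stopping rule described in this commit message can be sketched as follows. This is only an illustration of the `max(--min-samples, --target-samples)` rule; the struct and member names are hypothetical and are not taken from the NVBench sources.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of a sample-count stopping criterion.
// Names are illustrative assumptions, not NVBench's actual API.
struct sample_count_criterion
{
  std::int64_t min_samples;    // value of --min-samples
  std::int64_t target_samples; // value of --target-samples

  // The run stops once max(min_samples, target_samples) samples exist.
  std::int64_t required_samples() const
  {
    return std::max(min_samples, target_samples);
  }

  bool is_finished(std::int64_t collected) const
  {
    return collected >= required_samples();
  }
};
```

With `--min-samples 10 --target-samples 100` the sketch requires 100 samples; if `--min-samples` were 200 it would require 200 instead.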
Oleksandr Pavlyk
9ea77bccaa Implement CLI option to control warmups for cold measurements (#339)
* Implement warmup-runs count, supported as CLI

CLI option --warmup-runs implemented and documented.

The warm-up count is enforced to always be positive.
This is necessary to ensure that JIT-ting has occurred,
and use of blocking kernel would not result in time-outs.

A test for the option parser is added.

* Ensure that measure_cold::run_warmup instantiates blocking kernel

Because warm-up runs are executed without use of blocking kernel,
the blocking kernel was not jitted until actual measurements were
collected. The module loading cost incurred during the first run
shows as elevated CPU time noise value for the first measurement
as noted in https://github.com/NVIDIA/nvbench/pull/339

This PR adds `this->block_stream(); this->unblock_stream();` prior
to executing warm-up loop with use of blocking kernel disabled.

This ensures that blocking kernel is instantiated during the warm-up,
but no other kernel is launched between its launch and stream sync,
thus avoiding deadlocking.

* Rename --warmup-runs to --cold-warmup-runs, add --cold-max-warmup-walltime

Since configurable number of warmups only applies to measure_cold.cuh
rename the CLI option to reflect that.

Also add --cold-max-warmup-walltime (defaults to -1, i.e. disabled).
If enabled, exits warmup loop before requested count is reached if
the wall-time expended executing warmups exceeds this max-warmup-walltime
value.
2026-05-12 14:30:08 -05:00
Oleksandr Pavlyk
f049f10977 Fix typo 2026-02-02 14:41:42 -06:00
Oleksandr Pavlyk
cff6df9bb2 Renamed option to --no-batch to stay aligned with tag name 2026-02-02 12:28:39 -06:00
Oleksandr Pavlyk
f1b9d44304 Support --no-batched CLI option
The option sets m_skip_batched boolean member in benchmark_base class.
Methods `bool get_skip_batched()` and `void set_skip_batched(bool)` added.

m_skip_batched is also added to state class. Similarly named methods
are added.

CLI help file documents `--no-batched` option.
2026-02-02 11:32:57 -06:00
Oleksandr Pavlyk
4ad3088a47 Update docs/cli_help.md
Spare users of implementation details in description of `--profile` option

Co-authored-by: Allison Piper <apiper@nvidia.com>
2025-07-28 14:52:57 -05:00
Oleksandr Pavlyk
8416342af0 Remove mentions of --run-once and --disable-blocking-kernel from help
Text for --profile modified to be self-consistent, i.e., not to refer
to removed --run-once and --disable-blocking-kernel for explanation
of what it does.
2025-07-28 07:55:25 -05:00
Sergey Pavlov
433376fd83 Restrict stopping criterion parameter usage in command line (#174)
* restrict stopping criterion parameter usage in command line
* Update docs for stopping criterion.
* Add convenience benchmark_base API for criterion params.
* Add more test cases for stopping criterion parsing.

---------

Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
2025-04-30 15:53:45 -04:00
Allison Piper
e4057575c7 Disable throttling when sync exec tag is used. 2025-04-24 22:48:35 +00:00
Allison Piper
18926ced87 Replace references to peak_sm_clock with default_sm_clock.
The actual measured clock speed can exceed this value, so default is less confusing than peak.
2025-04-14 11:33:04 -04:00
Georgy Evtushenko
254ac2517f Remove discard on throttle option 2025-04-12 21:13:13 -07:00
Georgy Evtushenko
b926daf09f Better throttle recovery delay 2025-04-12 21:04:12 -07:00
Georgy Evtushenko
f29f7ac2fb Detect throttle
Signed-off-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>
2025-04-11 14:35:40 -07:00
Georgy Evtushenko
b789240c76 Entropy-based stopping criterion 2024-01-05 14:59:48 -08:00
Paul Große-Bley
7f51ead595 Add --disable-blocking-kernel and --profile options. 2022-04-08 20:03:44 +02:00
Allison Vacanti
b948e79cab Add NVML support for persistence mode, locking clocks.
Locking clocks is currently only implemented for Volta+ devices.

Example usage:

my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base

See the cli_help.md docs for more info.
2021-12-17 13:59:43 -05:00
Allison Vacanti
1875d9962d Document new --version option. 2021-10-26 17:45:20 -04:00
Allison Vacanti
6d79c80152 Add --run-once option.
Fixes #10.

Adds a mode that forces a benchmark to only run once, simplifying
profiling use cases. This can be enabled by any of the following methods:

* Passing `--run-once` on the command line
* `NVBENCH_CREATE(...).set_run_once(true)` when declaring a benchmark
* `state.set_run_once(true)` from within the benchmark implementation.
2021-10-07 16:28:15 -04:00
Allison Vacanti
4e83e048ba Store percentages as ratios.
Human-readable outputs (md) and CLI inputs still use percentages.
In-memory and machine-readable outputs (csv, json) use ratios.

This is the convention that spreadsheet apps expect. Fixes #2.
2021-03-18 13:42:43 -04:00
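
The convention in this commit (percentages at the human-facing edges, ratios in memory and in csv/json) can be illustrated with a small sketch. The helper names below are hypothetical and the two-decimal formatting is an arbitrary choice for the example; neither is taken from NVBench.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helpers illustrating the percent/ratio convention.

// CLI input arrives as a percentage (e.g. 50 for 50%) and is stored as a ratio.
double percent_to_ratio(double percent)
{
  return percent / 100.0;
}

// Human-readable (markdown) output converts the stored ratio back to a percentage.
std::string ratio_to_percent_string(double ratio)
{
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.2f%%", ratio * 100.0);
  return std::string(buf);
}
```

Machine-readable outputs would write the stored ratio unchanged, which is the form spreadsheet applications expect when a cell is formatted as a percentage.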
Allison Vacanti
922a6d09d0 Add --json option to CLI docs. 2021-03-05 16:37:23 -05:00
Allison Vacanti
65bc2c1e3f Documentation overhaul.
Revamp README, split into multiple files. Add docs on CLI.

Add `--help` and `--help-axis`.
2021-03-04 18:40:23 -05:00