nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-05-13 09:45:39 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	5f15bd69d7	Implement sample-count stopping criterion with parameter target-samples. `--stopping-criterion sample-count --target-samples 100` would stop once max(--min-samples, --target-samples) samples are collected.	2026-05-12 14:31:46 -05:00
Oleksandr Pavlyk	9ea77bccaa	Implement CLI option to control warmups for cold measurements (#339) * Implement warmup-runs count, supported as CLI option. --warmup-runs is implemented and documented. The warm-up count is enforced to always be positive. This is necessary to ensure that JIT-ting has occurred and that use of the blocking kernel does not result in time-outs. A test of the option parser is added. * Ensure that measure_cold::run_warmup instantiates blocking kernel. Because warm-up runs are executed without use of the blocking kernel, the blocking kernel was not jitted until actual measurements were collected. The module loading cost incurred during the first run shows up as an elevated CPU time noise value for the first measurement, as noted in https://github.com/NVIDIA/nvbench/pull/339. This PR adds `this->block_stream(); this->unblock_stream();` prior to executing the warm-up loop with use of the blocking kernel disabled. This ensures that the blocking kernel is instantiated during the warm-up, but no other kernel is launched between its launch and the stream sync, thus avoiding deadlock. * Rename --warmup-runs to --cold-warmup-runs, add --cold-max-warmup-walltime. Since the configurable number of warmups only applies to measure_cold.cuh, rename the CLI option to reflect that. Also add --cold-max-warmup-walltime (defaults to -1, i.e. disabled). If enabled, the warmup loop exits before the requested count is reached if the wall-time expended executing warmups exceeds this max-warmup-walltime value.	2026-05-12 14:30:08 -05:00
Oleksandr Pavlyk	f049f10977	Fix typo	2026-02-02 14:41:42 -06:00
Oleksandr Pavlyk	cff6df9bb2	Renamed option to --no-batch to stay aligned with tag name	2026-02-02 12:28:39 -06:00
Oleksandr Pavlyk	f1b9d44304	Support --no-batched CLI option The option sets m_skip_batched boolean member in benchmark_base class. Methods `bool get_skip_batched()` and `void set_skip_batched(bool)` added. m_skip_batched is also added to state class. Similarly named methods are added. CLI help file documents `--no-batched` option.	2026-02-02 11:32:57 -06:00
Oleksandr Pavlyk	4ad3088a47	Update docs/cli_help.md Spare users of implementation details in description of `--profile` option Co-authored-by: Allison Piper <apiper@nvidia.com>	2025-07-28 14:52:57 -05:00
Oleksandr Pavlyk	8416342af0	Remove mentions of --run-once and --disable-blocking-kernel from help. Text for --profile modified to be self-consistent, i.e., not to refer to the removed --run-once and --disable-blocking-kernel for an explanation of what it does.	2025-07-28 07:55:25 -05:00
Sergey Pavlov	433376fd83	Restrict stopping criterion parameter usage in command line (#174) * Restrict stopping criterion parameter usage in command line. * Update docs for stopping criterion. * Add convenience benchmark_base API for criterion params. * Add more test cases for stopping criterion parsing. --------- Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com> Co-authored-by: Allison Piper <alliepiper16@gmail.com>	2025-04-30 15:53:45 -04:00
Allison Piper	e4057575c7	Disable throttling when `sync` exec tag is used.	2025-04-24 22:48:35 +00:00
Allison Piper	18926ced87	Replace references to `peak_sm_clock` with `default_sm_clock`. The actual measured clock speed can exceed this value, so default is less confusing than peak.	2025-04-14 11:33:04 -04:00
Georgy Evtushenko	254ac2517f	Remove discard on throttle option	2025-04-12 21:13:13 -07:00
Georgy Evtushenko	b926daf09f	Better throttle recovery delay	2025-04-12 21:04:12 -07:00
Georgy Evtushenko	f29f7ac2fb	Detect throttle Signed-off-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>	2025-04-11 14:35:40 -07:00
Georgy Evtushenko	b789240c76	Entropy-based stopping criterion	2024-01-05 14:59:48 -08:00
Paul Große-Bley	7f51ead595	Add --disable-blocking-kernel and --profile options.	2022-04-08 20:03:44 +02:00
Allison Vacanti	b948e79cab	Add NVML support for persistence mode, locking clocks. Locking clocks is currently only implemented for Volta+ devices. Example usage: my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base See the cli_help.md docs for more info.	2021-12-17 13:59:43 -05:00
Allison Vacanti	1875d9962d	Document new `--version` option.	2021-10-26 17:45:20 -04:00
Allison Vacanti	6d79c80152	Add --run-once option. Fixes #10. Adds a mode that forces a benchmark to only run once, simplifying profiling usecases. This can be enabled by any of the following methods: * Passing `--run-once` on the command line * `NVBENCH_CREATE(...).set_run_once(true)` when declaring a benchmark * `state.set_run_once(true)` from within the benchmark implementation.	2021-10-07 16:28:15 -04:00
Allison Vacanti	4e83e048ba	Store percentages as ratios. Human-readable outputs (md) and CLI inputs still use percentages. In-memory and machine-readable outputs (csv, json) use ratios. This is the convention that spreadsheet apps expect. Fixes #2.	2021-03-18 13:42:43 -04:00
Allison Vacanti	922a6d09d0	Add `--json` option to CLI docs.	2021-03-05 16:37:23 -05:00
Allison Vacanti	65bc2c1e3f	Documentation overhaul. Revamp README, split into multiple files. Add docs on CLI. Add `--help` and `--help-axis`.	2021-03-04 18:40:23 -05:00

21 Commits