nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-06-29 10:47:36 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	7ba2b79d5b	Reduce stdrel criterion complexity and ensure termination (#374) * Reduce stdrel criterion complexity and ensure termination Replace the stdrel criterion's growing sample history with an online mean/variance accumulator. This keeps the stopping criterion based on relative standard deviation, preserves the unbiased standard-deviation estimate used for convergence, and reduces per-sample update work from recomputing over the full history to constant time. Add a bounded invalid-noise path so measurements that persistently produce non-finite relative noise, such as all-zero timings, can terminate without waiting for the wall-time timeout. Keep the normal min-time gate for ordinary stdrel convergence. Add focused tests for the online accumulator, stdrel sample-count threshold, sample-standard-deviation behavior, deterministic convergence inputs, and persistent invalid-noise termination. Update the CLI help for the stdrel termination behavior. * change max-noise to for consistency * Use online_mean_variance on m_noise_tracker in is_finished() Previously, the standard deviation call was made using the current noise level instead of the mean noise level. Because of the identity E[ (N - C)^2 ] = E[ (N - E[N])^2 ] + (E[N] - C)^2 >= E[ (N - E[N])^2 ], this led to the criterion terminating later than it could have, because the estimate relative to the current level is always greater than or equal to the estimate relative to the mean. The code used the current noise level instead of the mean to avoid needing to make two passes through the m_noise_tracker container. Use of online_mean_variance improves the accuracy of estimating the dispersion of the noise signal while maintaining a single pass through the container. * Address review feedback Fixed misleading commit. Introduce private methods to refactor computation of repeated expressions. Renamed m_cuda_times_summary to m_measurements_summary, since the criterion can be applied to CPU-only measurements too. Introduced is_close utility for checking whether two floating point numbers are close to one another. Introduced descriptive constexpr variables for hard-wired constants	2026-05-29 17:06:28 +00:00
omribz156	ec025d7e0d	docs: separate measurement options from stopping criteria (#373) Signed-off-by: Omri SirComp <omribz156@gmail.com>	2026-05-28 16:51:12 -05:00
Oleksandr Pavlyk	ce75dab94b	Add stopping criterion sample count (#341) * Implement sample-count stopping criterion with parameter target-samples. `--stopping-criterion sample-count --target-samples 100` would stop once max(--min-samples, --target-samples) samples are collected * Address review nitpicks	2026-05-15 15:15:12 -05:00
Oleksandr Pavlyk	9ea77bccaa	Implement CLI option to control warmups for cold measurements (#339) * Implement warmup-runs count, supported as CLI. CLI option --warmup-runs implemented and documented. The warm-up count is enforced to always be positive. This is necessary to ensure that JIT-ting has occurred, and that use of the blocking kernel would not result in time-outs. A test of the option parser is added. * Ensure that measure_cold::run_warmup instantiates blocking kernel Because warm-up runs are executed without use of the blocking kernel, the blocking kernel was not jitted until actual measurements were collected. The module loading cost incurred during the first run shows as an elevated CPU time noise value for the first measurement, as noted in https://github.com/NVIDIA/nvbench/pull/339 This PR adds `this->block_stream(); this->unblock_stream();` prior to executing the warm-up loop with use of the blocking kernel disabled. This ensures that the blocking kernel is instantiated during the warm-up, but no other kernel is launched between its launch and the stream sync, thus avoiding deadlock. * Rename --warmup-runs to --cold-warmup-runs, add --cold-max-warmup-walltime Since the configurable number of warmups only applies to measure_cold.cuh, rename the CLI option to reflect that. Also add --cold-max-warmup-walltime (defaults to -1, i.e. disabled). If enabled, exits the warmup loop before the requested count is reached if the wall-time expended executing warmups exceeds this max-warmup-walltime value.	2026-05-12 14:30:08 -05:00
Oleksandr Pavlyk	f049f10977	Fix typo	2026-02-02 14:41:42 -06:00
Oleksandr Pavlyk	cff6df9bb2	Renamed option to --no-batch to stay aligned with tag name	2026-02-02 12:28:39 -06:00
Oleksandr Pavlyk	f1b9d44304	Support --no-batched CLI option The option sets m_skip_batched boolean member in benchmark_base class. Methods `bool get_skip_batched()` and `void set_skip_batched(bool)` added. m_skip_batched is also added to state class. Similarly named methods are added. CLI help file documents `--no-batched` option.	2026-02-02 11:32:57 -06:00
Oleksandr Pavlyk	4ad3088a47	Update docs/cli_help.md Spare users of implementation details in description of `--profile` option Co-authored-by: Allison Piper <apiper@nvidia.com>	2025-07-28 14:52:57 -05:00
Oleksandr Pavlyk	8416342af0	Remove mentions of --run-once and --disable-blocking-kernel from help Text for --profile modified to be self-consistent, i.e., not to refer to the removed --run-once and --disable-blocking-kernel for an explanation of what it does.	2025-07-28 07:55:25 -05:00
Sergey Pavlov	433376fd83	Restrict stopping criterion parameter usage in command line (#174) * restrict stopping criterion parameter usage in command line * Update docs for stopping criterion. * Add convenience benchmark_base API for criterion params. * Add more test cases for stopping criterion parsing. --------- Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com> Co-authored-by: Allison Piper <alliepiper16@gmail.com>	2025-04-30 15:53:45 -04:00
Allison Piper	e4057575c7	Disable throttling when `sync` exec tag is used.	2025-04-24 22:48:35 +00:00
Allison Piper	18926ced87	Replace references to `peak_sm_clock` with `default_sm_clock`. The actual measured clock speed can exceed this value, so default is less confusing than peak.	2025-04-14 11:33:04 -04:00
Georgy Evtushenko	254ac2517f	Remove discard on throttle option	2025-04-12 21:13:13 -07:00
Georgy Evtushenko	b926daf09f	Better throttle recovery delay	2025-04-12 21:04:12 -07:00
Georgy Evtushenko	f29f7ac2fb	Detect throttle Signed-off-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>	2025-04-11 14:35:40 -07:00
Georgy Evtushenko	b789240c76	Entropy-based stopping criterion	2024-01-05 14:59:48 -08:00
Paul Große-Bley	7f51ead595	Add --disable-blocking-kernel and --profile options.	2022-04-08 20:03:44 +02:00
Allison Vacanti	b948e79cab	Add NVML support for persistence mode, locking clocks. Locking clocks is currently only implemented for Volta+ devices. Example usage: my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base See the cli_help.md docs for more info.	2021-12-17 13:59:43 -05:00
Allison Vacanti	1875d9962d	Document new `--version` option.	2021-10-26 17:45:20 -04:00
Allison Vacanti	6d79c80152	Add --run-once option. Fixes #10. Adds a mode that forces a benchmark to only run once, simplifying profiling use cases. This can be enabled by any of the following methods: * Passing `--run-once` on the command line * `NVBENCH_CREATE(...).set_run_once(true)` when declaring a benchmark * `state.set_run_once(true)` from within the benchmark implementation.	2021-10-07 16:28:15 -04:00
Allison Vacanti	4e83e048ba	Store percentages as ratios. Human-readable outputs (md) and CLI inputs still use percentages. In-memory and machine-readable outputs (csv, json) use ratios. This is the convention that spreadsheet apps expect. Fixes #2.	2021-03-18 13:42:43 -04:00
Allison Vacanti	922a6d09d0	Add `--json` option to CLI docs.	2021-03-05 16:37:23 -05:00
Allison Vacanti	65bc2c1e3f	Documentation overhaul. Revamp README, split into multiple files. Add docs on CLI. Add `--help` and `--help-axis`.	2021-03-04 18:40:23 -05:00

23 Commits
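
Commit 7ba2b79d5b describes replacing the stdrel criterion's growing sample history with an online mean/variance accumulator that updates in constant time per sample. A minimal sketch of that general technique is shown below, using Welford's algorithm. The struct name `online_stats` and all of its members are hypothetical and chosen for illustration; this is not nvbench's `online_mean_variance` interface, whose details are not shown in this log.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical illustration of an online mean/variance accumulator
// (Welford's algorithm). Not taken from the nvbench sources.
struct online_stats
{
  std::size_t count = 0; // number of samples seen so far
  double mean       = 0.0; // running mean
  double m2         = 0.0; // running sum of squared deviations from the mean

  // Constant-time update; no sample history is stored.
  void update(double x)
  {
    ++count;
    const double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean); // uses the already-updated mean
  }

  // Unbiased (n - 1) sample variance; NaN until two samples are available.
  double sample_variance() const
  {
    if (count < 2)
    {
      return std::numeric_limits<double>::quiet_NaN();
    }
    return m2 / static_cast<double>(count - 1);
  }

  // Relative standard deviation: sample stdev divided by the mean.
  // Non-finite when the mean is zero (e.g. all-zero timings give 0/0).
  double relative_stdev() const { return std::sqrt(sample_variance()) / mean; }
};
```

Feeding the samples one at a time through `update` and then reading `relative_stdev()` gives the quantity a relative-standard-deviation criterion would compare against its threshold. With all-zero samples the result is non-finite, which is the situation the commit message says needs a bounded termination path.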