nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-05-13 09:45:39 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	9ea77bccaa	Implement CLI option to control warmups for cold measurements (#339) * Implement warmup-runs count, supported as CLI option. CLI option --warmup-runs implemented and documented. The warm-up count is enforced to always be positive. This is necessary to ensure that JIT-ting has occurred, and that use of blocking kernel would not result in time-outs. Test of option parser is added. * Ensure that measure_cold::run_warmup instantiates blocking kernel. Because warm-up runs are executed without use of blocking kernel, the blocking kernel was not jitted until actual measurements were collected. The module loading cost incurred during the first run shows as elevated CPU time noise value for the first measurement, as noted in https://github.com/NVIDIA/nvbench/pull/339. This PR adds `this->block_stream(); this->unblock_stream();` prior to executing warm-up loop with use of blocking kernel disabled. This ensures that blocking kernel is instantiated during the warm-up, but no other kernel is launched between its launch and stream sync, thus avoiding deadlocking. * Rename --warmup-runs to --cold-warmup-runs, add --cold-max-warmup-walltime. Since configurable number of warmups only applies to measure_cold.cuh, rename the CLI option to reflect that. Also add --cold-max-warmup-walltime (defaults to -1, i.e. disabled). If enabled, exits warmup loop before requested count is reached if the wall-time expended executing warmups exceeds this max-warmup-walltime value.	2026-05-12 14:30:08 -05:00
Oleksandr Pavlyk	f049f10977	Fix typo	2026-02-02 14:41:42 -06:00
Oleksandr Pavlyk	cff6df9bb2	Renamed option to --no-batch to stay aligned with tag name	2026-02-02 12:28:39 -06:00
Oleksandr Pavlyk	f1b9d44304	Support --no-batched CLI option The option sets m_skip_batched boolean member in benchmark_base class. Methods `bool get_skip_batched()` and `void set_skip_batched(bool)` added. m_skip_batched is also added to state class. Similarly named methods are added. CLI help file documents `--no-batched` option.	2026-02-02 11:32:57 -06:00
Oleksandr Pavlyk	4ad3088a47	Update docs/cli_help.md Spare users of implementation details in description of `--profile` option Co-authored-by: Allison Piper <apiper@nvidia.com>	2025-07-28 14:52:57 -05:00
Oleksandr Pavlyk	8416342af0	Remove mentions of --run-once and --disable-blocking-kernel from help. Text for --profile modified to be self-consistent, i.e., not to refer to removed --run-once and --disable-blocking-kernel for explanation of what it does.	2025-07-28 07:55:25 -05:00
Sergey Pavlov	433376fd83	Restrict stopping criterion parameter usage in command line (#174) * restrict stopping criterion parameter usage in command line * Update docs for stopping criterion. * Add convenience benchmark_base API for criterion params. * Add more test cases for stopping criterion parsing. --------- Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com> Co-authored-by: Allison Piper <alliepiper16@gmail.com>	2025-04-30 15:53:45 -04:00
Allison Piper	e4057575c7	Disable throttling when `sync` exec tag is used.	2025-04-24 22:48:35 +00:00
Allison Piper	18926ced87	Replace references to `peak_sm_clock` with `default_sm_clock`. The actual measured clock speed can exceed this value, so default is less confusing than peak.	2025-04-14 11:33:04 -04:00
Georgy Evtushenko	254ac2517f	Remove discard on throttle option	2025-04-12 21:13:13 -07:00
Georgy Evtushenko	b926daf09f	Better throttle recovery delay	2025-04-12 21:04:12 -07:00
Georgy Evtushenko	f29f7ac2fb	Detect throttle Signed-off-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>	2025-04-11 14:35:40 -07:00
Allison Piper	a6df59a9b5	Add support for CPU-only benchmarking. Fixes #95. CPU-only mode is enabled by setting the `is_cpu_only` property while defining a benchmark, e.g. `NVBENCH_BENCH(foo).set_is_cpu_only(true)`. An optional `nvbench::exec_tag::no_gpu` hint can also be passed to `state.exec` to avoid instantiating GPU benchmarking backends. Note that a CUDA compiler and CUDA runtime are always required, even if all benchmarks in a translation unit are CPU-only. Similarly, a new `nvbench::exec_tag::gpu` hint can be used to avoid compiling CPU-only backends for GPU benchmarks.	2025-04-08 11:17:23 -04:00
Georgy Evtushenko	b789240c76	Entropy-based stopping criterion	2024-01-05 14:59:48 -08:00
Bryce Adelstein Lelbach aka wash	39b2770b62	Fix typo in documentation: `set_type_axis_names` should be `set_type_axes_names`	2023-10-05 13:16:16 -04:00
Paul Große-Bley	7f51ead595	Add --disable-blocking-kernel and --profile options.	2022-04-08 20:03:44 +02:00
Allison Vacanti	48d94259b4	Fix typo in new docs.	2022-02-11 14:01:49 -05:00
Allison Vacanti	039d455727	Move documentation on streams to new subsection. Also update to use `nvbench::make_cuda_stream_view`.	2022-02-11 13:29:06 -05:00
Yunsong Wang	e7c29c1c1b	Update docs	2022-02-06 19:34:57 -05:00
Yunsong Wang	a2a12c689c	Update docs/benchmarks.md Co-authored-by: Jake Hemstad <jhemstad@nvidia.com>	2022-02-06 19:31:20 -05:00
Yunsong Wang	76cbbcc8f9	Update benchmarks.md	2022-02-04 17:20:40 -05:00
Allison Vacanti	b948e79cab	Add NVML support for persistence mode, locking clocks. Locking clocks is currently only implemented for Volta+ devices. Example usage: my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base See the cli_help.md docs for more info.	2021-12-17 13:59:43 -05:00
Allison Vacanti	1875d9962d	Document new `--version` option.	2021-10-26 17:45:20 -04:00
Allison Vacanti	6d79c80152	Add --run-once option. Fixes #10. Adds a mode that forces a benchmark to only run once, simplifying profiling usecases. This can be enabled by any of the following methods: * Passing `--run-once` on the command line * `NVBENCH_CREATE(...).set_run_once(true)` when declaring a benchmark * `state.set_run_once(true)` from within the benchmark implementation.	2021-10-07 16:28:15 -04:00
Allison Vacanti	ff507596bf	Fix typo in docs.	2021-04-12 14:48:45 -04:00
Allison Vacanti	4e83e048ba	Store percentages as ratios. Human-readable outputs (md) and CLI inputs still use percentages. In-memory and machine-readable outputs (csv, json) use ratios. This is the convention that spreadsheet apps expect. Fixes #2.	2021-03-18 13:42:43 -04:00
Allison Vacanti	60c94d9ed6	Add `enum_type_axis` and `examples/enums.cu`. - `enum_type_axis` simplifies using integral_constants with type axes. - `examples/enums.cu` demonstrates various ways of implementing parameter sweeps with enum types.	2021-03-16 13:57:52 -04:00
Yunsong Wang	a097e6d90d	Minor corrections in doc	2021-03-11 16:47:03 -05:00
Allison Vacanti	3fc75f5ea6	Add more examples. - exec_tag_timer - exec_tag_sync - skip - throughput	2021-03-09 16:03:14 -05:00
Allison Vacanti	33aa9e1a07	Update README to link to the new example.	2021-03-08 18:26:26 -05:00
Allison Vacanti	922a6d09d0	Add `--json` option to CLI docs.	2021-03-05 16:37:23 -05:00
Allison Vacanti	33fa0c773f	Typo.	2021-03-04 23:24:37 -05:00
Allison Vacanti	65bc2c1e3f	Documentation overhaul. Revamp README, split into multiple files. Add docs on CLI. Add `--help` and `--help-axis`.	2021-03-04 18:40:23 -05:00

33 Commits