nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-06-29 10:47:36 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	0dc93b0c0e	Introduce robust metrics (#379 ) * Add statistics utilities to compute quartiles Quartiles are computed using nearest rank method. Two implementations are provided: 1. Sort-based: a. sort array b. extract values at ranks of interest 2. Selection based: a. Run nth_element to find median on whole range b. Run nth_element on left side to find first quartile c. Run nth_element on right side to find third quartile Public API copies input into temporary vector which is mutated as needed. Public API uses sort-based implementation for small arrays ( <= 4096 elements), and selection-based implementation for larger arrays. Sort-based implementation can support computation of arbitrary percentiles, which could be useful later if more extreme statistics are needed. Add tests covering percentile and quartile edge cases, input iterators, selection-vs-sorting agreement, empty and singleton inputs, and relative dispersion validation. * Add quartiles information to summaries Use the quartile helpers to report robust cold and CPU-only timing summaries: Q1, median, Q3, interquartile range, and relative interquartile range. These values stay hidden. Summary tags are nv/cold/time/gpu/q1, nv/cold/time/gpu/median, nv/cold/time/gpu/q3, nv/cold/time/gpu/ir/absolute, nv/cold/time/gpu/ir/relative ir/absolute = q3 - q1, ir/relative = (q3 - q1)/median Similar tags added for nv/cold/time/cpu and for CPU-only measures. Validate relative-dispersion calculations before publishing relative noise summaries so invalid centers or dispersion values do not produce misleading summary entries. * Prefer robust summaries in default output Only flip visibility for nv/cold/cpu/time, nv/cold/gpu/time, and nv/cpu_only/only: - hide mean - hide stdev/relative - show median - show ir/relative * Use is_close where std::abs(act-exp) was used * Revert "Prefer robust summaries in default output" This reverts commit `9a0afc361c`. Basically, all robust statistics summary entries are hidden, and mean + stdev/relative are back to being the default displayed items * Address PR review feedback	2026-06-02 13:20:15 -05:00
Oleksandr Pavlyk	7ba2b79d5b	Reduce stdrel criterion complexity and ensure termination (#374 ) * Reduce stdrel criterion complexity and ensure termination Replace the stdrel criterion's growing sample history with an online mean/variance accumulator. This keeps the stopping criterion based on relative standard deviation, preserves the unbiased standard-deviation estimate used for convergence, and reduces per-sample update work from recomputing over the full history to constant time. Add a bounded invalid-noise path so measurements that persistently produce non-finite relative noise, such as all-zero timings, can terminate without waiting for the wall-time timeout. Keep the normal min-time gate for ordinary stdrel convergence. Add focused tests for the online accumulator, stdrel sample-count threshold, sample-standard-deviation behavior, deterministic convergence inputs, and persistent invalid-noise termination. Update the CLI help for the stdrel termination behavior. * change max-noise to for consistency * Use online_mean_variance on m_noise_tracker in is_finished() Previously, standard deviation call was made using current noise level instead of mean noise level. Because of identity E[ (N - C)^2 ] = E[ (N - E[N])^2 ] + (E[N] - C)^2 >= E[ (N - E[N])^2 ] this led to criterion terminating later than it could have because the estimated expectation is always greater than or equal to the estimate relative to the mean. Code used current noise level instead of mean to avoid needing to make two passes through m_noise_tracker container. Use of online_mean_variance allows improving the accuracy of estimating dispersion of noise signal while maintaining single pass through container. * Address review feedback Fixed misleading commit. Introduce private methods to refactor computation of repeated expressions. Renamed m_cuda_times_summary to m_measurements_summary, since criterion can be applied for CPU-only measurements too. Introduced is_close utility for checking whether two floating point numbers are close to one another. Introduced descriptive constexpr variables for hard-wired constants	2026-05-29 17:06:28 +00:00
mfranzrebsal	4a33a61591	Add Windows support (#354 )	2026-05-19 15:10:58 -05:00
Oleksandr Pavlyk	ce75dab94b	Add stopping criterion sample count (#341 ) * Implement sample-count stopping criterion with parameter target-samples --stopping-criterion sample-count --target-samples 100 would stop once max(--min-samples, --target-samples) samples are collected * Address review nitpicks	2026-05-15 15:15:12 -05:00
Oleksandr Pavlyk	6dd27aedfd	Fix exception safety (#358 ) Improve exception safety of timer structs by using local scope guards to ensure that cleanup steps, such as signaling blocking kernel to unblock and making sure that the stream is synchronized, are performed even if the launch object throws an exception. Tests of exception safety were added. -- * blocking_kernel.unblock_noexcept() noexcept method added This decouples the logic of signaling to unblock from checking of the timeout. * Improve exception safety in kernel_launch_timer Introduce noexcept cleanup methods. Place body of start() and stop() methods in the try/catch block and execute noexcept clean-up on exception before rethrowing. * Improve exception safety of measure_hot * Make sure that throwing methods call noexcept ones instead of duplicating functionality * Use cleanup_guard in measure_cold_base::kernel_launch_timer Replace try/catch pattern with cleaner use of cleanup_guard class. * cpu_timer::start, cpu_timer::stop methods marked noexcept These methods do not throw, and marking them noexcept explicitly makes it fine to call them from other noexcept methods, such as cleanup_noexcept in measure_cold. * Address remaining exception safety issue in measure_hot * Renamed guard variables to reflect their purpose, apply arm-then-do to ops queueing kernels Set m_block_stream_armed = true; before launching the kernel. Doing so signals cleanup guard that stream must be unblocked, even if launching of the kernel failed. Same for operation launching time-stamps kernel. * Add testing/device/exception_safety.cu This test adds a benchmark that throws. It verifies that it did not time-out and control counters the benchmark maintains are at the expected values. * Refactor measurement cleanup guards for testability Extract hot stream cleanup and cold launch timer cleanup into reusable detail helpers. Keep measure_hot and measure_cold using those helpers through thin adapters so the tested cleanup logic matches the production path. Add driver-free cleanup guard tests using a fake measure object to verify cleanup ordering when exceptions occur after blocking stream setup, after hot unblock, and around cold GPU frequency start/stop paths. * Implement cpu_timer_stop_noexcept in terms of cpu_timer_stop The cpu_timer_stop is already noexcept by nature of implementation, but we maintain cpu_timer_stop_noexcept method for symmetry with other pairs sync_stream()/sync_stream_noexcept(). The cpu_timer_stop_noexcept() is implemented via cpu_timer_stop(). These methods are annotated __forceinline__, so the same code should be generated. * More readable initialization of bool members * Moved exception_safety.cu back to testing/ folder testing/device is reserved for tests that require locking of GPU frequency per CMake option description. * Fixed nitpick and bug it discovered Changed testing/exception_safety.cu:237 so run_benchmark now iterates over every state from bench.get_states() and checks each one is skipped with a reason containing "requested". That exposed a real runner behavior gap, so I also made a minimal fix in nvbench/runner.cuh:120: after stop_runner_loop, remaining states are now explicitly marked skipped with a reason instead of only printing a skip notification. * Move static assertions (pertaining to cleanup guards) to testing/cleanup_guards.cu The CI failure with CTK 12.0 and certain version of GCC is caused by OOM in cudafe++ process tripped by compiling instantiation of contract verification on cold_launch_timer_probe struct. As a work-around, this instantiation is excluded for CTK 12.0-12.6	2026-05-15 15:14:30 -05:00
Oleksandr Pavlyk	9ea77bccaa	Implement CLI option to control warmups for cold measurements (#339 ) * Implement warmup-runs count, supported as CLI CLI option --warmup-runs implemented and documented. The warm-up count is enforced to always be positive. This is necessary to ensure that JIT-ting has occurred, and use of blocking kernel would not result in time-outs. Test of option parser is added. * Ensure that measure_cold::run_warmup instantiates blocking kernel Because warm-up runs are executed without use of blocking kernel, the blocking kernel was not jitted until actual measurements were collected. The module loading cost incurred during the first run shows as elevated CPU time noise value for the first measurement as noted in https://github.com/NVIDIA/nvbench/pull/339 This PR adds `this->block_stream(); this->unblock_stream();` prior to executing warm-up loop with use of blocking kernel disabled. This ensures that blocking kernel is instantiated during the warm-up, but no other kernel is launched between its launch and stream sync, thus avoiding deadlocking. * Rename --warmup-runs to --cold-warmup-runs, add --cold-max-warmup-walltime Since configurable number of warmups only applies to measure_cold.cuh rename the CLI option to reflect that. Also add --cold-max-warmup-walltime (defaults to -1, i.e. disabled). If enabled, exits warmup loop before requested count is reached if the wall-time expended executing warmups exceeds this max-warmup-walltime value.	2026-05-12 14:30:08 -05:00
Oleksandr Pavlyk	7dfbcad27c	Create directories for output files (#360 ) * QOL UX, NVBench creates directories for output JSON, MD, CSV files This closes #185 and supports specifying `--json path/to/nonexistent/folder/result.json` This would create sequence of folders where to place result.json ``` (py313) :~/repos/nvbench$ rm -rf /tmp/nested/ (py313) :~/repos/nvbench$ ./build2/bin/nvbench.example.cpp20.axes -b copy_type_and_block_size_sweep -a Type=I32 -a BlockSize=64 --jsonbin /tmp/nested/json/axes.json --md /tmp/nested/md/res.md --csv /tmp/nested/csv/res.csv > /dev/null 2>&1 (py313) :~/repos/nvbench$ tree /tmp/nested/ /tmp/nested/ ├── csv │ └── res.csv ├── json │ ├── axes.json │ ├── axes.json-bin │ │ └── 0.bin │ └── axes.json-freqs-bin │ └── 0.bin └── md └── res.md 6 directories, 5 files ``` * Add a test that non-existent output folder is created * Remove throwing custom error message. Use default * Replace static_assert(false, ...) with #error	2026-05-12 10:26:28 -05:00
Oleksandr Pavlyk	4c278b08b3	Link against fmt::fmt target, not fmt. Consistent with nvbench/CMakeLists.txt Co-authored-by: Dominic Charrier <docharri@amd.com>	2026-03-19 14:53:06 -05:00
Oleksandr Pavlyk	d160a2bafa	Replace --run-once in testing/CMakeLists.txt with --profile	2025-07-28 12:03:42 -05:00
Allison Piper	f44f5cc22c	Remove min-time/max-noise API. (#223 ) These are now owned by the stdrel stopping criterion, and should not be exposed directly in the benchmark/state/etc APIs. This will affect users that are calling `NVBENCH_BENCH(...).set_min_time(...)` or `NVBENCH_BENCH(...).set_max_noise(...)`. These can be updated to `NVBENCH_BENCH(...).set_criterion_param_float64(["min-time"\|"max-noise"], ...)`.	2025-05-08 10:02:54 -04:00
Allison Piper	9d189280de	Fix `get_config_count` for CPU-only benchmarks. (#218 )	2025-05-01 12:34:35 -04:00
Sergey Pavlov	433376fd83	Restrict stopping criterion parameter usage in command line (#174 ) * restrict stopping criterion parameter usage in command line * Update docs for stopping criterion. * Add convenience benchmark_base API for criterion params. * Add more test cases for stopping criterion parsing. --------- Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com> Co-authored-by: Allison Piper <alliepiper16@gmail.com>	2025-04-30 15:53:45 -04:00
Elias Stehle	ca0e795b46	Merge pull request #113 from elstehle/fix/per-device-stream Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream	2025-04-30 15:40:33 -04:00
Allison Piper	3440855dbd	Formatting updates.	2025-04-14 17:26:12 +00:00
Allison Piper	93ea533fd3	Drop support for MSVC.	2025-04-04 22:17:03 +00:00
Allison Piper	4d7b3e8100	Add missing header to test.	2025-04-04 17:44:33 -04:00
Sergey Pavlov	a171514056	Added cudaGetLastError() calls to reset benchmarking kernel errors (issue 88). (#173 ) * Create and use NVBENCH_CUDA_CALL_RESET_ERROR. * Moved cudaGetLastError() call to NVBENCH_CUDA_CALL macro --------- Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>	2024-05-31 11:32:01 -04:00
Allison Piper	555d628e9b	Use a reproducible seed in test rng. (#164 )	2024-04-12 11:55:05 -04:00
Allison Piper	5ee8811a1a	Fix and test using RAII global state in `main`. (#168 )	2024-04-09 17:27:49 -04:00
Allison Piper	165cf924c5	Refactor main implementation to improve reusability and customization. (#165 ) * Refactor main implementation to improve reusability and customization. Move the implementation of `main` out of macros and into separate functions. This allows for easier reuse and customization of the macros. Existing macro usage should still work as expected, and new customization points will simplify common tasks like argument parsing going forward. * Add tests that validate common main customizations.	2024-04-09 12:45:58 -04:00
Allison Piper	a0f2fab72b	Squashed commit of the following: commit `c5b2fc0a8b` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 21:48:20 2024 +0000 Add supported compilers and tools in README.md. commit `92fe366da5` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 20:45:30 2024 +0000 Fix issues discovered by header tests. commit `f7f6c92143` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 20:45:06 2024 +0000 Setup header tests, add C++20 header tests + examples. The core library will always be built with C++17, but we test our headers / examples under 17 and 20. commit `4b24f26b66` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 16:21:42 2024 +0000 Pass CUDA FLAGS to install tests. commit `4fb672ae91` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 15:43:41 2024 +0000 Add newer GCC (13) and Clang (17, 18).	2024-04-06 22:05:40 +00:00
Allison Piper	e8c8877d36	Squashed commit of the following: commit `4b309e6ad8` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 13:19:14 2024 +0000 Minor cleanups commit `476ed2ceae` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 12:53:37 2024 +0000 WAR compiler ice in nlohmann json. Only seeing this on GCC 9 + CTK 11.1. Seems to be having trouble with the `[[no_unique_address]]` optimization. commit `a9bf1d3e42` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 00:24:47 2024 +0000 Bump nlohmann json. commit `80980fe373` Author: Allison Piper <alliepiper16@gmail.com> Date: Sat Apr 6 00:22:07 2024 +0000 Fix llvm filesystem support commit `f6099e6311` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 23:18:44 2024 +0000 Drop MSVC 2017 testing. commit `5ae50a8ef5` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 23:02:32 2024 +0000 Add more missing headers. commit `b2a9ae04d9` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 22:37:56 2024 +0000 Remove old CUDA+MSVC builds and make windows build-only. commit `5b18c26a28` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 22:37:07 2024 +0000 Fix header for std::min/max. Why do I always think it's utility instead of algorithm.... commit `6a409efa2d` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 22:18:18 2024 +0000 Temporarily disable CUPTI on all windows builds. commit `f432f88866` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 21:42:52 2024 +0000 Fix warnings on MSVC. commit `829787649b` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 21:03:16 2024 +0000 More flailing about in powershell. commit `21742e6bea` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 20:36:08 2024 +0000 Cleanup filesystem header handling. commit `de3d202635` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 20:09:00 2024 +0000 Windows CI debugging. commit `a4151667ff` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 19:45:40 2024 +0000 Quotation mark madness commit `dd04f3befe` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 19:27:27 2024 +0000 Temporarily disable NVML on windows CI until new containers are ready. commit `f3952848c4` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 19:25:22 2024 +0000 WAR issues on gcc-7. commit `198986875e` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 19:25:04 2024 +0000 More matrix/devcontainer updates. commit `b9712f8696` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 18:30:35 2024 +0000 Fix windows build scripts. commit `943f268280` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 18:18:33 2024 +0000 Fix warnings with clang host compiler. commit `7063e1d60a` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 18:14:28 2024 +0000 More devcontainer hijinks. commit `06532fde81` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 17:51:25 2024 +0000 More matrix updates. commit `78a265ea55` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 17:34:00 2024 +0000 Support CLI CMake options for windows ci scripts. commit `670895c867` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 17:31:59 2024 +0000 Add missing devcontainers. commit `b121823e74` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 17:22:54 2024 +0000 Build for `all-major` architectures in presets. We can get away with this because we require CMake 3.23.1. This was added in 3.23. commit `fccfd44685` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 17:22:08 2024 +0000 Update matrix file. commit `e7d43ba90e` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 16:23:48 2024 +0000 Consolidate build/test jobs. commit `c4044056ec` Author: Allison Piper <alliepiper16@gmail.com> Date: Fri Apr 5 16:04:11 2024 +0000 Add missing build script.	2024-04-06 13:56:10 +00:00
Georgy Evtushenko	4be0c5bdcd	API convention	2024-01-11 10:48:52 -08:00
Georgy Evtushenko	dacbee127c	Base method naming convention	2024-01-11 10:41:11 -08:00
Georgy Evtushenko	182c77e4f4	Got rid of the params description API	2024-01-10 12:30:17 -08:00
Georgy Evtushenko	42c6bdea70	Handle empty input in mean	2024-01-10 09:52:14 -08:00
Georgy Evtushenko	fade52fa2e	Different singleton convention	2024-01-08 14:08:12 -08:00
Georgy Evtushenko	85ed6f007c	Rename criterion registry to criterion manager	2024-01-08 13:15:46 -08:00
Georgy Evtushenko	de724a21f1	Rename get_params to get_params_description	2024-01-08 13:06:48 -08:00
Georgy Evtushenko	b789240c76	Entropy-based stopping criterion	2024-01-05 14:59:48 -08:00
Vyas Ramasubramani	a3b729bca8	fmt::memory_buffer is no longer an iterator.	2022-11-03 10:04:02 -07:00
Yunsong Wang	af4c35d78b	Fix a bug in config count unit test: count number of devices as well	2022-02-11 18:24:58 -05:00
Yunsong Wang	6159d9c6cb	Minor correction in unit test	2022-02-06 20:19:21 -05:00
Yunsong Wang	33a896f99e	Update copyright year	2022-02-04 17:25:50 -05:00
Yunsong Wang	470beda9f0	Add nvbench::state stream tests	2022-02-04 16:55:29 -05:00
Allison Vacanti	a72f248af6	Require the NVBench package in test_export testing.	2022-01-19 15:42:26 -05:00
Allison Vacanti	6dee1eec3b	Refactor summary API and update nvbench/summary.cuh docs. The string used when constructing a summary is no longer a human readable name, but rather a tag string (e.g. "nv/cold/time/gpu/mean"). These will make lookup easier and more stable going forward. name vs. short_name no longer exists. Now there is just "name", which is used for column headings. The "description" string may still be used for detailed information. Updated the json tests and compare script to reflect these changes.	2022-01-11 15:06:26 -05:00
Allison Vacanti	2f8bb28c52	Merge pull request #64 from allisonvacanti/noise_convergence New convergence check	2021-12-21 21:30:39 -05:00
Allison Vacanti	178dd0eb68	Implement new convergence check for noisy kernels. Previously, convergence was tested by waiting for the relative stdev of cuda timings ("noise") to drop below a certain percentage (`max_noise`). This assumed that all benchmarks would eventually see their noise drop to some threshold, but this is not the case. In practice, many benchmarks never converge to the default 0.5% relative stdev and instead will always run to the 15s timeout -- even if the means have converged in a second or two. Added a new check that tests when the noise itself stabilizes and ends the benchmark, even if noise > max_noise. After testing, this patch alone significantly reduces the runtime of the Thrust+CUB benchmark suite (from 30 hours to 5 hours) and produces similar timing results. The parameters used to tune this feature are not exposed -- if this approach works long-term and there's a strong motivation to let users tweak them, then we can worry about names/APIs/CLI/docs later.	2021-12-21 21:24:02 -05:00
Allison Vacanti	8e56a7bd94	Add `noisy_bench` with some benchmarks that currently always time-out.	2021-12-21 21:05:13 -05:00
Allison Vacanti	c9ab8e2eb3	Fix progress display for inactive type axis values. When type axis values were disabled they were still counted towards a benchmark's total number of configs.	2021-12-21 20:36:52 -05:00
Allison Vacanti	20522c807d	Add an `nvbench-ctl` executable. This will provide functionality such as clock locking (--lgm), persistence mode (--pm), device querying (--list), version checking (--version), and documentation (--help). This is possible already with any nvbench executable, but having one with a reliable name will be helpful for scripting and writing documentation.	2021-12-21 12:02:07 -05:00
Allison Vacanti	5d70492714	Enable more warning flags. - /W4 on MSVC - -Wall -Wextra + others on gcc/clang - New NVBench_ENABLE_WERROR option to toggle "warnings as errors" - Mark the nlohmann_json library as IMPORTED to switch to system includes - Rename nvbench_main -> nvbench.main to follow target name conventions - Explicitly suppress some cudafe warnings when compiling templates in nlohmann_json headers. - Explicitly suppress some warnings from Thrust headers. - Various fixes for warnings exposed by new flags. - Disable CUPTI on CTK < 11.3 (See #52).	2021-12-18 20:13:25 -05:00
Georgy Evtushenko	1bc715267c	CUPTI support	2021-12-18 12:03:52 +03:00
Allison Vacanti	b2d37c21fd	Add export tests.	2021-10-20 14:02:16 -04:00
Allison Vacanti	ef36d3a558	Port to rapids-cmake. - Add export sets - Add install rules - Remove manual CPM import, port to rapids_cpm_, etc - Organize CMake code into cmake/.cmake files. - NVBench is now a shared library.	2021-10-20 14:02:16 -04:00
Allison Vacanti	ed27365a41	Disable portion of test due to GCC 7 bug. Fixes #39.	2021-10-19 12:26:02 -04:00
Allison Vacanti	0dcd915ea6	Fix test failure.	2021-03-18 16:07:50 -04:00
Allison Vacanti	4e83e048ba	Store percentages as ratios. Human-readable outputs (md) and CLI inputs still use percentages. In-memory and machine-readable outputs (csv, json) use ratios. This is the convention that spreadsheet apps expect. Fixes #2.	2021-03-18 13:42:43 -04:00
Allison Vacanti	ea53972af8	Add nvbench.all metatarget. This builds all NVBench tests and examples without building targets in any parent projects.	2021-03-18 13:33:23 -04:00
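
The quartile computation described in commit 0dc93b0c0e (nearest-rank quartiles, with a sort-based and a selection-based variant) can be sketched roughly as below. This is an illustrative sketch only, not NVBench's code: the names `quartiles`, `nearest_rank_index`, `quartiles_by_sort`, and `quartiles_by_selection` are hypothetical, and the rank formula ceil(k * n / 4) is the textbook nearest-rank rule, assumed here to match what the commit means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical result type; not part of NVBench.
struct quartiles
{
  double q1;
  double median;
  double q3;
};

// 0-based index of the k-th quartile (k = 1, 2, 3) among n sorted values,
// using the nearest-rank rule: rank = ceil(k * n / 4), index = rank - 1.
inline std::ptrdiff_t nearest_rank_index(std::size_t k, std::size_t n)
{
  assert(n > 0);
  return static_cast<std::ptrdiff_t>((k * n + 3) / 4) - 1;
}

// Sort-based variant: sort a copy, then read the values at the three ranks.
inline std::optional<quartiles> quartiles_by_sort(std::vector<double> v)
{
  if (v.empty())
  {
    return std::nullopt;
  }
  std::sort(v.begin(), v.end());
  const auto first = v.begin();
  const auto n     = v.size();
  return quartiles{*(first + nearest_rank_index(1, n)),
                   *(first + nearest_rank_index(2, n)),
                   *(first + nearest_rank_index(3, n))};
}

// Selection-based variant: three nth_element calls on a copy.
inline std::optional<quartiles> quartiles_by_selection(std::vector<double> v)
{
  if (v.empty())
  {
    return std::nullopt;
  }
  const auto n     = v.size();
  const auto i1    = nearest_rank_index(1, n);
  const auto i2    = nearest_rank_index(2, n);
  const auto i3    = nearest_rank_index(3, n);
  const auto first = v.begin();

  // Place the median; smaller values end up to its left, larger to its right.
  std::nth_element(first, first + i2, v.end());
  // First quartile lives in the left part (unless it coincides with the median).
  if (i1 < i2)
  {
    std::nth_element(first, first + i1, first + i2);
  }
  // Third quartile lives in the right part (unless it coincides with the median).
  if (i3 > i2)
  {
    std::nth_element(first + i2 + 1, first + i3, v.end());
  }
  return quartiles{*(first + i1), *(first + i2), *(first + i3)};
}
```

Both functions take the input by value, mirroring the commit's note that the public API copies its input into a temporary vector that is then mutated. The selection variant avoids a full sort, which is presumably why the commit prefers it for larger arrays.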

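Commit 7ba2b79d5b replaces a growing sample history with an online mean/variance accumulator. A minimal sketch of such an accumulator, assuming Welford's single-pass update, is shown below. The commit does not say which update rule NVBench uses, and the `running_stats` type here is hypothetical rather than NVBench's `online_mean_variance`.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical accumulator; illustrates Welford's algorithm only.
struct running_stats
{
  std::size_t count = 0;
  double mean       = 0.0;
  double m2         = 0.0; // running sum of squared deviations from the mean

  // Constant-time update per sample; no sample history is stored.
  void add(double x)
  {
    ++count;
    const double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean);
  }

  // Unbiased (n - 1) sample variance; NaN when fewer than two samples.
  double sample_variance() const
  {
    if (count < 2)
    {
      return std::numeric_limits<double>::quiet_NaN();
    }
    return m2 / static_cast<double>(count - 1);
  }

  // Relative standard deviation (stdev / mean); may be non-finite,
  // e.g. NaN for all-zero samples, so callers should check std::isfinite.
  double relative_stdev() const { return std::sqrt(sample_variance()) / mean; }
};
```

Feeding the samples 2, 4, 4, 4, 5, 5, 7, 9 through `add` gives a mean of 5 and a sample variance of 32/7. The non-finite result for all-zero input is the kind of case the commit's bounded invalid-noise path is described as handling.
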
108 Commits