33 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
d160a2bafa Replace --run-once in testing/CMakeLists.txt with --profile 2025-07-28 12:03:42 -05:00
Elias Stehle
ca0e795b46 Merge pull request #113 from elstehle/fix/per-device-stream
Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream
2025-04-30 15:40:33 -04:00
Sergey Pavlov
a171514056 Added cudaGetLastError() calls to reset benchmarking kernel errors (issue 88). (#173)
* Create and use NVBENCH_CUDA_CALL_RESET_ERROR.

* Moved cudaGetLastError() call to NVBENCH_CUDA_CALL macro

---------

Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>
2024-05-31 11:32:01 -04:00
Allison Piper
5ee8811a1a Fix and test using RAII global state in main. (#168) 2024-04-09 17:27:49 -04:00
Allison Piper
165cf924c5 Refactor main implementation to improve reusability and customization. (#165)
* Refactor main implementation to improve reusability and customization.

Move the implementation of `main` out of macros and into separate
functions. This allows for easier reuse and customization of the macros.
Existing macro usage should still work as expected, and new
customization points will simplify common tasks like argument parsing
going forward.

* Add tests that validate common main customizations.
2024-04-09 12:45:58 -04:00
Georgy Evtushenko
85ed6f007c Rename criterion registry to criterion manager 2024-01-08 13:15:46 -08:00
Georgy Evtushenko
b789240c76 Entropy-based stopping criterion 2024-01-05 14:59:48 -08:00
Allison Vacanti
178dd0eb68 Implement new convergence check for noisy kernels.
Previously, convergence was tested by waiting for the relative stdev
of cuda timings ("noise") to drop below a certain percentage
(`max_noise`).

This assumed that all benchmarks would eventually see their noise drop
to some threshold, but this is not the case. In practice, many benchmarks
never converge to the default 0.5% relative stdev and instead will always
run to the 15s timeout -- even if the means have converged in a second
or two.

Added a new check that tests when the noise itself stabilizes and ends
the benchmark, even if noise > max_noise.

After testing, this patch alone significantly reduces the runtime of the
Thrust+CUB benchmark suite (from 30 hours to 5 hours) and produces similar
timing results.

The parameters used to tune this feature are not exposed -- if this
approach works long-term and there's a strong motivation to let users
tweak them, then we can worry about names/APIs/CLI/docs later.
2021-12-21 21:24:02 -05:00
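
The convergence check described in 178dd0eb68 can be illustrated with a small sketch. This is not the NVBench implementation: the commit states that the tuning parameters are not exposed, so `noise_convergence`, `window`, and `stable_tolerance` are hypothetical names and values chosen only for illustration. The sketch stops either when the relative stdev drops below `max_noise`, or when the noise values seen over a recent window stay within a small band of each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Relative standard deviation ("noise") of the timings: stdev / mean.
inline double relative_stdev(const std::vector<double> &samples)
{
  if (samples.size() < 2)
  {
    return std::numeric_limits<double>::infinity();
  }
  const double n    = static_cast<double>(samples.size());
  const double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / n;
  double sq_sum     = 0.0;
  for (const double s : samples)
  {
    sq_sum += (s - mean) * (s - mean);
  }
  return std::sqrt(sq_sum / (n - 1.0)) / mean;
}

// Hypothetical helper; all parameter values are assumptions for this sketch.
struct noise_convergence
{
  double max_noise        = 0.005; // 0.5% relative stdev, the default named in the commit
  std::size_t window      = 16;    // assumed: number of recent noise values to compare
  double stable_tolerance = 0.05;  // assumed: allowed relative spread within the window

  std::vector<double> samples;
  std::vector<double> noise_history;

  // Records one timing (in seconds) and returns true once the run may stop.
  bool add_sample(double seconds)
  {
    assert(seconds > 0.0);
    samples.push_back(seconds);
    if (samples.size() < 2)
    {
      return false;
    }

    const double noise = relative_stdev(samples);
    noise_history.push_back(noise);

    // Original criterion: the noise dropped below the threshold.
    if (noise < max_noise)
    {
      return true;
    }

    // New criterion: the noise itself has stopped changing, even if noise > max_noise.
    if (noise_history.size() < window)
    {
      return false;
    }
    const auto first    = noise_history.end() - static_cast<std::ptrdiff_t>(window);
    const auto [lo, hi] = std::minmax_element(first, noise_history.end());
    return (*hi - *lo) <= stable_tolerance * (*hi);
  }
};
```

A benchmark whose timings alternate between two values never reaches 0.5% noise, but under these assumptions it still stops once its noise settles, instead of running to the timeout.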
Allison Vacanti
b2d37c21fd Add export tests. 2021-10-20 14:02:16 -04:00
Allison Vacanti
ef36d3a558 Port to rapids-cmake.
- Add export sets
- Add install rules
- Remove manual CPM import, port to rapids_cpm_*, etc
- Organize CMake code into cmake/*.cmake files.
- NVBench is now a shared library.
2021-10-20 14:02:16 -04:00
Allison Vacanti
ea53972af8 Add nvbench.all metatarget.
This builds all NVBench tests and examples without building targets in
any parent projects.
2021-03-18 13:33:23 -04:00
Allison Vacanti
9f6404bac6 Pad range max for floating point types.
This avoids an annoying case where the max value is dropped due to
rounding errors.

Adds a few other missing test cases for `nvbench::range`, too.

Fixes #3.
2021-03-16 15:12:53 -04:00
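
The rounding problem that 9f6404bac6 addresses can be reproduced with a short sketch. The helper below is hypothetical (`make_range` and its `pad_max` flag are not the `nvbench::range` API), and padding the maximum by half a stride is an assumption about one reasonable way to do it, not a statement of what the patch does.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical helper: returns start, start + stride, ... up to and including end.
// When pad_max is true and T is a floating point type, end is padded by half a
// stride so that accumulated rounding error cannot push the last value past it.
template <typename T>
std::vector<T> make_range(T start, T end, T stride, bool pad_max)
{
  assert(stride > T{0});
  if constexpr (std::is_floating_point_v<T>)
  {
    if (pad_max)
    {
      end += stride / T{2};
    }
  }

  std::vector<T> result;
  for (; start <= end; start += stride)
  {
    result.push_back(start);
  }
  return result;
}
```

With IEEE double precision, `make_range(0.1, 0.3, 0.1, false)` drops the last element because `0.1 + 0.1 + 0.1` evaluates to slightly more than `0.3`; with `pad_max` set to `true` all three values are kept. Integer ranges are unaffected by the flag.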
Allison Vacanti
60c94d9ed6 Add enum_type_axis and examples/enums.cu.
- `enum_type_axis` simplifies using integral_constants with type axes.
- `examples/enums.cu` demonstrates various ways of implementing parameter
  sweeps with enum types.
2021-03-16 13:57:52 -04:00
Allison Vacanti
f15b668b03 Add nvbench.test.all and nvbench.example.all metatargets. 2021-03-09 16:03:14 -05:00
Allison Vacanti
cf71f6ee15 Update NVBench build system with initial standalone support. 2021-03-03 13:59:29 -05:00
Allison Vacanti
a747982415 Add nvbench::main CMake target.
Linking to this instead of `nvbench::nvbench` will automatically include
the `NVBENCH_MAIN` macro.
2021-02-19 09:34:02 -05:00
Allison Vacanti
efd4442d1b Add option_parser.
Currently supports `--benchmark` and `--axis` options.
2021-02-03 21:39:17 -05:00
Allison Vacanti
3ffd2e6aea Add NVBENCH_CREATE and associated machinery. 2021-01-01 01:36:53 -05:00
Allison Vacanti
0776bdc4be Add nvbench::runner. 2020-12-31 21:27:51 -05:00
Allison Vacanti
ad44463d6e Replace params class with nvbench::named_values.
Refactor nvbench::state to use this for axis parameters.

These will also be useful for summaries and measurements.

Also adds a new ASSERT_THROWS_ANY macro to test some of the new API.
2020-12-30 14:45:46 -05:00
Allison Vacanti
8c0b8e3423 Add cpu_timer. 2020-12-29 23:51:09 -05:00
Allison Vacanti
b07ffafff4 Add cuda_timer, cuda_stream. 2020-12-29 23:50:39 -05:00
Allison Vacanti
beaead2c3f Split benchmark into more specialized, nontemplated structs. 2020-12-29 19:34:11 -05:00
Allison Vacanti
093077de5f Add nvbench::state.
This class holds a single value for each runtime axis.
2020-12-27 10:44:22 -05:00
Allison Vacanti
7b14ceb3fe Add detail::state_generator.
This helper utility computes the cartesian product of the runtime
axes.
2020-12-27 10:29:24 -05:00
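
As a rough illustration of what computing the cartesian product of the runtime axes involves, the sketch below enumerates every combination of per-axis value indices using an odometer-style counter. The function name `cartesian_indices` and the choice to vary the first axis fastest are assumptions made for this example; they are not taken from `detail::state_generator`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: given the number of values on each axis, returns every
// combination of value indices. The first axis varies fastest.
inline std::vector<std::vector<std::size_t>>
cartesian_indices(const std::vector<std::size_t> &axis_sizes)
{
  std::vector<std::vector<std::size_t>> result;

  // An axis with no values makes the whole product empty.
  for (const std::size_t size : axis_sizes)
  {
    if (size == 0)
    {
      return result;
    }
  }

  std::vector<std::size_t> current(axis_sizes.size(), 0);
  while (true)
  {
    result.push_back(current);

    // Increment like an odometer: bump the first axis, carry into the next on wrap.
    std::size_t axis = 0;
    for (; axis < axis_sizes.size(); ++axis)
    {
      if (++current[axis] < axis_sizes[axis])
      {
        break;
      }
      current[axis] = 0;
    }
    if (axis == axis_sizes.size())
    {
      break; // Every axis wrapped around: all combinations have been visited.
    }
  }
  return result;
}
```

For two axes with 2 and 3 values this yields 6 index pairs, starting at `{0, 0}` and ending at `{1, 2}`. Each pair could then be used to look up one concrete value per axis.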
Allison Vacanti
40f92b4705 Add initial nvbench::benchmark.
It's basically just a container for the various axis classes at this
point.
2020-12-24 17:33:03 -05:00
Allison Vacanti
691ed2c18d Add nvbench::params. 2020-12-22 17:44:33 -05:00
Allison Vacanti
76f9c9b0d6 Add nvbench::string_axis. 2020-12-22 16:57:42 -05:00
Allison Vacanti
1e5fe88c9b Add float64_axis. 2020-12-22 16:38:05 -05:00
Allison Vacanti
07e4cc36c2 Add type_axis. 2020-12-22 15:20:04 -05:00
Allison Vacanti
95e2eaf607 Add nvbench::tl::foreach. 2020-12-22 15:20:04 -05:00
Allison Vacanti
13dc404d56 Add int64_axis. 2020-12-21 20:31:12 -05:00
Allison Vacanti
014d94e402 Add nvbench::type_list. 2020-12-20 21:09:47 -05:00