Similar to `no_block`, this is a runtime variable that doesn't need to be encoded statically.
It was not exposed publicly and existed solely as an implementation detail of `state::exec`, introducing unnecessary complexity there.
It's not worth instantiating multiple instances of the measurement class to handle this.
Since there's already a runtime option to disable the blocking kernel, the current implementation instantiates both the blocking and non-blocking versions of the algorithm by default and dispatches between them at runtime.
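
The trade-off between static and runtime encoding can be sketched in isolation. This is a minimal illustration, not the actual measurement class: `static_measure`, `runtime_measure`, `run_blocking`, and `run_non_blocking` are hypothetical names chosen for the example.

```cpp
#include <string>

// Hypothetical stand-ins for the two launch strategies.
inline std::string run_blocking() { return "blocking"; }
inline std::string run_non_blocking() { return "non-blocking"; }

// Static encoding: every value of the flag is a separate class instantiation.
template <bool UseBlockingKernel>
struct static_measure
{
  std::string run() const
  {
    if constexpr (UseBlockingKernel) { return run_blocking(); }
    else { return run_non_blocking(); }
  }
};

// Runtime encoding: a single class; both paths are compiled once and
// the choice is made on each call.
struct runtime_measure
{
  bool use_blocking_kernel{true};

  std::string run() const
  {
    return use_blocking_kernel ? run_blocking() : run_non_blocking();
  }
};
```

With the runtime form, the caller only ever names one type, at the cost of always compiling both code paths.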
Fixes #95.
CPU-only mode is enabled by setting the `is_cpu_only` property while
defining a benchmark, e.g. `NVBENCH_BENCH(foo).set_is_cpu_only(true)`.
An optional `nvbench::exec_tag::no_gpu` hint can also be passed to
`state.exec` to avoid instantiating GPU benchmarking backends. Note that
a CUDA compiler and CUDA runtime are always required, even if all benchmarks
in a translation unit are CPU-only.
Similarly, a new `nvbench::exec_tag::gpu` hint can be used to avoid
compiling CPU-only backends for GPU benchmarks.
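
How a tag hint can keep a backend from being instantiated is sketched below. This is a self-contained toy, not NVBench code: the `sketch` namespace, the tag types, and the `measure_cpu` / `measure_gpu` functions are all assumptions made for illustration, and it requires C++17.

```cpp
#include <string>
#include <type_traits>

namespace sketch // hypothetical; not the nvbench namespace
{
struct no_gpu_t {}; // hint: CPU backend only
struct gpu_t {};    // hint: GPU backend only
struct any_t {};    // no hint: decide at runtime

// Hypothetical backends. Each is a template, so it is only compiled
// for the callables it is actually instantiated with.
template <typename Fn>
std::string measure_cpu(Fn &&fn) { fn(); return "cpu"; }

template <typename Fn>
std::string measure_gpu(Fn &&fn) { fn(); return "gpu"; }

template <typename Tag, typename Fn>
std::string exec(Tag, bool is_cpu_only, Fn &&fn)
{
  // `exec` is a template and the conditions depend on `Tag`, so the
  // discarded branches are never instantiated for a given tag.
  if constexpr (std::is_same_v<Tag, no_gpu_t>) { return measure_cpu(fn); }
  else if constexpr (std::is_same_v<Tag, gpu_t>) { return measure_gpu(fn); }
  else
  {
    // Without a hint, both backends are instantiated and the runtime
    // property selects one.
    return is_cpu_only ? measure_cpu(fn) : measure_gpu(fn);
  }
}
} // namespace sketch
```

In this sketch the hint is purely a compile-time saving: the unhinted call still gives the same result for a CPU-only benchmark, it just compiles more code.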
Newer versions of fmt have a ton of issues building on CTK 11.1, and 11.8 is the next available container we have built for CI. We may still work with some earlier versions, but we do not test them.
We no longer have CI images available for clang < 14, so drop official support.
Switched away from the rapids-cmake provided version and manually CPM'd it.
rapids-cmake will stop providing fmtlib later this year, and the version currently supported is rather old.
Included the same logic that rapids-cmake currently uses to hopefully provide a smooth transition for edge cases (external fmt, etc).
Added `FMT_SYSTEM_HEADERS=ON` to mark fmt headers as system includes, suppressing any internal warnings.
The error message was being generated after moving strings out of `names`, so some of the axis names were blank.
This moves the check + error before any strings are moved.
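
The ordering issue can be shown with a small sketch. The function name `take_axis_names` and the message wording are hypothetical; only the pattern (validate and format the error first, move afterwards) reflects the fix described above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: consumes `names`, requiring exactly `expected` entries.
inline std::vector<std::string> take_axis_names(std::vector<std::string> &names,
                                                std::size_t expected)
{
  // Check and build the message while every string is still intact.
  if (names.size() != expected)
  {
    std::string msg = "Expected " + std::to_string(expected) + " axis names, got:";
    for (const auto &name : names) { msg += " '" + name + "'"; }
    throw std::invalid_argument(msg);
  }

  // Only now is it safe to move the strings out of `names`.
  std::vector<std::string> axes;
  axes.reserve(names.size());
  for (auto &name : names) { axes.push_back(std::move(name)); }
  return axes;
}
```

Had the loop that moves the strings run before the check, the moved-from entries would typically be empty by the time the message was formatted.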
* Create and use NVBENCH_CUDA_CALL_RESET_ERROR.
* Move the cudaGetLastError() call into the NVBENCH_CUDA_CALL macro.
---------
Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>
* Refactor main implementation to improve reusability and customization.
Move the implementation of `main` out of macros and into separate
functions. This allows for easier reuse and customization of the macros.
Existing macro usage should still work as expected, and new
customization points will simplify common tasks like argument parsing
going forward.
* Add tests that validate common main customizations.
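
The general shape of this refactor can be sketched as follows. Everything here is hypothetical (`sketch::run_main`, `sketch::custom_main`, `SKETCH_MAIN`, and the `--list` / `--verbose` flags); it only illustrates why a macro that forwards to ordinary functions is easier to customize than one that contains the whole body of `main`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

namespace sketch // hypothetical; not the real implementation
{
using args_t = std::vector<std::string>;

// Stand-in for the library's main body, returning an exit code.
// To keep the behavior observable it rejects anything but "--list".
inline int run_main(const args_t &args)
{
  for (const auto &arg : args)
  {
    if (arg != "--list") { return 1; }
  }
  return 0;
}

// A user customization: consume a custom flag, then reuse the main body.
inline int custom_main(args_t args, bool &verbose)
{
  auto new_end = std::remove(args.begin(), args.end(), std::string{"--verbose"});
  verbose = (new_end != args.end());
  args.erase(new_end, args.end());
  return run_main(args);
}
} // namespace sketch

// The macro stays a thin wrapper around the reusable function.
#define SKETCH_MAIN                                                   \
  int main(int argc, char **argv)                                     \
  {                                                                   \
    return sketch::run_main(                                          \
      sketch::args_t(argv + (argc > 0 ? 1 : 0), argv + argc));        \
  }
```

A program that needs no customization would write `SKETCH_MAIN` once; one that handles its own arguments would define `main` itself and call `sketch::custom_main` (or `sketch::run_main`) directly.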