* Build native arch by default, update rapids-cmake.
* Add check that CXX and CUDA_HOST compiler match.
Similar to CCCL, we need these to match to ensure that our warning flag detection functions properly.
* GCC recognizes only the plural spelling, `unused-local-typedefs`.
Clang recognizes both the singular and plural spellings. Ensure that we set the plural form for both compilers.
This will provide functionality such as clock locking (--lgc),
persistence mode (--pm), device querying (--list), version checking
(--version), and documentation (--help).
This is possible already with any nvbench executable, but having
one with a reliable name will be helpful for scripting and writing
documentation.
- /W4 on MSVC
- -Wall -Wextra + others on gcc/clang
- New NVBench_ENABLE_WERROR option to toggle "warnings as errors"
- Mark the nlohmann_json library as IMPORTED to switch to system includes
- Rename nvbench_main -> nvbench.main to follow target name conventions
- Explicitly suppress some cudafe warnings when compiling templates in
nlohmann_json headers.
- Explicitly suppress some warnings from Thrust headers.
- Various fixes for warnings exposed by new flags.
- Disable CUPTI on CTK < 11.3 (See #52).
Locking clocks is currently only implemented for Volta+ devices.
Example usage:
my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base
See the cli_help.md docs for more info.
- Add export sets
- Add install rules
- Remove manual CPM import, port to rapids_cpm_*, etc
- Organize CMake code into cmake/*.cmake files.
- NVBench is now a shared library.
- New NVBench_ENABLE_EXAMPLES CMake option.
- examples/axis.cu provides examples of parameter sweeps.
- Move testing/sleep_kernel.cuh -> nvbench/test_kernels.cuh
- Accessible to examples and provides some built-in kernels for users
to experiment with.
- Not included with `<nvbench/nvbench.cuh>`.
Remove the original attempt to adapt gbench to do CUDA stuff.
Update all benchmarks to use some conventions:
- Element count -> "Elements" [16:32]
- Throughput calcs
- Add input buffer column: "Size"
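
The throughput convention above can be illustrated with a small host-only
sketch. This is not code from the benchmarks themselves: the `throughput`
struct and `compute_throughput` helper are hypothetical names, and the sketch
assumes that "Size" means the input buffer size in bytes (element count times
the element size) and that [16:32] denotes power-of-two exponents for the
element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type; not part of any benchmark library.
struct throughput
{
  double size_bytes;          // "Size" column: input buffer size in bytes
  double elements_per_second; // item throughput
  double bytes_per_second;    // input bandwidth
};

// Computes the "Size" value and both throughput figures for one timed run.
// T is the element type of the input buffer.
template <typename T>
throughput compute_throughput(std::int64_t num_elements, double seconds)
{
  assert(num_elements > 0);
  assert(seconds > 0.0);

  const double elements   = static_cast<double>(num_elements);
  const double size_bytes = elements * static_cast<double>(sizeof(T));

  return {size_bytes, elements / seconds, size_bytes / seconds};
}

// Assumption: an "Elements" value of e in [16:32] means 2^e elements.
inline std::int64_t elements_from_exponent(int exponent)
{
  assert(exponent >= 0 && exponent < 63);
  return std::int64_t{1} << exponent;
}
```

For example, `compute_throughput<std::int32_t>(elements_from_exponent(16), 0.5)`
describes a run over 65536 four-byte elements that took half a second.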