Fixes#95.
CPU-only mode is enabled by setting the `is_cpu_only` property while
defining a benchmark, e.g. `NVBENCH_BENCH(foo).set_is_cpu_only(true)`.
An optional `nvbench::exec_tag::no_gpu` hint can also be passed to
`state.exec` to avoid instantiating GPU benchmarking backends. Note that
a CUDA compiler and CUDA runtime are always required, even if all benchmarks
in a translation unit are CPU-only.
Similarly, a new `nvbench::exec_tag::gpu` hint can be used to avoid
compiling CPU-only backends for GPU benchmarks.
Changes in the work include:
- [x] Internally use linear_space for iterating
- [x] Simplify type and value iteration in `state_iterator::build_axis_configs`
- [x] Store the iteration space in `axes_metadata`
- [x] Expose `tie` and `user` spaces to user
- [x] Add tests for `linear`, `tie`, and `user`
- [x] Add examples for `tie` and `user`
Locking clocks is currently only implemented for Volta+ devices.
Example usage:
my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base
See the cli_help.md docs for more info.
Fixes#10.
Adds a mode that forces a benchmark to only run once, simplifying
profiling usecases. This can be enabled by any of the following methods:
* Passing `--run-once` on the command line
* `NVBENCH_CREATE(...).set_run_once(true)` when declaring a benchmark
* `state.set_run_once(true)` from within the benchmark implementation.
Human-readable outputs (md) and CLI inputs still use percentages.
In-memory and machine-readable outputs (csv, json) use ratios.
This is the convention that spreadsheet apps expect. Fixes#2.
- `enum_type_axis` simplifies using integral_constants with type axes.
- `examples/enums.cu` demonstrates various ways of implementing parameter
sweeps with enum types.