Fixes #95.
CPU-only mode is enabled by setting the `is_cpu_only` property while
defining a benchmark, e.g. `NVBENCH_BENCH(foo).set_is_cpu_only(true)`.
An optional `nvbench::exec_tag::no_gpu` hint can also be passed to
`state.exec` to avoid instantiating GPU benchmarking backends. Note that
a CUDA compiler and CUDA runtime are always required, even if all benchmarks
in a translation unit are CPU-only.
Similarly, a new `nvbench::exec_tag::gpu` hint can be used to avoid
compiling CPU-only backends for GPU benchmarks.
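The effect of such a tag hint can be illustrated with a minimal, standalone sketch. Everything in the `sketch` namespace below is hypothetical and is not NVBench's actual implementation; it only shows how a tag type passed to an `exec`-style function template can keep unwanted backends from being instantiated, using `if constexpr`.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the exec tags; not NVBench's real definitions.
namespace sketch
{
struct default_tag {}; // no hint: every backend is compiled
struct no_gpu_tag {};  // hint: skip the GPU backends
struct gpu_tag {};     // hint: skip the CPU-only backends

// Records which (pretend) measurement backends ran the kernel.
using backend_log = std::vector<std::string>;

template <typename Tag, typename Kernel>
backend_log exec(Tag, Kernel &&kernel)
{
  backend_log log;
  // Inside a template, a discarded `if constexpr` branch is never
  // instantiated, so a backend excluded by the tag adds no compile cost.
  if constexpr (!std::is_same_v<Tag, gpu_tag>)
  {
    kernel(); // pretend CPU-only measurement
    log.push_back("cpu_only");
  }
  if constexpr (!std::is_same_v<Tag, no_gpu_tag>)
  {
    kernel(); // pretend GPU measurement
    log.push_back("gpu");
  }
  return log;
}
} // namespace sketch
```

In this sketch, calling `sketch::exec(sketch::no_gpu_tag{}, kernel)` runs only the CPU-only path, `sketch::gpu_tag{}` runs only the GPU path, and `sketch::default_tag{}` runs both.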
Changes in this work include:
- [x] Use `linear_space` internally for iteration
- [x] Simplify type and value iteration in `state_iterator::build_axis_configs`
- [x] Store the iteration space in `axes_metadata`
- [x] Expose the `tie` and `user` spaces to users
- [x] Add tests for `linear`, `tie`, and `user`
- [x] Add examples for `tie` and `user`
- `enum_type_axis` simplifies using `std::integral_constant` with type axes.
- `examples/enums.cu` demonstrates various ways of implementing parameter
  sweeps with enum types.
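The underlying idea of sweeping enum values through a type axis can be sketched in plain C++. This is an illustration only: the `sketch` namespace, the `rounding` enum, `type_list`, and `sweep` are hypothetical names, and the code does not use or reproduce NVBench's `enum_type_axis`. It shows how wrapping each enumerator in `std::integral_constant` turns a value into a distinct type that can be placed in a type list and recovered at compile time.

```cpp
#include <string>
#include <type_traits>
#include <vector>

namespace sketch
{
// Hypothetical enum to sweep over.
enum class rounding { nearest, down, up };

// Each enumerator becomes its own type, so it can live in a type list.
template <rounding R>
using rounding_constant = std::integral_constant<rounding, R>;

// Minimal type list standing in for a type axis.
template <typename... Ts>
struct type_list {};

using rounding_axis = type_list<rounding_constant<rounding::nearest>,
                                rounding_constant<rounding::down>,
                                rounding_constant<rounding::up>>;

// The enum value is a compile-time constant inside the "benchmark body".
template <rounding R>
std::string name_of(rounding_constant<R>)
{
  if constexpr (R == rounding::nearest) { return "nearest"; }
  else if constexpr (R == rounding::down) { return "down"; }
  else { return "up"; }
}

// Visit every entry of the axis in order and collect the results.
template <typename... Ts>
std::vector<std::string> sweep(type_list<Ts...>)
{
  return {name_of(Ts{})...};
}
} // namespace sketch
```

Calling `sketch::sweep(sketch::rounding_axis{})` visits each wrapped enumerator once, in declaration order of the type list.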