Text for --profile modified to be self-consistent, i.e., not to refer
to removed --run-once and --disable-blocking-kernel for explanantion
of what it does.
Fixes#95.
CPU-only mode is enabled by setting the `is_cpu_only` property while
defining a benchmark, e.g. `NVBENCH_BENCH(foo).set_is_cpu_only(true)`.
An optional `nvbench::exec_tag::no_gpu` hint can also be passed to
`state.exec` to avoid instantiating GPU benchmarking backends. Note that
a CUDA compiler and CUDA runtime are always required, even if all benchmarks
in a translation unit are CPU-only.
Similarly, a new `nvbench::exec_tag::gpu` hint can be used to avoid
compiling CPU-only backends for GPU benchmarks.
Locking clocks is currently only implemented for Volta+ devices.
Example usage:
my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base
See the cli_help.md docs for more info.
Fixes#10.
Adds a mode that forces a benchmark to only run once, simplifying
profiling usecases. This can be enabled by any of the following methods:
* Passing `--run-once` on the command line
* `NVBENCH_CREATE(...).set_run_once(true)` when declaring a benchmark
* `state.set_run_once(true)` from within the benchmark implementation.
Human-readable outputs (md) and CLI inputs still use percentages.
In-memory and machine-readable outputs (csv, json) use ratios.
This is the convention that spreadsheet apps expect. Fixes#2.
- `enum_type_axis` simplifies using integral_constants with type axes.
- `examples/enums.cu` demonstrates various ways of implementing parameter
sweeps with enum types.