Also cleaned up the annoying quirk where `set_type_axes_names` *had*
to be called on all benchmarks with type axes.
Default names are {"T", "U", "V", "W"} for up to four type axes. For
five or more, {"T0", "T1", ...} is used instead.
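The naming rule above can be sketched as follows. This is only an illustration: `default_type_axis_names` is a hypothetical helper written for this note and is not the nvbench implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of nvbench): returns the default names for
// `num_axes` type axes, following the rule described in the text.
std::vector<std::string> default_type_axis_names(std::size_t num_axes)
{
  static const char *short_names[] = {"T", "U", "V", "W"};

  std::vector<std::string> names;
  names.reserve(num_axes);
  for (std::size_t i = 0; i < num_axes; ++i)
  {
    if (num_axes <= 4)
    {
      // Up to four axes: single-letter names.
      names.push_back(short_names[i]);
    }
    else
    {
      // Five or more axes: numbered names T0, T1, ...
      names.push_back("T" + std::to_string(i));
    }
  }
  return names;
}
```

With this sketch, three axes would be named T, U, V, while five axes would be named T0 through T4.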
Remove the original attempt to adapt gbench to do CUDA stuff.
Update all benchmarks to use some conventions:
- Element count -> "Elements" [16:32]
- Throughput calcs
- Add input buffer column: "Size"
Instead of:
```
state.set_element_count(size);
state.set_global_memory_bytes_accessed(
  size * (sizeof(InT) + sizeof(OutT)));
```
do:
```
state.add_element_count(size, "Elements");
state.add_global_memory_read<InT>(size, "InputSize");
state.add_global_memory_write<OutT>(size, "OutputSize");
```
The string arguments are optional. If provided, a new column will
be added to the output with the indicated name and number
of bytes (or elements for `add_element_count`).
See the new thrust/sort/basic.cu benchmark for usage.
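As a rough sketch of the bookkeeping these calls imply: each read or write adds `count * sizeof(T)` bytes, and throughput is the accumulated total divided by the measured time. The `memory_counters` struct below is a hypothetical stand-in written for this note; it is not nvbench's `state` class and it omits the optional column names.

```cpp
#include <cstdint>

// Hypothetical stand-in for the accounting (not the nvbench implementation).
struct memory_counters
{
  std::int64_t elements      = 0;
  std::int64_t bytes_read    = 0;
  std::int64_t bytes_written = 0;

  void add_element_count(std::int64_t count) { elements += count; }

  // Each element of type T read from global memory contributes sizeof(T) bytes.
  template <typename T>
  void add_global_memory_read(std::int64_t count)
  {
    bytes_read += count * static_cast<std::int64_t>(sizeof(T));
  }

  // Each element of type T written to global memory contributes sizeof(T) bytes.
  template <typename T>
  void add_global_memory_write(std::int64_t count)
  {
    bytes_written += count * static_cast<std::int64_t>(sizeof(T));
  }

  // Element throughput for a measured duration in seconds.
  double elements_per_second(double seconds) const
  {
    return static_cast<double>(elements) / seconds;
  }

  // Global memory bandwidth: reads plus writes over the measured duration.
  double bytes_per_second(double seconds) const
  {
    return static_cast<double>(bytes_read + bytes_written) / seconds;
  }
};
```

Tracking reads and writes separately avoids the hand-written `size * (sizeof(InT) + sizeof(OutT))` expression and keeps the input and output sizes available as separate values.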
Other notable changes:
- Updated summary column names:
  - Cold GPU -> GPU Time
  - Cold CPU -> CPU Time
  - Hot GPU -> Batch GPU
- Removed CPU timings from measure_hot
  - They'd been hidden for a while, and aren't really useful.
- Moved the throughput calcs to measure_cold
  - `timer` disables `hot` timings, but throughput is still wanted.
  - `cold` timings make more sense for throughput: global BW numbers
    are meaningless if the data is sitting in L2.
nvbench::exec_tags are used to request measurement types and share
information about the kernel. They are used to ensure that templated
measurement code is not instantiated unless actually used.
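One common way to get this effect is compile-time tag dispatch with `if constexpr` (C++17): a measurement routine is only instantiated when its tag is present. The sketch below illustrates that general technique under stated assumptions. All names in it (`cold_tag`, `hot_tag`, `tag_list`, `measurement_log`, the free function `exec`) are hypothetical and do not reflect how nvbench::exec_tags are actually implemented.

```cpp
#include <type_traits>

// Hypothetical tag types and tag container (illustration only).
struct cold_tag {};
struct hot_tag {};

template <typename... Tags>
struct tag_list {};

// True if Tag appears among Tags...; an empty pack yields false.
template <typename Tag, typename... Tags>
inline constexpr bool has_tag_v = (std::is_same_v<Tag, Tags> || ...);

// Records which measurement routines ran.
struct measurement_log
{
  int cold_runs = 0;
  int hot_runs  = 0;
};

template <typename Launcher>
void measure_cold(measurement_log &log, Launcher &launcher)
{
  launcher();
  ++log.cold_runs;
}

template <typename Launcher>
void measure_hot(measurement_log &log, Launcher &launcher)
{
  launcher();
  ++log.hot_runs;
}

// Only the branches whose tag was requested are instantiated, so
// measure_hot<Launcher> is never generated for a cold-only request.
template <typename... Tags, typename Launcher>
void exec(measurement_log &log, tag_list<Tags...>, Launcher &&launcher)
{
  if constexpr (has_tag_v<cold_tag, Tags...>)
  {
    measure_cold(log, launcher);
  }
  if constexpr (has_tag_v<hot_tag, Tags...>)
  {
    measure_hot(log, launcher);
  }
}
```

Because the discarded `if constexpr` branch is not instantiated inside a template, requesting fewer measurements also means less template code to compile.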
Replaces the nvbench::exec(state, launcher, tags) pattern with:
state.exec(tags, launcher);
state.exec(launcher); // defaults to hot/cold cuda measurements
s/NVBENCH_CREATE/NVBENCH_BENCH/g
s/NVBENCH_BENCH_TEMPLATE/NVBENCH_BENCH_TYPES/g
These names will fit better once the exec_tags versions are added:
NVBENCH_BENCH
NVBENCH_BENCH_TYPES
NVBENCH_BENCH_FLAGS
NVBENCH_BENCH_TYPES_FLAGS