Previously, convergence was tested by waiting for the relative stdev
of cuda timings ("noise") to drop below a certain percentage
(`max_noise`).
This assumed that all benchmarks would eventually see their noise drop
to some threshold, but this is not the case. In practice, many benchmarks
never converge to the default 0.5% relative stdev and instead will always
run to the 15s timeout -- even if the means have converged in a second
or two.
Added a new check that tests when the noise itself stabilizes and ends
the benchmark, even if noise > max_noise.
After testing, this patch alone significantly reduces the runtime of the
Thrust+CUB benchmark suite (from 30 hours to 5 hours) and produces similar
timing results.
The parameters used to tune this feature are not exposed -- if this
approach works long-term and there's a strong motivation to let users
tweak them, then we can worry about names/APIs/CLI/docs later.
- Add export sets
- Add install rules
- Remove manual CPM import, port to rapids_cpm_*, etc
- Organize CMake code into cmake/*.cmake files.
- NVBench is now a shared library.
This avoids an annoying case where the max value is dropped due to
rounding errors.
Adds a few other missing test cases for `nvbench::range`, too.
Fixes#3.
- `enum_type_axis` simplifies using integral_constants with type axes.
- `examples/enums.cu` demonstrates various ways of implementing parameter
sweeps with enum types.
Refactor nvbench::state to use this for axis parameters.
These will also be useful for summaries and measurements.
Also adds a new ASSERT_THROWS_ANY macro to test some of the new API.