1. For JSON files that contain repeated measurements for the same run-time
axis values, make sure that the script compares corresponding
reference entries.
Previously, if cmp had two states with the same name and ref had two, we
would compare measurements for each state in cmp against the
first state in ref.
This change introduces counters tracking how many times each
particular axis value has been seen, and uses that count to retrieve the
corresponding entry in ref.
Previously, I had
```
| BlockSize | NumBlocks | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------|
| 2^8 | 64 | 1.776 ms | 0.46% | 1.777 ms | 0.40% | 1.024 us | 0.06% | SAME |
| 2^8 | 64 | 1.776 ms | 0.46% | 1.774 ms | 0.52% | -2.048 us | -0.12% | SAME |
| 2^8 | 64 | 1.776 ms | 0.46% | 1.773 ms | 0.52% | -3.072 us | -0.17% | SAME |
| 2^8 | 64 | 1.776 ms | 0.46% | 1.774 ms | 0.58% | -2.048 us | -0.12% | SAME |
| 2^8 | 64 | 1.776 ms | 0.46% | 1.773 ms | 0.58% | -3.072 us | -0.17% | SAME |
```
and now it becomes
```
| BlockSize | NumBlocks | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------|
| 2^8 | 64 | 1.776 ms | 0.46% | 1.777 ms | 0.40% | 1.024 us | 0.06% | SAME |
| 2^8 | 64 | 1.773 ms | 0.64% | 1.774 ms | 0.52% | 1.024 us | 0.06% | SAME |
| 2^8 | 64 | 1.774 ms | 0.46% | 1.773 ms | 0.52% | -1.024 us | -0.06% | SAME |
| 2^8 | 64 | 1.773 ms | 0.46% | 1.774 ms | 0.58% | 1.024 us | 0.06% | SAME |
| 2^8 | 64 | 1.774 ms | 0.52% | 1.773 ms | 0.58% | -1.024 us | -0.06% | SAME |
```
This is the expected result given the following raw data:
```
(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' base.json
"0.0017756160497665405"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"
(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' test.json
"0.0017766400575637818"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
```
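The occurrence-counter matching from item 1 can be sketched roughly as follows. This is a minimal illustration, not the actual nvbench_compare code: the `pair_states` helper, the dictionary layout of a state, and the rounded median values are all assumptions made for the example.

```python
from collections import Counter


def pair_states(ref_states, cmp_states, key):
    """Pair each cmp state with the ref state that has the same key and the
    same occurrence index (first with first, second with second, ...)."""
    # Group ref states by key, preserving their order of appearance.
    ref_by_key = {}
    for state in ref_states:
        ref_by_key.setdefault(key(state), []).append(state)

    seen = Counter()  # how many times each key has been seen in cmp so far
    pairs = []
    for state in cmp_states:
        k = key(state)
        candidates = ref_by_key.get(k, [])
        index = seen[k]
        seen[k] += 1
        if index < len(candidates):
            pairs.append((candidates[index], state))
        # cmp states with no corresponding ref occurrence are skipped
    return pairs


# Hypothetical, simplified states: identical axis values, repeated three times.
AXES = "BlockSize=2^8 NumBlocks=64"
ref_states = [{"name": AXES, "median": m} for m in (1.776e-3, 1.773e-3, 1.774e-3)]
cmp_states = [{"name": AXES, "median": m} for m in (1.777e-3, 1.774e-3, 1.773e-3)]

pairs = pair_states(ref_states, cmp_states, key=lambda s: s["name"])
for ref, cmp_state in pairs:
    print(ref["median"], cmp_state["median"])
```

With a plain "first match wins" lookup, every cmp state would be paired with the first ref entry; the counter is what advances the lookup to the second and third ref entries.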
2. nvbench_compare changes from using min_noise = min(ref_noise, cmp_noise) to using max_noise = max(ref_noise, cmp_noise)
Using the larger of the ref and cmp noise levels as the threshold against which to gauge the relative timing difference makes more sense.
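The effect of item 2 can be illustrated with a small sketch. The `classify` function and the `FAST`/`SLOW` labels are assumptions made for the example (only `SAME` appears in the tables above), and the real script may handle missing noise values and other cases differently.

```python
def classify(ref_time, cmp_time, ref_noise, cmp_noise):
    """Classify a timing difference relative to the larger noise level.

    Times are in seconds; noise values are fractions (0.05 means 5%).
    """
    diff = cmp_time - ref_time
    frac_diff = diff / ref_time
    # Use the noisier of the two measurements as the threshold.
    max_noise = max(ref_noise, cmp_noise)
    if abs(frac_diff) <= max_noise:
        return "SAME"
    return "FAST" if diff < 0 else "SLOW"


# A 3% slowdown with 1% ref noise and 5% cmp noise: within max_noise,
# although it would exceed min_noise.
print(classify(1.0, 1.03, 0.01, 0.05))
```

When one of the two measurements is noisy, a threshold based on the smaller noise value flags differences that the noisier measurement cannot actually resolve.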
* Correct Python API signature of State.get_axis_values_as_string
The C++ API has a boolean argument color with a default value, but the Python API
declared no arguments.
Closes #345
* Also exercise invocation of get_axis_values_as_string with keyword argument value
* Remove use of cuda.core.experimental
Fixed relative text alignment in docstrings to fix autodoc warnings
Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions
to start with an underscore, signaling that these functions are internal and should
not be documented
Account for test_cpp_exception -> _test_cpp_exception, same for *_py_*
Make sure to reset __module__ of reexported symbols to be cuda.bench
* Introduce function colorize to modularize colorization/no-color handling
* Use sns.set_theme instead of deprecated sns.set()
* Use str.format instead of legacy % syntax
* Simplified iteration over list
Use f-string (supported since Python 3.6) instead of str.format for
better readability and performance
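A helper like the `colorize` function mentioned above might look roughly like this. The signature, the colour names, and the use of raw ANSI escape codes are assumptions made for the example; the actual script may rely on a colour library instead.

```python
# ANSI escape sequences for a few foreground colours (illustrative subset).
_ANSI_CODES = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "blue": "\033[34m",
}
_ANSI_RESET = "\033[0m"


def colorize(text, color, use_color=True):
    """Wrap text in ANSI colour codes, or return it unchanged when colour
    output is disabled."""
    if not use_color:
        return text
    return f"{_ANSI_CODES[color]}{text}{_ANSI_RESET}"


print(colorize("SAME", "blue", use_color=False))  # plain text, no escape codes
```

Keeping the colour/no-colour decision in one function means callers never have to branch on it themselves.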
Fix GCC16 sfinae incomplete warnings.
GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` be complete when using `-std=c++17`. Since NVBench has to forward declare some types in header files to break circular dependencies, the use of an incomplete type breaks the build: GCC16 emits a `-Wsfinae-incomplete` warning, which the `-Werror` flag turns into an error.
This commit replaces the affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx with raw pointers.
* Add cuda architectures to build wheel for
* Package scripts in wheel
* Separate cuda major version extraction to fix architecture selection logic
* Add back statement printing cuda version
* [pre-commit.ci] auto code formatting
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.