This allows a Pythonic way of working with BenchResult,
as if it were a dictionary.
```
In [1]: import array, numpy as np, cuda.bench
In [2]: r = cuda.bench.BenchResult("temp_data/axes_run1.json")
In [3]: list(r)
Out[3]:
['simple',
'single_float64_axis',
'copy_sweep_grid_shape',
'copy_type_sweep',
'copy_type_conversion_sweep',
'copy_type_and_block_size_sweep']
In [4]: r["simple"].centers(lambda t: np.percentile(t, [25,75]))
Out[4]: {'Device=0': array([0.00100966, 0.00101299])}
In [5]: r.centers(lambda t: np.percentile(t, [25,75]))["simple"]
Out[5]: {'Device=0': array([0.00100966, 0.00101299])}
In [6]: len(r)
Out[6]: 6
In [7]: "fake" in r
Out[7]: False
```
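For illustration, dictionary-like behaviour of this kind can be obtained from the `collections.abc.Mapping` protocol. The sketch below uses a hypothetical `ResultMapping` class with made-up data. It is not the actual BenchResult implementation, only a minimal example of how `list(r)`, `len(r)`, `in` and subscripting follow from three methods.

```python
from collections.abc import Mapping


class ResultMapping(Mapping):
    """Hypothetical stand-in for a dictionary-like result container."""

    def __init__(self, subbenches):
        # dict preserves insertion order, so iteration follows input order.
        self._subbenches = dict(subbenches)

    def __getitem__(self, name):
        return self._subbenches[name]

    def __iter__(self):
        return iter(self._subbenches)

    def __len__(self):
        return len(self._subbenches)


# Made-up data; real results would be parsed from a JSON file.
r = ResultMapping({"simple": [1.0, 2.0], "copy_type_sweep": [3.0]})
print(list(r))      # ['simple', 'copy_type_sweep']
print(len(r))       # 2
print("fake" in r)  # False
```

`Mapping` supplies `__contains__`, `keys()`, `items()`, `values()` and `get()` on top of the three methods defined here.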
Add arbitrary BenchResult metadata and explicit parse control, replacing
the previous code/elapsed fields. Make BenchResult subscriptable by
subbenchmark name and make SubBenchResult list-like over its states.
Extend SubBenchState parsing to expose summaries by tag, read paired
sample frequency data, return None for unavailable sample/frequency
files, and validate matching sample/frequency lengths.
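The paired-read behaviour described above could look roughly like the following sketch. The helper names `read_floats` and `read_paired` are hypothetical, and the assumption that sidecar files hold raw native-endian float32 values is made for illustration only.

```python
import array
import os


def read_floats(path):
    """Return float32 values from a raw binary file, or None if unavailable."""
    if path is None or not os.path.isfile(path):
        return None
    values = array.array("f")  # assumption: native-endian float32 payload
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return values


def read_paired(samples_path, freqs_path):
    """Read samples and frequencies; lengths must match when both exist."""
    samples = read_floats(samples_path)
    freqs = read_floats(freqs_path)
    if samples is not None and freqs is not None and len(samples) != len(freqs):
        raise ValueError(
            f"sample count {len(samples)} does not match "
            f"frequency count {len(freqs)}"
        )
    return samples, freqs
```

Returning None for a missing file lets callers distinguish "no data recorded" from "empty data", while the length check catches sidecar files that have gone out of sync.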
Harden parsing for NVBench JSON output with no-axis benchmarks, null
axis_values, skipped states with null summaries, float axis input_string
lookups, and recorded sidecar binary paths.
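A minimal sketch of tolerating null fields while parsing is shown below. The JSON layout, the field names (`states`, `axis_values`, `summaries`, `tag`) and the tag value are simplified assumptions for illustration and may not match the NVBench schema exactly. The helper functions are hypothetical.

```python
import json


def axis_values(state):
    """Map axis name -> value; null or missing "axis_values" gives {}."""
    axes = state.get("axis_values") or []
    return {a["name"]: a["value"] for a in axes}


def summaries_by_tag(state):
    """Map summary tag -> summary dict; null "summaries" gives {}."""
    summaries = state.get("summaries") or []
    return {s["tag"]: s for s in summaries}


# Made-up document: the first state mimics a skipped, no-axis state.
doc = json.loads("""
{"states": [
  {"name": "Device=0", "axis_values": null, "summaries": null},
  {"name": "Device=0 T=F32",
   "axis_values": [{"name": "T", "value": "F32"}],
   "summaries": [{"tag": "example/time/mean", "value": 0.001}]}
]}
""")

for state in doc["states"]:
    print(state["name"], axis_values(state), sorted(summaries_by_tag(state)))
```

The `state.get(key) or []` idiom covers both a missing key and an explicit JSON `null` in one expression.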
Expand BenchResult tests to cover metadata, parse=False, sequence-style
access, frequency-aware centers, missing binary data, skipped states,
and mismatched sample/frequency counts.
Example usage:
```
import array, numpy as np, cuda.bench
r = cuda.bench.BenchResult("perf_data/axes_run1.json")
r["copy_sweep_grid_shape"].centers_with_frequencies(
lambda t, f: np.median(np.asarray(t)*np.asarray(f)))
```
* Correct Python API signature of State.get_axis_values_as_string
The C++ API has a boolean argument color with a default value, but the
Python API declared no arguments.
Closes #345
* Also exercise invocation of get_axis_values_as_string with keyword argument value
* Remove use of cuda.core.experimental
Fixed relative text alignment in docstrings to fix autodoc warnings
Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions
to start with underscore, signaling that these functions are internal and should
not be documented
Account for test_cpp_exception -> _test_cpp_exception, same for *_py_*
Make sure to reset __module__ of reexported symbols to be cuda.bench
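The `__module__` reset can be sketched with plain Python objects as below. The package name `pkg`, the private module name `pkg._impl`, and the `reset_module` helper are hypothetical. Whether `__module__` is writable on objects from a compiled extension depends on the binding layer, which this sketch does not cover.

```python
PUBLIC_NAME = "pkg"  # hypothetical public package name

# Simulate symbols that were defined in a private implementation module.
State = type("State", (), {"__module__": "pkg._impl"})


def launch():
    """Placeholder standing in for a re-exported callable."""


launch.__module__ = "pkg._impl"


def reset_module(public_name, *symbols):
    """Point each symbol's __module__ at the public package."""
    for symbol in symbols:
        symbol.__module__ = public_name


print(State.__module__, launch.__module__)  # pkg._impl pkg._impl
reset_module(PUBLIC_NAME, State, launch)
print(State.__module__, launch.__module__)  # pkg pkg
```

After the reset, `repr()`, `help()` and documentation tools report the public package rather than the private module where the symbol was defined.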