Move NVBench JSON result parsing into cuda.bench.results with explicit
BenchmarkResult, BenchmarkResultDevice, BenchmarkResultSummary,
SubBenchmarkResult, and SubBenchmarkState types. Remove the result reader
from the top-level cuda.bench namespace and require construction through
BenchmarkResult.from_json() or BenchmarkResult.empty().
Preserve bulk sample/frequency parsing and estimator helpers while making
summaries rich objects that retain tag/name/hint/hide/description metadata.
Add nvbench-json-summary to render NVBench JSON output as an NVBench-style
markdown summary table, including axis formatting, device sections, hidden
summary filtering, and summary hint formatting.
Update packaging, type stubs, and tests for the new namespace, renamed
classes, Python 3.10-compatible annotations, and summary-table generation.
This allows for Pythonic way of working with BenchResult
as if it was a dictionary.
```
In [1]: import array, numpy as np, cuda.bench
In [2]: r = cuda.bench.BenchResult("temp_data/axes_run1.json")
In [3]: list(r)
Out[3]:
['simple',
'single_float64_axis',
'copy_sweep_grid_shape',
'copy_type_sweep',
'copy_type_conversion_sweep',
'copy_type_and_block_size_sweep']
In [4]: r["simple"].centers(lambda t: np.percentile(t, [25,75]))
Out[4]: {'Device=0': array([0.00100966, 0.00101299])}
In [5]: r.centers(lambda t: np.percentile(t, [25,75]))["simple"]
Out[5]: {'Device=0': array([0.00100966, 0.00101299])}
In [6]: len(r)
Out[6]: 6
In [7]: "fake" in r
Out[7]: False
```
Add arbitrary BenchResult metadata and explicit parse control, replacing
the previous code/elapsed fields. Make BenchResult subscriptable by
subbenchmark name and make SubBenchResult list-like over its states.
Extend SubBenchState parsing to expose summaries by tag, read paired
sample frequency data, return None for unavailable sample/frequency
files, and validate matching sample/frequency lengths.
Harden parsing for NVBench JSON output with no-axis benchmarks, null
axis_values, skipped states with null summaries, float axis input_string
lookups, and recorded sidecar binary paths.
Expand BenchResult tests to cover metadata, parse=False, sequence-style
access, frequency-aware centers, missing binary data, skipped states,
and mismatched sample/frequency counts.
Example usage:
```
import array, numpy as np, cuda.bench
r = cuda.bench.BenchResult("perf_data/axes_run1.json")
r["copy_sweep_grid_shape"].centers_with_frequencies(
lambda t, f: np.median(np.asarray(t)*np.asarray(f)))
```
* Correct Python API signature of State.get_axis_values_as_strings
The C++ API has default boolean argument color, but Python API
declared no arguments.
Closes#345
* Also exercise invocation of get_axis_values_as_string with keyword argument value
* Remove use of cuda.core.experimental