Files
nvbench/python/cuda/bench/__init__.pyi
Oleksandr Pavlyk 338936b6fe Provide BenchmarkResult class for parsing JSON output of NVBench-instrumented benchmarks (#356)
Implements `cuda.bench.results.BenchmarkResult` class to represent data from JSON output of benchmark execution.

The contains implements two class methods `BenchmarkResult.from_json(filename : str | os.PathLike, *, metadata : Any = None)` which expects well-formed JSON filename and `BenchmarkResult.empty(*, metadata : Any = None)` intended to represent failed result with reasons that can be recorded in metadata at user's discretion.

The `BenchmarkResult` implements mapping interface, supporting `.keys()`, `.values()`, `.items()` methods, `__len__`, `__contains__`, `__getitem__` and `__iter__` special methods. 

Values in `BenchmarkResult` has type `cuda.bench.results.SubBenchmarkResult` which implements a list-like interface, i.e. implements `__len__`, `__getitem__`, and `__iter__` special methods. Values in this list-like structure correspond to measurements of individual states of a particular benchmark (the key in `BenchmarkResult`).

Elements of `SubBenchmarkResult` structure have type `SubBenchmarkState` that supports mapping protocol with axis_values as a key and represent data corresponding to measurements for a particular state (combination of settings for each axis). 

The state provides `.samples` and `.frequencies` attributes storing raw execution duration values and estimates for average GPU frequencies. 

Example usage:

```
import array, numpy as np, cuda.bench.results

r = cuda.bench.results.BenchmarkResult("perf_data/axes_run1.json")

r["copy_sweep_grid_shape"].centers_with_frequencies(
     lambda t, f: np.median(np.asarray(t)*np.asarray(f)))

```

```
In [1]: import array, numpy as np, cuda.bench.results

In [2]: r = cuda.bench.results.BenchmarkResult("temp_data/axes_run1.json")

In [3]: list(r)
Out[3]:
['simple',
 'single_float64_axis',
 'copy_sweep_grid_shape',
 'copy_type_sweep',
 'copy_type_conversion_sweep',
 'copy_type_and_block_size_sweep']

In [4]: r["simple"].centers(lambda t: np.percentile(t, [25,75]))
Out[4]: {'Device=0': array([0.00100966, 0.00101299])}

In [5]: r.centers(lambda t: np.percentile(t, [25,75]))["simple"]
Out[5]: {'Device=0': array([0.00100966, 0.00101299])}

In [6]: len(r)
Out[6]: 6

In [7]: "fake" in r
Out[7]: False
```

Each `SubBenchmarkState` implements 
`.summaries` attribute - rich object that retains tag/name/hint/hide/description metadata.

* Add nvbench-json-summary to render NVBench JSON output as an NVBench-style
markdown summary table, including axis formatting, device sections, hidden
summary filtering, and summary hint formatting.

Update packaging, type stubs, and tests for the new namespace, renamed
classes, Python 3.10-compatible annotations, and summary-table generation.

* Split tests in test_benchmark_result into smaller tests

* Fix break due to file name change

* Add python/examples/benchmark_result_autotune.py

This example demonstrates using cuda.bench and cuda.bench.results
to implement simple auto-tuning, demonstrated on selecting of
tile shape hyperparameter for naive stencil kernel implemented
in numba-cuda.

* Resolve ruff PLE0604

* Fix for format_axis_value in json format script to handle None value

Add tests to cover such input.

* Address code rabbit review feedback

* Fix license header, add validation

* Addressed both issues raised in review

Malformed values are now represented in result as None.

Skipped benchmarks are no longer dropped, i.e., they are present
in BenchmarkResult data, but they are not reflected in summary
table in line with what NVBench-instrumented benchmarks do.
2026-05-13 13:23:58 -05:00

128 lines
5.2 KiB
Python

# Copyright 2025-2026 NVIDIA Corporation
#
# Licensed under the Apache License, Version 2.0 with the LLVM exception
# (the "License"); you may not use this file except in compliance with
# the License.
#
# You may obtain a copy of the License at
#
# http://llvm.org/foundation/relicensing/LICENSE.txt
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================
# PLEASE KEEP IN SYNC WITH py_nvbench.cpp FILE
# ============================================
# Please be sure to keep these type hints and docstring in sync
# with the pybind11 bindings in ``../../src/py_nvbench.cpp``
# Use mypy's stubgen to auto-generate stubs using
# ``stubgen -m cuda.nvbench._nvbench`` and compare
# stubs in generated out/cuda/nvbench/_nvbench.pyi
# with definitions given here.
from collections.abc import (
Callable,
Sequence,
)
from typing import (
Optional,
Self,
SupportsFloat,
SupportsInt,
Union,
)
class CudaStream:
def __cuda_stream__(self) -> tuple[int, int]: ...
def addressof(self) -> int: ...
class Benchmark:
def get_name(self) -> str: ...
def add_int64_axis(self, name: str, values: Sequence[SupportsInt]) -> Self: ...
def add_int64_power_of_two_axis(
self, name: str, values: Sequence[SupportsInt]
) -> Self: ...
def add_float64_axis(self, name: str, values: Sequence[SupportsFloat]) -> Self: ...
def add_string_axis(self, name: str, values: Sequence[str]) -> Self: ...
def set_name(self, name: str) -> Self: ...
def set_run_once(self, v: bool) -> Self: ...
def set_skip_time(self, duration_seconds: SupportsFloat) -> Self: ...
def set_throttle_recovery_delay(self, delay_seconds: SupportsFloat) -> Self: ...
def set_throttle_threshold(self, threshold: SupportsFloat) -> Self: ...
def set_timeout(self, duration_seconds: SupportsFloat) -> Self: ...
def set_stopping_criterion(self, criterion: str) -> Self: ...
def set_criterion_param_float64(self, name: str, value: SupportsFloat) -> Self: ...
def set_criterion_param_int64(self, name: str, value: SupportsInt) -> Self: ...
def set_criterion_param_string(self, name: str, value: str) -> Self: ...
def set_min_samples(self, count: SupportsInt) -> Self: ...
def set_is_cpu_only(self, is_cpu_only: bool) -> Self: ...
class Launch:
def get_stream(self) -> CudaStream: ...
class State:
def has_device(self) -> bool: ...
def has_printers(self) -> bool: ...
def get_device(self) -> Union[int, None]: ...
def get_stream(self) -> CudaStream: ...
def get_int64(self, name: str) -> int: ...
def get_int64_or_default(self, name: str, default_value: SupportsInt) -> int: ...
def get_float64(self, name: str) -> float: ...
def get_float64_or_default(
self, name: str, default_value: SupportsFloat
) -> float: ...
def get_string(self, name: str) -> str: ...
def get_string_or_default(self, name: str, default_value: str) -> str: ...
def add_element_count(
self, count: SupportsInt, column_name: Optional[str] = None
) -> None: ...
def set_element_count(self, count: SupportsInt) -> None: ...
def get_element_count(self) -> int: ...
def skip(self, reason: str) -> None: ...
def is_skipped(self) -> bool: ...
def get_skip_reason(self) -> str: ...
def add_global_memory_reads(
self, nbytes: SupportsInt, /, column_name: str = ""
) -> None: ...
def add_global_memory_writes(
self, nbytes: SupportsInt, /, column_name: str = ""
) -> None: ...
def get_benchmark(self) -> Benchmark: ...
def get_throttle_threshold(self) -> float: ...
def set_throttle_threshold(self, threshold_fraction: SupportsFloat) -> None: ...
def get_min_samples(self) -> int: ...
def set_min_samples(self, min_samples_count: SupportsInt) -> None: ...
def get_disable_blocking_kernel(self) -> bool: ...
def set_disable_blocking_kernel(self, flag: bool) -> None: ...
def get_run_once(self) -> bool: ...
def set_run_once(self, run_once_flag: bool) -> None: ...
def get_timeout(self) -> float: ...
def set_timeout(self, duration: SupportsFloat) -> None: ...
def get_blocking_kernel_timeout(self) -> float: ...
def set_blocking_kernel_timeout(self, duration: SupportsFloat) -> None: ...
def exec(
self,
fn: Callable[[Launch], None],
/,
*,
batched: Optional[bool] = True,
sync: Optional[bool] = False,
): ...
def get_short_description(self) -> str: ...
def add_summary(
self, column_name: str, value: Union[SupportsInt, SupportsFloat, str]
) -> None: ...
def get_axis_values(self) -> dict[str, int | float | str]: ...
def get_axis_values_as_string(self, color: bool = ...) -> str: ...
def get_stopping_criterion(self) -> str: ...
def register(fn: Callable[[State], None]) -> Benchmark: ...
def run_all_benchmarks(argv: Sequence[str]) -> None: ...
class NVBenchRuntimeError(RuntimeError): ...