Commit Graph

132 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
119b45defc Add wrong decorator order test for bench.axis.* 2026-05-13 13:34:48 -05:00
Oleksandr Pavlyk
df4cec071f Remove example/auto_throughput.py
The C++ counterpart's purpose is to demonstrate use of CUPTI
metrics, but these are not supported in Python bindings, so
this example is a duplicate of example/throughput.py
2026-05-13 13:34:48 -05:00
Oleksandr Pavlyk
63019a7e42 Add decorators for registering benchmarks and adding axis
cuda.bench.register(fn) continues returning Benchmark, and supports
legacy use.

New signature added:
   cuda.bench.register():
      Returns a decorator

```
@bench.register()
@bench.axis.float64("Duration (s)", [7e-5, 1e-4, 5e-4])
@bench.option.min_samples(120)
def single_float64_axis(state: bench.State):
   ...
```
2026-05-13 13:34:48 -05:00
Oleksandr Pavlyk
338936b6fe Provide BenchmarkResult class for parsing JSON output of NVBench-instrumented benchmarks (#356)
Implements `cuda.bench.results.BenchmarkResult` class to represent data from JSON output of benchmark execution.

The contains implements two class methods `BenchmarkResult.from_json(filename : str | os.PathLike, *, metadata : Any = None)` which expects well-formed JSON filename and `BenchmarkResult.empty(*, metadata : Any = None)` intended to represent failed result with reasons that can be recorded in metadata at user's discretion.

The `BenchmarkResult` implements mapping interface, supporting `.keys()`, `.values()`, `.items()` methods, `__len__`, `__contains__`, `__getitem__` and `__iter__` special methods. 

Values in `BenchmarkResult` has type `cuda.bench.results.SubBenchmarkResult` which implements a list-like interface, i.e. implements `__len__`, `__getitem__`, and `__iter__` special methods. Values in this list-like structure correspond to measurements of individual states of a particular benchmark (the key in `BenchmarkResult`).

Elements of `SubBenchmarkResult` structure have type `SubBenchmarkState` that supports mapping protocol with axis_values as a key and represent data corresponding to measurements for a particular state (combination of settings for each axis). 

The state provides `.samples` and `.frequencies` attributes storing raw execution duration values and estimates for average GPU frequencies. 

Example usage:

```
import array, numpy as np, cuda.bench.results

r = cuda.bench.results.BenchmarkResult("perf_data/axes_run1.json")

r["copy_sweep_grid_shape"].centers_with_frequencies(
     lambda t, f: np.median(np.asarray(t)*np.asarray(f)))

```

```
In [1]: import array, numpy as np, cuda.bench.results

In [2]: r = cuda.bench.results.BenchmarkResult("temp_data/axes_run1.json")

In [3]: list(r)
Out[3]:
['simple',
 'single_float64_axis',
 'copy_sweep_grid_shape',
 'copy_type_sweep',
 'copy_type_conversion_sweep',
 'copy_type_and_block_size_sweep']

In [4]: r["simple"].centers(lambda t: np.percentile(t, [25,75]))
Out[4]: {'Device=0': array([0.00100966, 0.00101299])}

In [5]: r.centers(lambda t: np.percentile(t, [25,75]))["simple"]
Out[5]: {'Device=0': array([0.00100966, 0.00101299])}

In [6]: len(r)
Out[6]: 6

In [7]: "fake" in r
Out[7]: False
```

Each `SubBenchmarkState` implements 
`.summaries` attribute - rich object that retains tag/name/hint/hide/description metadata.

* Add nvbench-json-summary to render NVBench JSON output as an NVBench-style
markdown summary table, including axis formatting, device sections, hidden
summary filtering, and summary hint formatting.

Update packaging, type stubs, and tests for the new namespace, renamed
classes, Python 3.10-compatible annotations, and summary-table generation.

* Split tests in test_benchmark_result into smaller tests

* Fix break due to file name change

* Add python/examples/benchmark_result_autotune.py

This example demonstrates using cuda.bench and cuda.bench.results
to implement simple auto-tuning, demonstrated on selecting of
tile shape hyperparameter for naive stencil kernel implemented
in numba-cuda.

* Resolve ruff PLE0604

* Fix for format_axis_value in json format script to handle None value

Add tests to cover such input.

* Address code rabbit review feedback

* Fix license header, add validation

* Addressed both issues raised in review

Malformed values are now represented in result as None.

Skipped benchmarks are no longer dropped, i.e., they are present
in BenchmarkResult data, but they are not reflected in summary
table in line with what NVBench-instrumented benchmarks do.
2026-05-13 13:23:58 -05:00
Oleksandr Pavlyk
d13a0fde32 Correct cuda cccl examples per change in api (#353) 2026-05-06 13:30:44 -05:00
Oleksandr Pavlyk
f392725015 Correct Python API signature of State.get_axis_values_as_strings (#346)
* Correct Python API signature of State.get_axis_values_as_strings

The C++ API has default boolean argument color, but Python API
declared no arguments.

Closes #345

* Also exercise invocation of get_axis_values_as_string with keyword argument value

* Remove use of cuda.core.experimental
2026-05-04 08:40:29 -05:00
Oleksandr Pavlyk
a3364ca5c7 Port changes to the package from #323 (#337)
Fixed relative text alignment in docstrings to fix autodoc warnigns

Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions
to start with underscore, signaling that these functions are internal and should
not be documented

Account for test_cpp_exceptions -> _test_cpp_exception, same for *_py_*

Make sure to reset __module__ of reexported symbols to be cuda.bench
2026-04-22 08:28:15 -05:00
Oleksandr Pavlyk
b0a46f44c2 Modularize color handling (#336)
* Introduce function colorize to modularize colorization/no-color handling

* Use sns.set_theme instead of deprecated sns.set()

* Use str.format instead of legacy % syntax

* Simplified iteration over list

Use f-string (supported since Python 3.6) instead of str.format for
better readability and performance
2026-04-14 08:09:44 -05:00
Nader Al Awar
373970323f Merge pull request #331 from oleksandr-pavlyk/update-python-examples
Update python examples
2026-04-02 15:20:24 -04:00
Oleksandr Pavlyk
39730efbc3 Update requirements to reflect packages used by examples 2026-04-02 10:37:17 -05:00
Oleksandr Pavlyk
9f75642387 Add patch to cutlass.base_dsl.dsl.BaseDSL to work-around a bug
See https://github.com/NVIDIA/cutlass/issues/3142
2026-04-02 10:29:31 -05:00
Nader Al Awar
7a68e53df0 Rename flag from markdown to no-color 2026-04-01 17:01:29 -05:00
Nader Al Awar
7e5e784855 Add --markdown flag to nvbench_compare.py which can be use for github issues/prs 2026-04-01 14:53:13 -05:00
Oleksandr Pavlyk
93bc59d05c Renamed CUTLASS example to reflect that it uses CuteDSL 2026-04-01 08:24:29 -05:00
Oleksandr Pavlyk
e4cfddeb87 Rewrote cutlass_gemm example to use CuteDSL 2026-04-01 08:23:41 -05:00
Oleksandr Pavlyk
3f284b4004 Renamed cccl_* examples
cccl_parallel_* -> cuda_compute_*
cccl_cooperative_* -> cuda_coop_*
2026-04-01 08:20:20 -05:00
Oleksandr Pavlyk
5bdb30f4b6 Update to cccl_parallel_segmented_reduce example per changes in API
Update namespace changes. Use make_segmented_reduce factory function,
and update call signatures.
2026-04-01 08:18:15 -05:00
Oleksandr Pavlyk
d8739fc208 Update to cccl_cooperative_block_reduce example 2026-04-01 08:17:52 -05:00
Oleksandr Pavlyk
974eb5ee0f Replace use of cupy.cuda.ExternalStream with cupy.cuda.Stream.from_external 2026-04-01 08:17:12 -05:00
Oleksandr Pavlyk
7c60edcc0a cuda.core.experimental -> cuda.core 2026-04-01 08:16:04 -05:00
Oleksandr Pavlyk
836a6c12f4 Merge pull request #326 from oleksandr-pavlyk/fix-sfinae-incomplete
Fix GCC16 sfinae incomplete warnings.

GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` is complete where using `-std=c++17`. Since NVBench has to forward declare some types in header files to break circular dependency, use of incomplete type breaks build due to use of `-Werror` flag due to `-Wsfinae-incomplete` warning emitted by GCC16.

This commit replaced affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx with raw pointers.
2026-03-24 16:02:28 -05:00
Bernhard Manfred Gruber
4164909c52 Feedback 2026-02-28 01:19:18 +01:00
Bernhard Manfred Gruber
0abc8ec82b Extend nvbench_compare.py with --plot, axis/benchmark filtering, and dark mode
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
2026-02-27 11:06:20 +01:00
Bernhard Manfred Gruber
800f640c20 Apply reviewer feedback 2026-02-26 19:23:51 +01:00
Bernhard Manfred Gruber
d3a0bec4a8 Feedback from review 2026-02-05 14:13:16 +01:00
Bernhard Manfred Gruber
28ed32bb47 Implement dark mode using style sheets 2026-02-05 14:00:33 +01:00
Bernhard Manfred Gruber
ec9759037d I have no idea what I am doing 2026-02-05 11:15:27 +01:00
Bernhard Manfred Gruber
ccde9fc4d4 More 2026-02-05 10:56:36 +01:00
Bernhard Manfred Gruber
0be190b407 Add a script to plot benchmark results 2026-02-05 10:36:52 +01:00
Nader Al Awar
dc59f98ecd Remove cupti from cuda-bench dependencies (#311) 2026-02-03 14:16:26 -06:00
Bernhard Manfred Gruber
c6ef87575c Allow partial comparison in nvbench_compare.py
Fixes: #295
2026-02-03 16:32:11 +01:00
Nader Al Awar
d75fc74162 Merge branch 'main' into remove-cupti-python 2026-02-03 08:58:41 -06:00
Nader Al Awar
4fa4296810 Remove cuda.pathfinder function 2026-02-02 16:43:45 -06:00
Nader Al Awar
f2d5730104 Disable CUPTI in cmake file 2026-02-02 16:03:15 -06:00
Nader Al Awar
6df5fc8c67 Remove cupti from cuda-bench dependencies 2026-02-02 15:37:13 -06:00
Oleksandr Pavlyk
8ff0557ad8 Replace use of py::handle to store global_registry
Use py::gil_safe_call_once_and_store facility pybind11 provides.
2026-02-02 11:55:48 -06:00
Oleksandr Pavlyk
39c29026fd Move docstrings from PYI file to implementation
Added tests that docstrings exist and are not empty.

This closes #291
2026-02-02 11:55:48 -06:00
Nader Al Awar
edf0b80599 Add installation instructions 2026-01-30 09:32:44 -06:00
Nader Al Awar
fa1eed69c0 Rename test file to refer to cuda_bench 2026-01-29 13:53:29 -06:00
Nader Al Awar
711c1e2eb1 Replace all occurences of pynvbench with cuda-bench 2026-01-29 13:25:17 -06:00
Nader Al Awar
5e7adc5c3f Build multi architecture cuda wheels (#302)
* Add cuda architectures to build wheel for

* Package scripts in wheel

* Separate cuda major version extraction to fix architecutre selection logic

* Add back statement printing cuda version

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-29 01:13:24 +00:00
Ashwin Srinath
a681e2185d Add multi-cuda wheel build (#289)
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
2026-01-28 10:37:55 -05:00
Oleksandr Pavlyk
f6a9b245d3 Only trigger skipping of outstanding benchmarks on KeyboardInterrupt exception, on others benchmakr is to continue execution 2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk
7e9a9a8983 Replace main_arg_run_benchmarks with run_interriptible
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.
2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk
a7763bdd7a Remove debug outputs 2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk
b2a80c92b8 Revert "Scripts to triage 284"
This reverts commit c286199adc.
2025-12-08 11:53:08 -06:00
Oleksandr Pavlyk
ce9a76167f Use nvbench::stop_runner_loop to signal stop of runner loop
Add try/catch around Python calls to improve keyboard interrup
response.
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
c286199adc Scripts to triage 284 2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
de471e1d42 Use pybind11==3.0.1, do not use pybind11_add_module 2025-12-05 19:38:11 -06:00
Ashwin Srinath
77b7afc3c9 Remove the Python version file 2025-12-03 16:23:14 -05:00