nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-06-29 10:47:36 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	4472e7b59b	Add python api for cold warmup parameters (#363)	2026-05-18 10:56:44 -05:00
Oleksandr Pavlyk	d63a2761eb	Implement Timer, and support State.exec(fn, timer=True) (#364)

* Add type annotations for future functionality

```python
class Timer:
    def start(self) -> None: ...
    def stop(self) -> None: ...
```

and overloaded `State.exec` so:
- normal mode accepts `Callable[[Launch], None]`
- `timer=True` accepts `Callable[[Launch, Timer], None]`

No implementation yet. Type annotation checked with

```
(py313) :~/repos/nvbench/python$ python -m mypy --ignore-missing-imports /tmp/check_timer.py
/tmp/check_timer.py:24: error: No overload variant of "exec" of "State" matches argument types "Callable[[Launch], None]", "bool" [call-overload]
/tmp/check_timer.py:24: note: Possible overload variants:
/tmp/check_timer.py:24: note: def exec(self, Callable[[Launch], None], /, *, batched: bool | None = ..., sync: bool | None = ..., timer: Literal[False] = ...) -> None
/tmp/check_timer.py:24: note: def exec(self, Callable[[Launch, Timer], None], /, *, timer: Literal[True], sync: bool | None = ...) -> None
/tmp/check_timer.py:25: error: Argument 1 to "exec" of "State" has incompatible type "Callable[[Launch, Timer], None]"; expected "Callable[[Launch], None]" [arg-type]
/tmp/check_timer.py:26: error: No overload variant of "exec" of "State" matches argument types "Callable[[Launch, int], None]", "bool" [call-overload]
/tmp/check_timer.py:26: note: Possible overload variants:
/tmp/check_timer.py:26: note: def exec(self, Callable[[Launch], None], /, *, batched: bool | None = ..., sync: bool | None = ..., timer: Literal[False] = ...) -> None
/tmp/check_timer.py:26: note: def exec(self, Callable[[Launch, Timer], None], /, *, timer: Literal[True], sync: bool | None = ...) -> None
Found 3 errors in 1 file (checked 1 source file)
(py313) :~/repos/nvbench/python$ nl -ba /tmp/check_timer.py
 1  # /tmp/check_nvbench_timer.py
 2  import cuda.bench as bench
 3
 4  def normal_ok(launch: bench.Launch) -> None:
 5      pass
 6
 7  def timer_ok(launch: bench.Launch, timer: bench.Timer) -> None:
 8      timer.start()
 9      timer.stop()
10
11  def missing_timer(launch: bench.Launch) -> None:
12      pass
13
14  def extra_timer(launch: bench.Launch, timer: bench.Timer) -> None:
15      pass
16
17  def wrong_timer_type(launch: bench.Launch, timer: int) -> None:
18      pass
19
20  def state_bench(state: bench.State) -> None:
21      state.exec(normal_ok)
22      state.exec(normal_ok, timer=False)
23      state.exec(timer_ok, timer=True)
24      state.exec(missing_timer, timer=True) # should fail
25      state.exec(extra_timer) # should fail
26      state.exec(wrong_timer_type, timer=True) # should fail
```

* Implement cuda.bench.Timer object

The Timer class is not user-constructible. It exposes two nullary methods, timer.start() and timer.stop(). An instance of the Timer class is provided to the launchable object passed to State.exec with timer=True.

* Implement support for State.exec(launch_fn, timer=True)

* Change type annotation for batched to default to None

None is interpreted as `not timer`, i.e., it effectively defaults to True (as before) for usage without timer set, but defaults to `False` if `timer=True` is set. The batched keyword type is `bool | None`.

* Implement default batched=None behavior

The API allows one to specify all 3 keywords: sync, batched, and timer. batched is None by default, interpreted at run time as `(not timer)`.

* Update tests for new behavior of batched/timer combination
* Add python/examples/exec_tag_timer.py
* Expand Timer class and methods docstrings
* Reworked python/examples/exec_tag_timer.py to align with C++ example.
* Replace ::cuda::std::name with cuda::std::name
* Resolve review feedback	2026-05-15 10:19:40 -05:00
Oleksandr Pavlyk	44ec7de6bd	Implement decorators to register benchmarks, add axes and options (#347)

* Add decorators for registering benchmarks and adding axes

cuda.bench.register(fn) continues to return Benchmark, and supports legacy use. New signature added: cuda.bench.register() returns a decorator.

```
@bench.register()
@bench.axis.float64("Duration (s)", [7e-5, 1e-4, 5e-4])
@bench.option.min_samples(120)
def single_float64_axis(state: bench.State):
    ...
```

* Remove example/auto_throughput.py

The purpose of the C++ counterpart is to demonstrate use of CUPTI metrics, but these are not supported in the Python bindings, so this example is a duplicate of example/throughput.py.

* Add wrong decorator order test for bench.axis.*

* Strengthen type annotation for register function

Acting on a CodeRabbit nit-pick, require that the function being registered take a cuda.bench.State object as an argument. Verified the fix as

```
(py313) :~/repos/nvbench/python$ python -m mypy --ignore-missing-import /tmp/t.py
/tmp/t.py:8: error: Argument 1 has incompatible type "Callable[[], None]"; expected "Callable[[State], None]" [arg-type]
Found 1 error in 1 file (checked 1 source file)
(py313) :~/repos/nvbench/python$ nl -ba /tmp/t.py
 1  # /tmp/check_nvbench_register.py
 2  import cuda.bench as bench
 3
 4  @bench.register()
 5  def good(state: bench.State) -> None:
 6      pass
 7
 8  @bench.register()
 9  def bad() -> None:
10      pass
```

* Replace use of global variable with thread-safe lru_cache

This improves thread-safety of module initialization.

* Abide by RUF005 linting rule

* Expand docstrings regarding cuda.bench.register() decorator

It explains to the user what the decorator does and provides a concise usage example.

* Sharpen wording on the exception that may be thrown by the decorator	2026-05-14 15:41:30 -05:00
Oleksandr Pavlyk	338936b6fe	Provide BenchmarkResult class for parsing JSON output of NVBench-instrumented benchmarks (#356)

Implements the `cuda.bench.results.BenchmarkResult` class to represent data from the JSON output of benchmark execution. The class implements two class methods: `BenchmarkResult.from_json(filename: str | os.PathLike, *, metadata: Any = None)`, which expects the name of a well-formed JSON file, and `BenchmarkResult.empty(*, metadata: Any = None)`, intended to represent a failed result, with reasons that can be recorded in metadata at the user's discretion.

`BenchmarkResult` implements the mapping interface, supporting the `.keys()`, `.values()`, and `.items()` methods and the `__len__`, `__contains__`, `__getitem__`, and `__iter__` special methods.

Values in `BenchmarkResult` have type `cuda.bench.results.SubBenchmarkResult`, which implements a list-like interface, i.e. the `__len__`, `__getitem__`, and `__iter__` special methods. Values in this list-like structure correspond to measurements of individual states of a particular benchmark (the key in `BenchmarkResult`).

Elements of the `SubBenchmarkResult` structure have type `SubBenchmarkState`, which supports the mapping protocol with axis_values as the key and represents data corresponding to measurements for a particular state (a combination of settings for each axis). The state provides `.samples` and `.frequencies` attributes storing raw execution duration values and estimates of average GPU frequencies.

Example usage:

```
import array, numpy as np, cuda.bench.results
r = cuda.bench.results.BenchmarkResult("perf_data/axes_run1.json")
r["copy_sweep_grid_shape"].centers_with_frequencies(
    lambda t, f: np.median(np.asarray(t)*np.asarray(f)))
```

```
In [1]: import array, numpy as np, cuda.bench.results

In [2]: r = cuda.bench.results.BenchmarkResult("temp_data/axes_run1.json")

In [3]: list(r)
Out[3]:
['simple',
 'single_float64_axis',
 'copy_sweep_grid_shape',
 'copy_type_sweep',
 'copy_type_conversion_sweep',
 'copy_type_and_block_size_sweep']

In [4]: r["simple"].centers(lambda t: np.percentile(t, [25,75]))
Out[4]: {'Device=0': array([0.00100966, 0.00101299])}

In [5]: r.centers(lambda t: np.percentile(t, [25,75]))["simple"]
Out[5]: {'Device=0': array([0.00100966, 0.00101299])}

In [6]: len(r)
Out[6]: 6

In [7]: "fake" in r
Out[7]: False
```

Each `SubBenchmarkState` provides a `.summaries` attribute, a rich object that retains tag/name/hint/hide/description metadata.

Add nvbench-json-summary to render NVBench JSON output as an NVBench-style markdown summary table, including axis formatting, device sections, hidden summary filtering, and summary hint formatting.

Update packaging, type stubs, and tests for the new namespace, renamed classes, Python 3.10-compatible annotations, and summary-table generation.

* Split tests in test_benchmark_result into smaller tests
* Fix break due to file name change
* Add python/examples/benchmark_result_autotune.py

This example demonstrates using cuda.bench and cuda.bench.results to implement simple auto-tuning, by selecting a tile-shape hyperparameter for a naive stencil kernel implemented in numba-cuda.

* Resolve ruff PLE0604
* Fix format_axis_value in the JSON format script to handle a None value

Add tests to cover such input.

* Address code rabbit review feedback
* Fix license header, add validation
* Addressed both issues raised in review

Malformed values are now represented in the result as None. Skipped benchmarks are no longer dropped, i.e., they are present in the BenchmarkResult data, but they are not reflected in the summary table, in line with what NVBench-instrumented benchmarks do.	2026-05-13 13:23:58 -05:00
Oleksandr Pavlyk	f392725015	Correct Python API signature of State.get_axis_values_as_strings (#346)

* Correct Python API signature of State.get_axis_values_as_strings

The C++ API has a default boolean argument color, but the Python API declared no arguments. Closes #345

* Also exercise invocation of get_axis_values_as_string with keyword argument value
* Remove use of cuda.core.experimental	2026-05-04 08:40:29 -05:00
Oleksandr Pavlyk	a3364ca5c7	Port changes to the package from #323 (#337)

Fixed relative text alignment in docstrings to fix autodoc warnings.

Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions to start with an underscore, signaling that these functions are internal and should not be documented.

Account for test_cpp_exceptions -> _test_cpp_exception, same for _py_.

Make sure to reset __module__ of reexported symbols to be cuda.bench.	2026-04-22 08:28:15 -05:00
Oleksandr Pavlyk	39c29026fd	Move docstrings from PYI file to implementation Added tests that docstrings exist and are not empty. This closes #291	2026-02-02 11:55:48 -06:00
Nader Al Awar	fa1eed69c0	Rename test file to refer to cuda_bench	2026-01-29 13:53:29 -06:00
Oleksandr Pavlyk	b5e4b4ba31	cuda.nvbench -> cuda.bench

Per PR review suggestion:
- `cuda.parallel` - device-wide algorithms/Thrust
- `cuda.cooperative` - Cooperative algorithms/CUB
- `cuda.bench` - Benchmarking/NVBench	2025-08-04 13:42:43 -05:00
Oleksandr Pavlyk	9dfdd8af89	Minimal test file	2025-08-04 11:59:17 -05:00
Oleksandr Pavlyk	6aff4712f8	Change permissions of test/run_1.py	2025-08-04 10:13:08 -05:00
Oleksandr Pavlyk	453a1648aa	Improvements to readability of examples per PR review	2025-07-31 16:20:52 -05:00
Oleksandr Pavlyk	88a3ad0138	Add test/stub.py The following static analysis run should run green ``` mypy --ignore-missing-imports test/stub.py ```	2025-07-30 13:54:37 -05:00
Oleksandr Pavlyk	b97e27cbf2	Add use of add_axis_values and add_axis_values_as_string to test/run_1.py	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	526856db4e	Fix typo in the method spelling	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	e589518376	Change test and examples from using camelCase to using snake_case as implementation changed	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	6552ef503c	Draft of Python API for NVBench The prototype is based on pybind11 to minimize boiler-plate code needed to deal with move-only semantics of many nvbench classes.	2025-07-28 15:37:04 -05:00
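
The `batched=None` default described in commit d63a2761eb can be sketched in plain Python. This models only the keyword resolution stated in that message (None is read as `not timer`). The helper name `resolve_batched` is hypothetical and not part of cuda.bench, and the sketch assumes that an explicitly passed value is used unchanged; the actual bindings may validate combinations differently.

```python
from typing import Optional


def resolve_batched(batched: Optional[bool] = None, timer: bool = False) -> bool:
    """Hypothetical helper modelling the default described in the commit.

    Assumption: an explicit `batched` value is returned as given;
    only `None` is resolved, and it resolves to `not timer`.
    """
    if batched is None:
        return not timer
    return batched


if __name__ == "__main__":
    # No timer requested: batched keeps its previous effective default.
    print(resolve_batched())
    # Timer requested: batched falls back to False unless set explicitly.
    print(resolve_batched(timer=True))
```

Keeping `None` as a sentinel, rather than defaulting to `True`, lets the effective default depend on `timer` without changing behavior for existing callers that never pass `timer`.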

17 Commits