Oleksandr Pavlyk
4472e7b59b
Add python api for cold warmup parameters ( #363 )
2026-05-18 10:56:44 -05:00
Oleksandr Pavlyk
d63a2761eb
Implement Timer, and support State.exec(fn, timer=True) ( #364 )
...
* Add type annotations for future functionality
```python
class Timer:
    def start(self) -> None: ...
    def stop(self) -> None: ...
```
and overloaded `State.exec` so:
- normal mode accepts `Callable[[Launch], None]`
- `timer=True` accepts `Callable[[Launch, Timer], None]`
No implementation yet. Type annotation checked with
```
(py313) :~/repos/nvbench/python$ python -m mypy --ignore-missing-imports /tmp/check_timer.py
/tmp/check_timer.py:24: error: No overload variant of "exec" of "State" matches argument types "Callable[[Launch], None]", "bool" [call-overload]
/tmp/check_timer.py:24: note: Possible overload variants:
/tmp/check_timer.py:24: note: def exec(self, Callable[[Launch], None], /, *, batched: bool | None = ..., sync: bool | None = ..., timer: Literal[False] = ...) -> None
/tmp/check_timer.py:24: note: def exec(self, Callable[[Launch, Timer], None], /, *, timer: Literal[True], sync: bool | None = ...) -> None
/tmp/check_timer.py:25: error: Argument 1 to "exec" of "State" has incompatible type "Callable[[Launch, Timer], None]"; expected "Callable[[Launch], None]" [arg-type]
/tmp/check_timer.py:26: error: No overload variant of "exec" of "State" matches argument types "Callable[[Launch, int], None]", "bool" [call-overload]
/tmp/check_timer.py:26: note: Possible overload variants:
/tmp/check_timer.py:26: note: def exec(self, Callable[[Launch], None], /, *, batched: bool | None = ..., sync: bool | None = ..., timer: Literal[False] = ...) -> None
/tmp/check_timer.py:26: note: def exec(self, Callable[[Launch, Timer], None], /, *, timer: Literal[True], sync: bool | None = ...) -> None
Found 3 errors in 1 file (checked 1 source file)
(py313) :~/repos/nvbench/python$ nl -ba /tmp/check_timer.py
     1  # /tmp/check_nvbench_timer.py
     2  import cuda.bench as bench
     3
     4  def normal_ok(launch: bench.Launch) -> None:
     5      pass
     6
     7  def timer_ok(launch: bench.Launch, timer: bench.Timer) -> None:
     8      timer.start()
     9      timer.stop()
    10
    11  def missing_timer(launch: bench.Launch) -> None:
    12      pass
    13
    14  def extra_timer(launch: bench.Launch, timer: bench.Timer) -> None:
    15      pass
    16
    17  def wrong_timer_type(launch: bench.Launch, timer: int) -> None:
    18      pass
    19
    20  def state_bench(state: bench.State) -> None:
    21      state.exec(normal_ok)
    22      state.exec(normal_ok, timer=False)
    23      state.exec(timer_ok, timer=True)
    24      state.exec(missing_timer, timer=True)  # should fail
    25      state.exec(extra_timer)  # should fail
    26      state.exec(wrong_timer_type, timer=True)  # should fail
```
* Implement cuda.bench.Timer object
The Timer class is not user-constructible. It exposes two nullary
methods: timer.start() and timer.stop().
An instance of the Timer class is provided to the launchable object
passed to State.exec with timer=True.
* Implement support for State.exec( launch_fn, timer=True)
* Change type annotation for batched to default to None
None is interpreted as `not timer`, i.e., it effectively
defaults to `True` (as before) for usage without timer set,
but starts defaulting to `False` if `timer=True` is set.
The batched keyword type is `bool | None`.
* Implement default batched=None behavior
API allows one to specify all 3 keywords, sync, batched,
and timer. batched is None by default, run-time interpreted
as `(not timer)`.
* Update tests for new behavior of batched/timer combination
* Add python/examples/exec_tag_timer.py
* Expand Timer class and methods docstrings
* Reworked python/examples/exec_tag_timer.py to align with C++ example.
* Replace ::cuda::std::name with cuda::std::name
* Resolve review feedback
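The keyword handling described above can be sketched in pure Python. `FakeLaunch`, `FakeTimer`, and `FakeState` are hypothetical stand-ins, not the cuda.bench implementation; only the rule that `batched=None` is interpreted as `not timer`, and the extra Timer argument passed when `timer=True`, are taken from this commit.

```python
from typing import Optional


class FakeLaunch:
    """Hypothetical stand-in for cuda.bench.Launch."""


class FakeTimer:
    """Hypothetical stand-in for cuda.bench.Timer; records calls only."""

    def __init__(self) -> None:
        self.events = []

    def start(self) -> None:
        self.events.append("start")

    def stop(self) -> None:
        self.events.append("stop")


class FakeState:
    """Hypothetical stand-in modelling only the keyword handling of exec."""

    def __init__(self) -> None:
        self.batched = None
        self.timer = None

    def exec(self, fn, /, *, batched: Optional[bool] = None,
             sync: Optional[bool] = None, timer: bool = False) -> None:
        # batched=None is interpreted at run time as `not timer`.
        self.batched = (not timer) if batched is None else batched
        if timer:
            # With timer=True the callable also receives a Timer.
            self.timer = FakeTimer()
            fn(FakeLaunch(), self.timer)
        else:
            self.timer = None
            fn(FakeLaunch())


def timed(launch: FakeLaunch, timer: FakeTimer) -> None:
    timer.start()
    # The work to be measured would be launched here.
    timer.stop()


state = FakeState()
state.exec(lambda launch: None)
batched_without_timer = state.batched

state.exec(timed, timer=True)
batched_with_timer = state.batched
timer_events = state.timer.events
```

In this sketch an explicit `batched=True` or `batched=False` always wins over the default, so all three keywords can still be given together.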
2026-05-15 10:19:40 -05:00
Oleksandr Pavlyk
f392725015
Correct Python API signature of State.get_axis_values_as_strings ( #346 )
...
* Correct Python API signature of State.get_axis_values_as_strings
The C++ API has default boolean argument color, but Python API
declared no arguments.
Closes #345
* Also exercise invocation of get_axis_values_as_string with keyword argument value
* Remove use of cuda.core.experimental
2026-05-04 08:40:29 -05:00
Oleksandr Pavlyk
a3364ca5c7
Port changes to the package from #323 ( #337 )
...
Fixed relative text alignment in docstrings to fix autodoc warnings
Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions
to start with underscore, signaling that these functions are internal and should
not be documented
Account for test_cpp_exceptions -> _test_cpp_exception, same for *_py_*
Make sure to reset __module__ of reexported symbols to be cuda.bench
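A minimal sketch of resetting `__module__` on re-exported symbols. The private module name `cuda.bench._impl_sketch` and the helper `reexport` are hypothetical and only illustrate the idea; the package's own re-export code may look different.

```python
import types

# Hypothetical private implementation module and public package name.
impl = types.ModuleType("cuda.bench._impl_sketch")
PUBLIC_NAME = "cuda.bench"


class State:
    """Stand-in class that pretends to live in the private module."""


State.__module__ = impl.__name__
impl.State = State


def reexport(module: types.ModuleType, names, public_name: str) -> dict:
    """Return the named symbols with __module__ reset to the public package."""
    exported = {}
    for name in names:
        obj = getattr(module, name)
        obj.__module__ = public_name
        exported[name] = obj
    return exported


module_before = State.__module__
exported = reexport(impl, ["State"], PUBLIC_NAME)
module_after = exported["State"].__module__
```

With `__module__` reset, documentation tools and `repr` report the symbol under the public package rather than the private implementation module.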
2026-04-22 08:28:15 -05:00
Oleksandr Pavlyk
836a6c12f4
Merge pull request #326 from oleksandr-pavlyk/fix-sfinae-incomplete
...
Fix GCC16 sfinae incomplete warnings.
GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` be complete when using `-std=c++17`. Since NVBench has to forward-declare some types in header files to break circular dependencies, the use of an incomplete type makes GCC16 emit the `-Wsfinae-incomplete` warning, which breaks the build because of the `-Werror` flag.
This commit replaced affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx with raw pointers.
2026-03-24 16:02:28 -05:00
Nader Al Awar
d75fc74162
Merge branch 'main' into remove-cupti-python
2026-02-03 08:58:41 -06:00
Nader Al Awar
6df5fc8c67
Remove cupti from cuda-bench dependencies
2026-02-02 15:37:13 -06:00
Oleksandr Pavlyk
8ff0557ad8
Replace use of py::handle to store global_registry
...
Use py::gil_safe_call_once_and_store facility pybind11 provides.
2026-02-02 11:55:48 -06:00
Oleksandr Pavlyk
39c29026fd
Move docstrings from PYI file to implementation
...
Added tests that docstrings exist and are not empty.
This closes #291
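A test of this kind can be sketched as follows. `SketchAPI` is a hypothetical namespace standing in for the extension module, and `missing_docstrings` is an illustrative helper, not the test added in this commit.

```python
def missing_docstrings(namespace) -> list:
    """Return sorted names of public callables that lack a docstring."""
    missing = []
    for name, obj in vars(namespace).items():
        if name.startswith("_") or not callable(obj):
            continue
        doc = getattr(obj, "__doc__", None)
        # Treat both a missing and a whitespace-only docstring as empty.
        if not doc or not doc.strip():
            missing.append(name)
    return sorted(missing)


class SketchAPI:
    """Hypothetical namespace standing in for the extension module."""

    def documented(self):
        """Do something useful."""

    def undocumented(self):
        pass


missing = missing_docstrings(SketchAPI)
```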
2026-02-02 11:55:48 -06:00
Ashwin Srinath
a681e2185d
Add multi-cuda wheel build ( #289 )
...
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
2026-01-28 10:37:55 -05:00
Oleksandr Pavlyk
f6a9b245d3
Only trigger skipping of outstanding benchmarks on KeyboardInterrupt exception; on others, benchmark is to continue execution
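The intended control flow can be sketched in plain Python. `run_all` and the sample callables are hypothetical; the real loop lives in the C++ extension. Counting the interrupted benchmark itself as skipped is an assumption of this sketch.

```python
def run_all(benchmarks):
    """Run (name, callable) pairs; return (completed, failed, skipped) names.

    KeyboardInterrupt skips everything still outstanding; any other
    exception is recorded and the loop moves on to the next benchmark.
    """
    completed, failed, skipped = [], [], []
    interrupted = False
    for name, fn in benchmarks:
        if interrupted:
            skipped.append(name)
            continue
        try:
            fn()
            completed.append(name)
        except KeyboardInterrupt:
            interrupted = True
            skipped.append(name)
        except Exception:
            failed.append(name)
    return completed, failed, skipped


def ok():
    pass


def broken():
    raise ValueError("bad input")


def interrupted_by_user():
    raise KeyboardInterrupt


result = run_all([
    ("a", ok),
    ("b", broken),
    ("c", ok),
    ("d", interrupted_by_user),
    ("e", ok),
])
```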
2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk
7e9a9a8983
Replace main_arg_run_benchmarks with run_interriptible
...
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.
2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk
a7763bdd7a
Remove debug outputs
2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk
ce9a76167f
Use nvbench::stop_runner_loop to signal stop of runner loop
...
Add try/catch around Python calls to improve keyboard interrupt
response.
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
c2a2acc9b6
Change float64_t arg-type for set_throttle_threshold to float32_t
...
This matches the C++ method signatures of
set_throttle_threshold/set_throttle_recovery_delay, which use nvbench::float32_t
2025-08-04 12:14:52 -05:00
Oleksandr Pavlyk
a5e0a48f80
Add test functions for cpp/python exceptions
2025-08-04 10:09:10 -05:00
Oleksandr Pavlyk
40a2337a6b
Review fix: make nvbench_run_error constructible
...
Allow `throw nvbench_run_error("Msg");` to compile.
Add comment around definition of nvbench_run_error
2025-08-04 10:09:04 -05:00
Oleksandr Pavlyk
4fc628c4d7
Python native extension to use CXX/CUDA standard of NVBench library
...
This fixes cryptic build failure with GNU compiler 14
2025-08-01 15:33:39 -05:00
Oleksandr Pavlyk
9c01f229a6
Add Benchmark set methods, such as set_stopping_criterion, set_timeout, etc
...
Add
- State.get_stopping_criterion() -> str
- Benchmark.set_stopping_criterion(criterion: str) -> Self
- Benchmark.set_criterion_param_int64(name: str, value: int) -> Self
- Benchmark.set_criterion_param_float64(name: str, value: float) -> Self
- Benchmark.set_criterion_param_string(name: str, value: str) -> Self
- Benchmark.set_timeout(duration: float) -> Self
- Benchmark.set_skip_time(skip_time: float) -> Self
- Benchmark.set_throttle_threshold(frac: float) -> Self
- Benchmark.set_throttle_recovery_delay(duration: float) -> Self
- Benchmark.set_min_samples(count: int) -> Self
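Because each setter returns `Self`, calls can be chained. A minimal sketch with a hypothetical stand-in class follows; `SketchBenchmark` only stores the values and the argument values are illustrative.

```python
class SketchBenchmark:
    """Hypothetical stand-in; only models setters that return self."""

    def __init__(self) -> None:
        self.settings = {}

    def set_stopping_criterion(self, criterion: str) -> "SketchBenchmark":
        self.settings["stopping_criterion"] = criterion
        return self

    def set_timeout(self, duration: float) -> "SketchBenchmark":
        self.settings["timeout"] = duration
        return self

    def set_min_samples(self, count: int) -> "SketchBenchmark":
        self.settings["min_samples"] = count
        return self


# Each call returns the same object, so configuration reads as one expression.
bench = (
    SketchBenchmark()
    .set_stopping_criterion("stdrel")
    .set_timeout(15.0)
    .set_min_samples(10)
)
```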
2025-07-30 13:37:17 -05:00
Oleksandr Pavlyk
413c4a114b
Support nvbench.State.set_throttle_threshold
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
b6821b7624
Rename NVBenchRuntimeException to NVBenchRuntimeError
...
Added exception to __init__.pyi
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
eb614ac52f
Add State.get_axis_values and State.get_axis_values_as_string
...
Add nvbench.State methods to get a Python dictionary representing
the axis values of the benchmark configuration that the state represents.
get_axis_values_as_string gives a string of space-separated
name=value pairs.
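The relation between the two return values can be sketched as below. The helper `axis_values_as_string` and the axis names are hypothetical; the real string is produced by NVBench in C++ and its exact formatting may differ.

```python
def axis_values_as_string(axis_values: dict) -> str:
    """Join axis values into space-separated name=value pairs."""
    return " ".join(f"{name}={value}" for name, value in axis_values.items())


# Illustrative axis names and values only.
text = axis_values_as_string({"Elements": 1024, "BlockSize": 256})
```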
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
5613281c2e
nvbench.State.exec validates arg to be a callable
...
Add names to method arguments to make it more self-descriptive.
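A minimal sketch of this kind of validation. `checked_exec` is a hypothetical helper, and raising `TypeError` is an assumption of the sketch rather than a statement about which exception the extension raises.

```python
def checked_exec(fn) -> None:
    """Reject non-callable arguments before attempting to run them."""
    if not callable(fn):
        raise TypeError(f"expected a callable, got {type(fn).__name__}")
    fn()


calls = []
checked_exec(lambda: calls.append("ran"))

try:
    checked_exec(42)
    message = ""
except TypeError as exc:
    message = str(exc)
```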
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
c747a19b98
Remove code setting up CUDA_MODULE_LOADING=EAGER in Python extension
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
480614e847
Add license to stub file, add comment about syncing impl and stubs
...
Add comments to both the implementation and the Python stub file
stating the need to keep them in sync. The comment in the stub file
documents the use of mypy's stubgen to generate stubs and calls for
comparing them against the current stubs. It also calls out the need to
keep docstrings and docstring examples in sync with the implementation.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
51fa07fab8
Avoid overloading get_int64_or_default as get_int64
...
Introduce get_int64_or_default method, and counterparts for
float64 and string.
Provided names for Python arguments.
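The split into two method names can be sketched as follows. `SketchState`, the argument name `default_value`, and the `KeyError` raised for a missing axis are assumptions of this sketch, not the extension's actual behavior.

```python
class SketchState:
    """Hypothetical stand-in for State; only models axis-value lookup."""

    def __init__(self, axis_values: dict) -> None:
        self._axis_values = dict(axis_values)

    def get_int64(self, name: str) -> int:
        # Strict lookup: a missing axis is an error (KeyError in this sketch).
        return int(self._axis_values[name])

    def get_int64_or_default(self, name: str, default_value: int) -> int:
        # A separate method name instead of an overload of get_int64.
        return int(self._axis_values.get(name, default_value))


state = SketchState({"Elements": 1024})
elements = state.get_int64("Elements")
stride = state.get_int64_or_default("Stride", 1)

try:
    state.get_int64("Stride")
    strict_lookup_failed = False
except KeyError:
    strict_lookup_failed = True
```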
Tried generating Python stubs automatically with
```
stubgen -m cuda.nvbench._nvbench
```
Gave up on this, since it does not include doc-strings.
It would be nice to compare auto-generated _nvbench.pyi with
__init__.pyi for discrepancies though.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
dc7f9edfd4
Support nvbench.Benchmark.add_int64_power_of_two_axis
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
13ad115ca3
Add nvbench.Benchmark.set_run_once method
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
e426368485
Correct propagating nvbench_main exceptions to Python
...
python examples/cpu_only.py --run-once -d 0 --output foo.md
used to trip a SystemError reporting that a function returned a result
with an exception set. It now raises a clean NVBenchmarkError exception.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
81fff085b9
Change method naming from camelCase to snake_case
...
This ensures names of Python API methods are consistent with those of C++
counterparts.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
11ae98389d
Replace use of py::object copy constructor with use of move constructor
...
Change explicit constructor of benchmark_wrapper_t to use move-constructor
of py::object instead of copy constructor by replacing `py::object(o)` with
`py::object(std::move(o))`.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
d3071fb038
Addressed PR feedback re: definition of benchmark_wrapper_t
...
See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183749750
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
6b4da8c5cb
add comments to body of launcher_fn lambda in State.exec method
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
aa2b4d9960
Add Benchmark.setIsCPUOnly API
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
7f9d672cec
Raise Python exception if error is encountered while executing benchmarks
...
Introduce new exception type to raise on errors that occurred while
NVBench runs benchmarks.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
8c112d529f
Include Pybind11 headers before anything else
...
See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183703828
for the rationale
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
02ad6e5490
Implement Benchmark.setName
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
9dba866426
Add State.add_summary method
...
state.add_summary(column_name: str, value: Union[int, float, str])
This is used in examples/axes.py to map integral value from Int64Axis
to string description.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
576c473481
Add implementation of and signature for State.getDevice
...
make batch/sync arguments of State.exec keyword-only
Provide default column_name value for State.addElementCount method,
so that it can be called state.addElementCount(count), or as
state.addElementCount(count, column_name="Descriptive Name")
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
6f8bcdc774
Fixed correctness of nvbench.State.getStream() method
...
Fix run-time exception:
```
Fail: Unexpected error: RuntimeError: return_value_policy = copy, but type is non-copyable! (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)
```
caused by an attempt to return a move-only `nvbench::cuda_stream` class
instance using the default `pybind11::return_value_policy::copy`.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
c49d718f65
Corrected nvbench.State.getBlockingKernel -> getBlockingKernelTimeout
...
Similar change for setBlockingKernelTimeout.
Corrected statement in a comment.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
b88cc78aeb
Add license header to py_nvbench.cpp
...
Also updated comment as to why calling
`nvbench::benchmark_manager::get().initialize()` is necessary
for running all tests.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
6552ef503c
Draft of Python API for NVBench
...
The prototype is based on pybind11 to minimize boiler-plate
code needed to deal with move-only semantics of many nvbench
classes.
2025-07-28 15:37:04 -05:00