mirror of
https://github.com/NVIDIA/nvbench.git
synced 2026-06-29 02:37:36 +00:00
* Add type annotations for future functionality
```python
class Timer:
    def start(self) -> None: ...
    def stop(self) -> None: ...
```
and overloaded `State.exec` so:
- normal mode accepts `Callable[[Launch], None]`
- `timer=True` accepts `Callable[[Launch, Timer], None]`
No implementation yet. Type annotation checked with
```
(py313) :~/repos/nvbench/python$ python -m mypy --ignore-missing-imports /tmp/check_timer.py
/tmp/check_timer.py:24: error: No overload variant of "exec" of "State" matches argument types "Callable[[Launch], None]", "bool" [call-overload]
/tmp/check_timer.py:24: note: Possible overload variants:
/tmp/check_timer.py:24: note: def exec(self, Callable[[Launch], None], /, *, batched: bool | None = ..., sync: bool | None = ..., timer: Literal[False] = ...) -> None
/tmp/check_timer.py:24: note: def exec(self, Callable[[Launch, Timer], None], /, *, timer: Literal[True], sync: bool | None = ...) -> None
/tmp/check_timer.py:25: error: Argument 1 to "exec" of "State" has incompatible type "Callable[[Launch, Timer], None]"; expected "Callable[[Launch], None]" [arg-type]
/tmp/check_timer.py:26: error: No overload variant of "exec" of "State" matches argument types "Callable[[Launch, int], None]", "bool" [call-overload]
/tmp/check_timer.py:26: note: Possible overload variants:
/tmp/check_timer.py:26: note: def exec(self, Callable[[Launch], None], /, *, batched: bool | None = ..., sync: bool | None = ..., timer: Literal[False] = ...) -> None
/tmp/check_timer.py:26: note: def exec(self, Callable[[Launch, Timer], None], /, *, timer: Literal[True], sync: bool | None = ...) -> None
Found 3 errors in 1 file (checked 1 source file)
(py313) :~/repos/nvbench/python$ nl -ba /tmp/check_timer.py
     1  # /tmp/check_nvbench_timer.py
     2  import cuda.bench as bench
     3
     4  def normal_ok(launch: bench.Launch) -> None:
     5      pass
     6
     7  def timer_ok(launch: bench.Launch, timer: bench.Timer) -> None:
     8      timer.start()
     9      timer.stop()
    10
    11  def missing_timer(launch: bench.Launch) -> None:
    12      pass
    13
    14  def extra_timer(launch: bench.Launch, timer: bench.Timer) -> None:
    15      pass
    16
    17  def wrong_timer_type(launch: bench.Launch, timer: int) -> None:
    18      pass
    19
    20  def state_bench(state: bench.State) -> None:
    21      state.exec(normal_ok)
    22      state.exec(normal_ok, timer=False)
    23      state.exec(timer_ok, timer=True)
    24      state.exec(missing_timer, timer=True)  # should fail
    25      state.exec(extra_timer)  # should fail
    26      state.exec(wrong_timer_type, timer=True)  # should fail
```
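The two overload variants listed in the mypy notes above can be sketched as a stub. This is a minimal illustration, not the package's source: `Launch`, `Timer`, and `State` here are local stand-ins, the parameter name `launch_fn` is an assumption, and the implementation body is a do-nothing placeholder.

```python
from typing import Callable, Literal, Optional, overload


class Launch:
    """Local stand-in for cuda.bench.Launch (assumption, no real behavior)."""


class Timer:
    """Local stand-in mirroring the annotated Timer interface."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


class State:
    """Local stand-in for cuda.bench.State; only the exec signature matters."""

    @overload
    def exec(
        self,
        launch_fn: Callable[[Launch], None],
        /,
        *,
        batched: Optional[bool] = ...,
        sync: Optional[bool] = ...,
        timer: Literal[False] = ...,
    ) -> None: ...

    @overload
    def exec(
        self,
        launch_fn: Callable[[Launch, Timer], None],
        /,
        *,
        timer: Literal[True],
        sync: Optional[bool] = ...,
    ) -> None: ...

    def exec(self, launch_fn, /, *, batched=None, sync=None, timer=False) -> None:
        # Placeholder body: this sketch only demonstrates the typing contract.
        return None


def normal_ok(launch: Launch) -> None:
    pass


def timer_ok(launch: Launch, timer: Timer) -> None:
    timer.start()
    timer.stop()


state = State()
state.exec(normal_ok)             # matches the first overload
state.exec(timer_ok, timer=True)  # matches the second overload
```

Because `timer` is typed as `Literal[True]` or `Literal[False]`, a type checker can select the expected callable signature from the keyword value alone, which is what produces the three errors shown in the log.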
* Implement cuda.bench.Timer object
The Timer class is not user-constructible. It exposes two nullary
methods, timer.start() and timer.stop().
An instance of Timer is provided to the launchable object
passed to State.exec when timer=True.
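The calling convention described here can be illustrated with a small stand-in. This sketch does not use `cuda.bench`: `Launch`, `Timer`, and the helper `run_once` are hypothetical local definitions, and the stand-in `Timer` only records calls instead of measuring anything.

```python
from typing import Callable, List


class Launch:
    """Hypothetical stand-in for cuda.bench.Launch."""


class Timer:
    """Hypothetical stand-in that records calls instead of timing."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def start(self) -> None:
        self.events.append("start")

    def stop(self) -> None:
        self.events.append("stop")


def run_once(launch_fn: Callable[..., None], *, timer: bool = False) -> List[str]:
    """Call launch_fn once, passing a Timer only when timer=True."""
    launch = Launch()
    t = Timer()
    if timer:
        launch_fn(launch, t)
    else:
        launch_fn(launch)
    return t.events


def timed_region(launch: Launch, timer: Timer) -> None:
    # Setup that should not be measured would go here.
    timer.start()
    # The work to be measured would go between start() and stop().
    timer.stop()


def plain(launch: Launch) -> None:
    pass


events = run_once(timed_region, timer=True)
```

With `timer=True` the launchable takes two arguments and decides where the measured region begins and ends; without it, the launchable takes only the launch object.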
* Implement support for State.exec(launch_fn, timer=True)
* Change type annotation for batched to default to None
None is interpreted as `not timer`, i.e., it effectively
defaults to `True` (as before) when timer is not set,
but defaults to `False` if `timer=True` is set.
The batched keyword type is `bool | None`.
* Implement default batched=None behavior
The API allows all three keywords, sync, batched, and timer,
to be specified. batched is None by default, interpreted at
run time as `(not timer)`.
* Update tests for new behavior of batched/timer combination
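The default rule stated above can be written as a one-line helper. `resolve_batched` is a hypothetical name used only for illustration; the package may implement the rule differently.

```python
from typing import Optional


def resolve_batched(batched: Optional[bool], timer: bool) -> bool:
    """Return the effective batched flag: None means `not timer`."""
    return (not timer) if batched is None else batched


# Without a timer, batched keeps its previous default of True.
default_without_timer = resolve_batched(None, timer=False)
# With timer=True, the default flips to False.
default_with_timer = resolve_batched(None, timer=True)
# An explicit value is returned unchanged by this helper.
explicit = resolve_batched(True, timer=True)
```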
* Add python/examples/exec_tag_timer.py
* Expand Timer class and methods docstrings
* Reworked python/examples/exec_tag_timer.py to align with C++ example.
* Replace ::cuda::std::name with cuda::std::name
* Resolve review feedback
CUDA Kernel Benchmarking Package
This package provides a Python API to the CUDA Kernel Benchmarking
Library NVBench.
Installation
Install from PyPI
pip install cuda-bench[cu13] # For CUDA 13.x
pip install cuda-bench[cu12] # For CUDA 12.x
Building from source
Ensure a recent version of CMake
NVBench requires a recent version of CMake (>=3.30.4). Either build CMake from source or create a conda environment that provides a recent CMake:
conda create -n build_env --yes cmake ninja
conda activate build_env
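To confirm that the active environment meets the CMake requirement before building, a small check along these lines may help. It is a sketch, not part of the package: the helper names are hypothetical, and it assumes `cmake --version` prints a first line of the form `cmake version X.Y.Z`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Minimum version taken from the requirement stated above.
MIN_CMAKE: Version = (3, 30, 4)


def parse_cmake_version(text: str) -> Optional[Version]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    m = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


def installed_cmake_version() -> Optional[Version]:
    """Return the version of the cmake found on PATH, or None if absent."""
    exe = shutil.which("cmake")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    return parse_cmake_version(result.stdout)


def cmake_is_recent_enough(version: Optional[Version]) -> bool:
    """True when a version was found and it is at least MIN_CMAKE."""
    return version is not None and version >= MIN_CMAKE
```

Calling `cmake_is_recent_enough(installed_cmake_version())` inside the activated environment returns `False` if CMake is missing or older than 3.30.4.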
Ensure a CUDA compiler is available
Building the NVBench library requires a CUDA compiler, so ensure that the appropriate environment variables
are set. For example, assuming the CUDA toolkit is installed system-wide and the GPU has the Ampere architecture:
export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86
Build Python project
Now switch to the python folder, then configure and install the NVBench library and install the package in editable mode:
cd nvbench/python
pip install -e .
Verify that the package works
python test/run_1.py
Run examples
# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py