Files
nvbench/python
Oleksandr Pavlyk dd683850f4 Addressed both issues raised in review
Malformed values are now represented in result as None.

Skipped benchmarks are no longer dropped, i.e., they are present
in BenchmarkResult data, but they are not reflected in summary
table in line with what NVBench-instrumented benchmarks do.
2026-05-13 12:35:09 -05:00
..
2025-07-28 15:37:04 -05:00
2026-02-02 16:03:15 -06:00
2026-01-30 09:32:44 -06:00

CUDA Kernel Benchmarking Package

This package provides a Python API to the CUDA Kernel Benchmarking Library NVBench.

Installation

Install from PyPi

pip install cuda-bench[cu13]  # For CUDA 13.x
pip install cuda-bench[cu12]  # For CUDA 12.x

Building from source

Ensure recent version of CMake

Since nvbench requires a rather new version of CMake (>=3.30.4), either build CMake from sources, or create a conda environment with a recent version of CMake, using

conda create -n build_env --yes  cmake ninja
conda activate build_env

Ensure CUDA compiler

Since building NVBench library requires CUDA compiler, ensure that appropriate environment variables are set. For example, assuming CUDA toolkit is installed system-wide, and assuming Ampere GPU architecture:

export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86

Build Python project

Now switch to python folder, configure and install NVBench library, and install the package in editable mode:

cd nvbench/python
pip install -e .

Verify that package works

python test/run_1.py

Run examples

# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py