mirror of https://github.com/NVIDIA/nvbench.git synced 2026-06-10 00:08:27 +00:00

Files

Oleksandr Pavlyk 7e9a9a8983 Replace main_arg_run_benchmarks with run_interriptible

This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.

2025-12-08 14:29:27 -06:00

cuda/bench

cuda.nvbench -> cuda.bench

2025-08-04 13:42:43 -05:00

examples

Revert "Scripts to triage 284"

2025-12-08 11:53:08 -06:00

src

Replace main_arg_run_benchmarks with run_interriptible

2025-12-08 14:29:27 -06:00

test

cuda.nvbench -> cuda.bench

2025-08-04 13:42:43 -05:00

.gitignore

Draft of Python API for NVBench

2025-07-28 15:37:04 -05:00

CMakeLists.txt

Use pybind11==3.0.1, do not use pybind11_add_module

2025-12-05 19:38:11 -06:00

pyproject.toml

Initial wheel build and publishing infrastructure

2025-12-03 10:15:32 -05:00

README.md

Update README.md

2025-11-14 14:34:18 -08:00

README.md

CUDA Kernel Benchmarking Package

This package provides Python API to CUDA Kernel Benchmarking Library NVBench.

Building

Ensure recent version of CMake

Since nvbench requires a rather new version of CMake (>=3.30.4), either build CMake from sources, or create a conda environment with a recent version of CMake, using

conda create -n build_env --yes  cmake ninja
conda activate build_env

Ensure CUDA compiler

Since building NVBench library requires CUDA compiler, ensure that appropriate environment variables are set. For example, assuming CUDA toolkit is installed system-wide, and assuming Ampere GPU architecture:

export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86

Build Python project

Now switch to python folder, configure and install NVBench library, and install the package in editable mode:

cd nvbench/python
pip install -e .

Verify that package works

python test/run_1.py

Run examples

# Example benchmarking numba.cuda kernel
python examples/throughput.py

# Example benchmarking kernels authored using cuda.core
python examples/axes.py

# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py

# Example benchmarking CuPy function
python examples/cupy_extract.py