mirror of https://github.com/NVIDIA/nvbench.git synced 2026-07-01 11:47:33 +00:00

Files

Oleksandr Pavlyk df15de4b7a Treat unusable bulk cycle data as ambiguous

When bulk sample or frequency sources are present, do not silently fall
back to summary SM clock confirmation if the bulk cycle data cannot be
used. Report the clear-gap decision as AMBG with a
bulk_cycle_data_unusable reason instead.

Still allow summary-clock fallback when no bulk sample/frequency sources
are present.

Also update the Unknown summary label to describe the broader set of
input-data failures now counted as UNKNOWN.

2026-06-30 06:40:44 -05:00

cuda/bench

Fix docutil error when building docs (#365 )

2026-05-18 10:57:19 -05:00

examples

Implement Timer, and support State.exec(fn, timer=True) (#364 )

2026-05-15 10:19:40 -05:00

scripts

Treat unusable bulk cycle data as ambiguous

2026-06-30 06:40:44 -05:00

src

Add python api for cold warmup parameters (#363 )

2026-05-18 10:56:44 -05:00

test

Explicitly handle unavailable timings in nvbench-compare

2026-06-30 06:40:44 -05:00

.gitignore

Draft of Python API for NVBench

2025-07-28 15:37:04 -05:00

CMakeLists.txt

Disable CUPTI in cmake file

2026-02-02 16:03:15 -06:00

pyproject.toml

Provide BenchmarkResult class for parsing JSON output of NVBench-instrumented benchmarks (#356 )

2026-05-13 13:23:58 -05:00

README.md

Add installation instructions

2026-01-30 09:32:44 -06:00

README.md

CUDA Kernel Benchmarking Package

This package provides a Python API to the CUDA Kernel Benchmarking Library NVBench.

Installation

Install from PyPi

pip install cuda-bench[cu13]  # For CUDA 13.x
pip install cuda-bench[cu12]  # For CUDA 12.x

Building from source

Ensure recent version of CMake

Since nvbench requires a rather new version of CMake (>=3.30.4), either build CMake from sources, or create a conda environment with a recent version of CMake, using

conda create -n build_env --yes  cmake ninja
conda activate build_env

Ensure CUDA compiler

Since building NVBench library requires CUDA compiler, ensure that appropriate environment variables are set. For example, assuming CUDA toolkit is installed system-wide, and assuming Ampere GPU architecture:

export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86

Build Python project

Now switch to python folder, configure and install NVBench library, and install the package in editable mode:

cd nvbench/python
pip install -e .

Verify that package works

python test/run_1.py

Run examples

# Example benchmarking numba.cuda kernel
python examples/throughput.py

# Example benchmarking kernels authored using cuda.core
python examples/axes.py

# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py

# Example benchmarking CuPy function
python examples/cupy_extract.py