CUDA Kernel Benchmarking Package

This package provides a Python API to NVBench, the CUDA Kernel Benchmarking Library.

Installation

Install from PyPI

pip install cuda-bench[cu13]  # For CUDA 13.x
pip install cuda-bench[cu12]  # For CUDA 12.x
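
To confirm that pip actually installed the distribution, a quick check with the standard library can help. The sketch below only assumes the distribution name `cuda-bench` taken from the commands above; the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "cuda-bench" is the distribution name used in the pip commands above.
    version = installed_version("cuda-bench")
    print("cuda-bench:", version if version else "not installed")
```

This queries installed package metadata only; it does not import the package or touch the GPU.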

Building from source

Ensure a recent version of CMake

NVBench requires a recent version of CMake (>=3.30.4). Either build CMake from source or create a conda environment that provides a recent CMake:

conda create -n build_env --yes  cmake ninja
conda activate build_env
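
Before building, it may be worth checking that the `cmake` found on `PATH` meets the stated minimum. The following is a minimal sketch, assuming the usual `cmake version X.Y.Z` first line printed by `cmake --version`; the helper names are illustrative.

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 30, 4)  # minimum version stated in this README


def parse_cmake_version(text: str):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("could not find a CMake version in the given text")
    return tuple(int(part) for part in match.groups())


def cmake_is_recent_enough(version, minimum=MIN_CMAKE) -> bool:
    """Compare version tuples element by element."""
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    exe = shutil.which("cmake")
    if exe is None:
        print("cmake not found on PATH")
    else:
        out = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, check=True
        ).stdout
        version = parse_cmake_version(out)
        status = "OK" if cmake_is_recent_enough(version) else "too old"
        print("cmake", ".".join(map(str, version)), status)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically (for example, "3.4" sorting after "3.30").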

Ensure a CUDA compiler is available

Building the NVBench library requires a CUDA compiler, so ensure that the appropriate environment variables are set. For example, assuming the CUDA toolkit is installed system-wide and the target is an Ampere GPU (compute capability 8.6):

export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86
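
A small pre-flight check can catch a missing or mistyped variable before a long build starts. This is a sketch only: it inspects the two variables named above and makes no claim about what else the build may read; `check_cuda_build_env` is a hypothetical helper.

```python
import os


def check_cuda_build_env(env):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    cudacxx = env.get("CUDACXX")
    if not cudacxx:
        problems.append("CUDACXX is not set")
    elif not (os.path.isfile(cudacxx) and os.access(cudacxx, os.X_OK)):
        problems.append(f"CUDACXX={cudacxx!r} is not an executable file")

    # CUDAARCHS is optional for CMake, but this README sets it explicitly.
    if not env.get("CUDAARCHS"):
        problems.append("CUDAARCHS is not set")

    return problems


if __name__ == "__main__":
    issues = check_cuda_build_env(os.environ)
    print("\n".join(issues) if issues else "CUDA build environment looks consistent")
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary.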

Build Python project

Now switch to the python folder, then configure and install the NVBench library and install the package in editable mode:

cd nvbench/python
pip install -e .

Verify that the package works

python test/run_1.py

Run examples

# Example benchmarking a numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking a CuPy function
python examples/cupy_extract.py
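
To run all of the examples above in one go and see which ones fail, a small driver script is one option. This is a sketch that assumes it is started from the python folder and that each example's optional dependency (numba, cuda.core, cuda.cccl, CuPy) is installed; the helpers `build_commands` and `run_all` are hypothetical.

```python
import subprocess
import sys

# Script paths exactly as listed in this README, relative to the python folder.
EXAMPLES = [
    "examples/throughput.py",
    "examples/axes.py",
    "examples/cccl_parallel_segmented_reduce.py",
    "examples/cupy_extract.py",
]


def build_commands(scripts=EXAMPLES, python=sys.executable):
    """Build one argv list per example script."""
    return [[python, script] for script in scripts]


def run_all(root="."):
    """Run each example with `root` as working directory; return failing scripts."""
    failures = []
    for cmd in build_commands():
        result = subprocess.run(cmd, cwd=root)
        if result.returncode != 0:
            failures.append(cmd[1])
    return failures


if __name__ == "__main__":
    failed = run_all()
    print("all examples passed" if not failed else f"failed: {failed}")
    sys.exit(1 if failed else 0)
```

Using `sys.executable` makes the examples run under the same interpreter, and therefore the same environment, as the driver.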