mirror of https://github.com/NVIDIA/nvbench.git synced 2026-06-30 19:27:34 +00:00

Files

Oleksandr Pavlyk db4db61596 Lazy-load nvbench-compare bulk timing data

Store JSON-bin sample time and frequency metadata in GpuTimingData instead of
reading the binary files during summary extraction.

Add Float32BinarySource and lazy cached accessors for samples and frequencies.
Use np.fromfile by default, but allow tests and alternate callers to inject a
float32 reader returning any buffer-compatible object convertible to the "<f4"
data type.

Treat optional bulk-data failures as unavailable evidence instead of aborting
comparison: unreadable files, invalid buffers, count mismatches, and mismatched
sample/frequency metadata now emit RuntimeWarning and return None.

Update nvbench_compare tests to verify lazy loading, cache reuse, injected
reader behavior, warning-based degradation, and count mismatch handling.

2026-06-28 09:40:54 -05:00
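
The lazy-loading and warning-based degradation described in this commit message can be illustrated with a rough sketch. This is not the code from the commit: the class name LazyFloat32Source, its constructor arguments, and the default reader are hypothetical, and the standard-library struct module stands in for np.fromfile so the example has no dependencies.

```python
import struct
import warnings
from typing import Callable, List, Optional, Sequence


def read_float32_file(path: str) -> Sequence[float]:
    """Default reader (assumption): decode a file of little-endian float32 values."""
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) % 4 != 0:
        raise ValueError(f"{path}: size {len(raw)} is not a multiple of 4 bytes")
    return struct.unpack(f"<{len(raw) // 4}f", raw)


class LazyFloat32Source:
    """Hypothetical stand-in for a lazily loaded, cached bulk-data source."""

    def __init__(
        self,
        path: str,
        expected_count: int,
        reader: Callable[[str], Sequence[float]] = read_float32_file,
    ) -> None:
        self.path = path
        self.expected_count = expected_count
        self._reader = reader  # injectable, e.g. by tests
        self._loaded = False
        self._values: Optional[List[float]] = None

    def values(self) -> Optional[List[float]]:
        """Read the file on first access only; later calls reuse the cached result."""
        if not self._loaded:
            self._values = self._load()
            self._loaded = True
        return self._values

    def _load(self) -> Optional[List[float]]:
        # Any failure is reported as a warning and treated as "data unavailable".
        try:
            data = list(self._reader(self.path))
        except (OSError, ValueError, TypeError, struct.error) as exc:
            warnings.warn(f"cannot read {self.path}: {exc}", RuntimeWarning)
            return None
        if len(data) != self.expected_count:
            warnings.warn(
                f"{self.path}: expected {self.expected_count} values, got {len(data)}",
                RuntimeWarning,
            )
            return None
        return data
```

In this sketch a failed load is cached as well, so the warning is emitted once per source rather than on every access.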

cuda/bench

Fix docutil error when building docs (#365)

2026-05-18 10:57:19 -05:00

examples

Implement Timer, and support State.exec(fn, timer=True) (#364)

2026-05-15 10:19:40 -05:00

scripts

Lazy-load nvbench-compare bulk timing data

2026-06-28 09:40:54 -05:00

src

Add python api for cold warmup parameters (#363)

2026-05-18 10:56:44 -05:00

test

Lazy-load nvbench-compare bulk timing data

2026-06-28 09:40:54 -05:00

.gitignore

Draft of Python API for NVBench

2025-07-28 15:37:04 -05:00

CMakeLists.txt

Disable CUPTI in cmake file

2026-02-02 16:03:15 -06:00

pyproject.toml

Provide BenchmarkResult class for parsing JSON output of NVBench-instrumented benchmarks (#356)

2026-05-13 13:23:58 -05:00

README.md

Add installation instructions

2026-01-30 09:32:44 -06:00

README.md

CUDA Kernel Benchmarking Package

This package provides a Python API to the CUDA Kernel Benchmarking Library NVBench.

Installation

Install from PyPI

pip install cuda-bench[cu13]  # For CUDA 13.x
pip install cuda-bench[cu12]  # For CUDA 12.x

Building from source

Ensure a recent version of CMake

NVBench requires a recent version of CMake (>=3.30.4). Either build CMake from source, or create a conda environment that provides a recent CMake:

conda create -n build_env --yes cmake ninja
conda activate build_env
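
To confirm that the activated environment meets the CMake requirement, a small check like the following can help. It is a minimal sketch: the helper names are made up for illustration, and it assumes that `cmake --version` prints a first line of the form `cmake version X.Y.Z`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CMAKE = (3, 30, 4)  # minimum version stated in this README


def parse_cmake_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def cmake_is_recent_enough(output: str, minimum: Tuple[int, ...] = MIN_CMAKE) -> bool:
    """True if the reported version could be parsed and is at least `minimum`."""
    version = parse_cmake_version(output)
    return version is not None and version >= minimum


def installed_cmake_output() -> Optional[str]:
    """Return the output of `cmake --version`, or None if cmake is not on PATH."""
    exe = shutil.which("cmake")
    if exe is None:
        return None
    return subprocess.run([exe, "--version"], capture_output=True, text=True).stdout


if __name__ == "__main__":
    output = installed_cmake_output()
    if output is None:
        print("cmake was not found on PATH")
    elif cmake_is_recent_enough(output):
        print("cmake is recent enough")
    else:
        print("cmake is older than", ".".join(map(str, MIN_CMAKE)))
```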

Ensure a CUDA compiler is available

Building the NVBench library requires a CUDA compiler, so ensure that the appropriate environment variables are set. For example, assuming the CUDA toolkit is installed system-wide and the target is an Ampere GPU architecture:

export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86
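
Before starting the build, it may be useful to sanity-check these two variables. The sketch below is illustrative only: the function name is hypothetical, and it checks just the two variables set above. An unset CUDACXX is not necessarily an error, since CMake can also locate nvcc on PATH.

```python
import os
from typing import List, Mapping


def check_cuda_build_env(environ: Mapping[str, str]) -> List[str]:
    """Return human-readable findings; an empty list means nothing obvious is wrong."""
    problems = []

    # CUDACXX should name the CUDA compiler executable (nvcc in the example above).
    compiler = environ.get("CUDACXX", "")
    if not compiler:
        problems.append("CUDACXX is not set")
    elif not os.path.isfile(compiler):
        problems.append(f"CUDACXX points to a missing file: {compiler}")

    # CUDAARCHS selects the target GPU architectures (86 in the example above).
    if not environ.get("CUDAARCHS", ""):
        problems.append("CUDAARCHS is not set")

    return problems


if __name__ == "__main__":
    findings = check_cuda_build_env(os.environ)
    print("\n".join(findings) if findings else "CUDACXX and CUDAARCHS look reasonable")
```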

Build Python project

Now switch to the python folder, configure and install the NVBench library, and install the package in editable mode:

cd nvbench/python
pip install -e .

Verify that the package works

python test/run_1.py
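
As an additional quick check, the installed distribution can be looked up through the standard library. This is a minimal sketch that assumes the distribution is registered under the name cuda-bench, as in the pip commands above; the helper name is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "cuda-bench" is assumed from the pip install commands in this README.
    version = installed_version("cuda-bench")
    if version is None:
        print("cuda-bench is not installed in this environment")
    else:
        print("cuda-bench", version)
```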

Run examples

# Example benchmarking a numba.cuda kernel
python examples/throughput.py

# Example benchmarking kernels authored using cuda.core
python examples/axes.py

# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py

# Example benchmarking a CuPy function
python examples/cupy_extract.py