Files
Oleksandr Pavlyk 807917cf54 Add scaffolding to build C++/Python docs
Add sphinx-combined folder that builds combined C++ & Python docs

Fixed relative text alignment in docstrings to fix autodoc warnigns

Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions
to start with underscore, signaling that these functions are internal and should
not be documented

Account for test_cpp_exceptions -> _test_cpp_exception, same for *_py_*

Fix cpp_benchmarks, add py_benchmarks

1. Fixed xrefs in docs/sphinx-combined/cpp_benchmarks.md, which is built on top of
   docs/benchmarks.md

   Added level-1 heading, and pushed existing headings one level down.

2. Added py_benchmarks.md to document benchmarking of Python scripts.

3. Rearranged entries in index.rst so that overview documents come before
   API enumeration.

Make sure to reset __module__ of reexported symbols to be cuda.bench

Enumerate free functions in nvbench:: namespace

Tweak to index.rst intro sentence and title

Changed title, fixed references, added intro borrowed from README

Fix punctuation in one of the itemlist item text

Hide TOC from the index page. It is too long and confusing
2026-03-24 16:21:39 -05:00
..
2026-02-28 01:19:18 +01:00
2025-07-28 15:37:04 -05:00
2026-02-02 16:03:15 -06:00
2026-01-30 09:32:44 -06:00

CUDA Kernel Benchmarking Package

This package provides a Python API to the CUDA Kernel Benchmarking Library NVBench.

Installation

Install from PyPi

pip install cuda-bench[cu13]  # For CUDA 13.x
pip install cuda-bench[cu12]  # For CUDA 12.x

Building from source

Ensure recent version of CMake

Since nvbench requires a rather new version of CMake (>=3.30.4), either build CMake from sources, or create a conda environment with a recent version of CMake, using

conda create -n build_env --yes  cmake ninja
conda activate build_env

Ensure CUDA compiler

Since building NVBench library requires CUDA compiler, ensure that appropriate environment variables are set. For example, assuming CUDA toolkit is installed system-wide, and assuming Ampere GPU architecture:

export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86

Build Python project

Now switch to python folder, configure and install NVBench library, and install the package in editable mode:

cd nvbench/python
pip install -e .

Verify that package works

python test/run_1.py

Run examples

# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py