Files
nvbench/python
Oleksandr Pavlyk 51fa07fab8 Avoid overloading get_int64_or_default as get_int64
Introduce get_int64_or_default method, and counterparts for
float64 and string.

Provided names for Python arguments.

Tried generating Python stubs automatically with

```
stubgen -m cuda.nvbench._nvbench
```

Gave up on this, since it does not include doc-strings.
It would be nice to compare auto-generated _nvbench.pyi with
__init__.pyi for discrepancies though.
2025-07-28 15:37:05 -05:00
..
2025-07-28 15:37:05 -05:00
2025-07-28 15:37:04 -05:00
2025-07-28 15:37:04 -05:00

CUDA Kernel Benchmarking Package

This package provides Python API to CUDA Kernel Benchmarking Library NVBench.

Building

Build NVBench project

Since nvbench requires a rather new version of CMake (>=3.30.4), either build CMake from sources, or create a conda environment with a recent version of CMake, using

conda create -n build_env --yes  cmake ninja
conda activate build_env

Now switch to python folder, configure and install NVBench library, and install the package in editable mode:

cd nvbench/python
cmake -B nvbench_build --preset nvbench-ci -S $(pwd)/.. -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DNVBench_ENABLE_EXAMPLES=OFF -DCMAKE_INSTALL_PREFIX=$(pwd)/nvbench_install
cmake --build nvbench_build/ --config Release --target install

nvbench_DIR=$(pwd)/nvbench_install/lib/cmake CUDACXX=/usr/local/cuda/bin/nvcc pip install -e .

Verify that package works

python test/run_1.py

Run examples

# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py