CUDA Kernel Benchmarking Package
This package provides a Python API to NVBench, the CUDA Kernel Benchmarking Library.
Building
Build NVBench project
NVBench requires a recent version of CMake (>=3.30.4). Either build CMake from source, or create a conda environment with a recent version of CMake:
conda create -n build_env --yes cmake ninja
conda activate build_env
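Before configuring, it can help to confirm that the CMake on your PATH meets the >=3.30.4 requirement. The following is a minimal sketch, not part of NVBench: `version_ge` is a hypothetical helper name, and the script assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD).

```shell
# Hypothetical helper (not part of NVBench): succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.30.4"

if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` has the form "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "$found" "$required"; then
        echo "CMake $found satisfies >= $required"
    else
        echo "CMake $found is older than $required"
    fi
else
    echo "cmake not found on PATH"
fi
```

If the check reports an older version, activate the conda environment created above and run it again.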
Now switch to the python folder, then configure, build, and locally install the NVBench library:
cd nvbench/python
cmake -B nvbench_build --preset nvbench-ci -S $(pwd)/.. -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DNVBench_ENABLE_EXAMPLES=OFF -DCMAKE_INSTALL_PREFIX=$(pwd)/nvbench_install
cmake --build nvbench_build/ --config Release --target install
Build Python extension
Specify the location of the local NVBench installation and perform an editable pip install:
nvbench_DIR=$(pwd)/nvbench_install/lib/cmake CUDACXX=/usr/local/cuda/bin/nvcc pip install -e .
Note that CUDACXX must be set for the NVBench CMake script to work, but the Python extension itself only uses the host compiler.
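If the pip install fails early, the usual suspects are the two paths passed through nvbench_DIR and CUDACXX. The sketch below checks both before you retry. `require_dir` and `require_exe` are hypothetical helper names, and the paths are copied from the commands above on the assumption that you run the script from the python folder and that CUDA lives under /usr/local/cuda.

```shell
# Hypothetical pre-flight helpers; not part of NVBench.
# Each returns 0 when the path is usable and prints a diagnostic to stderr otherwise.
require_dir() {
    if [ -d "$1" ]; then
        return 0
    fi
    echo "missing directory: $1" >&2
    return 1
}

require_exe() {
    if [ -x "$1" ]; then
        return 0
    fi
    echo "missing executable: $1" >&2
    return 1
}

# Paths taken from the build commands above; adjust them if your layout differs.
nvbench_cmake_dir="$(pwd)/nvbench_install/lib/cmake"
nvcc_path="/usr/local/cuda/bin/nvcc"

if require_dir "$nvbench_cmake_dir" && require_exe "$nvcc_path"; then
    echo "ready: both paths exist, run the pip install command"
else
    echo "not ready: fix the path reported above"
fi
```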
Verify that the package works
python test/run_1.py
Run examples
# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py