mirror of
https://github.com/NVIDIA/nvbench.git
synced 2026-06-29 18:57:44 +00:00
Add versioned TOML configuration support for nvbench-compare threshold
settings. The new --config option reads grouped settings for clear-gap,
same-result, bulk coverage, and rare-support filtering thresholds. The parser
validates the schema strictly so unknown tables, unknown keys, invalid types,
unsupported versions, and out-of-range values fail early.
Add --dump-config to print the effective configuration without requiring input
JSON files. This makes the currently selected preset and resolved threshold
values discoverable and gives users a starting point for custom configuration.
Preset resolution is:
- default is used when neither TOML nor CLI selects a preset
- [preset] name = "..." in TOML selects the base preset
- --preset ... overrides the TOML preset selection
- explicit threshold values in TOML override whichever base preset was selected
For example:
- nvbench-compare --dump-config
Prints the built-in default settings as grouped TOML.
- nvbench-compare --preset permissive --dump-config
Prints the permissive preset values as TOML.
- nvbench-compare --config compare.toml ref.json cmp.json
Compares using the preset named in compare.toml, plus any explicit TOML
threshold overrides.
- nvbench-compare --config compare.toml --preset strict ref.json cmp.json
Uses the strict preset as the base, while preserving explicit threshold
overrides from compare.toml.
Keep TOML parsing lazy: Python 3.11+ uses tomllib, while Python 3.10 only
requires tomli when --config is used. Add focused tests for grouped config
dumping, strict validation, preset/override precedence, and CLI dump behavior.
CUDA Kernel Benchmarking Package
This package provides a Python API to the CUDA Kernel Benchmarking
Library NVBench.
Installation
Install from PyPi
pip install cuda-bench[cu13] # For CUDA 13.x
pip install cuda-bench[cu12] # For CUDA 12.x
Building from source
Ensure recent version of CMake
Since nvbench requires a rather new version of CMake (>=3.30.4), either build CMake from sources, or create a conda environment with a recent version of CMake, using
conda create -n build_env --yes cmake ninja
conda activate build_env
Ensure CUDA compiler
Since building NVBench library requires CUDA compiler, ensure that appropriate environment variables
are set. For example, assuming CUDA toolkit is installed system-wide, and assuming Ampere GPU architecture:
export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86
Build Python project
Now switch to python folder, configure and install NVBench library, and install the package in editable mode:
cd nvbench/python
pip install -e .
Verify that package works
python test/run_1.py
Run examples
# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py