mirror of
https://github.com/NVIDIA/nvbench.git
synced 2026-05-12 09:15:47 +00:00
1. For JSON files that contains repeated measurements of run-time axis values, make sure that scripts compares corresponding reference entries. If cmp had two states with the same name and ref had two, we would compare measurements for each state in cmp against the first state in ref. Change here introduces counters tracking how many times each particular axis value, and retrieve corresponding entry in ref. Previously, I had ``` | BlockSize | NumBlocks | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------| | 2^8 | 64 | 1.776 ms | 0.46% | 1.777 ms | 0.40% | 1.024 us | 0.06% | SAME | | 2^8 | 64 | 1.776 ms | 0.46% | 1.774 ms | 0.52% | -2.048 us | -0.12% | SAME | | 2^8 | 64 | 1.776 ms | 0.46% | 1.773 ms | 0.52% | -3.072 us | -0.17% | SAME | | 2^8 | 64 | 1.776 ms | 0.46% | 1.774 ms | 0.58% | -2.048 us | -0.12% | SAME | | 2^8 | 64 | 1.776 ms | 0.46% | 1.773 ms | 0.58% | -3.072 us | -0.17% | SAME | ``` and now it becomes ``` | BlockSize | NumBlocks | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------| | 2^8 | 64 | 1.776 ms | 0.46% | 1.777 ms | 0.40% | 1.024 us | 0.06% | SAME | | 2^8 | 64 | 1.773 ms | 0.64% | 1.774 ms | 0.52% | 1.024 us | 0.06% | SAME | | 2^8 | 64 | 1.774 ms | 0.46% | 1.773 ms | 0.52% | -1.024 us | -0.06% | SAME | | 2^8 | 64 | 1.773 ms | 0.46% | 1.774 ms | 0.58% | 1.024 us | 0.06% | SAME | | 2^8 | 64 | 1.774 ms | 0.52% | 1.773 ms | 0.58% | -1.024 us | -0.06% | SAME | ``` With the following raw data expected ``` (py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' base.json "0.0017756160497665405" "0.0017725440263748169" "0.001773568034172058" "0.0017725440263748169" "0.001773568034172058" (py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' test.json "0.0017766400575637818" "0.001773568034172058" "0.0017725440263748169" "0.001773568034172058" "0.0017725440263748169" ``` 2. nvbench_compare changes from using min_noise = min(ref_noise, cmp_noise) to using max_noise = max(ref_noise, cmp_noise) Using larger of ref and cmp noise level as a reference against which to gauge timing difference ratio makes more sense.
CUDA Kernel Benchmarking Package
This package provides a Python API to the CUDA Kernel Benchmarking
Library NVBench.
Installation
Install from PyPi
pip install cuda-bench[cu13] # For CUDA 13.x
pip install cuda-bench[cu12] # For CUDA 12.x
Building from source
Ensure recent version of CMake
Since nvbench requires a rather new version of CMake (>=3.30.4), either build CMake from sources, or create a conda environment with a recent version of CMake, using
conda create -n build_env --yes cmake ninja
conda activate build_env
Ensure CUDA compiler
Since building NVBench library requires CUDA compiler, ensure that appropriate environment variables
are set. For example, assuming CUDA toolkit is installed system-wide, and assuming Ampere GPU architecture:
export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=86
Build Python project
Now switch to python folder, configure and install NVBench library, and install the package in editable mode:
cd nvbench/python
pip install -e .
Verify that package works
python test/run_1.py
Run examples
# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py