mirror of
https://github.com/NVIDIA/nvbench.git
synced 2026-03-14 20:27:24 +00:00
Revamp README, split into multiple files. Add docs on CLI. Add `--help` and `--help-axis`.
78 lines
3.0 KiB
Markdown
78 lines
3.0 KiB
Markdown
# Overview
|
|
|
|
This project is a work-in-progress. Everything is subject to change.
|
|
|
|
NVBench is a C++17 library designed to simplify CUDA kernel benchmarking. It
|
|
features:
|
|
|
|
* [Parameter sweeps](docs/benchmarks.md#parameter-axes): a powerful and
|
|
flexible "axis" system explores a kernel's configuration space. Parameters may
|
|
be dynamic numbers/strings or [static types](docs/benchmarks.md#type-axes).
|
|
* [Runtime customization](docs/cli_help.md): A rich command-line interface
|
|
allows [redefinition of parameter axes](docs/cli_help_axis.md), CUDA device
|
|
selection, changing output formats, and more.
|
|
* [Throughput calculations](docs/benchmarks.md#throughput-measurements): Compute
|
|
and report:
|
|
* Item throughput (elements/second)
|
|
* Global memory bandwidth usage (bytes/second and per-device %-of-peak-bw)
|
|
* Multiple output formats: Currently supports markdown (default) and CSV output.
|
|
* [Manual timer mode](docs/benchmarks.md#explicit-timer-mode-nvbenchexec_tagtimer):
|
|
(optional) Explicitly start/stop timing in a benchmark implementation.
|
|
* Multiple measurement types:
|
|
* Cold Measurements:
|
|
* Each sample runs the benchmark once with a clean device L2 cache.
|
|
* GPU and CPU times are reported.
|
|
* Batch Measurements:
|
|
* Executes the benchmark multiple times back-to-back and records total time.
|
|
* Reports the average execution time (total time / number of executions).
|
|
|
|
# Getting Started
|
|
|
|
## Minimal Benchmark
|
|
|
|
A basic kernel benchmark can be created with just a few lines of CUDA C++:
|
|
|
|
```cpp
|
|
void my_benchmark(nvbench::state& state) {
|
|
state.exec([](nvbench::launch& launch) {
|
|
my_kernel<<<num_blocks, 256, 0, launch.get_stream()>>>();
|
|
});
|
|
}
|
|
NVBENCH_BENCH(my_benchmark);
|
|
```
|
|
|
|
See [Benchmarks](docs/benchmarks.md) for information on customizing benchmarks
|
|
and implementing parameter sweeps.
|
|
|
|
## Command Line Interface
|
|
|
|
Each benchmark executable produced by NVBench provides a rich set of
|
|
command-line options for configuring benchmark execution at runtime. See the
|
|
[CLI overview](docs/cli_help.md)
|
|
and [CLI axis specification](docs/cli_help_axis.md) for more information.
|
|
|
|
## Example Sandbox Project
|
|
|
|
To get started with NVBench, consider trying out
|
|
the [NVBench Demo Project](https://github.com/allisonvacanti/nvbench_demo). This
|
|
repository contains a simple CMake project that uses NVBench to build an example
|
|
benchmark. It's a great way to experiment with the library without a lot of
|
|
investment.
|
|
|
|
# License
|
|
|
|
NVBench is released under the Apache 2.0 License with LLVM exceptions.
|
|
See [LICENSE](./LICENSE).
|
|
|
|
# Scope and Related Projects
|
|
|
|
NVBench will measure the CPU and CUDA GPU execution time of a ***single
|
|
host-side critical region*** per benchmark. It is intended for regression
|
|
testing and parameter tuning of individual kernels. For in-depth analysis of
|
|
end-to-end performance of multiple applications, the NVIDIA Nsight tools are
|
|
more appropriate.
|
|
|
|
NVBench is focused on evaluating the performance of CUDA kernels and is not
|
|
optimized for CPU microbenchmarks. This may change in the future, but for now,
|
|
consider using Google Benchmark for high resolution CPU benchmarks.
|