Yunsong Wang
2025-09-29 12:22:07 -07:00
parent df7abef849
commit 8af0aa38d5

@@ -18,13 +18,13 @@ Let's begin with a simple example for users who are new to NVBench and want to
```cpp
void sequence_bench(nvbench::state& state) {
auto data = thrust::device_vector<int>(10);
-state.exec([](nvbench::launch& launch) {
+state.exec([](nvbench::launch&) {
thrust::sequence(data.begin(), data.end());
});
}
NVBENCH_BENCH(sequence_bench);
```
-Will this code work as-is? Depending on the build system configuration, compilation may succeed but generate warnings indicating that `launch` is an unused parameter. The code may or may not execute correctly. This often occurs when users, accustomed to a sequential programming mindset, overlook the fact that GPU architectures are highly parallel. Proper use of streams and synchronization is essential for accurately measuring performance in benchmark code.
+Will this code run correctly as written? While it may compile successfully, runtime behavior isn't guaranteed. This is a common pitfall for developers used to sequential programming, who may overlook the massively parallel nature of GPU architectures. To ensure accurate performance measurement in benchmark code, proper use of streams and synchronization is crucial.
A common mistake in this context is neglecting stream specification: NVBench requires knowledge of the exact CUDA stream being targeted to correctly trace kernel execution and measure performance. Therefore, users must explicitly provide the stream to be benchmarked. For example, passing the NVBench launch stream ensures correct execution and accurate measurement: