mirror of https://github.com/NVIDIA/nvbench.git (synced 2026-05-01 12:11:14 +00:00)
Updates
@@ -18,13 +18,13 @@ Let’s begin with a simple example for users who are new to NVBench and want to
 ```cpp
 void sequence_bench(nvbench::state& state) {
   auto data = thrust::device_vector<int>(10);
-  state.exec([](nvbench::launch& launch) {
+  state.exec([](nvbench::launch&) {
     thrust::sequence(data.begin(), data.end());
   });
 }
 NVBENCH_BENCH(sequence_bench);
 ```
-Will this code work as-is? Depending on the build system configuration, compilation may succeed but generate warnings indicating that `launch` is an unused parameter. The code may or may not execute correctly. This often occurs when users, accustomed to a sequential programming mindset, overlook the fact that GPU architectures are highly parallel. Proper use of streams and synchronization is essential for accurately measuring performance in benchmark code.
+Will this code run correctly as written? While it may compile successfully, runtime behavior isn’t guaranteed. This is a common pitfall for developers used to sequential programming, who may overlook the massively parallel nature of GPU architectures. To ensure accurate performance measurement in benchmark code, proper use of streams and synchronization is crucial.
 
 A common mistake in this context is neglecting stream specification: NVBench requires knowledge of the exact CUDA stream being targeted to correctly trace kernel execution and measure performance. Therefore, users must explicitly provide the stream to be benchmarked. For example, passing the NVBench launch stream ensures correct execution and accurate measurement:
 
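The hunk stops before the example that the last paragraph introduces. As a rough sketch only, and not necessarily what the document itself shows next, passing the launch stream to Thrust could look like the following. It assumes `launch.get_stream()` converts to a `cudaStream_t` accepted by `thrust::device.on(...)`, captures `data` by reference (the snippet in the diff captures nothing), and adds `nvbench::exec_tag::sync` as a precaution because Thrust algorithms run with `thrust::device` may synchronize internally.

```cpp
#include <nvbench/nvbench.cuh>

#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>

void sequence_bench(nvbench::state& state) {
  auto data = thrust::device_vector<int>(10);

  // exec_tag::sync is an assumption added here: it tells NVBench that the
  // launcher may synchronize, which thrust::device algorithms can do.
  state.exec(nvbench::exec_tag::sync, [&data](nvbench::launch& launch) {
    // Run the algorithm on the stream NVBench is timing, not the default stream.
    thrust::sequence(thrust::device.on(launch.get_stream()),
                     data.begin(), data.end());
  });
}
NVBENCH_BENCH(sequence_bench);
```

Building this requires an NVBench-provided `main` (for instance by linking the NVBench main target or adding `NVBENCH_MAIN` to the translation unit) and a CUDA-capable device to run on.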