Updates

2026-05-01 04:01:14 +00:00 · 2025-09-29 12:22:07 -07:00
parent df7abef849
commit 8af0aa38d5
1 changed file with 11 additions and 11 deletions
--- a/docs/best_practices.md
+++ b/docs/best_practices.md
@@ -18,13 +18,13 @@ Let’s begin with a simple example for users who are new to NVBench and want to
 ```cpp
 void sequence_bench(nvbench::state& state) {
  auto data = thrust::device_vector<int>(10);
-  state.exec([](nvbench::launch& launch) {
+  state.exec([&](nvbench::launch&) {
    thrust::sequence(data.begin(), data.end());
  });
 }
 NVBENCH_BENCH(sequence_bench);
 ```
-Will this code work as-is? Depending on the build system configuration, compilation may succeed but generate warnings indicating that `launch` is an unused parameter. The code may or may not execute correctly. This often occurs when users, accustomed to a sequential programming mindset, overlook the fact that GPU architectures are highly parallel. Proper use of streams and synchronization is essential for accurately measuring performance in benchmark code.
+Will this code run correctly as written? It may compile, but neither its runtime behavior nor its measurements are guaranteed to be correct. This is a common pitfall for developers used to sequential programming, who may overlook the massively parallel, asynchronous nature of GPU execution. Accurate performance measurement in benchmark code depends on proper use of streams and synchronization.

 A common mistake in this context is not specifying the stream. NVBench needs to know exactly which CUDA stream the benchmarked work runs on so that it can track kernel execution and measure performance correctly. Users must therefore explicitly provide the stream to be benchmarked. For example, passing the NVBench launch stream ensures correct execution and accurate measurement: