Move documentation on streams to new subsection.

Also update to use `nvbench::make_cuda_stream_view`.
2026-03-14 20:27:24 +00:00 · 2022-02-11 13:29:06 -05:00
parent 3b41387637
commit 039d455727
1 changed files with 30 additions and 19 deletions
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -11,25 +11,6 @@ void my_benchmark(nvbench::state& state) {
 NVBENCH_BENCH(my_benchmark);
 ```

-NVBench records the elapsed time of work on a CUDA stream for each iteration of a benchmark. 
-By default, NVBench creates and provides an explicit stream via `launch::get_stream()` 
-to pass to every stream-ordered operation. 
-
-Sometimes it is inconvenient or impossible to specify an explicit CUDA stream to every 
-stream-ordered operation. A `cudaStream_t` may be provided via `state::set_cuda_stream`. 
-It is assumed that all work of interest executes on or synchronizes with this stream. 
-```cpp
-void my_benchmark(nvbench::state& state) {
-  cudaStream_t default_stream = 0;
-  state.set_cuda_stream(nvbench::cuda_stream{default_stream, false});
-  state.exec([](nvbench::launch&) {
-    my_func(); // a host API invoking GPU kernels without taking an explicit stream
-    my_kernel<<<num_blocks, 256>>>(); // or a kernel launched with the default stream
-  });
-}
-NVBENCH_BENCH(my_benchmark);
-```
-
 There are three main components in the definition of a benchmark:

 - A `KernelGenerator` callable (`my_benchmark` above)
@@ -60,6 +41,36 @@ attributes to be modified.
 NVBENCH_BENCH(my_benchmark).set_name("my_kernel<<<num_blocks, 256>>>");
 ```

+# CUDA Streams
+
+NVBench records GPU execution times on a specific CUDA stream. By default, a new
+stream is created and passed to the `KernelLauncher` via the
+`nvbench::launch::get_stream()` method, as shown in
+[Minimal Benchmark](#minimal-benchmark). All benchmarked kernels and other
+stream-ordered work must be launched on this stream for NVBench to capture it.
+
+In some instances, it may be inconvenient or impossible to specify an explicit
+CUDA stream for the benchmarked operation to use. For example, a library may
+manage and use its own streams, or an opaque API may always launch work on the
+default stream. In these situations, users may provide NVBench with an explicit
+stream via `nvbench::state::set_cuda_stream` and `nvbench::make_stream_view`.
+It is assumed that all work of interest executes on or synchronizes with this
+stream.
+
+```cpp
+void my_benchmark(nvbench::state& state) {
+  cudaStream_t default_stream = 0;
+  state.set_cuda_stream(nvbench::make_cuda_stream_view(default_stream));
+  state.exec([](nvbench::launch&) {
+    my_func(); // a host API invoking GPU kernels on the default stream
+    my_kernel<<<num_blocks, 256>>>(); // or a kernel launched with the default stream
+  });
+}
+NVBENCH_BENCH(my_benchmark);
+```
+
+A full example can be found in [examples/stream.cu](../examples/stream.cu).
+
 # Parameter Axes

 Some kernels will be used with a variety of options, input data types/sizes, and