Commit Graph

143 Commits

Author SHA1 Message Date
Allison Vacanti
47c69b83c9 More README cleanup. 2021-03-03 17:29:15 -05:00
Allison Vacanti
d7c34c835d More README cleanup. 2021-03-03 17:25:12 -05:00
Allison Vacanti
21e13f002d Fix error in README. 2021-03-03 17:22:35 -05:00
Allison Vacanti
544deaf539 Fix warning. 2021-03-03 16:57:17 -05:00
Allison Vacanti
75439d3ef8 Fix device global arg storage. 2021-03-03 16:49:43 -05:00
Allison Vacanti
9ff22cb12d Update README to use current macro names. 2021-03-03 16:30:45 -05:00
Allison Vacanti
a6b26ef7be Add initial README.md. 2021-03-03 16:00:11 -05:00
Allison Vacanti
cf71f6ee15 Update NVBench build system with initial standalone support. 2021-03-03 13:59:29 -05:00
Allison Vacanti
2ce12d2201 More printer cleanups.
- Initialize color from env_var.
- Change ivar to m_global_benchmark_args to clarify usage.
2021-03-02 17:36:19 -05:00
Allison Vacanti
3f3c648358 Update option_parser for recent refactorings. 2021-03-02 17:13:52 -05:00
Allison Vacanti
c83bf8cdb8 s/output_multiplex/printer_multiplex/g 2021-03-02 17:12:22 -05:00
Allison Vacanti
9d2194f2ab s/csv_format/csv_printer/g 2021-03-02 17:10:09 -05:00
Allison Vacanti
015b8d1fb1 s/markdown_format/markdown_printer/g 2021-03-02 17:08:54 -05:00
Allison Vacanti
780dc3b649 s/output_format/printer_base/g 2021-03-02 17:06:13 -05:00
Allison Vacanti
4a1e670f50 Use a more robust method to add the stdout printer, add --quiet. 2021-03-02 16:57:03 -05:00
Allison Vacanti
9cd0d10fe1 Route log messages through output formats. 2021-03-02 16:36:28 -05:00
Allison Vacanti
52fbbbcc7a Add --markdown / --csv options to option_parser. 2021-03-01 17:13:11 -05:00
Allison Vacanti
630aefda93 Make output_format explicitly move-only. 2021-03-01 17:10:33 -05:00
Allison Vacanti
0112993de2 Add output_multiplex::get_output_count. 2021-03-01 17:10:13 -05:00
Allison Vacanti
6c28a6a791 Refactor to keep stream includes out of headers. 2021-03-01 17:09:56 -05:00
Allison Vacanti
33a069af2b Add output_multiplex.
This allows an arbitrary number of output_formats to be wrapped up
into a single object. This output format just forwards all calls to its
children.
2021-03-01 15:16:43 -05:00
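The forwarding behavior this commit describes is the composite pattern. Below is a minimal C++ sketch of it. The `printer`, `multiplex`, and `recording_printer` names, the `count()` method, and the single `log()` method are hypothetical stand-ins, not NVBench's actual `output_format` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal output interface (not NVBench's real base class).
struct printer
{
  virtual ~printer() = default;
  virtual void log(const std::string &msg) = 0;
};

// Owns any number of child printers and forwards every call to each one.
struct multiplex final : printer
{
  // Constructs a child in place and returns a reference to it.
  template <typename Printer, typename... Args>
  Printer &emplace(Args &&...args)
  {
    auto child   = std::make_unique<Printer>(std::forward<Args>(args)...);
    Printer &ref = *child;
    m_children.push_back(std::move(child));
    return ref;
  }

  std::size_t count() const { return m_children.size(); }

  void log(const std::string &msg) override
  {
    for (auto &child : m_children)
    {
      child->log(msg);
    }
  }

private:
  std::vector<std::unique_ptr<printer>> m_children;
};

// Example child that records every message it receives.
struct recording_printer final : printer
{
  std::vector<std::string> lines;
  void log(const std::string &msg) override { lines.push_back(msg); }
};
```

Callers hold a single `multiplex` and never need to know how many outputs (stdout, CSV, markdown, ...) are attached.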
Allison Vacanti
14d41bb7e1 Add initial implementation of csv_format. 2021-02-25 16:10:28 -05:00
Allison Vacanti
17db5d31cc Split table_builder out from markdown_table. 2021-02-25 16:09:06 -05:00
Allison Vacanti
359db2c592 Initial pass at output_format.cuh, ported markdown_format. 2021-02-23 16:28:30 -05:00
Allison Vacanti
8d6d934dfe Add default axis names.
Also cleaned up the annoying quirk where `set_type_axes_names` *had*
to be called on all benchmarks with type axes.

Default names are {"T", "U", "V", "W"} for up to four type axes. For
five or more, {"T0", "T1", ...} is used instead.
2021-02-19 12:37:05 -05:00
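A small sketch of the default naming rule described above, assuming the names depend only on the number of type axes. `default_type_axis_names` is a hypothetical helper for illustration, not a function from NVBench.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns default names for `num_axes` type axes.
// Up to four axes get single letters; five or more get "T0", "T1", ...
std::vector<std::string> default_type_axis_names(std::size_t num_axes)
{
  std::vector<std::string> names;
  names.reserve(num_axes);
  if (num_axes <= 4)
  {
    const char *short_names[] = {"T", "U", "V", "W"};
    for (std::size_t i = 0; i < num_axes; ++i)
    {
      names.emplace_back(short_names[i]);
    }
  }
  else
  {
    for (std::size_t i = 0; i < num_axes; ++i)
    {
      names.push_back("T" + std::to_string(i));
    }
  }
  return names;
}
```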
Allison Vacanti
324b0d107e Add "global args" to option parser.
If a benchmark modifier is passed before `--benchmark`, the modifier
will apply to all benchmarks.
2021-02-19 10:37:00 -05:00
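A sketch of the "global args" rule, assuming modifiers are plain tokens and every benchmark is selected with `--benchmark <name>`. The `benchmark_request` struct, the `parse_requests` function, and the `--mod-a`/`--mod-b` tokens in the usage are hypothetical, not NVBench's option_parser API or real flags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One selected benchmark plus the modifiers that apply to it.
struct benchmark_request
{
  std::string name;
  std::vector<std::string> modifiers;
};

// Hypothetical sketch: tokens seen before the first "--benchmark" are global
// modifiers and are copied into every benchmark request that follows.
std::vector<benchmark_request> parse_requests(const std::vector<std::string> &args)
{
  std::vector<std::string> global_modifiers;
  std::vector<benchmark_request> requests;

  for (std::size_t i = 0; i < args.size(); ++i)
  {
    if (args[i] == "--benchmark")
    {
      if (i + 1 == args.size())
      {
        break; // Missing benchmark name: ignored in this sketch.
      }
      // New benchmark inherits all global modifiers.
      requests.push_back({args[i + 1], global_modifiers});
      ++i; // Skip the benchmark name.
    }
    else if (requests.empty())
    {
      global_modifiers.push_back(args[i]);
    }
    else
    {
      // Applies only to the most recently selected benchmark.
      requests.back().modifiers.push_back(args[i]);
    }
  }
  return requests;
}
```

For example, `--mod-a 1 --benchmark foo --mod-b 2 --benchmark bar` gives `foo` both modifiers and `bar` only `--mod-a 1`.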
Allison Vacanti
a747982415 Add nvbench::main CMake target.
Linking to this instead of `nvbench::nvbench` will automatically include
the `NVBENCH_MAIN` macro.
2021-02-19 09:34:02 -05:00
Allison Vacanti
2cc9bf41e3 Add demangle<T>() convenience overload. 2021-02-18 23:41:49 -05:00
Allison Vacanti
543488ef75 Make kernel wrapper into an lvalue. 2021-02-18 23:41:23 -05:00
Allison Vacanti
b5443e98c8 Use std::size_t for element counts, buffer size metadata. 2021-02-18 18:23:14 -05:00
Allison Vacanti
7dd46b0021 Update old benchmarks to use nvbench, remove old scaffolding.
Remove the original attempt to adapt gbench to do CUDA stuff.

Update all benchmarks to use some conventions:

- Element count -> "Elements" [16:32]
- Throughput calcs
- Add input buffer column: "Size"
2021-02-18 18:22:50 -05:00
Allison Vacanti
7657036f9c Add helper methods to configure throughput.
Instead of:

```
state.set_element_count(size);
state.set_global_memory_bytes_accessed(
  size * (sizeof(InT) + sizeof(OutT)));
```

do:

```
state.add_element_count(size, "Elements");
state.add_global_memory_read<InT>(size, "InputSize");
state.add_global_memory_write<OutT>(size, "OutputSize");
```

The string arguments are optional. If provided, a new column will
be added to the output with the indicated name and number
of bytes (or elements for `add_element_count`).
2021-02-18 15:47:59 -05:00
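A sketch of the byte accounting these helpers imply, assuming each read/write helper adds `count * sizeof(T)` to a running total. `mock_state` is hypothetical and omits the optional column-name arguments. It only checks that reading `InT` and writing `OutT` reproduces the old `size * (sizeof(InT) + sizeof(OutT))` value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a benchmark state; it only tracks the totals
// that the throughput helpers would accumulate.
struct mock_state
{
  std::size_t element_count       = 0;
  std::size_t global_memory_bytes = 0;

  void add_element_count(std::size_t n) { element_count += n; }

  // Each read or write of `n` elements of type T contributes n * sizeof(T).
  template <typename T>
  void add_global_memory_read(std::size_t n)
  {
    global_memory_bytes += n * sizeof(T);
  }

  template <typename T>
  void add_global_memory_write(std::size_t n)
  {
    global_memory_bytes += n * sizeof(T);
  }
};
```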
Allison Vacanti
dcd5d1ffa6 Update markdown output format. 2021-02-18 14:44:17 -05:00
Allison Vacanti
ef3e1594eb Implement manual timers.
See the new thrust/sort/basic.cu benchmark for usage.

Other notable changes:

- Updated summary column names:
  - Cold GPU -> GPU Time
  - Cold CPU -> CPU Time
  - Hot GPU  -> Batch GPU
- Removed CPU timings from measure_hot
  - They'd been hidden for a while, and aren't really useful.
- Moved the throughput calcs to measure_cold
  - `timer` disables `hot` timings, but throughput is still wanted
  - `cold` timings make more sense for throughput; global BW numbers
    are meaningless if the data is sitting in L2.
2021-02-17 18:48:26 -05:00
Allison Vacanti
385d4f77ba Teach markdown_format about sample_sizes. 2021-02-17 18:34:35 -05:00
Allison Vacanti
8a1f017a4e Inline some methods used in benchmark loops. 2021-02-17 18:34:09 -05:00
Allison Vacanti
f61be70a93 Add initial implementation of exec_tag dispatching.
nvbench::exec_tags are used to request measurement types and share
information about the kernel. They are used to ensure that templated
measurement code is not instantiated unless actually used.

Replaces the nvbench::exec(state, launcher, tags) pattern with:

```
state.exec(tags, launcher);
state.exec(launcher); // defaults to hot/cold CUDA measurements
```
2021-02-16 23:47:36 -05:00
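A C++17 sketch of how compile-time tags can keep unused measurement templates from being instantiated, assuming tags are empty types that carry their requested measurements as `constexpr` flags. The `tags::measure` types, `measure_hot`/`measure_cold`, and this `state` struct are hypothetical and are not NVBench's actual `nvbench::exec_tag` implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compile-time tags describing which measurements to run.
namespace tags
{
template <bool Hot, bool Cold>
struct measure
{
  static constexpr bool hot  = Hot;
  static constexpr bool cold = Cold;
};
using hot_only     = measure<true, false>;
using cold_only    = measure<false, true>;
using hot_and_cold = measure<true, true>;
} // namespace tags

// Hypothetical measurement routines; in a real harness these are large
// templates, so skipping their instantiation saves compile time.
template <typename Launcher>
void measure_hot(std::vector<std::string> &log, Launcher &&launch)
{
  launch();
  log.push_back("hot");
}

template <typename Launcher>
void measure_cold(std::vector<std::string> &log, Launcher &&launch)
{
  launch();
  log.push_back("cold");
}

struct state
{
  std::vector<std::string> log;

  // `if constexpr` discards branches for unrequested measurements, so their
  // templates are never instantiated for this Tags/Launcher combination.
  template <typename Tags, typename Launcher>
  void exec(Tags, Launcher &&launch)
  {
    if constexpr (Tags::cold) { measure_cold(log, launch); }
    if constexpr (Tags::hot) { measure_hot(log, launch); }
  }

  // Without explicit tags, request both hot and cold measurements.
  template <typename Launcher>
  void exec(Launcher &&launch)
  {
    exec(tags::hot_and_cold{}, std::forward<Launcher>(launch));
  }
};
```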
Allison Vacanti
37e753f7b6 Update benchmark macros:
s/NVBENCH_CREATE/NVBENCH_BENCH/g
s/NVBENCH_BENCH_TEMPLATE/NVBENCH_BENCH_TYPES/g

This will fit better once the exec_tags versions are added:

NVBENCH_BENCH
NVBENCH_BENCH_TYPES
NVBENCH_BENCH_FLAGS
NVBENCH_BENCH_TYPES_FLAGS
2021-02-16 16:08:38 -05:00
Allison Vacanti
d12326083d Clean up l2flush initialization. 2021-02-16 12:01:50 -05:00
Allison Vacanti
f46dda0e81 Use noexcept CUDA_CALL check in destructor. 2021-02-16 12:00:04 -05:00
Allison Vacanti
55aa78ce17 Make the use of the blocking_kernel optional.
This breaks thrust algorithms, which sync internally. I'll need
to add an exec_tag to toggle this.
2021-02-15 21:55:26 -05:00
Allison Vacanti
bb871094c3 Fixes for multidevice/gcc.
- Allow devices to be cleared during benchmark definition.
- Fix various demangling bugs.
2021-02-15 21:26:21 -05:00
Allison Vacanti
8897490a6d Add cxxabi demangling for gcc/clang. 2021-02-15 21:00:09 -05:00
Allison Vacanti
6c67578dcd Implement skip_time and improve logging. 2021-02-15 17:39:46 -05:00
Allison Vacanti
ead8392bce Use NVBENCH_THROW in option_parser.cu. 2021-02-15 17:19:07 -05:00
Allison Vacanti
6cf29b5083 Various small updates and refactorings.
- Collapse nested namespace specifiers.
- Clean up markdown format tables.
2021-02-15 17:18:03 -05:00
Allison Vacanti
d323f569b8 Add termination criteria API.
- min_samples
- min_time
- max_noise
- skip_time (not yet implemented)
- timeout

Refactored s/(trials)|(iters)/samples/s.
2021-02-15 12:04:15 -05:00
Allison Vacanti
e5914ff620 Clean up blocking_kernel.
- Rename release() -> unblock() to avoid confusion with release fences.
- Remove some unused headers.
2021-02-14 16:07:22 -05:00
Allison Vacanti
1cea5e1965 Add and use blocking_kernel. 2021-02-13 11:21:30 -05:00
Allison Vacanti
2125ada770 Call cudaDeviceReset from NVBENCH_MAIN. 2021-02-13 10:01:09 -05:00