
256 Commits

Author SHA1 Message Date
Allison Vacanti
37e753f7b6 Update benchmark macros:
s/NVBENCH_CREATE/NVBENCH_BENCH/g
s/NVBENCH_BENCH_TEMPLATE/NVBENCH_BENCH_TYPES/g

This will fit better once the exec_tags versions are added:

NVBENCH_BENCH
NVBENCH_BENCH_TYPES
NVBENCH_BENCH_FLAGS
NVBENCH_BENCH_TYPES_FLAGS
2021-02-16 16:08:38 -05:00
Allison Vacanti
d12326083d Clean up l2flush initialization. 2021-02-16 12:01:50 -05:00
Allison Vacanti
f46dda0e81 Use noexcept CUDA_CALL check in destructor. 2021-02-16 12:00:04 -05:00
Allison Vacanti
55aa78ce17 Make the use of the blocking_kernel optional.
The blocking_kernel breaks Thrust algorithms, which sync internally. I'll need
to add an exec_tag to toggle this.
2021-02-15 21:55:26 -05:00
Allison Vacanti
bb871094c3 Fixes for multidevice/gcc.
- Allow devices to be cleared during benchmark definition.
- Fix various demangling bugs.
2021-02-15 21:26:21 -05:00
Allison Vacanti
8897490a6d Add cxxabi demangling for gcc/clang. 2021-02-15 21:00:09 -05:00
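GCC and Clang expose the Itanium C++ ABI demangler as `abi::__cxa_demangle` in `<cxxabi.h>`. Below is a minimal sketch of a demangling helper built on it. It assumes a GCC or Clang toolchain. `demangle`, `type_name`, and `free_deleter` are hypothetical names, not necessarily what this commit added.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>
#include <typeinfo>

// Deleter for buffers allocated with malloc by __cxa_demangle.
struct free_deleter
{
  void operator()(char *ptr) const noexcept { std::free(ptr); }
};

// Hypothetical helper: turn a mangled name (e.g. from typeid(T).name())
// into a readable one. Returns the input unchanged if demangling fails.
inline std::string demangle(const char *mangled)
{
  int status = 0;
  std::unique_ptr<char, free_deleter> result{
    abi::__cxa_demangle(mangled, nullptr, nullptr, &status)};
  return (status == 0 && result) ? std::string{result.get()}
                                 : std::string{mangled};
}

// Convenience wrapper for type names.
template <typename T>
std::string type_name()
{
  return demangle(typeid(T).name());
}
```

The result buffer is allocated with `malloc`, so a `std::unique_ptr` with a `free`-based deleter owns it. On failure (non-zero `status`), the mangled name is passed through unchanged.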
Allison Vacanti
6c67578dcd Implement skip_time and improve logging. 2021-02-15 17:39:46 -05:00
Allison Vacanti
ead8392bce Use NVBENCH_THROW in option_parser.cu. 2021-02-15 17:19:07 -05:00
Allison Vacanti
6cf29b5083 Various small updates and refactorings.
- Collapse nested namespace specifiers.
- Clean up markdown format tables.
2021-02-15 17:18:03 -05:00
Allison Vacanti
d323f569b8 Add termination criteria API.
- min_samples
- min_time
- max_noise
- skip_time (not yet implemented)
- timeout

Refactored s/(trials)|(iters)/samples/s.
2021-02-15 12:04:15 -05:00
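Below is a hypothetical sketch of how the criteria listed above might combine into a stop decision. The field names follow the commit message. The struct layout, units, placeholder values, and decision order are all assumptions. `skip_time` is omitted because it was not yet implemented.

```cpp
#include <cassert>

// Hypothetical sketch: field names follow the commit message; units,
// placeholder values, and decision order are assumptions.
struct termination_criteria
{
  long long min_samples = 10; // placeholder, not an nvbench default
  double min_time = 0.5;      // seconds of accumulated measured time
  double max_noise = 0.005;   // relative stdev as a fraction (0.5%)
  double timeout = 15.0;      // seconds of wall-clock time
};

struct measurement_status
{
  long long samples; // samples collected so far
  double total_time; // accumulated measured time, seconds
  double noise;      // current relative stdev, as a fraction
  double walltime;   // elapsed wall-clock time, seconds
};

enum class stop_reason { keep_going, converged, timed_out };

inline stop_reason evaluate_criteria(const termination_criteria &crit,
                                     const measurement_status &status)
{
  // The timeout is a hard cap, so it is checked first.
  if (status.walltime >= crit.timeout)
  {
    return stop_reason::timed_out;
  }
  // Both minimums must be met before the noise threshold can end the run.
  const bool enough = status.samples >= crit.min_samples &&
                      status.total_time >= crit.min_time;
  if (enough && status.noise <= crit.max_noise)
  {
    return stop_reason::converged;
  }
  return stop_reason::keep_going;
}
```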
Allison Vacanti
e5914ff620 Clean up blocking_kernel.
- Rename release() -> unblock() to avoid confusion with release fences.
- Remove some unused headers.
2021-02-14 16:07:22 -05:00
Allison Vacanti
1cea5e1965 Add and use blocking_kernel. 2021-02-13 11:21:30 -05:00
Allison Vacanti
2125ada770 Call cudaDeviceReset from NVBENCH_MAIN. 2021-02-13 10:01:09 -05:00
Allison Vacanti
3b57127571 Add NVBENCH_CUDA_CALL_NOEXCEPT.
It'll just exit instead of throwing. Used in destructors and other
noexcept contexts.
2021-02-13 10:00:15 -05:00
Allison Vacanti
878f1ca4f6 Add --device option. 2021-02-12 21:51:17 -05:00
Allison Vacanti
92cc3b1189 Execute benchmarks on all devices. 2021-02-12 20:53:10 -05:00
Allison Vacanti
5348f65e12 Clean up nvbench::state.
- Replace cref member with ref_wrapper to make movable.
- Use friendship instead of inheritance for testing.
- Add missing [[nodiscard]] annotations.
2021-02-11 21:27:01 -05:00
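Below is a small illustration of the first bullet. The type names are hypothetical; this is not the real `nvbench::state` layout. A `const T&` data member deletes the implicit assignment operators. A `std::reference_wrapper` rebinds on assignment, so the enclosing type stays move-assignable.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

struct benchmark_stub {}; // hypothetical stand-in for the referenced object

// A const-reference member deletes the implicit copy/move assignment.
struct state_with_cref
{
  const benchmark_stub &bench;
};

// std::reference_wrapper rebinds on assignment, so assignment still works.
struct state_with_ref_wrapper
{
  std::reference_wrapper<const benchmark_stub> bench;
  const benchmark_stub &get_benchmark() const { return bench.get(); }
};

static_assert(!std::is_move_assignable_v<state_with_cref>);
static_assert(std::is_move_assignable_v<state_with_ref_wrapper>);
```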
Allison Vacanti
9f9c6e5278 Refactor state_generator.
The old implementation was scattered and ad hoc. This one is slightly
less so.

More importantly, refactoring to this design will make it easier to
add device traversal.
2021-02-11 21:24:58 -05:00
Allison Vacanti
4820c557a6 Update docstring. 2021-02-11 21:14:22 -05:00
Allison Vacanti
56d182ad41 Fix float64_axis test.
Changed to use `{:0.5g}` formatting for input strings until I figure
out something better.
2021-02-11 21:13:48 -05:00
Allison Vacanti
3bc8291b28 Allow benchmarks to be specified by index with --benchmark. 2021-02-10 21:40:39 -05:00
Allison Vacanti
2561816f15 Print indices for benchmarks in --list output. 2021-02-10 21:39:23 -05:00
Allison Vacanti
e9ae291736 Add benchmark_manager::get_benchmark(idx). 2021-02-10 21:38:56 -05:00
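Below is a hypothetical sketch of resolving a `--benchmark` argument that may be either a name or an index as printed by `--list`. `find_benchmark` and the all-digits rule are assumptions, not the option parser's actual logic.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: an all-digit argument is treated as an index into the
// benchmark list; anything else is matched against benchmark names.
inline std::optional<std::size_t>
find_benchmark(const std::vector<std::string> &names, const std::string &arg)
{
  const bool is_index =
    !arg.empty() && arg.find_first_not_of("0123456789") == std::string::npos;
  if (is_index)
  {
    const auto idx = std::stoull(arg); // may throw std::out_of_range
    if (idx < names.size())
    {
      return static_cast<std::size_t>(idx);
    }
    return std::nullopt;
  }
  for (std::size_t i = 0; i < names.size(); ++i)
  {
    if (names[i] == arg)
    {
      return i;
    }
  }
  return std::nullopt;
}
```

One consequence of this rule: a benchmark whose name is entirely digits could only be selected by index.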
Allison Vacanti
0477514bb6 Axis spec revamp.
- Add support for single values ("Axis=Value").
- Make other value specs shell friendly:
  - Range: "Axis:(2:10:2)"  -> "Axis=[2:10:2]"
  - List:  "Axis:{2,3,4,5}" -> "Axis=[2,3,4,5]"
  - ":" -> "=" feels more natural
  - "{}()" characters have special meaning in bash.
  - "[]" characters don't require escapes.
  - Using the same braces for both ranges/list is easier to remember,
    only the delimiter changes.
2021-02-10 09:55:50 -05:00
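Below is a sketch of parsing the new specs, for integer axes only. `axis_spec`, `parse_axis_spec`, and `split` are hypothetical. The range is assumed to include its end value and to require an explicit stride.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser for integer axis specs:
//   "Axis=Value"      single value
//   "Axis=[2,3,4,5]"  explicit list
//   "Axis=[2:10:2]"   range start:end:stride (end assumed inclusive)
struct axis_spec
{
  std::string name;
  std::vector<long long> values;
};

inline std::vector<std::string> split(const std::string &s, char delim)
{
  std::vector<std::string> parts;
  std::size_t start = 0;
  for (std::size_t pos; (pos = s.find(delim, start)) != std::string::npos;
       start = pos + 1)
  {
    parts.push_back(s.substr(start, pos - start));
  }
  parts.push_back(s.substr(start));
  return parts;
}

inline axis_spec parse_axis_spec(const std::string &arg)
{
  const auto eq = arg.find('=');
  if (eq == std::string::npos)
  {
    throw std::invalid_argument("missing '=' in axis spec: " + arg);
  }
  axis_spec spec{arg.substr(0, eq), {}};
  std::string rhs = arg.substr(eq + 1);

  if (rhs.size() < 2 || rhs.front() != '[' || rhs.back() != ']')
  { // Single value.
    spec.values.push_back(std::stoll(rhs));
    return spec;
  }
  rhs = rhs.substr(1, rhs.size() - 2); // strip the brackets
  if (rhs.find(':') != std::string::npos)
  { // Range: same brackets, ':' delimiter.
    const auto p = split(rhs, ':');
    if (p.size() != 3)
    {
      throw std::invalid_argument("expected [start:end:stride]: " + arg);
    }
    const long long first = std::stoll(p[0]);
    const long long last = std::stoll(p[1]);
    const long long stride = std::stoll(p[2]);
    if (stride <= 0)
    {
      throw std::invalid_argument("stride must be positive: " + arg);
    }
    for (long long v = first; v <= last; v += stride)
    {
      spec.values.push_back(v);
    }
  }
  else
  { // List: same brackets, ',' delimiter.
    for (const auto &item : split(rhs, ','))
    {
      spec.values.push_back(std::stoll(item));
    }
  }
  return spec;
}
```

Because ranges and lists share the `[]` brackets, the parser only has to inspect the delimiter to tell them apart.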
Allison Vacanti
ed658f0cec Make summary move-only.
This prevents subtle bugs like

auto s = state.add_summary("foo");

instead of

auto& s = state.add_summary("foo");
2021-02-10 09:19:18 -05:00
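Below is a minimal sketch of the pattern, using hypothetical `summary` and `state` stand-ins rather than the real nvbench classes. Deleting the copy operations turns the `auto s = ...` mistake into a compile error, while the container can still move summaries. The sketch stores summaries in a `std::deque` so that references returned by `add_summary` stay valid across later insertions; that container choice is an assumption.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; not the real nvbench::summary / nvbench::state.
class summary
{
public:
  explicit summary(std::string name) : m_name{std::move(name)} {}

  // Move-only: `auto s = state.add_summary(...)` no longer compiles.
  summary(const summary &) = delete;
  summary &operator=(const summary &) = delete;
  summary(summary &&) = default;
  summary &operator=(summary &&) = default;

  void set_value(double v) { m_value = v; }
  double get_value() const { return m_value; }
  const std::string &get_name() const { return m_name; }

private:
  std::string m_name;
  double m_value{};
};

class state
{
public:
  summary &add_summary(std::string name)
  {
    // deque::emplace_back returns a reference (C++17) that stays valid
    // when more elements are appended.
    return m_summaries.emplace_back(std::move(name));
  }
  const std::deque<summary> &get_summaries() const { return m_summaries; }

private:
  std::deque<summary> m_summaries;
};

static_assert(!std::is_copy_constructible_v<summary>);
static_assert(std::is_move_constructible_v<summary>);
```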
Allison Vacanti
bf60ff3f0f Add byte formatting to markdown_format. 2021-02-10 09:17:49 -05:00
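Below is a hypothetical sketch of byte formatting with binary (1024-based) prefixes and two decimal places. It matches the style of values like `91.94 GiB/s` in the summary tables shown in this log. The function name and the largest unit are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: scale a byte count by 1024 until it fits the
// largest applicable binary unit, then print with two decimals.
inline std::string format_bytes(double bytes)
{
  const char *units[] = {"B", "KiB", "MiB", "GiB", "TiB"};
  constexpr int max_unit = 4;
  int unit = 0;
  while (bytes >= 1024.0 && unit < max_unit)
  {
    bytes /= 1024.0;
    ++unit;
  }
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.2f %s", bytes, units[unit]);
  return buf;
}
```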
Allison Vacanti
696578422f Compact the summary table columns.
Before:

```
| In  | Out | Init | Size |  (Size)   | Cold Trials |  Cold GPU  | GPU Noise |  Cold CPU  | CPU Noise | Hot Trials |  Hot GPU   | Item Rate | GlobalMemUse | PeakGMem |
|-----|-----|------|------|-----------|-------------|------------|-----------|------------|-----------|------------|------------|-----------|--------------|----------|
| I32 | F32 |  I32 | 2^20 |   1048576 |        7863 |   85.60 us |     3.35% |  127.19 us |    10.00% |       7864 |   84.98 us | 12.34 GHz |  91.94 GiB/s |   77.10% |
| I32 | F32 |  I32 | 2^24 |  16777216 |         380 | 1240.13 us |     0.36% | 1316.48 us |     2.90% |        424 | 1236.32 us | 13.57 GHz | 101.11 GiB/s |   84.79% |
| I32 | F32 |  I32 | 2^28 | 268435456 |          51 |   19.67 ms |     0.05% |   19.76 ms |     0.13% |         51 |   19.67 ms | 13.65 GHz | 101.70 GiB/s |   85.29% |
```

After:

```
| In  | Out | Init | Size |  (Size)   |  Cold GPU  | Noise |  Cold CPU  | Noise  | Trials |  Hot GPU   | Trials | Item Rate | GlobalMemUse | PeakGMem |
|-----|-----|------|------|-----------|------------|-------|------------|--------|--------|------------|--------|-----------|--------------|----------|
| I32 | F32 |  I32 | 2^20 |   1048576 |   96.25 us | 2.41% |  141.23 us | 13.84% |   7081 |   96.16 us |   7082 | 10.90 GHz |  81.24 GiB/s |   68.13% |
| I32 | F32 |  I32 | 2^24 |  16777216 | 1406.90 us | 0.27% | 1482.86 us |  2.21% |    338 | 1402.06 us |    374 | 11.97 GHz |  89.15 GiB/s |   74.77% |
| I32 | F32 |  I32 | 2^28 | 268435456 |   22.29 ms | 0.12% |   22.38 ms |  0.22% |     45 |   22.28 ms |     45 | 12.05 GHz |  89.78 GiB/s |   75.29% |
```
2021-02-09 10:04:07 -05:00
Allison Vacanti
cd38a4e9ca Allow duplicate column headers in markdown tables. 2021-02-09 10:03:39 -05:00
Allison Vacanti
d0ad118136 Add implementation of transform_reduce.
No released version of GCC supports this yet.
2021-02-08 14:20:53 -05:00
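Below is a minimal sketch of such a fallback. It mirrors the shape of the unary overload of C++17 `std::transform_reduce` (`init`, reduce op, transform op). The `compat` namespace is hypothetical. It also keeps the sketch from colliding with `std::transform_reduce` through argument-dependent lookup.

```cpp
#include <cassert>
#include <utility>

namespace compat // hypothetical namespace for the fallback
{
// Applies transform_op to each element and folds the results into init
// with reduce_op, strictly left to right.
template <typename InputIt, typename T, typename ReduceOp, typename TransformOp>
T transform_reduce(InputIt first,
                   InputIt last,
                   T init,
                   ReduceOp reduce_op,
                   TransformOp transform_op)
{
  for (; first != last; ++first)
  {
    init = reduce_op(std::move(init), transform_op(*first));
  }
  return init;
}
} // namespace compat
```

The standard algorithm may regroup the operations, so it effectively requires an associative and commutative reduction. This loop applies them in order.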
Allison Vacanti
bf94881477 Add warnings when max_time is exceeded without meeting other criteria. 2021-02-06 10:48:26 -05:00
Allison Vacanti
a0c4480a1e Catch and report exceptions in NVBENCH_MAIN. 2021-02-06 09:34:54 -05:00
Allison Vacanti
40aa60b709 Report total time in log. 2021-02-06 09:32:28 -05:00
Allison Vacanti
478d657124 Clean up the cold convergence implementation. 2021-02-06 09:32:03 -05:00
Allison Vacanti
8c9cc84025 Only consider target time if noise convergence is not available. 2021-02-05 18:44:31 -05:00
Allison Vacanti
f7b985cd6e Rework benchmark termination criteria to use rel stdev convergence. 2021-02-05 18:08:39 -05:00
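The relative standard deviation of a set of timings is the sample standard deviation divided by the mean, also called the coefficient of variation. It can be computed as below. The function name and the NaN convention for degenerate inputs are assumptions. How nvbench decides that this value has converged is not described here.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sample standard deviation divided by the mean.
// Returns NaN for fewer than two samples or a zero mean.
inline double relative_stdev(const std::vector<double> &samples)
{
  const auto n = samples.size();
  if (n < 2)
  {
    return std::nan("");
  }
  const double mean =
    std::accumulate(samples.begin(), samples.end(), 0.0) / static_cast<double>(n);
  if (mean == 0.0)
  {
    return std::nan("");
  }
  double sum_sq = 0.0;
  for (double x : samples)
  {
    sum_sq += (x - mean) * (x - mean);
  }
  // Bessel's correction: divide by n - 1 for the sample variance.
  const double stdev = std::sqrt(sum_sq / static_cast<double>(n - 1));
  return stdev / mean;
}
```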
Allison Vacanti
d8b3c8967c Fix "peak sm clock" descriptor to "default sm clock". 2021-02-05 18:03:29 -05:00
Allison Vacanti
88119aede1 Implement --list/-l, more markdown cleanup. 2021-02-05 16:35:37 -05:00
Allison Vacanti
8915fe6fc8 Add virtuals to get metadata from axis_base. 2021-02-05 16:27:36 -05:00
Allison Vacanti
e2364be3ba Add more device stats and tweak markdown output. 2021-02-05 13:38:48 -05:00
Allison Vacanti
340196d778 Force-inline timer / cache flush APIs. 2021-02-05 13:08:08 -05:00
Allison Vacanti
4f8c6aac94 More markdown output tweaks. 2021-02-04 19:11:15 -05:00
Allison Vacanti
4bdc27de8c More fixes for int64_t != long long. 2021-02-04 19:02:58 -05:00
Allison Vacanti
74d19a8e16 Clean up formatting. 2021-02-04 18:54:44 -05:00
Allison Vacanti
e302583c67 Fix usages of ASSERT_MSG that create empty __VA_ARGS__.
Invoking a variadic macro with zero variadic args is illegal until
C++20. The extra calls to fmt::format were unnecessary, anyway.
2021-02-04 18:54:20 -05:00
Allison Vacanti
a12d17c1ca Workaround the GCC/clang <charconv> dumpster fire.
<charconv> doesn't exist until gcc 8/clang 7, and still doesn't support
float types as of gcc 10.2 / clang 11.

Construct a string and use std::stoX instead, grumble grumble.
2021-02-04 18:19:27 -05:00
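Below is a minimal sketch of the workaround described above: copy the `std::string_view` into a `std::string` and call `std::stod`. `parse_double` is a hypothetical name. Rejecting trailing characters is an added assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Parse a double without floating-point std::from_chars.
// Throws std::invalid_argument / std::out_of_range as std::stod does,
// and also rejects trailing characters such as "1.5x".
inline double parse_double(std::string_view sv)
{
  const std::string str{sv}; // std::stod needs a std::string
  std::size_t consumed = 0;
  const double value = std::stod(str, &consumed);
  if (consumed != str.size())
  {
    throw std::invalid_argument("trailing characters in: " + str);
  }
  return value;
}
```

Unlike `std::from_chars`, `std::stod` goes through `strtod`, so it honors the current C locale's decimal point.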
Allison Vacanti
f45114a4f3 Avoid ambiguity on systems where is_same_v<uint64_t, long long> == false. 2021-02-04 18:19:01 -05:00
Allison Vacanti
9e1465210c Add newline between devices in markdown output. 2021-02-04 18:17:10 -05:00
Allison Vacanti
932024f4da Add device_manager, device_info, and device_scope.
Added device printouts to markdown_format.
2021-02-04 16:44:41 -05:00
Allison Vacanti
6bb22b952c Add option parsing to NVBENCH_MAIN.
- Convert benchmark_manager into a read-only structure.
  - Mutable benchmarks will be provided by
    `option_parser::get_benchmarks()` or
    `benchmark_manager::clone_benchmarks()`.
2021-02-04 13:15:21 -05:00
Allison Vacanti
3e9d0ebc34 Clean up stray punctuation. 2021-02-04 12:06:07 -05:00