Allison Vacanti
37e753f7b6
Update benchmark macros:
...
s/NVBENCH_CREATE/NVBENCH_BENCH/g
s/NVBENCH_BENCH_TEMPLATE/NVBENCH_BENCH_TYPES/g
This will fit more nicely once the exec_tags versions are added:
NVBENCH_BENCH
NVBENCH_BENCH_TYPES
NVBENCH_BENCH_FLAGS
NVBENCH_BENCH_TYPES_FLAGS
2021-02-16 16:08:38 -05:00
Allison Vacanti
d12326083d
Clean up l2flush initialization.
2021-02-16 12:01:50 -05:00
Allison Vacanti
f46dda0e81
Use noexcept CUDA_CALL check in destructor.
2021-02-16 12:00:04 -05:00
Allison Vacanti
55aa78ce17
Make the use of the blocking_kernel optional.
...
This breaks thrust algorithms, which sync internally. I'll need
to add an exec_tag to toggle this.
2021-02-15 21:55:26 -05:00
Allison Vacanti
bb871094c3
Fixes for multidevice/gcc.
...
- Allow devices to be cleared during benchmark definition.
- Fix various demangling bugs.
2021-02-15 21:26:21 -05:00
Allison Vacanti
8897490a6d
Add cxxabi demangling for gcc/clang.
2021-02-15 21:00:09 -05:00
Allison Vacanti
6c67578dcd
Implement skip_time and improve logging.
2021-02-15 17:39:46 -05:00
Allison Vacanti
ead8392bce
Use NVBENCH_THROW in option_parser.cu.
2021-02-15 17:19:07 -05:00
Allison Vacanti
6cf29b5083
Various small updates and refactorings.
...
- Collapse nested namespace specifiers.
- Clean up markdown format tables.
2021-02-15 17:18:03 -05:00
Allison Vacanti
d323f569b8
Add termination criteria API.
...
- min_samples
- min_time
- max_noise
- skip_time (not yet implemented)
- timeout
Refactored s/(trials)|(iters)/samples/s.
2021-02-15 12:04:15 -05:00
Allison Vacanti
e5914ff620
Clean up blocking_kernel.
...
- Rename release() -> unblock() to avoid confusion with release fences.
- Remove some unused headers.
2021-02-14 16:07:22 -05:00
Allison Vacanti
1cea5e1965
Add and use blocking_kernel.
2021-02-13 11:21:30 -05:00
Allison Vacanti
2125ada770
Call cudaDeviceReset from NVBENCH_MAIN.
2021-02-13 10:01:09 -05:00
Allison Vacanti
3b57127571
Add NVBENCH_CUDA_CALL_NOEXCEPT.
...
It'll just exit instead of throwing. Used in destructors and other
noexcept contexts.
2021-02-13 10:00:15 -05:00
Allison Vacanti
878f1ca4f6
Add --device option.
2021-02-12 21:51:17 -05:00
Allison Vacanti
92cc3b1189
Execute benchmarks on all devices.
2021-02-12 20:53:10 -05:00
Allison Vacanti
5348f65e12
Clean up nvbench::state.
...
- Replace cref member with ref_wrapper to make movable.
- Use friendship instead of inheritance for testing.
- Add missing [[nodiscard]] annotations.
2021-02-11 21:27:01 -05:00
Allison Vacanti
9f9c6e5278
Refactor state_generator.
...
The old implementation was scattered and ad hoc. This one is slightly
less so.
More importantly, refactoring to this design will make it easier to
add device traversal.
2021-02-11 21:24:58 -05:00
Allison Vacanti
4820c557a6
Update docstring.
2021-02-11 21:14:22 -05:00
Allison Vacanti
56d182ad41
Fix float64_axis test.
...
Changed to use `{:0.5g}` formatting for input strings until I figure
out something better.
2021-02-11 21:13:48 -05:00
Allison Vacanti
3bc8291b28
Allow benchmarks to be specified by index with --benchmark.
2021-02-10 21:40:39 -05:00
Allison Vacanti
2561816f15
Print indices for benchmarks in --list output.
2021-02-10 21:39:23 -05:00
Allison Vacanti
e9ae291736
Add benchmark_manager::get_benchmark(idx).
2021-02-10 21:38:56 -05:00
Allison Vacanti
0477514bb6
Axis spec revamp.
...
- Add support for single values ("Axis=Value").
- Make other value specs shell friendly:
  - Range: "Axis:(2:10:2)" -> "Axis=[2:10:2]"
  - List: "Axis:{2,3,4,5}" -> "Axis=[2,3,4,5]"
- ":" -> "=" feels more natural
- "{}()" characters have special meaning in bash.
- "[]" characters don't require escapes.
- Using the same braces for both ranges/list is easier to remember,
only the delimiter changes.
2021-02-10 09:55:50 -05:00
Allison Vacanti
ed658f0cec
Make summary move-only.
...
This prevents subtle bugs like
    auto s = state.add_summary("foo");
instead of
    auto& s = state.add_summary("foo");
2021-02-10 09:19:18 -05:00
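Note on ed658f0cec: a minimal sketch of why a move-only type catches the `auto s` mistake at compile time. The `summary` and `state` types below are hypothetical stand-ins written for illustration only; they are not the nvbench classes and their members are assumptions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a summary record (not the nvbench type).
struct summary
{
  explicit summary(std::string name) : name_(std::move(name)) {}

  // Copying is disabled, moving is allowed.
  summary(const summary &)            = delete;
  summary &operator=(const summary &) = delete;
  summary(summary &&)                 = default;
  summary &operator=(summary &&)      = default;

  std::string name_;
  std::string value;
};

// Hypothetical stand-in for the owner that hands out references.
struct state
{
  summary &add_summary(std::string name)
  {
    summaries.emplace_back(std::move(name));
    return summaries.back();
  }

  std::vector<summary> summaries;
};

// `auto s = st.add_summary("foo");` would need a copy and fails to compile.
static_assert(!std::is_copy_constructible_v<summary>);
static_assert(std::is_move_constructible_v<summary>);
```

With these stand-ins, `auto& s = st.add_summary("foo"); s.value = "bar";` updates the element stored in `st.summaries`, while the by-value form is rejected by the compiler instead of silently editing a temporary copy.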
Allison Vacanti
bf60ff3f0f
Add byte formatting to markdown_format.
2021-02-10 09:17:49 -05:00
Allison Vacanti
696578422f
Compact the summary table columns.
...
Before:
```
| In | Out | Init | Size | (Size) | Cold Trials | Cold GPU | GPU Noise | Cold CPU | CPU Noise | Hot Trials | Hot GPU | Item Rate | GlobalMemUse | PeakGMem |
|-----|-----|------|------|-----------|-------------|------------|-----------|------------|-----------|------------|------------|-----------|--------------|----------|
| I32 | F32 | I32 | 2^20 | 1048576 | 7863 | 85.60 us | 3.35% | 127.19 us | 10.00% | 7864 | 84.98 us | 12.34 GHz | 91.94 GiB/s | 77.10% |
| I32 | F32 | I32 | 2^24 | 16777216 | 380 | 1240.13 us | 0.36% | 1316.48 us | 2.90% | 424 | 1236.32 us | 13.57 GHz | 101.11 GiB/s | 84.79% |
| I32 | F32 | I32 | 2^28 | 268435456 | 51 | 19.67 ms | 0.05% | 19.76 ms | 0.13% | 51 | 19.67 ms | 13.65 GHz | 101.70 GiB/s | 85.29% |
```
After:
```
| In | Out | Init | Size | (Size) | Cold GPU | Noise | Cold CPU | Noise | Trials | Hot GPU | Trials | Item Rate | GlobalMemUse | PeakGMem |
|-----|-----|------|------|-----------|------------|-------|------------|--------|--------|------------|--------|-----------|--------------|----------|
| I32 | F32 | I32 | 2^20 | 1048576 | 96.25 us | 2.41% | 141.23 us | 13.84% | 7081 | 96.16 us | 7082 | 10.90 GHz | 81.24 GiB/s | 68.13% |
| I32 | F32 | I32 | 2^24 | 16777216 | 1406.90 us | 0.27% | 1482.86 us | 2.21% | 338 | 1402.06 us | 374 | 11.97 GHz | 89.15 GiB/s | 74.77% |
| I32 | F32 | I32 | 2^28 | 268435456 | 22.29 ms | 0.12% | 22.38 ms | 0.22% | 45 | 22.28 ms | 45 | 12.05 GHz | 89.78 GiB/s | 75.29% |
```
2021-02-09 10:04:07 -05:00
Allison Vacanti
cd38a4e9ca
Allow duplicate column headers in markdown tables.
2021-02-09 10:03:39 -05:00
Allison Vacanti
d0ad118136
Add implementation of transform_reduce.
...
No released version of GCC supports this yet.
2021-02-08 14:20:53 -05:00
Allison Vacanti
bf94881477
Add warnings when max_time is exceeded without meeting other criteria.
2021-02-06 10:48:26 -05:00
Allison Vacanti
a0c4480a1e
Catch and report exceptions in NVBENCH_MAIN.
2021-02-06 09:34:54 -05:00
Allison Vacanti
40aa60b709
Report total time in log.
2021-02-06 09:32:28 -05:00
Allison Vacanti
478d657124
Clean up the cold convergence implementation.
2021-02-06 09:32:03 -05:00
Allison Vacanti
8c9cc84025
Only consider target time if noise convergence is not available.
2021-02-05 18:44:31 -05:00
Allison Vacanti
f7b985cd6e
Rework benchmark termination criteria to use rel stdev convergence.
2021-02-05 18:08:39 -05:00
Allison Vacanti
d8b3c8967c
Fix "peak sm clock" descriptor to "default sm clock".
2021-02-05 18:03:29 -05:00
Allison Vacanti
88119aede1
Implement --list/-l, more markdown cleanup.
2021-02-05 16:35:37 -05:00
Allison Vacanti
8915fe6fc8
Add virtuals to get metadata from axis_base.
2021-02-05 16:27:36 -05:00
Allison Vacanti
e2364be3ba
Add more device stats and tweak markdown output.
2021-02-05 13:38:48 -05:00
Allison Vacanti
340196d778
Force-inline timer / cache flush APIs.
2021-02-05 13:08:08 -05:00
Allison Vacanti
4f8c6aac94
More markdown output tweaks.
2021-02-04 19:11:15 -05:00
Allison Vacanti
4bdc27de8c
More fixes for int64_t != long long.
2021-02-04 19:02:58 -05:00
Allison Vacanti
74d19a8e16
Clean up formatting.
2021-02-04 18:54:44 -05:00
Allison Vacanti
e302583c67
Fix usages of ASSERT_MSG that create empty __VA_ARGS__.
...
Invoking a variadic macro with zero variadic args is illegal until
C++20. The extra calls to fmt::format were unnecessary, anyway.
2021-02-04 18:54:20 -05:00
Allison Vacanti
a12d17c1ca
Work around the GCC/clang <charconv> dumpster fire.
...
<charconv> doesn't exist until gcc 8/clang 7, and still doesn't support
float types as of gcc 10.2 / clang 11.
Construct a string and use std::stoX instead, grumble grumble.
2021-02-04 18:19:27 -05:00
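Note on a12d17c1ca: a minimal sketch of the "construct a string and use std::stoX" approach for floating-point input. The helper name `parse_float64` and its strictness about trailing characters are assumptions made for illustration; the actual nvbench code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: parse a double without std::from_chars.
double parse_float64(std::string_view sv)
{
  // std::stod needs a std::string, so copy the view first.
  const std::string str(sv);

  std::size_t consumed = 0;
  // Throws std::invalid_argument or std::out_of_range on failure.
  const double value = std::stod(str, &consumed);

  // Assumption: reject input with trailing characters, e.g. "1.5abc".
  if (consumed != str.size())
  {
    throw std::invalid_argument("Unexpected trailing characters: " + str);
  }
  return value;
}
```

Unlike `std::from_chars`, `std::stod` allocates, skips leading whitespace, and honors the current C locale, so this is a stopgap rather than an equivalent replacement.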
Allison Vacanti
f45114a4f3
Avoid ambiguity on systems where is_same_v<uint64_t, long long> == false.
2021-02-04 18:19:01 -05:00
Allison Vacanti
9e1465210c
Add newline between devices in markdown output.
2021-02-04 18:17:10 -05:00
Allison Vacanti
932024f4da
Add device_manager, device_info, and device_scope.
...
Added device printouts to markdown_format.
2021-02-04 16:44:41 -05:00
Allison Vacanti
6bb22b952c
Add option parsing to NVBENCH_MAIN.
...
- Convert benchmark_manager into a read-only structure.
- Mutable benchmarks will be provided by
`option_parser::get_benchmarks()` or
`benchmark_manager::clone_benchmarks()`.
2021-02-04 13:15:21 -05:00
Allison Vacanti
3e9d0ebc34
Clean up stray punctuation.
2021-02-04 12:06:07 -05:00