nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-04-20 14:58:54 +00:00

Author	SHA1	Message	Date
Allison Vacanti	8d6d934dfe	Add default axis names. Also cleaned up the annoying quirk where `set_type_axes_names` had to be called on all benchmarks with type axes. Default names are {"T", "U", "V", "W"} for up to four type axes. For five or more, {"T0", "T1", ...} is used instead.	2021-02-19 12:37:05 -05:00
Allison Vacanti	324b0d107e	Add "global args" to option parser. If a benchmark modifier is passed before `--benchmark`, the modifier will apply to all benchmarks.	2021-02-19 10:37:00 -05:00
Allison Vacanti	a747982415	Add `nvbench::main` CMake target. Linking to this instead of `nvbench::nvbench` will automatically include the `NVBENCH_MAIN` macro.	2021-02-19 09:34:02 -05:00
Allison Vacanti	2cc9bf41e3	Add demangle<T>() convenience overload.	2021-02-18 23:41:49 -05:00
Allison Vacanti	543488ef75	Make kernel wrapper into an lvalue.	2021-02-18 23:41:23 -05:00
Allison Vacanti	b5443e98c8	Use `std::size_t` for element counts, buffer size metadata.	2021-02-18 18:23:14 -05:00
Allison Vacanti	7dd46b0021	Update old benchmarks to use nvbench, remove old scaffolding. Remove the original attempt to adapt gbench to do CUDA stuff. Update all benchmarks to use some conventions: - Element count -> "Elements" [16:32] - Throughput calcs - Add input buffer column: "Size"	2021-02-18 18:22:50 -05:00
Allison Vacanti	7657036f9c	Add helper methods to configure throughput. Instead of: ``` state.set_element_count(size); state.set_global_memory_bytes_accessed( size * (sizeof(InT) + sizeof(OutT))); ``` do: ``` state.add_element_count(size, "Elements"); state.add_global_memory_read<InT>(size, "InputSize"); state.add_global_memory_write<OutT>(size, "OutputSize"); ``` The string arguments are optional. If provided, a new column will be added to the output with the indicated name and number of bytes (or elements for `add_element_count`).	2021-02-18 15:47:59 -05:00
Allison Vacanti	dcd5d1ffa6	Update markdown output format.	2021-02-18 14:44:17 -05:00
Allison Vacanti	ef3e1594eb	Implement manual timers. See the new thrust/sort/basic.cu benchmark for usage. Other notable changes: - Updated summary column names: - Cold GPU -> GPU Time - Cold CPU -> CPU Time - Hot GPU -> Batch GPU - Removed CPU timings from measure_hot - They'd been hidden for a while, and aren't really useful. - Moved the throughput calcs to measure_cold - `timer` will disable `hot` timings, still want throughput - `cold` timings make more sense for throughput, global BW numbers are meaningless if the data is sitting in L2.	2021-02-17 18:48:26 -05:00
Allison Vacanti	385d4f77ba	Teach markdown_format about sample_sizes.	2021-02-17 18:34:35 -05:00
Allison Vacanti	8a1f017a4e	Inline some methods used in benchmark loops.	2021-02-17 18:34:09 -05:00
Allison Vacanti	f61be70a93	Add initial implementation of exec_tag dispatching. nvbench::exec_tags are used to request measurement types and share information about the kernel. They are used to ensure that templated measurement code is not instantiated unless actually used. Replaces the nvbench::exec(state, launcher, tags) pattern with: state.exec(tags, launcher); state.exec(launcher); // defaults to hot/cold cuda measurements	2021-02-16 23:47:36 -05:00
Allison Vacanti	37e753f7b6	Update benchmark macros: s/NVBENCH_CREATE/NVBENCH_BENCH/g s/NVBENCH_BENCH_TEMPLATE/NVBENCH_BENCH_TYPES/g This will fit nicer once the exec_tag versions are added: NVBENCH_BENCH NVBENCH_BENCH_TYPES NVBENCH_BENCH_FLAGS NVBENCH_BENCH_TYPES_FLAGS	2021-02-16 16:08:38 -05:00
Allison Vacanti	d12326083d	Clean up l2flush initialization.	2021-02-16 12:01:50 -05:00
Allison Vacanti	f46dda0e81	Use noexcept CUDA_CALL check in destructor.	2021-02-16 12:00:04 -05:00
Allison Vacanti	55aa78ce17	Make the use of the blocking_kernel optional. This breaks thrust algorithms, which sync internally. I'll need to add an exec_tag to toggle this.	2021-02-15 21:55:26 -05:00
Allison Vacanti	bb871094c3	Fixes for multidevice/gcc. - Allow devices to be cleared during benchmark definition. - Fix various demangling bugs.	2021-02-15 21:26:21 -05:00
Allison Vacanti	8897490a6d	Add cxxabi demangling for gcc/clang.	2021-02-15 21:00:09 -05:00
Allison Vacanti	6c67578dcd	Implement skip_time and improve logging.	2021-02-15 17:39:46 -05:00
Allison Vacanti	ead8392bce	Use NVBENCH_THROW in option_parser.cu.	2021-02-15 17:19:07 -05:00
Allison Vacanti	6cf29b5083	Various small updates and refactorings. - collapse nested namespace specifiers. - Clean up markdown format tables.	2021-02-15 17:18:03 -05:00
Allison Vacanti	d323f569b8	Add termination criteria API. - min_samples - min_time - max_noise - skip_time (not yet implemented) - timeout Refactored s/(trials)\|(iters)/samples/s.	2021-02-15 12:04:15 -05:00
Allison Vacanti	e5914ff620	Clean up blocking_kernel. - Rename release() -> unblock() to avoid confusion with release fences. - Remove some unused headers.	2021-02-14 16:07:22 -05:00
Allison Vacanti	1cea5e1965	Add and use blocking_kernel.	2021-02-13 11:21:30 -05:00
Allison Vacanti	2125ada770	Call cudaDeviceReset from NVBENCH_MAIN.	2021-02-13 10:01:09 -05:00
Allison Vacanti	3b57127571	Add NVBENCH_CUDA_CALL_NOEXCEPT. It'll just exit instead of throw. Used in destructors and other noexcept contexts.	2021-02-13 10:00:15 -05:00
Allison Vacanti	878f1ca4f6	Add --device option.	2021-02-12 21:51:17 -05:00
Allison Vacanti	92cc3b1189	Execute benchmarks on all devices.	2021-02-12 20:53:10 -05:00
Allison Vacanti	5348f65e12	Clean up nvbench::state. - Replace cref member with ref_wrapper to make movable. - Use friendship instead of inheritance for testing. - Add missing [[nodiscard]] annotations.	2021-02-11 21:27:01 -05:00
Allison Vacanti	9f9c6e5278	Refactor state_generator. The old implementation was scattered and ad hoc. This one is slightly less so. More importantly, refactoring to this design will make it easier to add device traversal.	2021-02-11 21:24:58 -05:00
Allison Vacanti	4820c557a6	Update docstring.	2021-02-11 21:14:22 -05:00
Allison Vacanti	56d182ad41	Fix float64_axis test. Changed to use `{:0.5g}` formatting for input strings until I figure out something better.	2021-02-11 21:13:48 -05:00
Allison Vacanti	3bc8291b28	Allow benchmarks to be specified by index with `--benchmark`.	2021-02-10 21:40:39 -05:00
Allison Vacanti	2561816f15	Print indices for benchmarks in `--list` output.	2021-02-10 21:39:23 -05:00
Allison Vacanti	e9ae291736	Add benchmark_manager::get_benchmark(idx).	2021-02-10 21:38:56 -05:00
Allison Vacanti	0477514bb6	Axis spec revamp. - Add support for single values ("Axis=Value"). - Make other value specs shell friendly: - Range: "Axis:(2:10:2)" -> "Axis=[2:10:2]" - List: "Axis:{2,3,4,5}" -> "Axis=[2,3,4,5]" - ":" -> "=" feels more natural - "{}()" characters have special meaning in bash. - "[]" characters don't require escapes. - Using the same braces for both ranges and lists is easier to remember, only the delimiter changes.	2021-02-10 09:55:50 -05:00
Allison Vacanti	ed658f0cec	Make summary move-only. This prevents subtle bugs like auto s = state.add_summary("foo"); instead of auto& s = state.add_summary("foo");	2021-02-10 09:19:18 -05:00
Allison Vacanti	bf60ff3f0f	Add byte formatting to markdown_format.	2021-02-10 09:17:49 -05:00
Allison Vacanti	696578422f	Compact the summary table columns. Before: ``` \| In \| Out \| Init \| Size \| (Size) \| Cold Trials \| Cold GPU \| GPU Noise \| Cold CPU \| CPU Noise \| Hot Trials \| Hot GPU \| Item Rate \| GlobalMemUse \| PeakGMem \| \|-----\|-----\|------\|------\|-----------\|-------------\|------------\|-----------\|------------\|-----------\|------------\|------------\|-----------\|--------------\|----------\| \| I32 \| F32 \| I32 \| 2^20 \| 1048576 \| 7863 \| 85.60 us \| 3.35% \| 127.19 us \| 10.00% \| 7864 \| 84.98 us \| 12.34 GHz \| 91.94 GiB/s \| 77.10% \| \| I32 \| F32 \| I32 \| 2^24 \| 16777216 \| 380 \| 1240.13 us \| 0.36% \| 1316.48 us \| 2.90% \| 424 \| 1236.32 us \| 13.57 GHz \| 101.11 GiB/s \| 84.79% \| \| I32 \| F32 \| I32 \| 2^28 \| 268435456 \| 51 \| 19.67 ms \| 0.05% \| 19.76 ms \| 0.13% \| 51 \| 19.67 ms \| 13.65 GHz \| 101.70 GiB/s \| 85.29% \| ``` After: ``` \| In \| Out \| Init \| Size \| (Size) \| Cold GPU \| Noise \| Cold CPU \| Noise \| Trials \| Hot GPU \| Trials \| Item Rate \| GlobalMemUse \| PeakGMem \| \|-----\|-----\|------\|------\|-----------\|------------\|-------\|------------\|--------\|--------\|------------\|--------\|-----------\|--------------\|----------\| \| I32 \| F32 \| I32 \| 2^20 \| 1048576 \| 96.25 us \| 2.41% \| 141.23 us \| 13.84% \| 7081 \| 96.16 us \| 7082 \| 10.90 GHz \| 81.24 GiB/s \| 68.13% \| \| I32 \| F32 \| I32 \| 2^24 \| 16777216 \| 1406.90 us \| 0.27% \| 1482.86 us \| 2.21% \| 338 \| 1402.06 us \| 374 \| 11.97 GHz \| 89.15 GiB/s \| 74.77% \| \| I32 \| F32 \| I32 \| 2^28 \| 268435456 \| 22.29 ms \| 0.12% \| 22.38 ms \| 0.22% \| 45 \| 22.28 ms \| 45 \| 12.05 GHz \| 89.78 GiB/s \| 75.29% \| ```	2021-02-09 10:04:07 -05:00
Allison Vacanti	cd38a4e9ca	Allow duplicate column headers in markdown tables.	2021-02-09 10:03:39 -05:00
Allison Vacanti	d0ad118136	Add implementation of transform_reduce. No released version of GCC supports this yet.	2021-02-08 14:20:53 -05:00
Allison Vacanti	bf94881477	Add warnings when max_time is exceeded without meeting other criteria.	2021-02-06 10:48:26 -05:00
Allison Vacanti	a0c4480a1e	Catch and report exceptions in NVBENCH_MAIN.	2021-02-06 09:34:54 -05:00
Allison Vacanti	40aa60b709	Report total time in log.	2021-02-06 09:32:28 -05:00
Allison Vacanti	478d657124	Clean up the cold convergence implementation.	2021-02-06 09:32:03 -05:00
Allison Vacanti	8c9cc84025	Only consider target time if noise convergence is not available.	2021-02-05 18:44:31 -05:00
Allison Vacanti	f7b985cd6e	Rework benchmark termination criteria to use rel stdev convergence.	2021-02-05 18:08:39 -05:00
Allison Vacanti	d8b3c8967c	Fix "peak sm clock" descriptor to "default sm clock".	2021-02-05 18:03:29 -05:00
Allison Vacanti	88119aede1	Implement --list/-l, more markdown cleanup.	2021-02-05 16:35:37 -05:00
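
The default type-axis naming rule described in commit 8d6d934dfe can be sketched roughly as follows. This is a minimal, standalone illustration and not nvbench's actual implementation: the function name `default_type_axis_names` is hypothetical, and the only behavior taken from the log is that up to four axes get "T", "U", "V", "W" while five or more get "T0", "T1", and so on.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of nvbench's API): produce default names
// for `num_axes` type axes, following the rule stated in the commit message.
std::vector<std::string> default_type_axis_names(std::size_t num_axes)
{
  static const char *short_names[] = {"T", "U", "V", "W"};

  std::vector<std::string> names;
  names.reserve(num_axes);
  for (std::size_t i = 0; i < num_axes; ++i)
  {
    if (num_axes <= 4)
    {
      // Up to four axes: single-letter names.
      names.emplace_back(short_names[i]);
    }
    else
    {
      // Five or more axes: numbered names T0, T1, ...
      names.push_back("T" + std::to_string(i));
    }
  }
  return names;
}
```

Under these assumptions, a benchmark with three type axes would get the names T, U, V, and one with five would get T0 through T4.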


119 Commits
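
The shell-friendly axis value syntax introduced in commit 0477514bb6 ("Axis=Value", "Axis=[2:10:2]", "Axis=[2,3,4,5]") could be expanded into concrete integer values along these lines. This is a sketch under stated assumptions, not the parser nvbench ships: `expand_axis_values` is a hypothetical name, the range is assumed to be `start:stop:stride` with an inclusive stop and an optional stride defaulting to 1, and only integer axes are handled.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not nvbench's parser): expand an integer axis spec
// of the form "Axis=7", "Axis=[2,3,4,5]", or "Axis=[2:10:2]".
std::vector<long long> expand_axis_values(const std::string &spec)
{
  const auto eq = spec.find('=');
  if (eq == std::string::npos || eq + 1 >= spec.size())
  {
    throw std::invalid_argument("expected Axis=Value");
  }
  std::string value = spec.substr(eq + 1);

  // Single value: "Axis=7"
  if (value.front() != '[')
  {
    return std::vector<long long>{std::stoll(value)};
  }
  if (value.size() < 2 || value.back() != ']')
  {
    throw std::invalid_argument("unterminated '['");
  }
  value = value.substr(1, value.size() - 2); // strip the brackets

  // Same brackets for both forms; only the delimiter differs.
  const char delim = value.find(':') != std::string::npos ? ':' : ',';
  std::vector<long long> fields;
  std::istringstream in(value);
  for (std::string tok; std::getline(in, tok, delim);)
  {
    fields.push_back(std::stoll(tok));
  }
  if (delim == ',')
  {
    return fields; // explicit list
  }

  // Range (assumed inclusive): start:stop[:stride]
  if (fields.size() < 2 || fields.size() > 3)
  {
    throw std::invalid_argument("expected [start:stop] or [start:stop:stride]");
  }
  const long long stride = fields.size() == 3 ? fields[2] : 1;
  if (stride <= 0)
  {
    throw std::invalid_argument("stride must be positive");
  }
  std::vector<long long> out;
  for (long long v = fields[0]; v <= fields[1]; v += stride)
  {
    out.push_back(v);
  }
  return out;
}
```

Because only the delimiter distinguishes a range from a list, a single bracket-stripping step serves both forms, which mirrors the rationale given in the commit message.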