nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-04-20 14:58:54 +00:00

Author	SHA1	Message	Date
Georgy Evtushenko	61afb8d7e7	Initial implementation of nvbench_histogram.	2022-01-11 15:06:54 -05:00
Allison Vacanti	f1c985955a	Clean up JSON output for consistency and easier parsing. - Prefer an array of objects with `.name` fields over key/value pairs for arbitrary collections of objects. - Write common summary value names directly as fields.	2022-01-11 15:06:54 -05:00
Allison Vacanti	b11c0ba3a0	Add a binary JSON output (--jsonbin) that dumps timing samples.	2022-01-11 15:06:54 -05:00
Allison Vacanti	74e96e8618	Add nvbench_walltime.py script.	2022-01-11 15:06:54 -05:00
Allison Vacanti	5eac6b6340	Measure and report walltime for all measurements.	2022-01-11 15:06:54 -05:00
Allison Vacanti	6dee1eec3b	Refactor summary API and update nvbench/summary.cuh docs. The string used when constructing a summary is no longer a human readable name, but rather a tag string (e.g. "nv/cold/time/gpu/mean"). These will make lookup easier and more stable going forward. name vs. short_name no longer exists. Now there is just "name", which is used for column headings. The "description" string may still be used for detailed information. Updated the json tests and compare script to reflect these changes.	2022-01-11 15:06:26 -05:00
Allison Vacanti	9481e947aa	Add C++ dialect detection macros.	2022-01-11 14:40:33 -05:00
Allison Vacanti	fc86d6a524	Merge pull request #65 from allisonvacanti/fix_cpu_noise_calc Fix cpu noise calcs.	2021-12-22 13:29:37 -05:00
Allison Vacanti	6edc5b91a5	Fix cpu noise calcs.	2021-12-22 13:27:07 -05:00
Allison Vacanti	2f8bb28c52	Merge pull request #64 from allisonvacanti/noise_convergence New convergence check	2021-12-21 21:30:39 -05:00
Allison Vacanti	178dd0eb68	Implement new convergence check for noisy kernels. Previously, convergence was tested by waiting for the relative stdev of cuda timings ("noise") to drop below a certain percentage (`max_noise`). This assumed that all benchmarks would eventually see their noise drop to some threshold, but this is not the case. In practice, many benchmarks never converge to the default 0.5% relative stdev and instead will always run to the 15s timeout -- even if the means have converged in a second or two. Added a new check that tests when the noise itself stabilizes and ends the benchmark, even if noise > max_noise. After testing, this patch alone significantly reduces the runtime of the Thrust+CUB benchmark suite (from 30 hours to 5 hours) and produces similar timing results. The parameters used to tune this feature are not exposed -- if this approach works long-term and there's a strong motivation to let users tweak them, then we can worry about names/APIs/CLI/docs later.	2021-12-21 21:24:02 -05:00
Allison Vacanti	8e56a7bd94	Add `noisy_bench` with some benchmarks that currently always time-out.	2021-12-21 21:05:13 -05:00
Allison Vacanti	3c01814945	Skip non-json files and empty files in compare script.	2021-12-21 21:03:02 -05:00
Allison Vacanti	e70c31d7e1	Merge pull request #63 from allisonvacanti/fix_progress_display Fix progress display for inactive type axis values.	2021-12-21 20:42:05 -05:00
Allison Vacanti	8cacc821d0	Fix an error message. This path gets hit for type axes as well as strings.	2021-12-21 20:41:45 -05:00
Allison Vacanti	c9ab8e2eb3	Fix progress display for inactive type axis values. When type axis values were disabled they were still counted towards a benchmark's total number of configs.	2021-12-21 20:36:52 -05:00
Allison Vacanti	0f5c8624f6	Merge pull request #62 from allisonvacanti/debug_warnings Suppress warnings on MSVC Debug builds.	2021-12-21 19:41:19 -05:00
Allison Vacanti	288b1564e0	Suppress warnings on MSVC Debug builds. Also moved the config.cuh.in template into the source directory where it'll be easier to find.	2021-12-21 19:35:23 -05:00
Allison Vacanti	edf2018fd7	Merge pull request #58 from allisonvacanti/nvbench_executable Add an `nvbench-ctl` executable.	2021-12-21 12:08:39 -05:00
Allison Vacanti	20522c807d	Add an `nvbench-ctl` executable. This will provide functionality such as clock locking (--lgm), persistence mode (--pm), device querying (--list), version checking (--version), and documentation (--help). This is possible already with any nvbench executable, but having one with a reliable name will be helpful for scripting and writing documentation.	2021-12-21 12:02:07 -05:00
Allison Vacanti	986736aa09	Merge pull request #60 from allisonvacanti/59_ubuntu_cupti Add cupti path for ubuntu packages.	2021-12-20 14:35:27 -05:00
Allison Vacanti	61d094abf1	Add cupti path for ubuntu packages. Fixes #59	2021-12-20 14:34:12 -05:00
Allison Vacanti	ff1ad78cfa	Merge pull request #48 from robertmaynard/improve_compare_script_features nvbench_compare handles directories and can filter out non-interesting results	2021-12-20 13:46:24 -05:00
Robert Maynard	6c1f372c45	Allow nvbench [-flags] (files|dirs)	2021-12-20 13:31:32 -05:00
Robert Maynard	35dd8de2ce	Remove unneeded scripts/requirements.txt	2021-12-20 13:24:24 -05:00
Allison Vacanti	a8422197a9	Merge pull request #57 from senior-zero/fix_option_parser Fix UB in option parser	2021-12-20 11:58:51 -05:00
Allison Vacanti	113b2f3f7f	Merge pull request #56 from allisonvacanti/pow2_axis_compact_md Reduce the width of pow2 axes in markdown tables.	2021-12-20 11:45:44 -05:00
Allison Vacanti	610b7767b5	Merge pull request #54 from allisonvacanti/progress_display Print progress in markdown log.	2021-12-20 11:44:50 -05:00
Allison Vacanti	51efc7d1a8	Merge pull request #53 from allisonvacanti/50_warning_flags Enable extra warning flags	2021-12-20 11:44:17 -05:00
Georgy Evtushenko	3bd37d0e75	Fix UB in option parser	2021-12-20 15:25:39 +03:00
Allison Vacanti	84f930809f	Reduce the width of pow2 axes in markdown tables. Before: ``` | BlockSize | (BlockSize) | NumBlocks | (NumBlocks) | |-----------|-------------|-----------|-------------| | 2^6 | 64 | 2^6 | 64 | | 2^8 | 256 | 2^6 | 64 | | 2^10 | 1024 | 2^6 | 64 | | 2^6 | 64 | 2^8 | 256 | | 2^8 | 256 | 2^8 | 256 | | 2^10 | 1024 | 2^8 | 256 | | 2^6 | 64 | 2^10 | 1024 | | 2^8 | 256 | 2^10 | 1024 | | 2^10 | 1024 | 2^10 | 1024 | ``` After: ``` | BlockSize | NumBlocks | |-------------|-------------| | 2^6 = 64 | 2^6 = 64 | | 2^8 = 256 | 2^6 = 64 | | 2^10 = 1024 | 2^6 = 64 | | 2^6 = 64 | 2^8 = 256 | | 2^8 = 256 | 2^8 = 256 | | 2^10 = 1024 | 2^8 = 256 | | 2^6 = 64 | 2^10 = 1024 | | 2^8 = 256 | 2^10 = 1024 | | 2^10 = 1024 | 2^10 = 1024 | ```	2021-12-19 10:38:14 -05:00
Allison Vacanti	37dd61b275	Clean up some virtual interfaces. - nvbench::benchmark doesn't add state, no need to override the destructor. - nvbench::printer_base's virtual API should support decoration, not just overriding. Making the virtual API protected instead of private allows derived classes to extend base class behavior. - nvbench::printer_base needs a virtual destructor. - Fix a bug in nvbench::printer_multiplex that caused the new `get_[total|completed]_state_count()` methods to always return 0.	2021-12-19 10:26:40 -05:00
Allison Vacanti	3508775d71	Print progress in markdown log. e.g. ``` Run: [1/63] copy_type_sweep [Device=0 T=U8] Pass: Cold: 10.659315ms GPU, 10.670530ms CPU, 0.11s total GPU, 10x Pass: Batch: 10.298826ms GPU, 0.51s total GPU, 50x Run: [2/63] copy_type_sweep [Device=0 T=U16] Pass: Cold: 6.185874ms GPU, 6.194119ms CPU, 0.10s total GPU, 16x Pass: Batch: 6.174837ms GPU, 0.53s total GPU, 86x Run: [3/63] copy_type_sweep [Device=0 T=U32] ... Run: [63/63] copy_sweep_grid_shape [Device=0 BlockSize=2^10 NumBlocks=2^10] Pass: Cold: 4.921733ms GPU, 4.929724ms CPU, 0.10s total GPU, 21x Pass: Batch: 4.917333ms GPU, 0.53s total GPU, 107x ```	2021-12-19 03:07:17 -05:00
Allison Vacanti	5d70492714	Enable more warning flags. - /W4 on MSVC - -Wall -Wextra + others on gcc/clang - New NVBench_ENABLE_WERROR option to toggle "warnings as errors" - Mark the nlohmann_json library as IMPORTED to switch to system includes - Rename nvbench_main -> nvbench.main to follow target name conventions - Explicitly suppress some cudafe warnings when compiling templates in nlohmann_json headers. - Explicitly suppress some warnings from Thrust headers. - Various fixes for warnings exposed by new flags. - Disable CUPTI on CTK < 11.3 (See #52).	2021-12-18 20:13:25 -05:00
Allison Vacanti	15edfe2eee	Refactor to use NVBENCH_THROW where possible.	2021-12-18 17:52:39 -05:00
Allison Vacanti	9ff857ee29	Merge pull request #49 from senior-zero/fix_markdown_table Fix markdown table	2021-12-18 10:33:11 -05:00
Georgy Evtushenko	eb29ab27ff	Fix markdown table	2021-12-18 18:08:29 +03:00
Georgy Evtushenko	21ea12cd10	Merge pull request #29 from senior-zero/main-feature/github/cupti CUPTI support	2021-12-18 12:09:25 +03:00
Georgy Evtushenko	1bc715267c	CUPTI support	2021-12-18 12:03:52 +03:00
Allison Vacanti	3d6c16f8ba	Maintain iterator state in markdown table printer.	2021-12-18 01:27:38 -05:00
Allison Vacanti	07e1c56608	Merge pull request #46 from allisonvacanti/nvml Add NVML support for persistence mode, locking clocks.	2021-12-17 16:07:44 -05:00
Allison Vacanti	b948e79cab	Add NVML support for persistence mode, locking clocks. Locking clocks is currently only implemented for Volta+ devices. Example usage: my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base See the cli_help.md docs for more info.	2021-12-17 13:59:43 -05:00
Robert Maynard	f9b44378bf	nvbench_compare now supports comparing directories of results	2021-12-16 16:26:13 -05:00
Robert Maynard	905f84272e	Add --threshold-diff command option to nvbench_compare Allows us to filter output to only see the significantly different benchmarks	2021-12-16 15:52:30 -05:00
Robert Maynard	52d9aed8da	refactor to have a proper main entry point	2021-12-16 15:27:51 -05:00
Robert Maynard	3f6d496824	Add a requirements.txt for the nv_bench script	2021-12-16 13:44:40 -05:00
Allison Vacanti	d0c90ff920	Build static fmtlib with -fPIC.	2021-12-15 12:54:53 -05:00
Allison Vacanti	af03585543	Add coloring to markdown tables.	2021-12-14 23:03:14 -05:00
Allison Vacanti	8d77dc2b6c	Merge pull request #47 from allisonvacanti/base-two-bandwidth Use base2 format for displaying bandwidth.	2021-12-14 21:22:50 -05:00
Allison Vacanti	54fda533e1	Use base2 format for displaying bandwidth. Fixes #4.	2021-12-14 21:19:10 -05:00
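
The convergence change described in commit 178dd0eb68 can be illustrated with a small Python sketch. It is not the nvbench implementation: the function names, the `window` and `tolerance` parameters, and the "spread of recent noise values" test are all assumptions made for illustration (the commit itself notes that the real tuning parameters are not exposed). The sketch keeps the old rule (stop once relative stdev drops below `max_noise`) and adds a second rule that stops once the noise value itself has stopped changing.

```python
import statistics


def relative_stdev(samples):
    """Relative standard deviation ("noise") of a list of timings."""
    mean = statistics.fmean(samples)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(samples) / mean


def has_converged(samples, max_noise=0.005, window=16, tolerance=0.05):
    """Hypothetical stopping rule, not nvbench's actual code.

    Returns True when either
      1. the noise is already below max_noise (the original check), or
      2. the noise has stabilized: over the last `window` prefixes of the
         sample list, the noise varies by at most `tolerance` times its
         current value (an assumed stand-in for the new check).
    """
    if len(samples) < 2:
        return False

    noise = relative_stdev(samples)
    if noise < max_noise:
        return True

    # Require enough samples before judging stability (assumed threshold).
    if len(samples) < 2 * window:
        return False

    first = len(samples) - window + 1
    history = [relative_stdev(samples[:n]) for n in range(first, len(samples) + 1)]
    spread = max(history) - min(history)
    return spread <= tolerance * noise
```

With this rule, a benchmark whose timings keep alternating between two values never reaches 0.5% noise, yet it is reported as converged once enough samples show that the noise estimate is no longer moving, which mirrors the motivation given in the commit message.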


312 Commits
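
The compact pow2 axis format shown in commit 84f930809f (one `2^N = value` column instead of separate exponent and value columns) can be sketched in Python as follows. This is only an illustration of the output format: `pow2_label` and `markdown_table` are hypothetical helpers, not part of nvbench, and the column widths are computed here in a simple way that may differ from the real printer.

```python
def pow2_label(exponent):
    """Format a power-of-two axis value in the compact form, e.g. '2^6 = 64'."""
    return f"2^{exponent} = {1 << exponent}"


def markdown_table(headers, exponent_rows):
    """Render a markdown table whose cells are compact pow2 labels.

    headers:       list of column names
    exponent_rows: iterable of tuples of integer exponents, one per column
    """
    cells = [list(headers)] + [[pow2_label(e) for e in row] for row in exponent_rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(cells[0]), separator] + [fmt(r) for r in cells[1:]])


if __name__ == "__main__":
    print(markdown_table(["BlockSize", "NumBlocks"], [(6, 6), (8, 6), (10, 6)]))
```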