Commit Graph

312 Commits

Author SHA1 Message Date
Georgy Evtushenko
61afb8d7e7 Initial implementation of nvbench_histogram. 2022-01-11 15:06:54 -05:00
Allison Vacanti
f1c985955a Clean up JSON output for consistency and easier parsing.
- Prefer an array of objects with `.name` fields over key/value pairs
  for arbitrary collections of objects.
- Write common summary value names directly as fields.
2022-01-11 15:06:54 -05:00
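As an illustration of the layout change described in f1c985955a, the following is a minimal sketch of converting a key/value collection into an array of objects with `name` fields. The axis names, the `type` field, and both helper functions are hypothetical and are not taken from NVBench's actual JSON schema.

```python
import json


def to_named_array(collection):
    """Turn {"key": {...}} into [{"name": "key", ...}], keeping insertion order."""
    return [{"name": key, **fields} for key, fields in collection.items()]


def find_by_name(items, name):
    """Look up one object in the array form by its "name" field."""
    for item in items:
        if item["name"] == name:
            return item
    return None


# Hypothetical data, for illustration only.
axes_as_pairs = {"BlockSize": {"type": "int64"}, "T": {"type": "type"}}
axes_as_array = to_named_array(axes_as_pairs)
print(json.dumps(axes_as_array, indent=2))
```

With the array form, a parser can iterate over entries uniformly and does not need to treat arbitrary keys as data.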
Allison Vacanti
b11c0ba3a0 Add a binary JSON output (--jsonbin) that dumps timing samples. 2022-01-11 15:06:54 -05:00
Allison Vacanti
74e96e8618 Add nvbench_walltime.py script. 2022-01-11 15:06:54 -05:00
Allison Vacanti
5eac6b6340 Measure and report walltime for all measurements. 2022-01-11 15:06:54 -05:00
Allison Vacanti
6dee1eec3b Refactor summary API and update nvbench/summary.cuh docs.
The string used when constructing a summary is no longer a human
readable name, but rather a tag string (e.g. "nv/cold/time/gpu/mean").
These will make lookup easier and more stable going forward.

The name vs. short_name distinction no longer exists. Now there is just
"name", which is used for column headings. The "description" string may
still be used for detailed information.

Updated the json tests and compare script to reflect these changes.
2022-01-11 15:06:26 -05:00
Allison Vacanti
9481e947aa Add C++ dialect detection macros. 2022-01-11 14:40:33 -05:00
Allison Vacanti
fc86d6a524 Merge pull request #65 from allisonvacanti/fix_cpu_noise_calc
Fix cpu noise calcs.
2021-12-22 13:29:37 -05:00
Allison Vacanti
6edc5b91a5 Fix cpu noise calcs. 2021-12-22 13:27:07 -05:00
Allison Vacanti
2f8bb28c52 Merge pull request #64 from allisonvacanti/noise_convergence
New convergence check
2021-12-21 21:30:39 -05:00
Allison Vacanti
178dd0eb68 Implement new convergence check for noisy kernels.
Previously, convergence was tested by waiting for the relative stdev
of cuda timings ("noise") to drop below a certain percentage
(`max_noise`).

This assumed that all benchmarks would eventually see their noise drop
to some threshold, but this is not the case. In practice, many benchmarks
never converge to the default 0.5% relative stdev and instead will always
run to the 15s timeout -- even if the means have converged in a second
or two.

Added a new check that tests when the noise itself stabilizes and ends
the benchmark, even if noise > max_noise.

After testing, this patch alone significantly reduces the runtime of the
Thrust+CUB benchmark suite (from 30 hours to 5 hours) and produces similar
timing results.

The parameters used to tune this feature are not exposed -- if this
approach works long-term and there's a strong motivation to let users
tweak them, then we can worry about names/APIs/CLI/docs later.
2021-12-21 21:24:02 -05:00
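The idea in 178dd0eb68 can be sketched roughly as follows. This is not NVBench's implementation: the commit states that the tuning parameters are not exposed, so `window`, `tolerance`, and the "spread of recent noise values" criterion below are assumptions chosen only to illustrate stopping when the noise itself stops changing.

```python
import statistics


def relative_stdev(samples):
    """Noise as described in the commit: stdev of the timings relative to their mean."""
    return statistics.stdev(samples) / statistics.fmean(samples)


def noise_has_stabilized(noise_history, window=5, tolerance=0.05):
    """Assumed criterion: the last `window` noise values differ by at most
    `tolerance` times their mean."""
    if len(noise_history) < window:
        return False
    recent = noise_history[-window:]
    spread = max(recent) - min(recent)
    return spread <= tolerance * statistics.fmean(recent)


def should_stop(samples, noise_history, max_noise=0.005):
    """Stop when noise drops below max_noise (the old check) or when the
    noise has stabilized, even if it is still above max_noise (the new check)."""
    noise = relative_stdev(samples)
    noise_history.append(noise)
    return noise < max_noise or noise_has_stabilized(noise_history)
```

A benchmark whose noise settles near 10% would never satisfy the 0.5% threshold, but under this sketch it stops once five consecutive noise measurements agree to within 5% of each other.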
Allison Vacanti
8e56a7bd94 Add noisy_bench with some benchmarks that currently always time-out. 2021-12-21 21:05:13 -05:00
Allison Vacanti
3c01814945 Skip non-json files and empty files in compare script. 2021-12-21 21:03:02 -05:00
Allison Vacanti
e70c31d7e1 Merge pull request #63 from allisonvacanti/fix_progress_display
Fix progress display for inactive type axis values.
2021-12-21 20:42:05 -05:00
Allison Vacanti
8cacc821d0 Fix an error message.
This path gets hit for type axes as well as strings.
2021-12-21 20:41:45 -05:00
Allison Vacanti
c9ab8e2eb3 Fix progress display for inactive type axis values.
When type axis values were disabled they were still counted towards
a benchmark's total number of configs.
2021-12-21 20:36:52 -05:00
Allison Vacanti
0f5c8624f6 Merge pull request #62 from allisonvacanti/debug_warnings
Suppress warnings on MSVC Debug builds.
2021-12-21 19:41:19 -05:00
Allison Vacanti
288b1564e0 Suppress warnings on MSVC Debug builds.
Also moved the config.cuh.in template into the source directory where
it'll be easier to find.
2021-12-21 19:35:23 -05:00
Allison Vacanti
edf2018fd7 Merge pull request #58 from allisonvacanti/nvbench_executable
Add an `nvbench-ctl` executable.
2021-12-21 12:08:39 -05:00
Allison Vacanti
20522c807d Add an nvbench-ctl executable.
This will provide functionality such as clock locking (--lgm),
persistence mode (--pm), device querying (--list), version checking
(--version), and documentation (--help).

This is possible already with any nvbench executable, but having
one with a reliable name will be helpful for scripting and writing
documentation.
2021-12-21 12:02:07 -05:00
Allison Vacanti
986736aa09 Merge pull request #60 from allisonvacanti/59_ubuntu_cupti
Add cupti path for ubuntu packages.
2021-12-20 14:35:27 -05:00
Allison Vacanti
61d094abf1 Add cupti path for ubuntu packages.
Fixes #59
2021-12-20 14:34:12 -05:00
Allison Vacanti
ff1ad78cfa Merge pull request #48 from robertmaynard/improve_compare_script_features
nvbench_compare handles directories and can filter out non-interesting results
2021-12-20 13:46:24 -05:00
Robert Maynard
6c1f372c45 Allow nvbench [-flags] (files|dirs) 2021-12-20 13:31:32 -05:00
Robert Maynard
35dd8de2ce Remove unneeded scripts/requirements.txt 2021-12-20 13:24:24 -05:00
Allison Vacanti
a8422197a9 Merge pull request #57 from senior-zero/fix_option_parser
Fix UB in option parser
2021-12-20 11:58:51 -05:00
Allison Vacanti
113b2f3f7f Merge pull request #56 from allisonvacanti/pow2_axis_compact_md
Reduce the width of pow2 axes in markdown tables.
2021-12-20 11:45:44 -05:00
Allison Vacanti
610b7767b5 Merge pull request #54 from allisonvacanti/progress_display
Print progress in markdown log.
2021-12-20 11:44:50 -05:00
Allison Vacanti
51efc7d1a8 Merge pull request #53 from allisonvacanti/50_warning_flags
Enable extra warning flags
2021-12-20 11:44:17 -05:00
Georgy Evtushenko
3bd37d0e75 Fix UB in option parser 2021-12-20 15:25:39 +03:00
Allison Vacanti
84f930809f Reduce the width of pow2 axes in markdown tables.
Before:

```
| BlockSize | (BlockSize) | NumBlocks | (NumBlocks) |
|-----------|-------------|-----------|-------------|
|       2^6 |          64 |       2^6 |          64 |
|       2^8 |         256 |       2^6 |          64 |
|      2^10 |        1024 |       2^6 |          64 |
|       2^6 |          64 |       2^8 |         256 |
|       2^8 |         256 |       2^8 |         256 |
|      2^10 |        1024 |       2^8 |         256 |
|       2^6 |          64 |      2^10 |        1024 |
|       2^8 |         256 |      2^10 |        1024 |
|      2^10 |        1024 |      2^10 |        1024 |
```

After:

```
|  BlockSize  |  NumBlocks  |
|-------------|-------------|
|    2^6 = 64 |    2^6 = 64 |
|   2^8 = 256 |    2^6 = 64 |
| 2^10 = 1024 |    2^6 = 64 |
|    2^6 = 64 |   2^8 = 256 |
|   2^8 = 256 |   2^8 = 256 |
| 2^10 = 1024 |   2^8 = 256 |
|    2^6 = 64 | 2^10 = 1024 |
|   2^8 = 256 | 2^10 = 1024 |
| 2^10 = 1024 | 2^10 = 1024 |
```
2021-12-19 10:38:14 -05:00
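The compact cell format shown in 84f930809f can be reproduced with a short sketch like the one below. The function names are hypothetical, and the code only mirrors the "After" table's appearance; it is not the NVBench printer.

```python
def format_pow2(value):
    """Format an exact power of two as "2^k = value"."""
    exponent = value.bit_length() - 1
    if value <= 0 or (1 << exponent) != value:
        raise ValueError(f"{value} is not a power of two")
    return f"2^{exponent} = {value}"


def format_column(values):
    """Right-align the cells of one column to the widest entry."""
    cells = [format_pow2(v) for v in values]
    width = max(len(cell) for cell in cells)
    return [cell.rjust(width) for cell in cells]


for cell in format_column([64, 256, 1024]):
    print(f"| {cell} |")
```

Merging the exponent and the value into one cell halves the number of columns each pow2 axis needs.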
Allison Vacanti
37dd61b275 Clean up some virtual interfaces.
- nvbench::benchmark doesn't add state, no need to override the destructor.
- nvbench::printer_base's virtual API should support decoration, not just
  overriding. Making the virtual API protected instead of private allows
  derived classes to extend base class behavior.
- nvbench::printer_base needs a virtual destructor.
- Fix a bug in nvbench::printer_multiplex that caused the new
  `get_[total|completed]_state_count()` methods to always return 0.
2021-12-19 10:26:40 -05:00
Allison Vacanti
3508775d71 Print progress in markdown log.
e.g.

```
Run:  [1/63] copy_type_sweep [Device=0 T=U8]
Pass: Cold: 10.659315ms GPU, 10.670530ms CPU, 0.11s total GPU, 10x
Pass: Batch: 10.298826ms GPU, 0.51s total GPU, 50x
Run:  [2/63] copy_type_sweep [Device=0 T=U16]
Pass: Cold: 6.185874ms GPU, 6.194119ms CPU, 0.10s total GPU, 16x
Pass: Batch: 6.174837ms GPU, 0.53s total GPU, 86x
Run:  [3/63] copy_type_sweep [Device=0 T=U32]
...
Run:  [63/63] copy_sweep_grid_shape [Device=0 BlockSize=2^10 NumBlocks=2^10]
Pass: Cold: 4.921733ms GPU, 4.929724ms CPU, 0.10s total GPU, 21x
Pass: Batch: 4.917333ms GPU, 0.53s total GPU, 107x
```
2021-12-19 03:07:17 -05:00
Allison Vacanti
5d70492714 Enable more warning flags.
- /W4 on MSVC
- -Wall -Wextra + others on gcc/clang
- New NVBench_ENABLE_WERROR option to toggle "warnings as errors"
- Mark the nlohmann_json library as IMPORTED to switch to system includes
- Rename nvbench_main -> nvbench.main to follow target name conventions
- Explicitly suppress some cudafe warnings when compiling templates in
  nlohmann_json headers.
- Explicitly suppress some warnings from Thrust headers.
- Various fixes for warnings exposed by new flags.
- Disable CUPTI on CTK < 11.3 (See #52).
2021-12-18 20:13:25 -05:00
Allison Vacanti
15edfe2eee Refactor to use NVBENCH_THROW where possible. 2021-12-18 17:52:39 -05:00
Allison Vacanti
9ff857ee29 Merge pull request #49 from senior-zero/fix_markdown_table
Fix markdown table
2021-12-18 10:33:11 -05:00
Georgy Evtushenko
eb29ab27ff Fix markdown table 2021-12-18 18:08:29 +03:00
Georgy Evtushenko
21ea12cd10 Merge pull request #29 from senior-zero/main-feature/github/cupti
CUPTI support
2021-12-18 12:09:25 +03:00
Georgy Evtushenko
1bc715267c CUPTI support 2021-12-18 12:03:52 +03:00
Allison Vacanti
3d6c16f8ba Maintain iterator state in markdown table printer. 2021-12-18 01:27:38 -05:00
Allison Vacanti
07e1c56608 Merge pull request #46 from allisonvacanti/nvml
Add NVML support for persistence mode, locking clocks.
2021-12-17 16:07:44 -05:00
Allison Vacanti
b948e79cab Add NVML support for persistence mode, locking clocks.
Locking clocks is currently only implemented for Volta+ devices.

Example usage:

my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base

See the cli_help.md docs for more info.
2021-12-17 13:59:43 -05:00
Robert Maynard
f9b44378bf nvbench_compare now supports comparing directories of results 2021-12-16 16:26:13 -05:00
Robert Maynard
905f84272e Add --threshold-diff command option to nvbench_compare
Allows us to filter output to only see the significantly different
benchmarks
2021-12-16 15:52:30 -05:00
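The kind of filtering that 905f84272e describes might look like the sketch below. The exact semantics of `--threshold-diff` are not given in the commit message, so the use of a fractional difference, the default threshold, and the function name are all assumptions.

```python
def significant_changes(reference, compare, threshold=0.05):
    """Return {benchmark: fractional change} for entries whose absolute
    fractional difference from the reference is at least `threshold`."""
    changes = {}
    for name, ref_time in reference.items():
        if name not in compare or ref_time == 0:
            continue  # nothing to compare against
        frac = (compare[name] - ref_time) / ref_time
        if abs(frac) >= threshold:
            changes[name] = frac
    return changes


# Hypothetical timings in seconds.
reference = {"copy_u8": 1.0, "copy_u16": 2.0}
compare = {"copy_u8": 1.01, "copy_u16": 2.5}
print(significant_changes(reference, compare))
```

Here only `copy_u16` is reported, because its 25% slowdown exceeds the assumed 5% threshold while the 1% change in `copy_u8` does not.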
Robert Maynard
52d9aed8da Refactor to have a proper main entry point 2021-12-16 15:27:51 -05:00
Robert Maynard
3f6d496824 Add a requirements.txt for the nv_bench script 2021-12-16 13:44:40 -05:00
Allison Vacanti
d0c90ff920 Build static fmtlib with -fPIC. 2021-12-15 12:54:53 -05:00
Allison Vacanti
af03585543 Add coloring to markdown tables. 2021-12-14 23:03:14 -05:00
Allison Vacanti
8d77dc2b6c Merge pull request #47 from allisonvacanti/base-two-bandwidth
Use base2 format for displaying bandwidth.
2021-12-14 21:22:50 -05:00
Allison Vacanti
54fda533e1 Use base2 format for displaying bandwidth.
Fixes #4.
2021-12-14 21:19:10 -05:00
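For reference, base-2 bandwidth formatting of the kind named in 54fda533e1 can be sketched as below. The unit labels (KiB/s, MiB/s, and so on) and the three-decimal precision are assumptions; the commit message does not show NVBench's actual output strings.

```python
def format_bandwidth(bytes_per_second):
    """Scale a byte rate by factors of 1024 and attach a binary-prefix unit."""
    units = ["B/s", "KiB/s", "MiB/s", "GiB/s", "TiB/s"]
    value = float(bytes_per_second)
    for unit in units[:-1]:
        if value < 1024.0:
            return f"{value:.3f} {unit}"
        value /= 1024.0
    return f"{value:.3f} {units[-1]}"


print(format_bandwidth(1024**3))
```

The difference from base-10 formatting grows with each prefix: 1 GiB/s is about 7.4% more than 1 GB/s, so the unit convention matters when comparing against hardware specifications.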