Commit Graph

455 Commits

Author SHA1 Message Date
Allison Vacanti
fc86d6a524 Merge pull request #65 from allisonvacanti/fix_cpu_noise_calc
Fix cpu noise calcs.
2021-12-22 13:29:37 -05:00
Allison Vacanti
6edc5b91a5 Fix cpu noise calcs. 2021-12-22 13:27:07 -05:00
Allison Vacanti
2f8bb28c52 Merge pull request #64 from allisonvacanti/noise_convergence
New convergence check
2021-12-21 21:30:39 -05:00
Allison Vacanti
178dd0eb68 Implement new convergence check for noisy kernels.
Previously, convergence was tested by waiting for the relative standard
deviation of CUDA timings ("noise") to drop below a certain percentage
(`max_noise`).

This assumed that all benchmarks would eventually see their noise drop
to some threshold, but this is not the case. In practice, many benchmarks
never converge to the default 0.5% relative stdev and instead will always
run to the 15s timeout -- even if the means have converged in a second
or two.

Added a new check that detects when the noise itself has stabilized and
then ends the benchmark, even if noise > max_noise.

After testing, this patch alone significantly reduces the runtime of the
Thrust+CUB benchmark suite (from 30 hours to 5 hours) and produces similar
timing results.

The parameters used to tune this feature are not exposed -- if this
approach works long-term and there's a strong motivation to let users
tweak them, then we can worry about names/APIs/CLI/docs later.
2021-12-21 21:24:02 -05:00
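The commit does not spell out the stabilization criterion, and its tuning parameters are deliberately unexposed. The following is a minimal C++ sketch of one plausible way to detect "the noise has stopped changing", assuming a sliding window over the history of noise values. `convergence_checker`, `window`, and `stability_tol` are hypothetical names for illustration, not NVBench API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Relative (sample) standard deviation of the timings, i.e. the "noise".
// Requires at least two samples and a nonzero mean.
double relative_stdev(const std::vector<double> &xs)
{
  const double n    = static_cast<double>(xs.size());
  const double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / n;
  double sum_sq     = 0.0;
  for (double x : xs)
  {
    sum_sq += (x - mean) * (x - mean);
  }
  return std::sqrt(sum_sq / (n - 1.0)) / mean;
}

// Hypothetical checker, not NVBench's implementation. `window` (>= 1) and
// `stability_tol` are illustrative tuning knobs.
class convergence_checker
{
public:
  convergence_checker(double max_noise, std::size_t window, double stability_tol)
      : m_max_noise{max_noise}
      , m_window{window}
      , m_stability_tol{stability_tol}
  {}

  // Records one timing (in seconds); returns true once the run may stop.
  bool add_sample(double seconds)
  {
    m_timings.push_back(seconds);
    if (m_timings.size() < 2)
    {
      return false; // noise is undefined for a single sample
    }

    const double noise = relative_stdev(m_timings);
    if (noise < m_max_noise)
    {
      return true; // original criterion: noise dropped below max_noise
    }

    m_noise_history.push_back(noise);
    if (m_noise_history.size() < m_window)
    {
      return false;
    }

    // Added criterion: the last `window` noise values lie within a narrow
    // band relative to their mean, so the noise itself has stabilized.
    const auto last     = m_noise_history.end();
    const auto first    = last - static_cast<std::ptrdiff_t>(m_window);
    const auto [lo, hi] = std::minmax_element(first, last);
    const double mean =
      std::accumulate(first, last, 0.0) / static_cast<double>(m_window);
    return (*hi - *lo) / mean < m_stability_tol;
  }

private:
  double m_max_noise;
  std::size_t m_window;
  double m_stability_tol;
  std::vector<double> m_timings;
  std::vector<double> m_noise_history;
};
```

Dividing the spread of recent noise values by their mean keeps the tolerance scale-free, so the same setting works whether a kernel's steady-state noise is 1% or 10%.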
Allison Vacanti
8e56a7bd94 Add noisy_bench with some benchmarks that currently always time-out. 2021-12-21 21:05:13 -05:00
Allison Vacanti
3c01814945 Skip non-json files and empty files in compare script. 2021-12-21 21:03:02 -05:00
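The compare script itself is Python; purely to illustrate the filtering rule in C++, here is a minimal sketch that expands directory arguments and drops anything that is not a non-empty `.json` file. `collect_result_files` and `is_candidate_result` are hypothetical names, and the non-recursive directory scan is an assumption.

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical predicate: keep only non-empty regular files ending in .json.
bool is_candidate_result(const fs::path &p)
{
  std::error_code ec;
  if (!fs::is_regular_file(p, ec) || p.extension().string() != ".json")
  {
    return false;
  }
  const auto size = fs::file_size(p, ec);
  return !ec && size > 0;
}

// Expands each input: a directory contributes its (non-recursive) entries,
// a file is kept as-is; both go through the same filter. Sorted for stable
// output.
std::vector<fs::path> collect_result_files(const std::vector<fs::path> &inputs)
{
  std::vector<fs::path> out;
  for (const auto &input : inputs)
  {
    if (fs::is_directory(input))
    {
      for (const auto &entry : fs::directory_iterator(input))
      {
        if (is_candidate_result(entry.path()))
        {
          out.push_back(entry.path());
        }
      }
    }
    else if (is_candidate_result(input))
    {
      out.push_back(input);
    }
  }
  std::sort(out.begin(), out.end());
  return out;
}
```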
Allison Vacanti
e70c31d7e1 Merge pull request #63 from allisonvacanti/fix_progress_display
Fix progress display for inactive type axis values.
2021-12-21 20:42:05 -05:00
Allison Vacanti
8cacc821d0 Fix an error message.
This path gets hit for type axes as well as strings.
2021-12-21 20:41:45 -05:00
Allison Vacanti
c9ab8e2eb3 Fix progress display for inactive type axis values.
When type axis values were disabled they were still counted towards
a benchmark's total number of configs.
2021-12-21 20:36:52 -05:00
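As a minimal sketch of the corrected counting, assuming the configs are the Cartesian product of all axis values and that each value carries an "active" flag: the progress total is the product of the *active* counts per axis, so disabled values must not contribute. `axis_sketch`, `total_config_count`, and `progress_prefix` are hypothetical names, not NVBench types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical axis model: one flag per value, false if the value is disabled.
struct axis_sketch
{
  std::vector<bool> active;
};

std::size_t count_active(const axis_sketch &axis)
{
  std::size_t n = 0;
  for (bool is_active : axis.active)
  {
    n += is_active ? 1 : 0;
  }
  return n;
}

// Total configs = product of active value counts (Cartesian product assumed).
std::size_t total_config_count(const std::vector<axis_sketch> &axes)
{
  std::size_t total = 1;
  for (const auto &axis : axes)
  {
    total *= count_active(axis);
  }
  return total;
}

// Builds the "[current/total]" prefix used in progress lines.
std::string progress_prefix(std::size_t current, std::size_t total)
{
  return "[" + std::to_string(current) + "/" + std::to_string(total) + "]";
}
```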
Allison Vacanti
0f5c8624f6 Merge pull request #62 from allisonvacanti/debug_warnings
Suppress warnings on MSVC Debug builds.
2021-12-21 19:41:19 -05:00
Allison Vacanti
288b1564e0 Suppress warnings on MSVC Debug builds.
Also moved the config.cuh.in template into the source directory where
it'll be easier to find.
2021-12-21 19:35:23 -05:00
Allison Vacanti
edf2018fd7 Merge pull request #58 from allisonvacanti/nvbench_executable
Add an `nvbench-ctl` executable.
2021-12-21 12:08:39 -05:00
Allison Vacanti
20522c807d Add an nvbench-ctl executable.
This will provide functionality such as clock locking (--lgm),
persistence mode (--pm), device querying (--list), version checking
(--version), and documentation (--help).

All of this is already possible with any nvbench executable, but having
one with a reliable name will be helpful for scripting and for writing
documentation.
2021-12-21 12:02:07 -05:00
Allison Vacanti
986736aa09 Merge pull request #60 from allisonvacanti/59_ubuntu_cupti
Add cupti path for ubuntu packages.
2021-12-20 14:35:27 -05:00
Allison Vacanti
61d094abf1 Add cupti path for ubuntu packages.
Fixes #59
2021-12-20 14:34:12 -05:00
Allison Vacanti
ff1ad78cfa Merge pull request #48 from robertmaynard/improve_compare_script_features
nvbench_compare handles directories and can filter out non-interesting results
2021-12-20 13:46:24 -05:00
Robert Maynard
6c1f372c45 Allow nvbench [-flags] (files|dirs) 2021-12-20 13:31:32 -05:00
Robert Maynard
35dd8de2ce Remove unneeded scripts/requirements.txt 2021-12-20 13:24:24 -05:00
Allison Vacanti
a8422197a9 Merge pull request #57 from senior-zero/fix_option_parser
Fix UB in option parser
2021-12-20 11:58:51 -05:00
Allison Vacanti
113b2f3f7f Merge pull request #56 from allisonvacanti/pow2_axis_compact_md
Reduce the width of pow2 axes in markdown tables.
2021-12-20 11:45:44 -05:00
Allison Vacanti
610b7767b5 Merge pull request #54 from allisonvacanti/progress_display
Print progress in markdown log.
2021-12-20 11:44:50 -05:00
Allison Vacanti
51efc7d1a8 Merge pull request #53 from allisonvacanti/50_warning_flags
Enable extra warning flags
2021-12-20 11:44:17 -05:00
Georgy Evtushenko
3bd37d0e75 Fix UB in option parser 2021-12-20 15:25:39 +03:00
Allison Vacanti
84f930809f Reduce the width of pow2 axes in markdown tables.
Before:

```
| BlockSize | (BlockSize) | NumBlocks | (NumBlocks) |
|-----------|-------------|-----------|-------------|
|       2^6 |          64 |       2^6 |          64 |
|       2^8 |         256 |       2^6 |          64 |
|      2^10 |        1024 |       2^6 |          64 |
|       2^6 |          64 |       2^8 |         256 |
|       2^8 |         256 |       2^8 |         256 |
|      2^10 |        1024 |       2^8 |         256 |
|       2^6 |          64 |      2^10 |        1024 |
|       2^8 |         256 |      2^10 |        1024 |
|      2^10 |        1024 |      2^10 |        1024 |
```

After:

```
|  BlockSize  |  NumBlocks  |
|-------------|-------------|
|    2^6 = 64 |    2^6 = 64 |
|   2^8 = 256 |    2^6 = 64 |
| 2^10 = 1024 |    2^6 = 64 |
|    2^6 = 64 |   2^8 = 256 |
|   2^8 = 256 |   2^8 = 256 |
| 2^10 = 1024 |   2^8 = 256 |
|    2^6 = 64 | 2^10 = 1024 |
|   2^8 = 256 | 2^10 = 1024 |
| 2^10 = 1024 | 2^10 = 1024 |
```
2021-12-19 10:38:14 -05:00
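A minimal sketch of how the compact "After" cells above could be produced: each power-of-two value becomes a single `2^k = value` cell, right-aligned to the column width. `format_pow2_cell` and `pad_left` are hypothetical helpers, not the actual markdown printer code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Formats a power-of-two axis value as one compact cell,
// e.g. exponent 10 -> "2^10 = 1024". Valid for exponents 0..63.
std::string format_pow2_cell(int exponent)
{
  const std::uint64_t value = std::uint64_t{1} << exponent;
  return "2^" + std::to_string(exponent) + " = " + std::to_string(value);
}

// Right-aligns a cell to `width` characters, matching the table layout.
std::string pad_left(const std::string &cell, std::size_t width)
{
  return cell.size() >= width ? cell
                              : std::string(width - cell.size(), ' ') + cell;
}
```

Merging the exponent and the expanded value into one cell halves the number of columns per pow2 axis, which is where the width savings come from.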
Allison Vacanti
37dd61b275 Clean up some virtual interfaces.
- nvbench::benchmark doesn't add state, so there's no need to override the
  destructor.
- nvbench::printer_base's virtual API should support decoration, not just
  overriding. Making the virtual API protected instead of private allows
  derived classes to extend base class behavior.
- nvbench::printer_base needs a virtual destructor.
- Fix a bug in nvbench::printer_multiplex that caused the new
  `get_[total|completed]_state_count()` methods to always return 0.
2021-12-19 10:26:40 -05:00
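The interface change can be illustrated with a simplified, hypothetical stand-in (`printer_base_sketch` and `decorating_printer` are not NVBench classes): a public non-virtual method forwards to a virtual hook. When the hook is private, a derived class can override it but cannot call the base version; making it protected lets an override reuse the base behavior and then decorate it. The virtual destructor makes deletion through a base pointer well-defined.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified printer base; names are illustrative only.
class printer_base_sketch
{
public:
  // Virtual destructor: deleting a derived printer via a base pointer is safe.
  virtual ~printer_base_sketch() = default;

  // Public, non-virtual entry point that forwards to the virtual hook.
  void log_run_state(const std::string &name) { do_log_run_state(name); }

  std::size_t get_completed_state_count() const { return m_completed; }
  const std::vector<std::string> &lines() const { return m_lines; }

protected:
  // Protected rather than private, so overrides can call this base version.
  virtual void do_log_run_state(const std::string &name)
  {
    ++m_completed;
    m_lines.push_back("Run:  " + name);
  }

  std::vector<std::string> m_lines;
  std::size_t m_completed{0};
};

class decorating_printer : public printer_base_sketch
{
protected:
  void do_log_run_state(const std::string &name) override
  {
    printer_base_sketch::do_log_run_state(name); // keep the base behavior
    m_lines.back() += " [decorated]";            // then extend it
  }
};
```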
Allison Vacanti
3508775d71 Print progress in markdown log.
e.g.

```
Run:  [1/63] copy_type_sweep [Device=0 T=U8]
Pass: Cold: 10.659315ms GPU, 10.670530ms CPU, 0.11s total GPU, 10x
Pass: Batch: 10.298826ms GPU, 0.51s total GPU, 50x
Run:  [2/63] copy_type_sweep [Device=0 T=U16]
Pass: Cold: 6.185874ms GPU, 6.194119ms CPU, 0.10s total GPU, 16x
Pass: Batch: 6.174837ms GPU, 0.53s total GPU, 86x
Run:  [3/63] copy_type_sweep [Device=0 T=U32]
...
Run:  [63/63] copy_sweep_grid_shape [Device=0 BlockSize=2^10 NumBlocks=2^10]
Pass: Cold: 4.921733ms GPU, 4.929724ms CPU, 0.10s total GPU, 21x
Pass: Batch: 4.917333ms GPU, 0.53s total GPU, 107x
```
2021-12-19 03:07:17 -05:00
Allison Vacanti
5d70492714 Enable more warning flags.
- /W4 on MSVC
- -Wall -Wextra + others on gcc/clang
- New NVBench_ENABLE_WERROR option to toggle "warnings as errors"
- Mark the nlohmann_json library as IMPORTED to switch to system includes
- Rename nvbench_main -> nvbench.main to follow target name conventions
- Explicitly suppress some cudafe warnings when compiling templates in
  nlohmann_json headers.
- Explicitly suppress some warnings from Thrust headers.
- Various fixes for warnings exposed by new flags.
- Disable CUPTI on CTK < 11.3 (See #52).
2021-12-18 20:13:25 -05:00
Allison Vacanti
15edfe2eee Refactor to use NVBENCH_THROW where possible. 2021-12-18 17:52:39 -05:00
Allison Vacanti
9ff857ee29 Merge pull request #49 from senior-zero/fix_markdown_table
Fix markdown table
2021-12-18 10:33:11 -05:00
Georgy Evtushenko
eb29ab27ff Fix markdown table 2021-12-18 18:08:29 +03:00
Georgy Evtushenko
21ea12cd10 Merge pull request #29 from senior-zero/main-feature/github/cupti
CUPTI support
2021-12-18 12:09:25 +03:00
Georgy Evtushenko
1bc715267c CUPTI support 2021-12-18 12:03:52 +03:00
Allison Vacanti
3d6c16f8ba Maintain iterator state in markdown table printer. 2021-12-18 01:27:38 -05:00
Allison Vacanti
07e1c56608 Merge pull request #46 from allisonvacanti/nvml
Add NVML support for persistence mode, locking clocks.
2021-12-17 16:07:44 -05:00
Allison Vacanti
b948e79cab Add NVML support for persistence mode, locking clocks.
Locking clocks is currently only implemented for Volta+ devices.

Example usage:

my_bench -d [0,1,3] --persistence-mode 1 --lock-gpu-clocks base

See the cli_help.md docs for more info.
2021-12-17 13:59:43 -05:00
Robert Maynard
f9b44378bf nvbench_compare now supports comparing directories of results 2021-12-16 16:26:13 -05:00
Robert Maynard
905f84272e Add --threshold-diff command option to nvbench_compare
This allows the output to be filtered so that only significantly
different benchmarks are shown.
2021-12-16 15:52:30 -05:00
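The filtering idea can be sketched as follows, written in C++ only for consistency here (the real `nvbench_compare` is a Python script). This assumes the threshold is a percentage applied to the absolute relative change in time; `comparison_row`, `percent_diff`, and `filter_by_threshold` are hypothetical names.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical row: one benchmark's reference and comparison times.
struct comparison_row
{
  std::string name;
  double ref_time;
  double cmp_time;
};

// Relative change in percent; negative means the comparison run is faster.
double percent_diff(const comparison_row &row)
{
  return 100.0 * (row.cmp_time - row.ref_time) / row.ref_time;
}

// Keeps only rows whose absolute change meets the threshold (in percent).
std::vector<comparison_row> filter_by_threshold(const std::vector<comparison_row> &rows,
                                                double threshold_percent)
{
  std::vector<comparison_row> out;
  for (const auto &row : rows)
  {
    if (std::abs(percent_diff(row)) >= threshold_percent)
    {
      out.push_back(row);
    }
  }
  return out;
}
```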
Robert Maynard
52d9aed8da refactor to have a proper main entry point 2021-12-16 15:27:51 -05:00
Robert Maynard
3f6d496824 Add a requirements.txt for the nv_bench script 2021-12-16 13:44:40 -05:00
Allison Vacanti
d0c90ff920 Build static fmtlib with -fPIC. 2021-12-15 12:54:53 -05:00
Allison Vacanti
af03585543 Add coloring to markdown tables. 2021-12-14 23:03:14 -05:00
Allison Vacanti
8d77dc2b6c Merge pull request #47 from allisonvacanti/base-two-bandwidth
Use base2 format for displaying bandwidth.
2021-12-14 21:22:50 -05:00
Allison Vacanti
54fda533e1 Use base2 format for displaying bandwidth.
Fixes #4.
2021-12-14 21:19:10 -05:00
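A minimal sketch of base-2 bandwidth formatting, assuming IEC binary units (KiB/s, MiB/s, GiB/s, ...) and three decimal places; the unit list, precision, and the name `format_bandwidth_base2` are illustrative assumptions rather than NVBench's exact output format.

```cpp
#include <array>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

std::string format_bandwidth_base2(double bytes_per_second)
{
  static constexpr std::array<const char *, 5> units{"B/s", "KiB/s", "MiB/s",
                                                     "GiB/s", "TiB/s"};
  std::size_t unit = 0;
  double value     = bytes_per_second;
  // Step up by factors of 1024 (not 1000) while a larger unit is available.
  while (value >= 1024.0 && unit + 1 < units.size())
  {
    value /= 1024.0;
    ++unit;
  }
  std::ostringstream out;
  out << std::fixed << std::setprecision(3) << value << ' ' << units[unit];
  return out.str();
}
```

For example, 10^9 bytes/s (1 GB/s in decimal units) formats as `953.674 MiB/s`, which is why base-2 and base-10 readings of the same measurement differ by roughly 7% at the GiB scale.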
Allison Vacanti
7c740975dd Force fmt to build static libs.
Otherwise it shows up in our export set when a parent project enables
BUILD_SHARED_LIBS.
2021-10-28 12:39:14 -04:00
Allison Vacanti
cda8d320cb Merge pull request #44 from allisonvacanti/fix_for_conda
Don't explicitly link with cudart.
2021-10-27 12:17:09 -04:00
Allison Vacanti
f984efdc26 Don't explicitly link with cudart.
This is implicitly added by nvcc, and the explicit setting was breaking
environments where cudart_static is unavailable, e.g. conda.
2021-10-27 12:13:32 -04:00
Allison Vacanti
611385b047 Print version info with --help. 2021-10-26 17:45:33 -04:00
Allison Vacanti
1875d9962d Document new --version option. 2021-10-26 17:45:20 -04:00
Allison Vacanti
e6b5f51f1c Merge pull request #42 from allisonvacanti/rapids-cmake
Port to rapids-cmake
2021-10-26 17:26:08 -04:00
Allison Vacanti
b2d37c21fd Add export tests. 2021-10-20 14:02:16 -04:00