118 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
d230a16e2b Tighten statistics and timeout warning tests
Document that percentile helpers return quiet NaNs for NaN-containing inputs.

Make quartile expected-value tests compute ranks from the documented
round(p / 100 * (n - 1)) rule instead of reusing statistics::percentile_rank(),
so rank regressions are caught independently.

Extend timeout-warning coverage to exercise the too-few-samples max-noise path
in addition to unavailable, invalid, and infinite stdev-noise inputs.
2026-06-28 08:50:21 -05:00
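The rank rule quoted in commit d230a16e2b, round(p / 100 * (n - 1)), can be sketched independently of the library. This is only an illustration, not NVBench's implementation: the names `nearest_rank` and `percentile_sorted` are hypothetical, and using std::round (halves round away from zero) is an assumption about what "round" means in the documented rule.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: zero-based rank for percentile p (0..100) among n samples,
// following round(p / 100 * (n - 1)). Requires n > 0.
inline std::size_t nearest_rank(double p, std::size_t n)
{
  const double pos = p / 100.0 * static_cast<double>(n - 1);
  return static_cast<std::size_t>(std::round(pos));
}

// Hypothetical helper: percentile of a non-empty sample set.
// Takes the samples by value so the caller's data is not reordered.
inline double percentile_sorted(std::vector<double> samples, double p)
{
  std::sort(samples.begin(), samples.end());
  return samples[nearest_rank(p, samples.size())];
}
```

Computing the expected rank this way in a test, rather than calling the helper under test, is what lets a rank regression show up as a failure.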
Oleksandr Pavlyk
36d8c5ba46 Test nullopt explicitly in warning check test
check_noise_warning() now takes std::optional<nvbench::float64_t>,
matching the production helper, and the test now covers
std::nullopt explicitly in addition to NaN, negative, and +inf.
2026-06-28 08:22:03 -05:00
Oleksandr Pavlyk
e99ae66989 timeout_warnings now treats engaged NaN and negative stdev noise as unavailable
Add a focused test target, nvbench.test.measure_timeout_warnings, covering:

  - NaN stdev noise -> “unable to estimate noise”
  - negative stdev noise -> “unable to estimate noise”
  - +inf stdev noise -> “over noise threshold”
2026-06-28 07:43:29 -05:00
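The cases listed in commit e99ae66989 amount to a small classification of the stdev-noise value. A sketch under stated assumptions: the enum, the function name `classify_noise`, and the "none" outcome are hypothetical. Only the mapping of NaN and negative values to "unable to estimate noise" and of +inf to "over noise threshold" comes from the commit message. Treating a disengaged optional like the unavailable case is an assumption based on commit 36d8c5ba46.

```cpp
#include <cmath>
#include <optional>

// Hypothetical outcome type for this sketch.
enum class noise_warning
{
  none,               // noise is available and within the threshold
  unable_to_estimate, // noise is missing, NaN, or negative
  over_threshold      // noise is available and exceeds the threshold (+inf included)
};

// Hypothetical classifier; max_noise is the configured threshold.
inline noise_warning classify_noise(std::optional<double> noise, double max_noise)
{
  if (!noise.has_value() || std::isnan(*noise) || *noise < 0.0)
  {
    return noise_warning::unable_to_estimate;
  }
  return (*noise > max_noise) ? noise_warning::over_threshold : noise_warning::none;
}
```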
Oleksandr Pavlyk
bb0f90f1a0 Preserve stdev noise summaries for low sample counts
Keep legacy stdev/relative summary tags present even when too few
samples are available to compute a meaningful standard-deviation noise
estimate. Use the standard-deviation unavailable sentinel for those
values so existing summary consumers continue to see the expected tags.

Factor the sentinel into the statistics helpers and use it from both
standard_deviation() and stdev_noise_or_sentinel(), keeping the schema
compatibility behavior explicit and tested.
2026-06-28 07:14:02 -05:00
Oleksandr Pavlyk
55467266d9 test_compute_standard_deviation_noise exercises other invalid inputs 2026-06-28 07:14:02 -05:00
Oleksandr Pavlyk
caa2f466c8 Check consistency of sort- vs. select-based quartiles using threshold constant
Expose quartile threshold value, use it in testing to test around that value.
2026-06-26 17:02:45 -05:00
Oleksandr Pavlyk
86eb2a8ddd Add tests for handling of NaNs in quartile routine inputs 2026-06-26 16:28:34 -05:00
Oleksandr Pavlyk
d9cdd8bd1e Test quartile values across selection threshold
Add fixed expected-value assertions for quartile tests around the
sort/selection switch point, including duplicate-heavy inputs. This keeps the
tests from only proving that both implementations agree with each other.
2026-06-26 16:28:34 -05:00
Oleksandr Pavlyk
8dc36a6d79 Generate cold summaries only if some accepted samples have been accumulated
Cold measurement can discard throttled trials before incrementing the accepted
sample count, then stop on timeout with zero recorded samples. In that case,
only emit the sample-size summary and skip derived timing, bandwidth, clock, and
bulk summaries that require accepted samples.

This avoids divide-by-zero mean calculations and quartile/IQR computation over
empty sample vectors.

Keep timeout diagnostics reachable for zero-sample runs and add an explicit
warning when no accepted cold samples were recorded. Factor timeout warning
emission into a private helper so the zero-sample and normal paths share the
same diagnostic logic.

Suppress low-sample relative stdev noise

Add a statistics helper that returns no relative standard-deviation noise until
there are enough samples for a meaningful estimate. Use it for cold CPU/GPU and
CPU-only summaries so the low-sample +inf stdev sentinel is not published as
real relative noise or used for max-noise timeout warnings.

Add statistics coverage for suppressing the low-sample sentinel and computing
relative stdev noise once the sample threshold is reached.

compute_standard_deviation_noise returns nullopt if standard deviation is not finite

Tests verify that noise is nullopt when not enough samples are accumulated

Added statistics::has_enough_samples_for_noise_estimate(...)

Used it in standard_deviation, compute_standard_deviation_noise,
compute_robust_noise.

Added timeout diagnostics in cold and CPU-only paths.
If max-noise is configured and the run timed out before enough
samples exist to estimate noise, the log now says that explicitly;
otherwise the existing “over noise threshold” warning remains
unchanged.

Added a statistics test assertion for the new sample-count
predicate.
2026-06-26 16:28:34 -05:00
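The low-sample behavior described in commit 8dc36a6d79 can be illustrated with a relative stdev-noise helper that returns an empty optional instead of a sentinel. This is a sketch, not the NVBench code: `has_enough_samples` and `relative_stdev_noise` are hypothetical names, and the `min_samples` threshold is passed as a parameter because the log does not state the real value.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical predicate; the real threshold is not given in the log.
inline bool has_enough_samples(std::size_t n, std::size_t min_samples)
{
  return n >= min_samples;
}

// Returns stdev / mean, or nullopt when no meaningful estimate exists.
inline std::optional<double> relative_stdev_noise(const std::vector<double> &samples,
                                                  std::size_t min_samples)
{
  const std::size_t n = samples.size();
  // Zero samples would divide by zero; one sample has no sample variance.
  if (n < 2 || !has_enough_samples(n, min_samples))
  {
    return std::nullopt;
  }
  double sum = 0.0;
  for (double s : samples) { sum += s; }
  const double mean = sum / static_cast<double>(n);

  double sq_dev = 0.0;
  for (double s : samples) { sq_dev += (s - mean) * (s - mean); }
  const double stdev = std::sqrt(sq_dev / static_cast<double>(n - 1)); // unbiased variance
  const double noise = stdev / mean;

  if (mean <= 0.0 || !std::isfinite(stdev) || !std::isfinite(noise))
  {
    return std::nullopt;
  }
  return noise;
}
```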
Oleksandr Pavlyk
86e1c2c881 Duplicate-heavy boundary test is added
Prepare duplicate-heavy input and check the sort-based
quartile computation result against the selection-based one.

std::nth_element only guarantees that the nth element
is the value that would appear there in sorted order;
it does not fully sort equal partitions. Bugs in the
selection implementation, especially when selecting Q1
from the left half and Q3 from the right half after
selecting the median, are more likely to show up when
many samples equal the quartile values.
2026-06-26 16:28:34 -05:00
Oleksandr Pavlyk
0dc93b0c0e Introduce robust metrics (#379)
* Add statistics utilities to compute quartiles

Quartiles are computed using nearest rank method.

Two implementations are provided:
  1. Sort-based:
     a. sort array
     b. extract values at ranks of interest
  2. Selection based:
     a. Run nth_element to find median on whole range
     b. Run nth_element on left side to find first quartile
     c. Run nth_element on right side to find third quartile

Public API copies input into temporary vector which is mutated as needed.

Public API uses sort-based implementation for small arrays ( <= 4096 elements),
and selection-based implementation for larger arrays.

Sort-based implementation can support computation of arbitrary percentiles,
which could be useful later if more extreme statistics are needed.

Add tests covering percentile and quartile edge cases, input iterators,
selection-vs-sorting agreement, empty and singleton inputs, and relative
dispersion validation.

* Add quartiles information to summaries

Use the quartile helpers to report robust cold and CPU-only timing summaries:
Q1, median, Q3, interquartile range, and relative interquartile range.
These values stay hidden.

Summary tags are nv/cold/time/gpu/q1, nv/cold/time/gpu/median,
nv/cold/time/gpu/q3, nv/cold/time/gpu/ir/absolute, nv/cold/time/gpu/ir/relative

ir/absolute = q3 - q1, ir/relative = (q3 - q1)/median

Similar tags added for nv/cold/time/cpu and for CPU-only measures.

Validate relative-dispersion calculations before publishing relative noise
summaries so invalid centers or dispersion values do not produce misleading
summary entries.

* Prefer robust summaries in default output

Only flip visibility for nv/cold/cpu/time, nv/cold/gpu/time,
and nv/cpu_only/only:
  - hide mean
  - hide stdev/relative
  - show median
  - show ir/relative

* Use is_close where std::abs(act-exp) was used

* Revert "Prefer robust summaries in default output"

This reverts commit 9a0afc361c.

Basically, all robust statistics summary entries are hidden,
and mean + stdev/relative are back to being the default displayed items.

* Address PR review feedback
2026-06-02 13:20:15 -05:00
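The two quartile strategies described in commit 0dc93b0c0e can be sketched side by side. This is an illustration under assumptions, not the NVBench source: all names are hypothetical, the rank rule round(p / 100 * (n - 1)) is taken from commit d230a16e2b, inputs are assumed non-empty and NaN-free, and the size-based dispatch between the two paths is left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct quartiles
{
  double q1;
  double median;
  double q3;
};

// Zero-based rank for percentile p among n > 0 samples.
inline std::ptrdiff_t rank_of(double p, std::size_t n)
{
  return static_cast<std::ptrdiff_t>(std::round(p / 100.0 * static_cast<double>(n - 1)));
}

// Sort-based: sort a copy, then read the three ranks.
inline quartiles quartiles_by_sort(std::vector<double> v)
{
  std::sort(v.begin(), v.end());
  const auto first = v.begin();
  const std::size_t n = v.size();
  return {first[rank_of(25.0, n)], first[rank_of(50.0, n)], first[rank_of(75.0, n)]};
}

// Selection-based: place the median, then select Q1 on the left and Q3 on the right.
inline quartiles quartiles_by_selection(std::vector<double> v)
{
  const std::size_t n = v.size();
  const std::ptrdiff_t r1 = rank_of(25.0, n);
  const std::ptrdiff_t r2 = rank_of(50.0, n);
  const std::ptrdiff_t r3 = rank_of(75.0, n);
  const auto first = v.begin();

  std::nth_element(first, first + r2, v.end());
  // For tiny inputs a quartile rank can coincide with the median rank;
  // in that case the value is already in place.
  if (r1 < r2) { std::nth_element(first, first + r1, first + r2); }
  if (r3 > r2) { std::nth_element(first + r2 + 1, first + r3, v.end()); }
  return {first[r1], first[r2], first[r3]};
}

// ir/relative = (q3 - q1) / median, as stated in the commit message.
inline double relative_iqr(const quartiles &q) { return (q.q3 - q.q1) / q.median; }
```

Both functions take the vector by value, which mirrors the note that the public API copies the input into a temporary that is mutated as needed.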
Oleksandr Pavlyk
7ba2b79d5b Reduce stdrel criterion complexity and ensure termination (#374)
* Reduce stdrel criterion complexity and ensure termination

Replace the stdrel criterion's growing sample history with an online
mean/variance accumulator. This keeps the stopping criterion based on
relative standard deviation, preserves the unbiased standard-deviation
estimate used for convergence, and reduces per-sample update work from
recomputing over the full history to constant time.

Add a bounded invalid-noise path so measurements that persistently produce
non-finite relative noise, such as all-zero timings, can terminate without
waiting for the wall-time timeout. Keep the normal min-time gate for ordinary
stdrel convergence.

Add focused tests for the online accumulator, stdrel sample-count threshold,
sample-standard-deviation behavior, deterministic convergence inputs, and
persistent invalid-noise termination. Update the CLI help for the stdrel
termination behavior.

* change max-noise to  for consistency

* Use online_mean_variance on m_noise_tracker in is_finished()

Previously, standard deviation call was made using current
noise level instead of mean noise level. Because of identity

E[ (N - C)^2 ] =
    E[ (N - E[N])^2 ] + (E[N] - C)^2 >= E[ (N - E[N])^2 ]

this led to the criterion terminating later than it could have, because
the estimated expectation is always greater than or equal to the
estimate relative to the mean.

The code used the current noise level instead of the mean to avoid needing to
make two passes through the m_noise_tracker container.

Using online_mean_variance improves the accuracy of estimating the
dispersion of the noise signal while maintaining a single pass through the
container.

* Address review feedback

Fixed misleading commit. Introduce private methods to refactor
computation of repeated expressions.

Renamed m_cuda_times_summary to m_measurements_summary, since
criterion can be applied for CPU-only measurements too.

Introduced is_close utility for checking whether two floating
point numbers are close to one another.

Introduced descriptive constexpr variables for hard-wired
constants
2026-05-29 17:06:28 +00:00
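An online mean/variance accumulator of the kind commit 7ba2b79d5b describes can be sketched with Welford's update, which does constant work per sample and keeps no history. The log does not say which update rule the real online_mean_variance uses, so Welford's method is an assumption here, and the class and helper names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical single-pass accumulator based on Welford's algorithm.
class online_mean_variance_sketch
{
public:
  void push(double x)
  {
    ++m_count;
    const double delta = x - m_mean;
    m_mean += delta / static_cast<double>(m_count);
    m_m2 += delta * (x - m_mean); // uses the updated mean
  }

  std::size_t count() const { return m_count; }
  double mean() const { return m_mean; }

  // Unbiased sample variance; NaN until at least two samples were pushed.
  double sample_variance() const
  {
    if (m_count < 2) { return std::numeric_limits<double>::quiet_NaN(); }
    return m_m2 / static_cast<double>(m_count - 1);
  }

  double sample_stdev() const { return std::sqrt(sample_variance()); }

private:
  std::size_t m_count = 0;
  double m_mean       = 0.0;
  double m_m2         = 0.0; // running sum of squared deviations from the mean
};

// Convenience for the usage below: feed every sample once, in order.
inline online_mean_variance_sketch accumulate_all(const std::vector<double> &samples)
{
  online_mean_variance_sketch acc;
  for (double s : samples) { acc.push(s); }
  return acc;
}
```

For the samples {2, 4, 4, 4, 5, 5, 7, 9} the accumulator yields a mean of 5 and a sample variance of 32/7, matching the two-pass computation.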
mfranzrebsal
4a33a61591 Add Windows support (#354) 2026-05-19 15:10:58 -05:00
Oleksandr Pavlyk
ce75dab94b Add stopping criterion sample count (#341)
* Implement sample-count stopping criterion with parameter target-samples

--stopping-criterion sample-count --target-samples 100 would stop once
max(--min-samples, --target-samples) samples are collected

* Address review nitpicks
2026-05-15 15:15:12 -05:00
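The stopping rule stated in commit ce75dab94b, stop once max(--min-samples, --target-samples) samples are collected, reduces to a single comparison. The function name and the integer type below are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical predicate: true once enough samples were collected to satisfy
// both the minimum and the target sample counts.
inline bool sample_count_finished(std::int64_t collected,
                                  std::int64_t min_samples,
                                  std::int64_t target_samples)
{
  return collected >= std::max(min_samples, target_samples);
}
```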
Oleksandr Pavlyk
6dd27aedfd Fix exception safety (#358)
Improve exception safety of timer structs by using local scope guards to ensure that cleanup steps, such as signaling the blocking kernel to unblock and making sure that the stream is synchronized, are performed even if the launch object throws an exception.

Tests of exception safety were added.

--

* blocking_kernel.unblock_noexcept() noexcept method added

This decouples the logic of signaling to unblock from checking
of the timeout.

* Improve exception safety in kernel_launch_timer

Introduce noexcept cleanup methods. Place body of start()
and stop() methods in the try/catch block and execute
noexcept clean-up on exception before rethrowing.

* Improve exception safety of measure_hot

* Make sure that throwing methods call noexcept ones instead of duplicating functionality

* Use cleanup_guard in measure_cold_base::kernel_launch_timer

Replace try/catch pattern with cleaner use of cleanup_guard
class.

* cpu_timer::start, cpu_timer::stop methods marked noexcept

These methods do not throw, and marking them noexcept explicitly
makes it fine to call them from other noexcept methods, such as
cleanup_noexcept in measure_cold.

* Address remaining exception safety issue in measure_hot

* Renamed guard variables to reflect their purpose, apply arm-then-do to ops queueing kernels

Set m_block_stream_armed = true; before launching the kernel. Doing so signals
cleanup guard that stream must be unblocked, even if launching of the kernel failed.

Same for operation launching time-stamps kernel.

* Add testing/device/exception_safety.cu

This test adds a benchmark that throws. It verifies that the benchmark did not
time out and that the control counters the benchmark maintains are at
the expected values.

* Refactor measurement cleanup guards for testability

Extract hot stream cleanup and cold launch timer cleanup into reusable
detail helpers. Keep measure_hot and measure_cold using those helpers through
thin adapters so the tested cleanup logic matches the production path.

Add driver-free cleanup guard tests using a fake measure object to verify
cleanup ordering when exceptions occur after blocking stream setup, after hot
unblock, and around cold GPU frequency start/stop paths.

* Implement cpu_timer_stop_noexcept in terms of cpu_timer_stop

The cpu_timer_stop is already noexcept by nature of implementation,
but we maintain cpu_timer_stop_noexcept method for symmetry with
other pairs sync_stream()/sync_stream_noexcept().

The cpu_timer_stop_noexcept() is implemented via cpu_timer_stop().
These methods are annotated __forceinline__, so the same code should be
generated.

* More readable initialization of bool members

* Moved exception_safety.cu back to testing/ folder

testing/device is reserved for tests that require locking
of GPU frequency per CMake option description.

* Fixed nitpick and bug it discovered

Changed testing/exception_safety.cu:237 so run_benchmark now iterates over every state
from bench.get_states() and checks each one is skipped with a reason
containing "requested".

That exposed a real runner behavior gap, so I also made a minimal fix in
nvbench/runner.cuh:120: after stop_runner_loop, remaining states are now explicitly
marked skipped with a reason instead of only printing a skip notification.

* Move static assertions (pertaining to cleanup guards) to
testing/cleanup_guards.cu

The CI failure with CTK 12.0 and certain version of GCC is caused
by OOM in cudafe++ process tripped by compiling instantiation
of contract verification on cold_launch_timer_probe struct.

As a work-around, this instantiation is excluded for CTK 12.0-12.6
2026-05-15 15:14:30 -05:00
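The scope-guard and arm-then-do pattern described in commit 6dd27aedfd can be sketched without CUDA by recording the order of events. Everything here is hypothetical, including `scope_guard_sketch` and `run_with_cleanup`. The real cleanup_guard class and the measure_hot and measure_cold code are not shown in this log.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard: runs the stored callable when the scope exits,
// whether normally or through an exception. The callable must not throw.
template <typename F>
class scope_guard_sketch
{
public:
  explicit scope_guard_sketch(F f) : m_f(std::move(f)) {}
  ~scope_guard_sketch() { m_f(); }
  scope_guard_sketch(const scope_guard_sketch &)            = delete;
  scope_guard_sketch &operator=(const scope_guard_sketch &) = delete;

private:
  F m_f;
};

// Simulates block -> launch -> unblock/sync and returns the observed event order.
inline std::vector<std::string> run_with_cleanup(bool launch_throws)
{
  std::vector<std::string> events;
  events.reserve(8); // avoid reallocation inside the cleanup path
  bool armed = false;
  try
  {
    scope_guard_sketch guard{[&] {
      if (armed) { events.push_back("unblock"); }
      events.push_back("sync");
    }};
    armed = true; // arm first, so cleanup runs even if the next step fails
    events.push_back("block");
    if (launch_throws) { throw std::runtime_error("launch failed"); }
    events.push_back("launch");
  }
  catch (const std::runtime_error &)
  {
    events.push_back("caught");
  }
  return events;
}
```

In the throwing case the guard's destructor runs during stack unwinding, so "unblock" and "sync" are recorded before the handler records "caught".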
Oleksandr Pavlyk
9ea77bccaa Implement CLI option to control warmups for cold measurements (#339)
* Implement warmup-runs count, supported as CLI

CLI option --warmup-runs implemented and documented.

The warm-up count is enforced to always be positive.
This is necessary to ensure that JIT-ting has occurred,
and use of blocking kernel would not result in time-outs.

A test of the option parser is added.

* Ensure that measure_cold::run_warmup instantiates blocking kernel

Because warm-up runs are executed without use of blocking kernel,
the blocking kernel was not jitted until actual measurements were
collected. The module loading cost incurred during the first run
shows as elevated CPU time noise value for the first measurement
as noted in https://github.com/NVIDIA/nvbench/pull/339

This PR adds `this->block_stream(); this->unblock_stream();` prior
to executing warm-up loop with use of blocking kernel disabled.

This ensures that the blocking kernel is instantiated during the warm-up,
but no other kernel is launched between its launch and the stream sync,
thus avoiding deadlock.

* Rename --warmup-runs to --cold-warmup-runs, add --cold-max-warmup-walltime

Since configurable number of warmups only applies to measure_cold.cuh
rename the CLI option to reflect that.

Also add --cold-max-warmup-walltime (defaults to -1, i.e. disabled).
If enabled, exits the warmup loop before the requested count is reached if
the wall time expended executing warmups exceeds this max-warmup-walltime
value.
2026-05-12 14:30:08 -05:00
Oleksandr Pavlyk
7dfbcad27c Create directories for output files (#360)
* QOL UX, NVBench creates directories for output JSON, MD, CSV files

This closes #185 and supports specifying
`--json path/to/nonexistent/folder/result.json`

This creates the sequence of folders in which result.json is placed.

```
(py313) :~/repos/nvbench$ rm -rf /tmp/nested/
(py313) :~/repos/nvbench$ ./build2/bin/nvbench.example.cpp20.axes -b copy_type_and_block_size_sweep -a Type=I32 -a BlockSize=64 --jsonbin /tmp/nested/json/axes.json --md /tmp/nested/md/res.md --csv /tmp/nested/csv/res.csv > /dev/null 2>&1
(py313) :~/repos/nvbench$ tree /tmp/nested/
/tmp/nested/
├── csv
│   └── res.csv
├── json
│   ├── axes.json
│   ├── axes.json-bin
│   │   └── 0.bin
│   └── axes.json-freqs-bin
│       └── 0.bin
└── md
    └── res.md

6 directories, 5 files
```

* Add a test that non-existent output folder is created

* Remove throwing custom error message. Use default

* Replace static_assert(false, ...) with #error
2026-05-12 10:26:28 -05:00
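The behavior described in commit 7dfbcad27c, creating missing parent folders before writing an output file, can be sketched with std::filesystem. The helper name `open_output_file` is hypothetical and this is not the NVBench implementation. Letting std::filesystem::create_directories throw its own error is one reading of "Remove throwing custom error message. Use default".

```cpp
#include <filesystem>
#include <fstream>

// Hypothetical helper: create any missing parent directories, then open the
// file for writing. create_directories throws std::filesystem::filesystem_error
// on failure and does nothing if the directories already exist.
inline std::ofstream open_output_file(const std::filesystem::path &file)
{
  const std::filesystem::path parent = file.parent_path();
  if (!parent.empty())
  {
    std::filesystem::create_directories(parent);
  }
  return std::ofstream(file);
}
```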
Oleksandr Pavlyk
4c278b08b3 Link against fmt::fmt target, not fmt. Consistent with nvbench/CMakeLists.txt
Co-authored-by: Dominic Charrier <docharri@amd.com>
2026-03-19 14:53:06 -05:00
Oleksandr Pavlyk
d160a2bafa Replace --run-once in testing/CMakeLists.txt with --profile 2025-07-28 12:03:42 -05:00
Allison Piper
f44f5cc22c Remove min-time/max-noise API. (#223)
These are now owned by the stdrel stopping criterion, and should not be exposed directly in the benchmark/state/etc APIs.

This will affect users that are calling
`NVBENCH_BENCH(...).set_min_time(...)` or
`NVBENCH_BENCH(...).set_max_noise(...)`.

These can be updated to
`NVBENCH_BENCH(...).set_criterion_param_float64(["min-time"|"max-noise"], ...)`.
2025-05-08 10:02:54 -04:00
Allison Piper
9d189280de Fix get_config_count for CPU-only benchmarks. (#218) 2025-05-01 12:34:35 -04:00
Sergey Pavlov
433376fd83 Restrict stopping criterion parameter usage in command line (#174)
* restrict stopping criterion parameter usage in command line
* Update docs for stopping criterion.
* Add convenience benchmark_base API for criterion params.
* Add more test cases for stopping criterion parsing.

---------

Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
2025-04-30 15:53:45 -04:00
Elias Stehle
ca0e795b46 Merge pull request #113 from elstehle/fix/per-device-stream
Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream
2025-04-30 15:40:33 -04:00
Allison Piper
3440855dbd Formatting updates. 2025-04-14 17:26:12 +00:00
Allison Piper
93ea533fd3 Drop support for MSVC. 2025-04-04 22:17:03 +00:00
Allison Piper
4d7b3e8100 Add missing header to test. 2025-04-04 17:44:33 -04:00
Sergey Pavlov
a171514056 Added cudaGetLastError() calls to reset benchmarking kernel errors (issue 88). (#173)
* Create and use NVBENCH_CUDA_CALL_RESET_ERROR.

* Moved cudaGetLastError() call to NVBENCH_CUDA_CALL macro

---------

Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>
2024-05-31 11:32:01 -04:00
Allison Piper
555d628e9b Use a reproducible seed in test rng. (#164) 2024-04-12 11:55:05 -04:00
Allison Piper
5ee8811a1a Fix and test using RAII global state in main. (#168) 2024-04-09 17:27:49 -04:00
Allison Piper
165cf924c5 Refactor main implementation to improve reusability and customization. (#165)
* Refactor main implementation to improve reusability and customization.

Move the implementation of `main` out of macros and into separate
functions. This allows for easier reuse and customization of the macros.
Existing macro usage should still work as expected, and new
customization points will simplify common tasks like argument parsing
going forward.

* Add tests that validate common main customizations.
2024-04-09 12:45:58 -04:00
Allison Piper
a0f2fab72b Squashed commit of the following:
commit c5b2fc0a8b
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 21:48:20 2024 +0000

    Add supported compilers and tools in README.md.

commit 92fe366da5
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 20:45:30 2024 +0000

    Fix issues discovered by header tests.

commit f7f6c92143
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 20:45:06 2024 +0000

    Setup header tests, add C++20 header tests + examples.

    The core library will always be built with C++17, but
    we test our headers / examples under 17 and 20.

commit 4b24f26b66
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 16:21:42 2024 +0000

    Pass CUDA FLAGS to install tests.

commit 4fb672ae91
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 15:43:41 2024 +0000

    Add newer GCC (13) and Clang (17, 18).
2024-04-06 22:05:40 +00:00
Allison Piper
e8c8877d36 Squashed commit of the following:
commit 4b309e6ad8
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 13:19:14 2024 +0000

    Minor cleanups

commit 476ed2ceae
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 12:53:37 2024 +0000

    WAR compiler ice in nlohmann json.

    Only seeing this on GCC 9 + CTK 11.1. Seems to be
    having trouble with the `[[no_unique_address]]` optimization.

commit a9bf1d3e42
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 00:24:47 2024 +0000

    Bump nlohmann json.

commit 80980fe373
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Sat Apr 6 00:22:07 2024 +0000

    Fix llvm filesystem support

commit f6099e6311
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 23:18:44 2024 +0000

    Drop MSVC 2017 testing.

commit 5ae50a8ef5
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 23:02:32 2024 +0000

    Add more missing headers.

commit b2a9ae04d9
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 22:37:56 2024 +0000

    Remove old CUDA+MSVC builds and make windows build-only.

commit 5b18c26a28
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 22:37:07 2024 +0000

    Fix header for std::min/max.

    Why do I always think it's utility instead of algorithm....

commit 6a409efa2d
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 22:18:18 2024 +0000

    Temporarily disable CUPTI on all windows builds.

commit f432f88866
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 21:42:52 2024 +0000

    Fix warnings on MSVC.

commit 829787649b
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 21:03:16 2024 +0000

    More flailing about in powershell.

commit 21742e6bea
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 20:36:08 2024 +0000

    Cleanup filesystem header handling.

commit de3d202635
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 20:09:00 2024 +0000

    Windows CI debugging.

commit a4151667ff
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 19:45:40 2024 +0000

    Quotation mark madness

commit dd04f3befe
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 19:27:27 2024 +0000

    Temporarily disable NVML on windows CI until new containers are ready.

commit f3952848c4
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 19:25:22 2024 +0000

    WAR issues on gcc-7.

commit 198986875e
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 19:25:04 2024 +0000

    More matrix/devcontainer updates.

commit b9712f8696
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 18:30:35 2024 +0000

    Fix windows build scripts.

commit 943f268280
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 18:18:33 2024 +0000

    Fix warnings with clang host compiler.

commit 7063e1d60a
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 18:14:28 2024 +0000

    More devcontainer hijinks.

commit 06532fde81
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 17:51:25 2024 +0000

    More matrix updates.

commit 78a265ea55
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 17:34:00 2024 +0000

    Support CLI CMake options for windows ci scripts.

commit 670895c867
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 17:31:59 2024 +0000

    Add missing devcontainers.

commit b121823e74
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 17:22:54 2024 +0000

    Build for `all-major` architectures in presets.

    We can get away with this because we require CMake 3.23.1.
    This was added in 3.23.

commit fccfd44685
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 17:22:08 2024 +0000

    Update matrix file.

commit e7d43ba90e
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 16:23:48 2024 +0000

    Consolidate build/test jobs.

commit c4044056ec
Author: Allison Piper <alliepiper16@gmail.com>
Date:   Fri Apr 5 16:04:11 2024 +0000

    Add missing build script.
2024-04-06 13:56:10 +00:00
Georgy Evtushenko
4be0c5bdcd API convention 2024-01-11 10:48:52 -08:00
Georgy Evtushenko
dacbee127c Base method naming convention 2024-01-11 10:41:11 -08:00
Georgy Evtushenko
182c77e4f4 Got rid of the params description API 2024-01-10 12:30:17 -08:00
Georgy Evtushenko
42c6bdea70 Handle empty input in mean 2024-01-10 09:52:14 -08:00
Georgy Evtushenko
fade52fa2e Different singleton convention 2024-01-08 14:08:12 -08:00
Georgy Evtushenko
85ed6f007c Rename criterion registry to criterion manager 2024-01-08 13:15:46 -08:00
Georgy Evtushenko
de724a21f1 Rename get_params to get_params_description 2024-01-08 13:06:48 -08:00
Georgy Evtushenko
b789240c76 Entropy-based stopping criterion 2024-01-05 14:59:48 -08:00
Vyas Ramasubramani
a3b729bca8 fmt::memory_buffer is no longer an iterator. 2022-11-03 10:04:02 -07:00
Yunsong Wang
af4c35d78b Fix a bug in config count unit test: count number of devices as well 2022-02-11 18:24:58 -05:00
Yunsong Wang
6159d9c6cb Minor correction in unit test 2022-02-06 20:19:21 -05:00
Yunsong Wang
33a896f99e Update copyright year 2022-02-04 17:25:50 -05:00
Yunsong Wang
470beda9f0 Add nvbench::state stream tests 2022-02-04 16:55:29 -05:00
Allison Vacanti
a72f248af6 Require the NVBench package in test_export testing. 2022-01-19 15:42:26 -05:00
Allison Vacanti
6dee1eec3b Refactor summary API and update nvbench/summary.cuh docs.
The string used when constructing a summary is no longer a human
readable name, but rather a tag string (e.g. "nv/cold/time/gpu/mean").
These will make lookup easier and more stable going forward.

name vs. short_name no longer exists. Now there is just "name", which
is used for column headings. The "description" string may still be
used for detailed information.

Updated the json tests and compare script to reflect these changes.
2022-01-11 15:06:26 -05:00
Allison Vacanti
2f8bb28c52 Merge pull request #64 from allisonvacanti/noise_convergence
New convergence check
2021-12-21 21:30:39 -05:00
Allison Vacanti
178dd0eb68 Implement new convergence check for noisy kernels.
Previously, convergence was tested by waiting for the relative stdev
of cuda timings ("noise") to drop below a certain percentage
(`max_noise`).

This assumed that all benchmarks would eventually see their noise drop
to some threshold, but this is not the case. In practice, many benchmarks
never converge to the default 0.5% relative stdev and instead will always
run to the 15s timeout -- even if the means have converged in a second
or two.

Added a new check that tests when the noise itself stabilizes and ends
the benchmark, even if noise > max_noise.

After testing, this patch alone significantly reduces the runtime of the
Thrust+CUB benchmark suite (from 30 hours to 5 hours) and produces similar
timing results.

The parameters used to tune this feature are not exposed -- if this
approach works long-term and there's a strong motivation to let users
tweak them, then we can worry about names/APIs/CLI/docs later.
2021-12-21 21:24:02 -05:00
Allison Vacanti
8e56a7bd94 Add noisy_bench with some benchmarks that currently always time-out. 2021-12-21 21:05:13 -05:00