nvbench

Mirror of https://github.com/NVIDIA/nvbench.git, synced 2026-03-14 20:27:24 +00:00

Author	SHA1	Message	Date
Allison Piper	f2011f2281	Add new hidden summary with percent sm clock scaling,	2025-04-14 11:37:20 -04:00
Allison Piper	e0a486b03b	Reduce memory usage of clock rate logging.	2025-04-14 11:35:27 -04:00
Allison Piper	18926ced87	Replace references to `peak_sm_clock` with `default_sm_clock`. The actual measured clock speed can exceed this value, so default is less confusing than peak.	2025-04-14 11:33:04 -04:00
Allison Piper	87dd03254f	Merge pull request #206 from gevtushenko/throttle Discard measurements while GPU is throttling	2025-04-14 10:57:33 -04:00
Georgy Evtushenko	254ac2517f	Remove discard on throttle option	2025-04-12 21:13:13 -07:00
Georgy Evtushenko	b926daf09f	Better throttle recovery delay	2025-04-12 21:04:12 -07:00
Georgy Evtushenko	5c0d674757	Fix overflow in default clock rate	2025-04-11 15:44:11 -07:00
Georgy Evtushenko	2ba2d1131d	Report mean SM clock rate	2025-04-11 15:33:57 -07:00
Georgy Evtushenko	f29f7ac2fb	Detect throttle Signed-off-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>	2025-04-11 14:35:40 -07:00
Allison Piper	36adf3a210	Merge pull request #204 from alliepiper/summaries Add min/max timings, new "summaries" example.	2025-04-08 17:51:36 -04:00
Allison Piper	2ba8acd4ea	Add example that demonstrates how to add/remove columns from the markdown table.	2025-04-08 21:14:21 +00:00
Allison Piper	94fde7777c	Clean up summary code, add min/max times summaries.	2025-04-08 19:15:25 +00:00
Allison Piper	beca2c0038	Merge pull request #203 from alliepiper/exec_tag_cleanup Clean up unnecessary exec_tags.	2025-04-08 13:35:34 -04:00
Allison Piper	35360614ed	Remove run_once exec_tag. Similar to `no_block`, this is a runtime variable that doesn't need to be encoded statically. It was not exposed publicly and existed solely as an implementation detail of `state::exec`, where it introduced unnecessary complexity.	2025-04-08 17:15:58 +00:00
Allison Piper	851d7aadd0	Make blocking kernel use a runtime option. It's not worth instantiating multiple instances of the measurement class to handle this. Since there's already a runtime option to disable the blocking kernel, the current implementation will by default instantiate both the blocking and non-blocking versions of the algorithm for dynamic dispatch.	2025-04-08 17:15:58 +00:00
Allison Piper	52028be94f	Merge pull request #201 from alliepiper/cpu_only Add cpu-only benchmarking support.	2025-04-08 11:39:30 -04:00
Allison Piper	a6df59a9b5	Add support for CPU-only benchmarking. Fixes #95. CPU-only mode is enabled by setting the `is_cpu_only` property while defining a benchmark, e.g. `NVBENCH_BENCH(foo).set_is_cpu_only(true)`. An optional `nvbench::exec_tag::no_gpu` hint can also be passed to `state.exec` to avoid instantiating GPU benchmarking backends. Note that a CUDA compiler and CUDA runtime are always required, even if all benchmarks in a translation unit are CPU-only. Similarly, a new `nvbench::exec_tag::gpu` hint can be used to avoid compiling CPU-only backends for GPU benchmarks.	2025-04-08 11:17:23 -04:00
Allison Piper	1efed5f8e1	Merge pull request #200 from alliepiper/update_deps Update dependencies, drop support for old compilers and MSVC.	2025-04-04 18:45:59 -04:00
Allison Piper	93ea533fd3	Drop support for MSVC.	2025-04-04 22:17:03 +00:00
Allison Piper	1d0daa52ae	Add skip-vdc option to CI.	2025-04-04 17:44:33 -04:00
Allison Piper	7d210614f5	Attempt to suppress system include warnings on MSVC.	2025-04-04 17:44:33 -04:00
Allison Piper	2a25b351ab	Bump required cmake version to 3.30.4 for rapids-cmake.	2025-04-04 17:44:33 -04:00
Allison Piper	a3fb3ce610	Migrate CI to l4 queue.	2025-04-04 17:44:33 -04:00
Allison Piper	15d34106d4	Disable unicode in fmtlib on nvcc + msvc. This doesn't appear to be supported.	2025-04-04 17:44:33 -04:00
Allison Piper	435df5220f	Regenerate devcontainers.	2025-04-04 17:44:33 -04:00
Allison Piper	1a5fa2277e	Drop support for CTK < 11.8 and clang < 14. Newer versions of fmt have a ton of issues building on CTK 11.1, and 11.8 is the next available container we have built for CI. We may still work with some earlier versions, but we do not test them. We no longer have CI images available for clang < 14, so drop official support.	2025-04-04 17:44:33 -04:00
Allison Piper	8478f7d0bf	Guard fmt def behind nvcc check.	2025-04-04 17:44:33 -04:00
Allison Piper	9d9a30fbd6	Bump devcontainers to 25.06 branch.	2025-04-04 17:44:33 -04:00
Allison Piper	5f6f8a65ee	Enable /utf-8 on MSVC.	2025-04-04 17:44:33 -04:00
Allison Piper	a1acb3e8b2	Update CI matrix and devcontainers.	2025-04-04 17:44:33 -04:00
Allison Piper	4d7b3e8100	Add missing header to test.	2025-04-04 17:44:33 -04:00
Allison Piper	0e8089a246	Disable fmtlib's use of llvm _BitInt, as it is not supported when using nvcc.	2025-04-04 17:44:33 -04:00
Allison Piper	e6705e3114	Update fmtlib/fmt to 11.1.4. Switched away from the rapids-cmake provided version and manually CPM'd it. rapids-cmake will stop providing fmtlib later this year, and the version currently supported is rather old. Included the same logic that rapids-cmake currently uses to hopefully provide a smooth transition for edge cases (external fmt, etc). Added `FMT_SYSTEM_HEADERS=ON` to mark fmt headers as system includes, suppressing any internal warnings.	2025-04-04 17:44:33 -04:00
Allison Piper	5aa5a3c225	Update rapids-cmake to 25.04.	2025-04-04 17:44:33 -04:00
Allison Piper	2d9eafc765	Merge pull request #202 from alliepiper/misc-fixes Misc fixes pre_msvc_drop	2025-04-04 16:47:25 -04:00
Allison Piper	497eaed1d9	Use correct timer when computing cpu stats in measure_cold.	2025-04-04 20:16:04 +00:00
Allison Piper	618e1f048c	Fix typo in docstring.	2025-04-04 20:14:44 +00:00
Allison Piper	f6af8b9769	Whitespace cleanup.	2025-04-04 20:14:21 +00:00
Allison Piper	c03033b50e	Fix error message when #TypeAxisNames != #TypeAxes. (#192) The error message was being generated after moving strings out of `names`, so some of the axis names were blank. This moves the check + error before any strings are moved.	2024-11-20 13:11:03 -05:00
Bernhard Manfred Gruber	f52aa4b0aa	Distinguish slower, same and faster comparisons (#190) Fixes: #178	2024-11-15 12:41:47 -05:00
Georgii Evtushenko	0ce45af043	Plot comparison results (#90)	2024-11-13 14:28:04 -05:00
Jordan Jacobelli	92286e1d4a	devcontainer: replace `VAULT_HOST` with `AWS_ROLE_ARN` (#187) * devcontainer: replace VAULT_HOST with AWS_ROLE_ARN Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com> * Update devcontainers base image to support AWS_ROLE_ARN Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com> * Bump cuda latest version to 12.6 Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com> * Replace ubuntu18.04 with ubuntu20.04 Ubuntu 18.04 is not supported anymore Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com> * Use DOOD strategy to keep supporting ubuntu18.04 See https://github.com/NVIDIA/cccl/pull/1779 Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com> --------- Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>	2024-10-25 11:49:02 -04:00
Sergey Pavlov	a171514056	Added cudaGetLastError() calls to reset benchmarking kernel errors (issue 88). (#173) * Create and use NVBENCH_CUDA_CALL_RESET_ERROR. * Moved cudaGetLastError() call to NVBENCH_CUDA_CALL macro --------- Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com>	2024-05-31 11:32:01 -04:00
Cliff Burdick	088c9ee658	Removed extraneous break statements that caused nvc++ to give warnings (#171)	2024-05-17 16:35:11 -04:00
Allison Piper	961fa91258	Don't fail CI when ninja_summary fails. (#172)	2024-05-17 14:35:12 -04:00
Allison Piper	555d628e9b	Use a reproducible seed in test rng. (#164)	2024-04-12 11:55:05 -04:00
Allison Piper	60761e0946	Enable extra NVBench features in windows build. (#169) * Enable extra NVBench features in windows build. These were delayed as they required changes to the devcontainers. * Revamp nvml.dll logic.	2024-04-10 13:45:53 -04:00
Allison Piper	5ee8811a1a	Fix and test using RAII global state in `main`. (#168)	2024-04-09 17:27:49 -04:00
Allison Piper	165cf924c5	Refactor main implementation to improve reusability and customization. (#165) * Refactor main implementation to improve reusability and customization. Move the implementation of `main` out of macros and into separate functions. This allows for easier reuse and customization of the macros. Existing macro usage should still work as expected, and new customization points will simplify common tasks like argument parsing going forward. * Add tests that validate common main customizations.	2024-04-09 12:45:58 -04:00
Allison Piper	9e8efa2c88	Preserve `.devcontainers/img/` when cleaning. (#166)	2024-04-08 18:24:07 -04:00


496 Commits