Allison Piper
0c56311174
Fetch clock rates using cudaDeviceGetAttribute.
2025-04-14 16:59:54 -04:00
Allison Piper
33fc77aabc
Merge pull request #210 from alliepiper/vdc_update
...
Update verify-devcontainers workflow to match CCCL.
2025-04-14 14:50:20 -04:00
Allison Piper
457b9f1064
Update verify-devcontainers workflow to match CCCL.
...
This prevents us from spawning a ton of jobs unless the devcontainers actually change.
2025-04-14 14:37:40 -04:00
Allison Piper
2c2f40a659
Merge pull request #209 from alliepiper/pre-commit-ci
...
Add pre-commit.ci configs, format.
2025-04-14 14:05:48 -04:00
Allison Piper
47bd2838da
Remove stale devcontainer.
2025-04-14 17:50:18 +00:00
Allison Piper
4c38b2d5f7
Clang-format doesn't like the 1'000'000 separators.
2025-04-14 17:44:31 +00:00
Allison Piper
a3a2337e04
Merge pull request #208 from alliepiper/drop-support-for-11.8
...
Remove coverage for 11.8.
2025-04-14 13:41:54 -04:00
Allison Piper
8cefac8463
Update blame-ignore file.
2025-04-14 17:31:13 +00:00
Allison Piper
3440855dbd
Formatting updates.
2025-04-14 17:26:12 +00:00
Allison Piper
de36f1a248
Add pre-commit.ci configs.
2025-04-14 12:23:44 -04:00
Allison Piper
b89c36a5c2
Remove coverage for 11.8.
...
We're going to be dropping these devcontainers soon in CCCL, and they're causing issues with our pre-commit hooks.
2025-04-14 16:03:49 +00:00
Allison Piper
87dd03254f
Merge pull request #206 from gevtushenko/throttle
...
Discard measurements while GPU is throttling
2025-04-14 10:57:33 -04:00
Georgy Evtushenko
254ac2517f
Remove discard on throttle option
2025-04-12 21:13:13 -07:00
Georgy Evtushenko
b926daf09f
Better throttle recovery delay
2025-04-12 21:04:12 -07:00
Georgy Evtushenko
5c0d674757
Fix overflow in default clock rate
2025-04-11 15:44:11 -07:00
Georgy Evtushenko
2ba2d1131d
Report mean SM clock rate
2025-04-11 15:33:57 -07:00
Georgy Evtushenko
f29f7ac2fb
Detect throttle
...
Signed-off-by: Georgy Evtushenko <evtushenko.georgy@gmail.com>
2025-04-11 14:35:40 -07:00
Allison Piper
36adf3a210
Merge pull request #204 from alliepiper/summaries
...
Add min/max timings, new "summaries" example.
2025-04-08 17:51:36 -04:00
Allison Piper
2ba8acd4ea
Add example that demonstrates how to add/remove columns from the markdown table.
2025-04-08 21:14:21 +00:00
Allison Piper
94fde7777c
Clean up summary code, add min/max times summaries.
2025-04-08 19:15:25 +00:00
Allison Piper
beca2c0038
Merge pull request #203 from alliepiper/exec_tag_cleanup
...
Clean up unnecessary exec_tags.
2025-04-08 13:35:34 -04:00
Allison Piper
35360614ed
Remove run_once exec_tag.
...
Similar to `no_block`, this is a runtime variable that doesn't need to be encoded statically.
It was not exposed publicly and existed solely as an implementation detail of `state::exec`, introducing unnecessary complexity there.
2025-04-08 17:15:58 +00:00
Allison Piper
851d7aadd0
Make blocking kernel use a runtime option.
...
It's not worth instantiating multiple instances of the measurement class to handle this.
Since there's already a runtime option to disable the blocking kernel, the current implementation by default will instantiate both the blocking and non-blocking versions of the algorithm for dynamic dispatch.
2025-04-08 17:15:58 +00:00
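The trade-off described in 851d7aadd0 can be sketched as follows. This is only an illustration under stated assumptions: `measure_static`, `measure_runtime`, `run_static`, and `run_runtime` are hypothetical names invented here and are not NVBench's actual classes. The sketch shows that when a runtime switch already exists, the compile-time variant has to instantiate both specializations anyway, so a single class with a runtime flag gives the same behavior with less generated code.

```cpp
#include <string>

// Hypothetical "before": the blocking behavior is a template parameter,
// so each value produces a separate instantiation of the class.
template <bool UseBlockingKernel>
struct measure_static
{
  std::string run() const { return UseBlockingKernel ? "blocking" : "non-blocking"; }
};

// Hypothetical "after": one class that branches on a runtime flag.
class measure_runtime
{
public:
  explicit measure_runtime(bool use_blocking_kernel)
      : m_use_blocking_kernel(use_blocking_kernel)
  {}

  std::string run() const { return m_use_blocking_kernel ? "blocking" : "non-blocking"; }

private:
  bool m_use_blocking_kernel;
};

// With a runtime "disable" option, the static design must still compile
// both specializations and pick one dynamically.
std::string run_static(bool disable_blocking)
{
  if (disable_blocking)
  {
    return measure_static<false>{}.run();
  }
  return measure_static<true>{}.run();
}

// The runtime design needs a single class and no dispatch branch.
std::string run_runtime(bool disable_blocking)
{
  return measure_runtime{!disable_blocking}.run();
}
```

Both helpers return the same result for a given flag. The difference is only in how many class instantiations the compiler has to generate.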
Allison Piper
52028be94f
Merge pull request #201 from alliepiper/cpu_only
...
Add cpu-only benchmarking support.
2025-04-08 11:39:30 -04:00
Allison Piper
a6df59a9b5
Add support for CPU-only benchmarking.
...
Fixes #95.
CPU-only mode is enabled by setting the `is_cpu_only` property while
defining a benchmark, e.g. `NVBENCH_BENCH(foo).set_is_cpu_only(true)`.
An optional `nvbench::exec_tag::no_gpu` hint can also be passed to
`state.exec` to avoid instantiating GPU benchmarking backends. Note that
a CUDA compiler and CUDA runtime are always required, even if all benchmarks
in a translation unit are CPU-only.
Similarly, a new `nvbench::exec_tag::gpu` hint can be used to avoid
compiling CPU-only backends for GPU benchmarks.
2025-04-08 11:17:23 -04:00
Allison Piper
1efed5f8e1
Merge pull request #200 from alliepiper/update_deps
...
Update dependencies, drop support for old compilers and MSVC.
2025-04-04 18:45:59 -04:00
Allison Piper
93ea533fd3
Drop support for MSVC.
2025-04-04 22:17:03 +00:00
Allison Piper
1d0daa52ae
Add skip-vdc option to CI.
2025-04-04 17:44:33 -04:00
Allison Piper
7d210614f5
Attempt to suppress system include warnings on MSVC.
2025-04-04 17:44:33 -04:00
Allison Piper
2a25b351ab
Bump required cmake version to 3.30.4 for rapids-cmake.
2025-04-04 17:44:33 -04:00
Allison Piper
a3fb3ce610
Migrate CI to l4 queue.
2025-04-04 17:44:33 -04:00
Allison Piper
15d34106d4
Disable unicode in fmtlib on nvcc + msvc.
...
This doesn't appear to be supported.
2025-04-04 17:44:33 -04:00
Allison Piper
435df5220f
Regenerate devcontainers.
2025-04-04 17:44:33 -04:00
Allison Piper
1a5fa2277e
Drop support for CTK < 11.8 and clang < 14.
...
Newer versions of fmt have a ton of issues building on CTK 11.1, and 11.8 is the next available container we have built for CI. We may still work with some earlier versions, but we do not test them.
We no longer have CI images available for clang < 14, so drop official support.
2025-04-04 17:44:33 -04:00
Allison Piper
8478f7d0bf
Guard fmt def behind nvcc check.
2025-04-04 17:44:33 -04:00
Allison Piper
9d9a30fbd6
Bump devcontainers to 25.06 branch.
2025-04-04 17:44:33 -04:00
Allison Piper
5f6f8a65ee
Enable /utf-8 on MSVC.
2025-04-04 17:44:33 -04:00
Allison Piper
a1acb3e8b2
Update CI matrix and devcontainers.
2025-04-04 17:44:33 -04:00
Allison Piper
4d7b3e8100
Add missing header to test.
2025-04-04 17:44:33 -04:00
Allison Piper
0e8089a246
Disable fmtlib's use of llvm _BitInt, as it is not supported when using nvcc.
2025-04-04 17:44:33 -04:00
Allison Piper
e6705e3114
Update fmtlib/fmt to 11.1.4.
...
Switched away from the rapids-cmake provided version and manually CPM'd it.
rapids-cmake will stop providing fmtlib later this year, and the version currently supported is rather old.
Included the same logic that rapids-cmake currently uses to hopefully provide a smooth transition for edge cases (external fmt, etc).
Added `FMT_SYSTEM_HEADERS=ON` to mark fmt headers as system includes, suppressing any internal warnings.
2025-04-04 17:44:33 -04:00
Allison Piper
5aa5a3c225
Update rapids-cmake to 25.04.
2025-04-04 17:44:33 -04:00
Allison Piper
2d9eafc765
Merge pull request #202 from alliepiper/misc-fixes
...
Misc fixes
pre_msvc_drop
2025-04-04 16:47:25 -04:00
Allison Piper
497eaed1d9
Use correct timer when computing cpu stats in measure_cold.
2025-04-04 20:16:04 +00:00
Allison Piper
618e1f048c
Fix typo in docstring.
2025-04-04 20:14:44 +00:00
Allison Piper
f6af8b9769
Whitespace cleanup.
2025-04-04 20:14:21 +00:00
Allison Piper
c03033b50e
Fix error message when #TypeAxisNames != #TypeAxes. (#192)
...
The error message was being generated after moving strings out of `names`, so some of the axis names were blank.
This moves the check + error before any strings are moved.
2024-11-20 13:11:03 -05:00
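The bug class fixed in c03033b50e (building an error message from strings that have already been moved from) can be illustrated with a minimal sketch. The `axis` struct and `make_axes` function below are hypothetical stand-ins and not NVBench's real types. The point is only the ordering: validate and format the error while `names` is still intact, then move.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a type axis; not NVBench's actual class.
struct axis
{
  std::string name;
};

// Builds one axis per name. The size check runs before any string is moved,
// so the error message can still list every name the caller supplied.
std::vector<axis> make_axes(std::vector<std::string> names, std::size_t expected_count)
{
  if (names.size() != expected_count)
  {
    std::string msg = "expected " + std::to_string(expected_count) +
                      " axis names, got " + std::to_string(names.size()) + ":";
    for (const auto &n : names)
    {
      msg += " " + n;
    }
    throw std::invalid_argument(msg);
  }

  std::vector<axis> axes;
  axes.reserve(names.size());
  for (auto &n : names)
  {
    // After this move, n is valid but unspecified (typically empty),
    // which is why the check above must not come later.
    axes.push_back(axis{std::move(n)});
  }
  return axes;
}
```

Calling `make_axes({"T", "U", "V"}, 2)` throws with all three names present in the message. Had the check been placed after the loop, the names would most likely have printed as empty strings.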
Bernhard Manfred Gruber
f52aa4b0aa
Distinguish slower, same and faster comparisons (#190)
...
Fixes: #178
2024-11-15 12:41:47 -05:00
Georgii Evtushenko
0ce45af043
Plot comparison results (#90)
2024-11-13 14:28:04 -05:00
Jordan Jacobelli
92286e1d4a
devcontainer: replace VAULT_HOST with AWS_ROLE_ARN (#187)
...
* devcontainer: replace VAULT_HOST with AWS_ROLE_ARN
Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
* Update devcontainers base image to support AWS_ROLE_ARN
Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
* Bump cuda latest version to 12.6
Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
* Replace ubuntu18.04 with ubuntu20.04
Ubuntu 18.04 is not supported anymore
Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
* Use DOOD strategy to keep supporting ubuntu18.04
See https://github.com/NVIDIA/cccl/pull/1779
Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
---------
Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
2024-10-25 11:49:02 -04:00