Nader Al Awar
711c1e2eb1
Replace all occurrences of pynvbench with cuda-bench
2026-01-29 13:25:17 -06:00
Nader Al Awar
5e7adc5c3f
Build multi-architecture CUDA wheels (#302)
...
* Add cuda architectures to build wheel for
* Package scripts in wheel
* Separate cuda major version extraction to fix architecture selection logic
* Add back statement printing cuda version
* [pre-commit.ci] auto code formatting
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-29 01:13:24 +00:00
Ashwin Srinath
a681e2185d
Add multi-cuda wheel build (#289)
...
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
2026-01-28 10:37:55 -05:00
Oleksandr Pavlyk
f3fa93f388
Merge pull request #290 from oleksandr-pavlyk/debug/outstanding-changes
...
Make python nvbench benchmarks interruptible
2026-01-23 15:39:23 -06:00
Bernhard Manfred Gruber
2d4690e07d
Merge pull request #298 from bernhardmgruber/ignore_device
...
Allow bypassing the device section check to compare different devices
2025-12-10 18:24:26 +01:00
Bernhard Manfred Gruber
85548809d6
Allow bypassing the device section check to compare different devices
...
Fixes: #297
2025-12-10 13:14:50 +01:00
Oleksandr Pavlyk
f6a9b245d3
Only trigger skipping of outstanding benchmarks on a KeyboardInterrupt exception; on other exceptions, the benchmark continues execution
2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk
7e9a9a8983
Replace main_arg_run_benchmarks with run_interruptible
...
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.
2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk
8e6154511e
Introduce runner->run_or_skip(bool &) and benchmark->run_or_skip(bool &)
...
These methods take a reference to a boolean whose value signals whether
benchmark instances pending execution are to be skipped.
void benchmark->run_or_skip(bool &) is called by Python to ensure
that KeyboardInterrupt is properly handled in scripts that contain
multiple benchmarks, or when a single-benchmark script is
executed on a machine with more than one device.
2025-12-08 14:24:32 -06:00
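The skip-flag behaviour described in the commit above can be illustrated with a minimal Python sketch. This is not the nvbench implementation: `run_all_or_skip` and the result labels are hypothetical names chosen for illustration, and a local variable stands in for the `bool &` argument. It only assumes what the commit messages state: a KeyboardInterrupt causes pending benchmarks to be skipped, while other exceptions let the remaining benchmarks run.

```python
from typing import Callable, List, Tuple


def run_all_or_skip(
    benchmarks: List[Tuple[str, Callable[[], None]]],
) -> List[Tuple[str, str]]:
    """Run benchmarks in order; after a KeyboardInterrupt, skip the rest.

    Hypothetical helper for illustration only, not part of nvbench.
    """
    skip_remaining = False  # plays the role of the shared boolean flag
    results: List[Tuple[str, str]] = []
    for name, fn in benchmarks:
        if skip_remaining:
            # An earlier benchmark was interrupted: do not start this one.
            results.append((name, "skipped"))
            continue
        try:
            fn()
            results.append((name, "ok"))
        except KeyboardInterrupt:
            # Only an interrupt flips the flag.
            skip_remaining = True
            results.append((name, "interrupted"))
        except Exception as exc:
            # Any other failure is recorded and the loop continues.
            results.append((name, f"failed: {exc}"))
    return results
```

Keeping the flag outside the per-benchmark call is what lets a single Ctrl-C stop a script that contains several benchmarks, or one benchmark that runs on several devices.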
Oleksandr Pavlyk
a7763bdd7a
Remove debug outputs
2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk
b2a80c92b8
Revert "Scripts to triage 284"
...
This reverts commit c286199adc.
2025-12-08 11:53:08 -06:00
Oleksandr Pavlyk
ce9a76167f
Use nvbench::stop_runner_loop to signal stop of runner loop
...
Add try/catch around Python calls to improve keyboard interrupt
response.
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
e57f1ecf4c
Introduce nvbench::stop_runner_loop exception. If the application throws it, the runner loop is stopped and other pending benchmark instances are skipped
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
c286199adc
Scripts to triage 284
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
de471e1d42
Use pybind11==3.0.1, do not use pybind11_add_module
2025-12-05 19:38:11 -06:00
Jerry Hou
f651636501
entropy criterion optimizations (#286)
...
* entropy criterion optimizations
* online linear regression module
* online regression refactor
* revising ss_tot handling
---------
Co-authored-by: Jerry Hou <jerryhou@fb.com>
2025-12-06 01:02:21 +00:00
Ashwin Srinath
a6995413ac
Merge pull request #288 from shwina/wheel-build-and-publish-infra
...
Initial wheel build and publishing infrastructure
2025-12-04 04:37:07 -05:00
Ashwin Srinath
1d33536ce1
Re-enable other CI jobs
2025-12-03 16:42:30 -05:00
Ashwin Srinath
603a2df445
Remove workaround
2025-12-03 16:23:42 -05:00
Ashwin Srinath
77b7afc3c9
Remove the Python version file
2025-12-03 16:23:14 -05:00
Ashwin Srinath
3af11c8ee7
Expand the CI matrix back
2025-12-03 15:48:40 -05:00
Ashwin Srinath
cadfa7de61
We no longer need to install libnvidia-ml.so
2025-12-03 15:37:20 -05:00
Ashwin Srinath
7ad064ea4f
Change to GPU runner for testing
2025-12-03 15:18:39 -05:00
Ashwin Srinath
b7eaf44ca3
Install libnvidia-ml.so.1 in test environment
2025-12-03 14:56:37 -05:00
Ashwin Srinath
c2c34c9378
Temporarily reduce CI matrix
2025-12-03 14:37:23 -05:00
Ashwin Srinath
a293af1d52
Try capturing the Python path before changing directories
2025-12-03 14:15:34 -05:00
Ashwin Srinath
a7f92b7436
Try an inner and outer script
2025-12-03 13:21:53 -05:00
Ashwin Srinath
9746aa14df
Maybe fix to test script
2025-12-03 12:47:43 -05:00
Ashwin Srinath
d1efef03bc
Fix wheel naming
2025-12-03 11:54:46 -05:00
Ashwin Srinath
618001143b
Fixes to test script
2025-12-03 11:41:36 -05:00
Ashwin Srinath
8443a2059c
Ensure test jobs find wheels correctly
2025-12-03 11:22:19 -05:00
Ashwin Srinath
f3df4104de
Make wheels manylinux compliant
2025-12-03 11:22:12 -05:00
Ashwin Srinath
e15d9ebf58
Lint fixes
2025-12-03 11:07:03 -05:00
Ashwin Srinath
98e0b5994a
Introduce build-and-test-python-wheels workflow
2025-12-03 11:06:11 -05:00
Ashwin Srinath
e9cf53a1a4
Add PR workflow for building and testing wheels
2025-12-03 10:30:27 -05:00
Ashwin Srinath
8b2afa6c16
Lint fixes
2025-12-03 10:17:23 -05:00
Ashwin Srinath
29389b5791
Initial wheel build and publishing infrastructure
2025-12-03 10:15:32 -05:00
Bernhard Manfred Gruber
34f1e2a7ee
Merge pull request #285 from ashermancinelli/patch-1
...
Update README.md
2025-11-16 00:11:42 +01:00
Asher Mancinelli
e91559edf0
Update README.md
2025-11-14 14:34:18 -08:00
comeyrd
92d2e01cd1
Profile only the kernels involved in the benchmark (#277)
...
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
2025-10-21 13:51:37 -04:00
Allison Piper
9b133a94bc
Remove GLOBAL tags from fmt targets. (#281)
...
Fixes #279.
2025-10-21 11:16:44 -04:00
Allison Piper
e6283df79c
Build native arch by default, update rapids-cmake. (#280)
...
* Build native arch by default, update rapids-cmake.
* Add check that CXX and CUDA_HOST compiler match.
Similar to CCCL, we need these to match to ensure that our warning flag detection functions properly.
* GCC only recognizes `unused-local-typedefs`.
Clang recognizes both. Ensure that we set this for both compilers.
2025-10-21 10:41:36 -04:00
Bernhard Manfred Gruber
98d701c054
Diff device sections on mismatch in nvbench_compare.py (#278)
2025-10-15 08:58:08 -04:00
pre-commit-ci[bot]
7feda2cf3a
[pre-commit.ci] pre-commit autoupdate (#276)
...
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0)
- [github.com/pre-commit/mirrors-clang-format: v20.1.7 → v21.1.2](https://github.com/pre-commit/mirrors-clang-format/compare/v20.1.7...v21.1.2)
- [github.com/astral-sh/ruff-pre-commit: v0.12.2 → v0.13.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.2...v0.13.3)
* Update matrix + devcontainers.
* Fix typo.
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
2025-10-07 15:22:36 -04:00
Oleksandr Pavlyk
e7cc1e344c
Add a benchmark example parametrized by typename and integral constant. (#275)
...
* Add a benchmark example parametrized by typename and integral constant.
Add a variation of copy_type_sweep kernel, where block size is controlled
via integral constant passed as template parameter.
* Addressed PR review feedback
* Use auto for gridSize
* Address PR review change request
* Add comment to use ceil_div with CCCL >= 2.8
2025-10-07 13:49:17 -04:00
Oleksandr Pavlyk
b88a45f417
Merge pull request #269 from jayavenkatesh19/main
...
remove pynvjitlink references in examples
2025-09-17 13:54:36 -05:00
Jaya Venkatesh
0f997271f7
added numba-cuda to requirements
...
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-16 14:54:08 -07:00
Jaya Venkatesh
bfa6a6c7c6
remove pynvjitlink references in examples
...
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-08 16:00:19 -07:00
Allison Piper
4642df7006
Fix sccache checks when running locally. (#268)
2025-09-05 15:50:09 -04:00
Allison Piper
33a659ecd3
Add CTK 13.0 + Clang20 to CI. (#266)
2025-09-03 11:24:07 -04:00