Commit Graph

688 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
cfb4a9b8b0 Fix for comment grammar 2026-02-02 12:58:15 -06:00
Oleksandr Pavlyk
27d6492355 Factor out check for whether to skip hot measurement to a nvbench::state private method 2026-02-02 12:43:39 -06:00
Oleksandr Pavlyk
cff6df9bb2 Renamed option to --no-batch to stay aligned with tag name 2026-02-02 12:28:39 -06:00
Oleksandr Pavlyk
f1b9d44304 Support --no-batched CLI option
The option sets m_skip_batched boolean member in benchmark_base class.
Methods `bool get_skip_batched()` and `void set_skip_batched(bool)` added.

m_skip_batched is also added to state class. Similarly named methods
are added.

CLI help file documents `--no-batched` option.
2026-02-02 11:32:57 -06:00
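The accessor pair described in f1b9d44304 can be pictured with a minimal sketch. Only the member and method names come from the commit message; the class name `benchmark_base_sketch`, the `const` qualifier on the getter, and the `false` default are assumptions made for illustration and are not taken from the nvbench source.

```cpp
#include <cassert>

// Hypothetical stand-in for benchmark_base; not the actual nvbench class.
class benchmark_base_sketch
{
public:
  // Returns true when batched measurements should be skipped.
  bool get_skip_batched() const { return m_skip_batched; }

  // Presumably called when the CLI option is parsed (assumption).
  void set_skip_batched(bool skip) { m_skip_batched = skip; }

private:
  bool m_skip_batched{false}; // default value is an assumption
};

// Usage: flip the flag the way an option parser might.
inline bool demo_skip_batched()
{
  benchmark_base_sketch bench;
  assert(!bench.get_skip_batched()); // starts disabled in this sketch
  bench.set_skip_batched(true);
  return bench.get_skip_batched();
}
```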
Oleksandr Pavlyk
f3fa93f388 Merge pull request #290 from oleksandr-pavlyk/debug/outstanding-changes
Make python nvbench benchmarks interruptible
2026-01-23 15:39:23 -06:00
Bernhard Manfred Gruber
2d4690e07d Merge pull request #298 from bernhardmgruber/ignore_device
Allow to by-pass device section check and compare different devices
2025-12-10 18:24:26 +01:00
Bernhard Manfred Gruber
85548809d6 Allow to by-pass device section check and compare different devices
Fixes: #297
2025-12-10 13:14:50 +01:00
Oleksandr Pavlyk
f6a9b245d3 Only trigger skipping of outstanding benchmarks on KeyboardInterrupt exception; on others, the benchmark is to continue execution 2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk
7e9a9a8983 Replace main_arg_run_benchmarks with run_interriptible
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.
2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk
8e6154511e Introduce runner->run_or_skip(bool &) and benchmark->run_or_skip(bool &)
These methods take reference to a boolean whose value signals whether
benchmark instances pending for execution are to be skipped.

void benchmark->run_or_skip(bool &) is called by Python to ensure
that KeyboardInterrupt is properly handled in scripts that contain
multiple benchmarks, or in case when single benchmark script is
executed on a machine with more than one device.
2025-12-08 14:24:32 -06:00
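The shared-flag idea in 8e6154511e can be sketched as follows. This is a simplified illustration under stated assumptions, not the nvbench implementation: `benchmark_sketch`, its `body` callback, and the convention that `body` returns `false` to stand in for an interrupt are all hypothetical. Only the `run_or_skip(bool &)` signature comes from the commit message.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a benchmark; not the actual nvbench type.
struct benchmark_sketch
{
  // Returns false to request that pending work be skipped (assumption
  // standing in for a KeyboardInterrupt raised on the Python side).
  std::function<bool()> body;
  bool ran{false};

  // The caller owns the flag; once it is set, later calls do nothing.
  void run_or_skip(bool &skip_remaining)
  {
    if (skip_remaining)
    {
      return;
    }
    ran = true;
    if (!body())
    {
      skip_remaining = true;
    }
  }
};

// One flag is shared across every benchmark in the script.
inline void run_all_sketch(std::vector<benchmark_sketch> &benchmarks)
{
  bool skip_remaining = false;
  for (auto &bench : benchmarks)
  {
    bench.run_or_skip(skip_remaining);
  }
}
```

Because the flag lives outside any single benchmark, an interrupt in one benchmark also suppresses the ones queued after it, which is the multi-benchmark and multi-device case the commit message describes.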
Oleksandr Pavlyk
a7763bdd7a Remove debug outputs 2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk
b2a80c92b8 Revert "Scripts to triage 284"
This reverts commit c286199adc.
2025-12-08 11:53:08 -06:00
Oleksandr Pavlyk
ce9a76167f Use nvbench::stop_runner_loop to signal stop of runner loop
Add try/catch around Python calls to improve keyboard interrupt
response.
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
e57f1ecf4c Introduce nvbench::stop_runner_loop exception. If application throws it, runner loop is stopped and other pending benchmark instances are skipped 2025-12-05 19:38:11 -06:00
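The behavior described in e57f1ecf4c, where throwing a dedicated exception stops the runner loop and skips the pending instances, might look roughly like this. The names `stop_runner_loop_sketch` and `run_instances_sketch` are hypothetical, and deriving from `std::runtime_error` is an assumption; the real `nvbench::stop_runner_loop` may be defined differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for nvbench::stop_runner_loop.
struct stop_runner_loop_sketch : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Runs instances in order and returns how many completed.
// If an instance throws stop_runner_loop_sketch, the loop ends
// and the remaining instances are never started.
inline std::size_t
run_instances_sketch(const std::vector<std::function<void()>> &instances)
{
  std::size_t completed = 0;
  try
  {
    for (const auto &instance : instances)
    {
      instance();
      ++completed;
    }
  }
  catch (const stop_runner_loop_sketch &)
  {
    // Swallow the signal: stopping the loop is the requested behavior.
  }
  return completed;
}
```

Other exception types propagate out of this sketch unchanged, so only the dedicated exception acts as a stop signal.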
Oleksandr Pavlyk
c286199adc Scripts to triage 284 2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
de471e1d42 Use pybind11==3.0.1, do not use pybind11_add_module 2025-12-05 19:38:11 -06:00
Jerry Hou
f651636501 entropy criterion optimizations (#286)
* entropy criterion optimizations

* online linear regression module

* online regression refactor

* revising ss_tot handling

---------

Co-authored-by: Jerry Hou <jerryhou@fb.com>
2025-12-06 01:02:21 +00:00
Ashwin Srinath
a6995413ac Merge pull request #288 from shwina/wheel-build-and-publish-infra
Initial wheel build and publishing infrastructure
2025-12-04 04:37:07 -05:00
Ashwin Srinath
1d33536ce1 Re-enable other CI jobs 2025-12-03 16:42:30 -05:00
Ashwin Srinath
603a2df445 Remove workaround 2025-12-03 16:23:42 -05:00
Ashwin Srinath
77b7afc3c9 Remove the Python version file 2025-12-03 16:23:14 -05:00
Ashwin Srinath
3af11c8ee7 Expand the CI matrix back 2025-12-03 15:48:40 -05:00
Ashwin Srinath
cadfa7de61 We no longer need to install libnvidia-ml.so 2025-12-03 15:37:20 -05:00
Ashwin Srinath
7ad064ea4f Change to GPU runner for testing 2025-12-03 15:18:39 -05:00
Ashwin Srinath
b7eaf44ca3 Install libnvidia-ml.so.1 in test environment 2025-12-03 14:56:37 -05:00
Ashwin Srinath
c2c34c9378 Temporarily reduce CI matrix 2025-12-03 14:37:23 -05:00
Ashwin Srinath
a293af1d52 Try capturing the Python path before changing directories 2025-12-03 14:15:34 -05:00
Ashwin Srinath
a7f92b7436 Try an inner and outer script 2025-12-03 13:21:53 -05:00
Ashwin Srinath
9746aa14df Maybe fix to test script 2025-12-03 12:47:43 -05:00
Ashwin Srinath
d1efef03bc Fix wheel naming 2025-12-03 11:54:46 -05:00
Ashwin Srinath
618001143b Fixes to test script 2025-12-03 11:41:36 -05:00
Ashwin Srinath
8443a2059c Ensure test jobs find wheels correctly 2025-12-03 11:22:19 -05:00
Ashwin Srinath
f3df4104de Make wheels manylinux compliant 2025-12-03 11:22:12 -05:00
Ashwin Srinath
e15d9ebf58 Lint fixes 2025-12-03 11:07:03 -05:00
Ashwin Srinath
98e0b5994a Introduce build-and-test-python-wheels workflow 2025-12-03 11:06:11 -05:00
Ashwin Srinath
e9cf53a1a4 Add PR workflow for building and testing wheels 2025-12-03 10:30:27 -05:00
Ashwin Srinath
8b2afa6c16 Lint fixes 2025-12-03 10:17:23 -05:00
Ashwin Srinath
29389b5791 Initial wheel build and publishing infrastructure 2025-12-03 10:15:32 -05:00
Bernhard Manfred Gruber
34f1e2a7ee Merge pull request #285 from ashermancinelli/patch-1
Update README.md
2025-11-16 00:11:42 +01:00
Asher Mancinelli
e91559edf0 Update README.md 2025-11-14 14:34:18 -08:00
comeyrd
92d2e01cd1 Profile only the kernels involved in the benchmark (#277)
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
2025-10-21 13:51:37 -04:00
Allison Piper
9b133a94bc Remove GLOBAL tags from fmt targets. (#281)
Fixes #279.
2025-10-21 11:16:44 -04:00
Allison Piper
e6283df79c Build native arch by default, update rapids-cmake. (#280)
* Build native arch by default, update rapids-cmake.
* Add check that CXX and CUDA_HOST compiler match.
  Similar to CCCL, we need these to match to ensure that our warning flag detection functions properly.
* GCC only recognizes `unused-local-typedefs`.
  Clang recognizes both. Ensure that we set this for both compilers.
2025-10-21 10:41:36 -04:00
Bernhard Manfred Gruber
98d701c054 Diff device sections on mismatch in nvbench_compare.py (#278) 2025-10-15 08:58:08 -04:00
pre-commit-ci[bot]
7feda2cf3a [pre-commit.ci] pre-commit autoupdate (#276)
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0)
- [github.com/pre-commit/mirrors-clang-format: v20.1.7 → v21.1.2](https://github.com/pre-commit/mirrors-clang-format/compare/v20.1.7...v21.1.2)
- [github.com/astral-sh/ruff-pre-commit: v0.12.2 → v0.13.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.2...v0.13.3)

* Update matrix + devcontainers.

* Fix typo.

Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
2025-10-07 15:22:36 -04:00
Oleksandr Pavlyk
e7cc1e344c Add a benchmark example parametrized by typename and integral constant. (#275)
* Add a benchmark example parametrized by typename and integral constant.

Add a variation of copy_type_sweep kernel, where block size is controlled
via integral constant passed as template parameter.

* Addressed PR review feedback

* Use auto to gridSize

* Address PR review change request

* Add comment to use ceil_div with CCCL >= 2.8
2025-10-07 13:49:17 -04:00
Oleksandr Pavlyk
b88a45f417 Merge pull request #269 from jayavenkatesh19/main
remove pynvjitlink references in examples
2025-09-17 13:54:36 -05:00
Jaya Venkatesh
0f997271f7 added numba-cuda to requirements
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-16 14:54:08 -07:00
Jaya Venkatesh
bfa6a6c7c6 remove pynvjitlink references in examples
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-08 16:00:19 -07:00
Allison Piper
4642df7006 Fix sccache checks when running locally. (#268) 2025-09-05 15:50:09 -04:00